Too Simple! This Example Explains the TCP Protocol in Detail: Something You Probably Didn't Know

TCP is one of the core protocols of the TCP/IP protocol suite. It runs on top of the network-layer IP protocol and serves application-layer protocols such as HTTP, FTP, SMTP, POP3, SSH, and Telnet.

Today, let's take a close look at the TCP protocol: how it opens and closes a network conversation with the three-way handshake and the four-way handshake. We'll also use the analogy of mailing a letter to explain how TCP works and how it makes sure a message reaches exactly the right recipient.

 

TCP Protocol in Detail

1 Introduction to TCP

The Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol. It was originally specified in RFC 793, which the Internet Engineering Task Force (IETF) has since replaced with RFC 9293. In the layered OSI model of computer networks, it performs the functions of the transport layer.

01 What is Connection-Oriented?

"Connection-oriented" is best understood in contrast to the other transport-layer protocol, UDP (User Datagram Protocol). Before transmitting any data, TCP must establish a connection through a three-way handshake, and data then flows over that connection between exactly two endpoints (one-to-one). When the transfer is finished, the connection is closed with a four-way handshake.

UDP, on the other hand, is connectionless. The sender does not need to establish a connection with the receiver before sending data; data can be transmitted immediately. Each UDP packet is independent and unrelated to others, allowing UDP to send messages one-to-one, one-to-many, or many-to-many.

02 What is a Reliable Communication Protocol?

Reliability is also best understood in contrast to UDP. TCP uses mechanisms such as connection setup via the three-way handshake, sequence numbers, acknowledgments, and timeout retransmission to ensure reliable data transmission. After sending a data packet, the sender waits for the receiver to return an acknowledgment (ACK).

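If the sender does not receive an acknowledgment within a certain time (the retransmission timeout), it assumes the data was lost and retransmits it. The receiver discards any duplicate data it gets (while still acknowledging it). When a segment arrives out of order, leaving a gap, the receiver immediately repeats its last acknowledgment (a duplicate ACK); after three duplicate ACKs, the sender retransmits the missing segment without waiting for the timeout (fast retransmit).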
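To make cumulative acknowledgment concrete, here is a toy simulation written for this article (not code from any real TCP stack). The receiver always acknowledges the next in-order byte it still needs, so a gap makes it repeat the same ACK number; the sender treats the third duplicate as a signal to retransmit. It assumes segments never partially overlap and ignores sequence-number wraparound.

```python
def receiver_acks(next_expected, arrivals):
    """Return the ACK number sent after each arriving segment.

    arrivals: list of (seq, length) tuples in arrival order.
    Out-of-order segments are buffered; each ACK names the next
    in-order byte the receiver is still waiting for.
    """
    buffered = {}                      # seq -> length of out-of-order segments
    acks = []
    for seq, length in arrivals:
        if seq == next_expected:
            next_expected += length
            # Segments buffered earlier may now be contiguous: consume them.
            while next_expected in buffered:
                next_expected += buffered.pop(next_expected)
        elif seq > next_expected:
            buffered[seq] = length     # there is a hole before this segment
        # seq < next_expected: duplicate data, just acknowledge again
        acks.append(next_expected)
    return acks


def fast_retransmit_point(acks, threshold=3):
    """Return the sequence number to resend after `threshold` duplicate ACKs, else None."""
    duplicates = 0
    for prev, cur in zip(acks, acks[1:]):
        if cur == prev:
            duplicates += 1
            if duplicates == threshold:
                return cur
        else:
            duplicates = 0
    return None
```

For example, with 100-byte segments starting at 1000 and the segment at 1100 lost once, `receiver_acks(1000, [(1000, 100), (1200, 100), (1300, 100), (1400, 100), (1100, 100)])` yields `[1100, 1100, 1100, 1100, 1500]`: three duplicate ACKs for 1100, then a single ACK covering everything once the retransmission fills the gap.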

TCP also provides flow control and congestion control to keep the network stable and efficient. As a result, as long as both hosts stay up and some path between them exists, TCP keeps retransmitting until the data is delivered in order, or until it gives up and reports the connection as failed.

In contrast to TCP’s reliable transmission, UDP is unreliable. UDP packets do not provide mechanisms like acknowledgment, retransmission, flow control, or congestion control, so they may be lost, duplicated, out of order, or corrupted.

03 What is Byte-Stream-Oriented?

TCP is byte-stream-oriented. Although the application hands data to TCP in chunks of varying sizes, TCP treats that data as a continuous stream of unstructured bytes. TCP keeps a buffer: if the application writes a chunk that is too long, TCP can split it into shorter segments before transmitting; if the application writes one byte at a time, TCP can wait until enough bytes accumulate to form a segment and then send it.

In contrast to byte-stream-oriented TCP, UDP is message-oriented. UDP neither merges nor splits the messages passed down from the application layer; it preserves their boundaries. Whatever length the application hands to UDP, UDP sends it as is, one message per datagram. The application must therefore choose a suitable message size: if a message is too long, the IP layer has to fragment it, which lowers efficiency; if it is too short, the headers make up a large share of each IP packet, which is also inefficient.
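The difference between a byte stream and messages is easy to observe with Python's standard `socket` module. The sketch below (our own illustration, using only the loopback address 127.0.0.1 and OS-assigned ports) writes two messages over TCP and over UDP. TCP needs `listen`/`accept`/`connect` before any data moves and delivers only a stream of bytes, so the receiver may see the two writes merged; UDP needs no setup and each `recvfrom` returns exactly one datagram.

```python
import socket
import threading


def tcp_receive_all(messages):
    """Send each message with its own sendall() over TCP; return the chunks recv() saw."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    server.listen(1)
    server.settimeout(5)
    port = server.getsockname()[1]

    def client():
        # create_connection() completes the three-way handshake before returning.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as c:
            for m in messages:
                c.sendall(m)
        # Closing the socket sends FIN, so the server's recv() eventually returns b"".

    t = threading.Thread(target=client)
    t.start()
    conn, _ = server.accept()
    chunks = []
    with conn:
        conn.settimeout(5)
        while data := conn.recv(1024):   # b"" means the peer closed its side
            chunks.append(data)
    t.join()
    server.close()
    return chunks


def udp_receive_all(messages):
    """Send each message as its own UDP datagram; return what recvfrom() saw."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(5)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for m in messages:
        # No handshake: each sendto() is an independent datagram.
        sender.sendto(m, receiver.getsockname())
    received = [receiver.recvfrom(1024)[0] for _ in messages]
    sender.close()
    receiver.close()
    return received
```

Calling `tcp_receive_all([b"hello", b"world"])` may return `[b"helloworld"]` or `[b"hello", b"world"]` depending on timing; only the order of the bytes is guaranteed. `udp_receive_all([b"hello", b"world"])` keeps the two messages separate.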

2 TCP Packet Format

Understanding the packet format is essential to understanding any communication protocol. A TCP segment consists of a TCP header followed by application data; the header is the core of the TCP protocol, and the application data is the payload of the segment, as shown below.

01 Source Port and Destination Port:

Each is 16 bits (2 bytes) long and identifies the port used by the sending application and the port of the receiving application. The 16-bit length is why port numbers range from 0 to 65535 (2^16 = 65536 values); port 0 is reserved and is not used as a real destination, so usable ports are 1 to 65535. The source and destination ports, together with the source and destination IP addresses in the IP header, uniquely identify a TCP connection.

02 Sequence Number (Sequence Number):

32 bits long, indicating the range of sequence numbers is [0, 2^32-1], or [0, 4294967295]. When the sequence number increases to 4294967295, the next sequence number will return to 0 and start again.

When a connection is set up, each side picks an initial sequence number (ISN) that is hard to predict (essentially random) and sends it to the other side in its SYN segment. After that, the sequence number advances by the number of data bytes sent: each segment's sequence number is the number of its first data byte. Sequence numbers let the receiver put out-of-order segments back in order and detect duplicates, and they are the basis for reliable transmission and flow control.
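Because the field wraps from 4294967295 back to 0, sequence numbers cannot be compared with a plain `<`. A minimal sketch of the usual modular approach, in the spirit of the `SEQ_LT` macro found in BSD-derived stacks (the function names here are our own): `a` comes before `b` when the forward distance from `a` to `b` is less than half of the 2^32 number space.

```python
MOD = 2 ** 32          # size of the 32-bit sequence number space


def seq_add(seq, nbytes):
    """Advance a sequence number by nbytes, wrapping around at 2**32."""
    return (seq + nbytes) % MOD


def seq_lt(a, b):
    """True if sequence number a precedes b, taking wraparound into account."""
    return a != b and (b - a) % MOD < 2 ** 31
```

For example, `seq_add(4294967000, 500)` wraps to `204`, and `seq_lt(4294967000, 204)` is `True` even though 4294967000 is numerically larger.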

03 Acknowledgment Number (Acknowledgment Number):

32 bits long, valid only when the ACK flag is set. It holds the sequence number of the next byte the receiver expects, i.e., the sequence number of the last byte received in order plus 1, and so confirms which data has arrived. Analysis tools such as Wireshark usually display sequence and acknowledgment numbers relative to the ISN to make them easier to read, but the field itself carries the absolute 32-bit value.

If the peer's ISN is X and N data bytes have been received in order, the acknowledgment number is X + 1 + N, because the peer's SYN itself consumes one sequence number. On receiving this acknowledgment, the sender knows that every byte before this sequence number has arrived (acknowledgments are cumulative). Like the sequence number, the field ranges over [0, 2^32-1], or [0, 4294967295], and wraps around to 0.
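The arithmetic above fits in one line. This hypothetical helper computes the acknowledgment number a receiver would send, counting the one sequence number consumed by the peer's SYN (and, optionally, the one consumed by a FIN) and wrapping modulo 2^32:

```python
MOD = 2 ** 32


def expected_ack(peer_isn, bytes_received, fin_received=False):
    """ACK number after receiving bytes_received in-order data bytes from the peer.

    The peer's SYN occupies sequence number peer_isn, so its first data byte is
    peer_isn + 1; a received FIN would occupy one more sequence number.
    """
    return (peer_isn + 1 + bytes_received + (1 if fin_received else 0)) % MOD
```

With a peer ISN of 1000, the ACK for the SYN alone is 1001, and after 500 data bytes it is 1501.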

04 Data Offset (Data Offset):

4 bits long, giving the distance from the start of the TCP segment to the start of its data, in 4-byte units; in other words, it is the header length. Without options, the value is 5 (20 bytes). The largest value 4 bits can hold is 15, so the data can start at most 60 bytes (4 × 15) from the beginning of the segment. The TCP header is therefore 20 to 60 bytes long.

05 Reserved (Reserved):

3 bits long, reserved for future use, currently should be set to zero.

06 Control Flags (Flags):

9 bits long, used to control and manage TCP connections. Each control flag is explained as follows:

NS (Nonce Sum): Supports an experimental extension called ECN-nonce (RFC 3540), which was intended to stop a receiver from concealing congestion signals from the sender. RFC 3540 has since been reclassified as historic, so this bit is effectively unused.

CWR (Congestion Window Reduced): Set by the sender to tell the receiver that it has already reduced its congestion window in response to an ECE signal, so the receiver can stop setting ECE. It is part of the Explicit Congestion Notification (ECN) mechanism.

ECE (ECN-Echo): Used with Explicit Congestion Notification (ECN). During the handshake, ECE (together with CWR on the SYN) indicates that an endpoint supports ECN. During data transfer, when a router marks a packet as having experienced congestion (the CE codepoint in the IP header) instead of dropping it, the receiver sets ECE in its ACKs to tell the sender. The sender then reduces its congestion window and sets CWR. Together, ECE and CWR let the two ends react to congestion without waiting for packet loss, improving network performance and stability.

URG (Urgent): Indicates that the segment contains urgent data. When URG=1, it indicates the urgent mode is turned on, notifying the receiver to pay special attention to the processing of urgent data. The URG flag is set together with the urgent pointer field (Urgent Pointer).

ACK (Acknowledgment): Indicates that the acknowledgment number field is valid. The acknowledgment number field is only valid when ACK=1, and invalid when ACK=0. TCP stipulates that after the connection is established, all transmitted segments must set ACK to 1.

PSH (Push): Indicates that the receiver should immediately push the data to the application, rather than waiting for the buffer to fill up. When two application processes are communicating interactively, sometimes one application process hopes to receive a response from the other immediately after typing a command. In this case, TCP can use the push operation. At this time, the sender TCP sets PSH to 1 and immediately creates a segment to send out. The receiver TCP receives the segment with PSH=1 and delivers it to the receiving application process as soon as possible (i.e., “push” forward), without waiting for the entire buffer to fill up before delivering it.

RST (Reset): Used to reset the connection and abort the current communication. RST=1 indicates that something has gone wrong with the TCP connection (for example, a host crashed), so the connection must be torn down immediately; a new connection has to be established if communication is to continue. RST is also used to reject an invalid segment or to refuse a connection request.

SYN (Synchronize): Used to establish a connection, initiating a connection request. It is used to synchronize sequence numbers during connection establishment. When SYN=1 and ACK=0, it indicates that this is a connection request segment. If the other party agrees to establish a connection, it should set SYN=1 and ACK=1 in the response segment. Therefore, SYN set to 1 indicates a connection request or connection acceptance segment.

FIN (Finish): Used to close the connection, requesting termination of the connection. When FIN=1, it indicates that the sender has no more data to transmit and requests to release the connection.
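In the header, bytes 12 and 13 hold the data offset and these flags. Read as one big-endian 16-bit word, the top 4 bits are the data offset, the next 3 bits are reserved, and the low 9 bits are NS, CWR, ECE, URG, ACK, PSH, RST, SYN, FIN from high to low. A small decoding sketch (the table and function names are our own):

```python
# (flag name, bit position) within the 16-bit data-offset/flags word
FLAG_BITS = [
    ("NS", 8), ("CWR", 7), ("ECE", 6), ("URG", 5), ("ACK", 4),
    ("PSH", 3), ("RST", 2), ("SYN", 1), ("FIN", 0),
]


def decode_flags(word):
    """Names of the flags set in header bytes 12-13 (given as one 16-bit integer)."""
    return [name for name, bit in FLAG_BITS if word & (1 << bit)]


def header_length(word):
    """Header length in bytes: the top 4 bits count 4-byte words."""
    return (word >> 12) * 4
```

A SYN-ACK without options has the word `0x5012`: `decode_flags(0x5012)` gives `["ACK", "SYN"]` and `header_length(0x5012)` gives 20.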

07 Window Size (Window Size):

16 bits long, giving the size of the receive window, used for flow control. The maximum is 2^16-1 = 65535 bytes (about 64 KB). This early design is too small for many of today's networks, so the Window Scale option can be used to advertise larger windows. Note that the window is the receive window of the party sending this segment, not its own send window.

The window value tells the other party how many bytes, counting from the acknowledgment number in this segment's header, it is currently allowed to send. This limit is necessary because the receiver's buffer space is finite. In short, the advertised window is what the other side uses to set its send window.
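As a worked example of that rule: if the latest segment from the receiver carries acknowledgment number `ack` and window `window`, the sender may use sequence numbers up to `ack + window - 1`, and whatever it has already sent beyond `ack` counts against that limit. The sketch below is our own simplification; it ignores wraparound and the congestion window, which real senders also respect (they send no more than the smaller of the two windows).

```python
def usable_window(ack, window, snd_nxt):
    """Bytes the sender may still send right now.

    ack:     acknowledgment number from the receiver's latest segment
    window:  window value advertised in that same segment
    snd_nxt: sequence number of the next new byte the sender would send
    """
    # The receiver accepts sequence numbers [ack, ack + window);
    # bytes in [ack, snd_nxt) are already in flight.
    return max(0, ack + window - snd_nxt)
```

For instance, with `ack=1000`, `window=4000`, and 2000 bytes already in flight (`snd_nxt=3000`), the sender may send 2000 more bytes; if the receiver shrinks its window to 0, the sender must stop.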

08 Checksum (Checksum):

16 bits long, used to detect whether the TCP segment was corrupted during transmission. The checksum covers the TCP header, the data, and a pseudo-header taken from the IP layer (source and destination IP addresses, protocol number, and TCP length).
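TCP uses the standard Internet checksum: the one's complement of the 16-bit one's-complement sum. The sketch below computes it for IPv4, building the 12-byte pseudo-header described above; function names are our own. The sender computes it with the checksum field set to zero; a receiver can run the same computation over the segment as received (checksum included) and expects 0.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum of data, as used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"                             # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)    # fold the carry back in
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP header and data."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
        0, socket.IPPROTO_TCP, len(segment),        # zero byte, protocol 6, TCP length
    )
    return ~ones_complement_sum16(pseudo_header + segment) & 0xFFFF
```

This checksum catches accidental bit errors; it offers no protection against deliberate modification, because anyone who alters a segment can simply recompute it.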

09 Urgent Pointer (Urgent Pointer):

16 bits long, only valid when the URG flag is set. It indicates the number of bytes of urgent data in this segment (after the urgent data, normal data follows).

Therefore, the urgent pointer points to the end of the urgent data in the segment. When all urgent data is processed, TCP tells the application to return to normal operation. It is worth noting that urgent data can still be sent even when the window is 0.
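Putting the fixed fields together, the 20-byte fixed header maps directly onto a `struct` format string: two 16-bit ports, two 32-bit numbers, then four 16-bit fields (data offset plus flags, window, checksum, urgent pointer). This parser is a minimal sketch of our own; it does not interpret options or validate the checksum.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Split a raw TCP segment into fixed-header fields, options, and payload."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4       # data offset counts 4-byte words
    if not 20 <= header_len <= len(segment):
        raise ValueError("invalid data offset")
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": offset_flags & 0x1FF,          # low 9 bits: NS, CWR, ..., FIN
        "window": window, "checksum": checksum, "urgent_pointer": urg_ptr,
        "options": segment[20:header_len],
        "payload": segment[header_len:],
    }
```

For a segment whose data offset is 6, the parser returns 4 bytes of options and treats everything after byte 24 as payload.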

10 Options (Options):

Optional field of variable length, up to 40 bytes. When no options are used, the TCP header is 20 bytes long. Options provide additional functions and control; each option begins with a 1-byte kind field identifying the option type, and most options follow it with a 1-byte length field. Some common options are as follows:

Maximum Segment Size (MSS): Occupies 4 bytes and is normally carried in the SYN segment during connection setup. It states the largest amount of data (excluding headers) this end is willing to receive in one segment. For IPv4, the MSS is usually set to MTU - 40 bytes (a 20-byte IP header plus a 20-byte TCP header), so that an IP datagram carrying a full segment does not exceed the MTU and the sender does not need to fragment it. On Ethernet the MTU is typically 1500 bytes, giving an MSS of 1460 bytes. The option may only appear in SYN segments; elsewhere it is ignored.

Window Scale Factor: Occupies 3 bytes; its value (a shift count) ranges from 0 to 14. The 16-bit window field is shifted left by this many bits, i.e., multiplied by 2^shift. The option exists because modern receive buffers (receive windows) are often larger than 65535 bytes. It may only appear in SYN segments; elsewhere it is ignored.

Timestamp Option (TCP Timestamps Option, TSopt): Occupies 10 bytes: the 1-byte kind and length fields, the Timestamp Value field (TSval, 4 bytes), and the Timestamp Echo Reply field (TSecr, 4 bytes). It lets both ends stamp their segments with timestamp values; the main uses are measuring round-trip time and protecting against wrapped sequence numbers (PAWS).

TCP Authentication Option (TCP-AO): Provides data integrity and authentication for TCP segments using keys shared by the two ends. It protects segments against tampering and against forged segments from unauthorized parties.
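The sizes above can be checked by encoding the options directly. The sketch below (our own helper names) computes an MSS from the MTU, applies a window-scale shift, and packs MSS (kind 2), Window Scale (kind 3), and Timestamps (kind 8) as they might appear on a SYN, padding to a 4-byte boundary with zero bytes (End of Option List). Real stacks usually interleave NOP bytes (kind 1) to align individual options; that detail is omitted here.

```python
import struct

# Option kinds: 0 = End of Option List, 2 = MSS, 3 = Window Scale, 8 = Timestamps
EOL, MSS_KIND, WSCALE_KIND, TS_KIND = 0, 2, 3, 8


def mss_for_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest payload that fits in one MTU-sized IPv4 packet without options."""
    return mtu - ip_header - tcp_header


def effective_window(window_field: int, shift: int) -> int:
    """Receive window in bytes once the negotiated window-scale shift is applied."""
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return window_field << shift


def syn_options(mss: int, shift: int, ts_val: int, ts_ecr: int = 0) -> bytes:
    """Encode MSS, Window Scale and Timestamps options, padded to a multiple of 4 bytes."""
    opts = struct.pack("!BBH", MSS_KIND, 4, mss)                # kind, length, MSS: 4 bytes
    opts += struct.pack("!BBB", WSCALE_KIND, 3, shift)          # kind, length, shift: 3 bytes
    opts += struct.pack("!BBII", TS_KIND, 10, ts_val, ts_ecr)   # kind, length, TSval, TSecr: 10 bytes
    opts += bytes([EOL]) * (-len(opts) % 4)                     # pad to a 4-byte boundary
    return opts
```

`mss_for_mtu(1500)` gives 1460, and `effective_window(65535, 14)` gives 1073725440 bytes, roughly 1 GiB, the largest window TCP can advertise. The 17 bytes of options here are padded to 20, making a 40-byte header.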

3 Addressing in Packet Transmission

In the article "Detailed Analysis of the IP Protocol," we introduced the "source address" and "destination address" fields of the IP header. Together with the "source port" and "destination port" fields of the TCP header, they determine where a packet comes from and where it is going, as shown below.

Analogy: Sending a Letter in Daily Life

The letter inside the envelope corresponds to the data being transmitted. A properly addressed envelope carries the "recipient's address" and the "sender's address," which correspond to IP addresses: the recipient's address corresponds to the "destination IP address" in the packet's IP header, and the sender's address to the "source IP address." Writing both addresses ensures that the letter can be delivered to its destination.

But Who Receives the Letter After It Reaches the Destination Address?

Looking at the recipient's address on the letter above, we see that it is "Department B of Company A" in Zhangjiang, Pudong New Area, Shanghai. This department may have hundreds or even thousands of people, so the address alone does not identify the recipient: the letter can reach the address but not the specific person.

Therefore, a letter needs the combination of "recipient's name," "recipient's address," "sender's name," and "sender's address" to reach the specific person it is meant for. The recipient's name corresponds to the destination port in the TCP header, and the sender's name to the source port.

From Letters to Packets: An Example of How a Network Packet Is Delivered

Li Si in Beijing (computer IP address: 106.54.28.25) sends a message via QQ to Zhang San in Shanghai (computer IP address: 114.92.67.193). For this example, assume QQ uses port 80, as shown below:

First, Li Si's computer packages the message into a TCP segment, adds an IP header and an Ethernet header to form a network packet, and sends it onto the network. The network delivers the packet to Zhang San's computer using the destination IP address (114.92.67.193) in the packet's IP header.

When Zhang San's computer receives the packet, it knows the packet is addressed to it. But it is running several programs (such as QQ, WeChat, and Foxmail), so the IP address alone does not tell it which program should handle the data.

To solve this, the destination port in the TCP header identifies the receiving program, and the source port identifies the sending one. Each program sends and receives data on its own port number, so the packet reaches the right program on the right computer, just as a letter reaches the right person. For example, if QQ, WeChat, and Foxmail on Zhang San's computer use ports 80, 8900, and 110 respectively, a packet with destination port 80 is handed to QQ.
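The two-step delivery in this example (IP address selects the computer, destination port selects the program) can be sketched as a lookup table. Everything here is illustrative: the port table repeats the example's hypothetical assignments (real QQ, WeChat, and Foxmail do not necessarily use these ports), and the source port 50000 for Li Si's computer is an invented ephemeral port.

```python
from typing import NamedTuple, Optional


class Addr(NamedTuple):
    """The four values that together identify a TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


MY_IP = "114.92.67.193"                                 # Zhang San's computer
LISTENERS = {80: "QQ", 8900: "WeChat", 110: "Foxmail"}  # hypothetical port table


def deliver(addr: Addr) -> Optional[str]:
    """Return the program that should receive a packet, or None if nobody should."""
    if addr.dst_ip != MY_IP:
        return None                       # the packet is not addressed to this computer
    return LISTENERS.get(addr.dst_port)   # None: no program is listening on that port
```

`deliver(Addr("106.54.28.25", 50000, "114.92.67.193", 80))` returns `"QQ"`. Because the whole four-tuple is the key, a second chat from the same sender on a different source port is a separate connection even though both go to QQ on port 80.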

This example also illustrates the role of other fields in the packet. For instance, when we receive a letter, we can check whether the envelope is intact to see whether it was damaged in transit. For network packets, the Checksum field in the TCP header lets the receiver check whether the data was corrupted in transit. Note that the checksum only catches accidental errors: someone who deliberately alters a packet can simply recompute it, so protection against tampering requires mechanisms such as TCP-AO.

4 Why Does TCP Need to Establish a Connection?

First, the IP protocol is connectionless: IP keeps no state linking one datagram to the next, and each datagram is processed independently. The advantage of this design is that no link resources are reserved for a conversation, which keeps the demands on the network low. IP is also unreliable: it does not guarantee that a datagram reaches its destination. It is a best-effort delivery service; when a router cannot forward a packet (for example, because the packet contains an error), it drops it and, in some cases, sends an ICMP (Internet Control Message Protocol) error message back to the source address.

Because the IP protocol is connectionless and unreliable, the upper-layer TCP is needed to establish connections and retransmit errors, achieving a connection-oriented, reliable, and byte-stream-based transport layer communication protocol.

 
