Some of the primary issues regarding the transport layer are listed in the following picture.
On the Internet, there are two dominant transport layer protocols. One is the User Datagram Protocol (UDP), an unreliable service that provides only multiplexing/demultiplexing and bit error detection (checksum). We are going to focus on the other one, the Transmission Control Protocol (TCP), a reliable service that provides error control, flow control and congestion control.
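To make the checksum idea concrete, here is a minimal Python sketch of the 16-bit one's-complement Internet checksum used by both UDP and TCP. The function names `internet_checksum` and `verify` are hypothetical, and the sketch covers only a raw byte string; a real UDP or TCP checksum also covers a pseudo-header, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def verify(data: bytes, checksum: int) -> bool:
    """Receiver-side check: data plus its checksum must sum to all ones."""
    if len(data) % 2:
        data += b"\x00"
    return internet_checksum(data + checksum.to_bytes(2, "big")) == 0
```

A receiver that recomputes the sum over a corrupted payload gets a non-zero result and can discard the segment; the checksum detects errors but does not correct them.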
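Because TCP has to provide more services than UDP, the TCP segment header carries more fields than the UDP segment header.
TCP is connection-oriented. A TCP connection is identified by its two endpoints, each given by an IP address and a port number (IP addr + port #). At any given time a TCP port can be owned by only one process, but a process can establish multiple connections through the same TCP port. A TCP connection is established with a three-way handshake and released by exchanging FIN and ACK segments (typically four segments, or three when a FIN and an ACK are combined). The state diagram is shown below (taken from "TCP/IP Illustrated", 《TCP/IP详解卷》):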
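As a rough illustration of how one port can serve several connections, the following Python sketch keys a connection table by both endpoints. `ConnKey` and `Demux` are hypothetical names invented for this example, not part of any real TCP stack.

```python
from typing import Dict, NamedTuple, Optional


class ConnKey(NamedTuple):
    """Both endpoints of a connection: (IP addr, port #) on each side."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


class Demux:
    """Toy demultiplexer: maps a connection key to a connection label."""

    def __init__(self) -> None:
        self.table: Dict[ConnKey, str] = {}

    def register(self, key: ConnKey, conn: str) -> None:
        self.table[key] = conn

    def lookup(self, key: ConnKey) -> Optional[str]:
        # Segments are matched on all four values, not on the local port alone.
        return self.table.get(key)


demux = Demux()
# Two connections share local port 80 but differ in the remote endpoint.
demux.register(ConnKey("10.0.0.1", 80, "10.0.0.2", 5000), "conn-A")
demux.register(ConnKey("10.0.0.1", 80, "10.0.0.3", 5000), "conn-B")
```

Because the remote endpoint is part of the key, the two connections never collide even though they use the same local port.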
TCP reliable data transfer is a hybrid of Go-Back-N and Selective Repeat, based on cumulative, piggybacked ACKs and a single retransmission timer. Retransmissions are triggered either by a timeout or by three duplicate ACKs (fast retransmit), and each time only ONE segment is retransmitted. The TCP timeout value is calculated dynamically according to the following Jacobson's Algorithm:
$\text{RTT}=7/8\cdot\text{RTT}+1/8\cdot \text{measure}$
$\text{RTTVAR}=3/4\cdot \text{RTTVAR}+1/4\cdot|\text{measure}-\text{RTT}|$
$\text{RTO}=\text{RTT}+4\cdot \text{RTTVAR}$
Note that, according to Karn's Algorithm, RTT and RTTVAR are not updated with measurements of retransmitted segments, and each time a timeout and retransmission occurs the value of RTO is doubled (until the segment gets through).
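The timer rules above can be sketched in a few lines of Python. The class name `RtoEstimator`, its method names, and the initial values (RTT set to the first measurement, RTTVAR to half of it) are assumptions made for this sketch. The update order follows the three formulas as written here; some specifications update RTTVAR before RTT.

```python
class RtoEstimator:
    """Jacobson-style RTO estimate with Karn's rules (illustrative only)."""

    def __init__(self, first_measure: float) -> None:
        # Assumed initialisation: not specified in the text above.
        self.rtt = first_measure
        self.rttvar = first_measure / 2
        self.backoff = 1  # multiplier doubled on every timeout

    def on_measurement(self, measure: float, retransmitted: bool = False) -> None:
        if retransmitted:
            return  # Karn: ignore samples from retransmitted segments
        self.rtt = 7 / 8 * self.rtt + 1 / 8 * measure
        self.rttvar = 3 / 4 * self.rttvar + 1 / 4 * abs(measure - self.rtt)
        self.backoff = 1  # a clean sample ends the backoff

    def on_timeout(self) -> None:
        self.backoff *= 2  # Karn: double RTO until a segment gets through

    @property
    def rto(self) -> float:
        return (self.rtt + 4 * self.rttvar) * self.backoff
```

Keeping the backoff as a separate multiplier leaves RTT and RTTVAR untouched during a run of timeouts, which is the behaviour Karn's Algorithm asks for.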
As for TCP flow control, a receiver typically maintains a single buffer pool shared by all its connections, and it advertises its current window size to the sender in the TCP header. A sender that has received a zero-window announcement will not send another segment unless it carries urgent data or is a probe asking the receiver to announce its window size again.
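A minimal Python sketch of the sender-side decision might look like this. The function `may_send` and its parameters are hypothetical, and sizes are plain byte counts rather than real sequence numbers.

```python
def may_send(segment_len: int, advertised_window: int, in_flight: int = 0,
             *, urgent: bool = False, probe: bool = False) -> bool:
    """Decide whether the sender may transmit a segment right now."""
    if advertised_window == 0:
        # Zero window: only urgent data or a window probe may go out.
        return urgent or probe
    # Otherwise the segment must fit in the window left after unacked data.
    usable = max(0, advertised_window - in_flight)
    return segment_len <= usable
```

For example, with a 1000-byte advertised window and 400 bytes still unacknowledged, a 500-byte segment fits but a 700-byte segment has to wait.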
Reno is a well-known TCP congestion control algorithm derived from Tahoe. It adjusts the congestion window size (congWin) according to AIMD (Additive Increase, Multiplicative Decrease):
(1) initially, the threshold is set to 64 KB and congWin starts at 1 MSS;
(2) while congWin<threshold, the sender is in the slow-start phase, and the window grows exponentially (it doubles every RTT);
(3) when congWin>=threshold, the sender is in the congestion-avoidance phase, and the window grows linearly (by 1 MSS every RTT);
(4) when a timeout occurs, the threshold is set to congWin/2, congWin is set to 1 MSS, and the sender retransmits the lost segment and returns to the slow-start phase;
(5) when three duplicate ACKs arrive, the sender performs a fast retransmission and reacts as for a timeout, except that both threshold and congWin are set to congWin/2, so it continues in congestion avoidance instead of slow start; this is called fast recovery.
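The five rules can be traced with a small Python model of a TCP Reno sender. This is a simplified sketch: the class `RenoSender` and its method names are hypothetical, the window is counted in MSS units, growth is applied once per RTT, and slow start is capped at the threshold (an assumption, so that the window does not overshoot it).

```python
class RenoSender:
    """Toy model of the Reno window rules, measured in MSS units."""

    def __init__(self, threshold: float = 64.0) -> None:
        self.threshold = threshold  # e.g. 64 KB if one MSS is 1 KB (assumed)
        self.cong_win = 1.0         # start at 1 MSS

    def on_round_acked(self) -> None:
        """Called once per RTT in which the whole window was ACKed."""
        if self.cong_win < self.threshold:
            # Slow start: double, capped at the threshold (assumption).
            self.cong_win = min(self.cong_win * 2, self.threshold)
        else:
            # Congestion avoidance: additive increase of 1 MSS per RTT.
            self.cong_win += 1

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = 1.0  # back to slow start

    def on_triple_dup_ack(self) -> None:
        # Fast recovery: halve the window and stay in congestion avoidance.
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold
```

Starting with a threshold of 8 MSS, five successful rounds move the window through 2, 4, 8, 9 and 10 MSS; a triple duplicate ACK then halves it to 5 MSS, whereas a timeout would reset it to 1 MSS.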
An alternative to Reno is TCP Westwood, which estimates the available bandwidth from the arrival rate of ACKs and uses the estimate to set the threshold when a timeout occurs.
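The Westwood idea can be approximated in Python as follows. This is only a sketch under several assumptions: the class `WestwoodEstimator` is hypothetical, the bandwidth filter is a plain exponential moving average rather than the filter of the actual protocol, and the threshold is taken as the estimated bandwidth times the smallest RTT seen, with a floor of 2 segments.

```python
from typing import Optional


class WestwoodEstimator:
    """Rough bandwidth estimate from ACK arrivals (illustrative only)."""

    def __init__(self, alpha: float = 0.875) -> None:
        self.alpha = alpha  # smoothing factor (assumed value)
        self.bwe = 0.0      # estimated bandwidth in bytes per second
        self.rtt_min = float("inf")
        self.last_ack_time: Optional[float] = None

    def on_ack(self, now: float, acked_bytes: int, rtt_sample: float) -> None:
        self.rtt_min = min(self.rtt_min, rtt_sample)
        if self.last_ack_time is not None and now > self.last_ack_time:
            sample = acked_bytes / (now - self.last_ack_time)
            if self.bwe == 0.0:
                self.bwe = sample  # first sample seeds the estimate
            else:
                self.bwe = self.alpha * self.bwe + (1 - self.alpha) * sample
        self.last_ack_time = now

    def threshold_after_timeout(self, mss: int) -> int:
        """Threshold in segments: estimated bandwidth times minimum RTT."""
        if self.bwe == 0.0 or self.rtt_min == float("inf"):
            return 2  # no estimate yet: fall back to the assumed floor
        return max(2, int(self.bwe * self.rtt_min / mss))
```

Unlike halving the window blindly, this sets the threshold to roughly what the path was actually delivering, which is intended to help when losses are not caused by congestion.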