Network | TCP

The Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport-layer protocol.

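The application layer hands TCP a stream of 8-bit bytes for transmission across the network, and TCP splits that stream into segments of suitable length (usually limited by the maximum transmission unit, MTU, of the data link layer of the network the host is attached to). TCP then passes the resulting packets to the IP layer, which carries them across the network to the TCP layer of the receiving host. To detect packet loss, TCP assigns each segment a sequence number, which also lets the receiver put the data back in order. The receiver sends back an acknowledgment (ACK) for data it has received successfully; if the sender does not receive that ACK within a reasonable round-trip time (RTT), the corresponding data is assumed lost and is retransmitted. TCP uses a checksum to detect corrupted data; the checksum is computed both when sending and when receiving.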
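The checksum step can be sketched in a few lines of Python. This is a minimal illustration of the 16-bit one's-complement sum that TCP's checksum is built on; it leaves out the pseudo-header that real TCP also covers, and the function names are made up for this example.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def verify(data: bytes, checksum: int) -> bool:
    """Receiver side: data plus its checksum must sum to all ones."""
    padded = data + b"\x00" if len(data) % 2 else data
    return internet_checksum(padded + checksum.to_bytes(2, "big")) == 0


payload = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(payload)
print(hex(csum), verify(payload, csum))
```

The sender stores the result in the header; the receiver recomputes over the same bytes and discards the segment if the check fails, which then leads to a retransmission as described above.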

A TCP connection goes through three phases: connection establishment, data transfer, and connection termination.

MSS

The maximum segment size (MSS) is the largest amount of data, specified in
bytes, that TCP is willing to receive in a single segment. For best performance,
the MSS should be set small enough to avoid IP fragmentation, which can lead to
packet loss and excessive retransmissions. To try to accomplish this, typically
the MSS is announced by each side using the MSS option when the TCP connection
is established, in which case it is derived from the maximum transmission unit
(MTU) size of the data link layer of the networks to which the sender and
receiver are directly attached.

MSS announcement is also often called "MSS negotiation". Strictly speaking,
the MSS is not "negotiated" between the originator and the receiver, because
that would imply that both originator and receiver will negotiate and agree upon
a single, unified MSS that applies to all communication in both directions of
the connection. In fact, two completely
independent values of MSS are permitted for the two directions of data
flow in a TCP connection.
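As a rough sketch of how the two MSS values come about, the snippet below derives an MSS from an MTU and then segments data separately for each direction. It assumes 20-byte IPv4 and TCP headers with no options, and the MTU values 1500 and 576 are only examples.

```python
IP_HEADER = 20   # bytes; assumes an IPv4 header without options
TCP_HEADER = 20  # bytes; assumes a TCP header without options


def mss_from_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IP packet."""
    return mtu - IP_HEADER - TCP_HEADER


def segment(stream: bytes, mss: int) -> list[bytes]:
    """Split a byte stream into chunks of at most mss bytes."""
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]


client_mss = mss_from_mtu(1500)  # value the client announces
server_mss = mss_from_mtu(576)   # value the server announces

# Each direction honors the MSS announced by the side that will receive the data.
to_server = segment(b"x" * 3000, server_mss)
to_client = segment(b"x" * 3000, client_mss)
print(client_mss, server_mss, len(to_server), len(to_client))
```

A real sender would also cap its segments by its own outgoing MTU; that detail is left out here to keep the two directions easy to compare.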

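Connection establishment

TCP uses a three-way handshake to establish a connection. Many parameters are initialized during establishment; for example, the sequence numbers are initialized to ensure in-order delivery and a robust connection.

It is possible for two endpoints to initiate a connection to each other at the same time. Usually, though, one end opens a socket and listens for connections from the other; this is called a passive open. Once the server has performed its passive open, a client can initiate an active open.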

  1. SYN: The active open is performed by the client sending a SYN to the
    server. The client sets the segment's sequence number to a random value
    A.

  2. SYN-ACK: In response, the server replies with a SYN-ACK. The acknowledgment
    number is set to one more than the received sequence number i.e. A+1,
    and the sequence number that the server chooses for the packet is another
    random number, B.

  3. ACK: Finally, the client sends an ACK back to the server. The sequence
    number is set to the received acknowledgement value i.e. A+1, and the
    acknowledgement number is set to one more than the received sequence number
    i.e. B+1.
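The three steps above can be traced with a small Python sketch. It only models the sequence and acknowledgment numbers; the dictionary layout and the function name are invented for illustration and are not a real socket API.

```python
import random

SEQ_MOD = 2**32  # sequence numbers are 32-bit values


def three_way_handshake(a=None, b=None):
    """Return the SYN, SYN-ACK and ACK segments as plain dictionaries."""
    a = random.randrange(SEQ_MOD) if a is None else a  # client's initial number A
    b = random.randrange(SEQ_MOD) if b is None else b  # server's initial number B
    syn = {"flags": "SYN", "seq": a}
    syn_ack = {"flags": "SYN-ACK", "seq": b, "ack": (a + 1) % SEQ_MOD}
    ack = {"flags": "ACK", "seq": syn_ack["ack"], "ack": (b + 1) % SEQ_MOD}
    return [syn, syn_ack, ack]


for seg in three_way_handshake(a=1000, b=5000):
    print(seg)
```

With A = 1000 and B = 5000, the SYN-ACK acknowledges 1001 and the final ACK carries sequence number 1001 and acknowledgment number 5001.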

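Data transfer

During the data transfer phase, several key mechanisms provide TCP's reliability and robustness: sequence numbers, used to order received segments and detect duplicate data; checksums, used to detect corrupted segments; and acknowledgments and timers, used to detect and correct packet loss or delay.

During connection establishment, the TCP layers of the two hosts exchange initial sequence numbers (ISN). Sequence numbers identify data in the byte stream and are integers that count the application-layer data bytes. Each TCP segment normally carries both a sequence number and an acknowledgment number: the sequence number refers to the sender's own byte numbering, and the acknowledgment number refers to the byte numbering of the other side. Acknowledgments are cumulative: the acknowledgment number names the next byte the receiver expects, so it covers all contiguous bytes received so far. An extension to TCP, selective acknowledgment (SACK), additionally lets the receiver acknowledge blocks of data that arrived out of order. Starting from the ISN, the sequence number advances by 1 for every byte transferred.

Using sequence and acknowledgment numbers, the TCP layer can deliver the bytes of received segments to the application layer in the correct order. Sequence numbers are 32-bit unsigned integers; after reaching 2^32 - 1 they wrap around to 0. Choosing the ISN is a key operation in TCP, since it affects both robustness and security.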
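Because sequence numbers wrap around, they have to be added and compared modulo 2^32. The sketch below shows one common way to do this; the helper names are hypothetical, and the "less than half the number space apart" rule is an assumption of this example.

```python
SEQ_MOD = 2**32


def seq_add(seq: int, nbytes: int) -> int:
    """Advance a sequence number by nbytes, wrapping at 2^32."""
    return (seq + nbytes) % SEQ_MOD


def seq_lt(a: int, b: int) -> bool:
    """True if a comes before b, assuming they are less than 2^31 apart."""
    return a != b and (b - a) % SEQ_MOD < 2**31


last = 2**32 - 1
print(seq_add(last, 1))          # wraps to 0
print(seq_lt(2**32 - 10, 5))     # a number just before the wrap precedes 5
print(seq_lt(5, 2**32 - 10))     # and not the other way around
```

A plain integer comparison would get the second case wrong, which is why implementations compare the modular difference instead of the raw values.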

How TCP data transfer differs from UDP

  1. Ordered data transfer — the destination host rearranges according to
    sequence number

  2. Retransmission of lost packets — any cumulative stream not acknowledged is
    retransmitted

  3. Error-free data transfer

  4. Flow control — limits the rate a sender transfers data to guarantee
    reliable delivery. The receiver continually hints the sender on how much data
    can be received (controlled by the sliding window). When the receiving host's
    buffer fills, the next acknowledgment contains a 0 in the window size, to stop
    transfer and allow the data in the buffer to be processed.

  5. Congestion control

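Connection termination

Connection termination uses a four-way handshake, in which each side of the connection is closed independently. A typical teardown therefore requires a FIN from each endpoint and an ACK from its peer.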
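The four segments can be followed through the connection states of each endpoint. The sketch below uses the state names of the standard TCP state machine, which the text above does not spell out; it covers only the ordinary case where one side closes first, and the table layout is made up for this example.

```python
# (current state, event, next state) for the side that closes first
ACTIVE_CLOSER = [
    ("ESTABLISHED", "send FIN", "FIN_WAIT_1"),
    ("FIN_WAIT_1", "recv ACK", "FIN_WAIT_2"),
    ("FIN_WAIT_2", "recv FIN, send ACK", "TIME_WAIT"),
    ("TIME_WAIT", "timer expires", "CLOSED"),
]

# The same exchange as seen by the other side
PASSIVE_CLOSER = [
    ("ESTABLISHED", "recv FIN, send ACK", "CLOSE_WAIT"),
    ("CLOSE_WAIT", "send FIN", "LAST_ACK"),
    ("LAST_ACK", "recv ACK", "CLOSED"),
]


def walk(transitions):
    """Follow the table and return the list of states visited."""
    state = transitions[0][0]
    path = [state]
    for current, _event, nxt in transitions:
        if current != state:
            raise ValueError(f"table is not a chain at {current}")
        state = nxt
        path.append(state)
    return path


def segments_sent(transitions):
    """Count how many segments (FIN or ACK) this side sends."""
    return sum(event.count("send ") for _, event, _ in transitions)


print(" -> ".join(walk(ACTIVE_CLOSER)))
print(" -> ".join(walk(PASSIVE_CLOSER)))
print(segments_sent(ACTIVE_CLOSER) + segments_sent(PASSIVE_CLOSER))
```

Each side sends one FIN and one ACK, which gives the four segments of the handshake.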

Ports

TCP uses port numbers to identify the sending and receiving applications. Each end of a TCP connection has an associated 16-bit unsigned port number.

Port numbers are categorized into three basic categories: well-known,
registered, and dynamic/private. The well-known ports are assigned by the
Internet Assigned Numbers Authority (IANA) and are typically used by
system-level or root processes. Well-known applications running as servers and
passively listening for connections typically use these ports. Some examples
include: FTP (20 and 21), SSH (22), TELNET (23), SMTP (25), HTTP (80) and HTTPS
(443). Registered ports are typically used by end user applications as ephemeral
source ports when contacting servers, but they can also identify named services
that have been registered by a third party. Dynamic/private ports can also be
used by end user applications, but are less commonly so. Dynamic/private ports
do not contain any meaning outside of any particular TCP connection.
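The three categories map to fixed numeric ranges. The text above does not list them, so the boundaries in this sketch (0 to 1023, 1024 to 49151, 49152 to 65535) are taken from the usual IANA division; the function name is hypothetical.

```python
def port_category(port: int) -> str:
    """Classify a 16-bit port number into its IANA range."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit unsigned values")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"


for p in (22, 80, 443, 8080, 50000):
    print(p, port_category(p))
```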

Flow control

TCP uses an end-to-end flow control protocol to avoid having the sender send
data too fast for the TCP receiver to receive and process it reliably. Having a
mechanism for flow control is essential in an environment where machines of
diverse network speeds communicate. For example, if a PC sends data to a
smartphone that is slowly processing received data, the smartphone must regulate
the data flow so as not to be overwhelmed.

TCP uses a sliding window flow control protocol. In each TCP segment, the
receiver specifies in the receive window field the amount of additionally
received data (in bytes) that it is willing to buffer for the connection. The
sending host can send only up to that amount of data before it must wait for an
acknowledgment and window update from the receiving host.
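A toy simulation can make the window mechanism concrete. It is a simplified model, not how a real stack is written: time advances in rounds, the receiver has a fixed buffer, the application drains a fixed number of bytes per round, and all names and numbers are assumptions of this sketch.

```python
def simulate(total_bytes: int, buffer_size: int, app_reads_per_round: int):
    """Return a list of (bytes sent this round, window advertised afterwards)."""
    if app_reads_per_round <= 0:
        raise ValueError("the application must drain the buffer")
    sent = 0               # bytes the sender has transmitted so far
    buffered = 0           # bytes waiting in the receiver's buffer
    window = buffer_size   # last receive window advertised to the sender
    log = []
    while sent < total_bytes or buffered > 0:
        burst = min(window, total_bytes - sent)  # never exceed the window
        sent += burst
        buffered += burst
        buffered -= min(buffered, app_reads_per_round)  # application reads data
        window = buffer_size - buffered  # receiver advertises its free space
        log.append((burst, window))
    return log


for burst, window in simulate(10000, 4000, 1000):
    print(f"sent {burst:5d} bytes, receiver now advertises {window}")
```

After the first full-window burst, the sender is paced by how fast the receiving application empties the buffer, which is the purpose of flow control.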

If a receiver is processing incoming data in small increments, it may
repeatedly advertise a small receive window. This is referred to as the silly
window syndrome, since it is inefficient to send only a few bytes of data in a
TCP segment, given the relatively large overhead of the TCP header.

Congestion control

Modern implementations of TCP contain four intertwined algorithms:
Slow-start, congestion avoidance, fast retransmit, and fast recovery.

In practice there are only two modes: slow start and congestion avoidance.

Basic slow-start

The algorithm begins in the exponential growth phase with an initial
congestion window size (CWND) of 1, 2 or 10 segments and increases it by one
maximum segment size (MSS) for each new ACK received. If the receiver sends an
ACK for every segment, this behavior effectively doubles the window size each
round trip of the network. If the receiver supports delayed ACKs, the rate of
increase is lower, but still increases by a minimum of one MSS each round-trip
time. This behavior continues until the CWND reaches the size of the receiver's
advertised window or until a loss occurs.

When a loss occurs, half of the current
CWND is saved as the slow start threshold (SSThresh) and slow start begins
again from its initial CWND. Once the CWND reaches the SSThresh,
TCP goes into congestion avoidance mode, where each new ACK increases the
CWND by MSS × MSS / CWND. Summed over the roughly CWND / MSS ACKs that arrive
in one round trip, this adds about one MSS per round trip, so the CWND grows
linearly.

Slow start -> loss occurs -> set ssthresh -> slow start -> congestion
avoidance (linear increase).

Halving the threshold is how the multiplicative decrease is implemented.
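The cycle above can be sketched as a per-round-trip simulation. This is a simplified model with several assumptions: the window is counted in whole MSS units, one step is one round trip, losses happen at rounds chosen by the caller, and the threshold is never set below 2 MSS.

```python
def cwnd_trace(rounds, loss_rounds, initial_cwnd=1, ssthresh=16):
    """Return the congestion window (in MSS) at the start of each round trip."""
    cwnd = initial_cwnd
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 2)     # multiplicative decrease (floor of 2 assumed)
            cwnd = initial_cwnd              # slow start begins again
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: doubles every round trip
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per round trip
    return trace


print(cwnd_trace(rounds=12, loss_rounds={5}))
```

With a loss in round 5 the window climbs 1, 2, 4, 8, 16, 17, drops back to 1 with the threshold set to 8, doubles up to 8 again, and then grows by one MSS per round trip.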

Fast recovery

There is a variation of the
slow-start algorithm known as Fast Recovery, which uses fast retransmit followed
by congestion avoidance. In the Fast Recovery algorithm, when a lost packet is
detected through three duplicate ACKs during congestion avoidance mode, the
congestion window size is reduced to the slow-start threshold,
rather than the smaller initial value.

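Fast Recovery follows the same pattern: slow start -> loss occurs -> set ssthresh.

It is just faster, because the window is halved directly instead of falling back to the initial value.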

Congestion avoidance

When the congestion window exceeds SSThresh the algorithm enters a
new state, called congestion avoidance.

Transmission Control Protocol (TCP) uses a network congestion-avoidance
algorithm that includes various aspects of an additive
increase/multiplicative decrease (AIMD) scheme, with other schemes
such as slow-start to achieve congestion avoidance.

AIMD has many variant implementations.

As long as non-duplicate ACKs are received, the congestion window is
additively increased by one MSS every round trip time. When a packet is lost,
the likelihood of duplicate ACKs being received is very high (it's possible
though unlikely that the stream just underwent extreme packet reordering, which
would also prompt duplicate ACKs). Tahoe and Reno differ in how
they detect and react to packet loss:

Tahoe: Triple duplicate ACKs are treated
the same as a timeout. Tahoe will perform "fast retransmit", set the slow
start threshold to half the current congestion window, reduce the congestion
window to 1 MSS, and reset to the slow-start state (the same as basic slow start).

Reno: If three
duplicate ACKs are received (i.e., four ACKs acknowledging the same packet,
which are not piggybacked on data, and do not change the receiver's advertised
window), Reno will halve the congestion
window (instead of setting it to 1 MSS like Tahoe), set the slow start
threshold equal to the new congestion window, perform a fast retransmit, and
enter a phase called Fast Recovery. If an
ACK times out, slow start is used as it is with Tahoe.

Fast Recovery (Reno only): In this state,
TCP retransmits the missing packet that was signaled by three duplicate ACKs,
and waits for an acknowledgment of the entire transmit window before returning
to congestion avoidance. If there is no acknowledgment, TCP Reno experiences a
timeout and enters the slow-start state.

Both algorithms reduce congestion window to 1 MSS on a timeout event.

The two variants differ in how they handle loss. Slow start is a state, while fast recovery is Reno's strategy for handling loss.
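The difference between the two variants fits in one small function. This is only a sketch of the rules described above, with the window counted in whole MSS units; the function name, the event labels and the state labels are invented for the example.

```python
def on_loss(variant: str, event: str, cwnd: int):
    """Return (new_cwnd, new_ssthresh, next_state) after a loss signal."""
    if variant not in ("tahoe", "reno"):
        raise ValueError(f"unknown variant: {variant}")
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown event: {event}")
    half = max(cwnd // 2, 1)  # half the current window, at least 1 MSS
    if event == "timeout" or variant == "tahoe":
        # Both variants restart slow start on a timeout;
        # Tahoe treats triple duplicate ACKs the same way.
        return 1, half, "slow_start"
    # Reno on triple duplicate ACKs: halve the window and keep sending.
    return half, half, "fast_recovery"


for variant in ("tahoe", "reno"):
    for event in ("triple_dup_ack", "timeout"):
        print(variant, event, on_loss(variant, event, cwnd=20))
```

Only the combination of Reno and triple duplicate ACKs avoids dropping the window to 1 MSS, which is why Reno recovers faster from isolated losses.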

Time: 2024-10-12 12:46:56
