    Some less-widely known details of TCP connections.

    Properly closing the connection.

After this code sequence:

    sock = socket(AF_INET, SOCK_STREAM, 0);
    connect(sock, (struct sockaddr *)&remote, sizeof(remote));
    write(sock, buffer, 1000000);

a large block of data is merely buffered by the kernel: it cannot all be
sent at once. What will happen if we close the socket?

RFC 1122 (section 4.2.2.13) has this to say:

"A host MAY implement a 'half-duplex' TCP close sequence, so that
 an application that has called close() cannot continue to read
 data from the connection. If such a host issues a close() call
 while received data is still pending in TCP, or if new data is
 received after close() is called, its TCP SHOULD send a RST
 to show that data was lost."

IOW: if we just close(sock) now, the kernel can reset the TCP connection
(send an RST packet).

This is problematic for two reasons: it discards data which has not been
sent yet, and the peer may see it as an error rather than as EOF.

What can be done about it?

Solution #1: block until sending is done:

    /* When enabled, a close(2) or shutdown(2) will not return until
     * all queued messages for the socket have been successfully sent
     * or the linger timeout has been reached.
     */
    struct linger {
        int l_onoff;  /* linger active */
        int l_linger; /* how many seconds to linger for */
    } linger;
    linger.l_onoff = 1;
    linger.l_linger = SOME_NUM;
    setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
    close(sock);
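The fragment above can be wrapped into a function with return values
checked. This is only a sketch: the names set_linger() and
close_lingering() are made up for illustration, and the timeout is whatever
the caller considers reasonable.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable SO_LINGER with the given timeout (seconds).
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_linger(int sock, int seconds)
{
    struct linger lg;

    assert(seconds >= 0);
    lg.l_onoff = 1;        /* linger active */
    lg.l_linger = seconds; /* how many seconds to linger for */
    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Hypothetical helper: enable lingering, then close.
 * The socket is closed in every case; the return value only tells
 * whether both setsockopt() and close() reported success. */
int close_lingering(int sock, int seconds)
{
    if (set_linger(sock, seconds) != 0) {
        close(sock);
        return -1;
    }
    return close(sock);
}
```

Typical use would be close_lingering(sock, 10) in place of a bare
close(sock) after the last write.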

Solution #2: tell the kernel that you are done sending.
This makes the kernel send FIN after all data is written:

    shutdown(sock, SHUT_WR);
    close(sock);

However, experiments on Linux 3.9.4 show that the kernel can return from
shutdown() and from close() before all data is sent, and if the peer sends
any data to us after this, the kernel still responds with RST before all
our data is sent.

In practice, the protocol in use often does not allow the peer to send
such data to us, in which case this solution is acceptable.

Solution #3: if you know that the peer is going to close its end after it
sees our FIN (as EOF), it might be a good idea to perform a read after
shutdown(). When the read returns 0, we conclude that the peer received
all the data, saw EOF, and closed its end.

However, this incurs a small performance penalty (we run for a longer time)
and requires safeguards (nonblocking reads, timeouts etc) against
malicious peers which don't close the connection.
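A minimal sketch of solution #3 with a timeout safeguard might look like
this. The function name shutdown_and_wait_eof() and the helper now_ms() are
made up for illustration; the choice of poll() plus one overall deadline
(so that a peer trickling data cannot keep us waiting forever) is an
assumption, not the only way to do it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in milliseconds, used to enforce one overall deadline. */
long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: send FIN, then wait until the peer closes its end.
 * Returns 0 if EOF was seen, -1 on error or if timeout_ms expired.
 * The caller still has to close(sock) afterwards. */
int shutdown_and_wait_eof(int sock, int timeout_ms)
{
    char junk[512];
    struct pollfd pfd;
    long long deadline;

    assert(timeout_ms >= 0);
    if (shutdown(sock, SHUT_WR) != 0)
        return -1;

    deadline = now_ms() + timeout_ms;
    pfd.fd = sock;
    pfd.events = POLLIN;
    for (;;) {
        long long left = deadline - now_ms();
        ssize_t n;
        int r;

        if (left <= 0)
            return -1;          /* peer did not close in time */
        r = poll(&pfd, 1, (int)left);
        if (r < 0 && errno != EINTR)
            return -1;
        if (r <= 0)
            continue;           /* EINTR or timeout: recheck the deadline */
        n = read(sock, junk, sizeof(junk));
        if (n == 0)
            return 0;           /* EOF: peer closed its end */
        if (n < 0 && errno != EINTR && errno != EAGAIN)
            return -1;          /* e.g. ECONNRESET */
        /* n > 0: unexpected data from the peer, discard it and keep waiting */
    }
}
```

After the last write, one would call shutdown_and_wait_eof(sock, 5000) and
then close(sock) regardless of the result.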

Solutions #1 and #2 can be combined:

    /* ...set up struct linger... then: */
    setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
    shutdown(sock, SHUT_WR);
    /* At this point, kernel sent FIN packet, not RST, to the peer, */
    /* even if there is buffered read data from the peer. */
    close(sock);

    Defeating Nagle.

Method #1: manually control whether partial sends are allowed:

This prevents partially filled packets from being sent:

    int state = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));

and this forces the last, partially filled packet (if any) to be sent:

    int state = 0;
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));
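A typical use of corking is to send a header and a body with separate
write() calls without emitting a small packet in between. The sketch below
assumes Linux (TCP_CORK is Linux-specific); tcp_set_cork(), write_all() and
send_corked() are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: set (on = 1) or clear (on = 0) TCP_CORK. */
int tcp_set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Hypothetical helper: write the whole buffer, retrying short writes. */
int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Cork, write header and body, then uncork so that the last,
 * partially filled packet (if any) is sent.
 * Returns 0 on success, -1 if any step failed. */
int send_corked(int fd, const char *hdr, size_t hdr_len,
                const char *body, size_t body_len)
{
    int rc = -1;

    if (tcp_set_cork(fd, 1) != 0)
        return -1;
    if (write_all(fd, hdr, hdr_len) == 0 && write_all(fd, body, body_len) == 0)
        rc = 0;
    if (tcp_set_cork(fd, 0) != 0) /* uncork even if a write failed */
        rc = -1;
    return rc;
}
```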

Method #2: make every write send its data immediately, even if the packet
is only partially filled:

    int state = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &state, sizeof(state));
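
To check that the option really took effect, it can be read back with
getsockopt(). A small sketch; tcp_set_nodelay() and tcp_get_nodelay() are
hypothetical helper names:

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn TCP_NODELAY on (1) or off (0).
 * Returns 0 on success, -1 on error. */
int tcp_set_nodelay(int fd, int on)
{
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is set, 0 if not,
 * -1 if getsockopt() failed. */
int tcp_get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) != 0)
        return -1;
    return val != 0;
}
```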