Some less widely known details of TCP connections.

Properly closing the connection.

After this code sequence:

	sock = socket(AF_INET, SOCK_STREAM, 0);
	connect(sock, (struct sockaddr *)&remote, sizeof(remote));
	write(sock, buffer, 1000000);

a large block of data is only buffered by the kernel; it can't all be sent
at once. What will happen if we close the socket now?

"A host MAY implement a 'half-duplex' TCP close sequence, so that
 an application that has called close() cannot continue to read
 data from the connection. If such a host issues a close() call
 while received data is still pending in TCP, or if new data is
 received after close() is called, its TCP SHOULD send a RST
 to show that data was lost."  (RFC 1122, section 4.2.2.13)

IOW: if we just close(sock) now while data from the peer is still unread,
or if the peer sends more data afterwards, the kernel can reset the TCP
connection (send an RST packet) instead of closing it normally with FIN.

This is problematic for two reasons: it discards any of our data that has
not been sent yet, and the peer may see it as an error (ECONNRESET) rather
than as EOF.

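The reset can be observed with a short experiment. This is a minimal sketch, assuming Linux and a working loopback interface; the function name close_with_unread_data_demo() and the 100 ms sleeps (which give packets time to cross the loopback) are illustrative choices, not part of any API. One end of a connection is closed while data sent by the other end is still unread, and the other end's next read() is expected to fail with ECONNRESET instead of returning 0 (EOF):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Sleep for roughly `ms` milliseconds. */
static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Hypothetical demo: the "server" end of a loopback connection is
 * closed while data from the "client" is still unread.
 * Returns 0 if the client then reads a clean EOF (FIN), the errno of
 * the failed read() otherwise (ECONNRESET if an RST arrived), or -1
 * if the connection could not be set up. */
int close_with_unread_data_demo(void)
{
	struct sockaddr_in addr = { 0 };
	socklen_t len = sizeof(addr);
	int lfd, cfd = -1, sfd = -1, result = -1;
	char c;

	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;              /* let the kernel pick a free port */

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) != 0
	 || listen(lfd, 1) != 0
	 || getsockname(lfd, (struct sockaddr *)&addr, &len) != 0)
		goto out;

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) != 0)
		goto out;
	sfd = accept(lfd, NULL, NULL);
	if (sfd < 0)
		goto out;

	/* The client sends data which the server never reads. */
	if (write(cfd, "hello", 5) != 5)
		goto out;
	sleep_ms(100);                  /* let it land in sfd's receive queue */

	close(sfd);                     /* unread data pending: RST, not FIN */
	sfd = -1;
	sleep_ms(100);                  /* let the RST reach the client */

	result = read(cfd, &c, 1) == 0 ? 0 : errno;
 out:
	if (sfd >= 0)
		close(sfd);
	if (cfd >= 0)
		close(cfd);
	close(lfd);
	return result;
}
```

On Linux the call is expected to return ECONNRESET. If the server had read the five bytes before closing, the kernel would send a FIN and the client's read() would return 0 instead.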
What can be done about it?

Solution #1: block until sending is done:

	/* When enabled, a close(2) or shutdown(2) will not return until
	 * all queued messages for the socket have been successfully sent
	 * or the linger timeout has been reached.
	 * struct linger is declared in <sys/socket.h>:
	 *	int l_onoff;   - linger active
	 *	int l_linger;  - how many seconds to linger for
	 */
	struct linger linger;
	linger.l_onoff = 1;
	linger.l_linger = SOME_NUM;
	setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
	close(sock);

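A sketch of Solution #1 with error checking, assuming a system with BSD-style SO_LINGER such as Linux; enable_linger() and linger_seconds() are hypothetical helper names. It also reads the option back with getsockopt(), and its comment warns about l_linger == 0, which requests an abortive close (RST) rather than a lingering one:

```c
#include <sys/socket.h>

/* Turn SO_LINGER on with a timeout of `seconds`, so that a later
 * close() blocks until queued data is sent or the timeout expires.
 * Caution: seconds == 0 is special; with lingering on and a zero
 * timeout, close() aborts the connection with RST instead.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int enable_linger(int fd, int seconds)
{
	struct linger lg;

	lg.l_onoff = 1;
	lg.l_linger = seconds;
	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Read the option back: returns the linger timeout in seconds,
 * -1 if lingering is off, or -2 if getsockopt() fails. */
int linger_seconds(int fd)
{
	struct linger lg;
	socklen_t len = sizeof(lg);

	if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) != 0)
		return -2;
	return lg.l_onoff ? lg.l_linger : -1;
}
```

Typical use is enable_linger(sock, 5) right before close(sock); that close() may then block for up to 5 seconds.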
Solution #2: tell the kernel that you are done sending.
This makes the kernel send a FIN after all queued data has been sent:

	shutdown(sock, SHUT_WR);
	close(sock);

However, experiments on Linux 3.9.4 show that the kernel can return from
shutdown() and from close() before all data is sent, and if the peer
sends any data to us after this, the kernel still responds with an RST
before all our data is sent.

In practice, the protocol in use often does not allow the peer to send
such data to us, in which case this solution is acceptable.

Solution #3: if you know that the peer is going to close its end after it
sees our FIN (as EOF), it might be a good idea to perform a read after
shutdown(). When the read returns 0, we conclude that the peer received
all the data, saw EOF, and closed its end.

However, this incurs a small performance penalty (the connection, and
our process, stay around for longer) and requires safeguards (nonblocking
reads, timeouts, etc.) against malicious peers which never close the
connection.

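A minimal sketch of Solution #3 with the safeguards mentioned above, assuming poll() and CLOCK_MONOTONIC are available; graceful_close() and its timeout_ms parameter are hypothetical. The deadline covers the whole drain, so a peer that keeps trickling data cannot hold the connection open indefinitely, and any data still arriving is discarded:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed on the monotonic clock since *start. */
static long ms_since(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L
		+ (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Send FIN, then read and discard incoming data until the peer closes
 * its end (read() returns 0) or timeout_ms expires.
 * Returns 0 if EOF from the peer was seen, -1 otherwise.
 * The socket is closed in every case. */
int graceful_close(int fd, int timeout_ms)
{
	char buf[4096];
	struct timespec start;
	int result = -1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	if (shutdown(fd, SHUT_WR) != 0)
		goto out;

	for (;;) {
		long left = timeout_ms - ms_since(&start);
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		ssize_t got;
		int n;

		if (left <= 0)
			break;          /* peer did not close in time */
		n = poll(&pfd, 1, (int)left);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			break;          /* timeout or poll() error */
		got = read(fd, buf, sizeof(buf));  /* poll said it won't block */
		if (got == 0) {
			result = 0;     /* EOF: peer saw our FIN and closed */
			break;
		}
		if (got < 0 && errno != EINTR && errno != EAGAIN)
			break;          /* e.g. ECONNRESET */
		/* got > 0: late data from the peer; discard it */
	}
 out:
	close(fd);
	return result;
}
```

For example, graceful_close(sock, 5000) returns 0 if the peer closed its end within 5 seconds and -1 on timeout or error.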
Solutions #1 and #2 can be combined:

	/* ...set up struct linger... then: */
	setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
	shutdown(sock, SHUT_WR);
	/* At this point, the kernel has sent a FIN packet, not RST, */
	/* to the peer, even if there is buffered read data from it. */
	close(sock);

Defeating Nagle.

Method #1: manually control whether partially filled packets may be sent
(TCP_CORK, Linux-specific):

This prevents partially filled packets from being sent (Linux caps the
time data can stay corked at 200 ms):

	int state = 1;
	setsockopt(fd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));

and this forces the last partially filled packet (if any) to be sent:

	state = 0;
	setsockopt(fd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));

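The cork/uncork pair above can be wrapped around a group of writes. A minimal sketch, assuming Linux (TCP_CORK is not portable); set_cork() and send_corked() are hypothetical names, and splitting a message into a header and a body is only an example workload:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn TCP_CORK on (1) or off (0). Returns 0 on success, -1 on error. */
static int set_cork(int fd, int on)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Send a header followed by a body while corked, so the kernel can
 * coalesce them instead of emitting a small segment for the header;
 * uncorking at the end flushes whatever partial segment is left.
 * Returns 0 if both writes completed in full, -1 otherwise. */
int send_corked(int fd, const void *hdr, size_t hdr_len,
		const void *body, size_t body_len)
{
	int ok;

	if (set_cork(fd, 1) != 0)
		return -1;
	ok = write(fd, hdr, hdr_len) == (ssize_t)hdr_len
	  && write(fd, body, body_len) == (ssize_t)body_len;
	/* Uncork even if a write failed, so nothing stays held back. */
	if (set_cork(fd, 0) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Corking matters most when the pieces come from separate calls that cannot be merged into one buffer, for example a header written with write() followed by file contents sent with sendfile(2).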
Method #2: make every write send data immediately, even if it fills only
part of a packet (TCP_NODELAY):

	int state = 1;
	setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &state, sizeof(state));
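A sketch of Method #2 with a read-back check via getsockopt(); set_nodelay() and get_nodelay() are hypothetical helper names. Unlike TCP_CORK, TCP_NODELAY is specified by POSIX, so this should work beyond Linux:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable (on = 1) or disable (on = 0) TCP_NODELAY. While enabled,
 * small writes are sent as soon as possible instead of being held
 * until earlier data is acknowledged (Nagle's algorithm).
 * Returns 0 on success, -1 on error. */
int set_nodelay(int fd, int on)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Returns 1 if TCP_NODELAY is set, 0 if not, -1 on error. */
int get_nodelay(int fd)
{
	int on = 0;
	socklen_t len = sizeof(on);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, &len) != 0)
		return -1;
	return on != 0;
}
```

On Linux, setting TCP_NODELAY while TCP_CORK is on also flushes any pending output, though corking still takes precedence for later writes.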