Denis Vlasenko | d0bbbdc | 2007-12-04 09:48:40 +0000 | [diff] [blame] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
<html><head>
<!-- saved from http://www.win.tue.nl/~aeb/linux/lk/lk-10.html -->
<meta name="GENERATOR" content="SGML-Tools 1.0.9"><title>The Linux kernel: Processes</title>
</head>
<body>
<hr>
<h2><a name="s10">10. Processes</a></h2>

<p>Before looking at the Linux implementation, first a general Unix
description of threads, processes, process groups and sessions.
</p><p>A session contains a number of process groups, and a process group
contains a number of processes, and a process contains a number
of threads.
</p><p>A session can have a controlling tty.
At most one process group in a session can be a foreground process group.
An interrupt character typed on a tty ("Teletype", i.e., terminal)
causes a signal to be sent to all members of the foreground process group
in the session (if any) that has that tty as controlling tty.
</p><p>All these objects have numbers, and we have thread IDs, process IDs,
process group IDs and session IDs.
</p><p>
</p><h2><a name="ss10.1">10.1 Processes</a>
</h2>

<p>
</p><h3>Creation</h3>

<p>A new process is traditionally started using the <code>fork()</code>
system call:
</p><blockquote>
<pre>pid_t p;

p = fork();
if (p == (pid_t) -1) {
        /* ERROR */
} else if (p == 0) {
        /* CHILD */
} else {
        /* PARENT */
}
</pre>
</blockquote>
<p>This creates a child as a duplicate of its parent.
Parent and child are identical in almost all respects.
In the code they are distinguished by the fact that the parent
learns the process ID of its child, while <code>fork()</code>
returns 0 in the child. (The child can find the process ID of its
parent using the <code>getppid()</code> system call.)
</p><p>
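The three branches can be exercised with a small test. This is only a sketch: the function name <code>child_sees_parent()</code> is invented for illustration, and the child reports through its exit code, which the parent collects with <code>waitpid()</code>.
</p>

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork once and let the child check that
   getppid() names the process that forked it.
   Returns 1 if the check succeeded, 0 if not, -1 on fork/wait failure. */
int child_sees_parent(void)
{
        pid_t parent = getpid();
        pid_t p = fork();
        int status;

        if (p == (pid_t) -1)
                return -1;                      /* ERROR */
        if (p == 0)                             /* CHILD: fork() returned 0 */
                _exit(getppid() == parent ? 0 : 1);

        /* PARENT: p is the process ID of the child */
        if (waitpid(p, &status, 0) != p)
                return -1;
        return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

<p>The child uses <code>_exit()</code> rather than <code>exit()</code> so that it does not flush stdio buffers inherited from the parent a second time.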
</p><h3>Termination</h3>

<p>Normal termination is when the process does
</p><blockquote>
<pre>exit(n);
</pre>
</blockquote>

or
<blockquote>
<pre>return n;
</pre>
</blockquote>

from its <code>main()</code> procedure. It returns the single byte <code>n</code>
to its parent (only the low 8 bits of <code>n</code> are passed on).
<p>Abnormal termination is usually caused by a signal.
</p><p>
</p><h3>Collecting the exit code. Zombies</h3>

<p>The parent does
</p><blockquote>
<pre>pid_t p;
int status;

p = wait(&status);
</pre>
</blockquote>

and collects two bytes:
<p>
<figure>
<eps file="absent">
<img src="ctty_files/exit_status.png">
</eps>
</figure></p><p>A process that has terminated but has not yet been waited for
is a <i>zombie</i>. It need only store these two bytes:
exit code and reason for termination.
</p><p>On the other hand, if the parent dies first, <code>init</code> (process 1)
inherits the child and becomes its parent.
</p><p>
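The two bytes are normally taken apart with the macros from <code>&lt;sys/wait.h&gt;</code> rather than by hand. The sketch below assumes a helper named <code>collect_status()</code> (a name made up here) that runs a child which either exits or kills itself, and hands back the raw status word.
</p>

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child that exits with `code` (when sig == 0)
   or kills itself with signal `sig`, and return the status word that
   waitpid() collected. Returns -1 if fork() or waitpid() fails. */
int collect_status(int code, int sig)
{
        int status;
        pid_t p = fork();

        if (p == (pid_t) -1)
                return -1;
        if (p == 0) {                           /* CHILD */
                if (sig != 0)
                        kill(getpid(), sig);    /* abnormal termination */
                _exit(code);                    /* normal termination */
        }
        if (waitpid(p, &status, 0) != p)        /* PARENT: reap the zombie */
                return -1;
        return status;
}
```

<p>For a status <code>s</code>, <code>WIFEXITED(s)</code> and <code>WEXITSTATUS(s)</code> give the exit code, while <code>WIFSIGNALED(s)</code> and <code>WTERMSIG(s)</code> give the terminating signal. A child that calls <code>_exit(300)</code> is seen with exit code 44, since only the low byte of 300 survives.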
</p><h3>Signals</h3>

<p>
</p><h3>Stopping</h3>

<p>Some signals cause a process to stop:
<code>SIGSTOP</code> (stop!),
<code>SIGTSTP</code> (stop from tty: probably ^Z was typed),
<code>SIGTTIN</code> (tty input asked by background process),
<code>SIGTTOU</code> (tty output sent by background process, and this was
disallowed by <code>stty tostop</code>).
</p><p>Apart from ^Z there also is ^Y. The former stops the process
when it is typed, the latter stops it when it is read.
</p><p>Signals generated by typing the corresponding character on some tty
are sent to all processes that are in the foreground process group
of the session that has that tty as controlling tty. (Details below.)
</p><p>If a process is being traced, every signal will stop it.
</p><p>
</p><h3>Continuing</h3>

<p><code>SIGCONT</code>: continue a stopped process.
</p><p>
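A parent can observe both transitions through <code>waitpid()</code>, provided it asks for them with the <code>WUNTRACED</code> and <code>WCONTINUED</code> options. The following is a minimal sketch; the function name <code>stop_and_continue()</code> is an assumption made for this example.
</p>

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: stop a child with SIGSTOP, resume it with SIGCONT,
   then kill it, checking what waitpid() reports at each step.
   Returns 0 if every step behaved as described, -1 otherwise. */
int stop_and_continue(void)
{
        int status, ok;
        pid_t p = fork();

        if (p == (pid_t) -1)
                return -1;
        if (p == 0) {
                for (;;)
                        pause();                /* CHILD: just wait for signals */
        }

        kill(p, SIGSTOP);
        /* WUNTRACED makes waitpid() report stopped children as well */
        ok = waitpid(p, &status, WUNTRACED) == p
             && WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;
        if (ok) {
                kill(p, SIGCONT);
                /* WCONTINUED makes waitpid() report continued children */
                ok = waitpid(p, &status, WCONTINUED) == p
                     && WIFCONTINUED(status);
        }

        kill(p, SIGKILL);                       /* clean up in every case */
        if (waitpid(p, &status, 0) != p || !WIFSIGNALED(status))
                ok = 0;
        return ok ? 0 : -1;
}
```

<p>This is essentially what a job control shell does when it notices that a job has stopped and later continues it.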
</p><h3>Terminating</h3>

<p><code>SIGKILL</code> (die! now!),
<code>SIGTERM</code> (please, go away),
<code>SIGHUP</code> (modem hangup),
<code>SIGINT</code> (^C),
<code>SIGQUIT</code> (^\), etc.
The default action of many signals is to kill the target.
(Sometimes with an additional core dump, when this is
allowed by rlimit.)
The signals <code>SIGCHLD</code> and <code>SIGWINCH</code>
are ignored by default.
All except <code>SIGKILL</code> and <code>SIGSTOP</code> can be
caught or ignored or blocked.
For details, see <code>signal(7)</code>.
</p><p>
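That <code>SIGKILL</code> and <code>SIGSTOP</code> are special can be checked directly: an attempt to install a handler for them with <code>sigaction()</code> fails with <code>EINVAL</code>. A small sketch, in which the names <code>can_catch()</code> and <code>on_signal()</code> are invented for illustration:
</p>

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>

/* Handler that does nothing: only its installation is being tested */
static void on_signal(int sig)
{
        (void) sig;
}

/* Hypothetical helper: try to install a handler for `sig`, then restore
   the previous disposition.
   Returns 1 if the signal can be caught, 0 if the kernel refuses. */
int can_catch(int sig)
{
        struct sigaction sa, old;

        sa.sa_handler = on_signal;
        sa.sa_flags = 0;
        sigemptyset(&sa.sa_mask);
        if (sigaction(sig, &sa, &old) == -1)
                return 0;               /* EINVAL for SIGKILL and SIGSTOP */
        sigaction(sig, &old, NULL);     /* put the old disposition back */
        return 1;
}
```

<p>With this, <code>can_catch(SIGTERM)</code> yields 1, while <code>can_catch(SIGKILL)</code> and <code>can_catch(SIGSTOP)</code> yield 0.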
</p><h2><a name="ss10.2">10.2 Process groups</a>
</h2>

<p>Every process is a member of exactly one <i>process group</i>,
identified by its <i>process group ID</i>.
(When the process is created, it becomes a member of the process group
of its parent.)
By convention, the process group ID of a process group
equals the process ID of the first member of the process group,
called the <i>process group leader</i>.
A process finds the ID of its process group using the system call
<code>getpgrp()</code>, or, equivalently, <code>getpgid(0)</code>.
One finds the process group ID of process <code>p</code> using
<code>getpgid(p)</code>.
</p><p>One may use the command <code>ps j</code> to see PPID (parent process ID),
PID (process ID), PGID (process group ID) and SID (session ID)
of processes. With a shell that does not know about job control,
like <code>ash</code>, each of its children will be in the same session
and have the same process group as the shell. With a shell that knows
about job control, like <code>bash</code>, the processes of one pipeline, like
</p><blockquote>
<pre>% cat paper | ideal | pic | tbl | eqn | ditroff > out
</pre>
</blockquote>

form a single process group.
<p>
</p><h3>Creation</h3>

<p>A process <code>pid</code> is put into the process group <code>pgid</code> by
</p><blockquote>
<pre>setpgid(pid, pgid);
</pre>
</blockquote>

If <code>pgid == pid</code> or <code>pgid == 0</code> then this creates
a new process group with process group leader <code>pid</code>.
Otherwise, this puts <code>pid</code> into the already existing
process group <code>pgid</code>.
A zero <code>pid</code> refers to the current process.
The call <code>setpgrp()</code> is equivalent to <code>setpgid(0,0)</code>.
<p>
</p><h3>Restrictions on setpgid()</h3>

<p>The calling process must be <code>pid</code> itself, or its parent,
and the parent can only do this before <code>pid</code> has done
<code>exec()</code>, and only when both belong to the same session.
It is an error if process <code>pid</code> is a session leader
(and this call would change its <code>pgid</code>).
</p><p>
</p><h3>Typical sequence</h3>

<p>
</p><blockquote>
<pre>p = fork();
if (p == (pid_t) -1) {
        /* ERROR */
} else if (p == 0) {    /* CHILD */
        setpgid(0, pgid);
        ...
} else {                /* PARENT */
        setpgid(p, pgid);
        ...
}
</pre>
</blockquote>

This ensures that regardless of whether parent or child is scheduled
first, the process group setting is as expected by both.
<p>
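A runnable variant of this sequence, for the common case where the child is to lead a new process group of its own (so <code>pgid</code> equals the PID of the child). It is a sketch only, and the function name <code>child_in_own_group()</code> is invented for this example.
</p>

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child and put it in a new process group of
   its own, calling setpgid() in both parent and child.
   Returns 1 if the child ended up leading its own group, 0 if not,
   -1 on fork/wait failure. */
int child_in_own_group(void)
{
        int status, ok;
        pid_t p = fork();

        if (p == (pid_t) -1)
                return -1;
        if (p == 0) {                   /* CHILD */
                setpgid(0, 0);          /* pgid 0: use my own PID */
                for (;;)
                        pause();
        }
        /* PARENT: the same request, in case we run before the child does */
        setpgid(p, p);
        ok = (getpgid(p) == p);         /* holds whichever side ran first */

        kill(p, SIGKILL);               /* the child has served its purpose */
        if (waitpid(p, &status, 0) != p)
                return -1;
        return ok;
}
```

<p>Whichever of the two <code>setpgid()</code> calls comes second simply repeats a setting that is already in place.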
</p><h3>Signalling and waiting</h3>

<p>One can signal all members of a process group:
</p><blockquote>
<pre>killpg(pgrp, sig);
</pre>
</blockquote>
<p>One can wait for children in one's own process group:
</p><blockquote>
<pre>waitpid(0, &status, ...);
</pre>
</blockquote>

or in a specified process group:
<blockquote>
<pre>waitpid(-pgrp, &status, ...);
</pre>
</blockquote>
<p>
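Both calls can be combined in a short test. The sketch below starts a few children in one new process group, signals the group, and reaps its members. The function name <code>signal_group()</code> is made up here, and it assumes that the default action of <code>sig</code> terminates the children (otherwise the final loop would wait forever).
</p>

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `n` children in one new process group, signal
   the whole group with killpg(), and reap them with waitpid(-pgid, ...).
   Returns the number of children that died from `sig`, or -1 on error. */
int signal_group(int n, int sig)
{
        pid_t pgid = 0, p;
        int i, status, failed = 0, killed = 0;

        for (i = 0; i < n; i++) {
                p = fork();
                if (p == (pid_t) -1) {
                        failed = 1;
                        break;
                }
                if (p == 0) {                   /* CHILD */
                        setpgid(0, pgid);       /* pgid 0: start a new group */
                        for (;;)
                                pause();
                }
                if (pgid == 0)
                        pgid = p;               /* first child leads the group */
                setpgid(p, pgid);               /* PARENT: same request */
        }
        if (pgid == 0)
                return failed ? -1 : 0;         /* no group was created */

        killpg(pgid, failed ? SIGKILL : sig);
        /* A pid argument below -1 selects the children in group -pid */
        while (waitpid(-pgid, &status, 0) > 0)
                if (WIFSIGNALED(status) && WTERMSIG(status) == sig)
                        killed++;
        return failed ? -1 : killed;
}
```

<p>The guard on <code>pgid == 0</code> matters: <code>killpg(0, sig)</code> would signal the process group of the caller itself.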
</p><h3>Foreground process group</h3>

<p>Among the process groups in a session at most one can be
the <i>foreground process group</i> of that session.
The tty input and tty signals (signals generated by ^C, ^Z, etc.)
go to processes in this foreground process group.
</p><p>A process can determine the foreground process group in its session
using <code>tcgetpgrp(fd)</code>, where <code>fd</code> refers to its
controlling tty. If there is no foreground process group, this returns
an arbitrary value larger than 1 that is not a process group ID.
</p><p>A process can set the foreground process group in its session
using <code>tcsetpgrp(fd,pgrp)</code>, where <code>fd</code> refers to its
controlling tty, and <code>pgrp</code> is a process group in
its session, and this session is still associated with the controlling
tty of the calling process.
</p><p>How does one get <code>fd</code>? By definition, <code>/dev/tty</code>
refers to the controlling tty, entirely independent of redirects
of standard input and output. (There is also the function
<code>ctermid()</code> to get the name of the controlling terminal.
On a POSIX standard system it will return <code>/dev/tty</code>.)
Opening the name of the
controlling tty gives a file descriptor <code>fd</code>.
</p><p>
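Put together, finding the foreground process group takes an <code>open()</code> and a <code>tcgetpgrp()</code>. A minimal sketch, with the function name <code>foreground_pgrp()</code> invented for illustration:
</p>

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the foreground process group ID of the
   controlling tty, or (pid_t) -1 if this process has no controlling tty
   (open() of /dev/tty fails) or tcgetpgrp() itself fails. */
pid_t foreground_pgrp(void)
{
        pid_t pgrp;
        /* O_NOCTTY: never acquire a controlling tty by this open */
        int fd = open("/dev/tty", O_RDONLY | O_NOCTTY);

        if (fd == -1)
                return (pid_t) -1;
        pgrp = tcgetpgrp(fd);
        close(fd);
        return pgrp;
}
```

<p>A process is in the foreground exactly when <code>foreground_pgrp() == getpgrp()</code>. Because <code>/dev/tty</code> is used, the answer does not depend on where standard input and output have been redirected.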
</p><h3>Background process groups</h3>

<p>All process groups in a session other than the foreground
process group are <i>background process groups</i>.
Since the user at the keyboard is interacting with foreground
processes, background processes should stay away from the terminal.
When a background process reads from the terminal it gets
a SIGTTIN signal. Normally, that will stop it; the job control shell
notices and tells the user, who can say <code>fg</code> to continue
this background process as a foreground process, and then this
process can read from the terminal. But if the background process
ignores or blocks the SIGTTIN signal, or if its process group
is orphaned (see below), then the read() returns an EIO error,
and no signal is sent. (Indeed, the idea is to tell the process
that reading from the terminal is not allowed right now.
If it does not see the signal, then it will see the error return.)
</p><p>When a background process writes to the terminal, it may get
a SIGTTOU signal. May: namely, when the flag that this must happen
is set (it is off by default). One can set the flag by
</p><blockquote>
<pre>% stty tostop
</pre>
</blockquote>

and clear it again by
<blockquote>
<pre>% stty -tostop
</pre>
</blockquote>

and inspect it by
<blockquote>
<pre>% stty -a
</pre>
</blockquote>

If TOSTOP is set and the process group of the background process
is orphaned (see below), then the write() returns an EIO error, and no
signal is sent. Unlike the read case, a background process that ignores
or blocks the SIGTTOU signal is simply allowed to write.
<p>
</p><h3>Orphaned process groups</h3>

<p>The process group leader is the first member of the process group.
It may terminate before the others, and then the process group is
without a leader.
</p><p>A process group is called <i>orphaned</i> when <i>the
parent of every member is either in the process group
or outside the session</i>.
In particular, the process group of the session leader
is always orphaned.
</p><p>If termination of a process causes a process group to become
orphaned, and some member is stopped, then all members are sent
first SIGHUP and then SIGCONT.
</p><p>The idea is that perhaps the parent of the process group leader
is a job control shell. (In the same session but a different
process group.) As long as this parent is alive, it can
handle the stopping and starting of members in the process group.
When it dies, there may be nobody to continue stopped processes.
Therefore, these stopped processes are sent SIGHUP, so that they
die unless they catch or ignore it, and then SIGCONT to continue them.
</p><p>Note that the process group of the session leader is already
orphaned, so no signals are sent when the session leader dies.
</p><p>Note also that a process group can become orphaned in two ways
by termination of a process: either it was a parent and not itself
in the process group, or it was the last element of the process group
with a parent outside but in the same session.
Furthermore, a process group can become orphaned
other than by termination of a process, namely when some
member is moved to a different process group.
</p><p>
</p><h2><a name="ss10.3">10.3 Sessions</a>
</h2>

<p>Every process group is in a unique <i>session</i>.
(When the process is created, it becomes a member of the session
of its parent.)
By convention, the session ID of a session
equals the process ID of the first member of the session,
called the <i>session leader</i>.
A process finds the ID of its session using the system call
<code>getsid(0)</code>.
</p><p>Every session may have a <i>controlling tty</i>,
which is then also called the controlling tty of each of
its member processes.
A file descriptor for the controlling tty is obtained by
opening <code>/dev/tty</code>. (And when that fails, there was no
controlling tty.) Given a file descriptor for the controlling tty,
one may obtain the SID using <code>tcgetsid(fd)</code>.
</p><p>A session is often set up by a login process. The terminal
on which one is logged in then becomes the controlling tty
of the session. All processes that are descendants of the
login process will in general be members of the session.
</p><p>
</p><h3>Creation</h3>

<p>A new session is created by
</p><blockquote>
<pre>pid = setsid();
</pre>
</blockquote>

This is allowed only when the current process is not a process group leader.
In order to be sure of that we fork first:
<blockquote>
<pre>p = fork();
if (p) exit(0);
pid = setsid();
</pre>
</blockquote>

The result is that the current process (with process ID <code>pid</code>)
becomes session leader of a new session with session ID <code>pid</code>.
Moreover, it becomes process group leader of a new process group.
Both session and process group contain only the single process <code>pid</code>.
Furthermore, this process has no controlling tty.
<p>The restriction that the current process must not be a process group leader
is needed: otherwise its PID serves as PGID of some existing process group
and cannot be used as the PGID of a new process group.
</p><p>
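Both the restriction and the result of a successful <code>setsid()</code> can be checked from a forked child. The following is a sketch under the assumption that the checks run in a child that reports through its exit code. All three function names are invented for this example.
</p>

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A process group leader may not call setsid(): expect EPERM */
int leader_is_refused(void)
{
        setpgid(0, 0);                          /* become a group leader */
        return setsid() == (pid_t) -1 && errno == EPERM;
}

/* A freshly forked child is not a group leader, so setsid() succeeds */
int child_gets_session(void)
{
        pid_t me = getpid();

        return setsid() == me                   /* new session ID is our PID */
            && getsid(0) == me                  /* we are session leader */
            && getpgrp() == me                  /* and process group leader */
            && open("/dev/tty", O_RDONLY) == -1;   /* no controlling tty */
}

/* Hypothetical helper: run `check` in a forked child.
   Returns 1 if it held, 0 if not, -1 on fork/wait failure. */
int run_in_child(int (*check)(void))
{
        int status;
        pid_t p = fork();

        if (p == (pid_t) -1)
                return -1;
        if (p == 0)
                _exit(check() ? 0 : 1);
        if (waitpid(p, &status, 0) != p || !WIFEXITED(status))
                return -1;
        return WEXITSTATUS(status) == 0;
}
```

<p>Both <code>run_in_child(leader_is_refused)</code> and <code>run_in_child(child_gets_session)</code> are expected to return 1.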
</p><h3>Getting a controlling tty</h3>

<p>How does one get a controlling terminal? Nobody knows,
this is a great mystery.
</p><p>The System V approach is that the first tty opened by the process
becomes its controlling tty.
</p><p>The BSD approach is that one has to explicitly call
</p><blockquote>
<pre>ioctl(fd, TIOCSCTTY, 0/1);
</pre>
</blockquote>

to get a controlling tty.
<p>Linux tries to be compatible with both, as always, and this
results in a very obscure complex of conditions. Roughly:
</p><p>The <code>TIOCSCTTY</code> ioctl will give us a controlling tty,
provided that (i) the current process is a session leader,
and (ii) it does not yet have a controlling tty, and
(iii) maybe the tty should not already control some other session;
if it does it is an error if we aren't root, or we steal the tty
if we are all-powerful.
[vda: correction: the third parameter controls this: if it is 1, we steal
the tty from any such session; if it is 0, we do not steal it]
</p><p>Opening some terminal will give us a controlling tty,
provided that (i) the current process is a session leader, and
(ii) it does not yet have a controlling tty, and
(iii) the tty does not already control some other session, and
(iv) the open did not have the <code>O_NOCTTY</code> flag, and
(v) the tty is not the foreground VT, and
(vi) the tty is not the console, and
(vii) maybe the tty should not be master or slave pty.
</p><p>
</p><h3>Getting rid of a controlling tty</h3>

<p>If a process wants to continue as a daemon, it must detach itself
from its controlling tty. Above we saw that <code>setsid()</code>
will remove the controlling tty. Also the ioctl TIOCNOTTY does this.
Moreover, in order not to get a controlling tty again as soon as it
opens a tty, the process has to fork once more, to ensure that it
is not a session leader. Typical code fragment:
</p><p>
</p><pre>        if ((fork()) != 0)
                exit(0);
        setsid();
        if ((fork()) != 0)
                exit(0);
</pre>
<p>See also <code>daemon(3)</code>.
</p><p>
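The effect of the second fork can be verified. The sketch below runs the fork, <code>setsid()</code>, fork sequence and lets the final process report through a pipe whether it is a session leader. The function name <code>detached_is_not_leader()</code> and the use of a pipe are assumptions made for this example.
</p>

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: perform fork / setsid / fork and learn from the
   final process whether it is a session leader.
   Returns 1 if it is NOT a session leader (the desired outcome),
   0 if it is, -1 on error. */
int detached_is_not_leader(void)
{
        int fds[2], status;
        char answer = 0;
        ssize_t n;
        pid_t p;

        if (pipe(fds) == -1)
                return -1;
        p = fork();
        if (p == (pid_t) -1) {
                close(fds[0]);
                close(fds[1]);
                return -1;
        }
        if (p == 0) {                           /* first child */
                close(fds[0]);
                if (setsid() == (pid_t) -1)
                        _exit(1);
                p = fork();
                if (p != 0)                     /* session leader leaves */
                        _exit(p == (pid_t) -1);
                /* second child: in the new session, but not its leader */
                answer = (getsid(0) != getpid());
                if (write(fds[1], &answer, 1) != 1)
                        _exit(1);
                _exit(0);
        }
        close(fds[1]);
        n = read(fds[0], &answer, 1);           /* waits for the second child */
        close(fds[0]);
        if (waitpid(p, &status, 0) != p)        /* reap the first child */
                return -1;
        return n == 1 ? answer : -1;
}
```

<p>The final process belongs to the session created by the first child, so its session ID is the PID of that first child and not its own.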
</p><h3>Disconnect</h3>

<p>If the terminal goes away by modem hangup, and the line was not local,
then a SIGHUP is sent to the session leader.
Any further reads from the gone terminal return EOF.
(Or possibly -1 with <code>errno</code> set to EIO.)
</p><p>If the terminal is the slave side of a pseudotty, and the master side
is closed (for the last time), then a SIGHUP is sent to the foreground
process group of the slave side.
</p><p>When the session leader dies, a SIGHUP is sent to all processes
in the foreground process group. Moreover, the terminal stops being
the controlling terminal of this session (so that it can become
the controlling terminal of another session).
</p><p>Thus, if the terminal goes away and the session leader is
a job control shell, then it can handle things for its descendants,
e.g. by sending them a SIGHUP in turn.
If on the other hand the session leader is an innocent process
that does not catch SIGHUP, it will die, and all foreground processes
get a SIGHUP.
</p><p>
</p><h2><a name="ss10.4">10.4 Threads</a>
</h2>

<p>A process can have several threads. New threads (with the same PID
as the parent thread) are started using the <code>clone</code> system
call with the <code>CLONE_THREAD</code> flag. Threads are distinguished
by a <i>thread ID</i> (TID). An ordinary process has a single thread
with TID equal to PID. The system call <code>gettid()</code> returns the
TID. The system call <code>tkill()</code> sends a signal to a single thread.
</p><p>Example: a process with two threads. Both only print TID and PID and exit.
(Linux 2.4.19 or later.)
</p><pre>% cat << EOF > gettid-demo.c
#define _GNU_SOURCE             /* for clone() and syscall() */
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>

static char stack[65536] __attribute__((aligned(16)));

/* older C libraries lack a gettid() wrapper, so use syscall() */
static pid_t my_gettid(void) {
        return (pid_t) syscall(SYS_gettid);
}

static int thread(void *p) {
        char buf[64];
        /* write(), not printf(): the C library does not know about
           threads made with raw clone(), so avoid its locked streams */
        int n = snprintf(buf, sizeof(buf), "thread: %d %d\n",
                         (int) my_gettid(), (int) getpid());

        write(STDOUT_FILENO, buf, n);
        return 0;
}

int main(void) {
        int i;

        /* CLONE_THREAD needs CLONE_SIGHAND, which needs CLONE_VM;
           the stack grows down, so pass its top */
        i = clone(thread, stack + sizeof(stack),
                  CLONE_VM | CLONE_SIGHAND | CLONE_THREAD, NULL);
        if (i == -1)
                perror("clone");
        else
                printf("clone returns %d\n", i);
        printf("parent: %d %d\n", (int) my_gettid(), (int) getpid());
        sleep(1);       /* crude: let the thread finish; line order may vary */
        return 0;
}
EOF
% cc -o gettid-demo gettid-demo.c
% ./gettid-demo
clone returns 21826
parent: 21825 21825
thread: 21826 21825
%
</pre>
<p>
</p><p>
</p><hr>

</body></html>