Daemontools and runit

Tired of PID files, needing root access, and writing init scripts just
to have your UNIX apps start when your server boots? Want a simpler,
better alternative that will also restart them if they crash? If so,
this is an introduction to process supervision with runit/daemontools.


Background

Classic init scripts, e.g. /etc/init.d/apache, are widely used for
starting processes at system boot time, when they are executed by init.
Sadly, init scripts are cumbersome and error-prone to write, they must
typically be edited and run as root, and the processes they launch do
not get restarted automatically if they crash.

In an alternative scheme called "process supervision", each important
process is looked after by a tiny supervising process, which deals with
starting and stopping the important process on request, and restarting
it when it exits unexpectedly. Those supervising processes can in turn
be supervised by other supervising processes.

Dan Bernstein wrote the process supervision toolkit "daemontools",
a set of small, reliable programs that cooperate in the
UNIX tradition to manage process supervision trees.

Runit is a more conveniently licensed and more actively maintained
reimplementation of daemontools, written by Gerrit Pape.

Here I’ll use runit; however, the ideas are the same for other
daemontools-like projects (there are several).


Service directories and scripts

In runit parlance a "service" is simply a directory containing a script
named "run".

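As a minimal sketch of that layout, the commands below create a service
directory holding an executable "run" script. The service name "mydaemon",
the path /usr/local/bin/mydaemon and its -f (stay in foreground) flag are
hypothetical. SVDIR defaults to a scratch directory here, so nothing is
started until the directory is copied or symlinked into a supervised location.

```sh
#!/bin/sh
# Sketch only: "mydaemon" and its -f flag are hypothetical placeholders.
SVDIR=${SVDIR:-$(mktemp -d)}

mkdir -p "$SVDIR/mydaemon"

# The run script must keep the daemon in the foreground: runsv supervises
# the process that "run" execs into, so the daemon must not fork away.
cat >"$SVDIR/mydaemon/run" <<'EOF'
#!/bin/sh
exec 2>&1
exec /usr/local/bin/mydaemon -f
EOF
chmod +x "$SVDIR/mydaemon/run"

echo "created $SVDIR/mydaemon/run"
```

The "exec 2>&1" line sends stderr to the same place as stdout, which matters
once a log service is attached, since only stdout is piped to the logger.
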
There are just two key programs in runit. Firstly, runsv supervises the
process for an individual service. Service directories themselves sit
inside a containing directory, and the runsvdir program supervises that
directory, running one child runsv process for the service in each
subdirectory. A typical choice is to start an instance of runsvdir
which supervises services in subdirectories of /var/service/.

If SERVICE_DIR/log/ exists (for example, /var/service/inetd/log/), runsv
will supervise two services, the main service and its log service, and
will connect stdout of the main service to the stdin of the log service.
This is primarily used for logging.

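A log service is just another "run" script, placed in the log/ subdirectory
of the service. The sketch below creates one that hands its stdin to
"svlogd -tt". The service name "mydaemon" and the log location
/var/log/service/mydaemon are assumptions made for illustration.

```sh
#!/bin/sh
# Sketch only: the service name and log directory are hypothetical.
SVDIR=${SVDIR:-$(mktemp -d)}

mkdir -p "$SVDIR/mydaemon/log"

# runsv pipes the main service's stdout into this script's stdin.
# svlogd -tt prefixes each line with a human-readable timestamp and
# rotates the files it keeps in the given directory.
cat >"$SVDIR/mydaemon/log/run" <<'EOF'
#!/bin/sh
exec svlogd -tt /var/log/service/mydaemon
EOF
chmod +x "$SVDIR/mydaemon/log/run"

echo "created $SVDIR/mydaemon/log/run"
```

svlogd expects the log directory to exist already, so a real log/run script
would typically create it (and set its ownership) before the exec.
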
You can debug an individual service by running its SERVICE_DIR/run script.
In this case, its stdout and stderr go to your terminal.

You can also run "runsv SERVICE_DIR", which runs both the service
and its logger service (SERVICE_DIR/log/run) if the logger service exists.
If the logger service exists, the output will go to it instead of the terminal.

"runsvdir /var/service" merely runs "runsv SERVICE_DIR" for every subdirectory
in /var/service.


Examples

This directory contains some examples of services:

var_service/getty_<tty>

Runs a getty on <tty>. (The run script looks at $PWD and extracts the suffix
after "_" as the tty name.) Create copies (or symlinks) of this directory
with different names to run many gettys on many ttys.

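The suffix extraction can be done with plain shell parameter expansion.
The following is one possible way to do it, not necessarily what the shipped
run script does; the function name tty_from_dir is made up for this sketch.

```sh
#!/bin/sh
# Sketch: derive the tty name from a service directory name such as
# "getty_tty1". The function name tty_from_dir is hypothetical.
tty_from_dir() {
    dir=${1%/}                  # drop a trailing slash, if any
    name=${dir##*/}             # last path component, e.g. "getty_tty1"
    printf '%s\n' "${name#*_}"  # text after the first "_", e.g. "tty1"
}

tty_from_dir /var/service/getty_tty1
```

Inside a run script the argument would be "$PWD", since runsv starts the
script with the service directory as its working directory.
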
var_service/gpm

Runs gpm, the cut and paste utility and mouse server for text consoles.

var_service/inetd

Runs inetd. This is an example of a service with a log. The log service
writes timestamped, rotated log data to /var/log/service/inetd/*
using "svlogd -tt". The p_log and w_log scripts demonstrate how you can
"page log" and "watch log".

Other services which have logs handle them in the same way.

var_service/nmeter

Runs nmeter '%t %c ....' with output to /dev/tty9. This gives you
a 1-second sampling of server load and health on a dedicated text console.


Networking examples

In many cases, network configuration makes it necessary to run several daemons:
dhcp, zeroconf, ppp, openvpn and such. They need to be controlled,
and in many cases you also want to babysit them.

They present a case where different services need to control (start, stop,
restart) each other.

var_service/dhcp_if

Controls a udhcpc instance which provides a DHCP-assigned IP
address on the interface named "if". Copy/rename this directory as needed to run
udhcpc on other interfaces (the var_service/dhcp_if/run script uses the _foo suffix
of the parent directory as the interface name).

When an IP address is obtained or lost, var_service/dhcp_if/dhcp_handler is run.
It saves new config data to /var/run/service/fw/dhcp_if.ipconf and (re)starts
the /var/service/fw service. This example can be used as a template for other
dynamic network link services (ppp/vpn/zcip).

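To make the handler's job concrete, here is a rough sketch of such a script.
It is not the dhcp_handler shipped in this directory: the key=value file
format, the function name and the STATE_DIR/FW_SERVICE variables are
assumptions. What it relies on from udhcpc is that the script receives the
event name as $1 and the lease data in variables such as $interface, $ip,
$subnet and $router.

```sh
#!/bin/sh
# Sketch of a udhcpc event handler. The file format written here and the
# names handle_dhcp_event, STATE_DIR and FW_SERVICE are hypothetical.
STATE_DIR=${STATE_DIR:-/var/run/service/fw}
FW_SERVICE=${FW_SERVICE:-/var/service/fw}

handle_dhcp_event() {
    event=$1
    file="$STATE_DIR/dhcp_${interface}.ipconf"
    case "$event" in
    bound|renew)
        # Save the lease data, replacing the old file in one step.
        mkdir -p "$STATE_DIR"
        {
            echo "if=$interface"
            echo "ip=$ip"
            echo "subnet=$subnet"
            echo "router=$router"
        } >"$file.tmp" && mv "$file.tmp" "$file"
        ;;
    deconfig|leasefail|nak)
        # The address is gone: drop the dynamic config.
        rm -f "$file"
        ;;
    *)
        return 0
        ;;
    esac
    # Ask runsv to run the fw service again with the new state.
    sv u "$FW_SERVICE"
}

handle_dhcp_event "$@"
```

Because the handler only records state and pokes fw, all decisions about
addresses, routes and firewall rules stay in one place, the fw/run script.
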
This is an example of a service which has a "finish" script. If downed ("sv d"),
"finish" is executed. For this service, it removes the DHCP address from
the interface. This is useful when ifplugd detects that the link is dead
(the cable is no longer attached anywhere) and downs us: keeping DHCP-configured
addresses on the interface would make the kernel still try to use it.

var_service/zcip_if

Zeroconf IP service: assigns a 169.254.x.y/16 address to interface "if".
This allows you to talk to other devices on a network without a DHCP server
(if they also assign 169.254 addresses to themselves).

var_service/ifplugd_if

Watches link status of interface "if". Downs and ups the /var/service/dhcp_if
service accordingly. In effect, it allows you to unplug/plug-to-different-network
and have your IP properly re-negotiated at once.

var_service/dhcp_if_pinger

Uses var_service/dhcp_if's data to determine the router IP and pings it.
If the ping fails, restarts the /var/service/dhcp_if service.
Basically, an example of a watchdog service for networks which are not reliable
and need babysitting.

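The watchdog idea fits in a few lines of shell. This is only a sketch under
assumptions, not the shipped pinger: the function names are made up, the
router address is passed in by the caller instead of being read from
dhcp_if's data, and three probes per check is an arbitrary choice.

```sh
#!/bin/sh
# Sketch of a ping watchdog. check_router and watchdog_once are
# hypothetical names; the probe count is an arbitrary choice.
check_router() {
    # -c 3: send three probes, then exit with ping's status.
    ping -c 3 "$1" >/dev/null 2>&1
}

watchdog_once() {
    router=$1
    service=$2
    if check_router "$router"; then
        return 0
    fi
    echo "router $router unreachable, restarting $service"
    # "sv t" sends TERM to the service; runsv then starts it again.
    sv t "$service"
}
```

A run script built on this would call watchdog_once in an endless loop with
a sleep between checks, e.g. with /var/service/dhcp_eth0 as the service.
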
var_service/supplicant_if

Wireless supplicant (wifi association and encryption daemon) service for
interface "if".

var_service/fw

"Firewall" script, although it is tasked with much more than setting up a firewall.
It is responsible for all aspects of network configuration.

This is an example of a *one-shot* service.

It reconfigures the network based on the current known state of ALL interfaces.
It uses conf/*.ipconf (static config) and /var/run/service/fw/*.ipconf
(dynamic config from dhcp/ppp/vpn/etc) to determine what to do.

One-shot-ness of this service means that it shuts itself off after a single run.
IOW: it is not a constantly running daemon sort of thing.
It starts, it configures the network, it shuts down, all done
(unlike infamous NetworkManagers which sit in RAM forever).

However, any dhcp/ppp/vpn or similar service can restart it anytime
when it senses a change in network configuration.
This even works while the fw service runs: if dhcp signals fw to (re)start
while fw runs, fw will not stop after its execution, but will re-execute once,
picking up dhcp's new configuration.
This is achieved very simply by having
    # Make ourself one-shot
    sv o .
at the very beginning of the fw/run script, not at the end.

Therefore, any "sv u fw" command by any other script "undoes" the o(ne-shot)
command if fw still runs, so runsv will rerun it; or starts it
in the normal way if fw is not running.

This mechanism is the reason why fw is a service, not just a script.

System administrators are expected to edit the fw/run script, since
network configuration needs are likely to be very complex and different
for non-trivial installations.

var_service/ftpd
var_service/httpd
var_service/tftpd
var_service/ntpd

Examples of typical network daemons.


Process tree

Here is an example of the process tree from a live system with these services
(and a few others). An interesting detail is the ftpd and vpnc services, where
you can see only the logger process. These services are "downed" at the moment:
their daemons are not launched.

PID TIME COMMAND
553 0:04 runsvdir -P /var/service
561 0:00   runsv sshd
576 0:00     svlogd -tt /var/log/service/sshd
589 0:00     /usr/sbin/sshd -D -e -p22 -u0 -h /var/service/sshd/ssh_host_rsa_key
562 0:00   runsv dhcp_eth0
568 0:00     svlogd -tt /var/log/service/dhcp_eth0
850 0:00     udhcpc -vv --foreground --interface=eth0
               --pidfile=/var/service/dhcp_eth0/udhcpc.pid
               --script=/var/service/dhcp_eth0/dhcp_handler
               -x hostname bbox
563 0:00   runsv ntpd
573 0:01     svlogd -tt /var/log/service/ntpd
845 0:00     busybox ntpd -dddnNl -S ./ntp.script -p 10.x.x.x -p 10.x.x.x
564 0:00   runsv ifplugd_wlan0
598 0:00     svlogd -tt /var/log/service/ifplugd_wlan0
614 0:05     ifplugd -apqns -t3 -u0 -d0 -i wlan0
               -r /var/service/ifplugd_wlan0/ifplugd_handler
565 0:08   runsv dhcp_wlan0_pinger
911 0:00     sleep 67
566 0:00   runsv unscd
583 0:03     svlogd -tt /var/log/service/unscd
599 0:02     nscd -dddd
567 0:00   runsv dhcp_wlan0
591 0:00     svlogd -tt /var/log/service/dhcp_wlan0
802 0:00     udhcpc -vv -C -o -V --foreground --interface=wlan0
               --pidfile=/var/service/dhcp_wlan0/udhcpc.pid
               --script=/var/service/dhcp_wlan0/dhcp_handler
569 0:00   runsv fw
570 0:00   runsv ifplugd_eth0
597 0:00     svlogd -tt /var/log/service/ifplugd_eth0
612 0:05     ifplugd -apqns -t3 -u8 -d8 -i eth0
               -r /var/service/ifplugd_eth0/ifplugd_handler
571 0:00   runsv zcip_eth0
590 0:00     svlogd -tt /var/log/service/zcip_eth0
607 0:01     zcip -fvv eth0 /var/service/zcip_eth0/zcip_handler
572 0:00   runsv ftpd
604 0:00     svlogd -tt /var/log/service/ftpd
574 0:00   runsv vpnc
603 0:00     svlogd -tt /var/log/service/vpnc
575 0:00   runsv httpd
602 0:00     svlogd -tt /var/log/service/httpd
622 0:00     busybox httpd -p80 -vvv -f -h /home/httpd_root
577 0:00   runsv supplicant_wlan0
627 0:00     svlogd -tt /var/log/service/supplicant_wlan0
638 0:03     wpa_supplicant -i wlan0
               -c /var/service/supplicant_wlan0/wpa_supplicant.conf -d