Daemontools and runit

Tired of PID files, needing root access, and writing init scripts just
to have your UNIX apps start when your server boots? Want a simpler,
better alternative that will also restart them if they crash? If so,
this is an introduction to process supervision with runit/daemontools.


Background

Classic init scripts, e.g. /etc/init.d/apache, are widely used for
starting processes at system boot time, when they are executed by init.
Sadly, init scripts are cumbersome and error-prone to write, they must
typically be edited and run as root, and the processes they launch do
not get restarted automatically if they crash.

In an alternative scheme called "process supervision", each important
process is looked after by a tiny supervising process, which deals with
starting and stopping the important process on request, and re-starting
it when it exits unexpectedly. Those supervising processes can in turn
be supervised by other supervising processes.

Dan Bernstein wrote the process supervision toolkit, "daemontools",
which is a set of small, reliable programs that cooperate in the
UNIX tradition to manage process supervision trees.

Runit is a more conveniently licensed and more actively maintained
reimplementation of daemontools, written by Gerrit Pape.

Here I'll use runit; however, the ideas are the same for other
daemontools-like projects (there are several).


Service directories and scripts

In runit parlance a "service" is simply a directory containing a script
named "run".

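As a minimal sketch, a service directory could be put together like this.
The service name "myservice", the scratch location created by mktemp, and
the sleep command standing in for a real daemon are all assumptions made
for illustration.

```shell
#!/bin/sh
# Build a throwaway service directory; "myservice" is a hypothetical name.
base=$(mktemp -d)
svc="$base/myservice"
mkdir -p "$svc"

# The run script must be executable. It execs the daemon in the foreground,
# so the supervisor watches the daemon itself rather than a wrapper shell.
cat >"$svc/run" <<'EOF'
#!/bin/sh
exec 2>&1
exec sleep 3600   # stand-in for a real foreground daemon
EOF
chmod +x "$svc/run"

ls -l "$svc"
```

Running "$svc/run" by hand starts the stand-in daemon in the foreground.
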
There are just two key programs in runit. First, runsv supervises the
process for an individual service. Second, runsvdir supervises a
containing directory which holds the service directories, running one
child runsv process for the service in each subdirectory. A typical
choice is to start an instance of runsvdir which supervises services
in subdirectories of /var/service/.

If a service directory contains a log/ subdirectory (SERVICE_DIR/log/),
runsv will supervise two services, and will connect the stdout of the
main service to the stdin of the log service.
This is primarily used for logging.

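The logger is just another run script one level down. The sketch below
adds one to a hypothetical "myservice" directory; the log location
/var/log/service/myservice is an assumption, chosen to resemble the
layout used by the examples in this directory.

```shell
#!/bin/sh
# Add a log service to a hypothetical service directory "myservice".
base=$(mktemp -d)
svc="$base/myservice"
mkdir -p "$svc/log"

cat >"$svc/log/run" <<'EOF'
#!/bin/sh
# LOGDIR is an assumed location; svlogd needs the directory to exist.
LOGDIR=/var/log/service/myservice
mkdir -p "$LOGDIR"
# svlogd reads the main service's output on stdin and timestamps each line.
exec svlogd -tt "$LOGDIR"
EOF
chmod +x "$svc/log/run"
```

If the main run script also does "exec 2>&1" before starting the daemon,
the daemon's stderr reaches the logger as well.
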
You can debug an individual service by running its SERVICE_DIR/run script.
In this case, its stdout and stderr go to your terminal.

You can also run "runsv SERVICE_DIR", which runs both the service
and its logger service (SERVICE_DIR/log/run) if the logger service exists.
If the logger service exists, the output goes to it instead of the terminal.

"runsvdir /var/service" merely runs "runsv SERVICE_DIR" for every subdirectory
in /var/service.

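To make that concrete, here is a dry-run illustration of the idea. It is
not how runsvdir is implemented; it only prints the runsv command that
would correspond to each subdirectory. The scratch directory and the two
service names in it are assumptions.

```shell
#!/bin/sh
# Scratch stand-in for /var/service with two hypothetical services.
services=$(mktemp -d)
mkdir "$services/ntpd" "$services/httpd"

# Collect one "runsv DIR" line per subdirectory (dry run, nothing is started).
plan=$(for dir in "$services"/*/; do
    echo "runsv ${dir%/}"
done)
echo "$plan"
```

The real runsvdir additionally keeps watching the directory, so services
added or removed later are picked up without restarting it.
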

Examples

This directory contains some examples of services:

var_service/getty_<tty>

Runs a getty on <tty>. (The run script looks at $PWD and extracts the suffix
after "_" as the tty name.) Create copies (or symlinks) of this directory
with different names to run many gettys on many ttys.

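The suffix trick can be sketched with plain shell parameter expansion.
The helper name suffix_of is hypothetical, and the shipped run scripts may
extract the name differently.

```shell
#!/bin/sh
# Return the part of a directory name after the last "_".
suffix_of() {
    dir=${1%/}          # drop a trailing slash, if any
    echo "${dir##*_}"   # strip everything up to and including the last "_"
}

suffix_of /var/service/getty_tty1
suffix_of /var/service/dhcp_eth0
```

Inside a run script the same helper would be called as suffix_of "$PWD",
since runsv starts the script with the service directory as its working
directory.
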
var_service/gpm

Runs gpm, the cut and paste utility and mouse server for text consoles.

var_service/inetd

Runs inetd. This is an example of a service with a log. The log service
writes timestamped, rotated log data to /var/log/service/inetd/*
using "svlogd -tt". The p_log and w_log scripts demonstrate how you can
"page log" and "watch log".

Other services which have logs handle them in the same way.

var_service/nmeter

Runs nmeter '%t %c ....' with output to /dev/tty9. This gives you
a 1-second sampling of server load and health on a dedicated text console.


Networking examples

In many cases, network configuration makes it necessary to run several daemons:
dhcp, zeroconf, ppp, openvpn and such. They need to be controlled,
and in many cases you also want to babysit them.

They present a case where different services need to control (start, stop,
restart) each other.

var_service/dhcp_if

Controls a udhcpc instance which provides a DHCP-assigned IP
address on the interface named "if". Copy/rename this directory as needed to run
udhcpc on other interfaces (the var_service/dhcp_if/run script uses the _foo suffix
of the parent directory as the interface name).

When an IP address is obtained or lost, var_service/dhcp_if/dhcp_handler is run.
It saves the new config data to /var/run/service/fw/dhcp_if.ipconf and (re)starts
the /var/service/fw service. This example can be used as a template for other
dynamic network link services (ppp/vpn/zcip).

This is an example of a service which has a "finish" script. If downed ("sv d"),
"finish" is executed. For this service, it removes the DHCP address from
the interface. This is useful when ifplugd detects that the link is dead
(cable is no longer attached anywhere) and downs us - keeping DHCP-configured
addresses on the interface would make the kernel still try to use it.

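A rough sketch of what such a cleanup step might look like follows. It
assumes the interface carries only DHCP-assigned addresses, because
"ip addr flush" removes every address on the device. The function name
drop_addresses and the IP override variable are hypothetical and exist
only so the example can be dry-run without touching a real interface;
the shipped finish script may differ.

```shell
#!/bin/sh
# IP can be overridden for a dry run; by default the real "ip" tool is used.
IP=${IP:-ip}

# Remove all addresses from the given interface (assumed to be DHCP-only).
drop_addresses() {
    $IP addr flush dev "$1"
}

# Dry run: print the command instead of executing it.
IP="echo ip"
drop_addresses eth0
```
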
var_service/zcip_if

Zeroconf IP service: assigns a 169.254.x.y/16 address to interface "if".
This allows talking to other devices on a network without a DHCP server
(if they also assign 169.254 addresses to themselves).

var_service/ifplugd_if

Watches link status of interface "if". Downs and ups /var/service/dhcp_if
service accordingly. In effect, it allows you to unplug/plug-to-different-network
and have your IP properly re-negotiated at once.

var_service/dhcp_if_pinger

Uses var_service/dhcp_if's data to determine router IP. Pings it.
If ping fails, restarts /var/service/dhcp_if service.
Basically, an example of watchdog service for networks which are not reliable
and need babysitting.

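The watchdog idea can be sketched as below. This is not the shipped
script: the function name check_router, the PING and SV override
variables, the ping count, and the documentation address 192.0.2.1 are
all assumptions. The sketch uses "sv t", which sends TERM to the service
so that runsv starts it again.

```shell
#!/bin/sh
# PING and SV can be overridden for dry runs; by default the real tools are used.
PING=${PING:-ping}
SV=${SV:-sv}

# Ping the router a few times; restart the DHCP service if it is unreachable.
check_router() {
    router=$1
    if $PING -c 3 "$router" >/dev/null 2>&1; then
        return 0
    fi
    $SV t /var/service/dhcp_if   # TERM the service; runsv then restarts it
}

# Dry run: pretend the router is down and only print the sv arguments.
PING=false
SV=echo
check_router 192.0.2.1
```

A real run script would call such a check in a loop with a sleep between
rounds, taking the router address from the data saved by dhcp_if.
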
var_service/supplicant_if

Wireless supplicant (wifi association and encryption daemon) service for
interface "if".

var_service/fw

"Firewall" script, although it is tasked with much more than setting up a firewall.
It is responsible for all aspects of network configuration.

This is an example of a *one-shot* service.

It reconfigures the network based on the current known state of ALL interfaces.
It uses conf/*.ipconf (static config) and /var/run/service/fw/*.ipconf
(dynamic config from dhcp/ppp/vpn/etc) to determine what to do.

One-shot-ness of this service means that it shuts itself off after a single run.
IOW: it is not a constantly running daemon sort of thing.
It starts, it configures the network, it shuts down, all done
(unlike infamous NetworkManagers which sit in RAM forever).

However, any dhcp/ppp/vpn or similar service can restart it anytime
when it senses a change in network configuration.
This even works while the fw service runs: if dhcp signals fw to (re)start
while fw runs, fw will not stop after its execution, but will re-execute once,
picking up dhcp's new configuration.
This is achieved very simply by having
    # Make ourself one-shot
    sv o .
at the very beginning of the fw/run script, not at the end.

Therefore, any "sv u /var/service/fw" command by any other
script "undoes" the o(ne-shot) command if fw is still running, so
runsv will rerun it; if fw is not running, the command starts it in the normal way.

This mechanism is the reason why fw is a service, not just a script.

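Put together, a skeleton of such a one-shot run script might look like
the following. The scratch directory and the echo placeholder are
assumptions; the real fw/run does the actual network configuration where
the placeholder stands.

```shell
#!/bin/sh
# Write a skeleton one-shot run script into a scratch "fw" directory.
fw=$(mktemp -d)/fw
mkdir -p "$fw"

cat >"$fw/run" <<'EOF'
#!/bin/sh
# Make ourself one-shot
sv o .
# From here on, an "sv u" issued by another service cancels the one-shot
# request, so runsv runs this script again after it exits.

# Hypothetical placeholder for the real work:
echo "reconfiguring network"
EOF
chmod +x "$fw/run"

cat "$fw/run"
```

Placing "sv o ." first is what makes a mid-run restart request stick:
if it came last, it would override any "sv u" that arrived while the
script was still working.
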
System administrators are expected to edit the fw/run script, since
network configuration needs are likely to be very complex and different
for non-trivial installations.

var_service/ftpd
var_service/httpd
var_service/tftpd
var_service/ntpd

Examples of typical network daemons.