blob: 6dbe6931fb687d09b6e57848193706b4b6b8327b [file] [log] [blame]
Rob Landleyaaffef42006-01-22 01:44:29 +00001<!--#include file="header.html" -->
2
3<h2>Rob's notes on programming busybox.</h2>
4
5<ul>
6 <li><a href="#goals">What are the goals of busybox?</a></li>
7 <li><a href="#design">What is the design of busybox?</a></li>
8 <li><a href="#source">How is the source code organized?</a></li>
9 <ul>
10 <li><a href="#source_applets">The applet directories.</a></li>
11 <li><a href="#source_libbb">The busybox shared library (libbb)</a></li>
12 </ul>
13 <li><a href="#adding">Adding an applet to busybox</a></li>
14 <li><a href="#standards">What standards does busybox adhere to?</a></li>
Rob Landleyb1b3cee2006-01-29 06:29:01 +000015 <li><a href="#tips">Tips and tricks.</a></li>
16 <ul>
17 <li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
18 <li><a href="#tips_vfork">Fork and vfork</a></li>
19 </ul>
Rob Landleyaaffef42006-01-22 01:44:29 +000020</ul>
21
22<h2><b><a name="goals" />What are the goals of busybox?</b></h2>
23
24<p>Busybox aims to be the smallest and simplest correct implementation of the
25standard Linux command line tools. First and foremost, this means the
26smallest executable size we can manage. We also want to have the simplest
27and cleanest implementation we can manage, be <a href="#standards">standards
28compliant</a>, minimize run-time memory usage (heap and stack), run fast, and
29take over the world.</p>
30
31<h2><b><a name="design" />What is the design of busybox?</b></h2>
32
33<p>Busybox is like a swiss army knife: one thing with many functions.
34The busybox executable can act like many different programs depending on
35the name used to invoke it. Normal practice is to create a bunch of symlinks
36pointing to the busybox binary, each of which triggers a different busybox
37function. (See <a href="FAQ.html#getting_started">getting started</a> in the
38FAQ for more information on usage, and <a href="BusyBox.html">the
39busybox documentation</a> for a list of symlink names and what they do.)
40
41<p>The "one binary to rule them all" approach is primarily for size reasons: a
42single multi-purpose executable is smaller then many small files could be.
43This way busybox only has one set of ELF headers, it can easily share code
44between different apps even when statically linked, it has better packing
45efficiency by avoding gaps between files or compression dictionary resets,
46and so on.</p>
47
48<p>Work is underway on new options such as "make standalone" to build separate
49binaries for each applet, and a "libbb.so" to make the busybox common code
50available as a shared library. Neither is ready yet at the time of this
51writing.</p>
52
53<a name="source" />
54
55<h2><a name="source_applets" /><b>The applet directories</b></h2>
56
57<p>The directory "applets" contains the busybox startup code (applets.c and
58busybox.c), and several subdirectories containing the code for the individual
59applets.</p>
60
61<p>Busybox execution starts with the main() function in applets/busybox.c,
62which sets the global variable bb_applet_name to argv[0] and calls
63run_applet_by_name() in applets/applets.c. That uses the applets[] array
64(defined in include/busybox.h and filled out in include/applets.h) to
65transfer control to the appropriate APPLET_main() function (such as
66cat_main() or sed_main()). The individual applet takes it from there.</p>
67
68<p>This is why calling busybox under a different name triggers different
69functionality: main() looks up argv[0] in applets[] to get a function pointer
70to APPLET_main().</p>
71
72<p>Busybox applets may also be invoked through the multiplexor applet
73"busybox" (see busybox_main() in applets/busybox.c), and through the
74standalone shell (grep for STANDALONE_SHELL in applets/shell/*.c).
75See <a href="FAQ.html#getting_started">getting started</a> in the
76FAQ for more information on these alternate usage mechanisms, which are
77just different ways to reach the relevant APPLET_main() function.</p>
78
79<p>The applet subdirectories (archival, console-tools, coreutils,
80debianutils, e2fsprogs, editors, findutils, init, loginutils, miscutils,
81modutils, networking, procps, shell, sysklogd, and util-linux) correspond
82to the configuration sub-menus in menuconfig. Each subdirectory contains the
83code to implement the applets in that sub-menu, as well as a Config.in
84file defining that configuration sub-menu (with dependencies and help text
85for each applet), and the makefile segment (Makefile.in) for that
86subdirectory.</p>
87
88<p>The run-time --help is stored in usage_messages[], which is initialized at
89the start of applets/applets.c and gets its help text from usage.h. During the
90build this help text is also used to generate the BusyBox documentation (in
91html, txt, and man page formats) in the docs directory. See
92<a href="#adding">adding an applet to busybox</a> for more
93information.</p>
94
95<h2><a name="source_libbb" /><b>libbb</b></h2>
96
97<p>Most non-setup code shared between busybox applets lives in the libbb
98directory. It's a mess that evolved over the years without much auditing
99or cleanup. For anybody looking for a great project to break into busybox
100development with, documenting libbb would be both incredibly useful and good
101experience.</p>
102
103<p>Common themes in libbb include allocation functions that test
104for failure and abort the program with an error message so the caller doesn't
105have to test the return value (xmalloc(), xstrdup(), etc), wrapped versions
106of open(), close(), read(), and write() that test for their own failures
107and/or retry automatically, linked list management functions (llist.c),
108command line argument parsing (getopt_ulflags.c), and a whole lot more.</p>
109
110<h2><a name="adding" /><b>Adding an applet to busybox</b></h2>
111
112<p>To add a new applet to busybox, first pick a name for the applet and
113a corresponding CONFIG_NAME. Then do this:</p>
114
115<ul>
116<li>Figure out where in the busybox source tree your applet best fits,
117and put your source code there. Be sure to use APPLET_main() instead
118of main(), where APPLET is the name of your applet.</li>
119
120<li>Add your applet to the relevant Config.in file (which file you add
121it to determines where it shows up in "make menuconfig"). This uses
122the same general format as the linux kernel's configuration system.</li>
123
124<li>Add your applet to the relevant Makefile.in file (in the same
125directory as the Config.in you chose), using the existing entries as a
126template and the same CONFIG symbol as you used for Config.in. (Don't
127forget "needlibm" or "needcrypt" if your applet needs libm or
128libcrypt.)</li>
129
130<li>Add your applet to "include/applets.h", using one of the existing
131entries as a template. (Note: this is in alphabetical order. Applets
132are found via binary search, and if you add an applet out of order it
133won't work.)</li>
134
135<li>Add your applet's runtime help text to "include/usage.h". You need
136at least appname_trivial_usage (the minimal help text, always included
137in the busybox binary when this applet is enabled) and appname_full_usage
138(extra help text included in the busybox binary with
139CONFIG_FEATURE_VERBOSE_USAGE is enabled), or it won't compile.
140The other two help entry types (appname_example_usage and
141appname_notes_usage) are optional. They don't take up space in the binary,
142but instead show up in the generated documentation (BusyBox.html,
143BusyBox.txt, and the man page BusyBox.1).</li>
144
145<li>Run menuconfig, switch your applet on, compile, test, and fix the
146bugs. Be sure to try both "allyesconfig" and "allnoconfig" (and
147"allbareconfig" if relevant).</li>
148
149</ul>
150
151<h2><a name="standards" />What standards does busybox adhere to?</a></h2>
152
153<p>The standard we're paying attention to is the "Shell and Utilities"
154portion of the <a href=http://www.opengroup.org/onlinepubs/009695399/>Open
155Group Base Standards</a> (also known as the Single Unix Specification version
1563 or SUSv3). Note that paying attention isn't necessarily the same thing as
157following it.</p>
158
159<p>SUSv3 doesn't even mention things like init, mount, tar, or losetup, nor
160commonly used options like echo's '-e' and '-n', or sed's '-i'. Busybox is
161driven by what real users actually need, not the fact the standard believes
162we should implement ed or sccs. For size reasons, we're unlikely to include
163much internationalization support beyond UTF-8, and on top of all that, our
164configuration menu lets developers chop out features to produce smaller but
165very non-standard utilities.</p>
166
167<p>Also, Busybox is aimed primarily at Linux. Unix standards are interesting
168because Linux tries to adhere to them, but portability to dozens of platforms
169is only interesting in terms of offering a restricted feature set that works
170everywhere, not growing dozens of platform-specific extensions. Busybox
171should be portable to all hardware platforms Linux supports, and any other
172similar operating systems that are easy to do and won't require much
173maintenance.</p>
174
175<p>In practice, standards compliance tends to be a clean-up step once an
176applet is otherwise finished. When polishing and testing a busybox applet,
177we ensure we have at least the option of full standards compliance, or else
178document where we (intentionally) fall short.</p>
179
Rob Landleyb1b3cee2006-01-29 06:29:01 +0000180<h2><a name="tips" />Programming tips and tricks.</a></h2>
181
182<p>Various things busybox uses that aren't particularly well documented
183elsewhere.</p>
184
185<h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>
186
187<p>Password fields in /etc/passwd and /etc/shadow are in a special format.
188If the first character isn't '$', then it's an old DES style password. If
189the first character is '$' then the password is actually three fields
190separated by '$' characters:</p>
191<pre>
192 <b>$type$salt$encrypted_password</b>
193</pre>
194
195<p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p>
196
197<p>The "salt" is a bunch of ramdom characters (generally 8) the encryption
198algorithm uses to perturb the password in a known and reproducible way (such
199as by appending the random data to the unencrypted password, or combining
200them with exclusive or). Salt is randomly generated when setting a password,
201and then the same salt value is re-used when checking the password. (Salt is
202thus stored unencrypted.)</p>
203
204<p>The advantage of using salt is that the same cleartext password encrypted
205with a different salt value produces a different encrypted value.
206If each encrypted password uses a different salt value, an attacker is forced
207to do the cryptographic math all over again for each password they want to
208check. Without salt, they could simply produce a big dictionary of commonly
209used passwords ahead of time, and look up each password in a stolen password
210file to see if it's a known value. (Even if there are billions of possible
211passwords in the dictionary, checking each one is just a binary search against
212a file only a few gigabytes long.) With salt they can't even tell if two
213different users share the same password without guessing what that password
214is and decrypting it. They also can't precompute the attack dictionary for
215a specific password until they know what the salt value is.</p>
216
217<p>The third field is the encrypted password (plus the salt). For md5 this
218is 22 bytes.</p>
219
220<p>The busybox function to handle all this is pw_encrypt(clear, salt) in
221"libbb/pw_encrypt.c". The first argument is the clear text password to be
222encrypted, and the second is a string in "$type$salt$password" format, from
223which the "type" and "salt" fields will be extracted to produce an encrypted
224value. (Only the first two fields are needed, the third $ is equivalent to
225the end of the string.) The return value is an encrypted password in
226/etc/passwd format, with all three $ separated fields. It's stored in
227a static buffer, 128 bytes long.</p>
228
229<p>So when checking an existing password, if pw_encrypt(text,
230old_encrypted_password) returns a string that compares identical to
231old_encrypted_password, you've got the right password. When setting a new
232password, generate a random 8 character salt string, put it in the right
233format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
234second argument to pw_encrypt(text,buffer).</p>
235
236<h2><a name="tips_vfork">Fork and vfork</a></h2>
237
Rob Landley03628c82006-01-29 06:45:38 +0000238<p>Busybox hides the difference between fork() and vfork() in
239libbb/bb_fork_exec.c. If you ever want to fork and exec, use bb_fork_exec()
240(which returns a pid and takes the same arguments as execve(), although in
241this case envp can be NULL) and don't worry about it. This description is
242here in case you want to know why that does what it does.</p>
243
Rob Landleyb1b3cee2006-01-29 06:29:01 +0000244<p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
245expensive to implement, so a less capable function called vfork() is used
246instead.</p>
247
248<p>The reason vfork() exists is that if you haven't got an MMU then you can't
249simply set up a second set of page tables and share the physical memory via
250copy-on-write, which is what fork() normally does. This means that actually
251forking has to copy all the parent's memory (which could easily be tens of
252megabytes). And you have to do this even though that memory gets freed again
253as soon as the exec happens, so it's probably all a big waste of time.</p>
254
255<p>This is not only slow and a waste of space, it also causes totally
256unnecessary memory usage spikes based on how big the _parent_ process is (not
257the child), and these spikes are quite likely to trigger an out of memory
258condition on small systems (which is where nommu is common anyway). So
259although you _can_ emulate a real fork on a nommu system, you really don't
260want to.</p>
261
262<p>In theory, vfork() is just a fork() that writeably shares the heap and stack
263rather than copying it (so what one process writes the other one sees). In
264practice, vfork() has to suspend the parent process until the child does exec,
265at which point the parent wakes up and resumes by returning from the call to
266vfork(). All modern kernel/libc combinations implement vfork() to put the
267parent to sleep until the child does its exec. There's just no other way to
268make it work: they're sharing the same stack, so if either one returns from its
269function it stomps on the callstack so that when the other process returns,
270hilarity ensues. In fact without suspending the parent there's no way to even
271store separate copies of the return value (the pid) from the vfork() call
272itself: both assignments write into the same memory location.</p>
273
274<p>One way to understand (and in fact implement) vfork() is this: imagine
275the parent does a setjmp and then continues on (pretending to be the child)
276until the exec() comes around, then the _exec_ does the actual fork, and the
277parent does a longjmp back to the original vfork call and continues on from
278there. (It thus becomes obvious why the child can't return, or modify
279local variables it doesn't want the parent to see changed when it resumes.)
280
281<p>Note a common mistake: the need for vfork doesn't mean you can't have two
282processes running at the same time. It means you can't have two processes
283sharing the same memory without stomping all over each other. As soon as
284the child calls exec(), the parent resumes.</p>
285
Rob Landley03628c82006-01-29 06:45:38 +0000286<p>If the child's attempt to call exec() fails, the child should call _exit()
287rather than a normal exit(). This avoids any atexit() code that might confuse
288the parent. (The parent should never call _exit(), only a vforked child that
289failed to exec.)</p>
290
Rob Landleyb1b3cee2006-01-29 06:29:01 +0000291<p>(Now in theory, a nommu system could just copy the _stack_ when it forks
292(which presumably is much shorter than the heap), and leave the heap shared.
293In practice, you've just wound up in a multi-threaded situation and you can't
294do a malloc() or free() on your heap without freeing the other process's memory
295(and if you don't have the proper locking for being threaded, corrupting the
296heap if both of you try to do it at the same time and wind up stomping on
297each other while traversing the free memory lists). The thing about vfork is
298that it's a big red flag warning "there be dragons here" rather than
299something subtle and thus even more dangerous.)</p>
300
Rob Landleyaaffef42006-01-22 01:44:29 +0000301<br>
302<br>
303<br>
304
305<!--#include file="footer.html" -->