Blame - docs/busybox.net/programming.html - codeaurora/busybox

blob: 7afa53e27095aba4036d8f3999d7aa50ce7a665e [file] [log] [blame]

Rob Landley	aaffef4	2006-01-22 01:44:29 +0000	[diff] [blame]	1	<!--#include file="header.html" -->
				2
				3	<h2>Rob's notes on programming busybox.</h2>
				4
				5	<ul>
				6	<li><a href="#goals">What are the goals of busybox?</a></li>
				7	<li><a href="#design">What is the design of busybox?</a></li>
				8	<li><a href="#source">How is the source code organized?</a></li>
				9	<ul>
				10	<li><a href="#source_applets">The applet directories.</a></li>
				11	<li><a href="#source_libbb">The busybox shared library (libbb)</a></li>
				12	</ul>
				13	<li><a href="#adding">Adding an applet to busybox</a></li>
				14	<li><a href="#standards">What standards does busybox adhere to?</a></li>
Rob Landley	b1b3cee	2006-01-29 06:29:01 +0000	[diff] [blame]	15	<li><a href="#tips">Tips and tricks.</a></li>
				16	<ul>
				17	<li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
				18	<li><a href="#tips_vfork">Fork and vfork</a></li>
Rob Landley	c29a0f3	2006-02-12 00:45:39 +0000	[diff] [blame]	19	<li><a href="#tips_short_read">Short reads and writes</a></li>
Rob Landley	b1b3cee	2006-01-29 06:29:01 +0000	[diff] [blame]	20	</ul>
Rob Landley	d1e38c0	2006-02-16 03:21:44 +0000	[diff] [blame]	21	<li><a href="#who">Who are the BusyBox developers?</a></li>
Rob Landley	aaffef4	2006-01-22 01:44:29 +0000	[diff] [blame]	22	</ul>
				23
				24	<h2><b><a name="goals" />What are the goals of busybox?</b></h2>
				25
				26	<p>Busybox aims to be the smallest and simplest correct implementation of the
				27	standard Linux command line tools. First and foremost, this means the
				28	smallest executable size we can manage. We also want to have the simplest
				29	and cleanest implementation we can manage, be <a href="#standards">standards
				30	compliant</a>, minimize run-time memory usage (heap and stack), run fast, and
				31	take over the world.</p>
				32
				33	<h2><b><a name="design" />What is the design of busybox?</b></h2>
				34
				35	<p>Busybox is like a swiss army knife: one thing with many functions.
				36	The busybox executable can act like many different programs depending on
				37	the name used to invoke it. Normal practice is to create a bunch of symlinks
				38	pointing to the busybox binary, each of which triggers a different busybox
				39	function. (See <a href="FAQ.html#getting_started">getting started</a> in the
				40	FAQ for more information on usage, and <a href="BusyBox.html">the
				41	busybox documentation</a> for a list of symlink names and what they do.)
				42
				43	<p>The "one binary to rule them all" approach is primarily for size reasons: a
				44	single multi-purpose executable is smaller then many small files could be.
				45	This way busybox only has one set of ELF headers, it can easily share code
				46	between different apps even when statically linked, it has better packing
				47	efficiency by avoding gaps between files or compression dictionary resets,
				48	and so on.</p>
				49
				50	<p>Work is underway on new options such as "make standalone" to build separate
				51	binaries for each applet, and a "libbb.so" to make the busybox common code
				52	available as a shared library. Neither is ready yet at the time of this
				53	writing.</p>
				54
				55	<a name="source" />
				56
				57	<h2><a name="source_applets" /><b>The applet directories</b></h2>
				58
				59	<p>The directory "applets" contains the busybox startup code (applets.c and
				60	busybox.c), and several subdirectories containing the code for the individual
				61	applets.</p>
				62
				63	<p>Busybox execution starts with the main() function in applets/busybox.c,
				64	which sets the global variable bb_applet_name to argv[0] and calls
				65	run_applet_by_name() in applets/applets.c. That uses the applets[] array
				66	(defined in include/busybox.h and filled out in include/applets.h) to
				67	transfer control to the appropriate APPLET_main() function (such as
				68	cat_main() or sed_main()). The individual applet takes it from there.</p>
				69
				70	<p>This is why calling busybox under a different name triggers different
				71	functionality: main() looks up argv[0] in applets[] to get a function pointer
				72	to APPLET_main().</p>
				73
				74	<p>Busybox applets may also be invoked through the multiplexor applet
				75	"busybox" (see busybox_main() in applets/busybox.c), and through the
				76	standalone shell (grep for STANDALONE_SHELL in applets/shell/*.c).
				77	See <a href="FAQ.html#getting_started">getting started</a> in the
				78	FAQ for more information on these alternate usage mechanisms, which are
				79	just different ways to reach the relevant APPLET_main() function.</p>
				80
				81	<p>The applet subdirectories (archival, console-tools, coreutils,
				82	debianutils, e2fsprogs, editors, findutils, init, loginutils, miscutils,
				83	modutils, networking, procps, shell, sysklogd, and util-linux) correspond
				84	to the configuration sub-menus in menuconfig. Each subdirectory contains the
				85	code to implement the applets in that sub-menu, as well as a Config.in
				86	file defining that configuration sub-menu (with dependencies and help text
				87	for each applet), and the makefile segment (Makefile.in) for that
				88	subdirectory.</p>
				89
				90	<p>The run-time --help is stored in usage_messages[], which is initialized at
				91	the start of applets/applets.c and gets its help text from usage.h. During the
				92	build this help text is also used to generate the BusyBox documentation (in
				93	html, txt, and man page formats) in the docs directory. See
				94	<a href="#adding">adding an applet to busybox</a> for more
				95	information.</p>
				96
				97	<h2><a name="source_libbb" /><b>libbb</b></h2>
				98
				99	<p>Most non-setup code shared between busybox applets lives in the libbb
				100	directory. It's a mess that evolved over the years without much auditing
				101	or cleanup. For anybody looking for a great project to break into busybox
				102	development with, documenting libbb would be both incredibly useful and good
				103	experience.</p>
				104
				105	<p>Common themes in libbb include allocation functions that test
				106	for failure and abort the program with an error message so the caller doesn't
				107	have to test the return value (xmalloc(), xstrdup(), etc), wrapped versions
				108	of open(), close(), read(), and write() that test for their own failures
				109	and/or retry automatically, linked list management functions (llist.c),
				110	command line argument parsing (getopt_ulflags.c), and a whole lot more.</p>
				111
				112	<h2><a name="adding" /><b>Adding an applet to busybox</b></h2>
				113
				114	<p>To add a new applet to busybox, first pick a name for the applet and
				115	a corresponding CONFIG_NAME. Then do this:</p>
				116
				117	<ul>
				118	<li>Figure out where in the busybox source tree your applet best fits,
				119	and put your source code there. Be sure to use APPLET_main() instead
				120	of main(), where APPLET is the name of your applet.</li>
				121
				122	<li>Add your applet to the relevant Config.in file (which file you add
				123	it to determines where it shows up in "make menuconfig"). This uses
				124	the same general format as the linux kernel's configuration system.</li>
				125
				126	<li>Add your applet to the relevant Makefile.in file (in the same
				127	directory as the Config.in you chose), using the existing entries as a
				128	template and the same CONFIG symbol as you used for Config.in. (Don't
				129	forget "needlibm" or "needcrypt" if your applet needs libm or
				130	libcrypt.)</li>
				131
				132	<li>Add your applet to "include/applets.h", using one of the existing
				133	entries as a template. (Note: this is in alphabetical order. Applets
				134	are found via binary search, and if you add an applet out of order it
				135	won't work.)</li>
				136
				137	<li>Add your applet's runtime help text to "include/usage.h". You need
				138	at least appname_trivial_usage (the minimal help text, always included
				139	in the busybox binary when this applet is enabled) and appname_full_usage
				140	(extra help text included in the busybox binary with
				141	CONFIG_FEATURE_VERBOSE_USAGE is enabled), or it won't compile.
				142	The other two help entry types (appname_example_usage and
				143	appname_notes_usage) are optional. They don't take up space in the binary,
				144	but instead show up in the generated documentation (BusyBox.html,
				145	BusyBox.txt, and the man page BusyBox.1).</li>
				146
				147	<li>Run menuconfig, switch your applet on, compile, test, and fix the
				148	bugs. Be sure to try both "allyesconfig" and "allnoconfig" (and
				149	"allbareconfig" if relevant).</li>
				150
				151	</ul>
				152
				153	<h2><a name="standards" />What standards does busybox adhere to?</a></h2>
				154
				155	<p>The standard we're paying attention to is the "Shell and Utilities"
				156	portion of the <a href=http://www.opengroup.org/onlinepubs/009695399/>Open
				157	Group Base Standards</a> (also known as the Single Unix Specification version
				158	3 or SUSv3). Note that paying attention isn't necessarily the same thing as
				159	following it.</p>
				160
				161	<p>SUSv3 doesn't even mention things like init, mount, tar, or losetup, nor
				162	commonly used options like echo's '-e' and '-n', or sed's '-i'. Busybox is
				163	driven by what real users actually need, not the fact the standard believes
				164	we should implement ed or sccs. For size reasons, we're unlikely to include
				165	much internationalization support beyond UTF-8, and on top of all that, our
				166	configuration menu lets developers chop out features to produce smaller but
				167	very non-standard utilities.</p>
				168
				169	<p>Also, Busybox is aimed primarily at Linux. Unix standards are interesting
				170	because Linux tries to adhere to them, but portability to dozens of platforms
				171	is only interesting in terms of offering a restricted feature set that works
				172	everywhere, not growing dozens of platform-specific extensions. Busybox
				173	should be portable to all hardware platforms Linux supports, and any other
				174	similar operating systems that are easy to do and won't require much
				175	maintenance.</p>
				176
				177	<p>In practice, standards compliance tends to be a clean-up step once an
				178	applet is otherwise finished. When polishing and testing a busybox applet,
				179	we ensure we have at least the option of full standards compliance, or else
				180	document where we (intentionally) fall short.</p>
				181
Rob Landley	b1b3cee	2006-01-29 06:29:01 +0000	[diff] [blame]	182	<h2><a name="tips" />Programming tips and tricks.</a></h2>
				183
				184	<p>Various things busybox uses that aren't particularly well documented
				185	elsewhere.</p>
				186
				187	<h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>
				188
				189	<p>Password fields in /etc/passwd and /etc/shadow are in a special format.
				190	If the first character isn't '$', then it's an old DES style password. If
				191	the first character is '$' then the password is actually three fields
				192	separated by '$' characters:</p>
				193	<pre>
				194	<b>$type$salt$encrypted_password</b>
				195	</pre>
				196
				197	<p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p>
				198
				199	<p>The "salt" is a bunch of ramdom characters (generally 8) the encryption
				200	algorithm uses to perturb the password in a known and reproducible way (such
				201	as by appending the random data to the unencrypted password, or combining
				202	them with exclusive or). Salt is randomly generated when setting a password,
				203	and then the same salt value is re-used when checking the password. (Salt is
				204	thus stored unencrypted.)</p>
				205
				206	<p>The advantage of using salt is that the same cleartext password encrypted
				207	with a different salt value produces a different encrypted value.
				208	If each encrypted password uses a different salt value, an attacker is forced
				209	to do the cryptographic math all over again for each password they want to
				210	check. Without salt, they could simply produce a big dictionary of commonly
				211	used passwords ahead of time, and look up each password in a stolen password
				212	file to see if it's a known value. (Even if there are billions of possible
				213	passwords in the dictionary, checking each one is just a binary search against
				214	a file only a few gigabytes long.) With salt they can't even tell if two
				215	different users share the same password without guessing what that password
				216	is and decrypting it. They also can't precompute the attack dictionary for
				217	a specific password until they know what the salt value is.</p>
				218
				219	<p>The third field is the encrypted password (plus the salt). For md5 this
				220	is 22 bytes.</p>
				221
				222	<p>The busybox function to handle all this is pw_encrypt(clear, salt) in
				223	"libbb/pw_encrypt.c". The first argument is the clear text password to be
				224	encrypted, and the second is a string in "$type$salt$password" format, from
				225	which the "type" and "salt" fields will be extracted to produce an encrypted
				226	value. (Only the first two fields are needed, the third $ is equivalent to
				227	the end of the string.) The return value is an encrypted password in
				228	/etc/passwd format, with all three $ separated fields. It's stored in
				229	a static buffer, 128 bytes long.</p>
				230
				231	<p>So when checking an existing password, if pw_encrypt(text,
				232	old_encrypted_password) returns a string that compares identical to
				233	old_encrypted_password, you've got the right password. When setting a new
				234	password, generate a random 8 character salt string, put it in the right
				235	format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
				236	second argument to pw_encrypt(text,buffer).</p>
				237
				238	<h2><a name="tips_vfork">Fork and vfork</a></h2>
				239
Rob Landley	b218377	2006-02-23 19:59:34 +0000	[diff] [blame]	240	<p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
				241	expensive to implement (and sometimes even impossible), so a less capable
				242	function called vfork() is used instead. (Using vfork() on a system with an
				243	MMU is like pounding a nail with a wrench. Not the best tool for the job, but
				244	it works.)</p>
				245
Rob Landley	03628c8	2006-01-29 06:45:38 +0000	[diff] [blame]	246	<p>Busybox hides the difference between fork() and vfork() in
				247	libbb/bb_fork_exec.c. If you ever want to fork and exec, use bb_fork_exec()
				248	(which returns a pid and takes the same arguments as execve(), although in
				249	this case envp can be NULL) and don't worry about it. This description is
				250	here in case you want to know why that does what it does.</p>
				251
Rob Landley	b218377	2006-02-23 19:59:34 +0000	[diff] [blame]	252	<p>Implementing fork() depends on having a Memory Management Unit. With an
				253	MMU then you can simply set up a second set of page tables and share the
				254	physical memory via copy-on-write. So a fork() followed quickly by exec()
				255	only copies a few pages of the parent's memory, just the ones it changes
				256	before freeing them.</p>
Rob Landley	b1b3cee	2006-01-29 06:29:01 +0000	[diff] [blame]	257
Rob Landley	b218377	2006-02-23 19:59:34 +0000	[diff] [blame]	258	<p>With a very primitive MMU (using a base pointer plus length instead of page
				259	tables, which can provide virtual addresses and protect processes from each
				260	other, but no copy on write) you can still implement fork. But it's
				261	unreasonably expensive, because you have to copy all the parent process's
				262	memory into the new process (which could easily be several megabytes per fork).
				263	And you have to do this even though that memory gets freed again as soon as the
				264	exec happens. (This is not just slow and a waste of space but causes memory
				265	usage spikes that can easily cause the system to run out of memory.)</p>
Rob Landley	b1b3cee	2006-01-29 06:29:01 +0000	[diff] [blame]	266
Rob Landley	b218377	2006-02-23 19:59:34 +0000	[diff] [blame]	267	<p>Without even a primitive MMU, you have no virtual addresses. Every process
				268	can reach out and touch any other process's memory, because all pointers are to
				269	physical addresses with no protection. Even if you copy a process's memory to
				270	new physical addresses, all of its pointers point to the old objects in the
				271	old process. (Searching through the new copy's memory for pointers and
				272	redirect them to the new locations is not an easy problem.)</p>
				273
				274	<p>So with a primitive or missing MMU, fork() is just not a good idea.</p>
Rob Landley	b1b3cee	2006-01-29 06:29:01 +0000	[diff] [blame]	275
				276	<p>In theory, vfork() is just a fork() that writeably shares the heap and stack
				277	rather than copying it (so what one process writes the other one sees). In
				278	practice, vfork() has to suspend the parent process until the child does exec,
				279	at which point the parent wakes up and resumes by returning from the call to
				280	vfork(). All modern kernel/libc combinations implement vfork() to put the
				281	parent to sleep until the child does its exec. There's just no other way to
Rob Landley	b218377	2006-02-23 19:59:34 +0000	[diff] [blame]	282	make it work: the parent has to know the child has done its exec() or exit()
				283	before it's safe to return from the function it's in, so it has to block
				284	until that happens. In fact without suspending the parent there's no way to
				285	even store separate copies of the return value (the pid) from the vfork() call
Rob Landley	b1b3cee	2006-01-29 06:29:01 +0000	[diff] [blame]	286	itself: both assignments write into the same memory location.</p>
				287
				288	<p>One way to understand (and in fact implement) vfork() is this: imagine
				289	the parent does a setjmp and then continues on (pretending to be the child)
				290	until the exec() comes around, then the _exec_ does the actual fork, and the
				291	parent does a longjmp back to the original vfork call and continues on from
				292	there. (It thus becomes obvious why the child can't return, or modify
				293	local variables it doesn't want the parent to see changed when it resumes.)
				294
				295	<p>Note a common mistake: the need for vfork doesn't mean you can't have two
				296	processes running at the same time. It means you can't have two processes
				297	sharing the same memory without stomping all over each other. As soon as
				298	the child calls exec(), the parent resumes.</p>
				299
Rob Landley	03628c8	2006-01-29 06:45:38 +0000	[diff] [blame]	300	<p>If the child's attempt to call exec() fails, the child should call _exit()
				301	rather than a normal exit(). This avoids any atexit() code that might confuse
				302	the parent. (The parent should never call _exit(), only a vforked child that
				303	failed to exec.)</p>
				304
Rob Landley	b1b3cee	2006-01-29 06:29:01 +0000	[diff] [blame]	305	<p>(Now in theory, a nommu system could just copy the _stack_ when it forks
				306	(which presumably is much shorter than the heap), and leave the heap shared.
Rob Landley	b218377	2006-02-23 19:59:34 +0000	[diff] [blame]	307	Even with no MMU at all
Rob Landley	b1b3cee	2006-01-29 06:29:01 +0000	[diff] [blame]	308	In practice, you've just wound up in a multi-threaded situation and you can't
				309	do a malloc() or free() on your heap without freeing the other process's memory
				310	(and if you don't have the proper locking for being threaded, corrupting the
				311	heap if both of you try to do it at the same time and wind up stomping on
				312	each other while traversing the free memory lists). The thing about vfork is
				313	that it's a big red flag warning "there be dragons here" rather than
				314	something subtle and thus even more dangerous.)</p>
				315
Rob Landley	c29a0f3	2006-02-12 00:45:39 +0000	[diff] [blame]	316	<h2><a name="tips_sort_read">Short reads and writes</a></h2>
				317
				318	<p>Busybox has special functions, bb_full_read() and bb_full_write(), to
				319	check that all the data we asked for got read or written. Is this a real
				320	world consideration? Try the following:</p>
				321
				322	<pre>while true; do echo hello; sleep 1; done \| tee out.txt</pre>
				323
				324	<p>If tee is implemented with bb_full_read(), tee doesn't display output
				325	in real time but blocks until its entire input buffer (generally a couple
				326	kilobytes) is read, then displays it all at once. In that case, we _want_
				327	the short read, for user interface reasons. (Note that read() should never
				328	return 0 unless it has hit the end of input, and an attempt to write 0
				329	bytes should be ignored by the OS.)</p>
				330
				331	<p>As for short writes, play around with two processes piping data to each
				332	other on the command line (cat bigfile \| gzip > out.gz) and suspend and
				333	resume a few times (ctrl-z to suspend, "fg" to resume). The writer can
				334	experience short writes, which are especially dangerous because if you don't
				335	notice them you'll discard data. They can also happen when a system is under
				336	load and a fast process is piping to a slower one. (Such as an xterm waiting
				337	on x11 when the scheduler decides X is being a CPU hog with all that
				338	text console scrolling...)</p>
				339
				340	<p>So will data always be read from the far end of a pipe at the
				341	same chunk sizes it was written in? Nope. Don't rely on that. For one
				342	counterexample, see <a href="http://www.faqs.org/rfcs/rfc896.html">rfc 896</p>
				343	for Nagle's algorithm</a>, which waits a fraction of a second or so before
				344	sending out small amounts of data through a TCP/IP connection in case more
				345	data comes in that can be merged into the same packet. (In case you were
				346	wondering why action games that use TCP/IP set TCP_NODELAY to lower the latency
				347	on their their sockets, now you know.)</p>
				348
Rob Landley	d1e38c0	2006-02-16 03:21:44 +0000	[diff] [blame]	349	<h2><a name="who">Who are the BusyBox developers?</a></h2>
				350
				351	<p>The following login accounts currently exist on busybox.net. (I.E. these
				352	people can commit <a href="http://busybox.net/downloads/patches">patches</a>
Rob Landley	27cd85b	2006-02-17 02:38:00 +0000	[diff] [blame]	353	into subversion for the BusyBox, uClibc, and buildroot projects.)</p>
				354
				355	<pre>
				356	aldot :Bernhard Fischer
				357	andersen :Erik Andersen <- uClibc and BuildRoot maintainer.
				358	bug1 :Glenn McGrath
				359	davidm :David McCullough
				360	gkajmowi :Garrett Kajmowicz <- uClibc++ maintainer
				361	jbglaw :Jan-Benedict Glaw
				362	jocke :Joakim Tjernlund
				363	landley :Rob Landley <- BusyBox maintainer
				364	lethal :Paul Mundt
				365	mjn3 :Manuel Novoa III
				366	osuadmin :osuadmin
				367	pgf :Paul Fox
				368	pkj :Peter Kjellerstedt
				369	prpplague :David Anders
				370	psm :Peter S. Mazinger
				371	russ :Russ Dill
				372	sandman :Robert Griebl
				373	sjhill :Steven J. Hill
				374	solar :Ned Ludd
				375	timr :Tim Riker
				376	tobiasa :Tobias Anderberg
				377	vapier :Mike Frysinger
				378	vodz :Vladimir N. Oleynik
				379	</pre>
				380
				381	<p>The following accounts used to exist on busybox.net, but don't anymore so
				382	I can't ask /etc/passwd for their names. (If anybody would like to make
				383	a stab at it...)</p>
Rob Landley	d1e38c0	2006-02-16 03:21:44 +0000	[diff] [blame]	384
				385	<pre>
				386	aaronl
Rob Landley	d1e38c0	2006-02-16 03:21:44 +0000	[diff] [blame]	387	beppu
Rob Landley	d1e38c0	2006-02-16 03:21:44 +0000	[diff] [blame]	388	dwhedon
				389	erik : Also Erik Andersen?
				390	gfeldman
				391	jimg
				392	kraai
Rob Landley	d1e38c0	2006-02-16 03:21:44 +0000	[diff] [blame]	393	markw
				394	miles
Rob Landley	d1e38c0	2006-02-16 03:21:44 +0000	[diff] [blame]	395	proski
				396	rjune
Rob Landley	d1e38c0	2006-02-16 03:21:44 +0000	[diff] [blame]	397	tausq
Rob Landley	d1e38c0	2006-02-16 03:21:44 +0000	[diff] [blame]	398	</pre>
				399
				400
Rob Landley	aaffef4	2006-01-22 01:44:29 +0000	[diff] [blame]	401	<br>
				402	<br>
				403	<br>
				404
				405	<!--#include file="footer.html" -->