 <!--#include file="header.html" -->

 <h2>Rob's notes on programming busybox.</h2>

 <ul>
   <li><a href="#goals">What are the goals of busybox?</a></li>
   <li><a href="#design">What is the design of busybox?</a></li>
   <li><a href="#source">How is the source code organized?</a></li>
   <ul>
     <li><a href="#source_applets">The applet directories.</a></li>
     <li><a href="#source_libbb">The busybox shared library (libbb)</a></li>
   </ul>
   <li><a href="#adding">Adding an applet to busybox</a></li>
   <li><a href="#standards">What standards does busybox adhere to?</a></li>
   <li><a href="#tips">Tips and tricks.</a></li>
   <ul>
     <li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
     <li><a href="#tips_vfork">Fork and vfork</a></li>
   </ul>
 </ul>

 <h2><b><a name="goals" />What are the goals of busybox?</b></h2>

 <p>Busybox aims to be the smallest and simplest correct implementation of the
 standard Linux command line tools.  First and foremost, this means the
 smallest executable size we can manage.  We also want to have the simplest
 and cleanest implementation we can manage, be <a href="#standards">standards
 compliant</a>, minimize run-time memory usage (heap and stack), run fast, and
 take over the world.</p>

 <h2><b><a name="design" />What is the design of busybox?</b></h2>

 <p>Busybox is like a swiss army knife: one thing with many functions.
 The busybox executable can act like many different programs depending on
 the name used to invoke it.  Normal practice is to create a bunch of symlinks
 pointing to the busybox binary, each of which triggers a different busybox
 function.  (See <a href="FAQ.html#getting_started">getting started</a> in the
 FAQ for more information on usage, and <a href="BusyBox.html">the
 busybox documentation</a> for a list of symlink names and what they do.)</p>

 <p>The "one binary to rule them all" approach is primarily for size reasons: a
 single multi-purpose executable is smaller than many small files could be.
 This way busybox only has one set of ELF headers, it can easily share code
 between different apps even when statically linked, it has better packing
 efficiency by avoiding gaps between files or compression dictionary resets,
 and so on.</p>

 <p>Work is underway on new options such as "make standalone" to build separate
 binaries for each applet, and a "libbb.so" to make the busybox common code
 available as a shared library.  Neither is ready yet at the time of this
 writing.</p>

 <a name="source" />

 <h2><a name="source_applets" /><b>The applet directories</b></h2>

 <p>The directory "applets" contains the busybox startup code (applets.c and
 busybox.c), and several subdirectories containing the code for the individual
 applets.</p>

 <p>Busybox execution starts with the main() function in applets/busybox.c,
 which sets the global variable bb_applet_name to argv[0] and calls
 run_applet_by_name() in applets/applets.c.  That uses the applets[] array
 (defined in include/busybox.h and filled out in include/applets.h) to
 transfer control to the appropriate APPLET_main() function (such as
 cat_main() or sed_main()).  The individual applet takes it from there.</p>

 <p>This is why calling busybox under a different name triggers different
 functionality: main() looks up argv[0] in applets[] to get a function pointer
 to APPLET_main().</p>
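
 <p>The lookup described above can be sketched roughly as follows.  This is an
 illustration, not the code in applets/applets.c: the struct layout, the
 three-entry table, the stub applets and the names find_applet_sketch() and
 run_applet_sketch() are all hypothetical, and stripping a leading directory
 from argv[0] is an assumption made here so that "/bin/cat" finds "cat".</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for real applets; each returns a distinct code. */
static int cat_main(int argc, char **argv)  { (void)argc; (void)argv; return 10; }
static int sed_main(int argc, char **argv)  { (void)argc; (void)argv; return 20; }
static int true_main(int argc, char **argv) { (void)argc; (void)argv; return 0; }

struct applet_entry {
    const char *name;
    int (*main)(int argc, char **argv);
};

/* Must stay sorted by name: the lookup below is a binary search. */
static const struct applet_entry applet_table[] = {
    { "cat",  cat_main  },
    { "sed",  sed_main  },
    { "true", true_main },
};

/* bsearch() callback: key is the wanted name, elem is a table entry. */
static int compare_entry(const void *key, const void *elem)
{
    const struct applet_entry *entry = elem;
    return strcmp((const char *)key, entry->name);
}

/* Strip any leading directory so "/bin/cat" is looked up as "cat". */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Returns the matching table entry, or NULL if there is none. */
const struct applet_entry *find_applet_sketch(const char *argv0)
{
    return bsearch(base_name(argv0), applet_table,
                   sizeof(applet_table) / sizeof(applet_table[0]),
                   sizeof(applet_table[0]), compare_entry);
}

/* Returns the applet's exit code, or -1 if no applet has that name. */
int run_applet_sketch(int argc, char **argv)
{
    const struct applet_entry *applet = find_applet_sketch(argv[0]);
    return applet ? applet->main(argc, argv) : -1;
}
```

 <p>Because the search is binary, an entry inserted out of alphabetical order
 would simply never be found, which is why the ordering of the table
 matters.</p>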

 <p>Busybox applets may also be invoked through the multiplexor applet
 "busybox" (see busybox_main() in applets/busybox.c), and through the
 standalone shell (grep for STANDALONE_SHELL in applets/shell/*.c).
 See <a href="FAQ.html#getting_started">getting started</a> in the
 FAQ for more information on these alternate usage mechanisms, which are
 just different ways to reach the relevant APPLET_main() function.</p>

 <p>The applet subdirectories (archival, console-tools, coreutils,
 debianutils, e2fsprogs, editors, findutils, init, loginutils, miscutils,
 modutils, networking, procps, shell, sysklogd, and util-linux) correspond
 to the configuration sub-menus in menuconfig.  Each subdirectory contains the
 code to implement the applets in that sub-menu, as well as a Config.in
 file defining that configuration sub-menu (with dependencies and help text
 for each applet), and the makefile segment (Makefile.in) for that
 subdirectory.</p>

 <p>The run-time --help is stored in usage_messages[], which is initialized at
 the start of applets/applets.c and gets its help text from usage.h.  During the
 build this help text is also used to generate the BusyBox documentation (in
 html, txt, and man page formats) in the docs directory.  See
 <a href="#adding">adding an applet to busybox</a> for more
 information.</p>

 <h2><a name="source_libbb" /><b>libbb</b></h2>

 <p>Most non-setup code shared between busybox applets lives in the libbb
 directory.  It's a mess that evolved over the years without much auditing
 or cleanup.  For anybody looking for a great project to break into busybox
 development with, documenting libbb would be both incredibly useful and good
 experience.</p>

 <p>Common themes in libbb include allocation functions that test
 for failure and abort the program with an error message so the caller doesn't
 have to test the return value (xmalloc(), xstrdup(), etc), wrapped versions
 of open(), close(), read(), and write() that test for their own failures
 and/or retry automatically, linked list management functions (llist.c),
 command line argument parsing (getopt_ulflags.c), and a whole lot more.</p>
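
 <p>As an illustration of the "test for failure and abort" pattern, here is a
 minimal sketch of what such wrappers might look like.  The _sketch names and
 the error message are made up for this example; the real xmalloc() and
 xstrdup() in libbb may differ in detail.</p>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate or die: never returns NULL, so callers need no error check.
 * A zero-byte request is rounded up to one byte to keep that promise. */
void *xmalloc_sketch(size_t size)
{
    void *ptr = malloc(size ? size : 1);

    if (ptr == NULL) {
        fputs("memory exhausted\n", stderr);
        exit(EXIT_FAILURE);
    }
    return ptr;
}

/* Duplicate a string with the same "succeed or exit" contract. */
char *xstrdup_sketch(const char *str)
{
    size_t len = strlen(str) + 1;   /* include the terminating NUL */
    char *copy = xmalloc_sketch(len);

    memcpy(copy, str, len);
    return copy;
}
```

 <p>The point of the pattern is size: the failure check exists once, in the
 wrapper, instead of being repeated at every call site.</p>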

 <h2><a name="adding" /><b>Adding an applet to busybox</b></h2>

 <p>To add a new applet to busybox, first pick a name for the applet and
 a corresponding CONFIG_NAME.  Then do this:</p>

 <ul>
 <li>Figure out where in the busybox source tree your applet best fits,
 and put your source code there.  Be sure to use APPLET_main() instead
 of main(), where APPLET is the name of your applet.</li>

 <li>Add your applet to the relevant Config.in file (which file you add
 it to determines where it shows up in "make menuconfig").  This uses
 the same general format as the linux kernel's configuration system.</li>

 <li>Add your applet to the relevant Makefile.in file (in the same
 directory as the Config.in you chose), using the existing entries as a
 template and the same CONFIG symbol as you used for Config.in.  (Don't
 forget "needlibm" or "needcrypt" if your applet needs libm or
 libcrypt.)</li>

 <li>Add your applet to "include/applets.h", using one of the existing
 entries as a template.  (Note: this is in alphabetical order.  Applets
 are found via binary search, and if you add an applet out of order it
 won't work.)</li>

 <li>Add your applet's runtime help text to "include/usage.h".  You need
 at least appname_trivial_usage (the minimal help text, always included
 in the busybox binary when this applet is enabled) and appname_full_usage
 (extra help text included in the busybox binary when
 CONFIG_FEATURE_VERBOSE_USAGE is enabled), or it won't compile.
 The other two help entry types (appname_example_usage and
 appname_notes_usage) are optional.  They don't take up space in the binary,
 but instead show up in the generated documentation (BusyBox.html,
 BusyBox.txt, and the man page BusyBox.1).</li>

 <li>Run menuconfig, switch your applet on, compile, test, and fix the
 bugs.  Be sure to try both "allyesconfig" and "allnoconfig" (and
 "allbareconfig" if relevant).</li>

 </ul>
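
 <p>To make the first and fifth steps concrete, here is a rough sketch of the
 source for a made-up applet called "hello".  The applet name, its behavior
 and its help strings are hypothetical; in the real tree the two usage macros
 would go in "include/usage.h" rather than in the applet's own file, and the
 existing entries there are the template to follow.</p>

```c
#include <stdio.h>

/* Help text, named after the appname_trivial_usage / appname_full_usage
 * pattern.  Defined here only to keep the sketch in one file. */
#define hello_trivial_usage "[NAME]"
#define hello_full_usage    "Print a greeting to NAME (default: world)"

/* The entry point is hello_main(), not main(): busybox's own main()
 * dispatches here when the binary is invoked as "hello". */
int hello_main(int argc, char **argv)
{
    const char *who = (argc > 1) ? argv[1] : "world";

    printf("Hello, %s\n", who);
    return 0;
}
```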

 <h2><a name="standards" />What standards does busybox adhere to?</a></h2>

 <p>The standard we're paying attention to is the "Shell and Utilities"
 portion of the <a href="http://www.opengroup.org/onlinepubs/009695399/">Open
 Group Base Standards</a> (also known as the Single Unix Specification version
 3 or SUSv3).  Note that paying attention isn't necessarily the same thing as
 following it.</p>

 <p>SUSv3 doesn't even mention things like init, mount, tar, or losetup, nor
 commonly used options like echo's '-e' and '-n', or sed's '-i'.  Busybox is
 driven by what real users actually need, not by the fact that the standard
 believes we should implement ed or sccs.  For size reasons, we're unlikely to include
 much internationalization support beyond UTF-8, and on top of all that, our
 configuration menu lets developers chop out features to produce smaller but
 very non-standard utilities.</p>

 <p>Also, Busybox is aimed primarily at Linux.  Unix standards are interesting
 because Linux tries to adhere to them, but portability to dozens of platforms
 is only interesting in terms of offering a restricted feature set that works
 everywhere, not growing dozens of platform-specific extensions.  Busybox
 should be portable to all hardware platforms Linux supports, and to any
 similar operating systems that are easy to support and won't require much
 maintenance.</p>

 <p>In practice, standards compliance tends to be a clean-up step once an
 applet is otherwise finished.  When polishing and testing a busybox applet,
 we ensure we have at least the option of full standards compliance, or else
 document where we (intentionally) fall short.</p>

 <h2><a name="tips" />Programming tips and tricks.</a></h2>

 <p>Various things busybox uses that aren't particularly well documented
 elsewhere.</p>

 <h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>

 <p>Password fields in /etc/passwd and /etc/shadow are in a special format.
 If the first character isn't '$', then it's an old DES style password.  If
 the first character is '$' then the password is actually three fields
 separated by '$' characters:</p>
 <pre>
   <b>$type$salt$encrypted_password</b>
 </pre>

 <p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p>

 <p>The "salt" is a bunch of random characters (generally 8) the encryption
 algorithm uses to perturb the password in a known and reproducible way (such
 as by appending the random data to the unencrypted password, or combining
 them with exclusive or).  Salt is randomly generated when setting a password,
 and then the same salt value is re-used when checking the password.  (Salt is
 thus stored unencrypted.)</p>

 <p>The advantage of using salt is that the same cleartext password encrypted
 with a different salt value produces a different encrypted value.
 If each encrypted password uses a different salt value, an attacker is forced
 to do the cryptographic math all over again for each password they want to
 check.  Without salt, they could simply produce a big dictionary of commonly
 used passwords ahead of time, and look up each password in a stolen password
 file to see if it's a known value.  (Even if there are billions of possible
 passwords in the dictionary, checking each one is just a binary search against
 a file only a few gigabytes long.)  With salt they can't even tell if two
 different users share the same password without guessing what that password
 is and encrypting it with each user's salt.  They also can't precompute the
 attack dictionary for a specific password until they know what the salt value
 is.</p>

 <p>The third field is the encrypted password (plus the salt).  For md5 this
 is 22 bytes.</p>
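
 <p>A small sketch of splitting such a field into its three parts may help.
 The struct and the function name parse_pw_field() are invented for this
 example and are not part of libbb; the buffer sizes are arbitrary.</p>

```c
#include <stddef.h>
#include <string.h>

struct pw_fields {
    char type[8];       /* e.g. "1" */
    char salt[32];      /* the salt characters */
    const char *hash;   /* points into the original string */
};

/* Split "$type$salt$hash".  Returns 0 on success, or -1 if the string
 * does not start with '$' (an old DES style entry) or is malformed. */
int parse_pw_field(const char *field, struct pw_fields *out)
{
    const char *type, *salt, *hash;
    size_t type_len, salt_len;

    if (field[0] != '$')
        return -1;

    type = field + 1;
    salt = strchr(type, '$');       /* second '$' ends the type */
    if (salt == NULL)
        return -1;
    type_len = (size_t)(salt - type);
    salt++;

    hash = strchr(salt, '$');       /* third '$' ends the salt */
    if (hash == NULL)
        return -1;
    salt_len = (size_t)(hash - salt);
    hash++;

    if (type_len == 0 || type_len >= sizeof(out->type)
        || salt_len >= sizeof(out->salt))
        return -1;

    memcpy(out->type, type, type_len);
    out->type[type_len] = '\0';
    memcpy(out->salt, salt, salt_len);
    out->salt[salt_len] = '\0';
    out->hash = hash;
    return 0;
}
```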

 <p>The busybox function to handle all this is pw_encrypt(clear, salt) in
 "libbb/pw_encrypt.c".  The first argument is the clear text password to be
 encrypted, and the second is a string in "$type$salt$password" format, from
 which the "type" and "salt" fields will be extracted to produce an encrypted
 value.  (Only the first two fields are needed; the third $ is equivalent to
 the end of the string.)  The return value is an encrypted password in
 /etc/passwd format, with all three $-separated fields.  It's stored in
 a static buffer, 128 bytes long.</p>

 <p>So when checking an existing password, if pw_encrypt(text,
 old_encrypted_password) returns a string that compares identical to
 old_encrypted_password, you've got the right password.  When setting a new
 password, generate a random 8 character salt string, put it in the right
 format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
 second argument to pw_encrypt(text,buffer).</p>
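
 <p>The check and set flows above might look something like this.  To keep the
 sketch self-contained it does not call pw_encrypt(); instead it takes a
 function pointer with the same (clear, salt) shape, and toy_encrypt() is a
 deliberately fake stand-in that is NOT cryptography.  All names here
 (encrypt_fn, toy_encrypt, password_matches, make_salt_arg) are hypothetical.</p>

```c
#include <stdio.h>
#include <string.h>

/* Same shape as pw_encrypt(clear, salt): returns "$type$salt$hash". */
typedef const char *(*encrypt_fn)(const char *clear, const char *salt);

/* TOY stand-in, not a real hash.  Like the function described above, it
 * reads only "$type$salt" from its second argument and returns a pointer
 * to a static buffer. */
const char *toy_encrypt(const char *clear, const char *salt)
{
    static char out[128];
    unsigned long sum = 5381;
    size_t prefix_len = 0;
    size_t i;
    int dollars = 0;

    /* Stop at the third '$' or at the end of the string. */
    while (salt[prefix_len] != '\0') {
        if (salt[prefix_len] == '$' && ++dollars == 3)
            break;
        prefix_len++;
    }
    if (prefix_len > 64)
        prefix_len = 64;

    for (i = 0; i < prefix_len; i++)
        sum = sum * 33 + (unsigned char)salt[i];
    for (i = 0; clear[i] != '\0'; i++)
        sum = sum * 33 + (unsigned char)clear[i];

    snprintf(out, sizeof(out), "%.*s$%08lx",
             (int)prefix_len, salt, sum & 0xffffffffUL);
    return out;
}

/* Checking: re-encrypt the candidate using the stored field as the salt
 * argument, then compare the result with the stored field. */
int password_matches(const char *clear, const char *stored, encrypt_fn encrypt)
{
    return strcmp(encrypt(clear, stored), stored) == 0;
}

/* Setting: build the "$type$salt" argument for a new password.
 * Returns 0 on success, -1 if buf is too small. */
int make_salt_arg(char *buf, size_t size, char type, const char *salt)
{
    int n = snprintf(buf, size, "$%c$%s", type, salt);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

 <p>Since the result lives in a static buffer, copy it somewhere else before
 calling the encrypt function again.</p>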

 <h2><a name="tips_vfork">Fork and vfork</a></h2>

 <p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
 expensive to implement, so a less capable function called vfork() is used
 instead.</p>

 <p>The reason vfork() exists is that if you haven't got an MMU then you can't
 simply set up a second set of page tables and share the physical memory via
 copy-on-write, which is what fork() normally does.  This means that actually
 forking has to copy all the parent's memory (which could easily be tens of
 megabytes).  And you have to do this even though that memory gets freed again
 as soon as the exec happens, so it's probably all a big waste of time.</p>

 <p>This is not only slow and a waste of space, it also causes totally
 unnecessary memory usage spikes based on how big the _parent_ process is (not
 the child), and these spikes are quite likely to trigger an out of memory
 condition on small systems (which is where nommu is common anyway).  So
 although you _can_ emulate a real fork on a nommu system, you really don't
 want to.</p>

 <p>In theory, vfork() is just a fork() that writably shares the heap and stack
 rather than copying them (so what one process writes the other one sees).  In
 practice, vfork() has to suspend the parent process until the child calls exec
 (or exits), at which point the parent wakes up and resumes by returning from
 the call to vfork().  All modern kernel/libc combinations implement vfork()
 this way.  There's just no other way to make it work: the two processes share
 the same stack, so if either one returns from its function it stomps on the
 call stack, and when the other process returns, hilarity ensues.  In fact,
 without suspending the parent there's no way to even store separate copies of
 the return value (the pid) from the vfork() call itself: both assignments
 write into the same memory location.</p>

 <p>One way to understand (and in fact implement) vfork() is this: imagine
 the parent does a setjmp and then continues on (pretending to be the child)
 until the exec() comes around, then the _exec_ does the actual fork, and the
 parent does a longjmp back to the original vfork call and continues on from
 there.  (It thus becomes obvious why the child can't return, or modify
 local variables it doesn't want the parent to see changed when it resumes.)</p>

 <p>Note a common mistake: the need for vfork doesn't mean you can't have two
 processes running at the same time.  It means you can't have two processes
 sharing the same memory without stomping all over each other.  As soon as
 the child calls exec(), the parent resumes.</p>
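
 <p>A minimal sketch of the usual vfork() pattern, assuming a POSIX-like system
 that provides vfork(), execv() and waitpid().  The function name
 spawn_and_wait() is made up for this example.  The child does nothing between
 vfork() and exec, and calls _exit() rather than returning if the exec
 fails.</p>

```c
#define _DEFAULT_SOURCE   /* expose vfork() on glibc in strict -std= modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at argv[0] (a full path) and wait for it to finish.
 * Returns the child's exit status, or -1 on failure. */
int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: still running on the parent's stack and heap, so it
         * touches nothing and goes straight to exec. */
        execv(argv[0], argv);
        /* Only reached if exec failed.  _exit() skips atexit handlers
         * and stdio flushing, which belong to the parent. */
        _exit(127);
    }

    /* Parent: resumes here only once the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

 <p>Note that the child never returns from spawn_and_wait() and never modifies
 a local variable, for exactly the reasons given above.</p>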

 <p>(Now in theory, a nommu system could just copy the _stack_ when it forks
 (which presumably is much shorter than the heap), and leave the heap shared.
 In practice, you've just wound up in a multi-threaded situation: you can't
 do a malloc() or free() on your heap without also changing the other process's
 heap (and, if you don't have the proper locking for being threaded, corrupting
 the heap if both of you try to do it at the same time and wind up stomping on
 each other while traversing the free memory lists).  The thing about vfork is
 that it's a big red flag warning "there be dragons here" rather than
 something subtle and thus even more dangerous.)</p>

 <br>
 <br>
 <br>

 <!--#include file="footer.html" -->