		Keeping data small

When many applets are compiled into busybox, the rw data and bss
of all applets are concatenated, including those from libc if bbox
is built statically. When bbox is started, _all_ of this data is
allocated, not just the part belonging to the selected applet.

What "allocated" means exactly depends on the arch. On NOMMU it
probably bites the most, since rwdata and bss occupy real RAM.
On i386, bss is lazily allocated by COWed zero pages.
Not sure about rwdata - also COW?

In order to keep bbox friendly to NOMMU and small-mem systems,
we should avoid large global data in our applets, and should
minimize usage of libc functions which implicitly use
such structures in libc.

A small experiment measures "parasitic" bbox memory consumption.
Here we start 1000 "busybox sleep 10" in parallel.
The bbox binary is a practically allyesconfig static one,
built against uclibc:

	bash-3.2# nmeter '%t %c %b %m %p %[pn]'
	23:17:28 .......... 0 0 168M 0 147
	23:17:29 .......... 0 0 168M 0 147
	23:17:30 U......... 0 0 168M 1 147
	23:17:31 SU........ 0 188k 181M 244 391
	23:17:32 SSSSUUU... 0 0 223M 757 1147
	23:17:33 UUU....... 0 0 223M 0 1147
	23:17:34 U......... 0 0 223M 1 1147
	23:17:35 .......... 0 0 223M 0 1147
	23:17:36 .......... 0 0 223M 0 1147
	23:17:37 S......... 0 0 223M 0 1147
	23:17:38 .......... 0 0 223M 1 1147
	23:17:39 .......... 0 0 223M 0 1147
	23:17:40 .......... 0 0 223M 0 1147
	23:17:41 .......... 0 0 210M 0 906
	23:17:42 .......... 0 0 168M 1 147
	23:17:43 .......... 0 0 168M 0 147

This requires 55M of memory (the %m column rises from 168M to 223M
while the 1000 processes are alive). Thus one trivial busybox applet
takes about 55k of memory.
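
The arithmetic behind these figures can be written down as a small
sketch. The helper name per_process_kb is made up for illustration,
and 1M is treated as 1000k, as the text above does.

```c
#include <assert.h>

/* Hypothetical helper, not part of busybox: per-process memory cost
 * in kilobytes, given the memory readings (in megabytes) while idle
 * and while the test processes are alive, and the number of
 * processes started. Assumes 1M == 1000k, as in the text. */
long per_process_kb(long idle_mb, long busy_mb, long nprocs)
{
	assert(nprocs > 0);
	return (busy_mb - idle_mb) * 1000 / nprocs;
}
```

With the readings from the nmeter output (168M idle, 223M busy, process
count going from 147 to 1147), this gives 55k per process.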


		Example 1

One example of how to reduce global data usage is in
archival/libunarchive/decompress_unzip.c:

	/* This is somewhat complex-looking arrangement, but it allows
	 * to place decompressor state either in bss or in
	 * malloc'ed space simply by changing #defines below.
	 * Sizes on i386:
	 *   text    data     bss     dec     hex
	 *   5256       0     108    5364    14f4 - bss
	 *   4915       0       0    4915    1333 - malloc
	 */
	#define STATE_IN_BSS 0
	#define STATE_IN_MALLOC 1

(see the rest of the file to get the idea)

This example completely eliminates globals in that module.
The required memory is allocated in inflate_gunzip() [its main function]
and then passed down as a parameter to all subroutines which need
to access 'globals'.
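
A minimal sketch of this style follows. It is not the code from
decompress_unzip.c: the names state_t, emit_byte and unpack_demo are
made up for illustration, and plain calloc() stands in for whatever
allocator the real module uses.

```c
#include <stdlib.h>

/* Hypothetical state that would otherwise live in global variables. */
typedef struct state_t {
	unsigned bit_buffer;
	unsigned bit_count;
	unsigned long bytes_out;
} state_t;

/* Subroutines receive the state explicitly instead of touching globals. */
void emit_byte(state_t *state)
{
	state->bytes_out++;
}

/* The entry point owns the state: allocate it, pass it down, free it.
 * Returns the number of bytes "emitted", or -1 if allocation fails. */
long unpack_demo(int nbytes)
{
	long result;
	state_t *state = calloc(1, sizeof(*state));

	if (!state)
		return -1;
	for (int i = 0; i < nbytes; i++)
		emit_byte(state);
	result = (long)state->bytes_out;
	free(state);
	return result;
}
```

Nothing here occupies bss or rw data: the state exists only while
unpack_demo() is running.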


		Example 2

In case you don't want to pass this additional parameter everywhere,
take a look at archival/gzip.c. Here all global data is replaced by
a single global pointer (ptr_to_globals) to allocated storage.

In order to not duplicate ptr_to_globals in every applet, you can
reuse the single common one. It is defined in libbb/messages.c
as struct globals *const ptr_to_globals, but struct globals itself is
NOT defined in libbb.h. You first define your own struct:

	struct globals { int a; char buf[1000]; };

and then access it through ptr_to_globals:

	#define G (*ptr_to_globals)

ptr_to_globals is declared as a constant pointer.
This helps gcc understand that it won't change, resulting in noticeably
smaller code. In order to assign it, use the PTR_TO_GLOBALS macro:

	PTR_TO_GLOBALS = xzalloc(sizeof(G));

Typically this is done in <applet>_main().

Now you can reference "globals" by G.a, G.buf and so on, in any function.
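
The pattern can be tried outside busybox with a self-contained sketch.
Everything here is a stand-in: demo_ptr_to_globals is an ordinary
non-const pointer rather than the real const ptr_to_globals, calloc()
replaces xzalloc(), and remember() and demo_main() are made-up names.

```c
#include <stdlib.h>
#include <string.h>

/* Applet-private definition of the struct. */
struct globals {
	int a;
	char buf[1000];
};

/* Stand-in for the shared pointer. Assumption: a plain pointer that
 * can be assigned directly, unlike the const one described above. */
struct globals *demo_ptr_to_globals;

#define G (*demo_ptr_to_globals)

/* Any function can reach the "globals" through G. */
void remember(const char *s)
{
	G.a++;
	/* buf was zeroed on allocation and the last byte is never written,
	 * so the string stays NUL-terminated. */
	strncpy(G.buf, s, sizeof(G.buf) - 1);
}

/* Plays the role of <applet>_main(): allocate first, then use G.
 * Returns how many strings were remembered, or -1 on failure. */
int demo_main(void)
{
	int count;

	demo_ptr_to_globals = calloc(1, sizeof(G));
	if (!demo_ptr_to_globals)
		return -1;
	remember("hello");
	remember("world");
	count = G.a;
	free(demo_ptr_to_globals);
	demo_ptr_to_globals = NULL;
	return count;
}
```

Note that sizeof(G) is safe to use before the pointer is assigned,
because sizeof does not evaluate its operand.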


		bb_common_bufsiz1

There is one big common buffer in bss - bb_common_bufsiz1. It is a much
earlier mechanism to reduce bss usage. Each applet can use it for
its own needs. Library functions are prohibited from using it.

The 'G.' trick can be done using bb_common_bufsiz1 instead of
a malloced buffer:

	#define G (*(struct globals*)&bb_common_bufsiz1)

Be careful, though, and use it only if
sizeof(struct globals) <= sizeof(bb_common_bufsiz1).
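
One way to make the size condition hard to forget is a compile-time
check. The sketch below is an assumption-laden illustration, not
busybox code: demo_common_buf and its 1024-byte size are stand-ins for
bb_common_bufsiz1, the struct fields are invented, and static_assert
requires a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_BUFSIZ = 1024 };	/* assumed size, for illustration only */

/* Stand-in for bb_common_bufsiz1: one shared buffer in bss,
 * aligned so that any struct can be overlaid on it. */
_Alignas(max_align_t) char demo_common_buf[DEMO_BUFSIZ];

struct globals {
	int dev_fd;
	unsigned sector;
	char name[64];
};

/* Refuse to compile if the struct ever outgrows the buffer. */
static_assert(sizeof(struct globals) <= sizeof(demo_common_buf),
		"struct globals does not fit into the common buffer");

/* Same cast as in the text. Like the original, this relies on the
 * compiler tolerating access to a char buffer through a struct
 * pointer. */
#define G (*(struct globals *)&demo_common_buf)

/* The buffer lives in bss, so G.sector starts out as 0. */
unsigned next_sector(void)
{
	return ++G.sector;
}
```

No allocation call is needed with this variant, but the zero-filled
state is only guaranteed for the first user of the buffer.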


		Drawbacks

You have to initialize the storage by hand. xzalloc() can be helpful in
clearing allocated storage to 0, but anything more must be done by hand.

All global variables are prefixed by 'G.' now. If this makes code
less readable, use #defines:

	#define dev_fd (G.dev_fd)
	#define sector (G.sector)
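
Both drawbacks show up in the following sketch. The names demo_globals,
init_demo_globals and advance_sector are hypothetical, calloc() stands
in for xzalloc(), and -1 as the "not open" value of dev_fd is an
assumption made for the example.

```c
#include <stdlib.h>

struct globals {
	int dev_fd;		/* wants -1, not 0, as its initial value */
	unsigned sector;
};

/* Stand-in for the shared pointer (a plain, assignable pointer here). */
struct globals *demo_globals;
#define G (*demo_globals)

/* Aliases so that existing code can keep its old variable names.
 * They must come after the struct definition, and from here on the
 * code has to write dev_fd, not G.dev_fd, since the latter would be
 * expanded a second time and no longer compile. */
#define dev_fd (G.dev_fd)
#define sector (G.sector)

/* Zeroing comes for free from calloc(); any other initial value
 * has to be assigned by hand. Returns 0 on success, -1 on failure. */
int init_demo_globals(void)
{
	demo_globals = calloc(1, sizeof(G));
	if (!demo_globals)
		return -1;
	dev_fd = -1;
	return 0;
}

unsigned advance_sector(void)
{
	return ++sector;
}
```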


		Word of caution

If an applet doesn't use much global data, converting it to use
one of the above methods is not worth the resulting code obfuscation.
If you have less than ~300 bytes of global data - don't bother.


		gcc's data alignment problem

The following attribute added in vi.c:

	static int tabstop;
	static struct termios term_orig __attribute__ ((aligned (4)));
	static struct termios term_vi __attribute__ ((aligned (4)));

reduced bss size by 32 bytes, because gcc sometimes aligns structures to
ridiculously large values. asm output diff for the above example:

	 tabstop:
	 	.zero	4
	 	.section	.bss.term_orig,"aw",@nobits
	-	.align 32
	+	.align 4
	 	.type	term_orig, @object
	 	.size	term_orig, 60
	 term_orig:
	 	.zero	60
	 	.section	.bss.term_vi,"aw",@nobits
	-	.align 32
	+	.align 4
	 	.type	term_vi, @object
	 	.size	term_vi, 60

gcc doesn't seem to have options for altering this behaviour.
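
To experiment with the attribute outside of vi.c, a stand-alone sketch
along these lines can be compiled with gcc -S and the .align
directives compared with and without the attribute. struct
demo_termios is a made-up stand-in for struct termios (60 bytes when
unsigned int is 4 bytes wide), and is_aligned_to is a hypothetical
helper. Whether gcc over-aligns the unattributed version depends on
the target and compiler version.

```c
#include <stdint.h>

/* Stand-in for struct termios: fifteen unsigned ints. */
struct demo_termios {
	unsigned int words[15];
};

/* Ask for 4-byte alignment explicitly, as the text does in vi.c. */
struct demo_termios demo_term_orig __attribute__ ((aligned (4)));
struct demo_termios demo_term_vi __attribute__ ((aligned (4)));

/* Returns 1 if p is a multiple of n, 0 otherwise. */
int is_aligned_to(const void *p, uintptr_t n)
{
	return ((uintptr_t)p % n) == 0;
}
```

The attribute only sets a minimum, so the run-time check can confirm
that the objects are at least 4-byte aligned. How much padding was
saved is visible only in the assembler output or in the size of
the bss section.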