	Keeping data small

When many applets are compiled into busybox, the rw data and bss
of all applets are concatenated, including those from libc
if a static busybox is built. When busybox is started, _all_ of this
data is allocated, not just the part belonging to the selected applet.

What exactly "allocated" means depends on the arch.
On NOMMU it probably bites the most, actually using real
RAM for rwdata and bss. On i386, bss is lazily allocated
by COWed zero pages. Not sure about rwdata - also COW?

In order to keep busybox friendly to NOMMU and small-mem systems,
we should avoid large global data in our applets, and should
minimize usage of libc functions which implicitly use
such structures.

A small experiment to measure "parasitic" bbox memory consumption:
here we start 1000 "busybox sleep 10" in parallel.
The busybox binary is practically an allyesconfig static one,
built against uclibc. Run on an x86-64 machine with a 64-bit kernel:

bash-3.2# nmeter '%t %c %m %p %[pn]'
23:17:28 .......... 168M    0  147
23:17:29 .......... 168M    0  147
23:17:30 U......... 168M    1  147
23:17:31 SU........ 181M  244  391
23:17:32 SSSSUUU... 223M  757 1147
23:17:33 UUU....... 223M    0 1147
23:17:34 U......... 223M    1 1147
23:17:35 .......... 223M    0 1147
23:17:36 .......... 223M    0 1147
23:17:37 S......... 223M    0 1147
23:17:38 .......... 223M    1 1147
23:17:39 .......... 223M    0 1147
23:17:40 .......... 223M    0 1147
23:17:41 .......... 210M    0  906
23:17:42 .......... 168M    1  147
23:17:43 .......... 168M    0  147

This requires 55M of memory (the %m column rises from 168M to 223M).
Thus 1 trivial busybox applet takes 55k of memory on a 64-bit x86 kernel.

On a 32-bit kernel we need ~26k per applet.

(Data from NOMMU arches are sought. Please provide 'size busybox' output too.)

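The per-applet figure is just the rise in the %m column divided by the
number of processes started. A minimal sketch of that arithmetic,
assuming 1M is counted as 1000k (which is what the rounding in the text
implies). per_process_kb() is a name made up for this illustration:

```c
#include <assert.h>

/* Per-process overhead in kilobytes, given memory use before and during
 * the run (both in megabytes) and the number of processes started.
 * Assumption: 1M is treated as 1000k, as the text's rounding suggests. */
long per_process_kb(long before_mb, long during_mb, long nprocs)
{
	assert(nprocs > 0);
	return (during_mb - before_mb) * 1000 / nprocs;
}
```

With the numbers from the run above, per_process_kb(168, 223, 1000)
gives 55.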

	Example 1

One example of how to reduce global data usage is in
archival/libunarchive/decompress_unzip.c:

/* This is somewhat complex-looking arrangement, but it allows
 * to place decompressor state either in bss or in
 * malloc'ed space simply by changing #defines below.
 * Sizes on i386:
 *   text    data     bss     dec     hex
 *   5256       0     108    5364    14f4 - bss
 *   4915       0       0    4915    1333 - malloc
 */
#define STATE_IN_BSS 0
#define STATE_IN_MALLOC 1

(see the rest of the file to get the idea)

This example completely eliminates globals in that module.
The required memory is allocated in unpack_gz_stream() [the module's
main function] and then passed down as a parameter to all subroutines
which need to access the 'globals'.

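The approach of Example 1 can be sketched as follows. This is not the
code from decompress_unzip.c: struct state, its members and both
function names are invented for illustration, and calloc() stands in
for busybox's xzalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state that would otherwise live in file-scope variables */
struct state {
	unsigned long sum;          /* running byte sum */
	size_t pos;                 /* next write position in window[] */
	unsigned char window[512];  /* scratch buffer, formerly a static array */
};

/* Subroutines get the state as an explicit parameter */
void account_byte(struct state *st, unsigned char byte)
{
	st->window[st->pos] = byte;
	st->pos = (st->pos + 1) % sizeof(st->window);
	st->sum += byte;
}

/* The entry point allocates the state, passes it down, and frees it */
unsigned long process_stream(const unsigned char *buf, size_t len)
{
	struct state *st;
	unsigned long result;
	size_t i;

	assert(buf != NULL || len == 0);
	st = calloc(1, sizeof(*st));
	if (!st)
		abort();
	for (i = 0; i < len; i++)
		account_byte(st, buf[i]);
	result = st->sum;
	free(st);
	return result;
}
```

Nothing here ends up in data or bss: the 512-byte window exists only
while process_stream() is running.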

	Example 2

In case you don't want to pass this additional parameter everywhere,
take a look at archival/gzip.c. Here all global data is replaced by
a single global pointer (ptr_to_globals) to allocated storage.

In order to not duplicate ptr_to_globals in every applet, you can
reuse a single common one. It is defined in libbb/messages.c
as struct globals *const ptr_to_globals, but the struct globals is
NOT defined in libbb.h. You first define your own struct:

struct globals { int a; char buf[1000]; };

and then declare that ptr_to_globals is a pointer to it:

#define G (*ptr_to_globals)

ptr_to_globals is declared as a constant pointer.
This helps gcc understand that it won't change, resulting in noticeably
smaller code. In order to assign it, use the PTR_TO_GLOBALS macro:

	PTR_TO_GLOBALS = xzalloc(sizeof(G));

Typically it is done in <applet>_main().

Now you can reference "globals" by G.a, G.buf and so on, in any function.

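A simplified, self-contained sketch of the same idea. Assumptions: the
pointer here is a plain file-local one rather than the shared const
pointer from libbb, so it is assigned directly instead of through
PTR_TO_GLOBALS, and calloc() stands in for xzalloc(). init_globals()
and bump_counter() are invented names.

```c
#include <assert.h>
#include <stdlib.h>

struct globals { int a; char buf[1000]; };

/* Local stand-in for the common ptr_to_globals (non-const in this sketch) */
static struct globals *ptr_to_globals;
#define G (*ptr_to_globals)

/* Would normally be done at the start of <applet>_main() */
void init_globals(void)
{
	ptr_to_globals = calloc(1, sizeof(G));
	if (!ptr_to_globals)
		abort();
}

/* Any function can now reach the "globals" through G */
int bump_counter(void)
{
	assert(ptr_to_globals != NULL);
	G.buf[0] = 'x';
	return ++G.a;
}
```

The only object left in bss is the pointer itself. The 1004 bytes of
struct globals are allocated only when the applet actually runs.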

	bb_common_bufsiz1

There is one big common buffer in bss - bb_common_bufsiz1. It is a much
earlier mechanism to reduce bss usage. Each applet can use it for
its needs. Library functions are prohibited from using it.

The 'G.' trick can be done using bb_common_bufsiz1 instead of a malloced buffer:

#define G (*(struct globals*)&bb_common_bufsiz1)

Be careful, though, and use it only if globals fit into bb_common_bufsiz1.
Since bb_common_bufsiz1 is BUFSIZ + 1 bytes long and BUFSIZ can change
from one libc to another, you have to add a compile-time check for it:

if (sizeof(struct globals) > sizeof(bb_common_bufsiz1))
	BUG_<applet>_globals_too_big();

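A self-contained sketch of the buffer variant. It swaps in C11
_Static_assert for the undefined-function trick shown above (a
different technique with the same goal), and uses a local common_buf
as a stand-in for bb_common_bufsiz1. The struct members and
next_sector() are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>	/* BUFSIZ */

/* Local stand-in for bb_common_bufsiz1, aligned for any object type */
static _Alignas(max_align_t) char common_buf[BUFSIZ + 1];

struct globals { int dev_fd; long sector; char line[200]; };

/* Note: ISO C does not bless accessing a char array through a struct
 * pointer. This sketch relies on gcc-style behaviour (for example,
 * building with -fno-strict-aliasing). */
#define G (*(struct globals *)&common_buf)

/* Build fails here if the struct outgrows the buffer */
_Static_assert(sizeof(struct globals) <= sizeof(common_buf),
	"globals do not fit into the common buffer");

long next_sector(void)
{
	return ++G.sector;
}
```

Because the buffer lives in bss it starts out zeroed, so the first call
to next_sector() returns 1 without any explicit initialization.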

	Drawbacks

You have to initialize the globals by hand. xzalloc() can be helpful in
clearing allocated storage to 0, but anything more must be done by hand.

All global variables are prefixed by 'G.' now. If this makes code
less readable, use #defines:

#define dev_fd (G.dev_fd)
#define sector (G.sector)

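Both points can be seen in one small sketch. Assumptions: the pointer
is a simplified local stand-in, calloc() replaces xzalloc(), and
INIT_G(), sector_size and advance() are names made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

struct globals {
	int dev_fd;
	unsigned sector;
	unsigned sector_size;
};
static struct globals *ptr_to_globals;	/* simplified local stand-in */
#define G (*ptr_to_globals)

/* Aliases, so that existing code keeps its old variable names */
#define dev_fd (G.dev_fd)
#define sector (G.sector)

/* Zeroing comes for free from calloc(); non-zero initial values
 * (formerly "static int dev_fd = -1;") must be set by hand. */
#define INIT_G() do { \
	ptr_to_globals = calloc(1, sizeof(G)); \
	if (!ptr_to_globals) \
		abort(); \
	dev_fd = -1; \
	G.sector_size = 512; \
} while (0)

unsigned advance(unsigned n)
{
	assert(ptr_to_globals != NULL);
	sector += n;	/* expands to (G.sector) += n */
	return sector;
}
```

Once an alias is defined, use the bare name only: writing G.dev_fd
after the #define would expand the member name a second time and fail
to compile.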

	Word of caution

If an applet doesn't use much global data, converting it to use
one of the above methods is not worth the resulting code obfuscation.
If you have less than ~300 bytes of global data - don't bother.


	gcc's data alignment problem

The following attribute, added in vi.c:

static int tabstop;
static struct termios term_orig __attribute__ ((aligned (4)));
static struct termios term_vi __attribute__ ((aligned (4)));

reduces bss size by 32 bytes, because gcc sometimes aligns structures to
ridiculously large values. asm output diff for the above example:

 tabstop:
 	.zero	4
 	.section	.bss.term_orig,"aw",@nobits
-	.align 32
+	.align 4
 	.type	term_orig, @object
 	.size	term_orig, 60
 term_orig:
 	.zero	60
 	.section	.bss.term_vi,"aw",@nobits
-	.align 32
+	.align 4
 	.type	term_vi, @object
 	.size	term_vi, 60

gcc doesn't seem to have options for altering this behaviour.

Tested with gcc 3.4.3 and 4.1.1:
char c = 1;
// gcc aligns to 32 bytes if sizeof(struct) >= 32
struct {
	int a,b,c,d;
	int i1,i2,i3;
} s28 = { 1 };    // struct will be aligned to 4 bytes
struct {
	int a,b,c,d;
	int i1,i2,i3,i4;
} s32 = { 1 };    // struct will be aligned to 32 bytes
// same for arrays
char vc31[31] = { 1 }; // unaligned
char vc32[32] = { 1 }; // aligned to 32 bytes

-fpack-struct=1 reduces the alignment of s28 to 1 (but will probably
break the layout of many libc structs), while s32 and vc32
are still aligned to 32 bytes.
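
To see what a particular toolchain does with such objects, the
addresses can be inspected at run time. A minimal sketch, assuming a
gcc-compatible compiler for the attribute syntax. struct s32_like, both
variables and address_alignment() are names invented here, and the
results will differ between compilers and targets:

```c
#include <assert.h>
#include <stdint.h>

/* 32 bytes where int is 4 bytes wide, like s32 above */
struct s32_like { int a, b, c, d; int i1, i2, i3, i4; };

struct s32_like dflt_aligned = { 1 };
struct s32_like forced_aligned __attribute__ ((aligned (4))) = { 1 };

/* Largest power of two that divides the object's address */
uintptr_t address_alignment(const void *p)
{
	uintptr_t addr = (uintptr_t)p;

	assert(addr != 0);
	return addr & (~addr + 1);	/* isolate the lowest set bit */
}
```

Comparing address_alignment(&dflt_aligned) with
address_alignment(&forced_aligned) gives only a lower bound on what the
compiler requested, since an object can land on a more aligned address
by accident. The .align directives in the asm output remain the
authoritative check.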

I will try to cook up a patch to add a gcc option for disabling it.
Meanwhile, this is where it can be disabled in gcc source:

gcc/config/i386/i386.c
int
ix86_data_alignment (tree type, int align)
{
#if 0
  if (AGGREGATE_TYPE_P (type)
       && TYPE_SIZE (type)
       && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
       && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 256
           || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 256)
    return 256;
#endif

Result (non-static busybox built against glibc):

# size /usr/srcdevel/bbox/fix/busybox.t0/busybox busybox
   text    data     bss     dec     hex filename
 634416    2736   23856  661008   a1610 busybox
 632580    2672   22944  658196   a0b14 busybox_noalign