        Keeping data small

When many applets are compiled into busybox, all rw data and
bss for each applet are concatenated, including those from libc
if a static busybox is built. When busybox is started, _all_ this data
is allocated, not just the part for the selected applet.

What "allocated" means exactly depends on the arch.
On NOMMU it probably bites the most, actually using real
RAM for rwdata and bss. On i386, bss is lazily allocated
by COWed zero pages. Not sure about rwdata - also COW?

In order to keep busybox friendly to NOMMU and small-mem systems,
we should avoid large global data in our applets, and should
minimize usage of libc functions which implicitly use
such structures.

Small experiment to measure "parasitic" bbox memory consumption:
here we start 1000 "busybox sleep 10" in parallel.
The busybox binary is practically an allyesconfig static one,
built against uclibc. Run on an x86-64 machine with a 64-bit kernel:

bash-3.2# nmeter '%t %c %m %p %[pn]'
23:17:28 .......... 168M    0  147
23:17:29 .......... 168M    0  147
23:17:30 U......... 168M    1  147
23:17:31 SU........ 181M  244  391
23:17:32 SSSSUUU... 223M  757 1147
23:17:33 UUU....... 223M    0 1147
23:17:34 U......... 223M    1 1147
23:17:35 .......... 223M    0 1147
23:17:36 .......... 223M    0 1147
23:17:37 S......... 223M    0 1147
23:17:38 .......... 223M    1 1147
23:17:39 .......... 223M    0 1147
23:17:40 .......... 223M    0 1147
23:17:41 .......... 210M    0  906
23:17:42 .......... 168M    1  147
23:17:43 .......... 168M    0  147

This requires 55M of memory (223M - 168M). Thus one trivial busybox applet
takes 55k of memory on a 64-bit x86 kernel.

On a 32-bit kernel we need ~26k per applet.

Script:

i=1000; while test $i != 0; do
    echo -n .
    busybox sleep 30 &
    i=$((i - 1))
done
echo
wait

(Data from NOMMU arches are sought. Provide 'size busybox' output too)


        Example 1

One example of how to reduce global data usage is in
archival/libarchive/decompress_unzip.c:

/* This is somewhat complex-looking arrangement, but it allows
 * to place decompressor state either in bss or in
 * malloc'ed space simply by changing #defines below.
 * Sizes on i386:
 * text    data     bss     dec     hex
 * 5256       0     108    5364    14f4 - bss
 * 4915       0       0    4915    1333 - malloc
 */
#define STATE_IN_BSS 0
#define STATE_IN_MALLOC 1

(see the rest of the file to get the idea)

This example completely eliminates globals in that module.
The required memory is allocated in unpack_gz_stream() [the module's
entry point] and then passed down as a parameter to all subroutines
which need to access 'globals'.

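The parameter-passing approach can be sketched as follows. This is a
minimal illustration, not the code from decompress_unzip.c: struct state,
emit_byte() and unpack_demo() are hypothetical names, and plain calloc()
stands in for busybox's own allocation helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical decompressor state; the real struct in
 * decompress_unzip.c is different and much larger. */
struct state {
    unsigned bytes_out;          /* running count of produced bytes */
    unsigned char window[64];    /* small ring buffer of recent output */
};

/* Subroutine: gets the state as an explicit parameter
 * instead of touching file-scope globals. */
static void emit_byte(struct state *st, unsigned char c)
{
    assert(st != NULL);
    st->window[st->bytes_out % sizeof(st->window)] = c;
    st->bytes_out++;
}

/* Entry point: allocates the state, passes it down, frees it.
 * Returns the number of bytes emitted, or -1 if allocation fails. */
long unpack_demo(const unsigned char *src, unsigned len)
{
    struct state *st = calloc(1, sizeof(*st));
    long result;
    unsigned i;

    if (!st)
        return -1;
    for (i = 0; i < len; i++)
        emit_byte(st, src[i]);
    result = (long)st->bytes_out;
    free(st);
    return result;
}
```

With this layout the module contributes nothing to data or bss; the state
exists only while unpack_demo() is running.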

        Example 2

In case you don't want to pass this additional parameter everywhere,
take a look at archival/gzip.c. Here all global data is replaced by
a single global pointer (ptr_to_globals) to allocated storage.

In order to not duplicate ptr_to_globals in every applet, you can
reuse a single common one. It is defined in libbb/messages.c
as struct globals *const ptr_to_globals, but struct globals is
NOT defined in libbb.h. You first define your own struct:

struct globals { int a; char buf[1000]; };

and then declare that ptr_to_globals is a pointer to it:

#define G (*ptr_to_globals)

ptr_to_globals is declared as a constant pointer.
This helps gcc understand that it won't change, resulting in noticeably
smaller code. In order to assign it, use the SET_PTR_TO_GLOBALS macro:

    SET_PTR_TO_GLOBALS(xzalloc(sizeof(G)));

Typically it is done in <applet>_main(). Another variation is
to use the stack:

int <applet>_main(...)
{
#undef G
    struct globals G;
    memset(&G, 0, sizeof(G));
    SET_PTR_TO_GLOBALS(&G);

Now you can reference "globals" by G.a, G.buf and so on, in any function.

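The pieces of this example can be put together in a self-contained sketch.
It assumes a local, non-const ptr_to_globals and a simplified
SET_PTR_TO_GLOBALS macro as stand-ins for the real libbb definitions;
helper() and demo_main() are hypothetical names, and calloc() replaces
xzalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Applet-private "globals", as in the text above. */
struct globals { int a; char buf[1000]; };

/* Stand-in for the shared pointer that libbb provides. It is left
 * non-const here only to keep the sketch simple and portable. */
static struct globals *ptr_to_globals;

#define G (*ptr_to_globals)
/* Simplified stand-in for the real SET_PTR_TO_GLOBALS macro. */
#define SET_PTR_TO_GLOBALS(x) (ptr_to_globals = (x))

/* Any function can reach the "globals" through G. */
static void helper(void)
{
    G.a++;
    strcpy(G.buf, "hello");
}

/* Plays the role of <applet>_main(); returns the final value of G.a. */
int demo_main(void)
{
    int a;

    SET_PTR_TO_GLOBALS(calloc(1, sizeof(G)));
    assert(ptr_to_globals != NULL);
    helper();
    helper();
    a = G.a;
    free(ptr_to_globals);
    ptr_to_globals = NULL;
    return a;
}
```

The only object left in bss is the pointer itself; the 1000-byte buffer
is allocated only when the applet actually runs.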

        bb_common_bufsiz1

There is one big common buffer in bss - bb_common_bufsiz1. It is a much
earlier mechanism to reduce bss usage. Each applet can use it for
its needs. Library functions are prohibited from using it.

The 'G.' trick can be done using bb_common_bufsiz1 instead of
a malloced buffer:

#define G (*(struct globals*)&bb_common_bufsiz1)

Be careful, though, and use it only if globals fit into bb_common_bufsiz1.
Since bb_common_bufsiz1 is BUFSIZ + 1 bytes long and BUFSIZ can change
from one libc to another, you have to add a compile-time check for it:

if (sizeof(struct globals) > sizeof(bb_common_bufsiz1))
    BUG_<applet>_globals_too_big();

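Where a C11 compiler is available, the same size check could also be
written with static_assert. The sketch below is an illustration under
assumptions rather than busybox code: the buffer is declared locally as
a stand-in for the libbb one, the struct is a small example, and
bump_counter() is a hypothetical name.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdio.h>    /* BUFSIZ */

/* Local stand-in for the shared buffer; aligned so that it can
 * safely hold a struct. */
static _Alignas(long long) char bb_common_bufsiz1[BUFSIZ + 1];

struct globals { int a; char buf[100]; };

#define G (*(struct globals *)&bb_common_bufsiz1)

/* Swapped-in C11 technique: compilation stops with this message
 * if the struct ever outgrows the buffer. */
static_assert(sizeof(struct globals) <= sizeof(bb_common_bufsiz1),
              "struct globals does not fit into bb_common_bufsiz1");

/* Uses the buffer-backed "globals"; returns the incremented counter. */
int bump_counter(void)
{
    return ++G.a;
}
```

The BUG_<applet>_globals_too_big() form reaches a similar goal by calling
a function that is declared but never defined: when the condition is
constant-false the compiler can drop the call, otherwise the build fails
at link time.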

        Drawbacks

You have to initialize the storage by hand. xzalloc() can be helpful in
clearing allocated storage to 0, but anything more must be done by hand.

All global variables are prefixed by 'G.' now. If this makes code
less readable, use #defines:

#define dev_fd (G.dev_fd)
#define sector (G.sector)


        Finding non-shared duplicated strings

strings busybox | sort | uniq -c | sort -nr


        gcc's data alignment problem

The following attribute added in vi.c:

static int tabstop;
static struct termios term_orig __attribute__ ((aligned (4)));
static struct termios term_vi __attribute__ ((aligned (4)));

reduces bss size by 32 bytes, because gcc sometimes aligns structures to
ridiculously large values. The asm output diff for the above example:

 tabstop:
        .zero   4
        .section        .bss.term_orig,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_orig, @object
        .size   term_orig, 60
 term_orig:
        .zero   60
        .section        .bss.term_vi,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_vi, @object
        .size   term_vi, 60

gcc doesn't seem to have options for altering this behaviour.

gcc 3.4.3 and 4.1.1 tested:
char c = 1;
// gcc aligns to 32 bytes if sizeof(struct) >= 32
struct {
    int a,b,c,d;
    int i1,i2,i3;
} s28 = { 1 };    // struct will be aligned to 4 bytes
struct {
    int a,b,c,d;
    int i1,i2,i3,i4;
} s32 = { 1 };    // struct will be aligned to 32 bytes
// same for arrays
char vc31[31] = { 1 }; // unaligned
char vc32[32] = { 1 }; // aligned to 32 bytes

-fpack-struct=1 reduces alignment of s28 to 1 (but will probably
break the layout of many libc structs), but s32 and vc32
are still aligned to 32 bytes.

I will try to cook up a patch to add a gcc option for disabling it.
Meanwhile, this is where it can be disabled in gcc source:

gcc/config/i386/i386.c
int
ix86_data_alignment (tree type, int align)
{
#if 0
  if (AGGREGATE_TYPE_P (type)
       && TYPE_SIZE (type)
       && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
       && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 256
           || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 256)
    return 256;
#endif

Result (non-static busybox built against glibc):

# size /usr/srcdevel/bbox/fix/busybox.t0/busybox busybox
   text    data     bss     dec     hex filename
 634416    2736   23856  661008   a1610 busybox
 632580    2672   22944  658196   a0b14 busybox_noalign



        Keeping code small

Use scripts/bloat-o-meter to check whether the introduced changes
generated unnecessary bloat. This script needs unstripped binaries
to generate a detailed report. To automate this, just use
"make bloatcheck". It requires a busybox_old binary to be present;
use "make baseline" to generate it from unmodified source, or
copy busybox_unstripped to busybox_old before modifying sources
and rebuilding.

Set CONFIG_EXTRA_CFLAGS="-fno-inline-functions-called-once",
run "make bloatcheck", and see the biggest auto-inlined functions.
Now, set CONFIG_EXTRA_CFLAGS back to "", but add NOINLINE
to some of these functions. In the 1.16.x timeframe, the results were
(annotated "make bloatcheck" output):

function                old    new   delta
expand_vars_to_list       -   1712   +1712  win
lzo1x_optimize            -   1429   +1429  win
arith_apply               -   1326   +1326  win
read_interfaces           -   1163   +1163  loss, leave w/o NOINLINE
logdir_open               -   1148   +1148  win
check_deps                -   1148   +1148  loss
rewrite                   -   1039   +1039  win
run_pipe                358   1396   +1038  win
write_status_file         -   1029   +1029  almost the same, leave w/o NOINLINE
dump_identity             -    987    +987  win
mainQSort3                -    921    +921  win
parse_one_line            -    916    +916  loss
summarize                 -    897    +897  almost the same
do_shm                    -    884    +884  win
cpio_o                    -    863    +863  win
subCommand                -    841    +841  loss
receive                   -    834    +834  loss

855 bytes saved in total.

scripts/mkdiff_obj_bloat may be useful to automate this process: run
"scripts/mkdiff_obj_bloat NORMALLY_BUILT_TREE FORCED_NOINLINE_TREE"
and select modules which shrank.