.. Dave Barach, commit f46663c, 2018-08-10 11:05:52 -0400

Multi-Architecture Graph Node Cookbook
======================================

Graph node dispatch functions can use the vpp multi-architecture
support with little effort. The point of the scheme
is simple: for performance-critical nodes, generate multiple CPU
hardware-dependent versions of the node dispatch functions, and pick
the best one at runtime.

The details matter, though.

100,000-foot view
-----------------

We compile entire graph node dispatch function implementation files
multiple times. These compilations give rise to multiple versions of
the graph node dispatch functions. Per-node constructor functions
interrogate CPU hardware, select the node dispatch function variant to
use, and set the vlib_node_registration_t ".function" member to the
address of the selected variant.
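
The selection step described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not vpp's implementation: every `demo_` name is hypothetical, the three variant functions are stubs that would really come from compiling one source file with different CPU flags, and the CPU probe is a stub that pretends AVX2 is available and AVX-512 is not. The constructor attribute is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names are vpp's. */
typedef unsigned long demo_uword;
typedef demo_uword (demo_node_fn_t) (int n_packets);

/* Ordered from least to most capable, so a larger value wins. */
typedef enum
{
  DEMO_MARCH_DEFAULT = 0,
  DEMO_MARCH_AVX2,
  DEMO_MARCH_AVX512,
} demo_march_t;

typedef struct
{
  const char *name;
  demo_node_fn_t *function;     /* left unset here, chosen at startup */
  demo_march_t selected;        /* variant chosen so far */
} demo_node_registration_t;

/* In a real build each variant would be the same source file compiled
   with different CPU flags. Three stubs stand in for them here. */
static demo_uword demo_node_fn (int n) { return (demo_uword) n; }
static demo_uword demo_node_fn_avx2 (int n) { return (demo_uword) n; }
static demo_uword demo_node_fn_avx512 (int n) { return (demo_uword) n; }

/* Stub CPU probe: pretends the CPU has AVX2 but not AVX-512 so the
   result is deterministic. Real code would query the hardware. */
static int
demo_cpu_supports (demo_march_t march)
{
  return march == DEMO_MARCH_DEFAULT || march == DEMO_MARCH_AVX2;
}

static demo_node_registration_t demo_node = {
  .name = "demo-node",
  /* .function deliberately not set */
};

/* Keep the most capable variant that the CPU can actually run. */
static void
demo_offer_variant (demo_node_registration_t * r, demo_march_t march,
                    demo_node_fn_t * fn)
{
  if (!demo_cpu_supports (march))
    return;
  if (r->function == NULL || march > r->selected)
    {
      r->function = fn;
      r->selected = march;
    }
}

/* Runs before main(): the per-node selection step. */
static void __attribute__ ((constructor))
demo_node_multiarch_select (void)
{
  demo_offer_variant (&demo_node, DEMO_MARCH_DEFAULT, demo_node_fn);
  demo_offer_variant (&demo_node, DEMO_MARCH_AVX2, demo_node_fn_avx2);
  demo_offer_variant (&demo_node, DEMO_MARCH_AVX512, demo_node_fn_avx512);
}

static demo_uword
demo_node_dispatch (int n_packets)
{
  assert (demo_node.function != NULL);
  return demo_node.function (n_packets);
}
```

By the time `main()` runs, `demo_node.function` already points at the AVX2 stub, so callers of `demo_node_dispatch()` never need to know which variant was picked.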

Details
-------

Declare the node dispatch function as shown, using the VLIB\_NODE\_FN macro. The
name passed to the macro **MUST** match the name of the graph node
registration, ip4_sdp_node in this example.

::

    VLIB_NODE_FN (ip4_sdp_node) (vlib_main_t * vm, vlib_node_runtime_t * node,
                                 vlib_frame_t * frame)
    {
      if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE))
        return ip46_sdp_inline (vm, node, frame, 1 /* is_ip4 */ ,
                                1 /* is_trace */ );
      else
        return ip46_sdp_inline (vm, node, frame, 1 /* is_ip4 */ ,
                                0 /* is_trace */ );
    }
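
A likely reason for the naming rule is that the macros build their symbol names from the identifier they are given. The sketch below shows that mechanism with hypothetical `DEMO_` macros. It is an assumption-laden illustration, not vpp's definition of VLIB_NODE_FN: the variant suffix, the `demo_variant_t` record, and the helper are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macros for illustration; these are not vpp's definitions. */
#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_ (x)
#define DEMO_PASTE3_(a, b, c) a##b##c
#define DEMO_PASTE3(a, b, c) DEMO_PASTE3_ (a, b, c)

/* Pretend this file is being compiled as the AVX2 variant. A real build
   would set this on the command line, e.g. -DDEMO_MARCH_VARIANT=avx2. */
#define DEMO_MARCH_VARIANT avx2

#ifdef DEMO_MARCH_VARIANT
#define DEMO_MARCH_SFX(sym) DEMO_PASTE3 (sym, _, DEMO_MARCH_VARIANT)
#else
#define DEMO_MARCH_SFX(sym) sym
#endif

typedef unsigned long (demo_node_fn_t) (int n_packets);

typedef struct
{
  const char *fn_symbol;        /* name the macro generated */
  demo_node_fn_t *function;     /* address of this variant */
} demo_variant_t;

/* Everything is derived from the single identifier "node": the function
   symbol, the per-variant record, and the function definition header. */
#define DEMO_NODE_FN(node)                                      \
  demo_node_fn_t DEMO_MARCH_SFX (node##_fn);                    \
  demo_variant_t DEMO_MARCH_SFX (node##_variant) = {            \
    .fn_symbol = DEMO_STR (DEMO_MARCH_SFX (node##_fn)),         \
    .function = DEMO_MARCH_SFX (node##_fn),                     \
  };                                                            \
  unsigned long DEMO_MARCH_SFX (node##_fn)

/* Expands to a definition of ip4_demo_node_fn_avx2. */
DEMO_NODE_FN (ip4_demo_node) (int n_packets)
{
  return (unsigned long) n_packets;
}

/* Returns 1 when the variant record carries the expected generated name. */
static int
demo_variant_is (const demo_variant_t * v, const char *symbol)
{
  assert (v->function != NULL);
  return strcmp (v->fn_symbol, symbol) == 0;
}
```

With this shape, code that knows only the node identifier can reconstruct the names of every variant. If the dispatch function were declared under a different identifier, those derived names would no longer line up.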
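
We need to generate *precisely one copy* of the
vlib_node_registration_t, error strings, and packet trace decode function.
The file is compiled once per variant, and the variant builds define
CLIB_MARCH_VARIANT, so anything left unguarded is emitted again in each
of those compilations.

Simply bracket these items with "#ifndef CLIB_MARCH_VARIANT...#endif":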

::

    #ifndef CLIB_MARCH_VARIANT
    static u8 *
    format_sdp_trace (u8 * s, va_list * args)
    {
       <snip>
    }
    #endif

    ...

    #ifndef CLIB_MARCH_VARIANT
    static char *sdp_error_strings[] = {
    #define _(sym,string) string,
      foreach_sdp_error
    #undef _
    };
    #endif

    ...

    #ifndef CLIB_MARCH_VARIANT
    VLIB_REGISTER_NODE (ip4_sdp_node) =
    {
      // DO NOT set the .function structure member.
      // The multiarch selection __attribute__((constructor)) function
      // takes care of it at runtime
      .name = "ip4-sdp",
      .vector_size = sizeof (u32),
      .format_trace = format_sdp_trace,
      .type = VLIB_NODE_TYPE_INTERNAL,

      .n_errors = ARRAY_LEN(sdp_error_strings),
      .error_strings = sdp_error_strings,

      .n_next_nodes = SDP_N_NEXT,

      /* edit / add dispositions here */
      .next_nodes =
      {
        [SDP_NEXT_DROP] = "ip4-drop",
      },
    };
    #endif

To belabor the point: *do not* set the ".function" member! That's the job of the multi-arch
selection \_\_attribute\_\_((constructor)) function.

Always inline node dispatch functions
-------------------------------------

It's typical for a graph dispatch function to contain one or more
calls to an inline function, as in the ip4_sdp_node example above. If
your node dispatch function is structured that way, make
*ABSOLUTELY CERTAIN* to use the "always_inline" macro:

::

    always_inline uword
    ip46_sdp_inline (vlib_main_t * vm, vlib_node_runtime_t * node,
                     vlib_frame_t * frame,
                     int is_ip4, int is_trace)
    { ... }

Otherwise, the compiler is highly likely NOT to build multiple
versions of the guts of your dispatch function.

It's fairly easy to spot this mistake in "perf top". If you see, for
example, a bunch of functions with names of the form
"xxx_node_fn_avx2" in the profile, *BUT* your brand-new node function
shows up with a name of the form "xxx_inline.isra.1", it's quite likely
that the inline was declared "static inline" instead of "always_inline".
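
The pattern can be tried outside vpp with a small stand-alone sketch. Everything here is an assumption made for illustration: `demo_always_inline` is a local macro using the GCC/Clang always-inline attribute, the worker counts packets by the IP version nibble of their first byte, and a global counter stands in for packet tracing. Because the worker is forced inline and called with constant `is_ip4` and `is_trace` arguments, the compiler can specialize its body inside each caller instead of emitting one shared out-of-line function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed GCC/Clang spelling, defined locally under a demo name so the
   sketch does not depend on vpp headers. */
#define demo_always_inline static inline __attribute__ ((__always_inline__))

static unsigned long demo_trace_count;  /* stands in for packet tracing */

/* Shared worker: counts packets whose version nibble matches, and bumps
   the trace counter for each match when is_trace is nonzero. */
demo_always_inline unsigned long
demo_ip46_inline (const unsigned char *first_bytes, int n_packets,
                  int is_ip4, int is_trace)
{
  unsigned long n_matched = 0;
  int want = is_ip4 ? 4 : 6;
  int i;

  assert (first_bytes != NULL || n_packets == 0);
  for (i = 0; i < n_packets; i++)
    {
      if ((first_bytes[i] >> 4) != want)
        continue;
      n_matched++;
      if (is_trace)
        demo_trace_count++;
    }
  return n_matched;
}

/* Thin wrappers pass compile-time constants, one call per combination. */
static unsigned long
demo_ip4_node_fn (const unsigned char *first_bytes, int n_packets,
                  int trace_on)
{
  if (trace_on)
    return demo_ip46_inline (first_bytes, n_packets, 1 /* is_ip4 */ ,
                             1 /* is_trace */ );
  else
    return demo_ip46_inline (first_bytes, n_packets, 1 /* is_ip4 */ ,
                             0 /* is_trace */ );
}

static unsigned long
demo_ip6_node_fn (const unsigned char *first_bytes, int n_packets)
{
  return demo_ip46_inline (first_bytes, n_packets, 0 /* is_ip4 */ ,
                           0 /* is_trace */ );
}
```

Replacing `demo_always_inline` with plain `static inline` leaves the decision to the compiler, which is the situation the "xxx_inline.isra.1" symptom points to.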

Add the required Makefile.am content
------------------------------------

If the component in question already sets a "multiversioning_sources"
variable, simply add the .c file that contains the node dispatch
function to the list. If not, add the required boilerplate:

::

    if CPU_X86_64
    sdp_multiversioning_sources = \
            sdp/node.c \
            sdp/sdp_slookup.c

    if CC_SUPPORTS_AVX2
    ###############################################################
    # AVX2
    ###############################################################
    libsdp_plugin_avx2_la_SOURCES = $(sdp_multiversioning_sources)
    libsdp_plugin_avx2_la_CFLAGS = \
            $(AM_CFLAGS) @CPU_AVX2_FLAGS@ \
            -DCLIB_MARCH_VARIANT=avx2
    noinst_LTLIBRARIES += libsdp_plugin_avx2.la
    sdp_plugin_la_LIBADD += libsdp_plugin_avx2.la
    endif

    if CC_SUPPORTS_AVX512
    ###############################################################
    # AVX512
    ###############################################################
    libsdp_plugin_avx512_la_SOURCES = $(sdp_multiversioning_sources)
    libsdp_plugin_avx512_la_CFLAGS = \
            $(AM_CFLAGS) @CPU_AVX512_FLAGS@ \
            -DCLIB_MARCH_VARIANT=avx512
    noinst_LTLIBRARIES += libsdp_plugin_avx512.la
    sdp_plugin_la_LIBADD += libsdp_plugin_avx512.la
    endif
    endif

A certain amount of cut-paste-modify is currently required. Hopefully
we'll manage to improve the scheme in the future.