perfmon plugin: 2-way parallel stat collection

As a FUD reduction measure, this patch implements 2-way parallel
counter collection. Synthetic stat component counter pairs run at the
same time. Running two counters (of any kind) at the same time
naturally reduces the aggregate time required by an approximate
factor-of-2, depending on whether an even or odd number of stats have
been requested.

I don't completely buy the argument that computing synthetic stats
such as instructions-per-clock will be inaccurate if component counter
values are collected sequentially. Given uniform traffic pattern, it
must make no difference.

As the collection interval increases, the difference between serial
and parallel component counter collection will approach zero, see also
the Central Limit theorem.

Change-Id: I36ebdcf125e8882cca8a1929ec58f17fba1ad8f1
Signed-off-by: Dave Barach <dave@barachs.net>
diff --git a/src/plugins/perfmon/perfmon.h b/src/plugins/perfmon/perfmon.h
index 47ee471..9663dae 100644
--- a/src/plugins/perfmon/perfmon.h
+++ b/src/plugins/perfmon/perfmon.h
@@ -97,8 +97,11 @@
   perfmon_cpuid_and_table_t *perfmon_tables;
   uword *perfmon_table;
 
-  /* vector of events to collect */
-  perfmon_event_config_t *events_to_collect;
+  /* vector of single events to collect */
+  perfmon_event_config_t *single_events_to_collect;
+
+  /* vector of paired events to collect */
+  perfmon_event_config_t *paired_events_to_collect;
 
   /* Base indices of synthetic event tuples */
   u32 ipc_event_index;
@@ -109,13 +112,14 @@
 
   /* Current event (index) being collected */
   u32 current_event;
-  u32 *rdpmc_indices;
+  int n_active;
+  u32 **rdpmc_indices;
   /* mmap base / size of (mapped) struct perf_event_mmap_page */
-  u8 **perf_event_pages;
+  u8 ***perf_event_pages;
   u32 page_size;
 
   /* Current perf_event file descriptors, per thread */
-  int *pm_fds;
+  int **pm_fds;
 
   /* Logging */
   vlib_log_class_t log_class;