Capturing VPP core dumps
========================

To debug a VPP crash, you must provide a core dump file, which allows
the crash to be backtraced. The following steps are required to capture
a core dump:

1. Disable k8s Probes to Prevent k8s from Restarting the POD with a Crashed VPP
-------------------------------------------------------------------------------

Follow the steps described in
`BUG_REPORTS.md <BUG_REPORTS.html#collecting-the-logs-in-case-of-crash-loop>`__.

2. Modify VPP Startup config file
---------------------------------

In ``/etc/vpp/contiv-vswitch.conf``, add the following lines into the
``unix`` section:

::

   unix {
       ...
       coredump-size unlimited
       full-coredump
   }

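Before redeploying, it can help to confirm that both options are
present in the file. The following is a minimal sketch, not part of VPP
or Contiv: ``has_coredump_options`` is a hypothetical helper name, and
it only checks that the two lines exist somewhere in the file, not that
they sit inside the ``unix`` section.

```shell
# Hypothetical helper: succeed only if both core dump options appear in
# the given startup config (defaults to the path used in this guide).
# Assumption: each option is on its own line, optionally indented.
has_coredump_options() (
    conf="${1:-/etc/vpp/contiv-vswitch.conf}"
    grep -Eq '^[[:space:]]*coredump-size[[:space:]]+unlimited' "$conf" &&
    grep -Eq '^[[:space:]]*full-coredump' "$conf"
)

# Usage:
#   has_coredump_options && echo "core dump options found"
```
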
3. Turn on Coredumps in the Vswitch Container
---------------------------------------------

After redeploying Contiv-VPP networking, open a bash shell in the
vswitch container. Use the actual name of your vswitch pod
(``contiv-vswitch-7whk7`` in this example):

::

   kubectl exec -it contiv-vswitch-7whk7 -n kube-system -c contiv-vswitch -- bash

Enable coredumps:

::

   mkdir -p /tmp/dumps
   sysctl -w debug.exception-trace=1
   sysctl -w kernel.core_pattern="/tmp/dumps/%e-%t"
   ulimit -c unlimited
   echo 2 > /proc/sys/fs/suid_dumpable

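To check that the settings took effect, you can read them back from the
same shell. This is a small sketch; ``show_core_settings`` is a
hypothetical helper name. After the commands above, the expected values
are ``/tmp/dumps/%e-%t``, ``2`` and ``unlimited``. Note that
``ulimit -c`` is a per-process limit inherited by child processes, so
the value printed is the one for the shell in which the helper runs.

```shell
# Hypothetical helper: print the core dump settings visible from the
# current shell.
show_core_settings() {
    echo "core_pattern:  $(cat /proc/sys/kernel/core_pattern)"
    echo "suid_dumpable: $(cat /proc/sys/fs/suid_dumpable)"
    echo "core ulimit:   $(ulimit -c)"
}

show_core_settings
```
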
4. Let VPP Crash
----------------

Now repeat the steps that lead to the VPP crash. You can also force VPP
to dump core at the point where it is currently running (e.g., if it is
stuck) by sending it the SIGQUIT signal:

::

   kill -3 $(pidof vpp)

5. Locate and Inspect the Core File
-----------------------------------

The core file should appear in ``/tmp/dumps`` in the container:

::

   cd /tmp/dumps
   ls
   vpp_main-1524124440

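The file name follows the ``%e-%t`` pattern configured in step 3:
``%e`` expands to the name of the crashing process or thread and ``%t``
to the time of the dump in seconds since the Unix epoch. As a small
sketch, assuming GNU ``date`` is available, the timestamp can be decoded
to confirm that a core file belongs to the crash you just reproduced:

```shell
core_file="vpp_main-1524124440"   # example name from the listing above
dump_epoch="${core_file##*-}"     # keep only the part after the last '-'

# Convert the epoch seconds to a readable UTC time (GNU date syntax).
date -u -d "@${dump_epoch}" '+%Y-%m-%d %H:%M:%S UTC'
# prints: 2018-04-19 07:54:00 UTC
```
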
To get a backtrace, install gdb and open the core file:

::

   apt-get update && apt-get install gdb
   gdb vpp vpp_main-1524124440
   (gdb) bt

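If you want to attach the backtrace to a bug report as text, gdb can
also be run non-interactively. The following is a sketch under stated
assumptions: ``save_backtrace`` is a hypothetical helper name, the
``vpp`` binary is assumed to be on ``PATH``, and the output file name is
arbitrary. It uses gdb's ``-batch`` and ``-ex`` options to print a
backtrace of every thread recorded in the core file.

```shell
# Hypothetical helper: write a backtrace of all threads to a text file.
# Arguments: <core-file> [output-file]
save_backtrace() (
    core="$1"
    out="${2:-${core}.bt.txt}"
    # -batch makes gdb exit after running the -ex command;
    # "thread apply all bt" backtraces each thread in the core file.
    gdb -batch -ex "thread apply all bt" "$(command -v vpp)" "$core" \
        > "$out" 2>&1 &&
    echo "Backtrace written to $out"
)

# Usage:
#   save_backtrace vpp_main-1524124440
```
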
6. Copy the Core File Out of the Container
------------------------------------------

Finally, copy the core file out of the container. First, while still
inside the container, pack the core file into an archive:

::

   cd /tmp/dumps
   tar cvzf vppdump.tar.gz vpp_main-1524124440

Now, on the host, determine the Docker ID of the container, and then
copy the archive from the container to the host:

::

   docker ps | grep vswitch_contiv
   d7aceb2e4876 c43a70ac3d01 "/usr/bin/supervisor…" 25 minutes ago Up 25 minutes k8s_contiv-vswitch_contiv-vswitch-zqzn6_kube-system_9923952f-43a6-11e8-be84-080027de08ea_0

   docker cp d7aceb2e4876:/tmp/dumps/vppdump.tar.gz .

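If the Docker CLI is not available on the host, ``kubectl cp`` is one
possible alternative. The sketch below is an assumption-laden example
rather than the documented Contiv-VPP procedure: ``fetch_vpp_dump`` is a
hypothetical helper name, and the namespace, container name and archive
path are taken from the earlier steps. ``kubectl cp`` needs ``tar``
inside the container, which the packing step above already relies on.

```shell
# Hypothetical helper: copy the packed core file out of the vswitch
# container with kubectl. Arguments: <pod-name> [destination-dir]
fetch_vpp_dump() (
    pod="$1"
    dest="${2:-.}"
    kubectl cp -c contiv-vswitch \
        "kube-system/${pod}:/tmp/dumps/vppdump.tar.gz" \
        "${dest}/vppdump.tar.gz"
)

# Usage:
#   fetch_vpp_dump contiv-vswitch-7whk7
```
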
Now you are ready to file a bug in `jira.fd.io <https://jira.fd.io/>`__
and attach the core file.