vhost: crash under heavy traffic condition due to memory corruption (VPP-1016)

With heavy traffic, tx code path may crash due to memory corruption

Thread 5 "vpp_wk_2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff3995c700 (LWP 2505)]
0x00007ffff73675e8 in vhost_user_if_input (vm=0x7fffb5f5bf9c,
    vum=0x7ffff7882a40 <vhost_user_main>, vui=0x7fffb65570c4, qid=0,
    node=0x7fffb6577dac, mode=VNET_HW_INTERFACE_RX_MODE_POLLING)
    at /home/sluong/vpp-master/vpp/build-data/../src/vnet/devices/virtio/vhost-user.c:1610
1610		  bi_current = (vum->cpus[thread_index].rx_buffers)
	                       [vum->cpus[thread_index].rx_buffers_len];
(gdb) p vum->cpus[thread_index].rx_buffers_len
$2 = 793212607
(gdb)

Apparently, some code accidentally wrote the bad value in rx_buffers_len.
rx_buffers_len should never be greater than 1024 since that is how many buffers
we request each time.

After debugging many hours, I discovered that the memory corruption happens
in the tx code path right here on line 2176.

	  {
	    vhost_copy_t *cpy = &vum->cpus[thread_index].copy[copy_len];
	    copy_len++;
	    cpy->len = bytes_left;
	    cpy->len = (cpy->len > buffer_len) ? buffer_len : cpy->len;
	    cpy->dst = buffer_map_addr;
	    cpy->src = (uword) vlib_buffer_get_current (current_b0) +
	      current_b0->current_length - bytes_left;

(gdb) p cpy
$3 = (vhost_copy_t *) 0x7fffb554077c
(gdb) p copy_len
$4 = 1025
(gdb) p &vum->cpus[3].rx_buffers_len
$8 = (u32 *) 0x7fffb5540784

copy_len is picking up the index entry 1024 before it was incremented. copy array has only
1024 members (0 - 1023 are valid).
The assignment here in cpy surely causes memory corruption. It is only discovered later
when the memory location that it corrupted is used.

The condition for the crash is to transmit jumbo frames under heavy volume. Since ring
size is 1024, with one packet taking up one index for frame size (less 2048), it does
not cause overflow. With jumbo frames, it requires multiple indices for one packet,
it can cause the overflow under heavy traffic.

The fix is to do copy out when we have 1000 entries in the array to avoid
overflow.

Change-Id: Iefbc739b8e80470f1cf13123113f8331ffcd0eb2
Signed-off-by: Steven <sluong@cisco.com>
1 file changed
tree: 97404007d5458d887d57ef8403bb1ea2d3172186
  1. build-data/
  2. build-root/
  3. doxygen/
  4. dpdk/
  5. extras/
  6. gmod/
  7. src/
  8. test/
  9. .clang-format
  10. .gitignore
  11. .gitreview
  12. LICENSE
  13. MAINTAINERS
  14. Makefile
  15. README.md
  16. RELEASE.md
README.md

Vector Packet Processing

Introduction

The VPP platform is an extensible framework that provides out-of-the-box production quality switch/router functionality. It is the open source version of Cisco's Vector Packet Processing (VPP) technology: a high performance, packet-processing stack that can run on commodity CPUs.

The benefits of this implementation of VPP are its high performance, proven technology, its modularity and flexibility, and rich feature set.

For more information on VPP and its features please visit the FD.io website and What is VPP? pages.

Changes

Details of the changes leading up to this version of VPP can be found under @ref release_notes.

Directory layout

Directory nameDescription
build-dataBuild metadata
build-rootBuild output directory
doxygenDocumentation generator configuration
dpdkDPDK patches and build infrastructure
@ref extras/libmemifClient library for memif
@ref src/examplesVPP example code
@ref src/pluginsVPP bundled plugins directory
@ref src/svmShared virtual memory allocation library
src/testsStandalone tests (not part of test harness)
src/vatVPP API test program
@ref src/vlibVPP application library
@ref src/vlibapiVPP API library
@ref src/vlibmemoryVPP Memory management
@ref src/vlibsocketVPP Socket I/O
@ref src/vnetVPP networking
@ref src/vppVPP application
@ref src/vpp-apiVPP application API bindings
@ref src/vppinfraVPP core library
@ref src/vpp/apiNot-yet-relocated API bindings
testUnit tests and Python test harness

Getting started

In general anyone interested in building, developing or running VPP should consult the VPP wiki for more complete documentation.

In particular, readers are recommended to take a look at [Pulling, Building, Running, Hacking, Pushing](https://wiki.fd.io/view/VPP/Pulling,_Building,_Run ning,_Hacking_and_Pushing_VPP_Code) which provides extensive step-by-step coverage of the topic.

For the impatient, some salient information is distilled below.

Quick-start: On an existing Linux host

To install system dependencies, build VPP and then install it, simply run the build script. This should be performed a non-privileged user with sudo access from the project base directory:

./extras/vagrant/build.sh

If you want a more fine-grained approach because you intend to do some development work, the Makefile in the root directory of the source tree provides several convenience shortcuts as make targets that may be of interest. To see the available targets run:

make

Quick-start: Vagrant

The directory extras/vagrant contains a VagrantFile and supporting scripts to bootstrap a working VPP inside a Vagrant-managed Virtual Machine. This VM can then be used to test concepts with VPP or as a development platform to extend VPP. Some obvious caveats apply when using a VM for VPP since its performance will never match that of bare metal; if your work is timing or performance sensitive, consider using bare metal in addition or instead of the VM.

For this to work you will need a working installation of Vagrant. Instructions for this can be found [on the Setting up Vagrant wiki page] (https://wiki.fd.io/view/DEV/Setting_Up_Vagrant).

More information

Several modules provide documentation, see @subpage user_doc for more end-user-oriented information. Also see @subpage dev_doc for developer notes.

Visit the VPP wiki for details on more advanced building strategies and other development notes.

Test Framework

There is PyDoc generated documentation available for the VPP test framework. See @ref test_framework_doc for details.