..
 This work is licensed under a Creative Commons Attribution 4.0
 International License.

===============================================
Parallelism improvement of Multi Cloud Services
===============================================


Problem Description
===================

Multi-Cloud currently runs Django on Django's built-in web server.
According to the Django documentation [Django_Document]_, this mode should
not be used in production: it has not gone through security audits or
performance tests and is intended only for development. In a test on a local
computer, this mode handled only ONE API request at a time, which cannot meet
the performance requirement.

.. [Django_Document] https://docs.djangoproject.com/en/dev/ref/django-admin/#runserver

Although security and scalability might improve as a side effect of resolving
the performance issue, this spec focuses only on how to improve the
parallelism (performance) of the current MultiCloud API framework.

Possible Solutions
==================

Solution 1
----------

Django is a mature framework, and it has its own way to improve parallelism.
Instead of running Django's built-in web server, the Django application can be
deployed in a dedicated web server. Django's primary deployment platform is
WSGI [django_deploy]_, the Python standard for web servers and applications.

.. [django_deploy] https://docs.djangoproject.com/en/2.0/howto/deployment/wsgi/

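As a rough illustration of what "WSGI application" means here, the sketch
below defines a minimal WSGI callable and invokes it in-process with the
standard library only. The names ``application`` and ``call_once`` are
illustrative and not part of Multi-Cloud; a Django project normally obtains
an equivalent callable from ``django.core.wsgi.get_wsgi_application()``.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI application: the server calls this once per request."""
    body = b"hello from a WSGI application\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI requires an iterable of bytes as the response body.
    return [body]


def call_once(app, path="/"):
    """Invoke a WSGI app without a network socket, as a server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the mandatory WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Any WSGI server (uwsgi, Eventlet, and others) can host such a callable, which
is why the choice of server is independent of the application code.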

On the other hand, Django is very large, and it is a black box to anyone
without good knowledge of it. Adding features on top of Django may be
time-consuming. For example, the unit tests [unit_test]_ of Multi-Cloud can't
use the regular Python test libraries because of Django; they have to be based
on Django's test framework. When we want to improve the parallelism of
Multi-Cloud services, we need to find out how Django implements it, instead of
using a common method.

.. [unit_test] https://gerrit.onap.org/r/#/c/8909/

Besides, Django's code pattern is geared toward web pages, and the best-known
use cases of Django are web UIs. The current Multi-Cloud code puts much of its
logic in files named ``views.py``, but there is actually no view to expose,
which is confusing.

The benefit of this solution is that most of the current code needs no change.

Solution 2
----------

Given Django's shortcomings, this solution proposes to use an alternative
framework. Eventlet [Eventlet]_ with Pecan [Pecan]_ would be the ideal web
framework in this case, because it is lightweight, lean and widely used.

.. [Eventlet] http://eventlet.net/doc/modules/wsgi.html

.. [Pecan] https://pecan.readthedocs.io/en/latest/

For example, most OpenStack projects use such a framework. It is thin enough
to leave flexibility for future architecture design.

However, it requires changing the existing code that exposes the API.


Performance Test Comparison
===========================

Test Environment
----------------

Apache Benchmark is used as the test tool. It is shipped with Ubuntu; if you
don't find it, run ``sudo apt install -y apache2-utils``.

Two virtual machines running Ubuntu 16.04 are hosted on a multi-core hardware
server. One VM runs Apache Benchmark and has 1 CPU core and 8G of memory.
The other VM runs Multi-Cloud and has 4 CPU cores and 6G of memory.

Test Command
~~~~~~~~~~~~

``ab -n <num of total requests> -c <concurrency level> http://<IP:port>/api/multicloud/v0/vim_types``

Test result
-----------

Note that the numbers may vary between test runs, but the overall result is
similar to what is shown below.

100 requests, concurrency level 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Command: ``ab -n 100 -c 1 http://<IP:port>/api/multicloud/v0/vim_types``

Result::

  Django runserver: takes 0.512 seconds in total, all requests succeed.
  Django+uwsgi: takes 0.671 seconds in total, all requests succeed.
  Pecan+eventlet: takes 0.149 seconds in total, all requests succeed.

10000 requests, concurrency level 100
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Command: ``ab -n 10000 -c 100 http://<IP:port>/api/multicloud/v0/vim_types``

Result::

  Django runserver: takes 85.326 seconds in total, all requests succeed.
  Django+uwsgi: takes 3.808 seconds in total, all requests succeed.
  Pecan+eventlet: takes 3.181 seconds in total, all requests succeed.

100000 requests, concurrency level 1000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Command: ``ab -n 100000 -c 1000 http://<IP:port>/api/multicloud/v0/vim_types``

Result::

  Django runserver: Apache Benchmark quits with a timeout after running an
  arbitrary portion of the requests.
  Django+uwsgi: takes 37.316 seconds in total, about 32% of requests fail.
  Some errors report that too many TCP sockets are open.
  Pecan+eventlet: takes 35.315 seconds in total, all requests succeed.

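To compare the runs on a common scale, the totals above can be converted to
requests per second. The following is only a small helper sketch; the
``RESULTS`` table copies the figures reported in this section, and the
function names are illustrative.

```python
# Total times in seconds, copied from the results in this section.
# None marks a run that did not complete (Apache Benchmark timed out).
RESULTS = {
    (100, 1): {
        "Django runserver": 0.512,
        "Django+uwsgi": 0.671,
        "Pecan+eventlet": 0.149,
    },
    (10000, 100): {
        "Django runserver": 85.326,
        "Django+uwsgi": 3.808,
        "Pecan+eventlet": 3.181,
    },
    (100000, 1000): {
        "Django runserver": None,
        # About 32% of these requests failed, so this rate overstates
        # the useful throughput of this run.
        "Django+uwsgi": 37.316,
        "Pecan+eventlet": 35.315,
    },
}


def requests_per_second(total_requests, seconds):
    """Average throughput over a whole benchmark run."""
    return total_requests / float(seconds)


def summarize(results):
    """Return one formatted line per (run, server) pair."""
    lines = []
    for (total, concurrency), timings in sorted(results.items()):
        for server, seconds in sorted(timings.items()):
            if seconds is None:
                rate = "n/a"
            else:
                rate = "%.1f" % requests_per_second(total, seconds)
            lines.append("n=%d c=%d %s: %s req/s"
                         % (total, concurrency, server, rate))
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(RESULTS)))
```

The rates are averages over each run and inherit the run-to-run variation
noted at the start of this section.
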
Proposed Change
===============

Given the test results above, this spec proposes to adopt Solution 2. In line
with the Elastic API exposure work [jira_workitem]_, Multi-Cloud will provide
a new way to expose its API. That is to say, the existing API-exposing code
needs to be rewritten in [jira_workitem]_ anyway, so the disadvantage of
Solution 2 does not apply.

.. [jira_workitem] https://jira.onap.org/browse/MULTICLOUD-152

To define a clear scope for this spec, VoLTE is the use case that will be used
to test it. All functionality that VoLTE needs should be implemented in this
spec and [jira_workitem]_.

Backward compatibility
----------------------

This spec will NOT change the current API. It will NOT replace the current
API framework in R2, nor switch to the new API framework in R2. Instead,
this spec will provide a configuration option, named ``web_framework``, to
make sure that existing use cases and functionality are not broken. The
default value of the option will be ``django``, which still runs the current
Django API framework. An alternative value is ``pecan``, which runs the API
framework proposed in this spec. Users who do not care about the change will
therefore not be affected.

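A minimal sketch of how such an option might be interpreted is shown below.
Only the option name ``web_framework`` and its two values come from this spec;
the dictionary-based configuration and the function name are assumptions made
for illustration.

```python
SUPPORTED_FRAMEWORKS = ("django", "pecan")
DEFAULT_FRAMEWORK = "django"  # keeps the current behaviour unless changed


def choose_framework(config):
    """Return the framework named by `web_framework` in a config mapping.

    `config` is a plain dict here; how Multi-Cloud actually loads its
    configuration is outside the scope of this sketch.
    """
    value = config.get("web_framework", DEFAULT_FRAMEWORK)
    value = value.strip().lower()
    if value not in SUPPORTED_FRAMEWORKS:
        raise ValueError(
            "web_framework must be one of %s, got %r"
            % (", ".join(SUPPORTED_FRAMEWORKS), value)
        )
    return value
```

With this shape, a deployment that never sets the option keeps running the
Django framework, and an invalid value fails early instead of silently
falling back.
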
WSGI Server
-----------

Whichever API framework is used, a WSGI server needs to be provided. This
spec will use the Eventlet WSGI server. The API framework will run as an
application in the WSGI server.

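The sketch below shows one way an application could be handed to the Eventlet
WSGI server, using ``eventlet.listen`` and ``eventlet.wsgi.server``. The
placeholder ``app``, the ``serve`` helper and the port number are assumptions
for illustration; in Multi-Cloud the application object would come from the
selected API framework.

```python
def app(environ, start_response):
    """Placeholder WSGI application standing in for the real API."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def serve(application, host="0.0.0.0", port=8080):
    """Run `application` in the Eventlet WSGI server (blocks forever).

    The port is an arbitrary example value, not the Multi-Cloud port.
    """
    # Imported here so this module still loads where Eventlet is absent.
    import eventlet
    from eventlet import wsgi

    listener = eventlet.listen((host, port))
    # Each incoming request is handled in its own green thread.
    wsgi.server(listener, application)


if __name__ == "__main__":
    serve(app)
```
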
Multi processes framework
-------------------------

This spec proposes to run the Multi-Cloud API server in multi-process mode.
Multiple processes provide parallel API handlers, so when several API requests
reach Multi-Cloud, they can be handled simultaneously. In addition, separate
processes effectively isolate API requests from one another, so that one API
request does not affect another.

Managing multiple processes can be overwhelmingly difficult and sometimes
dangerous. A mature library, for example oslo.service [oslo_service]_, can be
used to reduce the related work. Since oslo has been used by all OpenStack
projects for many releases and is actively updated, it can be regarded as a
stable library.

.. [oslo_service] https://github.com/openstack/oslo.service

Number of processes
~~~~~~~~~~~~~~~~~~~

To best utilize multi-core CPU, the number of processes will be set to the
number of CPU cores by default.

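A minimal sketch of this default, assuming a hypothetical ``configured``
override that an operator could supply:

```python
import multiprocessing


def default_worker_count(configured=None):
    """Return the number of API worker processes to start.

    `configured` is a hypothetical explicit override; when it is None,
    fall back to the number of CPU cores as proposed above.
    """
    if configured is not None:
        if configured < 1:
            raise ValueError("worker count must be at least 1")
        return configured
    try:
        return multiprocessing.cpu_count()
    except NotImplementedError:
        # The core count cannot be determined on this platform.
        return 1
```
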
Shared socket file
~~~~~~~~~~~~~~~~~~

To make multiple processes work together behind a single port number, the
processes need to share a socket file. To achieve this, a bootstrap process
will be started first and will initialize the socket file. The other processes
are then forked from this bootstrap process and inherit the socket.

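The idea can be sketched with the standard library alone: the bootstrap
process opens the listening socket, then forks workers that inherit it and
accept connections on the same port. This is not how oslo.service is
implemented or called; the function names are illustrative, each worker here
serves a single connection and exits, and ``os.fork`` limits the sketch to
POSIX systems.

```python
import os
import socket


def open_listen_socket(host="127.0.0.1", port=0, backlog=128):
    """Bootstrap step: create the one listening socket all workers share."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))  # port 0 lets the OS pick a free port
    sock.listen(backlog)
    return sock


def serve_one(sock):
    """Worker step: accept one connection and answer with this worker's pid."""
    conn, _addr = sock.accept()
    try:
        conn.recv(1024)  # read (and ignore) the request
        body = ("handled by pid %d\n" % os.getpid()).encode("ascii")
        head = ("HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n"
                % len(body)).encode("ascii")
        conn.sendall(head + body)
    finally:
        conn.close()


def fork_workers(sock, count):
    """Fork `count` workers that inherit `sock`; return their pids."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child process: serve on the inherited socket, then exit
            # without running the parent's cleanup handlers.
            try:
                serve_one(sock)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids
```

Because every worker accepts on the same inherited socket, the kernel hands
each incoming connection to one of them, and clients see a single port.
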
Work Items
==========

#. Add WSGI server.
#. Run Pecan application in WSGI server.
#. Add multiple processes support.
#. Update deploy script to support new API framework.
