.. _integration-s3p:

ONAP Maturity Testing Notes
---------------------------

Historically, the integration team executed specific stability and resilience
tests on each target release. For Frankfurt, a stability test was executed.
The Openlab, based on Frankfurt RC0 dockers, was also observed over a long
period to evaluate the overall stability.
Finally, the daily CI chain created at Frankfurt RC0 was a valuable indicator
of the stability of the solution.

No resilience or stress tests were executed, due to a lack of resources and
the late availability of the release. The testing strategy shall be amended in
Guilin: several requirements have been created to improve the S3P testing
domain.

Stability
=========

ONAP stability was tested through a 72 hour test.
The intent of the 72 hour stability test is not to exhaustively test all
functions but to run a steady load against the system and look for issues like
memory leaks that cannot be found in the short duration install and functional
testing during the development cycle.

Integration Stability Testing verifies that the ONAP platform remains fully
functional after running for an extended amount of time.
This is done by repeatedly running tests against an ONAP instance for a period
of 72 hours.

::

    **The 72 hour stability run result was PASS**

The onboard and instantiate tests ran for over **115 hours** before environment
issues stopped the test. Errors were caused by both tooling and environment
issues.

The overall memory utilization only grew about **2%** on the worker nodes
despite the environment issues. Interestingly, memory on the Kubernetes
orchestration node grew more, which could mean that we are overdriving the APIs
in some fashion.

We did not limit other tenant activities in Windriver during this test run, and
we saw the impact of events such as the re-installation of SB00 in the tenant
and of general network latency, which made OpenStack slower to instantiate.
For future stability runs we should go back to shutting down non-critical
tenants in the test environment to free up host resources for the test run (or
find other ways to prevent other testing from affecting the stability run).

The control loop tests were **100% successful** and the cycle time for the loop
was fairly consistent despite the environment issues. Future control loop
stability tests should consider doing more policy edit type activities and
running more control loops if host resources are available. The 10 second VES
telemetry event is quite aggressive: it sends more load into the VES collector
and TCA engine during onset events than would be typical, so adding loops
should factor that in. The Jenkins jobs ran fairly well, although instantiating
the Demo vFWCL took longer than usual, which should be factored into future
test planning.
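
To put the 10 second VES interval in perspective, the number of telemetry
events sent over a run can be estimated with a short calculation. This is a
rough sketch only: ``events_sent`` is a hypothetical helper, and it assumes one
reporting VNF per control loop and a load that scales linearly with the number
of loops, neither of which is stated above.

```python
def events_sent(duration_hours: float, interval_s: float, loops: int = 1) -> int:
    """Estimate how many telemetry events are emitted during a run.

    Assumes a fixed reporting interval and one reporting VNF per control loop.
    """
    events_per_loop = int(duration_hours * 3600 / interval_s)
    return events_per_loop * loops


print(events_sent(72, 10))           # 25920 events for a single loop over 72 hours
print(events_sent(72, 10, loops=3))  # 77760 events if three loops run in parallel
```

Under these assumptions each additional loop adds the same number of events to
the VES collector, which is why extra loops need to be planned against the
available host resources.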


Methodology
~~~~~~~~~~~

The Stability Test has two main components:

- Running the "ete stability72hr" Robot suite periodically. This test suite
  verifies that ONAP can instantiate vDNS, vFWCL, and vVG.
- Setting up the vFW Closed Loop to remain running, then checking periodically
  that the closed loop functionality is still working.
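
Both components follow the same pattern: run a check at a regular interval for
the whole test window and tally the outcomes. The sketch below illustrates that
pattern only. It is not the Jenkins job or the Robot suite used for the run;
``run_periodically`` and its parameters are hypothetical names, and the one
hour interval in the demonstration is an arbitrary value.

```python
import time
from typing import Callable, Dict


def run_periodically(
    test: Callable[[], bool],
    duration_s: float,
    interval_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Call `test` every `interval_s` seconds until `duration_s` has elapsed."""
    tally = {"attempts": 0, "successes": 0, "failures": 0}
    start = clock()
    while clock() - start < duration_s:
        tally["attempts"] += 1
        try:
            passed = bool(test())
        except Exception:
            # A crashing test counts as a failure; the run itself keeps going.
            passed = False
        tally["successes" if passed else "failures"] += 1
        sleep(interval_s)
    return tally


# Demonstration with a simulated clock, so that nothing actually waits 72 hours.
now = [0.0]
result = run_periodically(
    test=lambda: True,  # stand-in for one Robot suite run or closed loop check
    duration_s=72 * 3600,
    interval_s=3600,  # hypothetical period; the real period is not given here
    clock=lambda: now[0],
    sleep=lambda seconds: now.__setitem__(0, now[0] + seconds),
)
print(result)  # {'attempts': 72, 'successes': 72, 'failures': 0}
```

Injecting the clock and sleep functions keeps the loop testable; in a real run
the defaults (``time.monotonic`` and ``time.sleep``) would be used instead.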

The integration-longevity tenant in the Intel/Windriver environment was used
for the 72 hour tests.

The onap-ci job for "Project windriver-longevity-release-manual" was used for
the deployment, with the OOM branch set to frankfurt and the Integration branch
set to master. Integration master was used so we could catch the latest updates
to integration scripts and VNF heat templates.

The Jenkins job needs a couple of updates for each release:

- Set the integration branch to 'origin/master'
- Modify the parameters to deploy.sh to specify "-i master" and "-o frankfurt"
  to get the integration master and oom frankfurt clones onto the NFS server.

The path for robot logs on dockerdata-nfs changed in Frankfurt, so
``/dev-robot/`` becomes ``/dev/robot``.

.. note::
    For the Frankfurt release, the stability test was executed on a
    Kubernetes infrastructure based on El Alto recommendations. The Kubernetes
    version was 1.15.3 (versus 1.15.11 for Frankfurt) and the Helm version was
    2.14.2 (versus 2.16.6 for Frankfurt). However, the ONAP dockers were
    updated to Frankfurt RC2 candidate versions. The results are informative
    and can be compared with previous campaigns. The stability tests used robot
    container image **1.6.1-STAGING-20200519T201214Z**. The robot container was
    patched to use GRA_API since VNF_API has been deprecated.

Shakedown consists of creating temporary tags (stability72hrvLB,
stability72hrvVG, stability72hrVFWCL) to make sure each sub test runs
successfully (including cleanup) in the environment before the Jenkins job
starts with the higher level testsuite tag stability72hr, which covers all
three test types.

Clean out the old builds of the job using a Jenkins console script (Manage
Jenkins):

::

    // Run from the Jenkins script console.
    def jobName = "windriver-longevity-stability72hr"
    def job = Jenkins.instance.getItem(jobName)
    // Delete every recorded build of the job.
    job.getBuilds().each { it.delete() }
    // Restart build numbering at 1 and persist the change.
    job.nextBuildNumber = 1
    job.save()


appc.properties was updated to apply the fix for DMaaP message processing, so
that the streams update calls http://localhost:8181.

Results: 100% PASS
~~~~~~~~~~~~~~~~~~

=================== ======== ========== ======== ========= =========
Test Case           Attempts Env Issues Failures Successes Pass Rate
=================== ======== ========== ======== ========= =========
Stability 72 hours  77       19         0        58        100%
vFW Closed Loop     60       0          0        100       100%
**Total**           137      19         0        158       **100%**
=================== ======== ========== ======== ========= =========

Detailed results can be found at https://wiki.onap.org/display/DW/Frankfurt+Stability+Run+Notes

.. note::
    - Overall results were good. All of the test failures were due to
      issues with the unstable environment and tooling framework.
    - JIRAs were created for readiness/liveness probe issues found while
      testing under the unstable environment. Patches applied to oom and
      testsuite during the testing helped reduce test failures due to
      environment and tooling framework issues.
    - The vFW Closed Loop test was very stable and self recovered from
      environment issues.
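
The results table does not state how the pass rate is derived. One reading
that is consistent with the "Stability 72 hours" row is that attempts lost to
environment issues are excluded before dividing successes by the remaining
attempts. The sketch below encodes that assumption; ``pass_rate`` is a
hypothetical helper, and the formula has only been checked against that one
row.

```python
def pass_rate(attempts: int, env_issues: int, successes: int) -> float:
    """Pass rate in percent, ignoring attempts lost to environment issues.

    Assumed formula: successes / (attempts - env_issues).
    """
    valid_attempts = attempts - env_issues
    if valid_attempts <= 0:
        raise ValueError("no attempts left after removing environment issues")
    return 100.0 * successes / valid_attempts


# "Stability 72 hours" row: 77 attempts, 19 environment issues, 58 successes.
print(pass_rate(attempts=77, env_issues=19, successes=58))  # 100.0
```

With this reading, 77 - 19 = 58 valid attempts, all of which succeeded.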

Resources overview
~~~~~~~~~~~~~~~~~~

============ ======================= =========== ========== ==========
Date         #1 CPU                  #1 RAM      CPU*       RAM**
============ ======================= =========== ========== ==========
May 20 18:45 dcae-tca-analytics:511m appc:2901Mi 1649       36092
May 21 12:33 dcae-tca-analytics:664m appc:2901Mi 1605       38221
May 22 09:35 dcae-tca-analytics:425m appc:2837Mi 1459       38488
May 23 11:01 cassandra-1:371m        appc:2849Mi 1829       39431
============ ======================= =========== ========== ==========

.. note::
    - Results are given by the command
      ``kubectl -n onap top pods | sort -rn -k 3 | head -20``
    - ``*`` sum of the CPU consumption (in millicores) of the top 20 pods
    - ``**`` sum of the RAM consumption (in Mi) of the top 20 pods
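
The two sums can be reproduced from the raw ``kubectl top pods`` output. The
sketch below is one possible way to do it, under stated assumptions: CPU values
always carry the ``m`` suffix and memory values the ``Mi`` suffix, and pods are
ranked on the third column (memory), as the ``sort -rn -k 3`` in the command
does. ``parse_quantity`` and ``top_n_totals`` are hypothetical helpers, and the
sample pods and values are made up for illustration.

```python
import re
from typing import List, Tuple


def parse_quantity(text: str) -> int:
    """Return the leading integer of a value such as '511m' or '2901Mi'."""
    match = re.match(r"\d+", text)
    if match is None:
        raise ValueError(f"unexpected quantity: {text!r}")
    return int(match.group())


def top_n_totals(top_output: str, n: int = 20) -> Tuple[int, int]:
    """Sum CPU (millicores) and memory (Mi) over the n pods using most memory."""
    rows: List[Tuple[int, int]] = []
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "NAME":
            continue  # skip the header and any blank line
        rows.append((parse_quantity(fields[1]), parse_quantity(fields[2])))
    # Rank by memory, largest first, mirroring `sort -rn -k 3`.
    rows.sort(key=lambda row: row[1], reverse=True)
    top = rows[:n]
    return sum(cpu for cpu, _ in top), sum(mem for _, mem in top)


# Illustrative input only; these are not measurements from the stability run.
SAMPLE = """\
NAME    CPU(cores)   MEMORY(bytes)
pod-a   511m         2901Mi
pod-b   100m         500Mi
pod-c   50m          1200Mi
"""
print(top_n_totals(SAMPLE, n=2))  # (561, 4101)
```

In practice the text would come from the ``kubectl`` command itself, for
example captured at regular intervals during the run.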

CI results
==========

A daily Frankfurt CI chain was created after RC0.

The evolution of the full healthcheck test suite can be described as follows:

|image1|

The full healthcheck test suite verifies the status of each component. It is
composed of 47 tests. From the 9th to the 28th of May, the success rate never
dropped below 95%.
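
To make the 95% figure concrete, it can be translated into a number of failing
tests per run. The sketch below assumes that the success rate of a run is the
number of passed tests divided by the 47 tests of the suite, which is not
spelled out above; ``max_failures`` is a hypothetical helper.

```python
import math


def max_failures(total_tests: int, min_rate_percent: float) -> int:
    """Largest number of failed tests that keeps the success rate at the floor."""
    min_passes = math.ceil(total_tests * min_rate_percent / 100)
    return total_tests - min_passes


print(max_failures(47, 95))  # 2
```

Under that assumption, a run stays at or above 95% with at most 2 failed tests
out of 47 (45/47 is about 95.7%, while 44/47 is about 93.6%).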

Four test categories were defined:

- infrastructure healthcheck: tests of the ONAP Kubernetes cluster and Helm
  chart status
- healthcheck tests: verification of the components in the target deployment
  environment
- smoke tests: basic VM tests (including onboarding/distribution/instantiation),
  and automated use cases (pnf-registrate, hvves, 5gbulkpm)
- security tests

The security target (66% for Frankfurt) was reached after RC1. A regression
caused by the automation of the hvves use case (which exposed a public HTTP
port) was fixed on the 28th of May.

|image2|

Orange Openlab
==============

The Orange Openlab is a community lab targeting ONAP end users. It provides an
ONAP instance and cloud resources to discover ONAP.
A Frankfurt pre-RC0 version was installed at the beginning of May. The usual
gating test suite was run daily, in addition to the traffic generated by the
lab users. VM instantiation has been working well without any reinstallation
over the last **27** days.

Resilience
==========

The resilience test executed in El Alto was not run in Frankfurt.

.. |image1| image:: files/s3p/daily_frankfurt1.png
   :width: 6.5in

.. |image2| image:: files/s3p/daily_frankfurt2.png
   :width: 6.5in