.. _integration-s3p:

ONAP Maturity Testing Notes
---------------------------

Historically, the integration team executed specific stability and resilience
tests on each target release. For Frankfurt, a stability test was executed.
The Openlab, based on Frankfurt RC0 dockers, was also observed over a long
period to evaluate the overall stability.
Finally, the daily CI chain created at Frankfurt RC0 was also a valuable
indicator of the stability of the solution.

No resilience or stress tests were executed, due to a lack of resources
and the late availability of the release. The testing strategy shall be amended
in Guilin; several requirements have been created to improve the S3P testing
domain.

Stability
=========

ONAP stability was tested through a 72 hour test.
The intent of the 72 hour stability test is not to exhaustively test all
functions but to run a steady load against the system and look for issues like
memory leaks that cannot be found in the short duration install and functional
testing during the development cycle.

Integration Stability Testing verifies that the ONAP platform remains fully
functional after running for an extended amount of time.
This is done by repeatedly running tests against an ONAP instance for a period
of 72 hours.

::

    **The 72 hour stability run result was PASS**

The onboard and instantiate tests ran for over **115 hours** before environment
issues stopped the test. There were errors due to both tooling and environment
issues.

The overall memory utilization only grew about **2%** on the worker nodes
despite the environment issues. Interestingly, the Kubernetes orchestration
node memory grew more, which could mean we are overdriving the APIs in some
fashion.

We did not limit other tenant activities in Windriver during this test run and
we saw the impact from things like the re-installation of SB00 in the tenant
and general network latency that caused OpenStack to be slower to
instantiate.
For future stability runs we should go back to the process of shutting down
non-critical tenants in the test environment to free up host resources for
the test run (or find other ways to prevent other testing from affecting the
stability run).

The control loop tests were **100% successful** and the cycle time for the loop
was fairly consistent despite the environment issues. Future control loop
stability tests should consider doing more policy edit type activities and
running more control loops if host resources are available. The 10 second VES
telemetry event is quite aggressive, so we are sending more load into the VES
collector and TCA engine during onset events than would be typical; adding
additional loops should factor that in. The Jenkins jobs ran fairly well,
although the instantiate Demo vFWCL took longer than usual, which should be
factored into future test planning.


Methodology
~~~~~~~~~~~

The Stability Test has two main components:

- Running the "ete stability72hr" Robot suite periodically. This test suite
  verifies that ONAP can instantiate vDNS, vFWCL, and vVG.
- Setting up the vFW Closed Loop to remain running, then checking periodically
  that the closed loop functionality is still working.

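
The periodic execution described above can be pictured as a simple scheduling
loop. The following is only a minimal sketch and not the actual Jenkins job:
the function name ``run_periodically``, the two hour interval and the
injectable ``clock`` and ``sleep`` callables are assumptions made for
illustration.

.. code-block:: python

    import time


    def run_periodically(check, duration_s, interval_s,
                         clock=time.monotonic, sleep=time.sleep):
        """Call check() every interval_s seconds until duration_s has elapsed.

        Returns the list of values returned by check(). The clock and sleep
        callables are injectable so the loop can be exercised without waiting.
        """
        results = []
        start = clock()
        while clock() - start < duration_s:
            results.append(check())
            sleep(interval_s)
        return results


    class FakeClock:
        """Simulated clock: sleeping simply advances the current time."""

        def __init__(self):
            self.now = 0.0

        def time(self):
            return self.now

        def sleep(self, seconds):
            self.now += seconds


    # Hypothetical schedule: a 72 hour window with one run every 2 hours.
    fake = FakeClock()
    outcomes = run_periodically(lambda: "PASS", duration_s=72 * 3600,
                                interval_s=2 * 3600,
                                clock=fake.time, sleep=fake.sleep)
    print(len(outcomes), "runs,", outcomes.count("PASS"), "passed")

In a real run, ``check`` would trigger the Robot suite or the closed loop
verification, and the default clock and sleep functions would be used.
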

The integration-longevity tenant in the Intel/Windriver environment was used for
the 72 hour tests.

The onap-ci job for "Project windriver-longevity-release-manual" was used for
the deployment with the OOM branch set to frankfurt and the Integration branch
set to master. Integration master was used so we could catch the latest updates
to integration scripts and VNF heat templates.

The Jenkins job needs a couple of updates for each release:

- Set the integration branch to 'origin/master'
- Modify the parameters to deploy.sh to specify "-i master" and "-o frankfurt"
  to get the integration master and oom frankfurt clones onto the NFS server.

The path for robot logs on dockerdata-nfs changed in Frankfurt:
/dev-robot/ becomes /dev/robot

.. note::
    For the Frankfurt release, the stability test was executed on a
    Kubernetes infrastructure based on El Alto recommendations. The Kubernetes
    version was 1.15.3 (Frankfurt: 1.15.11) and the Helm version was 2.14.2
    (Frankfurt: 2.16.6). However, the ONAP dockers were updated to Frankfurt RC2
    candidate versions. The results are informative and can be compared with
    previous campaigns. The stability tests used robot container image
    **1.6.1-STAGING-20200519T201214Z**. The robot container was patched to use
    GRA_API since VNF_API has been deprecated.

Shakedown consists of creating some temporary tags (stability72hrvLB,
stability72hrvVG, stability72hrVFWCL) to make sure each sub test ran
successfully (including cleanup) in the environment before the Jenkins job
started with the higher level testsuite tag stability72hr that covers all three
test types.

Clean out the old build jobs using a Jenkins console script (Manage Jenkins):

::

    // Name of the Jenkins job whose build history is cleared
    def jobName = "windriver-longevity-stability72hr"
    def job = Jenkins.instance.getItem(jobName)
    // Delete every recorded build, then restart build numbering at 1
    job.getBuilds().each { it.delete() }
    job.nextBuildNumber = 1
    job.save()


appc.properties was updated to apply the fix for DMaaP message processing to
call http://localhost:8181 for the streams update.

Results: 100% PASS
~~~~~~~~~~~~~~~~~~

=================== ======== ========== ======== ========= =========
Test Case           Attempts Env Issues Failures Successes Pass Rate
=================== ======== ========== ======== ========= =========
Stability 72 hours  77       19         0        58        100%
vFW Closed Loop     60       0          0        100       100%
**Total**           137      19         0        158       **100%**
=================== ======== ========== ======== ========= =========

Detailed results can be found at https://wiki.onap.org/display/DW/Frankfurt+Stability+Run+Notes

.. note::
    - Overall results were good. All of the test failures were due to
      issues with the unstable environment and tooling framework.
    - JIRAs were created for readiness/liveness probe issues found while
      testing under the unstable environment. Patches applied to oom and
      testsuite during the testing helped reduce test failures due to
      environment and tooling framework issues.
    - The vFW Closed Loop test was very stable and self recovered from
      environment issues.

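
The pass rate of the 72 hour row appears to leave out the attempts lost to
environment issues, since 58 successes out of 77 attempts are reported as 100%.
A minimal sketch of that calculation follows, assuming the formula
successes / (attempts - environment issues); both the formula and the function
name ``pass_rate`` are assumptions and are not taken from the test tooling.

.. code-block:: python

    def pass_rate(attempts, env_issues, successes):
        """Percentage of successes among attempts not lost to env issues."""
        valid_attempts = attempts - env_issues
        if valid_attempts <= 0:
            raise ValueError("no valid attempts to compute a pass rate from")
        return 100.0 * successes / valid_attempts


    # "Stability 72 hours" row: 77 attempts, 19 env issues, 58 successes.
    print(f"{pass_rate(77, 19, 58):.0f}%")  # prints 100%
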

Resources overview
~~~~~~~~~~~~~~~~~~

============ ======================= =========== ========== ==========
Date         #1 CPU                  #1 RAM      CPU*       RAM**
============ ======================= =========== ========== ==========
May 20 18:45 dcae-tca-analytics:511m appc:2901Mi 1649       36092
May 21 12:33 dcae-tca-analytics:664m appc:2901Mi 1605       38221
May 22 09:35 dcae-tca-analytics:425m appc:2837Mi 1459       38488
May 23 11:01 cassandra-1:371m        appc:2849Mi 1829       39431
============ ======================= =========== ========== ==========

.. note::
    - Results are given from the command
      ``kubectl -n onap top pods | sort -rn -k 3 | head -20``
    - \* sum of the CPU consumption of the top 20 pods (in millicores)
    - \*\* sum of the RAM consumption of the top 20 pods (in Mi)

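
The sums in the table can be reproduced from the raw ``kubectl top pods``
output with a few lines of scripting. The sketch below is illustrative only:
the sample text is invented (it is not measured data), the helper names are
hypothetical, and it assumes that CPU values end in ``m`` and memory values in
``Mi``. It also ranks each metric separately, which is an assumption about how
the sums above were obtained.

.. code-block:: python

    def to_int(value, suffix):
        """Strip the expected unit suffix and return the integer part."""
        if not value.endswith(suffix):
            raise ValueError(f"expected a value ending in {suffix!r}: {value!r}")
        return int(value[:-len(suffix)])


    def parse_top_pods(text):
        """Return (name, cpu_millicores, ram_mi) tuples from the command output."""
        rows = []
        for line in text.strip().splitlines():
            fields = line.split()
            if len(fields) != 3 or fields[0] == "NAME":
                continue  # skip the header line and anything unexpected
            name, cpu, ram = fields
            rows.append((name, to_int(cpu, "m"), to_int(ram, "Mi")))
        return rows


    def top_n_sum(rows, column, n=20):
        """Sum the n largest values of a column (1 = CPU, 2 = RAM)."""
        return sum(sorted((row[column] for row in rows), reverse=True)[:n])


    # Invented sample output, for illustration only.
    SAMPLE = """\
    NAME                  CPU(cores)   MEMORY(bytes)
    dev-pod-a             500m         2000Mi
    dev-pod-b             300m         1500Mi
    dev-pod-c             100m         800Mi
    """

    rows = parse_top_pods(SAMPLE)
    print("CPU sum (m):", top_n_sum(rows, 1, n=2))
    print("RAM sum (Mi):", top_n_sum(rows, 2, n=2))
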

CI results
==========

A daily Frankfurt CI chain was created after RC0.

The evolution of the full healthcheck test suite can be described as follows:

|image1|

The full healthcheck test suite verifies the status of each component. It is
composed of 47 tests. The success rate from the 9th to the 28th was never under
95%.

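
As a worked example of what a 95% threshold means for a suite of 47 tests, the
smallest number of passing tests that keeps the success rate at or above the
threshold can be computed as follows. The function name is hypothetical and
the sketch is not part of the CI chain.

.. code-block:: python

    def min_passing_tests(total_tests, min_rate_percent):
        """Smallest number of passing tests that reaches the given rate."""
        # Ceiling division on integers avoids floating point rounding issues.
        return -(-total_tests * min_rate_percent // 100)


    print(min_passing_tests(47, 95))  # prints 45

With 47 tests, at least 45 must pass, so a daily run stays at or above 95% with
at most two failing tests.
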

Four test categories were defined:

- infrastructure healthcheck: test of the ONAP Kubernetes cluster and Helm chart status
- healthcheck tests: verification of the components in the target deployment
  environment
- smoke tests: basic VM tests (including onboarding/distribution/instantiation),
  and automated use cases (pnf-registrate, hvves, 5gbulkpm)
- security tests

The security target (66% for Frankfurt) was reached after RC1. A regression
due to the automation of the hvves use case (which triggered the exposure of a
public port in HTTP) was fixed on the 28th of May.

|image2|

Orange Openlab
==============

The Orange Openlab is a community lab targeting ONAP end users. It provides an
ONAP instance and cloud resources to discover ONAP.
A Frankfurt pre-RC0 version was installed at the beginning of May. The usual
gating test suite was run daily in addition to the traffic generated by the lab
users. VM instantiation has been working well without any reinstallation
over the last **27** days.

Resilience
==========

The resilience test executed in El Alto was not performed in Frankfurt.

.. |image1| image:: files/s3p/daily_frankfurt1.png
   :width: 6.5in

.. |image2| image:: files/s3p/daily_frankfurt2.png
   :width: 6.5in