 .. _integration-s3p:

 :orphan:

 ONAP Maturity Testing Notes
 ---------------------------

 .. important::
     The Release stability has been evaluated by:

     - The Daily Guilin CI/CD chain
     - A simple 24h healthcheck verification
     - A 7 days stability test

 .. note::
     The scope of these tests remains limited and does not provide a full set of
     KPIs to determine the limits and the dimensioning of the ONAP solution.

 CI results
 ==========

 As usual, a daily CI chain dedicated to the release is created after RC0.
 A Daily Guilin has been created on the 18th of November 2020.

 Unfortunately, several technical issues disturbed the chain:

 - Due to policy changes in DockerHub (new quotas), the installation chain was
   not stable as the quota limit was rapidly reached. As a consequence, the
   installation was incomplete and most of the tests were failing. The problem
   was fixed by subscribing to an unlimited account on DockerHub.
 - Due to an upgrade of the Git Jenkins plugin done by LF IT, the synchronization
   of the mirror of the xtesting repository, used daily to generate the test
   suite dockers, was corrupted. The dockers were built daily from Jenkins but
   with an id from the 25th of September. As a consequence, the tests reported
   many failures because they corresponded to Frankfurt tests without the
   adaptations done for Guilin. The problem was fixed temporarily by moving to
   the GitLab.com Docker registry, then by the downgrade of the plugin executed
   by LF IT during the Thanksgiving break.

 The first week of the Daily Guilin results is therefore not really usable.
 Most of the results from the `daily Guilin result portal
 <https://logs.onap.org/onap-integration/daily/onap_daily_pod4_guilin/>`_
 are not trustworthy and may be misleading.
 The results became more stable from the 6th of December.

 The graphs given hereafter are based on the data collected until the 8th of
 December. This Daily chain will be maintained during the Honolulu development
 cycle (Daily Master) and can be audited at any time. In case of reproducible
 errors, the integration team will open JIRA tickets on Guilin.

 Several public Daily Guilin chains have been put in place, one in Orange
 (Helm v2) and one in DT (Helm v3). DT results are pushed in the test DB and can
 be observed in
 `ONAP Testing DT lab result page <http://testresults.opnfv.org/onap-integration/dt/dt.html>`_.

 Infrastructure Healthcheck Tests
 ................................

 These tests deal with the Kubernetes/Helm tests on the ONAP cluster.
 The global expected criterion is **50%** when installing with Helm 2.
 The onap-k8s and onap-k8s-teardown tests, which provide a snapshot of the onap
 namespace in Kubernetes, are expected to PASS, but two tests are expected to
 fail:

 - onap-helm (32/33 OK) due to the size of the SO helm chart (too big for Helm2).
 - nodeport_check_certs due to bad certificate issuers (Root CA certificate not
   valid). In theory, all the certificates shall be generated during the
   installation and be valid for the 364 days after the installation. It is
   still not the case. However, for the first time, no certificate was expired.
   The next certificates to renew are:

   - Music (2021-02-03)
   - VID (2021-03-17)
   - Message-router-external (2021-03-25)
   - CDS-UI (2021-02-18)
   - AAI and AAI-SPARKY-BE (2021-03-17)
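
 The renewal dates listed above can be turned into a simple reminder. The
 following is a minimal sketch and is not part of the ONAP test suite: the
 helper names are hypothetical, and the reference date (2020-12-08, the last day
 of collected data) is an assumption chosen for illustration.

 .. code-block:: python

     from datetime import date

     # Expiry dates copied from the list above.
     CERT_EXPIRY = {
         "Music": date(2021, 2, 3),
         "VID": date(2021, 3, 17),
         "Message-router-external": date(2021, 3, 25),
         "CDS-UI": date(2021, 2, 18),
         "AAI and AAI-SPARKY-BE": date(2021, 3, 17),
     }


     def days_until_expiry(expiry, today):
         """Return the number of days left before a certificate expires."""
         return (expiry - today).days


     def renewal_order(certs, today):
         """Return (name, days_left) pairs, soonest expiry first."""
         pairs = [(name, days_until_expiry(exp, today)) for name, exp in certs.items()]
         return sorted(pairs, key=lambda pair: pair[1])


     if __name__ == "__main__":
         # Hypothetical reference date: last day of the collected data.
         for name, days_left in renewal_order(CERT_EXPIRY, date(2020, 12, 8)):
             print(f"{name}: {days_left} days left")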

 .. image:: files/s3p/guilin_daily_infrastructure_healthcheck.png
    :align: center
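
 The 50% criterion can be read as two passing tests out of the four tests named
 above. This reading is an assumption, and the sketch below only illustrates the
 arithmetic; the function names are hypothetical.

 .. code-block:: python

     # Expected outcome of each infrastructure healthcheck test with Helm 2,
     # as described in this section.
     EXPECTED_OUTCOMES = {
         "onap-k8s": "PASS",
         "onap-k8s-teardown": "PASS",
         "onap-helm": "FAIL",
         "nodeport_check_certs": "FAIL",
     }


     def pass_percentage(outcomes):
         """Percentage of tests whose outcome is PASS."""
         passed = sum(1 for status in outcomes.values() if status == "PASS")
         return 100.0 * passed / len(outcomes)


     def criterion_met(outcomes, threshold_percent):
         """True if the pass percentage reaches the expected threshold."""
         return pass_percentage(outcomes) >= threshold_percent


     if __name__ == "__main__":
         print(pass_percentage(EXPECTED_OUTCOMES), criterion_met(EXPECTED_OUTCOMES, 50))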

 Healthcheck Tests
 .................

 These tests are the traditional robot healthcheck tests and additional tests
 dealing with a single component.

 The expectation is **100% OK**.

 .. image:: files/s3p/guilin_daily_healthcheck.png
   :align: center

 Smoke Tests
 ...........

 These tests are end-to-end tests.
 See :ref:`the Integration Test page <integration-tests>` for details.

 The expectation is **100% OK**.

 .. figure:: files/s3p/guilin_daily_smoke.png
   :align: center

 An error has been detected on the SDC when performing parallel tests.
 See `SDC-3366 <https://jira.onap.org/browse/SDC-3366>`_ for details.

 Security Tests
 ..............

 These tests deal with security.
 See :ref:`the Integration Test page <integration-tests>` for details.

 The expectation is **66% OK**. The criterion is met.

 The rate could even be higher, as 2 failing tests are close to passing:

 - the unlimited pod test still fails because of a single pod: onap-ejbca.
 - the nonssl test fails due to so and os-vnfm adapter, which were supposed to
   be managed with the ingress (not possible for this release) and got a waiver
   in Frankfurt.

 .. figure:: files/s3p/guilin_daily_security.png
   :align: center

 A simple 24h healthcheck verification
 =====================================

 This test consists in running the Healthcheck tests every 10 minutes for
 24 hours.

 The test was run from the 6th of December to the 7th of December.

 The success rate was 100%.
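
 As an order of magnitude, one run every 10 minutes for 24 hours amounts to
 about 144 healthcheck runs. The sketch below only illustrates this arithmetic
 and how a success rate is derived from run outcomes; the function names are
 hypothetical and the list of outcomes is synthetic, not the recorded results.

 .. code-block:: python

     def expected_runs(duration_hours, period_minutes):
         """Number of runs started in the window, one every period_minutes."""
         return (duration_hours * 60) // period_minutes


     def success_rate(outcomes):
         """Percentage of PASS outcomes in a list of 'PASS'/'FAIL' strings."""
         if not outcomes:
             raise ValueError("no outcomes to evaluate")
         passed = sum(1 for outcome in outcomes if outcome == "PASS")
         return 100.0 * passed / len(outcomes)


     if __name__ == "__main__":
         runs = expected_runs(duration_hours=24, period_minutes=10)
         # Synthetic outcomes: every run passes, as reported above.
         outcomes = ["PASS"] * runs
         print(runs, success_rate(outcomes))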

 The results are stored in the
 `test database <http://testresults.opnfv.org/onap/api/v1/results?pod_name=onap_daily_pod4_master-ONAP-oom&case_name=full>`_.

 A 6 days stability test
 =======================

 This test consists in running the test basic_vm continuously for 1 week.

 We observe the cluster metrics as well as the evolution of the test duration.
 The test basic_vm is described in :ref:`the Integration Test page <integration-tests>`.

 Within a long duration test context, the test will onboard a service once then
 instantiate this service multiple times. Before instantiating, it will
 systematically contact the SDC and the AAI to verify that the resources already
 exist. In this context the most impacted component is SO, which was delivered
 relatively late compared to the other components.
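
 The "onboard once, instantiate many times" behaviour can be pictured with the
 following minimal sketch. It is not the actual test code: ``FakeCatalog``,
 ``ensure_onboarded`` and ``run_iteration`` are hypothetical names, and the
 in-memory set stands in for the SDC and AAI lookups.

 .. code-block:: python

     class FakeCatalog:
         """In-memory stand-in for a catalog such as SDC (hypothetical)."""

         def __init__(self):
             self._services = set()
             self.onboard_calls = 0

         def exists(self, name):
             return name in self._services

         def onboard(self, name):
             self.onboard_calls += 1
             self._services.add(name)


     def ensure_onboarded(catalog, service_name):
         """Onboard the service only if the catalog does not know it yet."""
         if not catalog.exists(service_name):
             catalog.onboard(service_name)


     def run_iteration(catalog, service_name, instances):
         """One test iteration: check resources, then instantiate the service."""
         ensure_onboarded(catalog, service_name)
         instance_id = f"{service_name}-{len(instances) + 1}"
         instances.append(instance_id)
         return instance_id


     if __name__ == "__main__":
         catalog, instances = FakeCatalog(), []
         for _ in range(3):
             run_iteration(catalog, "basic_vm", instances)
         print(catalog.onboard_calls, instances)

 In this sketch the onboarding cost is paid on the first iteration only, while
 the existence check and the instantiation are repeated on every iteration.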

 Basic_vm test
 .............

 The basic_vm test consists of the following steps:

 - [SDC] VendorOnboardStep: Onboard vendor in SDC.
 - [SDC] YamlTemplateVspOnboardStep: Onboard vsp described in YAML file in SDC.
 - [SDC] YamlTemplateVfOnboardStep: Onboard vf described in YAML file in SDC.
 - [SDC] YamlTemplateServiceOnboardStep: Onboard service described in YAML file
   in SDC.
 - [AAI] RegisterCloudRegionStep: Register cloud region.
 - [AAI] ComplexCreateStep: Create complex.
 - [AAI] LinkCloudRegionToComplexStep: Connect cloud region with complex.
 - [AAI] CustomerCreateStep: Create customer.
 - [AAI] CustomerServiceSubscriptionCreateStep: Create customer's service
   subscription.
 - [AAI] ConnectServiceSubToCloudRegionStep: Connect service subscription with
   cloud region.
 - [SO] YamlTemplateServiceAlaCarteInstantiateStep: Instantiate service described
   in YAML using SO à la carte method.
 - [SO] YamlTemplateVnfAlaCarteInstantiateStep: Instantiate vnf described in YAML
   using SO à la carte method.
 - [SO] YamlTemplateVfModuleAlaCarteInstantiateStep: Instantiate VF module
   described in YAML using SO à la carte method.

 The test has been initiated on a weekly lab on the 2nd of December.
 The results provided hereafter correspond to the period from 2020-12-02 to
 2020-12-08.

 .. csv-table:: Basic_vm results
    :file: ./files/csv/stability_basic_vm.csv
    :widths: 70, 30
    :delim: ;
    :header-rows: 1

 .. note::

    The corrected success rate excludes the FAIL results obtained during the SDNC
    saturation phase.
    The cause of the errors shall be analyzed in more detail. The vast majority
    of errors (79%) occur on SO service creation, 18% on VNF creation and 3% on
    module creation.
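
 One possible reading of the corrected success rate is sketched below, assuming
 that each result carries a timestamp and that the SDNC saturation phases are
 known as time windows. The data, the windows and the function names are
 hypothetical; the real figures are those of the table above.

 .. code-block:: python

     from datetime import datetime


     def success_rate(results):
         """Percentage of PASS among (timestamp, status) results."""
         if not results:
             return 0.0
         passed = sum(1 for _, status in results if status == "PASS")
         return 100.0 * passed / len(results)


     def in_any_window(timestamp, windows):
         """True if timestamp falls inside one of the (start, end) windows."""
         return any(start <= timestamp <= end for start, end in windows)


     def corrected_success_rate(results, saturation_windows):
         """Success rate after dropping FAIL results recorded during saturation."""
         kept = [
             (ts, status)
             for ts, status in results
             if not (status == "FAIL" and in_any_window(ts, saturation_windows))
         ]
         return success_rate(kept)


     if __name__ == "__main__":
         # Synthetic results, for illustration only.
         results = [
             (datetime(2020, 12, 3, 8), "PASS"),
             (datetime(2020, 12, 3, 9), "FAIL"),   # inside the saturation window
             (datetime(2020, 12, 3, 10), "FAIL"),  # inside the saturation window
             (datetime(2020, 12, 3, 12), "PASS"),
             (datetime(2020, 12, 3, 13), "FAIL"),  # unrelated failure, kept
             (datetime(2020, 12, 3, 14), "PASS"),
         ]
         windows = [(datetime(2020, 12, 3, 9), datetime(2020, 12, 3, 11))]
         print(success_rate(results), corrected_success_rate(results, windows))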

 .. important::
    The test success rate is about 86%.
    CPU consumption is low (see next section).
    Memory consumption is high.

    After ~ 24-48h, the test systematically fails. The trace shows that the SDNC
    is no longer responding. This error required a manual restart of the SDNC.
    It seems that the SDNC exceeds its limits set in OOM. A simple manual
    restart was enough (deleting the pod): the test run after the restart is
    PASS, and stays PASS most of the time for the next 24-48h.

 We can observe the consequences of the manual restart of the SDNC on its memory
 graph as well as the memory threshold.

 .. figure:: files/s3p/stability_sdnc_memory.png
   :align: center

 The duration of the test is increasing slowly over the week and can be described
 as follows:

 .. figure:: files/s3p/basic_vm_duration.png
   :align: center

 If we consider the histogram, we can see the distribution of the duration.

 .. figure:: files/s3p/basic_vm_duration_histo.png
   :align: center

 In conclusion, the solution seems stable.

 The memory issue detected in the SDNC may be due to a bad sizing of the limits
 and requests in OOM, but a slight memory leak cannot be excluded.
 The workaround consisting in restarting the SDNC seems to fix the issue.
 The issue is tracked in `SDNC-1430 <https://jira.onap.org/browse/SDNC-1430>`_.
 Further study shall be done on this topic to consolidate the detection of the
 root cause.

 Cluster metrics
 ...............

 The metrics of the ONAP cluster over this 6-day period are given in the
 following tables:

 .. csv-table:: CPU
    :file: ./files/csv/stability_cluster_metric_cpu.csv
    :widths: 20,10,10,10,10,10,10,10
    :delim: ;
    :header-rows: 1

 .. csv-table:: Memory
   :file: ./files/csv/stability_cluster_metric_memory.csv
   :widths: 20,10,10,10,10,10,10,10
   :delim: ;
   :header-rows: 1

 .. csv-table:: Network
    :file: ./files/csv/stability_cluster_metric_network.csv
    :widths: 10,15,15,15,15,15,15
    :delim: ;
    :header-rows: 1

 The Top Ten for Memory consumption is given in the table below:

 .. csv-table:: Memory
   :file: ./files/csv/stability_top10_memory.csv
   :widths: 20,15,15,20,15,15
   :delim: ;
   :header-rows: 1

 At least 9 components exceed their Memory Requests, and 7 are over the Memory
 limits set in OOM: the 2 OpenDaylight controllers and the Cassandra databases.
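
 The comparison behind this statement can be sketched as follows. The component
 names and figures are hypothetical placeholders (the measured values are in the
 table above), and ``classify`` is an illustrative helper, not an OOM or
 Kubernetes API.

 .. code-block:: python

     def classify(usage_mib, request_mib, limit_mib):
         """Compare measured memory usage with the configured request and limit."""
         if usage_mib > limit_mib:
             return "over limit"
         if usage_mib > request_mib:
             return "over request"
         return "within request"


     if __name__ == "__main__":
         # (usage, request, limit) in MiB; placeholder values, not measurements.
         components = {
             "component-a": (5200, 2048, 4096),
             "component-b": (3000, 2048, 4096),
             "component-c": (1500, 2048, 4096),
         }
         for name, (usage, request, limit) in components.items():
             print(f"{name}: {classify(usage, request, limit)}")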

 As indicated, CPU consumption is negligible and is not a dimensioning factor.
 It shall be reconsidered for use cases including extensive computation (loops,
 optimization algorithms).