.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0
.. Copyright 2018 Amdocs, Bell Canada

.. Links
.. _Curated applications for Kubernetes: https://github.com/kubernetes/charts
.. _Services: https://kubernetes.io/docs/concepts/services-networking/service/
.. _ReplicaSet: https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/
.. _StatefulSet: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
.. _Helm Documentation: https://docs.helm.sh/helm/
.. _Helm: https://docs.helm.sh/
.. _Kubernetes: https://Kubernetes.io/

.. _user-guide-label:

OOM User Guide
##############

The ONAP Operations Manager (OOM) provides the ability to manage the entire
life-cycle of an ONAP installation, from the initial deployment to final
decommissioning. This guide provides instructions for users of ONAP to
use the Kubernetes_/Helm_ system as a complete ONAP management system.

This guide provides many examples of Helm command line operations. For a
complete description of these commands please refer to the `Helm
Documentation`_.

.. figure:: oomLogoV2-medium.png
   :align: right

The following sections describe the life-cycle operations:

- Deploy_ - with built-in component dependency management
- Configure_ - unified configuration across all ONAP components
- Monitor_ - real-time health monitoring feeding to a Consul UI and Kubernetes
- Heal_ - failed ONAP containers are recreated automatically
- Scale_ - cluster ONAP services to enable seamless scaling
- Upgrade_ - change-out containers or configuration with little or no service impact
- Delete_ - cleanup individual containers or entire deployments

.. figure:: oomLogoV2-Deploy.png
   :align: right

Deploy
======

The OOM team, with assistance from the ONAP project teams, has built a
comprehensive set of Helm charts (YAML files very similar to TOSCA files) that
describe the composition of each of the ONAP components and the relationships
within and between components. Using this model, Helm is able to deploy all of
ONAP with this simple command::

  > helm install osn/onap

.. note::
   The osn repo is not currently available so creation of a local repository is
   required.

Helm is able to use charts served up from a repository and comes set up with a
default CNCF-provided `Curated applications for Kubernetes`_ repository called
stable, which should be removed to avoid confusion::

  > helm repo remove stable

.. To setup the Open Source Networking Nexus repository for helm enter::
..    > helm repo add osn 'https://nexus3.onap.org:10001/helm/helm-repo-in-nexus/master/'

To prepare your system for an installation of ONAP, you'll need to::

  > git clone http://gerrit.onap.org/r/oom
  > cd oom/kubernetes

Then build your local Helm repository::

  > make all

To set up a local Helm server to serve up the ONAP charts::

  > helm serve &

Note the port number that is listed and use it in the Helm repo add as follows::

  > helm repo add local http://127.0.0.1:8879

To get a list of all of the available Helm chart repositories::

  > helm repo list
  NAME    URL
  local   http://127.0.0.1:8879

The Helm search command reads through all of the repositories configured on the
system, and looks for matches::

  > helm search -l
  NAME           VERSION  DESCRIPTION
  local/appc     2.0.0    Application Controller
  local/clamp    2.0.0    ONAP Clamp
  local/common   2.0.0    Common templates for inclusion in other charts
  local/onap     2.0.0    Open Network Automation Platform (ONAP)
  local/robot    2.0.0    A helm Chart for kubernetes-ONAP Robot
  local/so       2.0.0    ONAP Service Orchestrator

In any case, setup of the Helm repository is a one-time activity.

Once the repo is set up, installation of ONAP can be done with a single command::

  > helm install local/onap --name development

This will install ONAP from a local repository in a 'development' Helm release.
As described below, to override the default configuration values provided by
OOM, an environment file can be provided on the command line as follows::

  > helm install local/onap --name development -f onap-development.yaml

To get a summary of the status of all of the pods (containers) running in your
deployment::

  > kubectl get pods --all-namespaces -o=wide

.. note::
   The Kubernetes namespace concept allows for multiple instances of a component
   (such as all of ONAP) to co-exist with other components in the same
   Kubernetes cluster by isolating them entirely. Namespaces share only the
   hosts that form the cluster, thus providing isolation between production and
   development systems as an example. The OOM deployment of ONAP in Beijing is
   now done within a single Kubernetes namespace, whereas in Amsterdam a
   namespace was created for each of the ONAP components.

.. note::
   The Helm `--name` option refers to a release name and not a Kubernetes namespace.


To install a specific version of a single ONAP component (`so` in this example)
with the given name enter::

  > helm install onap/so --version 2.0.1 -n so

To display details of a specific resource or group of resources type::

  > kubectl describe pod so-1071802958-6twbl

where the pod identifier refers to the auto-generated pod identifier.

.. figure:: oomLogoV2-Configure.png
   :align: right

Configure
=========

Each project within ONAP has its own configuration data generally consisting
of: environment variables, configuration files, and database initial values.
Many technologies are used across the projects resulting in significant
operational complexity and an inability to apply global parameters across the
entire ONAP deployment. OOM solves this problem by introducing a common
configuration technology, Helm charts, that provide a hierarchical
configuration with the ability to override values with higher
level charts or command line options.

The structure of the configuration of ONAP is shown in the following diagram.
Note that key/value pairs of a parent will always take precedence over those
of a child. Also note that values set on the command line have the highest
precedence of all.

.. graphviz::

   digraph config {
      {
         node     [shape=folder]
         oValues  [label="values.yaml"]
         demo     [label="onap-demo.yaml"]
         prod     [label="onap-production.yaml"]
         oReq     [label="requirements.yaml"]
         soValues [label="values.yaml"]
         soReq    [label="requirements.yaml"]
         mdValues [label="values.yaml"]
      }
      {
         oResources  [label="resources"]
      }
      onap -> oResources
      onap -> oValues
      oResources -> environments
      oResources -> oReq
      oReq -> so
      environments -> demo
      environments -> prod
      so -> soValues
      so -> soReq
      so -> charts
      charts -> mariadb
      mariadb -> mdValues

   }

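The precedence rules described above can be illustrated with a short Python
sketch. This is not how Helm is implemented; `deep_merge` and the sample values
are hypothetical, and only the ordering (a child chart's defaults, then the
parent's values, then command line settings) is taken from the text.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which values from `override` win over `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical inputs, lowest precedence first.
so_chart_defaults = {"replicaCount": 1, "liveness": {"enabled": True}}
parent_values = {"so": {"replicaCount": 2}}
command_line = {"so": {"liveness": {"enabled": False}}}

effective_so = deep_merge(
    deep_merge(so_chart_defaults, parent_values.get("so", {})),
    command_line.get("so", {}),
)
print(effective_so)
```

The child default for `replicaCount` is replaced by the parent's value, and the
command line setting overrides both for `liveness.enabled`.
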
The top level onap/values.yaml file contains the values required to be set
before deploying ONAP. Here are the contents of this file:

.. include:: onap_values.yaml
   :code: yaml

One may wish to create a value file that is specific to a given deployment such
that it can be differentiated from other deployments. For example, an
onap-development.yaml file may create a minimal environment for development
while onap-production.yaml might describe a production deployment that operates
independently of the developer version.

For example, if the production OpenStack instance was different from a
developer's instance, the onap-production.yaml file may contain a different
value for the vnfDeployment/openstack/oam_network_cidr key as shown below.

.. code-block:: yaml

  nsPrefix: onap
  nodePortPrefix: 302
  apps: consul msb mso message-router sdnc vid robot portal policy appc aai
    sdc dcaegen2 log cli multicloud clamp vnfsdk aaf kube2msb
  dataRootDir: /dockerdata-nfs

  # docker repositories
  repository:
    onap: nexus3.onap.org:10001
    oom: oomk8s
    aai: aaionap
    filebeat: docker.elastic.co

  image:
    pullPolicy: Never

  # vnf deployment environment
  vnfDeployment:
    openstack:
      ubuntu_14_image: "Ubuntu_14.04.5_LTS"
      public_net_id: "e8f51956-00dd-4425-af36-045716781ffc"
      oam_network_id: "d4769dfb-c9e4-4f72-b3d6-1d18f4ac4ee6"
      oam_subnet_id: "191f7580-acf6-4c2b-8ec0-ba7d99b3bc4e"
      oam_network_cidr: "192.168.30.0/24"
  <...>


To deploy ONAP with this environment file, enter::

  > helm install local/onap -n beijing -f environments/onap-production.yaml

.. include:: environments_onap_demo.yaml
   :code: yaml

When deploying all of ONAP, a requirements.yaml file controls which ONAP
components are included and at what version. Here is an excerpt of this
file:

.. code-block:: yaml

  # Referencing a named repo called 'local'.
  # Can add this repo by running commands like:
  # > helm serve
  # > helm repo add local http://127.0.0.1:8879
  dependencies:
  <...>
    - name: so
      version: ~2.0.0
      repository: '@local'
      condition: so.enabled
  <...>

The ~ operator in the `so` version value indicates that the latest "2.0.X"
version of `so` shall be used, thus allowing the chart to pick up patch-level
upgrades that don't impact the `so` API; hence, version 2.0.1 will be installed
in this case.

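As a rough illustration of how such a constraint selects a version, the
following Python sketch assumes the usual tilde-range reading, in which
`~2.0.0` means "at least 2.0.0 and below 2.1.0". The helper names and the list
of available versions are hypothetical, and pre-release tags are not handled.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def matches_tilde(constraint, candidate):
    """True if `candidate` satisfies a '~MAJOR.MINOR.PATCH' constraint."""
    base = parse_version(constraint.lstrip("~"))
    version = parse_version(candidate)
    upper = (base[0], base[1] + 1, 0)  # next minor version, exclusive
    return base <= version < upper


def pick_latest(constraint, available):
    """Return the highest available version allowed by the constraint."""
    allowed = [v for v in available if matches_tilde(constraint, v)]
    return max(allowed, key=parse_version) if allowed else None


# Hypothetical repository contents.
available = ["2.0.0", "2.0.1", "2.1.0", "3.0.0"]
print(pick_latest("~2.0.0", available))  # prints 2.0.1
```
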
The onap/resources/environment/onap-dev.yaml file (see the excerpt below)
enables fine-grained control over which components are included as part of this
deployment. By changing this `so` line to `enabled: false` the `so` component
will not be deployed. If this change is part of an upgrade the existing `so`
component will be shut down. Other `so` parameters and even `so` child values
can be modified, for example the `so`'s `liveness` probe could be disabled
(which is not recommended as this change would disable auto-healing of `so`).

.. code-block:: yaml

  #################################################################
  # Global configuration overrides.
  #
  # These overrides will affect all helm charts (ie. applications)
  # that are listed below and are 'enabled'.
  #################################################################
  global:
  <...>

  #################################################################
  # Enable/disable and configure helm charts (ie. applications)
  # to customize the ONAP deployment.
  #################################################################
  aaf:
    enabled: false
  <...>
  so: # Service Orchestrator
    enabled: true

    replicaCount: 1

    liveness:
      # necessary to disable liveness probe when setting breakpoints
      # in debugger so K8s doesn't restart unresponsive container
      enabled: true

  <...>

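The interplay between a dependency's `condition` entry and the `enabled` flags
in an environment file can be sketched in Python as follows. The function names
are hypothetical, the `aaf.enabled` condition is assumed by analogy with
`so.enabled`, and a missing path is assumed to leave the component enabled.

```python
def lookup(values, dotted_path, default=True):
    """Follow a dotted path such as 'so.enabled' through nested dicts."""
    node = values
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default  # assumption: an absent flag does not disable
        node = node[key]
    return node


def enabled_components(dependencies, values):
    """Names of dependencies whose `condition` evaluates to true."""
    return [
        dep["name"]
        for dep in dependencies
        if lookup(values, dep["condition"])
    ]


dependencies = [
    {"name": "aaf", "condition": "aaf.enabled"},  # assumed condition
    {"name": "so", "condition": "so.enabled"},
]
values = {"aaf": {"enabled": False}, "so": {"enabled": True}}
print(enabled_components(dependencies, values))  # prints ['so']
```

Setting `so.enabled` to false in `values` would drop `so` from the result,
which mirrors the behaviour described for the environment file.
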
.. figure:: oomLogoV2-Monitor.png
   :align: right

Monitor
=======

All highly available systems include at least one facility to monitor the
health of components within the system. Such health monitors are often used as
inputs to distributed coordination systems (such as etcd, zookeeper, or consul)
and monitoring systems (such as nagios or zabbix). OOM provides two mechanisms
to monitor the real-time health of an ONAP deployment:

- a Consul GUI for a human operator or downstream monitoring systems, and
- a set of liveness probes which feed into the Kubernetes manager and enable
  automatic healing of failed containers, as described in the Heal section.

Within ONAP, Consul is the monitoring system of choice and is deployed by OOM
in two parts:

- a three-way, centralized Consul server cluster is deployed as a highly
  available monitor of all of the ONAP components, and
- a number of Consul agents.

The Consul server provides a user interface that allows a user to graphically
view the current health status of all of the ONAP components for which agents
have been created - a sample from the ONAP Integration labs follows:

.. figure:: consulHealth.png
   :align: center

To see the real-time health of a deployment go to: http://<kubernetes IP>:30270/ui/
where a GUI much like the one shown above will be found.


.. figure:: oomLogoV2-Heal.png
   :align: right

Heal
====

The ONAP deployment is defined by Helm charts as mentioned earlier. These Helm
charts are also used to implement automatic recoverability of ONAP components
when individual components fail. Once ONAP is deployed, a "liveness" probe
starts checking the health of the components after a specified startup time.

Should a liveness probe indicate a failed container it will be terminated and a
replacement will be started in its place - containers are ephemeral. Should the
deployment specification indicate that there are one or more dependencies to
this container or component (for example a dependency on a database) the
dependency will be satisfied before the replacement container/component is
started. This mechanism ensures that, after a failure, all of the ONAP
components restart successfully.

To test healing, the following command can be used to delete a pod::

  > kubectl delete pod [pod name] -n [pod namespace]

One could then use the following command to monitor the pods and observe the
pod being terminated and the service being automatically healed with the
creation of a replacement pod::

  > kubectl get pods --all-namespaces -o=wide

.. figure:: oomLogoV2-Scale.png
   :align: right

Scale
=====

Many of the ONAP components are horizontally scalable which allows them to
adapt to expected offered load. During the Beijing release scaling is static;
that is, during deployment or upgrade a cluster size is defined and this cluster
will be maintained even in the presence of faults. The parameter that controls
the cluster size of a given component is found in the values.yaml file for that
component. Here is an excerpt that shows this parameter:

.. code-block:: yaml

  # default number of instances
  replicaCount: 1

In order to change the size of a cluster, an operator could use a helm upgrade
(described in detail in the next section) as follows::

  > helm upgrade --set replicaCount=3 onap/so/mariadb

The ONAP components use Kubernetes provided facilities to build clustered,
highly available systems including: Services_ with load-balancers, ReplicaSet_,
and StatefulSet_. Some of the open-source projects used by the ONAP components
directly support clustered configurations, for example ODL and MariaDB Galera.

The Kubernetes Services_ abstraction provides a consistent access point for
each of the ONAP components, independent of the pod or container architecture
of that component. For example, SDN-C uses OpenDaylight clustering with a
default cluster size of three but uses a Kubernetes service to abstract this
cluster from the other ONAP components. The cluster could therefore change
size, and this change would be isolated from the other ONAP components by the
load-balancer implemented in the ODL service abstraction.

A ReplicaSet_ is a construct that is used to describe the desired state of the
cluster. For example 'replicas: 3' indicates to Kubernetes that a cluster of 3
instances is the desired state. Should one of the members of the cluster fail,
a new member will be automatically started to replace it.

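The idea of converging on a desired state can be sketched with a toy
reconciliation function in Python. This only illustrates the concept and is not
Kubernetes code; the function, its parameters, and the pod names are all
hypothetical.

```python
def reconcile(desired_replicas, running, next_id):
    """Return the pod list after one reconciliation pass.

    `running` holds the names of healthy pods; failed pods are assumed to
    have been removed from it already. Replacement pods get made-up names.
    """
    pods = list(running)
    while len(pods) < desired_replicas:
        pods.append(f"pod-{next_id}")
        next_id += 1
    # Surplus pods are trimmed so the set converges on the desired size.
    return pods[:desired_replicas]


running = ["pod-0", "pod-1", "pod-2"]
running.remove("pod-1")  # simulate a failed member
print(reconcile(3, running, next_id=3))  # prints ['pod-0', 'pod-2', 'pod-3']
```
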
Some of the ONAP components may need a more deterministic deployment; for
example to enable intra-cluster communication. For these applications the
component can be deployed as a Kubernetes StatefulSet_ which will maintain a
persistent identifier for the pods and thus a stable network id for the pods.
For example: the pod names might be web-0, web-1, ..., web-{N-1} for N 'web'
pods with corresponding DNS entries such that intra service communication is
simple even if the pods are physically distributed across multiple nodes. An
example of how these capabilities can be used is described in the Running
Consul on Kubernetes tutorial.

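The naming scheme can be made concrete with a small Python sketch. The function
names are hypothetical, and the DNS pattern assumes a headless governing
service and the default `cluster.local` cluster domain; the service name and
namespace used below are placeholders.

```python
def stateful_pod_names(name, replicas):
    """Ordinal pod names: name-0 through name-(replicas - 1)."""
    return [f"{name}-{ordinal}" for ordinal in range(replicas)]


def pod_dns_names(name, replicas, service, namespace):
    """Per-pod DNS names, assuming a headless service governs the set."""
    return [
        f"{pod}.{service}.{namespace}.svc.cluster.local"
        for pod in stateful_pod_names(name, replicas)
    ]


print(stateful_pod_names("web", 3))
print(pod_dns_names("web", 3, service="nginx", namespace="default"))
```

Because each name is derived from the ordinal rather than from a random
suffix, a replacement pod keeps the same identity as the pod it replaces.
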
.. figure:: oomLogoV2-Upgrade.png
   :align: right

Upgrade
=======

Helm has built-in capabilities to enable the upgrade of pods without causing a
loss of the service being provided by that pod or pods (if configured as a
cluster). As described in the OOM Developer's Guide, ONAP components provide
an abstracted 'service' end point with the pods or containers providing this
service hidden from other ONAP components by a load balancer. This capability
is used during upgrades to allow a pod with a new image to be added to the
service before removing the pod with the old image. This 'make before break'
capability ensures minimal downtime.

Prior to doing an upgrade, determine the status of the deployed charts::

  > helm list
  NAME  REVISION  UPDATED                   STATUS    CHART     NAMESPACE
  so    1         Mon Feb 5 10:05:22 2018   DEPLOYED  so-2.0.1  default

When upgrading a cluster a parameter controls the minimum size of the cluster
during the upgrade while another parameter controls the maximum number of nodes
in the cluster. For example, SDNC configured as a 3-way ODL cluster might
require that during the upgrade no fewer than 2 pods are available at all times
to provide service while no more than 5 pods are ever deployed across the two
versions at any one time to avoid depleting the cluster of resources. In this
scenario, the SDNC cluster would start with 3 old pods then Kubernetes may add
a new pod (3 old, 1 new), delete one old (2 old, 1 new), add two new pods (2
old, 3 new) and finally delete the 2 old pods (3 new). During this sequence
the constraints of the minimum of two pods and maximum of five would be
maintained while providing service the whole time.

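The sequence described above can be checked against the two constraints with a
few lines of Python. This is only a sketch: the function name is hypothetical,
and every pod counted is assumed to be available to provide service.

```python
def check_rollout(steps, min_available, max_total):
    """Return the first (old, new) step that breaks a constraint, or None."""
    for old, new in steps:
        total = old + new
        if total < min_available or total > max_total:
            return (old, new)
    return None


# (old pods, new pods) at each stage of the sequence described above.
steps = [(3, 0), (3, 1), (2, 1), (2, 3), (0, 3)]
print(check_rollout(steps, min_available=2, max_total=5))  # prints None
```

A result of `None` means that no stage drops below two pods or exceeds five.
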
Initiation of an upgrade is triggered by changes in the Helm charts. For
example, if the image specified for one of the pods in the SDNC deployment
specification were to change (i.e. point to a new Docker image in the nexus3
repository - commonly through the change of a deployment variable), the
sequence of events described in the previous paragraph would be initiated.

For example, to upgrade a container by changing configuration, specifically an
environment value::

  > helm upgrade beijing onap/so --version 2.0.1 --set enableDebug=true

Issuing this command will result in the appropriate container being stopped by
Kubernetes and replaced with a new container with the new environment value.

To upgrade a component to a new version with a new configuration file enter::

  > helm upgrade beijing onap/so --version 2.0.2 -f environments/demo.yaml

To fetch release history enter::

  > helm history so
  REVISION  UPDATED                   STATUS      CHART     DESCRIPTION
  1         Mon Feb 5 10:05:22 2018   SUPERSEDED  so-2.0.1  Install complete
  2         Mon Feb 5 10:10:55 2018   DEPLOYED    so-2.0.2  Upgrade complete

Unfortunately, not all upgrades are successful. In recognition of this the
lineup of pods within an ONAP deployment is tagged such that an administrator
may force the ONAP deployment back to the previously tagged configuration or to
a specific configuration, say to jump back two steps if an incompatibility
between two ONAP components is discovered after the two individual upgrades
succeeded.

This rollback functionality gives the administrator confidence that in the
unfortunate circumstance of a failed upgrade the system can be rapidly brought
back to a known good state. This process of rolling upgrades while under
service is illustrated in this short YouTube video showing a Zero Downtime
Upgrade of a web application while under a 10 million transaction per second
load.

For example, to roll back to the previous system revision enter::

  > helm rollback so 1

  > helm history so
  REVISION  UPDATED                   STATUS      CHART     DESCRIPTION
  1         Mon Feb 5 10:05:22 2018   SUPERSEDED  so-2.0.1  Install complete
  2         Mon Feb 5 10:10:55 2018   SUPERSEDED  so-2.0.2  Upgrade complete
  3         Mon Feb 5 10:14:32 2018   DEPLOYED    so-2.0.1  Rollback to 1

.. note::

   The description field can be overridden to document actions taken or include
   tracking numbers.

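The history output above shows that a rollback does not erase revisions; it
adds a new one that reuses the chart of the target revision. The following
Python sketch models that bookkeeping. It is a toy model rather than Helm's
data model, and the class and method names are hypothetical.

```python
class ReleaseHistory:
    """Toy model of a release's revision list (not Helm's data model)."""

    def __init__(self, chart):
        self.revisions = [
            {"revision": 1, "status": "DEPLOYED", "chart": chart,
             "description": "Install complete"}
        ]

    def _append(self, chart, description):
        # Every earlier revision is superseded by the new one.
        for entry in self.revisions:
            entry["status"] = "SUPERSEDED"
        self.revisions.append({
            "revision": len(self.revisions) + 1,
            "status": "DEPLOYED",
            "chart": chart,
            "description": description,
        })

    def upgrade(self, chart):
        self._append(chart, "Upgrade complete")

    def rollback(self, revision):
        target = self.revisions[revision - 1]
        self._append(target["chart"], f"Rollback to {revision}")


history = ReleaseHistory("so-2.0.1")
history.upgrade("so-2.0.2")
history.rollback(1)
for entry in history.revisions:
    print(entry["revision"], entry["status"], entry["chart"],
          entry["description"])
```
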
Many of the ONAP components contain their own databases which are used to
record configuration or state information. The schemas of these databases may
change from version to version in such a way that data stored within the
database needs to be migrated between versions. If such a migration script is
available it can be invoked during the upgrade (or rollback) by Container
Lifecycle Hooks. Two such hooks are available, PostStart and PreStop, which
containers can access by registering a handler against one or both. Note that
it is the responsibility of the ONAP component owners to implement the hook
handlers - which could be a shell script or a call to a specific container HTTP
endpoint - following the guidelines listed on the Kubernetes site. Lifecycle
hooks are not restricted to database migration or even upgrades but can be used
anywhere specific operations need to be taken during lifecycle operations.

OOM uses the Helm Kubernetes package manager to deploy ONAP components. Each
component is arranged in a packaging format called a chart - a collection of
files that describe a set of Kubernetes resources. Helm allows for rolling
upgrades of the deployed ONAP components. To upgrade a component's Helm release
you will need an updated Helm chart. The chart might have modified, deleted or
added values, deployment yamls, and more. To get the release name use::

  > helm ls

To easily upgrade the release use::

  > helm upgrade [RELEASE] [CHART]

To roll back to a previous release version use::

  > helm rollback [flags] [RELEASE] [REVISION]

For example, to upgrade the onap-so helm release to the latest SO container
release v1.1.2:

- Edit the so values.yaml file, which is part of the chart
- Change "so: nexus3.onap.org:10001/openecomp/so:v1.1.1" to
  "so: nexus3.onap.org:10001/openecomp/so:v1.1.2"
- From the chart location run::

  > helm upgrade onap-so .

The previous so pod will be terminated and a new so pod with an updated so
container will be created.

.. figure:: oomLogoV2-Delete.png
   :align: right

Delete
======

Existing deployments can be partially or fully removed once they are no longer
needed. To minimize errors it is recommended that before deleting components
from a running deployment the operator perform a 'dry-run' to display exactly
what will happen with a given command prior to actually deleting anything. For
example::

  > helm delete --dry-run beijing

will display the outcome of deleting the 'beijing' release from the deployment.
To completely delete a release and remove it from the internal store enter::

  > helm delete --purge beijing

One can also remove individual components from a deployment by changing the
ONAP configuration values. For example, to remove `so` from a running
deployment enter::

  > helm upgrade beijing osn/onap --set so.enabled=false

This will remove `so` as the configuration indicates it's no longer part of the
deployment. This might be useful if one wanted to replace just `so` by
installing a custom version.
581installing a custom version.