| .. |
| .. Copyright (c) 2019 AT&T Intellectual Property. |
| .. Copyright (c) 2019 Nokia. |
| .. |
| .. Licensed under the Creative Commons Attribution 4.0 International |
| .. Public License (the "License"); you may not use this file except |
| .. in compliance with the License. You may obtain a copy of the License at |
| .. |
| .. https://creativecommons.org/licenses/by/4.0/ |
| .. |
| .. Unless required by applicable law or agreed to in writing, documentation |
| .. distributed under the License is distributed on an "AS IS" BASIS, |
| .. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| .. |
| .. See the License for the specific language governing permissions and |
| .. limitations under the License. |
| .. |
| |
| User-Guide |
| ========== |
| |
| .. contents:: |
| :depth: 3 |
| :local: |
| |
| RIC Alarm System |
| ---------------- |
| |
| Overview |
| -------- |
| RIC alarm system consists of three components: Alarm Manager, Application Library and Command Line Interface |
| |
| The Alarm Manager is responsible for managing alarm situations in RIC cluster and interfacing with Northbound applications |
| such as Prometheus Alert Manager to post the alarms as alerts. Alert Manager takes care of de-duplicating, silencing and |
| inhibition (suppressing) of alerts, and routing them to the VES-Agent, which, in turn, takes care of converting alerts to |
| faults and sending them to ONAP as VES events. |
| |
| The Alarm Library provides a simple interface for RIC applications (both platform application and xApps) to raise and clear |
| alarms. The Alarm Library interacts with the Alarm Manager via RMR interface. |
| |
| .. image:: images/RIC_Alarm_System.png |
| :width: 600 |
| :alt: Place in RIC's software architecture picture |
| |
| |
| Alarm Manager |
| ------------- |
| The Alarm Manager listens alarms coming via RMR and REST interfaces. An application can raise or clear alarms via either |
| of interfaces. Alarm Manager listens also commands coming from CLI (Command Line Interface). In addition Alarm Manager supports few |
| other commands that can be given through the interfaces. Such as list active alarms, list alarm history, add new alarms |
| definition, delete existing alarm definition, re-raise alarms and clear all alarms. Those are not typically used by applications while |
| running. Alarm Manager itself re-raises alarms periodically to keep alarms in active state. The other commands are can be used through |
| CLI interface by operator or are used when applications is starting up or restarting. |
| |
| Maximum amount of active alarms and size of alarm history are configurable. By default, the values are Maximum number of active |
| alarms = 5000, Maximum number of alarm history = 20,000. |
| |
| Alarm definitions can be updated dynamically via REST interface. Default definitions are read from JSON configuration file when FM |
| service is deployed. |
| |
| |
| Alarm Library |
| ------------- |
| The Alarm Library provides simple interface for RIC applications (both platform application and xApps) to raise and clear |
| alarms. A new alarm instance is created with InitAlarm()-function. ManagedObject (mo) and Application (ap) identities are |
| given as parameters for Alarm Context/Object |
| |
| The Alarm object contains following parameters: |
| |
| SpecificProblem: problem that is the cause of the alarm \(* |
| |
| PerceivedSeverity: The severity of the alarm, see below for possible values |
| |
| ManagedObjectId: The name of the managed object that is the cause of the fault \(* |
| |
| ApplicationId: The name of the process raised the alarm \(* |
| |
| AdditionalInfo: Additional information given by the application |
| |
| IdentifyingInfo: Identifying additional information, which is part of alarm identity \(* |
| |
| Items marked with \*, i.e., ManagedObjectId (mo), SpecificProblem (sp), ApplicationId (ap) and IdentifyingInfo (IdentifyingInfo) make |
| up the identity of the alarm. All parameters must be according to the alarm definition, i.e. all mandatory parameters should be present, |
| and parameters should have correct value type or be from some predefined range. Addressing the same alarm instance in a clear() or reraise() |
| call is done by making sure that all four values are the same is in the original raise() / reraise() call. |
| |
| Alarm Manager does not allow raising "same alarm" more than once without that the alarm is cleared first. Alarm Manager compares |
| ManagedObjectId (mo), SpecificProblem (sp), ApplicationId (ap) and IdentifyingInfo (IdentifyingInfo) parameters to check possible |
| duplicate. If the values are the same then alarm is suppressed. If application raises the "same alarm" but PerceivedSeverity of the alarm |
| is changed then Alarm Manager deletes the old alarm and makes new alarm according to new information. |
| |
| |
| Alarm APIs |
| |
| Raise: Raises the alarm instance given as a parameter |
| |
| Clear: Clears the alarm instance given as a parameter, if it the alarm active |
| |
| Reraise: Attempts to re-raise the alarm instance given as a parameter |
| |
| ClearAll: Clears all alarms matching moId and appId given as parameters (not supported yet) |
| |
| |
| Command line interface |
| ---------------------- |
| |
| Through CLI operator can do the following operations: |
| |
| - Check active alarms |
| - Check alarm history |
| - Raise an alarm |
| - Clear an alarm |
| - Configure maximum active alarms and maximum alarms in alarm history |
| - Add new alarm definitions that can be raised |
| - Delete existing alarm definition that can be raised |
| |
| CLI commands need to be given inside Alarm Manger pod. To get there first print name of the Alarm Manger pod. |
| |
| .. code-block:: none |
| |
| kubectl get pods -A | grep alarmmanager |
| |
| Output should be look someting like this: |
| |
| .. code-block:: none |
| |
| ricplt deployment-ricplt-alarmmanager-6cc8764749-gnwjh 1/1 running 0 15d |
| |
| Then give this command to enter inside the pod. Replace the pod name with the actual name from the printout. |
| |
| .. code-block:: none |
| |
| kubectl exec -it deployment-ricplt-alarmmanager-6cc8764749-gnwjh bash |
| |
| CLI commands can have some of the following parameters |
| |
| .. code-block:: none |
| |
| --moid ManagedObjectId, example string: RIC |
| --apid ApplicationId string, example string: UEEC |
| --sp SpecificProblem, example value: 8007 |
| --severity Severity of the alarm, possible values: UNSPECIFIED, CRITICAL, MAJOR, MINOR, WARNING, CLEARED or DEFAULT |
| --iinfo Identifying info, a user specified string, example string: INFO-1 |
| --mal Maximum number of active alarms, example value 1000 |
| --mah Maximum number of alarms in alarm history, example value: 2000 |
| --aid Alarm id, example value: 8007 |
| --atx Alarm text string, example string: E2 CONNECTIVITY LOST TO E-NODEB |
| --ety Event type string, example string: Communication error |
| --oin Operation instructions string, example string: Not defined |
| --rad Raise alarm delay in seconds. Default value = 0 |
| --cad Clear alarm delay in seconds. Default value = 0 |
| --prf Performance profile id, possible values: 1 = peak performance test or 2 = endurance test |
| --nal Number of alarms, example value: 50 |
| --aps Alarms per second, example value: 1 |
| --tim Total time of test in minutes, example value: 1 |
| --host Alarm Manager host address. Default value = localhost |
| --port Alarm Manager port. Default value = 8080 |
| --if Used Alarm Manager command interface, http or rmr: default value = http |
| --active Active alerts in Prometheus Alert Manager. Default value = true |
| --inhibited Inhibited alerts in Prometheus Alert Manager. Default value = true |
| --silenced Silenced alerts in Prometheus Alert Manager. Default value = true |
| --unprocessed Unprocessed alerts in Prometheus Alert Manager. Default value = true |
| --host Prometheus Alert Manager host address |
| --port Prometheus Alert Manager port. Default value = 9093 |
| |
| |
| ``Note that there are two minus signs before parameter name!`` |
| |
| If parameter contains any white spaces then it must be enclosed in quotation marks like: "INFO 1" |
| |
| CLI command examples: |
| |
| Following command are given at top level directory! |
| |
| Check active alarms: |
| |
| .. code-block:: none |
| |
| Syntax: cli/alarm-cli active [--host] [--port] |
| |
| Example: cli/alarm-cli active |
| |
| Example: cli/alarm-cli active --host localhost --port 8080 |
| |
| Check alarm history: |
| |
| .. code-block:: none |
| |
| Syntax: cli/alarm-cli active [--host] [--port] |
| |
| Example: cli/alarm-cli history |
| |
| Example: cli/alarm-cli history --host localhost --port 8080 |
| |
| Raise alarm: |
| |
| .. code-block:: none |
| |
| Syntax: cli/alarm-cli raise --moid --apid --sp --severity --iinfo [--host] [--port] [--if] |
| |
| Example: cli/alarm-cli raise --moid RIC --apid UEEC --sp 8007 --severity CRITICAL --iinfo INFO-1 |
| |
| Following is meant only for testing and verification purpose! |
| |
| Example: cli/alarm-cli raise --moid RIC --apid UEEC --sp 8007 --severity CRITICAL --iinfo INFO-1 --host localhost --port 8080 --if rmr |
| |
| Clear alarm: |
| |
| .. code-block:: none |
| |
| Syntax: cli/alarm-cli clear --moid --apid --sp --severity --iinfo [--host] [--port] [--if] |
| |
| Example: cli/alarm-cli clear --moid RIC --apid UEEC --sp 8007 --iinfo INFO-1 |
| |
| Example: cli/alarm-cli clear --moid RIC --apid UEEC --sp 8007 --iinfo INFO-1 --host localhost --port 8080 --if rmr |
| |
| Configure maximum active alarms and maximum alarms in alarm history: |
| |
| .. code-block:: none |
| |
| Syntax: cli/alarm-cli configure --mal --mah [--host] [--port] |
| |
| Example: cli/alarm-cli configure --mal 1000 --mah 5000 |
| |
| Example: cli/alarm-cli configure --mal 1000 --mah 5000 --host localhost --port 8080 |
| |
| Add new alarm definition: |
| |
| .. code-block:: none |
| |
| Syntax: cli/alarm-cli define --aid 8007 --atx "E2 CONNECTIVITY LOST TO E-NODEB" --ety "Communication error" --oin "Not defined" [--rad] [--cad] [--host] [--port] |
| |
| Example: cli/alarm-cli define --aid 8007 --atx "E2 CONNECTIVITY LOST TO E-NODEB" --ety "Communication error" --oin "Not defined" --rad 0 --cad 0 |
| |
| Example: cli/alarm-cli define --aid 8007 --atx "E2 CONNECTIVITY LOST TO E-NODEB" --ety "Communication error" --oin "Not defined" --rad 0 --cad 0 --host localhost --port 8080 |
| |
| Delete existing alarm definition: |
| |
| .. code-block:: none |
| |
| Syntax: cli/alarm-cli undefine --aid [--host] [--port] |
| |
| Example: cli/alarm-cli undefine --aid 8007 |
| |
| Example: cli/alarm-cli undefine --aid 8007 --host localhost --port 8080 |
| |
| Conduct performance test: |
| |
| Note that this is meant only for testing and verification purpose! |
| |
| Before any performance test command can be issued, an environment variable needs to be set. The variable holds information where |
| test alarm object file is stored. |
| |
| .. code-block:: none |
| |
| PERF_OBJ_FILE=cli/perf-alarm-object.json |
| |
| Syntax: cli/alarm-cli perf --prf --nal --aps --tim [--host] [--port] [--if] |
| |
| Peak performance test example: cli/alarm-cli perf --prf 1 --nal 50 --aps 1 --tim 1 --if rmr |
| |
| Peak performance test example: cli/alarm-cli perf --prf 1 --nal 50 --aps 1 --tim 1 --if http |
| |
| Peak performance test example: cli/alarm-cli perf --prf 1 --nal 50 --aps 1 --tim 1 --host localhost --port 8080 --if rmr |
| |
| Endurance test example: cli/alarm-cli perf --prf 2 --nal 50 --aps 1 --tim 1 --if rmr |
| |
| Endurance test example: cli/alarm-cli perf --prf 2 --nal 50 --aps 1 --tim 1 --if http |
| |
| Endurance test example: cli/alarm-cli perf --prf 2 --nal 50 --aps 1 --tim 1 --host localhost --port 8080 --if rmr |
| |
| Get alerts from Prometheus Alert Manager: |
| |
| .. code-block:: none |
| |
| Syntax: cli/alarm-cli alerts --active --inhibited --silenced --unprocessed --host [--port] |
| |
| Example: cli/alarm-cli alerts --active true --inhibited true --silenced true --unprocessed true --host 10.102.36.121 --port 9093 |
| |
| |
| REST interface usage guide |
| -------------------------- |
| |
| REST interface offers all the same services plus some more that are available via CLI. The CLI also uses the REST interface to implement the services it offers. |
| |
| Below are examples for REST interface. Curl tool is used to send REST commands. |
| |
| Check active alarms: |
| |
| Example: curl -X GET "http://localhost:8080/ric/v1/alarms/active" -H "accept: application/json" -H "Content-Type: application/json" -d "{}" |
| |
| Check alarm history: |
| |
| Example: curl -X GET "http://localhost:8080/ric/v1/alarms/history" -H "accept: application/json" -H "Content-Type: application/json" -d "{}" |
| |
| Raise alarm: |
| |
| Example: curl -X POST "http://localhost:8080/ric/v1/alarms" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"managedObjectId\": \"RIC\", \"applicationId\": \"UEEC\", \"specificProblem\": 8007, \"perceivedSeverity\": \"CRITICAL\", \"additionalInfo\": \"-\", \"identifyingInfo\": \"INFO-1\", \"AlarmAction\": \"RAISE\", \"AlarmTime\": 0}" |
| |
| Clear alarm: |
| |
| Example: curl -X DELETE "http://localhost:8080/ric/v1/alarms" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"managedObjectId\": \"RIC\", \"applicationId\": \"UEEC\", \"specificProblem\": 8007, \"perceivedSeverity\": \"\", \"additionalInfo\": \"-\", \"identifyingInfo\": \"INFO-1\", \"AlarmAction\": \"CLEAR\", \"AlarmTime\": 0}" |
| |
| Get configuration of maximum active alarms and maximum alarms in alarm history: |
| |
| Example: curl -X GET "http://localhost:8080/ric/v1/alarms/config" -H "accept: application/json" -H "Content-Type: application/json" -d "{}" |
| |
| Configure maximum active alarms and maximum alarms in alarm history: |
| |
| Example: curl -X POST "http://localhost:8080/ric/v1/alarms/config" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"maxactivealarms\": 1000, \"maxalarmhistory\": 5000}" |
| |
| Get all alarm definitions: |
| |
| Example: curl -X GET "http://localhost:8080/ric/v1/alarms/define" -H "accept: application/json" -H "Content-Type: application/json" -d "{}" |
| |
| Get an alarm definition: |
| |
| Syntax: curl -X GET "http://localhost:8080/ric/v1/alarms/define/{alarmId}" -H "accept: application/json" -H "Content-Type: application/json" -d "{}" |
| |
| Example: curl -X GET "http://localhost:8080/ric/v1/alarms/define/8007" -H "accept: application/json" -H "Content-Type: application/json" -d "{}" |
| |
| Add one new alarm definition: |
| |
| Example: curl -X POST "http://localhost:8080/ric/v1/alarms/define" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"alarmdefinitions\": [{\"alarmId\": 8007, \"alarmText\": \"E2 CONNECTIVITY LOST TO E-NODEB\", \"eventtype\": \"Communication error\", \"operationinstructions\": \"Not defined\, \"raiseDelay\": 1, \"clearDelay\": 1"}]}" |
| |
| Add two new alarm definitions: |
| |
| Example: curl -X POST "http://localhost:8080/ric/v1/alarms/define" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"alarmdefinitions\": [{\"alarmId\": 8007, \"alarmText\": \"E2 CONNECTIVITY LOST TO E-NODEB\", \"eventtype\": \"Communication error\", \"operationinstructions\": \"Not defined\, \"raiseDelay\": 0, \"clearDelay\": 0"},{\"alarmId\": 8008, \"alarmText\": \"ACTIVE ALARM EXCEED MAX THRESHOLD\", \"eventtype\": \"storage warning\", \"operationinstructions\": \"Clear alarms or raise threshold\", \"raiseDelay\": 0, \"clearDelay\": 0}]}" |
| |
| Delete one existing alarm definition: |
| |
| Syntax: curl -X DELETE "http://localhost:8080/ric/v1/alarms/define/{alarmId}" -H "accept: application/json" -H "Content-Type: application/json" -d "{}" |
| |
| Example: curl -X DELETE "http://localhost:8080/ric/v1/alarms/define/8007" -H "accept: application/json" -H "Content-Type: application/json" -d "{}" |
| |
| |
| RMR interface usage guide |
| ------------------------- |
| Through RMR interface application can only raise and clear alarms. RMR message payload is similar JSON message as in above REST interface use cases. |
| |
| Supported events via RMR interface |
| |
| - Raise alarm |
| - Clear alarm |
| - Reraise alarm |
| - ClearAll alarms (not supported yet) |
| |
| |
| Example on how to use the API from Golang code |
| ---------------------------------------------- |
| Alarm library functions can be used directly from Golang code. Rising and clearing alarms goes via RMR interface from alarm library to Alarm Manager. |
| |
| |
| .. code-block:: none |
| |
| package main |
| |
| import ( |
| alarm "gerrit.o-ran-sc.org/r/ric-plt/alarm-go.git/alarm" |
| ) |
| |
| func main() { |
| // Initialize the alarm component |
| alarmer, err := alarm.InitAlarm("my-pod", "my-app") |
| |
| // Create a new Alarm object (SP=8004, etc) |
| alarm := alarmer.NewAlarm(8004, alarm.SeverityMajor, "NetworkDown", "eth0") |
| |
| // Raise an alarm (SP=8004, etc) |
| err := alarmer.Raise(alarm) |
| |
| // Clear an alarm (SP=8004) |
| err := alarmer.Clear(alarm) |
| |
| // Re-raise an alarm (SP=8004) |
| err := alarmer.Reraise(alarm) |
| |
| // Clear all alarms raised by the application - (not supported yet) |
| err := alarmer.ClearAll() |
| } |
| |
| |
| Example VES event |
| ----------------- |
| |
| .. code-block:: none |
| |
| INFO[2020-06-08T07:50:10Z] |
| { |
| "event": { |
| "commonEventHeader": { |
| "domain": "fault", |
| "eventId": "fault0000000001", |
| "eventName": "Fault_ricp_E2 CONNECTIVITY LOST TO G-NODEB", |
| "lastEpochMicrosec": 1591602610944553, |
| "nfNamingCode": "ricp", |
| "priority": "Medium", |
| "reportingEntityId": "035EEB88-7BA2-4C23-A349-3B6696F0E2C4", |
| "reportingEntityName": "Vespa", |
| "sequence": 1, |
| "sourceName": "RIC", |
| "startEpochMicrosec": 1591602610944553, |
| "version": 3 |
| }, |
| |
| "faultFields": { |
| "alarmCondition": "E2 CONNECTIVITY LOST TO G-NODEB", |
| "eventSeverity": "MAJOR", |
| "eventSourceType": "virtualMachine", |
| "faultFieldsVersion": 2, |
| "specificProblem": "eth12", |
| "vfStatus": "Active" |
| } |
| } |
| } |
| INFO[2020-06-08T07:50:10Z] Schema validation succeeded |