..
.. Copyright (c) 2019 AT&T Intellectual Property.
.. Copyright (c) 2019 Nokia.
..
.. Licensed under the Creative Commons Attribution 4.0 International
.. Public License (the "License"); you may not use this file except
.. in compliance with the License. You may obtain a copy of the License at
..
..      https://creativecommons.org/licenses/by/4.0/
..
.. Unless required by applicable law or agreed to in writing, documentation
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
..
.. See the License for the specific language governing permissions and
.. limitations under the License.
..

User-Guide
==========

.. contents::
   :depth: 3
   :local:

RIC Alarm System
----------------

Overview
--------
The RIC alarm system consists of three components: the Alarm Manager, the Alarm Library and the Command Line Interface (CLI).

The Alarm Manager is responsible for managing alarm situations in the RIC cluster and for interfacing with northbound applications,
such as the Prometheus Alert Manager, to which it posts the alarms as alerts. The Alert Manager takes care of de-duplicating, silencing
and inhibiting (suppressing) alerts, and of routing them to the VES-Agent, which in turn converts the alerts to faults and sends them
to ONAP as VES events.

The Alarm Library provides a simple interface for RIC applications (both platform applications and xApps) to raise and clear
alarms. The Alarm Library interacts with the Alarm Manager via the RMR interface.

.. image:: images/RIC_Alarm_System.png
   :width: 600
   :alt: Place in RIC's software architecture picture


Alarm Manager
-------------
The Alarm Manager listens for alarms coming via the RMR and REST interfaces; an application can raise or clear alarms via either
interface. The Alarm Manager also listens for commands coming from the CLI (Command Line Interface). In addition, the Alarm Manager
supports a few other commands that can be given through these interfaces, such as list active alarms, list alarm history, add a new
alarm definition, delete an existing alarm definition, re-raise alarms and clear all alarms. Applications do not typically use these
commands while running. The Alarm Manager itself re-raises alarms periodically to keep them in the active state. The other commands
can be used by the operator through the CLI, or are used by applications when they are starting up or restarting.

The maximum number of active alarms and the maximum size of the alarm history are configurable. By default, the maximum number of
active alarms is 5000 and the maximum number of alarms in the alarm history is 20,000.
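
The effect of the two limits can be pictured with a small Go sketch. It is a hypothetical model, not the Alarm Manager's code: the type ``boundedStore``, its methods and the policy of rejecting a new alarm once the active limit is reached (and of dropping the oldest history entry once the history is full) are assumptions made for illustration only; the real Alarm Manager may react differently when a limit is hit.

.. code-block:: go

   package main

   import "fmt"

   // Documented defaults of the Alarm Manager.
   const (
       defaultMaxActiveAlarms = 5000
       defaultMaxAlarmHistory = 20000
   )

   // boundedStore is a hypothetical model of capped alarm bookkeeping,
   // not the Alarm Manager's actual implementation.
   type boundedStore struct {
       maxActive  int
       maxHistory int
       active     map[string]bool // alarm identity -> active
       history    []string        // oldest event first
   }

   func newBoundedStore(maxActive, maxHistory int) *boundedStore {
       return &boundedStore{maxActive: maxActive, maxHistory: maxHistory, active: map[string]bool{}}
   }

   // raise activates an alarm; in this sketch a new alarm is rejected
   // once maxActive alarms are already active (assumed policy).
   func (s *boundedStore) raise(id string) error {
       if s.active[id] {
           return nil // already active, nothing to record
       }
       if len(s.active) >= s.maxActive {
           return fmt.Errorf("active alarm limit %d reached", s.maxActive)
       }
       s.active[id] = true
       s.appendHistory("RAISE " + id)
       return nil
   }

   // clear deactivates an alarm if it is active.
   func (s *boundedStore) clear(id string) {
       if s.active[id] {
           delete(s.active, id)
           s.appendHistory("CLEAR " + id)
       }
   }

   // appendHistory drops the oldest entry when the history is full.
   func (s *boundedStore) appendHistory(event string) {
       if s.maxHistory <= 0 {
           return // history disabled
       }
       if len(s.history) >= s.maxHistory {
           s.history = s.history[1:]
       }
       s.history = append(s.history, event)
   }

   func main() {
       s := newBoundedStore(defaultMaxActiveAlarms, defaultMaxAlarmHistory)
       if err := s.raise("RIC/UEEC/8007/INFO-1"); err != nil {
           fmt.Println("raise failed:", err)
       }
       fmt.Println(len(s.active), len(s.history)) // 1 1
   }

With tiny limits such as ``newBoundedStore(2, 3)`` the capping becomes visible after only a few raise and clear calls.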

Alarm definitions can be updated dynamically via the REST interface. The default definitions are read from a JSON configuration file
when the FM service is deployed.


Alarm Library
-------------
The Alarm Library provides a simple interface for RIC applications (both platform applications and xApps) to raise and clear
alarms. The alarm context is created with the InitAlarm() function, which takes the ManagedObject (mo) and Application (ap)
identities as parameters; alarm objects are then created from that context, for example with NewAlarm().

The Alarm object contains the following parameters:

   SpecificProblem: The problem that is the cause of the alarm \(*

   PerceivedSeverity: The severity of the alarm, see below for possible values

   ManagedObjectId: The name of the managed object that is the cause of the fault \(*

   ApplicationId: The name of the process that raised the alarm \(*

   AdditionalInfo: Additional information given by the application

   IdentifyingInfo: Identifying additional information, which is part of the alarm identity \(*

Items marked with \*, i.e., ManagedObjectId (mo), SpecificProblem (sp), ApplicationId (ap) and IdentifyingInfo, make
up the identity of the alarm. All parameters must conform to the alarm definition, i.e., all mandatory parameters must be present,
and parameters must have the correct value type or be within a predefined range. To address the same alarm instance in a Clear() or
Reraise() call, make sure that all four identity values are the same as in the original Raise() / Reraise() call.
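
A minimal Go sketch of the identity rule, assuming a hypothetical ``Alarm`` struct that simply holds the documented parameters (it is not the Alarm Library's own type). Only the four identity fields take part in the comparison, so PerceivedSeverity and AdditionalInfo may differ between the raise call and a later clear call.

.. code-block:: go

   package main

   import "fmt"

   // Alarm is a hypothetical stand-in holding the documented alarm parameters.
   type Alarm struct {
       ManagedObjectId   string
       ApplicationId     string
       SpecificProblem   int
       IdentifyingInfo   string
       PerceivedSeverity string
       AdditionalInfo    string
   }

   // identity contains only the four fields that identify an alarm instance.
   type identity struct {
       mo, ap string
       sp     int
       ii     string
   }

   func identityOf(a Alarm) identity {
       return identity{mo: a.ManagedObjectId, ap: a.ApplicationId, sp: a.SpecificProblem, ii: a.IdentifyingInfo}
   }

   // sameInstance reports whether two alarms address the same alarm instance;
   // severity and additional info are deliberately ignored.
   func sameInstance(a, b Alarm) bool {
       return identityOf(a) == identityOf(b)
   }

   func main() {
       raised := Alarm{ManagedObjectId: "RIC", ApplicationId: "UEEC", SpecificProblem: 8007,
           IdentifyingInfo: "INFO-1", PerceivedSeverity: "CRITICAL", AdditionalInfo: "-"}
       toClear := Alarm{ManagedObjectId: "RIC", ApplicationId: "UEEC", SpecificProblem: 8007,
           IdentifyingInfo: "INFO-1"}
       other := toClear
       other.IdentifyingInfo = "INFO-2"

       fmt.Println(sameInstance(raised, toClear)) // true
       fmt.Println(sameInstance(raised, other))   // false
   }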

The Alarm Manager does not allow raising the "same alarm" more than once unless the alarm has been cleared first. The Alarm Manager
compares the ManagedObjectId (mo), SpecificProblem (sp), ApplicationId (ap) and IdentifyingInfo parameters to detect a possible
duplicate. If the values are the same, the alarm is suppressed. If an application raises the "same alarm" but with a changed
PerceivedSeverity, the Alarm Manager deletes the old alarm and creates a new alarm according to the new information.
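
The duplicate handling described above can be sketched in Go as follows. The ``key`` type, the ``onRaise`` function and the NEW / SUPPRESS / REPLACE outcomes are hypothetical names used only for illustration, and active alarms are modelled as a plain map from identity to severity; the Alarm Manager's real bookkeeping is not shown here.

.. code-block:: go

   package main

   import "fmt"

   // key holds the four identity fields: mo, ap, sp and identifying info.
   type key struct {
       mo, ap string
       sp     int
       ii     string
   }

   // decision is a hypothetical label for what happens to a raise request.
   type decision string

   const (
       decisionNew      decision = "NEW"      // not active yet: alarm is added
       decisionSuppress decision = "SUPPRESS" // same identity and severity: duplicate
       decisionReplace  decision = "REPLACE"  // same identity, new severity
   )

   // onRaise applies the duplicate rules to a map of active alarms
   // (identity -> perceived severity) and returns what happened.
   func onRaise(active map[key]string, k key, severity string) decision {
       current, exists := active[k]
       switch {
       case !exists:
           active[k] = severity
           return decisionNew
       case current == severity:
           return decisionSuppress
       default:
           // The old alarm is deleted and a new one is stored with the new severity.
           delete(active, k)
           active[k] = severity
           return decisionReplace
       }
   }

   func main() {
       active := map[key]string{}
       k := key{mo: "RIC", ap: "UEEC", sp: 8007, ii: "INFO-1"}

       fmt.Println(onRaise(active, k, "CRITICAL")) // NEW
       fmt.Println(onRaise(active, k, "CRITICAL")) // SUPPRESS
       fmt.Println(onRaise(active, k, "MAJOR"))    // REPLACE
       fmt.Println(active[k])                      // MAJOR
   }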


The Alarm Library provides the following APIs:

   Raise: Raises the alarm instance given as a parameter

   Clear: Clears the alarm instance given as a parameter, if the alarm is active

   Reraise: Attempts to re-raise the alarm instance given as a parameter

   ClearAll: Clears all alarms matching the moId and appId given as parameters (not supported yet)


Command line interface
----------------------

Through the CLI, the operator can perform the following operations:

   - Check active alarms
   - Check alarm history
   - Raise an alarm
   - Clear an alarm
   - Configure the maximum number of active alarms and the maximum number of alarms in the alarm history
   - Add new alarm definitions that can be raised
   - Delete an existing alarm definition

CLI commands must be given inside the Alarm Manager pod. To get there, first print the name of the Alarm Manager pod:

.. code-block:: none

   kubectl get pods -A | grep alarmmanager

The output should look something like this:

.. code-block:: none

   ricplt   deployment-ricplt-alarmmanager-6cc8764749-gnwjh   1/1   Running   0   15d

Then give the following command to enter the pod. Replace the pod name with the actual name from the printout.

.. code-block:: none

   kubectl exec -it deployment-ricplt-alarmmanager-6cc8764749-gnwjh -n ricplt -- bash

CLI commands can have some of the following parameters:

.. code-block:: none

   --moid          ManagedObjectId string, example string: RIC
   --apid          ApplicationId string, example string: UEEC
   --sp            SpecificProblem, example value: 8007
   --severity      Severity of the alarm, possible values: UNSPECIFIED, CRITICAL, MAJOR, MINOR, WARNING, CLEARED or DEFAULT
   --iinfo         Identifying info, a user-specified string, example string: INFO-1
   --mal           Maximum number of active alarms, example value: 1000
   --mah           Maximum number of alarms in alarm history, example value: 2000
   --aid           Alarm id, example value: 8007
   --atx           Alarm text string, example string: E2 CONNECTIVITY LOST TO E-NODEB
   --ety           Event type string, example string: Communication error
   --oin           Operation instructions string, example string: Not defined
   --prf           Performance profile id, possible values: 1 = peak performance test or 2 = endurance test
   --nal           Number of alarms, example value: 50
   --aps           Alarms per second, example value: 1
   --tim           Total time of test in minutes, example value: 1
   --host          Alarm Manager host address. Default value = localhost
   --port          Alarm Manager port. Default value = 8080
   --if            Alarm Manager command interface to use, http or rmr. Default value = http
   --active        Active alerts in Prometheus Alert Manager. Default value = true
   --inhibited     Inhibited alerts in Prometheus Alert Manager. Default value = true
   --silenced      Silenced alerts in Prometheus Alert Manager. Default value = true
   --unprocessed   Unprocessed alerts in Prometheus Alert Manager. Default value = true
   --host          Prometheus Alert Manager host address (gapam command)
   --port          Prometheus Alert Manager port (gapam command). Default value = 9093


``Note that there are two minus signs before each parameter name!``

If a parameter value contains any whitespace, it must be enclosed in quotation marks, like: "INFO 1"

CLI command examples:

   The following commands are given in the top-level directory!

   Check active alarms:

   .. code-block:: none

      Syntax: cli/alarm-cli active [--host] [--port]

      Example: cli/alarm-cli active

      Example: cli/alarm-cli active --host localhost --port 8080

   Check alarm history:

   .. code-block:: none

      Syntax: cli/alarm-cli history [--host] [--port]

      Example: cli/alarm-cli history

      Example: cli/alarm-cli history --host localhost --port 8080

   Raise alarm:

   .. code-block:: none

      Syntax: cli/alarm-cli raise --moid --apid --sp --severity --iinfo [--host] [--port] [--if]

      Example: cli/alarm-cli raise --moid RIC --apid UEEC --sp 8007 --severity CRITICAL --iinfo INFO-1

      The following is meant only for testing and verification purposes!

      Example: cli/alarm-cli raise --moid RIC --apid UEEC --sp 8007 --severity CRITICAL --iinfo INFO-1 --host localhost --port 8080 --if rmr

   Clear alarm:

   .. code-block:: none

      Syntax: cli/alarm-cli clear --moid --apid --sp --severity --iinfo [--host] [--port] [--if]

      Example: cli/alarm-cli clear --moid RIC --apid UEEC --sp 8007 --iinfo INFO-1

      Example: cli/alarm-cli clear --moid RIC --apid UEEC --sp 8007 --iinfo INFO-1 --host localhost --port 8080 --if rmr

   Configure maximum active alarms and maximum alarms in alarm history:

   .. code-block:: none

      Syntax: cli/alarm-cli configure --mal --mah [--host] [--port]

      Example: cli/alarm-cli configure --mal 1000 --mah 5000

      Example: cli/alarm-cli configure --mal 1000 --mah 5000 --host localhost --port 8080

   Add new alarm definition:

   .. code-block:: none

      Syntax: cli/alarm-cli define --aid --atx --ety --oin [--host] [--port]

      Example: cli/alarm-cli define --aid 8007 --atx "E2 CONNECTIVITY LOST TO E-NODEB" --ety "Communication error" --oin "Not defined"

      Example: cli/alarm-cli define --aid 8007 --atx "E2 CONNECTIVITY LOST TO E-NODEB" --ety "Communication error" --oin "Not defined" --host localhost --port 8080

   Delete existing alarm definition:

   .. code-block:: none

      Syntax: cli/alarm-cli undefine --aid [--host] [--port]

      Example: cli/alarm-cli undefine --aid 8007

      Example: cli/alarm-cli undefine --aid 8007 --host localhost --port 8080

   Conduct performance test:

   Note that this is meant only for testing and verification purposes!

   Before any performance test command can be issued, an environment variable needs to be set. The variable holds the location
   of the test alarm object file.

   .. code-block:: none

      export PERF_OBJ_FILE=cli/perf-alarm-object.json

      Syntax: cli/alarm-cli perf --prf --nal --aps --tim [--host] [--port] [--if]

      Peak performance test example: cli/alarm-cli perf --prf 1 --nal 50 --aps 1 --tim 1 --if rmr

      Peak performance test example: cli/alarm-cli perf --prf 1 --nal 50 --aps 1 --tim 1 --if http

      Peak performance test example: cli/alarm-cli perf --prf 1 --nal 50 --aps 1 --tim 1 --host localhost --port 8080 --if rmr

      Endurance test example: cli/alarm-cli perf --prf 2 --nal 50 --aps 1 --tim 1 --if rmr

      Endurance test example: cli/alarm-cli perf --prf 2 --nal 50 --aps 1 --tim 1 --if http

      Endurance test example: cli/alarm-cli perf --prf 2 --nal 50 --aps 1 --tim 1 --host localhost --port 8080 --if rmr

   Get alerts from Prometheus Alert Manager:

   .. code-block:: none

      Syntax: cli/alarm-cli gapam --active --inhibited --silenced --unprocessed --host [--port]

      Example: cli/alarm-cli gapam --active true --inhibited true --silenced true --unprocessed true --host 10.102.36.121 --port 9093
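
For reference, a hedged Go sketch of the kind of query the gapam command corresponds to. It assumes that the CLI reads alerts through the Prometheus Alertmanager v2 API endpoint ``/api/v2/alerts``, whose ``active``, ``inhibited``, ``silenced`` and ``unprocessed`` boolean query parameters match the CLI options; the function ``alertsURL`` is hypothetical and only builds the URL, it does not send the request.

.. code-block:: go

   package main

   import (
       "fmt"
       "net"
       "net/url"
       "strconv"
   )

   // alertsURL builds a GET URL for the Alertmanager v2 alerts endpoint with the
   // same four boolean filters that the gapam command exposes (assumed mapping).
   func alertsURL(host string, port int, active, inhibited, silenced, unprocessed bool) string {
       q := url.Values{}
       q.Set("active", strconv.FormatBool(active))
       q.Set("inhibited", strconv.FormatBool(inhibited))
       q.Set("silenced", strconv.FormatBool(silenced))
       q.Set("unprocessed", strconv.FormatBool(unprocessed))

       u := url.URL{
           Scheme:   "http",
           Host:     net.JoinHostPort(host, strconv.Itoa(port)),
           Path:     "/api/v2/alerts",
           RawQuery: q.Encode(), // Encode sorts the keys alphabetically
       }
       return u.String()
   }

   func main() {
       fmt.Println(alertsURL("10.102.36.121", 9093, true, true, true, true))
       // http://10.102.36.121:9093/api/v2/alerts?active=true&inhibited=true&silenced=true&unprocessed=true
   }

The resulting URL can be passed to ``http.Get`` from the Go standard library to fetch the alerts as JSON.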


REST interface usage guide
--------------------------

The REST interface offers all the same services that are available via the CLI, plus some more. The CLI itself uses the REST
interface to implement the services it offers.

Below are examples of REST interface usage. The curl tool is used to send the REST commands.

   Check active alarms:

   .. code-block:: none

      Example: curl -X GET "http://localhost:8080/ric/v1/alarms/active" -H "accept: application/json" -H "Content-Type: application/json" -d "{}"

   Check alarm history:

   .. code-block:: none

      Example: curl -X GET "http://localhost:8080/ric/v1/alarms/history" -H "accept: application/json" -H "Content-Type: application/json" -d "{}"

   Raise alarm:

   .. code-block:: none

      Example: curl -X POST "http://localhost:8080/ric/v1/alarms" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"managedObjectId\": \"RIC\", \"applicationId\": \"UEEC\", \"specificProblem\": 8007, \"perceivedSeverity\": \"CRITICAL\", \"additionalInfo\": \"-\", \"identifyingInfo\": \"INFO-1\", \"AlarmAction\": \"RAISE\", \"AlarmTime\": 0}"

   Clear alarm:

   .. code-block:: none

      Example: curl -X DELETE "http://localhost:8080/ric/v1/alarms" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"managedObjectId\": \"RIC\", \"applicationId\": \"UEEC\", \"specificProblem\": 8007, \"perceivedSeverity\": \"\", \"additionalInfo\": \"-\", \"identifyingInfo\": \"INFO-1\", \"AlarmAction\": \"CLEAR\", \"AlarmTime\": 0}"

   Get configuration of maximum active alarms and maximum alarms in alarm history:

   .. code-block:: none

      Example: curl -X GET "http://localhost:8080/ric/v1/alarms/config" -H "accept: application/json" -H "Content-Type: application/json" -d "{}"

   Configure maximum active alarms and maximum alarms in alarm history:

   .. code-block:: none

      Example: curl -X POST "http://localhost:8080/ric/v1/alarms/config" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"maxactivealarms\": 1000, \"maxalarmhistory\": 5000}"

   Get all alarm definitions:

   .. code-block:: none

      Example: curl -X GET "http://localhost:8080/ric/v1/alarms/define" -H "accept: application/json" -H "Content-Type: application/json" -d "{}"

   Get an alarm definition:

   .. code-block:: none

      Syntax: curl -X GET "http://localhost:8080/ric/v1/alarms/define/{alarmId}" -H "accept: application/json" -H "Content-Type: application/json" -d "{}"

      Example: curl -X GET "http://localhost:8080/ric/v1/alarms/define/8007" -H "accept: application/json" -H "Content-Type: application/json" -d "{}"

   Add one new alarm definition:

   .. code-block:: none

      Example: curl -X POST "http://localhost:8080/ric/v1/alarms/define" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"alarmdefinitions\": [{\"alarmId\": 8007, \"alarmText\": \"E2 CONNECTIVITY LOST TO E-NODEB\", \"eventtype\": \"Communication error\", \"operationinstructions\": \"Not defined\"}]}"

   Add two new alarm definitions:

   .. code-block:: none

      Example: curl -X POST "http://localhost:8080/ric/v1/alarms/define" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"alarmdefinitions\": [{\"alarmId\": 8007, \"alarmText\": \"E2 CONNECTIVITY LOST TO E-NODEB\", \"eventtype\": \"Communication error\", \"operationinstructions\": \"Not defined\"},{\"alarmId\": 8008, \"alarmText\": \"ACTIVE ALARM EXCEED MAX THRESHOLD\", \"eventtype\": \"storage warning\", \"operationinstructions\": \"Clear alarms or raise threshold\"}]}"

   Delete one existing alarm definition:

   .. code-block:: none

      Syntax: curl -X DELETE "http://localhost:8080/ric/v1/alarms/define/{alarmId}" -H "accept: application/json" -H "Content-Type: application/json" -d "{}"

      Example: curl -X DELETE "http://localhost:8080/ric/v1/alarms/define/8007" -H "accept: application/json" -H "Content-Type: application/json" -d "{}"
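
The alarm definition requests can also be built from Go with only the standard library. This is a minimal sketch: the struct and function names are hypothetical, the JSON field names are copied from the curl examples, and the request is only constructed, not sent.

.. code-block:: go

   package main

   import (
       "bytes"
       "encoding/json"
       "fmt"
       "net/http"
   )

   // alarmDefinition uses the JSON field names shown in the curl examples.
   type alarmDefinition struct {
       AlarmID               int    `json:"alarmId"`
       AlarmText             string `json:"alarmText"`
       EventType             string `json:"eventtype"`
       OperationInstructions string `json:"operationinstructions"`
   }

   // definitionsPayload is the top-level body of POST /ric/v1/alarms/define.
   type definitionsPayload struct {
       AlarmDefinitions []alarmDefinition `json:"alarmdefinitions"`
   }

   // newDefineRequest builds, but does not send, a request that adds the given definitions.
   func newDefineRequest(baseURL string, defs ...alarmDefinition) (*http.Request, error) {
       body, err := json.Marshal(definitionsPayload{AlarmDefinitions: defs})
       if err != nil {
           return nil, err
       }
       req, err := http.NewRequest(http.MethodPost, baseURL+"/ric/v1/alarms/define", bytes.NewReader(body))
       if err != nil {
           return nil, err
       }
       req.Header.Set("Accept", "application/json")
       req.Header.Set("Content-Type", "application/json")
       return req, nil
   }

   func main() {
       req, err := newDefineRequest("http://localhost:8080", alarmDefinition{
           AlarmID:               8007,
           AlarmText:             "E2 CONNECTIVITY LOST TO E-NODEB",
           EventType:             "Communication error",
           OperationInstructions: "Not defined",
       })
       if err != nil {
           fmt.Println("building request failed:", err)
           return
       }
       fmt.Println(req.Method, req.URL) // POST http://localhost:8080/ric/v1/alarms/define
   }

To actually send it, the request can be passed to ``http.DefaultClient.Do`` and the response status checked.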


RMR interface usage guide
-------------------------
Through the RMR interface, an application can only raise and clear alarms. The RMR message payload is a JSON message similar to the
ones in the REST interface use cases above.

   Supported events via the RMR interface:

   - Raise alarm
   - Clear alarm
   - Reraise alarm
   - ClearAll alarms (not supported yet)
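
Since the RMR payload is described as JSON similar to the REST examples, the following Go sketch marshals a raise or clear message using the field names of the REST raise/clear examples. It is an assumption that the RMR payload uses exactly these fields; the sketch only produces the payload bytes and does not call any RMR API. The action strings RAISE and CLEAR are taken from the REST examples; the value used for re-raise is not shown in this guide and is therefore left out.

.. code-block:: go

   package main

   import (
       "encoding/json"
       "fmt"
   )

   // alarmMessage uses the field names of the REST raise/clear examples.
   type alarmMessage struct {
       ManagedObjectID   string `json:"managedObjectId"`
       ApplicationID     string `json:"applicationId"`
       SpecificProblem   int    `json:"specificProblem"`
       PerceivedSeverity string `json:"perceivedSeverity"`
       AdditionalInfo    string `json:"additionalInfo"`
       IdentifyingInfo   string `json:"identifyingInfo"`
       AlarmAction       string `json:"AlarmAction"`
       AlarmTime         int64  `json:"AlarmTime"`
   }

   // payloadFor returns JSON bytes for the given action ("RAISE" or "CLEAR").
   // m is passed by value, so the caller's message is not modified.
   func payloadFor(m alarmMessage, action string) ([]byte, error) {
       m.AlarmAction = action
       if action == "CLEAR" {
           m.PerceivedSeverity = "" // the REST clear example sends an empty severity
       }
       return json.Marshal(m)
   }

   func main() {
       m := alarmMessage{ManagedObjectID: "RIC", ApplicationID: "UEEC", SpecificProblem: 8007,
           PerceivedSeverity: "CRITICAL", AdditionalInfo: "-", IdentifyingInfo: "INFO-1"}

       for _, action := range []string{"RAISE", "CLEAR"} {
           payload, err := payloadFor(m, action)
           if err != nil {
               fmt.Println("marshal failed:", err)
               return
           }
           fmt.Println(string(payload)) // bytes an application could hand to its RMR sender
       }
   }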


Example on how to use the API from Golang code
----------------------------------------------
Alarm library functions can be used directly from Golang code. Raising and clearing alarms goes via the RMR interface from the Alarm
Library to the Alarm Manager.


.. code-block:: go

   package main

   import (
       "log"

       "gerrit.o-ran-sc.org/r/ric-plt/alarm-go/alarm"
   )

   func main() {
       // Initialize the alarm component with the ManagedObjectId and ApplicationId
       alarmer, err := alarm.InitAlarm("my-pod", "my-app")
       if err != nil {
           log.Fatalf("InitAlarm failed: %v", err)
       }

       // Create a new Alarm object (SP=8004, etc).
       // The variable is not named "alarm" so that it does not shadow the package name.
       a := alarmer.NewAlarm(8004, alarm.SeverityMajor, "NetworkDown", "eth0")

       // Raise an alarm (SP=8004, etc)
       if err := alarmer.Raise(a); err != nil {
           log.Printf("Raise failed: %v", err)
       }

       // Clear an alarm (SP=8004)
       if err := alarmer.Clear(a); err != nil {
           log.Printf("Clear failed: %v", err)
       }

       // Re-raise an alarm (SP=8004)
       if err := alarmer.Reraise(a); err != nil {
           log.Printf("Reraise failed: %v", err)
       }

       // Clear all alarms raised by the application (not supported yet)
       if err := alarmer.ClearAll(); err != nil {
           log.Printf("ClearAll failed: %v", err)
       }
   }


Example VES event
-----------------

.. code-block:: none

   INFO[2020-06-08T07:50:10Z]
   {
       "event": {
           "commonEventHeader": {
               "domain": "fault",
               "eventId": "fault0000000001",
               "eventName": "Fault_ricp_E2 CONNECTIVITY LOST TO G-NODEB",
               "lastEpochMicrosec": 1591602610944553,
               "nfNamingCode": "ricp",
               "priority": "Medium",
               "reportingEntityId": "035EEB88-7BA2-4C23-A349-3B6696F0E2C4",
               "reportingEntityName": "Vespa",
               "sequence": 1,
               "sourceName": "RIC",
               "startEpochMicrosec": 1591602610944553,
               "version": 3
           },

           "faultFields": {
               "alarmCondition": "E2 CONNECTIVITY LOST TO G-NODEB",
               "eventSeverity": "MAJOR",
               "eventSourceType": "virtualMachine",
               "faultFieldsVersion": 2,
               "specificProblem": "eth12",
               "vfStatus": "Active"
           }
       }
   }
   INFO[2020-06-08T07:50:10Z] Schema validation succeeded
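
A consumer of such events could decode the fault-related parts with Go's ``encoding/json``. The sketch below models only a subset of the fields shown in the example above (the struct layout is a hypothetical choice, not a complete VES schema), and the sample string is a trimmed copy of that example without the log lines.

.. code-block:: go

   package main

   import (
       "encoding/json"
       "fmt"
   )

   // faultEvent models only a subset of the VES fault event fields shown above.
   type faultEvent struct {
       Event struct {
           CommonEventHeader struct {
               Domain     string `json:"domain"`
               EventName  string `json:"eventName"`
               Priority   string `json:"priority"`
               Sequence   int    `json:"sequence"`
               SourceName string `json:"sourceName"`
           } `json:"commonEventHeader"`
           FaultFields struct {
               AlarmCondition  string `json:"alarmCondition"`
               EventSeverity   string `json:"eventSeverity"`
               SpecificProblem string `json:"specificProblem"`
               VfStatus        string `json:"vfStatus"`
           } `json:"faultFields"`
       } `json:"event"`
   }

   // sample is a trimmed copy of the example event, without the log lines.
   const sample = `{"event": {
     "commonEventHeader": {"domain": "fault", "eventName": "Fault_ricp_E2 CONNECTIVITY LOST TO G-NODEB",
       "priority": "Medium", "sequence": 1, "sourceName": "RIC"},
     "faultFields": {"alarmCondition": "E2 CONNECTIVITY LOST TO G-NODEB", "eventSeverity": "MAJOR",
       "specificProblem": "eth12", "vfStatus": "Active"}}}`

   // parseFault decodes a VES fault event; fields not modelled above are ignored.
   func parseFault(data []byte) (faultEvent, error) {
       var ev faultEvent
       err := json.Unmarshal(data, &ev)
       return ev, err
   }

   func main() {
       ev, err := parseFault([]byte(sample))
       if err != nil {
           fmt.Println("decode failed:", err)
           return
       }
       h, f := ev.Event.CommonEventHeader, ev.Event.FaultFields
       fmt.Println(h.Domain, h.SourceName, f.EventSeverity, f.AlarmCondition)
       // fault RIC MAJOR E2 CONNECTIVITY LOST TO G-NODEB
   }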