.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0

.. _healthcheck_and_monitoring:

Healthcheck and Monitoring
==========================

Healthcheck
-----------
An HV-VES Docker container runs a small HTTP service for health checks. The port of this service can be configured
at deployment using the ``--health-check-api-port`` command line option or the ``VESHV_HEALTHCHECK_API_PORT`` environment variable (for details see :ref:`deployment`).

This service exposes the endpoint **GET /health/ready**, which returns **HTTP 200 OK** when HV-VES is healthy
and ready for connections. Otherwise it returns **HTTP 503 Service Unavailable** with a short message describing why HV-VES is unhealthy.
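
The readiness check described above can be scripted. The following is a minimal Python sketch, not part of HV-VES itself: ``interpret_health`` and ``probe`` are hypothetical helper names, and the default host ``localhost`` and port ``6060`` are assumptions that must be replaced with the values used in your deployment (see ``--health-check-api-port``).

```python
import urllib.error
import urllib.request
from typing import Tuple


def interpret_health(status: int, body: str = "") -> Tuple[bool, str]:
    """Map a GET /health/ready response to (ready, message)."""
    if status == 200:
        return True, "ready"
    if status == 503:
        # The 503 response carries a short reason in its body.
        return False, body.strip() or "unavailable (no reason given)"
    return False, f"unexpected status {status}"


def probe(host: str = "localhost", port: int = 6060,
          timeout: float = 2.0) -> Tuple[bool, str]:
    """Query the healthcheck endpoint. Host and port defaults are assumptions."""
    url = f"http://{host}:{port}/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return interpret_health(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx codes such as 503;
        # the response body is still readable from the exception.
        return interpret_health(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError) as err:
        return False, f"connection failed: {err}"
```

Calling ``probe()`` returns ``(True, "ready")`` when the service answers with 200, and ``False`` together with the reason text when it answers with 503 or cannot be reached.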


Monitoring
----------
The HV-VES collector gathers metrics at runtime. To expose them, the HV-VES application provides the endpoint **GET /monitoring/prometheus**,
which returns **HTTP 200 OK** with the current metric values in the response body. The returned data is in a format readable by the Prometheus service.
The Prometheus endpoint shares a port with the healthcheck service.

Metrics provided by HV-VES:

+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| Name of metric                                | Unit         | Description                                                                              |
+===============================================+==============+==========================================================================================+
| hvves_clients_rejected_cause_total            | cause/piece  | number of rejected clients grouped by cause                                              |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_clients_rejected_total                  | piece        | total number of rejected clients                                                         |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_connections_active                      | piece        | number of currently active connections                                                   |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_connections_total                       | piece        | total number of connections                                                              |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_data_received_bytes_total               | bytes        | total number of received bytes                                                           |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_disconnections_total                    | piece        | total number of disconnections                                                           |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_dropped_cause_total            | cause/piece  | number of dropped messages grouped by cause                                              |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_dropped_total                  | piece        | total number of dropped messages                                                         |
+-----------------------------------------------+--------------+-----------------------------------+------------------------------------------------------+
| hvves_messages_latency_seconds_bucket         | seconds      | latency is the time between       | cumulative counters for latency occurrences          |
+-----------------------------------------------+--------------+ message.header.lastEpochMicrosec  +------------------------------------------------------+
| hvves_messages_latency_seconds_count          | piece        | and the time when data is sent    | number of latency occurrences                        |
+-----------------------------------------------+--------------+ from HV-VES to Kafka              +------------------------------------------------------+
| hvves_messages_latency_seconds_max            | seconds      |                                   | maximum observed latency                             |
+-----------------------------------------------+--------------+                                   +------------------------------------------------------+
| hvves_messages_latency_seconds_sum            | seconds      |                                   | sum of the latency values of all messages            |
+-----------------------------------------------+--------------+-----------------------------------+------------------------------------------------------+
| hvves_messages_processing_time_seconds_bucket | seconds      | processing time is the time       | cumulative counters for processing time occurrences  |
+-----------------------------------------------+--------------+ between decoding of WTP message   +------------------------------------------------------+
| hvves_messages_processing_time_seconds_count  | piece        | and the time when data is sent    | number of processing time occurrences                |
+-----------------------------------------------+--------------+ from HV-VES to Kafka              +------------------------------------------------------+
| hvves_messages_processing_time_seconds_max    | seconds      |                                   | maximum observed processing time                     |
+-----------------------------------------------+--------------+                                   +------------------------------------------------------+
| hvves_messages_processing_time_seconds_sum    | seconds      |                                   | sum of the processing times of all messages          |
+-----------------------------------------------+--------------+-----------------------------------+------------------------------------------------------+
| hvves_messages_received_payload_bytes_total   | bytes        | total number of received payload bytes                                                   |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_received_total                 | piece        | total number of received messages                                                        |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_sent_topic_total               | topic/piece  | number of sent messages grouped by topic                                                 |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_sent_total                     | piece        | total number of sent messages                                                            |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
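
Two relationships in the table can be made concrete with a short sketch. The mean of a timer is its ``_sum`` divided by its ``_count``, and because the ``_bucket`` counters are cumulative, the number of observations in a single bucket is the difference between neighbouring counters. The helper names and all values below are illustrative assumptions, not HV-VES output; each bucket's upper bound is assumed to come from the ``le`` label that Prometheus histograms conventionally use.

```python
from typing import Dict, List, Optional, Tuple


def mean_seconds(metrics: Dict[str, float], prefix: str) -> Optional[float]:
    """Return <prefix>_sum / <prefix>_count, or None if nothing was observed."""
    total = metrics.get(prefix + "_sum")
    count = metrics.get(prefix + "_count", 0.0)
    if total is None or count <= 0:
        return None
    return total / count


def per_bucket_counts(
    cumulative: List[Tuple[float, float]]
) -> List[Tuple[float, float]]:
    """Turn cumulative (upper_bound, count) pairs into per-bucket counts."""
    result = []
    previous = 0.0
    # Sort by upper bound so each count is compared with the next smaller bucket.
    for upper_bound, count in sorted(cumulative):
        result.append((upper_bound, count - previous))
        previous = count
    return result


# Hypothetical values, for illustration only.
example = {
    "hvves_messages_latency_seconds_sum": 1.5,
    "hvves_messages_latency_seconds_count": 6.0,
}
average = mean_seconds(example, "hvves_messages_latency_seconds")
buckets = per_bucket_counts([(0.1, 2.0), (0.5, 5.0), (float("inf"), 6.0)])
```

With these made-up numbers the average latency is 1.5 / 6 = 0.25 seconds, and the three buckets hold 2, 3 and 1 observations respectively.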

JVM metrics:

- jvm_buffer_memory_used_bytes
- jvm_classes_unloaded_total
- jvm_gc_memory_promoted_bytes_total
- jvm_buffer_total_capacity_bytes
- jvm_threads_live
- jvm_classes_loaded
- jvm_gc_memory_allocated_bytes_total
- jvm_threads_daemon
- jvm_buffer_count
- jvm_gc_pause_seconds_count
- jvm_gc_pause_seconds_sum
- jvm_gc_pause_seconds_max
- jvm_gc_max_data_size_bytes
- jvm_memory_committed_bytes
- jvm_gc_live_data_size_bytes
- jvm_memory_max_bytes
- jvm_memory_used_bytes
- jvm_threads_peak

Sample response for **GET /monitoring/prometheus**:

.. literalinclude:: metrics_sample_response.txt
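
A response in this format can be read with a few lines of code. The sketch below is a simplified Python parser for the Prometheus text format, written for illustration under stated assumptions: ``parse_metrics`` is a hypothetical helper, the ``SAMPLE`` text and its values are invented rather than taken from ``metrics_sample_response.txt``, and the ``topic`` label name is a guess. Escape sequences inside label values are left as-is.

```python
import re
from typing import Dict, List, Tuple

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Return (name, labels, value) for every sample line in the text."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        try:
            value = float(raw_value)  # also accepts "+Inf", "-Inf" and "NaN"
        except ValueError:
            continue
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, value))
    return samples


# Invented input, for illustration only.
SAMPLE = """\
# TYPE hvves_connections_active gauge
hvves_connections_active 3.0
# TYPE hvves_messages_sent_topic_total counter
hvves_messages_sent_topic_total{topic="example_topic",} 12.0
"""

samples = parse_metrics(SAMPLE)
```

For production use, a maintained Prometheus client library is likely a better choice than a hand-written parser; the sketch only shows how the metric names in the tables above map onto lines of the response.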