blob: 5c614bbeb90798d3193d6984bd994cb8890dd3c9 [file] [log] [blame]
kjaniak53ba1702018-10-25 14:25:13 +02001.. This work is licensed under a Creative Commons Attribution 4.0 International License.
2.. http://creativecommons.org/licenses/by/4.0
3
4.. _troubleshooting:
5
6Troubleshooting
7===============
8
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +01009**NOTE**
10
11According to **ONAP** logging policy, **HV-VES** logs contain all required markers as well as service and client specific Mapped Diagnostic Context (later referred as MDC)
12
13Default console log pattern:
14
15::
16
17 | %date{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX", UTC}\t| %thread\t| %highlight(%-5level)\t| %msg\t| %marker\t| %rootException\t| %mdc\t| %thread
18
19A sample, fully qualified message implementing this pattern:
20
21::
22
23 | 2018-12-18T13:12:44.369Z | p.dcae.collectors.veshv.impl.socket.NettyTcpServer | DEBUG | Client connection request received | ENTRY | | RequestID=d7762b18-854c-4b8c-84aa-95762c6f8e62, InstanceID=9b9799ca-33a5-4f61-ba33-5c7bf7e72d07, InvocationID=b13d34ba-e1cd-4816-acda-706415308107, PartnerName=C=PL, ST=DL, L=Wroclaw, O=Nokia, OU=MANO, CN=dcaegen2-hvves-client, StatusCode=INPROGRESS, ClientIPAddress=192.168.0.9, ServerFQDN=a4ca8f96c7e5 | reactor-tcp-nio-2
24
25
26For simplicity, all log messages in this section are shortened to contain only:
27 * logger name
28 * log level
29 * message
30
31Error and warning logs contain also:
32 * exception message
33 * stack trace
34
Jakub Dudycz80cd05a2019-04-10 13:42:02 +020035Also exact exception's stack traces has been dropped due to readability
36
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +010037**Do not rely on exact log messages or their presence, as they are often subject to change.**
38
kjaniak53ba1702018-10-25 14:25:13 +020039Deployment/Installation errors
40------------------------------
41
42**Missing required parameters**
43
44::
45
Jakub Dudycz80cd05a2019-04-10 13:42:02 +020046 | org.onap.dcae.collectors.veshv.main | ERROR | Failed to create configuration: Base configuration filepath missing on command line
47 | org.onap.dcae.collectors.veshv.main | ERROR | Failed to start a server | org.onap.dcae.collectors.veshv.config.api.model.MissingArgumentException: Base configuration filepath missing on command line
kjaniak53ba1702018-10-25 14:25:13 +020048
Jakub Dudycz80cd05a2019-04-10 13:42:02 +020049These log messages are printed when the single required parameter, configuration file path, was not specified (via command line, or as an environment variable).
kjaniak53ba1702018-10-25 14:25:13 +020050Command line arguments have priority over environment variables. If you configure a parameter in both ways, **HV-VES** applies the one from the command line.
kjaniak53ba1702018-10-25 14:25:13 +020051For more information about **HV-VES** configuration parameters, see :ref:`deployment`.
52
53Configuration errors
54--------------------
55
Jakub Dudycz80cd05a2019-04-10 13:42:02 +020056**Consul service not available**
kjaniak53ba1702018-10-25 14:25:13 +020057
58::
59
Jakub Dudycz80cd05a2019-04-10 13:42:02 +020060 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | ERROR | Failed to retrieve CBS client: consul-server: Temporary failure in name resolution
61 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | WARN | Exception from configuration provider client, retrying subscription | java.net.UnknownHostException: consul-server: Temporary failure in name resolution
kjaniak53ba1702018-10-25 14:25:13 +020062
63
Jakub Dudycz80cd05a2019-04-10 13:42:02 +020064**HV-VES** looks for Consul under hostname defined in CONSUL_HOST environment variable. If the service is down, above logs will appear and after few retries collector will shut down.
kjaniak53ba1702018-10-25 14:25:13 +020065
Jakub Dudycz80cd05a2019-04-10 13:42:02 +020066
67**Config Binding Service not available**
68
69::
70
71 | org.onap.dcae.services.sdk.rest.services.cbs.client.impl.CbsLookup | INFO | Config Binding Service address: config-binding-service:10000
72 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | INFO | CBS client successfully created
73 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | ERROR | Error while creating configuration: config-binding-service: Temporary failure in name resolution
74 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | WARN | Exception from configuration provider client, retrying subscription
75
76Logs indicate that **HV-VES** successfully retrieved Config Binding Service (later referred as CBS) connection string from Consul, though the address was either incorrect, or the CBS is down.
77Make sure CBS is up and running and the connection string stored in Consul is correct.
kjaniak53ba1702018-10-25 14:25:13 +020078
79====
80
81**Missing configuration on Consul**
82
83::
84
Jakub Dudycz80cd05a2019-04-10 13:42:02 +020085 | org.onap.dcae.services.sdk.rest.services.cbs.client.impl.CbsLookup | INFO | Config Binding Service address: config-binding-service:10000
86 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | INFO | CBS client successfully created
87 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | ERROR | Error while creating configuration: Request failed for URL 'http://config-binding-service:10000/service_component/invalid-resource'. Response code: 404 Not Found
88 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | WARN | Exception from configuration provider client, retrying subscription | | org.onap.dcaegen2.services.sdk.rest.services.adapters.http.exceptions.HttpException: Request failed for URL 'http://config-binding-service:10000/service_component/dcae-hv-ves-collector'. Response code: 404 Not Found
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +010089
90
Jakub Dudycz80cd05a2019-04-10 13:42:02 +020091**HV-VES** logs this information when connected to Consul, but cannot find JSON configuration under given key which in this case is **invalid-resource**.
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +010092For more information, see :ref:`run_time_configuration`
93
94====
95
96**Invalid configuration format**
97
98::
Jakub Dudycz80cd05a2019-04-10 13:42:02 +020099 | org.onap.dcae.services.sdk.rest.services.cbs.client.impl.CbsLookup | INFO | Config Binding Service address: config-binding-service:10000
100 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | INFO | CBS client successfully created
101 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | INFO | Received new configuration:
102 | {"streams_publishes":{"perf3gpp":{"typo":"kafka","kafka_info":{"bootstrap_servers":"message-router-kafka:9092","topic_name":"HV_VES_PERF3GPP"}}}}
103 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | ERROR | Error while creating configuration: Could not find sub-node 'type'. Actual sub-nodes: typo, kafka_info
104 | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | WARN | Exception from configuration provider client, retrying subscription | org.onap.dcaegen2.services.sdk.rest.services.cbs.client.api.exceptions.StreamParsingException: Could not find sub-node 'type'. Actual sub-nodes: typo, kafka_info
kjaniak53ba1702018-10-25 14:25:13 +0200105
106
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200107This log is printed when you upload a configuration in an invalid format
108Received json contains invalid Streams configuration, therefore **HV-VES** does not apply it and becomes **unhealthy**.
109For more information on dynamic configuration, see :ref:`run_time_configuration`.
kjaniak53ba1702018-10-25 14:25:13 +0200110
111
112Message handling errors
113-----------------------
114
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +0100115**Handling messages when invalid Kafka url is specified**
kjaniak53ba1702018-10-25 14:25:13 +0200116
117::
118
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200119 | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Handling new client connection
120 | org.apache.kafka.clients.ClientUtils | WARN | Removing server invalid-message-router-kafka:9092 from bootstrap.servers as DNS resolution failed for invalid-message-router-kafka
121 | org.apache.kafka.clients.producer.KafkaProducer | INFO | [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 0 ms.
122 | org.onap.dcae.collectors.veshv.impl.HvVesCollector | WARN | Error while handling message stream: org.apache.kafka.common.KafkaException (Failed to construct kafka producer)
123 | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Connection has been close0d
kjaniak53ba1702018-10-25 14:25:13 +0200124
125
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200126**HV-VES** responds with the above when it handles message with domain that has invalid bootstrap_servers specified in **streams_publishes** configuration.
127To fix this problem you have to correct **streams_publishes** configuration stored in Consul.
128For more information, see: :ref:`run_time_configuration`.
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +0100129
130====
131
132**Kafka service became unavailable after producer has been created**
133
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200134**HV-VES** lazily creates Kafka producer for each domain.
135If Kafka service becomes unreachable after producer initialization, appropriate logs will be shown and **HV-VES** will fail to deliver future messages to that Kafka service.
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +0100136
137::
138
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200139 | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available.
140 | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available.
141 | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available.
142 | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available.
143 | org.onap.dcae.collector.veshv.impl.socket.NettyTcpServer | INFO | Handling new client connection
144 | org.onap.dcae.collector.veshv.impl.socket.NettyTcpServer | INFO | Connection has been closed
145 | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available
146 | org.onap.dcae.collector.veshv.impl.adapters.kafka.KafkaPublisher | WARN | Failed to send message to Kafka. Reason: Expiring 1 record(s) for HV_VES_PERF3GPP-0: 30007 ms has passed since batch creation plus linger time
147 | org.onap.dcae.collectors.veshv.impl.HvVesCollector | WARN | Error while handling message stream: org.apache.kafka.common.errors.TimeoutException (Expiring 1 record(s) for HV_VES_PERF3GPP-0: 30007 ms has passed since batch creation plus linger time)
148 | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Error connecting to node message-router-kafka:9092 (id: 1001 rack: null)
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +0100149
150
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200151To resolve this issue, you can either wait for that Kafka service to be available, or just like in previous paragraph, provide alternative Kafka bootstrap server via dynamic configuration (see :ref:`run_time_configuration`.)
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +0100152
153====
154
155**Message with too big payload size**
156
157::
158
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200159 | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Handling new client connection
160 | org.onap.dcae.collectors.veshv.impl.wire.WireChunkDecoder | WARN | Error while handling message stream: org.onap.dcae.collectors.veshv.impl.wire.WireFrameException (PayloadSizeExceeded: payload size exceeds the limit (1048576 bytes))
161 | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Connection has been closed
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +0100162
kjaniak53ba1702018-10-25 14:25:13 +0200163
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200164The above log is printed when the message payload size is too big. **HV-VES** does not handle messages that exceed maximum payload size specified under streams_publishes configuration (see :ref:`dmaap-connection-objects`)
kjaniak53ba1702018-10-25 14:25:13 +0200165
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +0100166====
167
168**Invalid GPB data**
169
170Messages with invalid Google Protocol Buffers data encoded are omitted. **HV-VES** responds as follows:
171
172::
173
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200174 | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Handling new client connection
175 | org.onap.dcae.collectors.veshv.impl.HvVesCollector | WARN | Failed to decode ves event header, reason: Protocol message tag had invalid wire type.
176 | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Connection has been closed
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +0100177
178====
179
180**Invalid Wire Frame**
181
182Messages with invalid Wire Frame, just like those containing invalid GPB data, will be dropped. The exact reason can be found in logs.
183
184::
185
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200186 | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Handling new client connection
187 | org.onap.dcae.collectors.veshv.impl.HvVesCollector | WARN | Invalid wire frame header, reason: Invalid major version in wire frame header. Expected 1 but was 2
188 | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Connection has been closed
kjaniak53ba1702018-10-25 14:25:13 +0200189
190====
191
kjaniak53ba1702018-10-25 14:25:13 +0200192
193For more information, see the :ref:`hv_ves_behaviors` section.
194
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +0100195
kjaniak53ba1702018-10-25 14:25:13 +0200196Authorization related errors
197----------------------------
198
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200199**WARNING: SSL/TLS authorization is a part of an experimental feature for ONAP Dublin release and should be treated as unstable and subject to change in future releases.**
kjaniake2844092018-11-14 15:42:03 +0100200**For more information, see** :ref:`ssl_tls_authorization`.
kjaniak53ba1702018-10-25 14:25:13 +0200201
202**Key or trust store missing**
203
204::
205
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200206 | org.onap.dcae.collectors.veshv.main | ERROR | Failed to start a server | java.io.FileNotFoundException: /etc/ves-hv/server.p12
207
kjaniak53ba1702018-10-25 14:25:13 +0200208
209
210The above error is logged when key store is not provided. Similarly, when trust store is not provided, **/etc/ves-hv/trust.p12** file missing is logged.
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200211They can be changed by specifying ``security.keys.trustStore`` or ``security.keys.keyStore`` file configuration entries.
kjaniak53ba1702018-10-25 14:25:13 +0200212
213====
214
215**Invalid credentials**
216
217::
218
Jakub Dudyczcf9b6e12018-12-18 14:21:32 +0100219 | org.onap.dcae.collectors.veshv.main | ERROR | Failed to start a server | java.security.UnrecoverableKeyException: failed to decrypt safe contents entry: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
kjaniak53ba1702018-10-25 14:25:13 +0200220
221
222Key or trust store password provided in configuration is invalid.
223
224====
225
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200226**Empty line at the end of password file**
227
228::
229 | org.onap.dcae.collectors.veshv.main | ERROR | Failed to start a server | java.security.UnrecoverableKeyException: failed to decrypt safe contents entry: java.io.IOException: getSecretKey failed: Password is not ASCII
230
231
232Password file should not contain empty line at the end of the file, otherwise server startup will fail.
233
234====
235
kjaniak53ba1702018-10-25 14:25:13 +0200236**Invalid key store file**
237
238::
239
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200240 | org.onap.dcae.collectors.veshv.main | ERROR | Failed to start a server | java.io.EOFException: Detect premature EOF
241
kjaniak53ba1702018-10-25 14:25:13 +0200242
243The above is logged when provided keystore has invalid or corrupted content.
Jakub Dudycz80cd05a2019-04-10 13:42:02 +0200244This log also appears when you try to use key store/trust store in archive format other than inferred from file extension.