kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 1 | .. This work is licensed under a Creative Commons Attribution 4.0 International License. |
| 2 | .. http://creativecommons.org/licenses/by/4.0 |
| 3 | |
| 4 | .. _troubleshooting: |
| 5 | |
| 6 | Troubleshooting |
| 7 | =============== |
| 8 | |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 9 | **NOTE** |
| 10 | |
| 11 | According to **ONAP** logging policy, **HV-VES** logs contain all required markers as well as service and client specific Mapped Diagnostic Context (later referred as MDC) |
| 12 | |
| 13 | Default console log pattern: |
| 14 | |
| 15 | :: |
| 16 | |
| 17 | | %date{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX", UTC}\t| %thread\t| %highlight(%-5level)\t| %msg\t| %marker\t| %rootException\t| %mdc\t| %thread |
| 18 | |
| 19 | A sample, fully qualified message implementing this pattern: |
| 20 | |
| 21 | :: |
| 22 | |
| 23 | | 2018-12-18T13:12:44.369Z | p.dcae.collectors.veshv.impl.socket.NettyTcpServer | DEBUG | Client connection request received | ENTRY | | RequestID=d7762b18-854c-4b8c-84aa-95762c6f8e62, InstanceID=9b9799ca-33a5-4f61-ba33-5c7bf7e72d07, InvocationID=b13d34ba-e1cd-4816-acda-706415308107, PartnerName=C=PL, ST=DL, L=Wroclaw, O=Nokia, OU=MANO, CN=dcaegen2-hvves-client, StatusCode=INPROGRESS, ClientIPAddress=192.168.0.9, ServerFQDN=a4ca8f96c7e5 | reactor-tcp-nio-2 |
| 24 | |
| 25 | |
| 26 | For simplicity, all log messages in this section are shortened to contain only: |
| 27 | * logger name |
| 28 | * log level |
| 29 | * message |
| 30 | |
| 31 | Error and warning logs contain also: |
| 32 | * exception message |
| 33 | * stack trace |
| 34 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 35 | Also exact exception's stack traces has been dropped due to readability |
| 36 | |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 37 | **Do not rely on exact log messages or their presence, as they are often subject to change.** |
| 38 | |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 39 | Deployment/Installation errors |
| 40 | ------------------------------ |
| 41 | |
| 42 | **Missing required parameters** |
| 43 | |
| 44 | :: |
| 45 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 46 | | org.onap.dcae.collectors.veshv.main | ERROR | Failed to create configuration: Base configuration filepath missing on command line |
| 47 | | org.onap.dcae.collectors.veshv.main | ERROR | Failed to start a server | org.onap.dcae.collectors.veshv.config.api.model.MissingArgumentException: Base configuration filepath missing on command line |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 48 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 49 | These log messages are printed when the single required parameter, configuration file path, was not specified (via command line, or as an environment variable). |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 50 | Command line arguments have priority over environment variables. If you configure a parameter in both ways, **HV-VES** applies the one from the command line. |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 51 | For more information about **HV-VES** configuration parameters, see :ref:`deployment`. |
| 52 | |
| 53 | Configuration errors |
| 54 | -------------------- |
| 55 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 56 | **Consul service not available** |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 57 | |
| 58 | :: |
| 59 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 60 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | ERROR | Failed to retrieve CBS client: consul-server: Temporary failure in name resolution |
| 61 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | WARN | Exception from configuration provider client, retrying subscription | java.net.UnknownHostException: consul-server: Temporary failure in name resolution |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 62 | |
| 63 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 64 | **HV-VES** looks for Consul under hostname defined in CONSUL_HOST environment variable. If the service is down, above logs will appear and after few retries collector will shut down. |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 65 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 66 | |
| 67 | **Config Binding Service not available** |
| 68 | |
| 69 | :: |
| 70 | |
| 71 | | org.onap.dcae.services.sdk.rest.services.cbs.client.impl.CbsLookup | INFO | Config Binding Service address: config-binding-service:10000 |
| 72 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | INFO | CBS client successfully created |
| 73 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | ERROR | Error while creating configuration: config-binding-service: Temporary failure in name resolution |
| 74 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | WARN | Exception from configuration provider client, retrying subscription |
| 75 | |
| 76 | Logs indicate that **HV-VES** successfully retrieved Config Binding Service (later referred as CBS) connection string from Consul, though the address was either incorrect, or the CBS is down. |
| 77 | Make sure CBS is up and running and the connection string stored in Consul is correct. |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 78 | |
| 79 | ==== |
| 80 | |
| 81 | **Missing configuration on Consul** |
| 82 | |
| 83 | :: |
| 84 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 85 | | org.onap.dcae.services.sdk.rest.services.cbs.client.impl.CbsLookup | INFO | Config Binding Service address: config-binding-service:10000 |
| 86 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | INFO | CBS client successfully created |
| 87 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | ERROR | Error while creating configuration: Request failed for URL 'http://config-binding-service:10000/service_component/invalid-resource'. Response code: 404 Not Found |
| 88 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | WARN | Exception from configuration provider client, retrying subscription | | org.onap.dcaegen2.services.sdk.rest.services.adapters.http.exceptions.HttpException: Request failed for URL 'http://config-binding-service:10000/service_component/dcae-hv-ves-collector'. Response code: 404 Not Found |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 89 | |
| 90 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 91 | **HV-VES** logs this information when connected to Consul, but cannot find JSON configuration under given key which in this case is **invalid-resource**. |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 92 | For more information, see :ref:`run_time_configuration` |
| 93 | |
| 94 | ==== |
| 95 | |
| 96 | **Invalid configuration format** |
| 97 | |
| 98 | :: |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 99 | | org.onap.dcae.services.sdk.rest.services.cbs.client.impl.CbsLookup | INFO | Config Binding Service address: config-binding-service:10000 |
| 100 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | INFO | CBS client successfully created |
| 101 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | INFO | Received new configuration: |
| 102 | | {"streams_publishes":{"perf3gpp":{"typo":"kafka","kafka_info":{"bootstrap_servers":"message-router-kafka:9092","topic_name":"HV_VES_PERF3GPP"}}}} |
| 103 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | ERROR | Error while creating configuration: Could not find sub-node 'type'. Actual sub-nodes: typo, kafka_info |
| 104 | | org.onap.dcae.collectors.veshv.config.impl.CbsConfigurationProvider | WARN | Exception from configuration provider client, retrying subscription | org.onap.dcaegen2.services.sdk.rest.services.cbs.client.api.exceptions.StreamParsingException: Could not find sub-node 'type'. Actual sub-nodes: typo, kafka_info |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 105 | |
| 106 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 107 | This log is printed when you upload a configuration in an invalid format |
| 108 | Received json contains invalid Streams configuration, therefore **HV-VES** does not apply it and becomes **unhealthy**. |
| 109 | For more information on dynamic configuration, see :ref:`run_time_configuration`. |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 110 | |
| 111 | |
| 112 | Message handling errors |
| 113 | ----------------------- |
| 114 | |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 115 | **Handling messages when invalid Kafka url is specified** |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 116 | |
| 117 | :: |
| 118 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 119 | | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Handling new client connection |
| 120 | | org.apache.kafka.clients.ClientUtils | WARN | Removing server invalid-message-router-kafka:9092 from bootstrap.servers as DNS resolution failed for invalid-message-router-kafka |
| 121 | | org.apache.kafka.clients.producer.KafkaProducer | INFO | [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 0 ms. |
| 122 | | org.onap.dcae.collectors.veshv.impl.HvVesCollector | WARN | Error while handling message stream: org.apache.kafka.common.KafkaException (Failed to construct kafka producer) |
| 123 | | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Connection has been close0d |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 124 | |
| 125 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 126 | **HV-VES** responds with the above when it handles message with domain that has invalid bootstrap_servers specified in **streams_publishes** configuration. |
| 127 | To fix this problem you have to correct **streams_publishes** configuration stored in Consul. |
| 128 | For more information, see: :ref:`run_time_configuration`. |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 129 | |
| 130 | ==== |
| 131 | |
| 132 | **Kafka service became unavailable after producer has been created** |
| 133 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 134 | **HV-VES** lazily creates Kafka producer for each domain. |
| 135 | If Kafka service becomes unreachable after producer initialization, appropriate logs will be shown and **HV-VES** will fail to deliver future messages to that Kafka service. |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 136 | |
| 137 | :: |
| 138 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 139 | | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available. |
| 140 | | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available. |
| 141 | | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available. |
| 142 | | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available. |
| 143 | | org.onap.dcae.collector.veshv.impl.socket.NettyTcpServer | INFO | Handling new client connection |
| 144 | | org.onap.dcae.collector.veshv.impl.socket.NettyTcpServer | INFO | Connection has been closed |
| 145 | | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available |
| 146 | | org.onap.dcae.collector.veshv.impl.adapters.kafka.KafkaPublisher | WARN | Failed to send message to Kafka. Reason: Expiring 1 record(s) for HV_VES_PERF3GPP-0: 30007 ms has passed since batch creation plus linger time |
| 147 | | org.onap.dcae.collectors.veshv.impl.HvVesCollector | WARN | Error while handling message stream: org.apache.kafka.common.errors.TimeoutException (Expiring 1 record(s) for HV_VES_PERF3GPP-0: 30007 ms has passed since batch creation plus linger time) |
| 148 | | org.apache.kafka.clients.NetworkClient | WARN | [Producer clientId=producer-1] Error connecting to node message-router-kafka:9092 (id: 1001 rack: null) |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 149 | |
| 150 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 151 | To resolve this issue, you can either wait for that Kafka service to be available, or just like in previous paragraph, provide alternative Kafka bootstrap server via dynamic configuration (see :ref:`run_time_configuration`.) |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 152 | |
| 153 | ==== |
| 154 | |
| 155 | **Message with too big payload size** |
| 156 | |
| 157 | :: |
| 158 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 159 | | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Handling new client connection |
| 160 | | org.onap.dcae.collectors.veshv.impl.wire.WireChunkDecoder | WARN | Error while handling message stream: org.onap.dcae.collectors.veshv.impl.wire.WireFrameException (PayloadSizeExceeded: payload size exceeds the limit (1048576 bytes)) |
| 161 | | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Connection has been closed |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 162 | |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 163 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 164 | The above log is printed when the message payload size is too big. **HV-VES** does not handle messages that exceed maximum payload size specified under streams_publishes configuration (see :ref:`dmaap-connection-objects`) |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 165 | |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 166 | ==== |
| 167 | |
| 168 | **Invalid GPB data** |
| 169 | |
| 170 | Messages with invalid Google Protocol Buffers data encoded are omitted. **HV-VES** responds as follows: |
| 171 | |
| 172 | :: |
| 173 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 174 | | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Handling new client connection |
| 175 | | org.onap.dcae.collectors.veshv.impl.HvVesCollector | WARN | Failed to decode ves event header, reason: Protocol message tag had invalid wire type. |
| 176 | | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Connection has been closed |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 177 | |
| 178 | ==== |
| 179 | |
| 180 | **Invalid Wire Frame** |
| 181 | |
| 182 | Messages with invalid Wire Frame, just like those containing invalid GPB data, will be dropped. The exact reason can be found in logs. |
| 183 | |
| 184 | :: |
| 185 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 186 | | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Handling new client connection |
| 187 | | org.onap.dcae.collectors.veshv.impl.HvVesCollector | WARN | Invalid wire frame header, reason: Invalid major version in wire frame header. Expected 1 but was 2 |
| 188 | | org.onap.dcae.collectors.veshv.impl.socket.NettyTcpServer | INFO | Connection has been closed |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 189 | |
| 190 | ==== |
| 191 | |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 192 | |
| 193 | For more information, see the :ref:`hv_ves_behaviors` section. |
| 194 | |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 195 | |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 196 | Authorization related errors |
| 197 | ---------------------------- |
| 198 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 199 | **WARNING: SSL/TLS authorization is a part of an experimental feature for ONAP Dublin release and should be treated as unstable and subject to change in future releases.** |
kjaniak | e284409 | 2018-11-14 15:42:03 +0100 | [diff] [blame] | 200 | **For more information, see** :ref:`ssl_tls_authorization`. |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 201 | |
| 202 | **Key or trust store missing** |
| 203 | |
| 204 | :: |
| 205 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 206 | | org.onap.dcae.collectors.veshv.main | ERROR | Failed to start a server | java.io.FileNotFoundException: /etc/ves-hv/server.p12 |
| 207 | |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 208 | |
| 209 | |
| 210 | The above error is logged when key store is not provided. Similarly, when trust store is not provided, **/etc/ves-hv/trust.p12** file missing is logged. |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 211 | They can be changed by specifying ``security.keys.trustStore`` or ``security.keys.keyStore`` file configuration entries. |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 212 | |
| 213 | ==== |
| 214 | |
| 215 | **Invalid credentials** |
| 216 | |
| 217 | :: |
| 218 | |
Jakub Dudycz | cf9b6e1 | 2018-12-18 14:21:32 +0100 | [diff] [blame] | 219 | | org.onap.dcae.collectors.veshv.main | ERROR | Failed to start a server | java.security.UnrecoverableKeyException: failed to decrypt safe contents entry: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption. |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 220 | |
| 221 | |
| 222 | Key or trust store password provided in configuration is invalid. |
| 223 | |
| 224 | ==== |
| 225 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 226 | **Empty line at the end of password file** |
| 227 | |
| 228 | :: |
| 229 | | org.onap.dcae.collectors.veshv.main | ERROR | Failed to start a server | java.security.UnrecoverableKeyException: failed to decrypt safe contents entry: java.io.IOException: getSecretKey failed: Password is not ASCII |
| 230 | |
| 231 | |
| 232 | Password file should not contain empty line at the end of the file, otherwise server startup will fail. |
| 233 | |
| 234 | ==== |
| 235 | |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 236 | **Invalid key store file** |
| 237 | |
| 238 | :: |
| 239 | |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 240 | | org.onap.dcae.collectors.veshv.main | ERROR | Failed to start a server | java.io.EOFException: Detect premature EOF |
| 241 | |
kjaniak | 53ba170 | 2018-10-25 14:25:13 +0200 | [diff] [blame] | 242 | |
| 243 | The above is logged when provided keystore has invalid or corrupted content. |
Jakub Dudycz | 80cd05a | 2019-04-10 13:42:02 +0200 | [diff] [blame] | 244 | This log also appears when you try to use key store/trust store in archive format other than inferred from file extension. |