.. This work is licensed under a Creative Commons Attribution 4.0
   International License. http://creativecommons.org/licenses/by/4.0

.. _docs_Datalake_Handler_MS:

Architecture
------------


Background
~~~~~~~~~~
A large amount of data flows among ONAP components, mostly via DMaaP and web services.
For example, all events and feeds collected by DCAE collectors go through DMaaP.
DMaaP is backed by Kafka, a publish-subscribe system
in which data is not meant to be permanent and is deleted after a certain retention period.
Kafka is not a database, so the data it holds is not meant to be queried.
Although some components store processed results in their local databases, most of the raw data is eventually lost.
We should provide a systematic way to store this raw data, and even the processed results,
so that it can serve as a source for data analytics and machine learning and provide insight into network operation.


Relations with Other ONAP Components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The diagram below shows the DataLake MS as part of ONAP. Only the relevant interactions and components are shown.

.. image:: ./arch.PNG

Note that not all data storage systems in the picture are supported. In R6, the following storage systems are supported:

 - MongoDB
 - Couchbase
 - Elasticsearch and Kibana
 - HDFS

Depending on demand, new systems may be added to the supported list. In the following, we use the term database for all of these storage systems,
even though HDFS is a file system (with simple settings it can be treated as a database, e.g. through Hive).

Note that once the data is stored in the databases, other ONAP components and systems query it directly from the databases,
without interacting with DataLake Handler.

Description
~~~~~~~~~~~
DataLake Handler's main function is to monitor and persist the data flowing through DMaaP and to provide a query API for other components or external services. The databases are outside the scope of ONAP,
since the data is expected to be huge and a database may be a complicated cluster consisting of thousands of nodes.

Admin UI
~~~~~~~~
A system administrator uses DataLake Admin UI to:

 - Configure external database connections, such as host, port, login.
 - Configure which Topics to monitor and which databases store the data for each Topic.
 - Pre-configure dashboards and templates for 3rd-party tools.

This UI tool is used to manage all the DataLake settings stored in PostgreSQL. Here is the database schema:

.. image:: ./dbschema.PNG

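
As a rough illustration of the kind of settings the Admin UI manages, the following Python sketch models database connections and a per-Topic mapping to target databases. All class names, field names, hosts and the topic name are hypothetical and chosen only for illustration; the actual schema is the one shown in the diagram above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DbConnection:
    """Hypothetical connection settings for one external database."""
    name: str
    host: str
    port: int
    login: str


@dataclass
class TopicConfig:
    """Hypothetical per-Topic settings: whether to monitor it and where to store it."""
    name: str
    enabled: bool = True
    db_names: list[str] = field(default_factory=list)


def targets_for(topic: TopicConfig, dbs: dict[str, DbConnection]) -> list[DbConnection]:
    """Return the connections a monitored Topic should be written to."""
    if not topic.enabled:
        return []
    # Unknown database names are skipped rather than treated as errors.
    return [dbs[n] for n in topic.db_names if n in dbs]


# Example values only; none of these hosts or names come from DataLake itself.
dbs = {
    "es": DbConnection("es", "elasticsearch.example", 9200, "datalake"),
    "mongo": DbConnection("mongo", "mongodb.example", 27017, "datalake"),
}
topic = TopicConfig("EXAMPLE_TOPIC", db_names=["es", "mongo"])
print([c.host for c in targets_for(topic, dbs)])
```

A disabled Topic yields an empty target list, which mirrors the idea that only monitored Topics are persisted.
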
Feeder
~~~~~~
Architecture

.. image:: ./feeder-arch.PNG

Features

 - Read data directly from Kafka for performance.
 - Support for pluggable databases. To add a new database, we only need to implement its corresponding service.
 - Support REST API for inter-component communications. Besides managing DataLake settings in PostgreSQL, Admin UI also uses this API to start/stop Feeder and to query Feeder status and statistics.
 - Use PostgreSQL to store settings.
 - Support data processing features. Before data is persisted, it can be massaged in Feeder. Currently two features are implemented: Correlate Cleared Message (in org.onap.datalake.feeder.service.db.ElasticsearchService) and Flatten JSON Array (in org.onap.datalake.feeder.service.StoreService).
 - Connections to Kafka and the databases are secured.

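
The pluggable database support listed above can be pictured with a minimal Python sketch. It is not the Feeder's actual code (the Feeder is a Java service); the interface name ``DbStoreService``, the ``save`` method, the ``SERVICES`` registry and the in-memory store are all assumptions made for illustration. The point is only that adding a database means implementing one service and registering it.

```python
from abc import ABC, abstractmethod


class DbStoreService(ABC):
    """Hypothetical contract that every database plug-in implements."""

    @abstractmethod
    def save(self, topic: str, docs: list[dict]) -> int:
        """Persist the documents and return how many were stored."""


class InMemoryStore(DbStoreService):
    """Stand-in for a real database service, used only for illustration."""

    def __init__(self) -> None:
        self.rows: dict[str, list[dict]] = {}

    def save(self, topic: str, docs: list[dict]) -> int:
        self.rows.setdefault(topic, []).extend(docs)
        return len(docs)


# Registry keyed by database type; supporting a new database adds one entry.
SERVICES: dict[str, type[DbStoreService]] = {"memory": InMemoryStore}


def create_service(db_type: str) -> DbStoreService:
    """Instantiate the service registered for the given database type."""
    try:
        return SERVICES[db_type]()
    except KeyError:
        raise ValueError(f"unsupported database type: {db_type}") from None


store = create_service("memory")
stored = store.save("EXAMPLE_TOPIC", [{"event": "a"}, {"event": "b"}])
print(stored)
```

Code that consumes messages only depends on the abstract contract, so it does not change when a new database type is registered.
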
Des
~~~
Architecture

.. image:: ./des-arch.PNG

Features

 - Provide a data query API for other components to consume.
 - Integrate with Presto to query data via SQL templates.

Links
~~~~~
 - DataLake Development Environment Setup: https://wiki.onap.org/display/DW/DataLake+Development+Environment+Setup
 - Des description and deployment steps: https://wiki.onap.org/display/DW/DES
 - Source Code: https://gerrit.onap.org/r/gitweb?p=dcaegen2/services.git;a=tree;f=components/datalake-handler;hb=HEAD