DL-Handler Doc

Issue-ID: DCAEGEN2-2028
Change-Id: Ifa26667a9ef1d6d8be711b1454d3e4ff6ea9a74b
Signed-off-by: Guobiao Mo <guobiaomo@chinamobile.com>
diff --git a/docs/sections/services/datalake-handler/overview.rst b/docs/sections/services/datalake-handler/overview.rst
new file mode 100644
index 0000000..51dab10
--- /dev/null
+++ b/docs/sections/services/datalake-handler/overview.rst
@@ -0,0 +1,76 @@
+.. This work is licensed under a Creative Commons Attribution 4.0
+   International License. http://creativecommons.org/licenses/by/4.0
+
+.. _docs_Datalake_Handler_MS:
+
+Architecture
+------------
+
+Background
+~~~~~~~~~~
+A large amount of data flows among ONAP components, mostly via DMaaP and Web Services.
+For example, all events and feeds collected by DCAE collectors go through DMaaP.
+DMaaP is backed by Kafka, a publish-subscribe system in which data is not meant to be permanent
+and is deleted after a configurable retention period.
+Kafka is not a database: the data it holds is not meant for ad hoc querying.
+Though some components may store processed results in their local databases, most of the raw data is eventually lost.
+We should provide a systematic way to store this raw data, and even the processed results,
+so that it can serve as a source for data analytics and machine learning and provide insight into network operations.
+
+Relations with Other ONAP Components
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The architecture diagram below depicts the DataLake MS as part of ONAP. Only the relevant interactions and components are shown.
+
+.. image:: ./arch.PNG
+
+Note that not all data storage systems in the picture are supported. In R6, the following storage systems are supported:
+
+- MongoDB
+- Couchbase
+- Elasticsearch and Kibana
+- HDFS
+
+Depending on demand, new systems may be added to the supported list. In the following, we use the term database for all of these storage systems,
+even though HDFS is a file system (with simple settings, it can be treated as a database, e.g. via Hive).
+
+Note that once the data is stored in databases, other ONAP components and systems query it directly from the databases,
+without interacting with the DataLake Handler.
+
+Description
+~~~~~~~~~~~
+The DataLake Handler's main function is to monitor the data flowing through DMaaP and persist it. The databases are outside the scope of ONAP,
+since the data volume is expected to be huge, and a database may be a complex cluster consisting of thousands of nodes.
+
+Admin UI
+~~~~~~~~
+A system administrator uses the DataLake Admin UI to:
+
+- Configure external database connections, such as host, port, and login.
+- Configure which Topics to monitor, and in which databases to store the data of each Topic.
+- Pre-configure dashboards and templates for 3rd-party tools.
+
+This UI tool manages all the DataLake settings, which are stored in MariaDB. Here is the database schema:
+
+.. image:: ./dbschema.PNG
+
+Feeder
+~~~~~~
+Architecture
+^^^^^^^^^^^^
+.. image:: ./feeder-arch.PNG
+
+Features
+^^^^^^^^
+- Reads data directly from Kafka, for performance.
+- Supports pluggable databases. To add a new database, only its corresponding service needs to be implemented.
+- Supports a REST API for inter-component communication. Besides managing the DataLake settings in MariaDB,
+  the Admin UI also uses this API to start and stop the Feeder and to query Feeder status and statistics.
+- Uses MariaDB to store settings.
+- Supports data processing: before being persisted, data can be massaged in the Feeder.
+  Currently two features are implemented: Correlate Cleared Message (in ``org.onap.datalake.feeder.service.db.ElasticsearchService``)
+  and Flatten JSON Array (in ``org.onap.datalake.feeder.service.StoreService``).
+- Secures its connections to Kafka and the databases.
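+
+The pluggable-database design can be illustrated with a minimal Java sketch. It assumes that each storage system is wrapped in a
+service implementing one common interface, and that every Topic is mapped to the databases configured in the Admin UI.
+All names (``DbService``, ``InMemoryDbService``, ``StoreRouter``) are hypothetical and are not taken from the Feeder source code.
+
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common contract for a storage back end (not the Feeder's real API).
interface DbService {
    // Name used in the topic settings, e.g. "MongoDB" or "Elasticsearch".
    String name();

    // Persists a batch of JSON messages read from the given topic.
    void saveJsons(String topic, List<String> jsons);
}

// Toy back end that keeps messages in memory; a real one would write to its database.
class InMemoryDbService implements DbService {
    private final String name;
    final Map<String, List<String>> stored = new HashMap<>();

    InMemoryDbService(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public void saveJsons(String topic, List<String> jsons) {
        stored.computeIfAbsent(topic, t -> new ArrayList<>()).addAll(jsons);
    }
}

// Hypothetical dispatcher: sends each batch to the databases configured for its topic.
class StoreRouter {
    private final Map<String, DbService> services = new HashMap<>();
    private final Map<String, List<String>> topicToDbs = new HashMap<>();

    // Plugging in a new database type only requires registering another DbService.
    void register(DbService service) {
        services.put(service.name(), service);
    }

    // Stands in for the Admin UI setting "which databases store this topic's data".
    void configureTopic(String topic, List<String> dbNames) {
        topicToDbs.put(topic, List.copyOf(dbNames));
    }

    void store(String topic, List<String> jsons) {
        // Unconfigured topics are ignored; database names without a registered service are skipped.
        for (String dbName : topicToDbs.getOrDefault(topic, List.of())) {
            DbService service = services.get(dbName);
            if (service != null) {
                service.saveJsons(topic, jsons);
            }
        }
    }
}
```
+
+With this design, supporting a new storage system only means writing and registering one more ``DbService``
+implementation; the routing code does not change.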
+
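+The exact transformation performed by the Flatten JSON Array feature is not described here. As an illustration only,
+the following Java sketch shows one common, generic interpretation: nested objects and arrays are flattened into a single level
+of path keys, with array indices used as path segments, so that each array element becomes a separately addressable field.
+JSON is modeled with plain ``Map`` and ``List`` values, and the class name ``JsonFlattener`` is hypothetical; ``StoreService``
+may behave differently.
+
```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; JSON objects are Maps, JSON arrays are Lists.
final class JsonFlattener {
    private JsonFlattener() {
    }

    // Returns a new single-level map of path keys; the input is not modified.
    static Map<String, Object> flatten(Map<String, Object> json) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : json.entrySet()) {
            flattenInto(entry.getKey(), entry.getValue(), out);
        }
        return out;
    }

    private static void flattenInto(String path, Object value, Map<String, Object> out) {
        if (value instanceof Map) {
            // Nested object: descend with "path.key".
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                flattenInto(path + "." + entry.getKey(), entry.getValue(), out);
            }
        } else if (value instanceof List) {
            // Array: each element gets its index as the next path segment.
            List<?> items = (List<?>) value;
            for (int i = 0; i < items.size(); i++) {
                flattenInto(path + "." + i, items.get(i), out);
            }
        } else {
            // Leaf value (string, number, boolean or null) is copied as-is.
            out.put(path, value);
        }
    }
}
```
+
+For example, ``{"ids": [7, 8]}`` becomes ``{"ids.0": 7, "ids.1": 8}``. Empty objects and arrays produce no keys in this sketch.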
+
+Links
+~~~~~
+- `DataLake Development Environment Setup <https://wiki.onap.org/display/DW/DataLake+Development+Environment+Setup>`_
+- `Source Code <https://gerrit.onap.org/r/gitweb?p=dcaegen2/services.git;a=tree;f=components/datalake-handler;hb=HEAD>`_