However, for some workloads, such as an Operational Database based on HBase or Kafka, in-place upgrades are still appropriate and will be supported in a future version of Cloudera Data Hub.
: A distributed, scalable database that supports the storage of large tables.
It can access several data sources, such as HDFS, Cassandra, and HBase. Spark provides several interesting features, for example: iterative machine learning through the MLlib library, which provides efficient, high-speed algorithms; structured data analysis using Hive; graph processing based on GraphX; and Spark SQL, which retrieves data from many sources and manipulates it using SQL.
As discussed in the introduction, Spark-based approaches can easily access HDFS, Amazon S3, Cassandra, and HBase. Benefiting from the development of cloud storage and management, an increasing amount of data has been stored in the cloud and is open to the public.
We realize distributed streaming data processing on the Storm platform, which achieves high-speed data reading when combined with Kafka queues and high-speed data writing when combined with HBase. Meanwhile, data transmission between the internal components of Storm is also very efficient.
in their paper describes the integrity check module, in which HBase is used to store the large volume of logs.
If any entry is available in HBase, the resource manager allocates YARN jobs, which are automatically processed by a server.
Specifically, we employ Pig and Hive to support the application layer by combining them with HBase, and also Sqoop to connect with traditional relational databases.
The MapReduce paradigm is used for data analysis, while manipulation and storage are performed by the Hadoop Distributed File System (HDFS), HBase, and Hive.
The objective was to establish an interactive and dynamic framework with front-end and interfaced applications (i.e., Apache Phoenix, Apache Spark, and Apache Drill) linked to the Hadoop Distributed File System (HDFS) and a backend NoSQL database, HBase, to form a Big Data platform for analyzing very large data volumes.
We emphasize that single-row atomic transactions of this form are not a peculiarity of Cassandra, as they are supported in other NoSQL databases such as MongoDB and HBase