The Modules of Hadoop
Hadoop Common - consists of the common utilities and libraries that support the other Hadoop modules.
Hadoop Distributed File System (HDFS) - provides high-throughput access to application data and is designed to run on commodity hardware. HDFS is used to scale a single Apache Hadoop cluster to a very large number of nodes (from hundreds to thousands as required).
Hadoop YARN - provides job/task scheduling and cluster resource management for data-processing applications, helping to keep the system up and running (high availability).
Hadoop Ozone - provides a scalable, distributed object store for Hadoop.
MapReduce - a programming paradigm in which the Map function transforms input data by breaking it into key-value pairs (tuples), and the Reduce function takes the output of the Map operation and aggregates (or combines) the tuples into smaller datasets. Mapping is performed first, and its output is used by the Reduce function. Programming languages commonly used to write MapReduce jobs include C++, Java, and Python. By using parallel processing algorithms and minimizing data movement, massive datasets (in the range of petabytes) can be processed.
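The Map and Reduce steps described above can be sketched in plain Python. This is only an illustration of the paradigm on a classic word-count task, not the actual Hadoop API: the function names and the sample input are invented for the example, and the grouping dictionary stands in for Hadoop's shuffle phase.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: break each input line into (word, 1) key-value tuples."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: group tuples by key (the shuffle step) and
    aggregate each group into a single, smaller result."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

lines = ["Hadoop stores data", "Hadoop processes data"]
counts = reduce_phase(map_phase(lines))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real Hadoop job, the map tasks run in parallel on the nodes that hold the data blocks, and the framework moves only the intermediate key-value pairs across the network, which is what keeps data movement low at petabyte scale.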