Hadoop originally started with two main components: MapReduce and HDFS.
MapReduce – distributed data processing across the cluster, plus management of the cluster's computing resources
HDFS – distributed storage of big data across the machines in the cluster
The major drawback of this design was the excess workload on the MapReduce component, which handled both processing and resource management. Hence, Hadoop 2.0 introduced YARN, splitting the responsibilities three ways:
MapReduce – distributed data processing
HDFS – distributed storage
YARN – resource management
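To give a sense of what the low-level MapReduce layer looks like, below is a minimal word-count sketch in Java against the standard Hadoop MapReduce API: HDFS holds the input and output files, YARN allocates the containers the job runs in, and MapReduce performs the actual processing. The class name and the input/output paths passed on the command line are illustrative, not taken from the original text.

```java
// Minimal word-count sketch using the classic Hadoop MapReduce API.
// Class name and command-line paths are illustrative assumptions.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Every new use case requires a variation on this kind of boilerplate, which is exactly the overhead the ecosystem tools described below aim to remove.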
Alongside improvements to Hadoop itself, a handy and vibrant set of complementary technologies was developed to simplify Big Data development on Hadoop. These technologies came to be known as the Hadoop Ecosystem, and most of them are built on top of the existing MapReduce concept.
Each such technology identified specific, repetitive data-processing tasks that could be automated through configuration and abstraction instead of reinventing the wheel every time by hand-coding MapReduce jobs in Java. Some essential examples are:
Sqoop: an application that simplifies bulk data transfer between relational databases (RDBMS) and HDFS.
Oozie: a dedicated workflow scheduler that manages activities such as data movement and processing at scheduled times.
Hive: provides an SQL-style interface to Hadoop for easy development and querying of data, for example through its JDBC driver (sketched below).
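As a rough illustration of that SQL-style access, here is a small Java sketch that runs a HiveQL query through HiveServer2's JDBC interface; it requires the hive-jdbc driver on the classpath, and the hostname, port, database, table, and column names are assumptions made up for the example.

```java
// Illustrative sketch: running a HiveQL query from Java over JDBC.
// Hostname, port, database, table, and column names are assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // Loads the Hive JDBC driver (auto-discovered with JDBC 4+, shown here for clarity).
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // HiveServer2 typically listens on port 10000; "default" is the default database.
    String url = "jdbc:hive2://localhost:10000/default";

    try (Connection conn = DriverManager.getConnection(url, "", "");
         Statement stmt = conn.createStatement();
         // Hive translates this SQL-style query into the underlying execution engine
         // (classically a MapReduce job) instead of requiring hand-written Java.
         ResultSet rs = stmt.executeQuery(
             "SELECT department, COUNT(*) AS employees "
                 + "FROM employees GROUP BY department")) {
      while (rs.next()) {
        System.out.println(rs.getString("department") + "\t" + rs.getLong("employees"));
      }
    }
  }
}
```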
Other technologies, like HBase and Pig, are also based on similar fundamentals.