What is Hadoop?

Apache Hadoop is an open-source framework for storing and processing very large datasets across clusters of commodity hardware.

How does Hadoop work?

Hadoop has three main components:

  1. HDFS → Hadoop Distributed File System → Stores large datasets by splitting files into blocks and distributing them across hundreds of nodes in a cluster
  2. MapReduce → Processing unit of Hadoop; processes data by splitting it into smaller chunks and assigning each chunk to a different node in the cluster for parallel processing (all chunks at the same time)
    1. For a while, MapReduce was the only way to access data stored in HDFS. Then higher-level tools like Hive and Pig came along
  3. YARN (Yet Another Resource Negotiator) → Allocates the RAM, storage, and CPU that Hadoop needs to run batch, stream, interactive, and graph processing jobs on data stored in HDFS
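The MapReduce model described above can be sketched in plain Python (this is not Hadoop's actual Java API, just an illustration of the map → shuffle → reduce flow, using word counting as the classic example):

```python
from collections import defaultdict

def map_phase(chunk):
    # Each "node" emits a (key, value) pair for every word in its chunk.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the values for each key to get per-word counts.
    return {key: sum(values) for key, values in grouped.items()}

# These chunks stand in for HDFS blocks assigned to different nodes.
chunks = ["big data big", "data cluster", "big cluster cluster"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 2, 'cluster': 3}
```

In real Hadoop, the map tasks run on the nodes holding each block, and the framework handles the shuffle across the network.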

Challenges of Hadoop

For many modern workloads, the drawbacks of Hadoop outweigh the benefits:

MapReduce