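Rack Awareness protects your critical data when an entire rack of servers fails, whether from a switch or power outage or from physical damage to that part of the data center.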
What is a rack in Hadoop?
A rack is a collection of servers (usually 10 or more) with two defining properties:
➡ The DataNodes in a rack are physically close to each other (in the same data center, connected to the same network switch).
➡ Intra-rack DataNodes communicate through that local switch, so traffic between them has higher bandwidth and lower latency than traffic that has to cross racks.
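Hadoop does not detect rack membership on its own; an administrator supplies the mapping. One common way is a topology script named by the `net.topology.script.file.name` property: Hadoop passes it one or more IP addresses or hostnames and reads back one rack path per host, in the same order. The sketch below assumes a hypothetical one-subnet-per-rack layout (`SUBNET_TO_RACK` and the `/dc1/rackN` names are made up for illustration) and falls back to `/default-rack`, the rack Hadoop uses when no mapping is configured.

```python
#!/usr/bin/env python3
"""Hypothetical rack topology script for HDFS rack awareness.

Hadoop calls the script named by net.topology.script.file.name with one or
more IP addresses/hostnames as arguments and reads one rack path per host.
"""
import sys

# Hypothetical layout: each /24 subnet is one rack. Replace with your own mapping.
SUBNET_TO_RACK = {
    "10.0.1.": "/dc1/rack1",
    "10.0.2.": "/dc1/rack2",
    "10.0.3.": "/dc1/rack3",
}

# Rack Hadoop assigns when it has no topology information for a node.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host: str) -> str:
    """Return the rack path for one host, falling back to the default rack."""
    for prefix, rack in SUBNET_TO_RACK.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK


def resolve_racks(hosts: list[str]) -> list[str]:
    """Resolve every host argument, preserving the order Hadoop passed them in."""
    return [resolve_rack(h) for h in hosts]


if __name__ == "__main__":
    # Print one rack path per argument, one per line.
    for rack in resolve_racks(sys.argv[1:]):
        print(rack)
```

Once the script is executable and referenced from `core-site.xml`, running it by hand, e.g. `./topology.py 10.0.1.5 192.168.0.9`, prints `/dc1/rack1` and `/default-rack`, one per line.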
Different cases of replica placement across racks
Case 1:
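Store all copies of a data block in the same rack.
If that rack goes down (a switch failure, a power outage, or a disaster hitting that part of the data center), every copy of the block becomes unavailable at once. Recovery is time-consuming, and the cluster suffers downtime in the meantime.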
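To make Case 1's weakness concrete, here is a small sketch that counts how many replicas of a block remain reachable when one rack fails. The `surviving_replicas` helper and the rack names are hypothetical, for illustration only.

```python
def surviving_replicas(replica_racks: list[str], failed_rack: str) -> int:
    """Count replicas that remain reachable after one whole rack goes down."""
    return sum(1 for rack in replica_racks if rack != failed_rack)


# Case 1: all three copies of a block sit in the same rack (hypothetical names).
case1 = ["/rack1", "/rack1", "/rack1"]

for failed in ("/rack1", "/rack2"):
    left = surviving_replicas(case1, failed)
    status = "available" if left > 0 else "unavailable"
    print(f"{failed} down -> {left} replica(s) left, block {status}")
```

Losing `/rack1` leaves zero replicas, so the block is gone until that rack comes back, while a failure anywhere else does not matter. The placement is all-or-nothing with respect to its one rack.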
Case 2:
Store each copy in a different rack.
This is attractive for end users: it gives high data availability, and reads can be served from several racks, whichever is closest to the client.
Issue: every hop of the write pipeline crosses a rack boundary, so writes consume a lot of inter-rack bandwidth and take longer.
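The cost of Case 2 shows up in the HDFS write pipeline, where data flows DataNode to DataNode (DN1 -> DN2 -> DN3). The sketch below counts the hops that cross a rack boundary and converts them to megabytes per block. It assumes the writer runs on DN1 (so the client-to-DN1 transfer is local), a replication factor of 3, and the 128 MB default `dfs.blocksize`; the rack names and helper are hypothetical.

```python
# Assumption: HDFS default dfs.blocksize (128 MB) in Hadoop 2.x and later.
BLOCK_SIZE_MB = 128


def cross_rack_hops(pipeline: list[str]) -> int:
    """Count DataNode-to-DataNode hops in a write pipeline that cross racks.

    `pipeline` lists the rack of each DataNode in the order data flows
    through it (DN1 -> DN2 -> DN3).
    """
    return sum(1 for a, b in zip(pipeline, pipeline[1:]) if a != b)


# Hypothetical placements for one block with replication factor 3.
placements = {
    "case 1 (one rack)": ["/rack1", "/rack1", "/rack1"],
    "case 2 (three racks)": ["/rack1", "/rack2", "/rack3"],
}

for name, pipeline in placements.items():
    hops = cross_rack_hops(pipeline)
    print(f"{name}: {hops} cross-rack hop(s), {hops * BLOCK_SIZE_MB} MB over rack links")
```

Case 2 pushes 2 x 128 MB = 256 MB per block through inter-rack links, versus none for Case 1: that is the price paid for its extra fault tolerance.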
Case 3: Balanced approach
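By default, Hadoop places the replicas of a block in 2 racks. With the default replication factor of 3, the first replica goes on the writer's node (if the writer is a DataNode), the second on a node in a different (remote) rack, and the third on a different node in that same remote rack. This only works if rack awareness is configured; otherwise every node is treated as belonging to `/default-rack`.
This approach still protects data from the loss of an entire rack, while only one hop of the write pipeline crosses racks, which keeps inter-rack traffic well below that of Case 2.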
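The default policy described above can be sketched in a few lines. This is a simplified illustration of the rule, not Hadoop's actual `BlockPlacementPolicyDefault` code: it assumes the writer is itself a DataNode, a replication factor of exactly 3, at least two racks, and at least two nodes on every rack, and it ignores disk space, load, and node health. The `place_replicas` function and the six-node topology are hypothetical.

```python
import random


def place_replicas(writer: str, topology: dict[str, str], rng: random.Random) -> list[str]:
    """Simplified sketch of HDFS default placement for replication factor 3.

    `topology` maps DataNode name -> rack path. Assumes the writer is a
    DataNode, there are >= 2 racks, and each rack has >= 2 nodes.
    """
    # 1st replica: on the writer's own node.
    first = writer
    local_rack = topology[first]

    # 2nd replica: a random node on a different (remote) rack.
    remote_nodes = [n for n, r in topology.items() if r != local_rack]
    second = rng.choice(remote_nodes)
    remote_rack = topology[second]

    # 3rd replica: a different node on the same rack as the 2nd replica.
    same_rack_nodes = [n for n, r in topology.items() if r == remote_rack and n != second]
    third = rng.choice(same_rack_nodes)

    return [first, second, third]


# Hypothetical six-node cluster spread over three racks.
topology = {
    "dn1": "/rack1", "dn2": "/rack1",
    "dn3": "/rack2", "dn4": "/rack2",
    "dn5": "/rack3", "dn6": "/rack3",
}

replicas = place_replicas("dn1", topology, random.Random(42))
racks = [topology[n] for n in replicas]
print(replicas, racks)

# Any single rack failure still leaves at least one replica reachable.
for rack in set(topology.values()):
    assert any(r != rack for r in racks)
```

Whatever nodes are picked, the replicas span exactly two racks, so any one rack can fail without losing the block, and the pipeline crosses racks only once (from the writer's rack to the remote rack).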