Rack awareness protects your critical data against the loss of an entire rack, whether the cause is a switch failure, a power cut, or a localized disaster.
What is a rack in Hadoop?
A rack is a collection of servers (usually 10 or more) with the following properties:
➡ Datanodes in a rack are physically close to each other (in the same data center and attached to the same network switch).
➡ Datanodes within a rack are connected through a local network switch, so they communicate with each other at high bandwidth and low latency.
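Hadoop does not discover racks on its own; an administrator tells it which rack each node belongs to. One common way is a topology script, referenced by the `net.topology.script.file.name` property in core-site.xml, that receives hostnames or IP addresses as arguments and prints one rack path per argument. The following is a minimal sketch of such a script, assuming a hand-written lookup table; the addresses, rack paths, and the `resolve_racks` helper are hypothetical and only illustrate the idea.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script: maps each host argument to a rack path."""
import sys

# Hypothetical mapping from IP address to rack path; a real cluster supplies its own.
RACK_BY_HOST = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}

# Fallback for hosts that are missing from the table.
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print one rack path per argument, one per line.
    for rack in resolve_racks(sys.argv[1:]):
        print(rack)
```

Nodes that resolve to the same rack path are treated as rack neighbours, which is what the placement decisions described next rely on.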
Different cases of Rack usage
Case 1: Store all the data block copies in the same rack
If that rack is lost to a natural disaster or an outage, every copy of the block is lost with it. Recovery is time-consuming, so the cluster suffers downtime.
Case 2: Store each copy in a different rack
This is attractive for end users: losing one rack still leaves copies on the others, so data availability is high, and a read can be served from whichever rack is closest to the reader.
Issue: every write has to move the block between racks, which is much slower than a transfer inside a rack.
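The trade-off between these two placements can be made concrete with a small sketch. It assumes a replication factor of 3 and a write pipeline that passes the block from one replica to the next, starting at the writer's rack; the rack names and both helper functions are hypothetical.

```python
def survives_rack_failure(replica_racks, failed_rack):
    """True if at least one replica lives outside the failed rack."""
    return any(rack != failed_rack for rack in replica_racks)


def cross_rack_hops(replica_racks, writer_rack):
    """Count the pipeline hops that cross a rack boundary, starting at the writer."""
    hops = 0
    current = writer_rack
    for rack in replica_racks:
        if rack != current:
            hops += 1
        current = rack
    return hops


all_in_one_rack = ["rack1", "rack1", "rack1"]  # every copy in the same rack
one_per_rack = ["rack1", "rack2", "rack3"]     # each copy in a different rack

print(survives_rack_failure(all_in_one_rack, "rack1"))  # False
print(survives_rack_failure(one_per_rack, "rack1"))     # True
print(cross_rack_hops(all_in_one_rack, "rack1"))        # 0
print(cross_rack_hops(one_per_rack, "rack1"))           # 2
```

Keeping every copy in one rack costs nothing in inter-rack traffic but does not survive the loss of that rack, while one copy per rack survives it at the price of two slow inter-rack transfers per write.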
Case 3: Balanced approach
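By default, with a replication factor of 3, Hadoop places block replicas in 2 racks: one replica goes to the writer's node (or another node in its rack), and the other two go to different nodes in a single remote rack.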
This approach still protects the data if an entire rack is lost, yet each write crosses rack boundaries less often than it would with every replica on its own rack.
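As a rough illustration of the two-rack placement, the sketch below picks three nodes for one block. It is a simplification, not Hadoop's actual placement code: it assumes a replication factor of 3, assumes the writer runs on a datanode, and ignores node load and free space. The `topology` dictionary and the `place_replicas` function are hypothetical names.

```python
import random


def place_replicas(topology, writer_node, rng=random):
    """Pick three nodes: the writer's node, then two distinct nodes on one other rack.

    topology maps a rack name to the list of node names in that rack.
    """
    # Find the rack that contains the writer.
    local_rack = next(rack for rack, nodes in topology.items() if writer_node in nodes)

    # Candidate remote racks need at least two nodes to hold two replicas.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need a second rack with at least two nodes")

    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer_node, second, third]


topology = {
    "/rack1": ["node1", "node2", "node3"],
    "/rack2": ["node4", "node5", "node6"],
    "/rack3": ["node7", "node8", "node9"],
}
replicas = place_replicas(topology, "node1")
print(replicas)
```

Whatever the random choice, the result always spans exactly two racks: the first replica stays with the writer, and the remaining two share one remote rack.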