Erasure Codes in Facebook's Data Warehouse
Ramkumar Vadali, Facebook
Wednesday, Feb 22nd, EEB 248, 2:00pm
Ramkumar Vadali is part of the Data Infrastructure group at Facebook, where he works on improving Hadoop to meet Facebook's ever-growing data storage and processing needs. His interests include building highly scalable distributed systems and using erasure codes in distributed storage systems.
Facebook's data warehouse runs on Hadoop and is the world's largest Hadoop installation. With the warehouse at over 30 PB, the 3-way replication of the Hadoop Distributed File System (HDFS) is very expensive. HDFS RAID is an implementation of erasure codes in HDFS that reduces the storage cost of the warehouse. Facebook's warehouse uses XOR RAID for recent data and Reed-Solomon RAID for older data. More recent work includes a collaboration with Prof. Alex Dimakis to incorporate Simple Regenerating Codes. This talk will give an overview of Hadoop usage in the warehouse, explain the need for HDFS RAID, and cover some implementation and operational aspects of HDFS RAID.
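To illustrate the idea behind XOR RAID mentioned in the abstract, here is a minimal Python sketch of XOR parity over a stripe of blocks. It is not the HDFS RAID implementation: the function names, the 3-block stripe, and the block contents are all hypothetical, chosen only to show how one parity block lets a single lost block be rebuilt, and how the raw storage per byte of data compares with 3-way replication.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])

def storage_overhead(data_blocks, parity_blocks):
    """Raw blocks stored per block of user data (one copy of each block)."""
    return (data_blocks + parity_blocks) / data_blocks

# Hypothetical stripe of three 4-byte blocks (illustrative values only).
stripe = [b"hado", b"op_w", b"areh"]
parity = xor_parity(stripe)

# Pretend the middle block was lost and rebuild it from the rest.
rebuilt = recover([stripe[0], stripe[2]], parity)

print(rebuilt == stripe[1])
print(storage_overhead(3, 1), "vs 3.0 for 3-way replication")
```

XOR parity tolerates only one lost block per stripe. Reed-Solomon codes generalize this to several parity blocks per stripe, at the cost of more computation during encoding and repair.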