MapReduce: Simplified Data Processing on Large Clusters

Saturday, June 26, 2010


Abstract
1. Introduction
2. Programming Model
    2.1 Example
    2.2 Types
    2.3 More Examples
3. Implementation
    3.1 Execution Overview
    3.2 Master Data Structures
    3.3 Fault Tolerance
        3.3.1 Worker Failure
        3.3.2 Master Failure
        3.3.3 Semantics in the Presence of Failures
    3.4 Locality
    3.5 Task Granularity
    3.6 Backup Tasks
4. Refinements
    4.1 Partitioning Function
    4.2 Ordering Guarantees
    4.3 Combiner Function
    4.4 Input and Output Types
    4.5 Side-effects
    4.6 Skipping Bad Records
    4.7 Local Execution
    4.8 Status Information
    4.9 Counters
5. Performance
    5.1 Cluster Configuration
    5.2 Grep
    5.3 Sort
    5.4 Effect of Backup Tasks
    5.5 Machine Failures
6. Experience
    6.1 Large-Scale Indexing
7. Related Work
8. Conclusions

Acknowledgements
References