Thursday, June 24, 2010

FlumeJava: Easy, Efficient Data-Parallel Pipelines


Abstract
1. Introduction
2. Background on MapReduce
3. The FlumeJava Library
3.1 Core Abstractions
3.2 Derived Operations
3.3 Deferred Evaluation
3.4 PObjects
4. Optimizer
4.1 ParallelDo Fusion
4.2 The MapShuffleCombineReduce (MSCR) Operation
4.3 MSCR Fusion
4.4 Overall Optimizer Strategy
4.5 Example: SiteData
4.6 Optimizer Limitations and Future Work
5. Executor
6. Evaluation
6.1 User Adoption and Experience
6.2 Optimizer Effectiveness
6.3 Execution Performance
7. Related Work
8. Conclusion