Core problems of MapReduce programming
1- How do we break up a large problem into smaller tasks? More specifically, how do
we decompose the problem so that the smaller tasks can be executed in parallel?
2- How do we assign tasks to workers distributed across a potentially large number
of machines (while keeping in mind that some workers are better suited to running
some tasks than others, e.g., due to available resources, locality constraints, etc.)?
3- How do we ensure that the workers get the data they need?
4- How do we coordinate synchronization among the different workers?
5- How do we share partial results from one worker that are needed by another?
6- How do we accomplish all of the above in the face of software errors and hardware
faults?
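
To make the questions above concrete, here is a minimal single-machine sketch of a word count written in the map/shuffle/reduce style. It is an illustration under stated assumptions, not how any particular framework implements it: threads in one process stand in for distributed workers, and all function names (split_into_chunks, map_chunk, shuffle, reduce_group, word_count) are hypothetical. It touches questions 1, 2, 4 and 5 only; data locality (3) and fault tolerance (6) are left out entirely.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, num_chunks):
    """Question 1: break the input into independent pieces (round-robin)."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def map_chunk(chunk):
    """Map task: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Question 5: group partial results by key so that one reduce call
    sees every value emitted for a given word."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(item):
    """Reduce task: sum the counts collected for one word."""
    key, values = item
    return key, sum(values)


def word_count(lines, num_workers=4):
    chunks = split_into_chunks(lines, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Question 2: the pool hands tasks to whichever worker is free.
        # Question 4: list() blocks until every map task has finished,
        # which acts as the barrier between the map and reduce phases.
        mapped = list(pool.map(map_chunk, chunks))
        groups = shuffle(mapped)
        reduced = list(pool.map(reduce_group, groups.items()))
    return dict(reduced)
```

Calling word_count(["a b a", "b c"]) counts each word across both lines. In a real cluster the shuffle step moves data over the network, and the framework re-runs tasks whose workers fail; neither is modeled here.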