Part of the difficulty is that there are very few open source _applications_ (vs frameworks) built using MapReduce available to study. The only ones I know of are:
1. Scale our event enrichment process horizontally: raw events come in, we validate them, enrich them (IP -> geo, etc.), and store them. With MapReduce, we just throw more boxes at the enrichment process for larger users (we enrich 200M events in ~90 mins on 6 x c3.2xlarge instances, at a spot cost of $0.58)
2. Do easy recomputations across a user's full history of raw events, e.g. if we add a new enrichment or a user's business logic changes, we can rerun over their full history going back to 2012
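
To make the two points above concrete, here is a minimal Python sketch of a map-only enrichment step. It is an illustration under stated assumptions, not the actual pipeline: the event fields (`event_id`, `ip`, `ts`), the `GEO_BY_NETWORK` lookup table, and all function names are hypothetical. The idea it tries to show is that each raw event is validated and enriched independently, so the input can be split across as many workers as needed, and a recomputation is just the same job run again over the stored raw events with a different list of enrichments.

```python
import ipaddress
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional

# Hypothetical lookup table standing in for a real IP -> geo database.
GEO_BY_NETWORK = {
    ipaddress.ip_network("203.0.113.0/24"): "AU",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}

Enrichment = Callable[[dict], dict]


def validate(raw: dict) -> bool:
    """Accept only events that carry the fields this sketch needs."""
    if not all(key in raw for key in ("event_id", "ip", "ts")):
        return False
    try:
        ipaddress.ip_address(raw["ip"])
    except ValueError:
        return False
    return True


def geo_enrichment(event: dict) -> dict:
    """Attach a country code if the IP falls in a known network, else None."""
    addr = ipaddress.ip_address(event["ip"])
    country = next((c for net, c in GEO_BY_NETWORK.items() if addr in net), None)
    return {**event, "geo_country": country}


def year_enrichment(event: dict) -> dict:
    """A 'new' enrichment added later: derive the UTC year from the timestamp."""
    year = datetime.fromtimestamp(event["ts"], tz=timezone.utc).year
    return {**event, "year": year}


def enrich_event(raw: dict, enrichments: List[Enrichment]) -> Optional[dict]:
    """The map function: validate one raw event, then apply each enrichment."""
    if not validate(raw):
        return None
    event = dict(raw)
    for enrich in enrichments:
        event = enrich(event)
    return event


def run_job(raw_events: Iterable[dict], enrichments: List[Enrichment]) -> List[dict]:
    """Map-only job: events are independent, so input splits can go to any worker."""
    mapped = (enrich_event(raw, enrichments) for raw in raw_events)
    return [event for event in mapped if event is not None]


# Stored raw events (made-up sample data).
history = [
    {"event_id": "e1", "ip": "203.0.113.7", "ts": 1357002000},
    {"event_id": "e2", "ip": "not-an-ip", "ts": 1357002000},
    {"event_id": "e3", "ip": "192.0.2.1", "ts": 1357002000},
]

first_run = run_job(history, [geo_enrichment])
# Later, a new enrichment is added and the full history is recomputed.
rerun = run_job(history, [geo_enrichment, year_enrichment])
```

Because `enrich_event` has no shared state, `run_job` could hand slices of `raw_events` to separate processes or machines without changing the per-event logic, which is the property that lets a job like this scale by adding boxes.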
Hope this helps!