
Part of the difficulty is that there are very few open source _applications_ (vs frameworks) built using MapReduce available to study. The only ones I know of are:

     https://github.com/snowplow/snowplow
     https://github.com/PredictionIO/PredictionIO

We (Snowplow) use MapReduce primarily to:

1. Scale our event enrichment process horizontally - raw events come in, we validate them, enrich them (IP -> geo etc.) and store them. With MapReduce, we just throw more boxes at the enrichment process for larger users (we enrich 200m events in ~90 mins on 6 x c3.2xlarges, at a spot cost of $0.58)

2. Do easy recomputations across a user's full history of raw events - e.g. if we add a new enrichment or a user's business logic changes, we can rerun over their full history going back to 2012
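
To make the two points above concrete, here is a rough sketch in Python of why this workload fits a map-only job. It is not Snowplow's actual code: the event fields, the `GEO_BY_IP` table and every function name are made up for illustration. The idea is that each raw event is validated and enriched independently, so the work splits across as many boxes as you like, and since the raw events are kept, adding an enrichment is just a rerun of the same job over the same input.

```python
from typing import Callable, Iterable, List, Optional

Event = dict
Enrichment = Callable[[Event], Event]

# Hypothetical lookup table standing in for a real IP-to-geo database.
GEO_BY_IP = {"203.0.113.7": "AU", "198.51.100.4": "US"}


def validate(raw: Event) -> Optional[Event]:
    """Return a copy of the event, or None if required fields are missing."""
    if "ip" in raw and "ts" in raw:
        return dict(raw)
    return None


def geo_enrichment(event: Event) -> Event:
    """IP -> geo, using the toy table above."""
    return {**event, "geo": GEO_BY_IP.get(event["ip"], "unknown")}


def hour_enrichment(event: Event) -> Event:
    """An enrichment added later: UTC hour of day from a Unix timestamp."""
    return {**event, "hour": (event["ts"] // 3600) % 24}


def map_event(raw: Event, enrichments: List[Enrichment]) -> Optional[Event]:
    """The per-event map step: validate, then apply each enrichment in order."""
    event = validate(raw)
    if event is None:
        return None
    for enrich in enrichments:
        event = enrich(event)
    return event


def run_job(raw_events: Iterable[Event], enrichments: List[Enrichment]) -> List[Event]:
    """Map-only pass over a batch. No event depends on any other event,
    so a cluster can split raw_events across machines however it likes."""
    mapped = (map_event(raw, enrichments) for raw in raw_events)
    return [event for event in mapped if event is not None]


# The stored raw history is never modified, so it can be replayed at will.
raw_history = [
    {"ip": "203.0.113.7", "ts": 1400000000},
    {"ip": "198.51.100.4", "ts": 1400003600},
    {"ts": 1400007200},  # fails validation: no IP
]

first_run = run_job(raw_history, [geo_enrichment])
# Later, a new enrichment is added and the full history is recomputed.
rerun = run_job(raw_history, [geo_enrichment, hour_enrichment])
```

On a real cluster `run_job` would be the framework's job, with `map_event` as the mapper; the loop here only stands in for that.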

Hope this helps!


