
Showing posts from June, 2013

Simple Introduction to Map Reduce

MapReduce, a compound of "map" and "reduce", is a programming model/architecture for processing large data sets in parallel, normally in a distributed setting.

Image sourced from: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkIa1Hh1GzJ-2yyiUOMsSuAptPuXWcI6blhfCE0gXZ4zCggqQKnqkYmQwuJOTZOwDXsif5UWfrpnCmMKoSGTzk8wLgvJt1MeZbr5YBMh5pgnIeNcHyM5frpDW35UcYNUgjgSrMKBcY9oVr/s1600/WordCountFlow.JPG

The figure above shows the typical phases of a MapReduce program.

Phase 1: In the initial phase, the program reads the file contents into an InputStream.

Phase 2: In this phase (some applications combine phases 2 and 3 into a single mapping phase), each line of the input file is read into a separate mapper instance. The mapper instances execute in parallel, sometimes in a distributed setting.

Phase 3: In this phase, each line from the previous phase is fed into a Map function that tokenizes each term/data item, thereby converting it into