Compressed input in Hadoop MapReduce
I'm running into a serious problem with Hadoop MapReduce and compressed input files. Jobs seem to take much longer to run, and when the data is processed sequentially (i.e., decompressed on a single node), it expands so much that it crashes the nodes.
Here’s what I found:
If the file is compressed, it cannot be split and would need to be processed by a single node (effectively destroying the advantage of running MapReduce over a cluster of parallel machines).
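As far as I can tell, this behaviour comes from how the input format decides whether a file may be split. Below is a minimal sketch (assuming the new-API org.apache.hadoop.mapreduce classes and a hypothetical subclass name) that mirrors that check: if the file's codec is not a splittable one, the whole file becomes a single split and is read by one mapper.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Illustrative subclass (hypothetical name) showing the splittability check.
public class CodecAwareTextInputFormat extends TextInputFormat {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Look up the compression codec from the file extension (e.g. .gz, .bz2).
        CompressionCodec codec =
            new CompressionCodecFactory(context.getConfiguration()).getCodec(file);

        if (codec == null) {
            return true; // uncompressed text: splittable as usual
        }

        // A codec like gzip does not implement SplittableCompressionCodec,
        // so the entire file goes to a single mapper.
        return codec instanceof SplittableCompressionCodec;
    }
}
```

If I understand correctly, a splittable codec such as bzip2 passes this check, so such files can still be divided across mappers, while gzip files cannot.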