perl - Why is my reducer failing? (Hadoop)


I wrote two Perl scripts to practice MapReduce. The program is supposed to count the words in a bunch of text files that I put in a directory.

This is mapper.pl:

    #!/usr/bin/perl

    use 5.010;
    use strict;
    use warnings;

    while (my $line = <>) {
        # split the line on whitespace
        my @words = split(' ', $line);

        # emit one "word <tab> 1" record per word
        foreach my $word (@words) {
            print "$word \t 1\n";
        }
    }

This is reducer.pl:

    #!/bin/usr/perl

    use 5.010;
    use warnings;

    $currentword = "";
    $currentcount = 0;

    ## Use this block for testing the reduce script with test data.
    # open the test file
    #open(my $fh, "<", "testdata.txt");
    #while(!eof $fh) {}

    while (my $line = <>) {
        # remove \n
        chomp $line;

        # index 0 is the word, index 1 is the count value
        @linedata = split('\t', $line);
        $word = $linedata[0];
        $count = $linedata[1];

        if ($currentword eq $word) {
            $currentcount = $currentcount + $count;
        } else {
            if ($currentword ne "") {
                # output the key we're finished working with
                print "$currentword \t $currentcount \n";
            }
            # switch the current variables over to the next key
            $currentcount = $count;
            $currentword = $word;
        }
    }

    # deal with the last loop
    print "$currentword \t $currentcount \n";

When I run these using the Hadoop streaming command:

bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar -file /home/hduser/countwords/mapper.pl -mapper /home/hduser/countwords/mapper.pl -file /home/hduser/countwords/reducer.pl -reducer /home/hduser/countwords/reducer.pl -input /user/hduser/testdata/* -output /user/hduser/testdata/output/* 

I get the following error:

    13/07/19 11:36:33 info streaming.streamjob:  map 0%  reduce 0%
    13/07/19 11:36:39 info streaming.streamjob:  map 9%  reduce 0%
    13/07/19 11:36:40 info streaming.streamjob:  map 64%  reduce 0%
    13/07/19 11:36:41 info streaming.streamjob:  map 73%  reduce 0%
    13/07/19 11:36:44 info streaming.streamjob:  map 82%  reduce 0%
    13/07/19 11:36:45 info streaming.streamjob:  map 100%  reduce 0%
    13/07/19 11:36:49 info streaming.streamjob:  map 100%  reduce 11%
    13/07/19 11:36:53 info streaming.streamjob:  map 100%  reduce 0%
    13/07/19 11:37:02 info streaming.streamjob:  map 100%  reduce 17%
    13/07/19 11:37:03 info streaming.streamjob:  map 100%  reduce 33%
    13/07/19 11:37:06 info streaming.streamjob:  map 100%  reduce 17%
    13/07/19 11:37:08 info streaming.streamjob:  map 100%  reduce 0%
    13/07/19 11:37:16 info streaming.streamjob:  map 100%  reduce 33%
    13/07/19 11:37:21 info streaming.streamjob:  map 100%  reduce 0%
    13/07/19 11:37:31 info streaming.streamjob:  map 100%  reduce 33%
    13/07/19 11:37:35 info streaming.streamjob:  map 100%  reduce 17%
    13/07/19 11:37:38 info streaming.streamjob:  map 100%  reduce 100%
    13/07/19 11:37:38 info streaming.streamjob: kill job, run:
    13/07/19 11:37:38 info streaming.streamjob: /usr/local/hadoop/libexec/../bin/hadoop job  -dmapred.job.tracker=shiv0:54311 -kill job_201307031312_0065
    13/07/19 11:37:38 info streaming.streamjob: tracking url: http://shiv0:50030/jobdetails.jsp?jobid=job_201307031312_0065
    13/07/19 11:37:38 error streaming.streamjob: job not successful. error: # of failed reduce tasks exceeded allowed limit. failedcount: 1. lastfailedtask: task_201307031312_0065_r_000001
    13/07/19 11:37:38 info streaming.streamjob: killjob...
    streaming command failed!

I've been trying to figure out what I'm doing wrong for a while, and I keep scratching my head. Does anyone have advice on how I can diagnose this?
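One way to narrow this down, offered as a sketch rather than a confirmed fix, is to reproduce the map, sort, reduce flow locally without Hadoop. The helper name `simulate_streaming` and the file `sample.txt` are made up for illustration. `LC_ALL=C sort` is only an approximation of how Hadoop orders keys between the two phases.

```shell
# Hypothetical helper: run a mapper command and a reducer command the way
# a streaming job chains them, with a local sort standing in for the shuffle.
simulate_streaming() {
    mapper="$1"    # command line for the mapper, e.g. './mapper.pl'
    reducer="$2"   # command line for the reducer, e.g. './reducer.pl'
    input="$3"     # local text file used as test input

    # Feed the input to the mapper, sort its output so equal keys are
    # adjacent, then pipe the sorted records into the reducer.
    sh -c "$mapper" < "$input" | LC_ALL=C sort | sh -c "$reducer"
}

# Example invocations (assumes both scripts are executable and sample.txt exists):
#   simulate_streaming './mapper.pl' './reducer.pl' sample.txt
#   simulate_streaming 'perl mapper.pl' 'perl reducer.pl' sample.txt
```

Running the scripts directly (`./reducer.pl`) exercises the shebang line, while `perl reducer.pl` bypasses it, so comparing the two invocations can help separate a script bug from an interpreter path problem.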

bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar -file /home/hduser/countwords/mapper.py -mapper /home/hduser/countwords/mapper.py -file /home/hduser/countwords/reducer.py -reducer /home/hduser/countwords/reducer.py -input /user/hduser/testdata/* -output /user/hduser/testdata/output/*

Why are you calling .py files? Shouldn't you be calling the Perl files, i.e. reducer.pl instead of reducer.py?
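It may also be worth comparing the shebang lines. As posted, mapper.pl starts with `#!/usr/bin/perl` while reducer.pl starts with `#!/bin/usr/perl`. Streaming runs each script as an executable, so if `/bin/usr/perl` does not exist on the task nodes, the reducer could not start, which would be consistent with the maps finishing and the reduces failing. That is a guess from the posted code, not something the log confirms. A minimal sketch for checking it, where `check_shebang` is a hypothetical helper name:

```shell
# Hypothetical helper: report whether the interpreter named on a script's
# shebang line exists and is executable on this machine.
check_shebang() {
    script="$1"

    # The shebang, if present, is the first line of the file.
    first_line=$(head -n 1 "$script")
    case "$first_line" in
        '#!'*) ;;
        *) echo "no shebang: $script"; return 1 ;;
    esac

    # Strip the leading "#!" and keep only the interpreter path,
    # dropping any arguments that follow it.
    interp=$(printf '%s\n' "$first_line" | sed 's/^#![[:space:]]*//' | awk '{print $1}')

    if [ -x "$interp" ]; then
        echo "ok: $interp"
    else
        echo "missing interpreter: $interp"
        return 1
    fi
}

# Example invocation (paths taken from the question):
#   check_shebang /home/hduser/countwords/reducer.pl
```

The check has to be run on a machine that executes the tasks, since the interpreter path is resolved there and not on the machine that submits the job.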

