So I wrote two Perl scripts to practice MapReduce. The program is supposed to count the words in a bunch of text files that I put in a directory.
This is mapper.pl:
    #!/usr/bin/perl
    use 5.010;
    use strict;
    use warnings;

    while(my $line = <>) {
        @words = split(' ', $line);
        foreach $word(@words) {
            print "$word \t 1\n";
        }
    }

This is reducer.pl:
    #!/bin/usr/perl
    use 5.010;
    use warnings;

    $currentword = "";
    $currentcount = 0;

    ## use this block for testing the reduce script with test data
    # open the test file
    #open(my $fh, "<", "testdata.txt");
    #while(!eof $fh) {}

    while(my $line = <>) {
        # remove the \n
        chomp $line;

        # index 0 is the word, index 1 is the count value
        @linedata = split('\t', $line);
        $word = $linedata[0];
        $count = $linedata[1];

        if($currentword eq $word) {
            $currentcount = $currentcount + $count;
        } else {
            if($currentword ne "") {
                # output the key we're finished working with
                print "$currentword \t $currentcount \n";
            }
            # switch the current variables over to the next key
            $currentcount = $count;
            $currentword = $word;
        }
    }

    # deal with the last loop
    print "$currentword \t $currentcount \n";

So when I run these using the Hadoop streaming command:
    bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar \
        -file /home/hduser/countwords/mapper.pl \
        -mapper /home/hduser/countwords/mapper.pl \
        -file /home/hduser/countwords/reducer.pl \
        -reducer /home/hduser/countwords/reducer.pl \
        -input /user/hduser/testdata/* \
        -output /user/hduser/testdata/output/*

I get the following error:
    13/07/19 11:36:33 INFO streaming.StreamJob: map 0% reduce 0%
    13/07/19 11:36:39 INFO streaming.StreamJob: map 9% reduce 0%
    13/07/19 11:36:40 INFO streaming.StreamJob: map 64% reduce 0%
    13/07/19 11:36:41 INFO streaming.StreamJob: map 73% reduce 0%
    13/07/19 11:36:44 INFO streaming.StreamJob: map 82% reduce 0%
    13/07/19 11:36:45 INFO streaming.StreamJob: map 100% reduce 0%
    13/07/19 11:36:49 INFO streaming.StreamJob: map 100% reduce 11%
    13/07/19 11:36:53 INFO streaming.StreamJob: map 100% reduce 0%
    13/07/19 11:37:02 INFO streaming.StreamJob: map 100% reduce 17%
    13/07/19 11:37:03 INFO streaming.StreamJob: map 100% reduce 33%
    13/07/19 11:37:06 INFO streaming.StreamJob: map 100% reduce 17%
    13/07/19 11:37:08 INFO streaming.StreamJob: map 100% reduce 0%
    13/07/19 11:37:16 INFO streaming.StreamJob: map 100% reduce 33%
    13/07/19 11:37:21 INFO streaming.StreamJob: map 100% reduce 0%
    13/07/19 11:37:31 INFO streaming.StreamJob: map 100% reduce 33%
    13/07/19 11:37:35 INFO streaming.StreamJob: map 100% reduce 17%
    13/07/19 11:37:38 INFO streaming.StreamJob: map 100% reduce 100%
    13/07/19 11:37:38 INFO streaming.StreamJob: To kill this job, run:
    13/07/19 11:37:38 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=shiv0:54311 -kill job_201307031312_0065
    13/07/19 11:37:38 INFO streaming.StreamJob: Tracking URL: http://shiv0:50030/jobdetails.jsp?jobid=job_201307031312_0065
    13/07/19 11:37:38 ERROR streaming.StreamJob: Job not successful. Error: # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201307031312_0065_r_000001
    13/07/19 11:37:38 INFO streaming.StreamJob: killJob...
    Streaming Command Failed!

I've been trying to figure out what I'm doing wrong for a while now, and I keep scratching my head. Does anyone have advice on how I can diagnose this?
    bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar -file /home/hduser/countwords/mapper.py -mapper /home/hduser/countwords/mapper.py -file /home/hduser/countwords/reducer.py -reducer /home/hduser/countwords/reducer.py -input /user/hduser/testdata/* -output /user/hduser/testdata/output/*

Why are you calling the .py files? Shouldn't you be calling the Perl files, i.e. reducer.pl instead of reducer.py?
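
Another thing that may be worth checking, since only the reduce tasks fail: Hadoop streaming launches each script file directly, so the interpreter path on its first line has to exist on the task nodes. In the scripts as posted, mapper.pl starts with `#!/usr/bin/perl` while reducer.pl starts with `#!/bin/usr/perl`. The following is only a rough sketch of a check you could run on a node; `check_shebang` is a hypothetical helper name, not part of Hadoop or Perl.

```shell
#!/bin/sh
# check_shebang FILE (hypothetical helper): succeed only if FILE starts
# with "#!" and the interpreter path named there exists and is executable.
check_shebang() {
    file=$1

    # Only the first line of the script matters here.
    first_line=$(head -n 1 "$file")

    case $first_line in
        '#!'*) ;;
        *)
            echo "$file: no shebang line" >&2
            return 1
            ;;
    esac

    # Drop the leading "#!" (plus any spaces) and any interpreter arguments.
    interp=$(printf '%s\n' "$first_line" |
        sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//')

    if [ -x "$interp" ]; then
        echo "$file: interpreter $interp found"
        return 0
    fi

    echo "$file: interpreter $interp not found or not executable" >&2
    return 1
}
```

For example, `check_shebang /home/hduser/countwords/reducer.pl` would report whether the path on the first line of the reducer resolves. It can also help to take Hadoop out of the picture and run the job by hand on a small sample, e.g. `cat sample.txt | ./mapper.pl | sort | ./reducer.pl` (where `sample.txt` stands for any small input file of yours), so that Perl errors show up directly in the terminal.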