I have a Hadoop job. When the job starts, a number of mappers are launched, and each mapper writes a file to disk: part-m-00000, part-m-00001, and so on. As I understand it, each mapper creates one part file. With a large amount of data there must be more than one mapper, but can I somehow control the number of output files? I mean, if Hadoop starts, for example, 10 mappers, can there be only 3 part files?
I found the post "How multiple reducers output 1 part-file in Hadoop?", but it uses the old version of the Hadoop library. I'm using the classes in org.apache.hadoop.mapreduce.*, not org.apache.hadoop.mapred.*.
I'm using Hadoop version 0.20, with hadoop-core:1.2.0.jar.
Is there a way to do this using the new Hadoop API?
The number of output files equals the number of reducers, or the number of mappers if there are no reducers.
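To make that rule concrete, here is a small plain-Java sketch (no Hadoop dependency) that lists the part file names you would expect for a given number of mappers and reducers. The class and method names (PartFiles, expectedPartFiles) are made up for illustration; the only thing taken from Hadoop is the part-m-NNNNN / part-r-NNNNN naming used by the new API.

```java
import java.util.ArrayList;
import java.util.List;

class PartFiles {
    // Hypothetical helper: returns the part file names a job would be expected to write.
    // With zero reducers the job is map-only, so each mapper writes its own part-m file.
    // Otherwise each reducer writes one part-r file, regardless of the mapper count.
    static List<String> expectedPartFiles(int mappers, int reducers) {
        boolean mapOnly = reducers == 0;
        int count = mapOnly ? mappers : reducers;
        String kind = mapOnly ? "m" : "r";
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(String.format("part-%s-%05d", kind, i));
        }
        return names;
    }

    public static void main(String[] args) {
        // 10 mappers, no reducers: 10 files, part-m-00000 .. part-m-00009
        System.out.println(expectedPartFiles(10, 0));
        // 10 mappers, 3 reducers: 3 files, part-r-00000 .. part-r-00002
        System.out.println(expectedPartFiles(10, 3));
    }
}
```

So for the case in the question (10 mappers, 3 part files), the job would need 3 reducers.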
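You can add a single reducer to the job so that the output of all mappers is directed to it and you get a single output file. Note that this is less efficient, because all the data (the output of the mappers) is sent over the wire (network I/O) to the node where the reducer runs. Also, since a single process (eventually) receives all the data, the job will probably run slower.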
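A minimal driver sketch with the new API (org.apache.hadoop.mapreduce.*), assuming hadoop-core 1.2.0 is on the classpath. The class name FewOutputFilesDriver and the job name are made up, and the base Mapper and Reducer classes (which simply pass records through) stand in for your own; with the default text input, that means the keys written out are the byte offsets of the input lines.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: args[0] is the input path, args[1] the output directory.
public class FewOutputFilesDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "few-output-files");
        job.setJarByClass(FewOutputFilesDriver.class);

        // Pass-through mapper and reducer; replace with your own classes.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // One reducer means one output file (part-r-00000).
        job.setNumReduceTasks(1);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Changing the argument of setNumReduceTasks to 3 should give three part files instead of one, with records spread across the reducers by key. Either way the records go through the shuffle, so they come out grouped and sorted by key rather than in mapper order.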
By the way, the fact that there are multiple parts shouldn't be very significant, since you can pass the directory containing them as input to subsequent jobs.