How Do You Create and Use a Combiner Component?

Updated: 2020-11-12  Source: 黑马程序员 (Itheima)

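The output of the Map phase may contain a large amount of duplicate data, for example, …, which inevitably slows down the aggregation in the Reduce phase. The Combiner component merges these duplicate records in the Map output first, and the resulting new (key, value) pairs become the input of the Reduce phase. Figure 1 illustrates how the Combiner merges the Map output.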

Figure 1: The merge operation performed by the Combiner component
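To make the effect concrete, here is a minimal, Hadoop-free Java sketch. The class CombinerSimulation, its methods and the two sample input lines are hypothetical illustrations, not Hadoop APIs. Each input line stands in for one map task, and the same sum-by-key logic acts first as a local Combiner and then as the Reducer. This is simplified: in real Hadoop the framework decides when, and how often, the Combiner runs on the map side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of a word-count job with and without a Combiner.
public class CombinerSimulation {

    // Map phase: emit one (word, 1) pair per word, as a WordCount mapper would.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Sum the values for each key (sorted by key); the same logic serves as Combiner and Reducer.
    public static List<Map.Entry<String, Integer>> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            out.add(Map.entry(e.getKey(), e.getValue()));
        }
        return out;
    }

    // Records that would be shuffled to the reducer; each split stands for one map task.
    public static List<Map.Entry<String, Integer>> shuffleInput(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapOutput = map(split);
            // With a Combiner, each map task's output is pre-summed locally before the shuffle
            shuffled.addAll(useCombiner ? sumByKey(mapOutput) : mapOutput);
        }
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("hello world hello", "hello hadoop hadoop");
        List<Map.Entry<String, Integer>> plain = shuffleInput(splits, false);
        List<Map.Entry<String, Integer>> combined = shuffleInput(splits, true);
        System.out.println("records shuffled without Combiner: " + plain.size());
        System.out.println("records shuffled with Combiner:    " + combined.size());
        System.out.println("reduce output without Combiner: " + sumByKey(plain));
        System.out.println("reduce output with Combiner:    " + sumByKey(combined));
    }
}
```

With these two sample lines, 6 records would be shuffled to the reducer without the Combiner and 4 with it, while the final counts ([hadoop=2, hello=3, world=1]) are the same in both cases.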

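The Combiner is an important component of a MapReduce program. To define a custom Combiner, we extend the Reducer class and override its reduce() method. Next, we write a Combiner for the word-count example to demonstrate how to create and use a Combiner component; the code is shown in the following file.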

File: WordCountCombiner.java

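  import java.io.IOException;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  // Combiner for word count: runs on the map side and pre-sums the (word, 1) pairs
  // emitted by a map task before they are shuffled to the reducers.
  public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values,
              Reducer<Text, IntWritable, Text, IntWritable>.Context context)
              throws IOException, InterruptedException {
          // Local (partial) aggregation: add up all counts seen for this word
          int count = 0;
          for (IntWritable v : values) {
              count += v.get();
          }
          // Emit (word, partial count); the output types match the map output types
          context.write(key, new IntWritable(count));
      }
  }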

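This file defines the custom Combiner class. Its job is to sum up the counts for words that share the same key. This is exactly what the reduce() method of the WordCountReducer class does, so WordCountReducer could also be specified directly as the Combiner class. Beyond that, all that remains is to register the Combiner component on the Job in the main driver class, as follows: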

  wcjob.setCombinerClass(WordCountCombiner.class);

Tip:

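Running the MapReduce program with or without the Combiner produces the same result. Put simply, the framework may call the Combiner zero, one, or several times on the map output, and the Reduce output must be identical in every case, because a Combiner is not allowed to change the business logic. In practice this means its input and output key/value types must both match the map output types, and the operation it performs must be commutative and associative, as summing counts is.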
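The requirement that the Combiner must not change the result can be checked with a small plain-Java sketch. The class CombinerSafety and its sum() and average() helpers are hypothetical illustrations, not Hadoop APIs. Summing partial sums gives the same total as summing the raw values, whereas naively averaging partial averages does not, which is why an averaging job cannot simply reuse its Reducer as a Combiner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: why a Combiner must give the same result whether or not it runs.
public class CombinerSafety {

    // A "sum" combiner: its output can be fed back into the same function safely.
    public static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // A naive "average" combiner: NOT safe, because partial averages lose the value counts.
    public static double average(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total / values.size();
    }

    public static void main(String[] args) {
        // Counts for one key, produced by two different map tasks.
        List<Integer> task1 = List.of(1, 1, 1);
        List<Integer> task2 = List.of(1);
        List<Integer> all = new ArrayList<>(task1);
        all.addAll(task2);
        int noCombiner = sum(all);                                // reducer sees raw values
        int withCombiner = sum(List.of(sum(task1), sum(task2)));  // reducer sees partial sums
        System.out.println("sum: " + noCombiner + " vs " + withCombiner);

        // Measurements for one key, produced by two different map tasks.
        List<Double> t1 = List.of(1.0, 2.0, 3.0);
        List<Double> t2 = List.of(10.0);
        List<Double> allD = new ArrayList<>(t1);
        allD.addAll(t2);
        double trueAverage = average(allD);                             // (1+2+3+10)/4
        double averageOfAverages = average(List.of(average(t1), average(t2))); // (2+10)/2
        System.out.println("average: " + trueAverage + " vs " + averageOfAverages);
    }
}
```

Running it prints matching sums (4 vs 4) but different averages (4.0 vs 6.0). A common remedy for averages is to have the Combiner emit partial (sum, count) pairs and let the Reducer do the final division.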
