package com.spnotes.storm.trident;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.LocalDRPC;
import backtype.storm.tuple.Fields;
import com.spnotes.storm.LineReaderSpout;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.operation.builtin.Debug;
import storm.trident.testing.Split;

/**
 * Created by Sunil Patil on 8/12/14.
 */
public class TridentWordCountDriver {
    public static void main(String[] argv) {
        if (argv.length != 1) {
            System.out.println("Please provide input file path");
            System.exit(-1);
        }
        String inputFile = argv[0];
        System.out.println("TridentWordCountDriver is starting for " + inputFile);

        // Pass the input file path to the spout through the topology configuration
        Config config = new Config();
        config.put("inputFile", inputFile);

        LocalDRPC localDRPC = new LocalDRPC(); // not used in this example
        LocalCluster localCluster = new LocalCluster();
        LineReaderSpout lineReaderSpout = new LineReaderSpout();

        // Build the Trident topology: read lines, split them into words,
        // group by word, count each group, and print the results
        TridentTopology topology = new TridentTopology();
        topology.newStream("spout", lineReaderSpout)
                .each(new Fields("line"), new Split(), new Fields("word"))
                .groupBy(new Fields("word"))
                .aggregate(new Fields("word"), new Count(), new Fields("count"))
                .each(new Fields("word", "count"), new Debug());

        localCluster.submitTopology("WordCount", config, topology.build());
    }
}
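The LineReaderSpout imported from com.spnotes.storm is not part of the gist above. As a rough idea of what it involves, here is a minimal sketch of a spout that reads the "inputFile" entry from the topology configuration and emits one line per tuple under a "line" field (a hedged sketch, not necessarily the original class):

package com.spnotes.storm;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Map;

// Sketch only: reads the file named by the "inputFile" config entry
// and emits one line per tuple under the "line" field.
public class LineReaderSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private BufferedReader reader;
    private boolean completed = false;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            this.reader = new BufferedReader(new FileReader(conf.get("inputFile").toString()));
        } catch (Exception e) {
            throw new RuntimeException("Error opening input file", e);
        }
    }

    @Override
    public void nextTuple() {
        if (completed) {
            return; // whole file already emitted
        }
        try {
            String line = reader.readLine();
            if (line != null) {
                collector.emit(new Values(line));
            } else {
                completed = true;
            }
        } catch (Exception e) {
            throw new RuntimeException("Error reading line", e);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}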
Even when you use TridentTopology, you still need the LineReaderSpout that takes the file path as input, reads the file, and emits one line at a time. The part that is different is that you no longer need the WordSpitterBolt and WordCounterBolt; instead you can use compact code like this:
topology.newStream("spout",lineReaderSpout)
.each(new Fields("line"), new Split(), new Fields("word"))
.groupBy(new Fields("word"))
.aggregate(new Fields("word"), new Count(), new Fields("count"))
.each(new Fields("word","count"), new Debug());
The each(new Fields("line"), new Split(), new Fields("word")) line takes each line emitted by the LineReaderSpout and uses the built-in storm.trident.testing.Split function to split it into words, emitting each word as a tuple.
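If you wanted to write that split step yourself instead of using the built-in class, a roughly equivalent Trident function is only a few lines long. Here is a sketch (the MySplit name is mine, and it assumes words are separated by spaces):

import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

// Sketch of a Split-style function: emits one "word" tuple per
// space-separated token in the incoming "line" field.
public class MySplit extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        String line = tuple.getString(0);
        for (String word : line.split(" ")) {
            if (word.length() > 0) {
                collector.emit(new Values(word));
            }
        }
    }
}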
The groupBy(new Fields("word")) line groups those tuples by word. The aggregate(new Fields("word"), new Count(), new Fields("count")) line then counts the tuples in each group using the built-in storm.trident.operation.builtin.Count class, so at this point you have tuples of the form {word, count}.
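Count itself is a very small class; if you wrote your own equivalent using Trident's CombinerAggregator interface, a sketch (the MyCount name is mine) would look like this:

import storm.trident.operation.CombinerAggregator;
import storm.trident.tuple.TridentTuple;

// Sketch of a Count-style aggregator: every tuple contributes 1,
// partial results are summed, and an empty group starts at 0.
public class MyCount implements CombinerAggregator<Long> {
    @Override
    public Long init(TridentTuple tuple) {
        return 1L;
    }

    @Override
    public Long combine(Long val1, Long val2) {
        return val1 + val2;
    }

    @Override
    public Long zero() {
        return 0L;
    }
}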
The last part, .each(new Fields("word","count"), new Debug()), prints each of these tuples so you can see every word together with its count.
The Trident API provides a set of built-in classes that make developing a WordCount-type program very easy. But even if you created your own versions of the Split and Count operations, the code would still be significantly more compact.