Some time back I blogged about HelloWorld - Apache Storm Word Counter program, which demonstrates how to build a WordCount program using Apache Storm. The problem with that project was that it was not a Maven project; instead, I had a screenshot of all the jars that you would have to include in the program. So I changed it to use Apache Maven as the build framework. You can download the source code.
In addition to the normal API, Storm also provides the Trident API, which allows us to write much more compact code. I wanted to try that out, so I built this simple Word Count program using the Trident API.
While using the Trident API, you start by creating a TridentTopology object. You still need the LineReaderSpout, which takes a file path as input, then reads and emits one line of the file at a time. The part that is different is that you don't need WordSpitterBolt and WordCounterBolt; instead you can use compact code like this:
topology.newStream("spout", lineReaderSpout)
    .each(new Fields("line"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .aggregate(new Fields("word"), new Count(), new Fields("count"))
    .each(new Fields("word", "count"), new Debug());
The each(new Fields("line"), new Split(), new Fields("word")) line takes each line emitted by the LineReaderSpout and uses the Split function that ships with Storm (storm.trident.testing.Split) to split it into words, emitting each word as a tuple.
The groupBy(new Fields("word")) line takes the tuples and groups them by word. The aggregate(new Fields("word"), new Count(), new Fields("count")) line counts the tuples in each group using the storm.trident.operation.builtin.Count class; at this point you have a tuple like {word,count}. The last part, .each(new Fields("word","count"), new Debug()), takes care of printing each tuple in word, count format.
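To make the data flow easier to follow, here is a minimal plain-Java sketch of what the split, groupBy and count steps compute. It is only a model and uses no Storm classes: WordCountModel, split and countWords are names I made up for illustration, and it treats all the input lines as one batch, which is a simplification of how Trident processes a stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the split -> groupBy -> count steps; no Storm classes are used.
class WordCountModel {

    // Stand-in for the Split step: one line in, zero or more words out.
    static List<String> split(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) { // skip empty strings produced by repeated spaces
                words.add(word);
            }
        }
        return words;
    }

    // Stand-in for groupBy("word") followed by a count per group.
    // Assumption: all lines belong to a single batch.
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : split(line)) {
                counts.merge(word, 1L, Long::sum); // add 1 to this word's group
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cow jumped", "the moon");
        // Stand-in for the Debug step: print each {word,count} pair.
        countWords(lines).forEach((word, count) -> System.out.println(word + " " + count));
    }
}
```

Each entry of the returned map corresponds to one {word,count} tuple coming out of the aggregate step.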
The Trident API provides a set of ready-made classes that make developing a WordCount type of program very easy. But you could also have written your own versions of Split and Count, and the code would still be significantly more compact.
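As a rough idea of how little code a hand-written counter needs, here is a sketch in plain Java. As far as I know, Trident's CombinerAggregator interface is built around three methods, init, combine and zero. The Combiner interface below is my own hypothetical stand-in for that shape (it takes a plain value where Trident would pass a tuple), so that the example runs without Storm on the classpath. MyCount and CombinerDemo are likewise illustrative names, not classes from the downloadable project.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the init/combine/zero shape of a Trident combiner aggregator.
interface Combiner<I, T> {
    T init(I input);     // value contributed by a single input
    T combine(T a, T b); // merge two partial results
    T zero();            // result for an empty group
}

// A hand-written counter: every input contributes 1, and partial counts are added.
class MyCount implements Combiner<String, Long> {
    public Long init(String word) { return 1L; }
    public Long combine(Long a, Long b) { return a + b; }
    public Long zero() { return 0L; }
}

class CombinerDemo {
    // Folds one group of inputs with the given combiner.
    static <I, T> T aggregate(Combiner<I, T> combiner, List<I> group) {
        T result = combiner.zero();
        for (I input : group) {
            result = combiner.combine(result, combiner.init(input));
        }
        return result;
    }

    public static void main(String[] args) {
        // Count one group of identical words, as groupBy would hand them over.
        System.out.println(aggregate(new MyCount(), Arrays.asList("the", "the", "the")));
    }
}
```

In a real topology the same three methods would go into a class implementing Trident's own interface, and that class would be passed to aggregate in place of new Count().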
2 comments:
Nicely explained... +1 Bro
Rakesh
My Trident topology works in local mode, but it is not working in cluster mode.
I am using storm-cassandra-cql to insert into Cassandra. Data gets inserted in local mode, but when we run the topology in cluster mode, data is not getting inserted. What could be the reason?