I wanted to figure out how to configure Twitter as a source for Flume, so I tried these steps:
- First, go to the Twitter Application Management page and configure an application. This gives you the consumerKey, consumerSecret, accessToken and accessTokenSecret values.
- Next, create twitterflume.properties like the one below. It should define a source of type org.apache.flume.source.twitter.TwitterSource and use the four values from the previous step to configure access to Twitter:
agent1.sources = twitter1
agent1.sinks = logger1
agent1.channels = memory1
agent1.sources.twitter1.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.twitter1.consumerKey = <consumerKey>
agent1.sources.twitter1.consumerSecret = <consumerSecret>
agent1.sources.twitter1.accessToken = <accessToken>
agent1.sources.twitter1.accessTokenSecret = <accessTokenSecret>
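# Note: keywords is a property of Cloudera's com.cloudera.flume.source.TwitterSource. As far as I can tell,
# the Apache TwitterSource configured above reads the 1% sample stream (note getSampleStream in the
# stack trace further down) and does not use it, so this line may have no effect.
agent1.sources.twitter1.keywords = bigdata, hadoop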
agent1.sources.twitter1.maxBatchSize = 10
agent1.sources.twitter1.maxBatchDurationMillis = 200
# Describe the sink
agent1.sinks.logger1.type = logger
# Use a channel which buffers events in memory
agent1.channels.memory1.type = memory
agent1.channels.memory1.capacity = 1000
agent1.channels.memory1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.twitter1.channels = memory1
agent1.sinks.logger1.channel = memory1
- Finally, run the Flume agent. You should see Twitter messages being dumped to the console:
bin/flume-ng agent --conf conf --conf-file conf/twitterflume.properties --name agent1 -Dflume.root.logger=DEBUG,console
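The template uses <placeholder> values for the four credentials. A quick sanity check before starting the agent can save a confusing failure later. The following is a minimal sketch of such a check and is not something shipped with Flume. check_twitter_credentials is a hypothetical shell function, and it assumes the agent and source are named agent1 and twitter1, exactly as in the properties file above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): verify that the four Twitter credentials
# in the Flume properties file are present and no longer hold <placeholder> values.
# Assumes the agent is named "agent1" and the source "twitter1", as in the example config.

# check_twitter_credentials FILE
# Prints a message to stderr for each missing or placeholder credential;
# returns 0 if all four look filled in, 1 otherwise.
check_twitter_credentials() {
    conf_file=$1
    status=0
    for key in consumerKey consumerSecret accessToken accessTokenSecret; do
        # Match "<prefix>.<key> =" exactly (so accessToken does not match accessTokenSecret),
        # take everything after the first '=', and trim surrounding whitespace.
        value=$(grep "^agent1\.sources\.twitter1\.$key[[:space:]]*=" "$conf_file" \
                | head -n 1 | cut -d= -f2- | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
        case $value in
            ""|"<"*">")
                echo "missing or placeholder value for $key" >&2
                status=1
                ;;
        esac
    done
    return $status
}
```

For example, chaining it in front of the command above only starts the agent when the check passes: check_twitter_credentials conf/twitterflume.properties && bin/flume-ng agent --conf conf --conf-file conf/twitterflume.properties --name agent1 -Dflume.root.logger=DEBUG,console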
Note: When I tried this in the Hadoop Sandbox, I started getting the error below. It seemed to be caused by the VM's clock being in the past. When I executed the date command on my sandbox, it returned a date 3 days in the past. So I restarted the VM. After the restart, the date command gave me the accurate time and the error went away. Also, the root cause reported in the trace is java.net.UnknownHostException for stream.twitter.com, meaning the host name could not be resolved. The restart may therefore have restored the VM's network/DNS settings as well as its clock.
[Twitter Stream consumer-1[Establishing connection]] ERROR org.apache.flume.source.twitter.TwitterSource (TwitterSource.java:331) - Exception while streaming tweets
stream.twitter.com
Relevant discussions can be found on the Internet at:
    http://www.google.co.jp/search?q=d0031b0b or
    http://www.google.co.jp/search?q=1db75522
TwitterException{exceptionCode=[d0031b0b-1db75522 db667dea-99334ae4], statusCode=-1, message=null, code=-1, retryAfter=-1, rateLimitStatus=null, version=3.0.3}
    at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:192)
    at twitter4j.internal.http.HttpClientWrapper.request(HttpClientWrapper.java:61)
    at twitter4j.internal.http.HttpClientWrapper.get(HttpClientWrapper.java:89)
    at twitter4j.TwitterStreamImpl.getSampleStream(TwitterStreamImpl.java:176)
    at twitter4j.TwitterStreamImpl$4.getStream(TwitterStreamImpl.java:164)
    at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:462)
Caused by: java.net.UnknownHostException: stream.twitter.com
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:637)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
    at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:933)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1301)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
    at twitter4j.internal.http.HttpResponseImpl.<init>(HttpResponseImpl.java:34)
    at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:156)
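To check whether a VM's clock has drifted like this without restarting it, one option is to compare the local time with the Date header of an HTTPS response. This matters because the streaming connection is authenticated with OAuth 1.0a using the four credentials, and OAuth 1.0a requests carry a timestamp. The sketch below assumes GNU date and curl are installed on the VM. clock_skew_seconds and check_clock_against are hypothetical helper names, and any reachable HTTPS site can stand in for the URL.

```shell
#!/bin/sh
# Minimal sketch, assuming GNU date (for "date -d") and curl are available.
# Compares the local clock with the Date header of an HTTP response to spot
# a VM clock that has drifted.

# clock_skew_seconds HTTP_DATE
# Prints (local time - HTTP_DATE) in whole seconds.
# A negative number means the local clock is behind the server.
clock_skew_seconds() {
    remote_epoch=$(date -u -d "$1" +%s) || return 1
    local_epoch=$(date -u +%s)
    echo $((local_epoch - remote_epoch))
}

# check_clock_against URL
# Fetches only the response headers from URL and reports the skew against its Date header.
check_clock_against() {
    # Strip the trailing CR from header lines, then keep the first Date header value.
    http_date=$(curl -sI "$1" | tr -d '\r' | sed -n 's/^[Dd]ate:[[:space:]]*//p' | head -n 1)
    [ -n "$http_date" ] || { echo "no Date header from $1" >&2; return 1; }
    echo "clock skew vs $1: $(clock_skew_seconds "$http_date") s"
}
```

For example, check_clock_against https://api.twitter.com prints the skew in seconds. If the host cannot be resolved at all (the UnknownHostException case in the trace), curl returns no headers and the function reports that instead of a skew.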
I am getting the same error. My dates are fine, and I also restarted the system, but it still does not work. Please help.
I've hit the same problem and have not solved it yet.
I am facing the same error. Can somebody help?
Just run:
hduser@ubuntu64server:~/apache-flume-1.6.0-bin/conf$ nslookup stream.twitter.com
;; connection timed out; no servers could be reached
This is what is causing the issue: the machine cannot reach any DNS server, so stream.twitter.com cannot be resolved.
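The nslookup check from the comment above can be turned into a small pre-flight test that runs before the agent starts. This is a minimal sketch using getent hosts. On glibc-based Linux systems, getent resolves names through the same /etc/hosts and /etc/resolv.conf configuration that the JVM normally relies on. can_resolve is a hypothetical function name, and stream.twitter.com is the host from the stack trace.

```shell
#!/bin/sh
# Minimal sketch: check that host names resolve before starting the Flume agent.
# Uses "getent hosts", which goes through the system resolver (NSS: /etc/hosts, DNS).

# can_resolve HOST...
# Prints each host with its resolution status; returns 1 if any host fails to resolve.
can_resolve() {
    failed=0
    for host in "$@"; do
        if getent hosts "$host" >/dev/null; then
            echo "OK       $host"
        else
            echo "FAILED   $host"
            failed=1
        fi
    done
    return $failed
}
```

For example: can_resolve stream.twitter.com || cat /etc/resolv.conf. If the host fails to resolve, /etc/resolv.conf is the first place to look for a missing or unreachable nameserver.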