- First I have to change the value of scp.host to point to my Hadoop VM. If you changed the username or password on your VM, you will have to change scp.user and scp.password too.
- Next I have to change the value of the mainClass attribute to point to the correct driver class for the MapReduce program that I am developing. In this case the name of the driver class is com.spnotes.hadoop.WordCountDriver (a sketch of such a driver appears after the pom.xml listing below).
- Then I have to change the value of the command attribute in the sshexec element. The command is hadoop jar ${scp.dirCopyTo}/${project.build.finalName}.jar books/dickens.txt wordcount/outputs, and it is made up of different parts: ${scp.dirCopyTo}/${project.build.finalName}.jar points to the .jar file that was scp'd to the VM, and books/dickens.txt is the path of the input text file. In this case I am using HDFS as the input location, so it resolves to hdfs://localhost/user/cloudera/books/dickens.txt, and the output of the MapReduce job gets generated in hdfs://localhost/user/cloudera/wordcount/outputs.
- Last, I can run the mvn antrun:run command to execute the Ant tasks defined in the maven-antrun-plugin, which copy the MapReduce jar to the Cloudera VM and execute it there.

You can execute the full project from here.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.spnotes.hadoop</groupId>
  <artifactId>HadoopWordCount</artifactId>
  <version>1.0-SNAPSHOT</version>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.0.0-cdh4.0.0</hadoop.version>
    <scp.user>cloudera</scp.user>
    <scp.password>cloudera</scp.password>
    <scp.host>172.16.225.176</scp.host>
    <scp.dirCopyTo>/home/cloudera/test</scp.dirCopyTo>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-auth</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>2.0.0-mr1-cdh4.0.1</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.10</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.mrunit</groupId>
      <artifactId>mrunit</artifactId>
      <version>1.0.0</version>
      <classifier>hadoop2</classifier>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <build>
    <finalName>Sorting</finalName>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.1</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <mainClass>com.spnotes.hadoop.WordCountDriver</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-antrun-plugin</artifactId>
        <configuration>
          <tasks>
            <scp todir="${scp.user}:${scp.password}@${scp.host}:/${scp.dirCopyTo}"
                 file="${project.build.directory}/${project.build.finalName}.jar"
                 trust="true" failonerror="false">
            </scp>
            <sshexec host="${scp.host}" username="${scp.user}"
                     password="${scp.password}"
                     command="hadoop jar ${scp.dirCopyTo}/${project.build.finalName}.jar books/dickens.txt wordcount/outputs2" />
          </tasks>
        </configuration>
        <dependencies>
          <dependency>
            <groupId>ant</groupId>
            <artifactId>ant-jsch</artifactId>
            <version>1.6.5</version>
          </dependency>
          <dependency>
            <groupId>com.jcraft</groupId>
            <artifactId>jsch</artifactId>
            <version>0.1.42</version>
          </dependency>
        </dependencies>
      </plugin>
    </plugins>
  </build>
  <repositories>
    <repository>
      <id>Apache Repository</id>
      <url>https://repository.apache.org/content/repositories</url>
    </repository>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    </repository>
  </repositories>
</project>
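For completeness, here is a minimal sketch of what a driver class like com.spnotes.hadoop.WordCountDriver could look like. The actual class is not listed in this post, so the WordCountMapper and WordCountReducer inner classes, the use of ToolRunner, and the whitespace tokenization are assumptions on my part; the pom.xml and the hadoop jar command only require that the class have a main method that takes the HDFS input and output paths as its two arguments.

package com.spnotes.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch of a word-count driver; names and tokenization are assumptions, not the post's actual code.
public class WordCountDriver extends Configured implements Tool {

    // Mapper: emit (word, 1) for every whitespace-separated token in the input line
    public static class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each word
    public static class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // args[0] and args[1] are the HDFS input and output paths passed on the
        // "hadoop jar ... books/dickens.txt wordcount/outputs" command line
        Job job = new Job(getConf(), "WordCount");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountDriver(), args));
    }
}

With the pom.xml above, the sshexec task ends up running hadoop jar /home/cloudera/test/Sorting.jar books/dickens.txt wordcount/outputs2 on the VM, so args[0] and args[1] in run() receive the HDFS input and output paths described earlier.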