
MapReduce program that reads input files from S3 and writes output to S3

In the WordCount (HelloWorld) MapReduce program entry I talked about how to create a simple WordCount MapReduce program with Hadoop. I wanted to change it so that it reads input files from an Amazon S3 bucket and writes output back to an Amazon S3 bucket, so I built the S3MapReduce program, which you can download from here. I followed these steps
  1. First create two buckets in your Amazon S3 account, one for storing input and the other for storing output. The most important thing here is to make sure that you create your buckets in the US Standard region; if you don't, additional steps might be required for Hadoop to be able to access your buckets. The name of the input bucket in my case is com.spnotes.hadoop.wordcount.books
    Name of the output bucket is com.spnotes.hadoop.wordcount.output
  2. Upload a few .txt files that you want to use as input into your input bucket like this
  3. The next step is to create a MapReduce program like this. In my case one Java class has the code for the Mapper, the Reducer, and the driver. Most of the MapReduce code is the same; the only difference is that for working with S3 you have to add a few S3-specific properties like this. Basically you need to set the accessKey and secretAccessKey that you can get from the AWS Security console and paste them here. You also have to tell Hadoop to use s3n as the file system.
    
    //Replace this value
    job.getConfiguration().set("fs.s3n.awsAccessKeyId", "awsaccesskey");
    //Replace this value
    job.getConfiguration().set("fs.s3n.awsSecretAccessKey","awssecretaccesskey");
    job.getConfiguration().set("fs.default.name","s3n://com.spnotes.hadoop.wordcount.books");
    
  4. The last step is to execute this program. It takes two arguments, the input and output S3 paths; you can right-click on your S3MapReduce program and execute it with the following two parameters
    
    s3n://com.spnotes.hadoop.wordcount.books s3n://com.spnotes.hadoop.wordcount.output/output3
    
  5. Once the MapReduce job has executed, you can check the output by going to the S3 console and looking at the content of com.spnotes.hadoop.wordcount.output like this
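
The mapper and reducer in this job are the standard word-count pair from the earlier WordCount post; only the configuration changes for S3. As a stand-alone sanity check of that logic (no Hadoop or S3 required), the same tokenize-then-sum behaviour can be sketched with plain JDK collections. The class and method names here are illustrative, not taken from the S3MapReduce source:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-alone sketch of the word-count logic the Hadoop job performs:
// the "map" step emits one token per word, the "reduce" step sums per word.
public class WordCountSketch {

    // Mapper step: lowercase a line, split it into word tokens, emit each one.
    static List<String> map(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Reducer step: sum the count of each distinct word.
    static Map<String, Integer> reduce(List<String> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        for (String line : new String[] {"Hello World", "hello Hadoop"}) {
            tokens.addAll(map(line));
        }
        System.out.println(reduce(tokens)); // {hadoop=1, hello=2, world=1}
    }
}
```

Hadoop does the same thing at scale: the map step runs once per input split read from the S3 bucket, and the framework groups the emitted (word, count) pairs by key before the reduce step sums them.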

Maven script for deploying project to Apache Tomcat on Amazon EC2

In the Deploying web application on remote Tomcat post I listed steps for creating a Maven script that can install a web application on a remote Apache Tomcat. You can use the same concept to deploy web applications to Apache Tomcat running on Amazon EC2 by following these steps
  1. First change tomcat-users.xml to add a tomcatadmin user and give it the manager-gui and manager-script roles
    
    <?xml version='1.0' encoding='utf-8'?>
    <tomcat-users>
      <role rolename="tomcat"/>
      <role rolename="manager-gui"/>
      <role rolename="manager-script"/>
      <user username="tomcatadmin" password="tomcatadmin" roles="tomcat,manager-gui,manager-script"/>
      <user username="both" password="tomcat" roles="tomcat"/>
      <user username="role1" password="tomcat" roles="role1"/>
    </tomcat-users>
    
  2. Next change your <MVN_HOME>/settings.xml file to add an AWSServer server entry with tomcatadmin as the username and password; you should use the user id and password that you set in your tomcat-users.xml file here
    
     <server>
         <id>AWSServer</id>
         <username>tomcatadmin</username>
         <password>tomcatadmin</password>
     </server>
    
  3. The last step is to add the following plugin to your pom.xml. Here the value of url is built from the DNS name of your EC2 instance, and the value of server must equal the id of the server you set in settings.xml
    
     <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>tomcat-maven-plugin</artifactId>
        <configuration>
            <url>http://ec2-54-84-128-249.compute-1.amazonaws.com:8080/manager/text</url>
            <server>AWSServer</server>
            <path>/simplewebapp</path>
        </configuration>
    </plugin>
    
  4. Now execute mvn tomcat:redeploy to deploy your application to Tomcat
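
One caveat about step 2: settings.xml stores the Tomcat password in plain text. Maven supports server password encryption, so a safer variant is to create a master password with mvn --encrypt-master-password, store it in ~/.m2/settings-security.xml, and put the output of mvn --encrypt-password into settings.xml instead. The encrypted values below are placeholders, not real credentials:

```xml
<!-- ~/.m2/settings-security.xml, holding the master password (placeholder value) -->
<settingsSecurity>
  <master>{jSMOWnoPFgsHVpMvz5VrIt5kRbzGpI8u+9EF1iFQyJQ=}</master>
</settingsSecurity>

<!-- the server entry in settings.xml then uses the encrypted password (placeholder value) -->
<server>
    <id>AWSServer</id>
    <username>tomcatadmin</username>
    <password>{COQLCE6DU6GtcS5P=}</password>
</server>
```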

Install Apache Tomcat on Amazon EC2

Recently I wanted to install Apache Tomcat on my Amazon EC2 instance; I used the Amazon Linux AMI for creating the instance. Once the instance was started I had to follow these steps
  1. While creating the instance I checked the security group associated with the instance and changed it to add a Custom TCP rule for port 8080. That is equivalent to opening port 8080 in the firewall of your instance
  2. Once the instance was started, the first thing that I did was update yum by executing sudo yum update
  3. Next I typed the yum list tomcat* command to see which Tomcat packages are available for install, and then I executed sudo yum install tomcat7-webapp tomcat7-admin-webapp to install Tomcat with the deployer installed on it
  4. The yum installer takes care of installing Apache Tomcat and creates a service that you can use for starting and stopping Tomcat: to start Tomcat, execute sudo service tomcat7 start; to stop it, sudo service tomcat7 stop
When you use yum for installing Tomcat, you should be aware of the following important directories
  1. /usr/share/tomcat7: This is the directory where your Tomcat is installed
  2. /var/log/tomcat7: This is the directory where all your log files go. This folder is a link to <TOMCAT_HOME>/logs
  3. /etc/tomcat7: This is the directory where you can find the content of your <TOMCAT_HOME>/conf directory

Installing ElasticSearch on Amazon EC2

I wanted to set up ElasticSearch on Amazon EC2, and these are the steps that I followed to do that
  • Connect to your Amazon EC2 instance using SSH and execute the following command to download ElasticSearch v0.90.9 on the instance
    
    wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.9.tar.gz
    
    It took less than a minute to download the elasticsearch binary on the instance
  • Next I executed the following command to unzip elasticsearch-0.90.9.tar.gz
    
    tar -xf elasticsearch-0.90.9.tar.gz
    
  • After extracting the ElasticSearch binaries, start it by executing the following command
    
    elasticsearch-0.90.9/bin/elasticsearch -f
    
  • After starting the ElasticSearch instance i could use the ES REST API from the SSH console on the same instance
    
    curl -XGET http://localhost:9200/_search
    
  • But since I want to access it from my local machine/outside the instance, I had to open port 9200 in that instance's firewall, which I could do by changing the security group and adding ports 9200 and 9300 to it
  • Then you can use the public DNS name of your EC2 instance and query the REST API from your local machine like this
    
    curl -XGET http://<yourinstancename>.us-west-1.compute.amazonaws.com:9200/_search
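
Opening ports 9200 and 9300 in the security group handles the EC2 firewall; which address and ports ElasticSearch itself listens on is controlled in config/elasticsearch.yml. The values below are illustrative (they match the 0.90.x defaults), not settings taken from this install:

```yaml
# config/elasticsearch.yml -- illustrative values, the 0.90.x defaults
network.host: 0.0.0.0      # interface the node binds to (all interfaces)
http.port: 9200            # REST API port (opened in the security group above)
transport.tcp.port: 9300   # node-to-node transport port
```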