Importing data from an RDBMS into Hadoop using Sqoop

Apache Sqoop lets you import the content of an RDBMS into Hadoop. By default it imports the content of a table into a text file in HDFS, with columns separated by commas and rows separated by newlines (these delimiters can be overridden; see the note at the end of this post). I wanted to try this feature out, so I decided to import a table from the MySQL database on my local machine into HDFS using Sqoop.
  1. First I created a CONTACT table in my local MySQL database like this
    
    CREATE TABLE `CONTACT` (
      `contactid` int(11) NOT NULL,
      `FNAME` varchar(45) DEFAULT NULL,
      `LNAME` varchar(45) DEFAULT NULL,
      `EMAIL` varchar(45) DEFAULT NULL,
      PRIMARY KEY (`contactid`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
  2. Then I added a few records into the CONTACT table using statements like this
    
    INSERT INTO `test`.`CONTACT` (`contactid`, `FNAME`, `LNAME`, `EMAIL`) VALUES (1, 'Sunil', 'Patil', 'sdpatil@gmail.com');
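    
    As a quick sanity check before importing (this query assumes all 12 rows mentioned in step 6 were inserted), you can verify the row count like this
    
    SELECT COUNT(*) FROM `test`.`CONTACT`;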
    
  3. Then on the command line I executed the following command to run Sqoop so that it imports the content of the CONTACT table into HDFS
    
    sqoop import --connect jdbc:mysql://localhost/test --table CONTACT
    
    This command tells Sqoop to connect to the test database in MySQL on localhost (jdbc:mysql://localhost/test) and import the content of the CONTACT table.
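    Note that Sqoop needs the MySQL JDBC driver (the mysql-connector-java jar) on its classpath, typically dropped into Sqoop's lib directory. Also, my local MySQL instance did not require credentials; if yours does, you can pass them with the --username and --password arguments (or -P to be prompted for the password), along these lines (myuser is just a placeholder)
    
    sqoop import --connect jdbc:mysql://localhost/test --table CONTACT --username myuser -P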
  4. After executing the command, when I looked into HDFS I could see that there is a CONTACT directory (named after the table; if you want to use a directory name different from the table name, pass the --target-dir argument), and that directory contains 4 files.
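    You can confirm this from the command line; assuming the default map-only import job, the 4 data files are named part-m-00000 through part-m-00003
    
    hadoop fs -ls CONTACT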
  5. Now if I look inside one of the part-m files, I can see that it has the content of the CONTACT table dumped inside it as comma-separated lines.
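    Based on the row inserted in step 2 and the default comma delimiter, dumping a part file would show something like this
    
    hadoop fs -cat CONTACT/part-m-00000
    1,Sunil,Patil,sdpatil@gmail.com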
  6. By default Sqoop runs multiple map tasks (4, unless you override it) to import the content of the table in parallel. If you want, you can control the number of map tasks it runs with the -m argument. In my case the CONTACT table has only 12 rows, so I wanted Sqoop to run only 1 map task, and I used the following command
    
    sqoop import --connect jdbc:mysql://localhost/test --table CONTACT --target-dir contact1 -m 1
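    
Since Sqoop uses commas and newlines as its default delimiters, you can also override them with the --fields-terminated-by and --lines-terminated-by arguments. A sketch of a tab-separated import (contact2 is just an example target directory, not from my original run):

    sqoop import --connect jdbc:mysql://localhost/test --table CONTACT --target-dir contact2 --fields-terminated-by '\t' -m 1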
    
