Importing data from RDBMS into Hadoop using sqoop

Apache Sqoop lets you import content of RDBMS into Hadoop. By default it will import content of a table into hadoop text file with columns separated by , and rows separated by new line. I wanted to try this feature out so i decided to import table from MySQL database on my local machine into HDFS using Sqoop
  1. First i created a CONTACT table in my local like this
      `contactid` int(11) NOT NULL,
      `FNAME` varchar(45) DEFAULT NULL,
      `LNAME` varchar(45) DEFAULT NULL,
      `EMAIL` varchar(45) DEFAULT NULL,
      PRIMARY KEY (`contactid`)
  2. Then i had to add few records into CONTACT table by using this syntax
    INSERT INTO `test`.`CONTACT`(`contactid`,`FNAME`,`LNAME`,`EMAIL`)VALUES(1,'Sunil','Patil','');
  3. Then on the command line i had to execute following command to run Sqoop so that it imports content of
    sqoop import --connect jdbc:mysql://localhost/test --table Contact 
    This command tells sqoop to connect to test database in mysql on localhost jdbc:mysql://localhost/test and import content of CONTACT table.
  4. After executing the command when i looked into the HDFS i could see that there is Contact directory (same as table name, if you want to use different directory name then table name pass --target-dir argument ), that directory contains 4 files.
  5. Now if i look inside one of the part-m files i could see it has content of CONTACT table dumped inside it like this
  6. By default sqoop opens multiple threads to import content of the table. If you want you can control number of map jobs it runs. In my case the CONTACT table has only 12 rows so i want sqoop to run only 1 map job, so i used following command
    sqoop import --connect jdbc:mysql://localhost/test --table Contact --target-dir contact1 -m 1 

No comments: