- First i created a CONTACT table in my local like this
CREATE TABLE `CONTACT` ( `contactid` int(11) NOT NULL, `FNAME` varchar(45) DEFAULT NULL, `LNAME` varchar(45) DEFAULT NULL, `EMAIL` varchar(45) DEFAULT NULL, PRIMARY KEY (`contactid`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Then i had to add few records into CONTACT table by using this syntax
INSERT INTO `test`.`CONTACT`(`contactid`,`FNAME`,`LNAME`,`EMAIL`)VALUES(1,'Sunil','Patil','email@example.com');
- Then on the command line i had to execute following command to run Sqoop so that it imports content of
This command tells sqoop to connect to test database in mysql on localhost
sqoop import --connect jdbc:mysql://localhost/test --table Contact
jdbc:mysql://localhost/testand import content of CONTACT table.
- After executing the command when i looked into the HDFS i could see that there is Contact directory (same as table name, if you want to use different directory name then table name pass --target-dir argument ), that directory contains 4 files.
- Now if i look inside one of the part-m files i could see it has content of CONTACT table dumped inside it like this
- By default sqoop opens multiple threads to import content of the table. If you want you can control number of map jobs it runs. In my case the CONTACT table has only 12 rows so i want sqoop to run only 1 map job, so i used following command
sqoop import --connect jdbc:mysql://localhost/test --table Contact --target-dir contact1 -m 1
Importing data from RDBMS into Hadoop using sqoop
Apache Sqoop lets you import content of RDBMS into Hadoop. By default it will import content of a table into hadoop text file with columns separated by , and rows separated by new line. I wanted to try this feature out so i decided to import table from MySQL database on my local machine into HDFS using Sqoop