[Sqoop] Using Sqoop

Sqoop is essentially a command-line tool; compared with HDFS and MapReduce, there is no deep theory behind it.

The sqoop help command lists the available Sqoop commands:

  16/11/13 20:10:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  usage: sqoop COMMAND [ARGS]
  Available commands:
    codegen            Generate code to interact with database records
    create-hive-table  Import a table definition into Hive
    eval               Evaluate a SQL statement and display the results
    export             Export an HDFS directory to a database table
    help               List available commands
    import             Import a table from a database to HDFS
    import-all-tables  Import tables from a database to HDFS
    import-mainframe   Import datasets from a mainframe server to HDFS
    job                Work with saved jobs
    list-databases     List available databases on a server
    list-tables        List available tables in a database
    merge              Merge results of incremental imports
    metastore          Run a standalone Sqoop metastore
    version            Display version information
  See 'sqoop help COMMAND' for information on a specific command.

The most frequently used commands are import and export.

1. codegen

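Maps the records of a relational database table to a Java source file, with the corresponding compiled class and jar. The generated Java file contains a field for each column of the table. The generated jar and class files are used later by the Metastore feature. The options for this command are shown in the figure below (they can also be listed with sqoop help codegen):

Example:

  sqoop codegen --connect jdbc:mysql://localhost:3306/test --table order_info --outdir /home/xiaosi/test/ --username root -P

The example above generates Java code for the order_info table in the test database; --outdir specifies where the generated Java source is written.

The output is as follows: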

  16/11/13 21:50:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  Enter password:
  16/11/13 21:50:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  16/11/13 21:50:38 INFO tool.CodeGenTool: Beginning code generation
  16/11/13 21:50:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
  16/11/13 21:50:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
  16/11/13 21:50:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.2
  Note: /tmp/sqoop-xiaosi/compile/ea41fe40e1f12f6b052ad9fe4a5d9710/order_info.java uses or overrides a deprecated API.
  Note: Recompile with -Xlint:deprecation for details.
  16/11/13 21:50:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-xiaosi/compile/ea41fe40e1f12f6b052ad9fe4a5d9710/order_info.jar

We can also use --bindir to specify the output path for the compiled class file and for the jar that packages the generated files:

  16/11/13 21:53:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  Enter password:
  16/11/13 21:53:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  16/11/13 21:53:58 INFO tool.CodeGenTool: Beginning code generation
  16/11/13 21:53:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
  16/11/13 21:53:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
  16/11/13 21:53:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.2
  Note: /home/xiaosi/data/order_info.java uses or overrides a deprecated API.
  Note: Recompile with -Xlint:deprecation for details.
  16/11/13 21:53:59 INFO orm.CompilationManager: Writing jar file: /home/xiaosi/data/order_info.jar

In the example above, the compiled class file (order_info.class) and the jar file (order_info.jar) are written to /home/xiaosi/data, while the Java source file (order_info.java) is written to /home/xiaosi/test.
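A command combining both output options might look like the sketch below. It assumes the same database, table, and paths as the codegen example; the helper name codegen_cmd is hypothetical, and it only prints the command rather than running it, so it can be inspected first.

```shell
# Sketch: print a codegen command that sets both output locations.
#   --outdir  directory for the generated .java source
#   --bindir  directory for the compiled .class and .jar files
# codegen_cmd is a hypothetical helper; connection details mirror the
# earlier example and should be adjusted for a real environment.
codegen_cmd() {
  src_dir=$1
  bin_dir=$2
  echo "sqoop codegen" \
    "--connect jdbc:mysql://localhost:3306/test" \
    "--table order_info" \
    "--outdir $src_dir" \
    "--bindir $bin_dir" \
    "--username root -P"
}

codegen_cmd /home/xiaosi/test /home/xiaosi/data
```

To run it for real, type the printed command at the prompt; -P makes Sqoop ask for the password interactively instead of taking it on the command line.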

2. create-hive-table

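This command was already used in the previous article, [Sqoop导入与导出] (Sqoop Import and Export). It creates a Hive table whose structure matches that of a relational database table. The options for this command are shown in the figure below:

Example:

  sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test --table employee --username root -P --fields-terminated-by ','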

3. eval

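The eval command lets Sqoop run SQL statements against a relational database. Before importing data with a tool such as import, you can use it to check that the SQL statement is correct and see its results on the console.

3.1 Evaluating a SELECT query

With the eval tool we can evaluate any type of SQL query. Take the order_info table in the test database as an example:

  sqoop eval --connect jdbc:mysql://localhost:3306/test --username root --query "select * from order_info limit 3" -P

The output is as follows: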

  16/11/13 22:25:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  Enter password:
  16/11/13 22:25:22 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  ------------------------------------------------------------
  | id                   | order_time           | business   |
  ------------------------------------------------------------
  | 358574046793404      | 2016-04-05           | flight     |
  | 358574046794733      | 2016-08-03           | hotel      |
  | 358574050631177      | 2016-05-08           | vacation   |
  ------------------------------------------------------------

3.2 Evaluating an INSERT statement

The eval tool is not limited to SELECT queries; it can also run statements that modify data, such as INSERT. The following command inserts a new row into the order_info table of the test database:

  sqoop eval --connect jdbc:mysql://localhost:3306/test --username root --query "insert into order_info (id, order_time, business) values('358574050631166', '2016-11-13', 'hotel')" -P

The output is as follows:

  16/11/13 22:29:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  Enter password:
  16/11/13 22:29:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  16/11/13 22:29:44 INFO tool.EvalSqlTool: 1 row(s) updated.

If the command succeeds, the number of updated rows is shown on the console. We can also query the newly inserted row in mysql:

  mysql> select * from order_info where id = "358574050631166";
  +-----------------+------------+----------+
  | id              | order_time | business |
  +-----------------+------------+----------+
  | 358574050631166 | 2016-11-13 | hotel    |
  +-----------------+------------+----------+
  1 row in set (0.00 sec)

4. export

Exports data from HDFS to a relational database. The options for this command are shown in the figure below:

Example:

Suppose the employee data stored in HDFS looks like this:

  hadoop fs -text /user/xiaosi/employee/* | less
  yoona,qunar,创新事业部
  xiaosi,qunar,创新事业部
  jim,ali,淘宝
  kom,ali,淘宝
  lucy,baidu,搜索
  jim,ali,淘宝

Before exporting data from HDFS to a relational database, a table to receive the data must already exist in the database:

  CREATE TABLE `employee` (
    `name` varchar(255) DEFAULT NULL,
    `company` varchar(255) DEFAULT NULL,
    `depart` varchar(255) DEFAULT NULL
  );

Then run the export with the following command:

  sqoop export --connect jdbc:mysql://localhost:3306/test --table employee --export-dir /user/xiaosi/employee --username root -m 1 --fields-terminated-by ',' -P

The output is as follows:

  16/11/13 23:40:49 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
  16/11/13 23:40:49 INFO mapreduce.Job: Running job: job_local611430785_0001
  16/11/13 23:40:49 INFO mapred.LocalJobRunner: OutputCommitter set in config null
  16/11/13 23:40:49 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.sqoop.mapreduce.NullOutputCommitter
  16/11/13 23:40:49 INFO mapred.LocalJobRunner: Waiting for map tasks
  16/11/13 23:40:49 INFO mapred.LocalJobRunner: Starting task: attempt_local611430785_0001_m_000000_0
  16/11/13 23:40:49 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
  16/11/13 23:40:49 INFO mapred.MapTask: Processing split: Paths:/user/xiaosi/employee/part-m-00000:0+120
  16/11/13 23:40:49 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
  16/11/13 23:40:49 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
  16/11/13 23:40:49 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
  16/11/13 23:40:49 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
  16/11/13 23:40:49 INFO mapred.LocalJobRunner:
  16/11/13 23:40:49 INFO mapred.Task: Task:attempt_local611430785_0001_m_000000_0 is done. And is in the process of committing
  16/11/13 23:40:49 INFO mapred.LocalJobRunner: map
  16/11/13 23:40:49 INFO mapred.Task: Task 'attempt_local611430785_0001_m_000000_0' done.
  16/11/13 23:40:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local611430785_0001_m_000000_0
  16/11/13 23:40:49 INFO mapred.LocalJobRunner: map task executor complete.
  16/11/13 23:40:50 INFO mapreduce.Job: Job job_local611430785_0001 running in uber mode : false
  16/11/13 23:40:50 INFO mapreduce.Job:  map 100% reduce 0%
  16/11/13 23:40:50 INFO mapreduce.Job: Job job_local611430785_0001 completed successfully
  16/11/13 23:40:50 INFO mapreduce.Job: Counters: 20
      File System Counters
          FILE: Number of bytes read=22247825
          FILE: Number of bytes written=22732498
          FILE: Number of read operations=0
          FILE: Number of large read operations=0
          FILE: Number of write operations=0
          HDFS: Number of bytes read=126
          HDFS: Number of bytes written=0
          HDFS: Number of read operations=12
          HDFS: Number of large read operations=0
          HDFS: Number of write operations=0
      Map-Reduce Framework
          Map input records=6
          Map output records=6
          Input split bytes=136
          Spilled Records=0
          Failed Shuffles=0
          Merged Map outputs=0
          GC time elapsed (ms)=0
          Total committed heap usage (bytes)=245366784
      File Input Format Counters
          Bytes Read=0
      File Output Format Counters
          Bytes Written=0
  16/11/13 23:40:50 INFO mapreduce.ExportJobBase: Transferred 126 bytes in 2.3492 seconds (53.6344 bytes/sec)
  16/11/13 23:40:50 INFO mapreduce.ExportJobBase: Exported 6 records.

After the export finishes, we can query the employee table in mysql:

  mysql> select name, company from employee;
  +--------+---------+
  | name   | company |
  +--------+---------+
  | yoona  | qunar   |
  | xiaosi | qunar   |
  | jim    | ali     |
  | kom    | ali     |
  | lucy   | baidu   |
  | jim    | ali     |
  +--------+---------+
  6 rows in set (0.00 sec)

5. import

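Imports data from a database table into HDFS or Hive. The options for this command are shown in the figure below:

Example:

  sqoop import --connect jdbc:mysql://localhost:3306/test --target-dir /user/xiaosi/data/order_info --query 'select * from order_info where $CONDITIONS' -m 1 --username root -P

The command above imports the result of a query into HDFS, at the path given by --target-dir. It uses the --query option, which cannot be combined with --table. The $CONDITIONS variable must appear in the WHERE clause; Sqoop replaces it with its own conditions while the job runs.

The output is as follows: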

  16/11/14 12:08:50 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
  16/11/14 12:08:50 INFO mapreduce.Job: Running job: job_local127577466_0001
  16/11/14 12:08:50 INFO mapred.LocalJobRunner: OutputCommitter set in config null
  16/11/14 12:08:50 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
  16/11/14 12:08:50 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
  16/11/14 12:08:50 INFO mapred.LocalJobRunner: Waiting for map tasks
  16/11/14 12:08:50 INFO mapred.LocalJobRunner: Starting task: attempt_local127577466_0001_m_000000_0
  16/11/14 12:08:50 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
  16/11/14 12:08:50 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
  16/11/14 12:08:50 INFO db.DBInputFormat: Using read commited transaction isolation
  16/11/14 12:08:50 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
  16/11/14 12:08:50 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
  16/11/14 12:08:50 INFO db.DBRecordReader: Executing query: select * from order_info where ( 1=1 ) AND ( 1=1 )
  16/11/14 12:08:50 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
  16/11/14 12:08:50 INFO mapred.LocalJobRunner:
  16/11/14 12:08:51 INFO mapred.Task: Task:attempt_local127577466_0001_m_000000_0 is done. And is in the process of committing
  16/11/14 12:08:51 INFO mapred.LocalJobRunner:
  16/11/14 12:08:51 INFO mapred.Task: Task attempt_local127577466_0001_m_000000_0 is allowed to commit now
  16/11/14 12:08:51 INFO output.FileOutputCommitter: Saved output of task 'attempt_local127577466_0001_m_000000_0' to hdfs://localhost:9000/user/xiaosi/data/order_info/_temporary/0/task_local127577466_0001_m_000000
  16/11/14 12:08:51 INFO mapred.LocalJobRunner: map
  16/11/14 12:08:51 INFO mapred.Task: Task 'attempt_local127577466_0001_m_000000_0' done.
  16/11/14 12:08:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local127577466_0001_m_000000_0
  16/11/14 12:08:51 INFO mapred.LocalJobRunner: map task executor complete.
  16/11/14 12:08:51 INFO mapreduce.Job: Job job_local127577466_0001 running in uber mode : false
  16/11/14 12:08:51 INFO mapreduce.Job:  map 100% reduce 0%
  16/11/14 12:08:51 INFO mapreduce.Job: Job job_local127577466_0001 completed successfully
  16/11/14 12:08:51 INFO mapreduce.Job: Counters: 20
      File System Counters
          FILE: Number of bytes read=22247784
          FILE: Number of bytes written=22732836
          FILE: Number of read operations=0
          FILE: Number of large read operations=0
          FILE: Number of write operations=0
          HDFS: Number of bytes read=0
          HDFS: Number of bytes written=3710
          HDFS: Number of read operations=4
          HDFS: Number of large read operations=0
          HDFS: Number of write operations=3
      Map-Reduce Framework
          Map input records=111
          Map output records=111
          Input split bytes=87
          Spilled Records=0
          Failed Shuffles=0
          Merged Map outputs=0
          GC time elapsed (ms)=0
          Total committed heap usage (bytes)=245366784
      File Input Format Counters
          Bytes Read=0
      File Output Format Counters
          Bytes Written=3710
  16/11/14 12:08:51 INFO mapreduce.ImportJobBase: Transferred 3.623 KB in 2.5726 seconds (1.4083 KB/sec)
  16/11/14 12:08:51 INFO mapreduce.ImportJobBase: Retrieved 111 records.

The imported data can be viewed in HDFS at the path given by --target-dir:

  hadoop fs -text /user/xiaosi/data/order_info/* | less
  358574046793404,2016-04-05,flight
  358574046794733,2016-08-03,hotel
  358574050631177,2016-05-08,vacation
  358574050634213,2015-04-28,train
  358574050634692,2016-04-05,tuan
  358574050650524,2015-07-26,hotel
  358574050654773,2015-01-23,flight
  358574050668658,2015-01-23,hotel
  358574050730771,2016-11-06,train
  358574050731241,2016-05-08,car
  358574050743865,2015-01-23,vacation
  358574050767666,2015-04-28,train
  358574050767971,2015-07-26,flight
  358574050808288,2016-05-08,hotel
  358574050816828,2015-01-23,hotel
  358574050818220,2015-04-28,car
  358574050821877,2013-08-03,flight

Here is another example:

  sqoop import --connect jdbc:mysql://localhost:3306/test --table order_info --columns "business,id,order_time"  -m 1 --username root -P

A new directory named order_info, matching the name of the database table, is created under /user/xiaosi/ in HDFS, with contents like:

  flight,358574046793404,2016-04-05
  hotel,358574046794733,2016-08-03
  vacation,358574050631177,2016-05-08
  train,358574050634213,2015-04-28
  tuan,358574050634692,2016-04-05

6. import-all-tables

Imports all tables in a database into HDFS; each table gets its own directory in HDFS. The options for this command are shown in the figure below:
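The source gives no example for this command, so the following is only a sketch under assumptions: the same test database as in the other sections, a hypothetical parent directory /user/xiaosi/all_tables, and a hypothetical helper import_all_cmd that prints the command instead of running it. import-all-tables takes --warehouse-dir (a parent directory under which one subdirectory per table is created) rather than --target-dir.

```shell
# Sketch: print an import-all-tables command.
# import_all_cmd and the warehouse path are assumptions for illustration.
# -m 1 uses a single mapper per table, so no split column is needed.
import_all_cmd() {
  warehouse_dir=$1
  echo "sqoop import-all-tables" \
    "--connect jdbc:mysql://localhost:3306/test" \
    "--warehouse-dir $warehouse_dir" \
    "--username root -P -m 1"
}

import_all_cmd /user/xiaosi/all_tables
```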

7. list-databases

This command lists all database names on a relational database server:

  sqoop list-databases --connect jdbc:mysql://localhost:3306 --username root -P

The output is as follows:

  16/11/14 14:30:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  Enter password:
  16/11/14 14:30:14 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  information_schema
  hive_db
  mysql
  performance_schema
  phpmyadmin
  test

8. list-tables

This command lists all table names in a given database:

  sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username root -P

The output is as follows:

  16/11/14 14:32:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  Enter password:
  16/11/14 14:32:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  PageView
  book
  bookID
  cc
  city_click
  country
  country2
  cup
  employee
  flightOrder
  hotel_book_info
  hotel_info
  order_info
  stu
  stu2
  stu3
  stuInfo
  student

9. merge

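This command merges two datasets on HDFS, removing duplicates as it merges: when both datasets contain a row with the same key, the row from the newer dataset is kept. The options for this command are shown in the figure below:

For example, suppose an imported dataset exists under the HDFS path /user/xiaosi/old:

  id name
  1 a
  2 b
  3 c

Another dataset exists under /user/xiaosi/new, imported after the first one:

  id name
  1 a2
  2 b
  3 c

The merged result is then:

  id name
  1 a2
  2 b
  3 c

Run the following command:

  sqoop merge --new-data /user/xiaosi/new/part-m-00000 --onto /user/xiaosi/old/part-m-00000 --target-dir /user/xiaosi/final --jar-file /home/xiaosi/test/testmerge.jar --class-name testmerge --merge-key id

Note:

Within a single dataset, multiple rows should not share the same primary key; otherwise data will be lost.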
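The merge rule (for each key, the newer row replaces the older one) can be tried locally without Hadoop. The sketch below is not a Sqoop invocation: it is an awk approximation of the same rule, assuming space-separated records with the merge key in the first column and no header row. The function name merge_by_key is hypothetical.

```shell
# Local illustration of merge-by-key; this does not call Sqoop.
# Assumes space-separated records with the merge key in column 1.
# $1 = older dataset, $2 = newer dataset
merge_by_key() {
  awk '
    # First file read is the newer dataset: remember each row by key.
    NR == FNR { newer[$1] = $0; next }
    # Older row whose key also exists in the newer data: emit the newer row.
    ($1 in newer) { print newer[$1]; seen[$1] = 1; next }
    # Key exists only in the older data: keep the row as it is.
    { print }
    # Keys that exist only in the newer data are appended at the end.
    END { for (k in newer) if (!(k in seen)) print newer[k] }
  ' "$2" "$1"
}

tmp=$(mktemp -d)
printf '1 a\n2 b\n3 c\n' > "$tmp/old"
printf '1 a2\n2 b\n3 c\n' > "$tmp/new"
merge_by_key "$tmp/old" "$tmp/new"
rm -r "$tmp"
```

With the sample data this prints the same three rows as the merged result shown earlier. If two rows in the newer file shared a key, only the last one would survive in the lookup table, which is the kind of data loss the note above warns about.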

10. metastore

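Records the metadata of Sqoop jobs. If no Metastore instance is started, the default metadata storage directory is ~/.sqoop. To change the storage directory, edit the configuration file sqoop-site.xml.

Start a Metastore instance:

  sqoop metastore

The output is as follows: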

  16/11/14 14:44:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  16/11/14 14:44:40 WARN hsqldb.HsqldbMetaStore: The location for metastore data has not been explicitly set. Placing shared metastore files in /home/xiaosi/.sqoop/shared-metastore.db
  [[email protected]]: [Thread[main,5,main]]: checkRunning(false) entered
  [[email protected]]: [Thread[main,5,main]]: checkRunning(false) exited
  [[email protected]]: [Thread[main,5,main]]: setDatabasePath(0,file:/home/xiaosi/.sqoop/shared-metastore.db)
  [[email protected]]: [Thread[main,5,main]]: checkRunning(false) entered
  [[email protected]]: [Thread[main,5,main]]: checkRunning(false) exited
  [[email protected]]: [Thread[main,5,main]]: setDatabaseName(0,sqoop)
  [[email protected]]: [Thread[main,5,main]]: putPropertiesFromString(): [hsqldb.write_delay=false]
  [[email protected]]: [Thread[main,5,main]]: checkRunning(false) entered
  [[email protected]]: [Thread[main,5,main]]: checkRunning(false) exited
  [[email protected]]: Initiating startup sequence...
  [[email protected]]: Server socket opened successfully in 3 ms.
  [[email protected]]: Database [index=0, id=0, db=file:/home/xiaosi/.sqoop/shared-metastore.db, alias=sqoop] opened sucessfully in 153 ms.
  [[email protected]]: Startup sequence completed in 157 ms.
  [[email protected]]: 2016-11-14 14:44:40.414 HSQLDB server 1.8.0 is online
  [[email protected]]: To close normally, connect and execute SHUTDOWN SQL
  [[email protected]]: From command line, use [Ctrl]+[C] to abort abruptly
  16/11/14 14:44:40 INFO hsqldb.HsqldbMetaStore: Server started on port 16000 with protocol HSQL

11. job

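This command creates a Sqoop job that is not executed immediately; it has to be run manually. Its purpose is to make Sqoop commands as reusable as possible. The options for this command are shown in the figure below:

Example:

  sqoop job --create listTablesJob -- list-tables --connect jdbc:mysql://localhost:3306/test --username root -P

The command above defines a job that lists all tables in the test database.

  sqoop job --exec listTablesJob

The command above executes the job we defined; the output is as follows: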

  16/11/14 19:51:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  Enter password:
  16/11/14 19:51:47 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  PageView
  book
  bookID
  cc
  city_click
  country
  country2
  cup
  employee
  flightOrder
  hotel_book_info
  hotel_info
  order_info
  stu
  stu2
  stu3
  stuInfo
  student

Note:

The -- separator and list-tables (the Sqoop command the job runs) must not be written together; there has to be a space between them.
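Besides --create and --exec, the job tool can list, show, and delete saved jobs, and it can talk to a shared metastore such as the one started in the metastore section. The sketch below only prints the commands. The helper name job_cmds is hypothetical, and the metastore URL assumes the server runs on localhost with the port 16000 reported in the metastore log.

```shell
# Sketch: print common saved-job commands for one job name.
# job_cmds is a hypothetical helper; nothing is executed.
job_cmds() {
  job=$1
  echo "sqoop job --list"            # list all saved jobs
  echo "sqoop job --show $job"       # show the stored definition of a job
  echo "sqoop job --exec $job"       # run the job
  echo "sqoop job --delete $job"     # remove the job
  # Same listing, but against a shared metastore (assumed to be on localhost).
  echo "sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --list"
}

job_cmds listTablesJob
```

Without --meta-connect, jobs are kept in the private store under ~/.sqoop, so they are only visible to the user who created them.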

Source: the book 《Hadoop海量数据处理  技术详解与项目实战》

Time: 2024-12-25 12:29:32
