Importing a PostgreSQL table into Elasticsearch

1. Inspect the PostgreSQL table structure and row count

edbstore=# \d customers
                                          Table "edbstore.customers"
        Column        |         Type          |                           Modifiers
----------------------+-----------------------+----------------------------------------------------------------
 customerid           | integer               | not null default nextval('customers_customerid_seq'::regclass)
 firstname            | character varying(50) | not null
 lastname             | character varying(50) | not null
 address1             | character varying(50) | not null
 address2             | character varying(50) |
 city                 | character varying(50) | not null
 state                | character varying(50) |
 zip                  | integer               |
 country              | character varying(50) | not null
 region               | smallint              | not null
 email                | character varying(50) |
 phone                | character varying(50) |
 creditcardtype       | integer               | not null
 creditcard           | character varying(50) | not null
 creditcardexpiration | character varying(50) | not null
 username             | character varying(50) | not null
 password             | character varying(50) | not null
 age                  | smallint              |
 income               | integer               |
 gender               | character varying(1)  |
Indexes:
    "customers_pkey" PRIMARY KEY, btree (customerid)
    "ix_cust_username" UNIQUE, btree (username)
Referenced by:
    TABLE "cust_hist" CONSTRAINT "fk_cust_hist_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE CASCADE
    TABLE "orders" CONSTRAINT "fk_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE SET NULL

edbstore=# select count(1) from customers;
 count
-------
 20000
(1 row)

2. Export the table data as JSON using PostgreSQL's row_to_json function

edbstore=# \t
Tuples only is on.
edbstore=# \o customer.json
edbstore=# select row_to_json(r) from customers as r;
edbstore=# \q

$ ls -lh customer.json
-rw-r--r-- 1 postgres appuser 7.7M Dec  7 22:37 customer.json

$ head -1 customer.json
 {"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}

Although the customer table has now been dumped to a JSON file, it cannot be imported into Elasticsearch directly; attempting to do so fails with the following error:

$ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/bank/_bulk?pretty&refresh" --data-binary "@customer.json"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"
  },
  "status" : 400
}

According to the bulk API documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html, every document in a bulk request must be preceded by an action/metadata line, and our JSON file contains no such lines specifying a unique document id for each row. The next step adds them.

3. Add an action/metadata line with an id for each row of JSON data

Since the customers table contains 20000 rows, as we saw earlier, we need to generate the corresponding 20000 id lines. We do this with a small Python script: create a file named build_id.py with the following content. Note that the upper bound is 20001: Python's range() excludes its end value (range(1, 20000) would only print 1 through 19999), so range(1, 20001) yields exactly the ids 1 through 20000.

for i in range(1, 20001):
    print('{"index":{"_id":"%s"}}' % i)

Run the script and redirect its output to a file:

$ python build_id.py > build_id.txt

$ head -3 build_id.txt
{"index":{"_id":"1"}}
{"index":{"_id":"2"}}
{"index":{"_id":"3"}}

利用linux “paste"命令,将id文件和表文件合并

$ paste -d'\n' build_id.txt customer.json > customer_new.json

$ head -4 customer_new.json
{"index":{"_id":"1"}}
 {"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"[email protected]","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}
{"index":{"_id":"2"}}
 {"customerid":2,"firstname":"HQNMZH","lastname":"UNUKXHJVXB","address1":"5119315633 Dell Way","address2":null,"city":"YNCERXJ","state":"AZ","zip":11802,"country":"US","region":1,"email":"[email protected]","phone":"5119315633","creditcardtype":1,"creditcard":"3144519586581737","creditcardexpiration":"2012/11","username":"user2","password":"password","age":80,"income":40000,"gender":"M"}

4. The processed JSON file can now be imported into Elasticsearch normally. Test:

$ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/customer/_bulk?pretty&refresh" --data-binary "@customer_new.json"
$ curl http://172.16.101.55:9200/_cat/indices?v
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer DvLoM7NjSYyjTwD5BSkK3A   1   1      20000            0       10mb           10mb
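
To double-check the load from a script rather than the _cat API, a small verification sketch is shown below. It assumes Elasticsearch 7.x (documents addressable via /_doc/) and that the requests package is installed; the host and port match the curl commands above.

# verify_import.py -- spot-check the document count and one document after the bulk load
import requests

ES = "http://172.16.101.55:9200"

# the index document count should equal the table's row count (20000)
count = requests.get(ES + "/customer/_count").json()["count"]
print("documents in index:", count)

# fetch the row that was indexed with _id 1 and show a few of its fields
doc = requests.get(ES + "/customer/_doc/1").json()["_source"]
print(doc["customerid"], doc["username"], doc["city"])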

Original article: https://www.cnblogs.com/ilifeilong/p/12003888.html
