Importing txt or JSON text files into Elasticsearch

I built this a while back; now that I have some free time, here are my notes.



Task: restore data from local files into ES. The files are large, about 10 GB of data once decompressed.



Approach: I tried three different methods for this task.

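  1. bulk: the batch import API built into ES. A request body of roughly 10-15 MB is recommended; the hard cap on a single request is set by http.max_content_length, which defaults to 100mb.

  2. logstash: another official Elastic product, which turns the text file into a data source for ES.

  3. Java: the Spring Data Elasticsearch approach. This third method combines a thread pool, a buffer queue, and Spring Data's ES wrappers; I will cover it in a later update.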
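Because a 10 GB file is far larger than one bulk request should be, it has to be cut into pieces first. The following is a minimal sketch of one way to do that, not the method used in this post: the function name split_bulk_lines and the 10 MB default are assumptions chosen for illustration. It takes lines that are already in bulk format (with trailing newlines stripped) and keeps each action line together with its source line.

```python
def split_bulk_lines(lines, max_bytes=10 * 1024 * 1024):
    """Yield lists of bulk-format lines whose UTF-8 size stays within max_bytes.

    Lines are consumed two at a time (action line, then source line),
    so a pair is never split across two chunks.
    """
    chunk, size = [], 0
    it = iter(lines)
    for action in it:
        source = next(it, None)
        if source is None:
            raise ValueError("action line without a matching source line")
        # +2 accounts for the newline that terminates each of the two lines.
        pair_size = len(action.encode("utf-8")) + len(source.encode("utf-8")) + 2
        # Start a new chunk when adding this pair would exceed the limit.
        if chunk and size + pair_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.extend([action, source])
        size += pair_size
    if chunk:
        yield chunk
```

Each yielded chunk can be written to its own file (joined with newlines and ending with a final newline) and posted separately. A single pair larger than max_bytes still ends up in a chunk of its own rather than being dropped.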



I. Using bulk (Win7 + ES 6.6.1 + JSON text)

1. Prepare correctly formatted JSON data

ES is very strict about the format of the JSON text. A valid data file looks like this:

{"index":{"_index":"demo","_id":"0"}}
{"id":null,"dev_id":"1","rcv_time":1557303257,"date":null,"dname":null,"logtype":"1","pri":null,"mod":"pf","sa":null,"sport":null,"ttype":null,"da":null,"dport":null,"code":null,"proto":null,"policy":null,"duration":"0","rcvd":null,"sent":null,"fwlog":null,"dsp_msg":"包过滤日志","failmsg":null,"custom":null,"smac":null,"dmac":null,"type":null,"in_traffic":"52","out_traffic":"52","gen_time":"1557303257","src_ip":"710191296","dest_ip":"896426877","src_port":"51411","dest_port":"443","protocol_id":"1","action_id":"2","filter_policy_id":"0","sat_ip":"0","sat_port":"0","i_ip":"0","i_port":"0","insert_time":"0","p_ip":"0","p_port":"0","rulename_id":"3","min_id":"25955054","svm":null,"dvm":null,"repeat_num":null,"event_type_id":216001001,"event_level_id":1,"org_log":"devid=2 date=\"2019/05/08 16:14:17\" dname=venus logtype=1 pri=5 ver=0.3.0 rule_name=网关产品线 mod=pf sa=192.168.84.42 sport=51411 type=NULL da=125.99.110.53 dport=443 code=NULL proto=IPPROTO_TCP policy=允许 duration=0 rcvd=52 sent=52 fwlog=0 dsp_msg=\"包过滤日志\"","stauts":"success","failMsg":null}
{"index":{"_index":"demo","_id":"1"}}
{"id":null,"dev_id":"1","rcv_time":1557303257,"date":null,"dname":null,"logtype":"1","pri":null,"mod":"pf","sa":null,"sport":null,"ttype":null,"da":null,"dport":null,"code":null,"proto":null,"policy":null,"duration":"0","rcvd":null,"sent":null,"fwlog":null,"dsp_msg":"包过滤日志","failmsg":null,"custom":null,"smac":null,"dmac":null,"type":null,"in_traffic":"52","out_traffic":"52","gen_time":"1557303257","src_ip":"710191296","dest_ip":"896426877","src_port":"51411","dest_port":"443","protocol_id":"1","action_id":"2","filter_policy_id":"0","sat_ip":"0","sat_port":"0","i_ip":"0","i_port":"0","insert_time":"0","p_ip":"0","p_port":"0","rulename_id":"3","min_id":"25955054","svm":null,"dvm":null,"repeat_num":null,"event_type_id":216001001,"event_level_id":1,"org_log":"devid=2 date=\"2019/05/08 16:14:17\" dname=venus logtype=1 pri=5 ver=0.3.0 rule_name=网关产品线 mod=pf sa=192.168.84.42 sport=51411 type=NULL da=125.99.110.53 dport=443 code=NULL proto=IPPROTO_TCP policy=允许 duration=0 rcvd=52 sent=52 fwlog=0 dsp_msg=\"包过滤日志\"","stauts":"success","failMsg":null}
{"index":{"_index":"demo","_id":"2"}}
{"id":null,"dev_id":"1","rcv_time":1557303257,"date":null,"dname":null,"logtype":"1","pri":null,"mod":"pf","sa":null,"sport":null,"ttype":null,"da":null,"dport":null,"code":null,"proto":null,"policy":null,"duration":"0","rcvd":null,"sent":null,"fwlog":null,"dsp_msg":"包过滤日志","failmsg":null,"custom":null,"smac":null,"dmac":null,"type":null,"in_traffic":"52","out_traffic":"52","gen_time":"1557303257","src_ip":"710191296","dest_ip":"896426877","src_port":"51411","dest_port":"443","protocol_id":"1","action_id":"2","filter_policy_id":"0","sat_ip":"0","sat_port":"0","i_ip":"0","i_port":"0","insert_time":"0","p_ip":"0","p_port":"0","rulename_id":"3","min_id":"25955054","svm":null,"dvm":null,"repeat_num":null,"event_type_id":216001001,"event_level_id":1,"org_log":"devid=2 date=\"2019/05/08 16:14:17\" dname=venus logtype=1 pri=5 ver=0.3.0 rule_name=网关产品线 mod=pf sa=192.168.84.42 sport=51411 type=NULL da=125.99.110.53 dport=443 code=NULL proto=IPPROTO_TCP policy=允许 duration=0 rcvd=52 sent=52 fwlog=0 dsp_msg=\"包过滤日志\"","stauts":"success","failMsg":null}

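This is the format the official bulk API requires: newline-delimited JSON in which every document takes two lines, an action/metadata line followed by the document source on a single line, and the file must end with a newline.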
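If the source data is one plain JSON document per line, the action lines have to be generated. Here is a hedged sketch of that conversion; the function name to_bulk_lines and the sequential ids are assumptions for illustration, and the document type is assumed to come from the request URL, as in the curl command in this post.

```python
import json


def to_bulk_lines(docs, index_name, start_id=0):
    """Return bulk-format text: one action line plus one source line per document."""
    out = []
    for offset, doc in enumerate(docs):
        # Action/metadata line: which index and which document id to write.
        action = {"index": {"_index": index_name, "_id": str(start_id + offset)}}
        out.append(json.dumps(action, separators=(",", ":")))
        # Source line: the document itself, kept on a single line.
        out.append(json.dumps(doc, ensure_ascii=False, separators=(",", ":")))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in out)
```

ensure_ascii=False keeps Chinese field values such as dsp_msg readable in the output file; write the file as UTF-8.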

2. Run it from cmd (if the curl command is not available, search for and install curl for Windows)

curl -H "Content-Type: application/json" -XPOST localhost:9200/index/mapping/_bulk --data-binary @xxx.json

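Note: the cmd window scrolling rapidly means ES is returning the per-document results. To be sure the import really succeeded, also check that the response contains "errors":false.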
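The bulk response is too long to read by eye, so it helps to check it with a script. The sketch below assumes the response body has been saved as text (for example with curl's -o option); the function name failed_items is hypothetical. It relies only on the documented shape of the bulk response: a top-level "errors" flag and an "items" array with one entry per action.

```python
import json


def failed_items(response_text):
    """Return (action, _id, status) for every item the bulk request failed to write."""
    body = json.loads(response_text)
    # "errors" is false when every action in the request succeeded.
    if not body.get("errors"):
        return []
    failures = []
    for item in body.get("items", []):
        # Each item has a single key naming the action, e.g. "index".
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result.get("status")))
    return failures
```

An empty list means every document was accepted; otherwise the returned ids show which documents need to be fixed and sent again.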



II. Using logstash

1. Install logstash (download it from the official site)

2. Go to the bin directory under logstash and create a file named logstash_def.conf (the configuration file that logstash loads at startup)

3. The file contents are as follows:

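input{
	file{
		# Source file to import; forward slashes work on Windows
		path => "D:/log/packet.json"
		type => "log"
		# Read files not seen before from the beginning (the default is to tail the end)
		start_position => "beginning"
		# Decode each line of the file as one JSON document
		codec => json{
			charset => "UTF-8"
		}
	}
}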

output{
	elasticsearch{
		hosts => "http://127.0.0.1:9200"
		index => "venus"
		document_type => "log_packet"
	}
}

4. In cmd, change to the bin directory under logstash (ES must already be running)

Command: logstash -f logstash_def.conf

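Note: if something is wrong, logstash throws an error; otherwise it keeps running and loading data. To check progress, use the head plugin to watch the document count grow.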
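If the head plugin is not installed, the document count can also be read from the ES _count API. This is a small sketch under stated assumptions: the host and the index name venus are taken from the configuration in this post, and the helper names count_url, parse_count, and document_count are made up for illustration.

```python
import json
from urllib.request import urlopen


def count_url(host, index_name):
    """Build the URL of the _count endpoint for one index."""
    return "{}/{}/_count".format(host.rstrip("/"), index_name)


def parse_count(body):
    """Extract the document count from a _count response body."""
    return json.loads(body)["count"]


def document_count(host="http://127.0.0.1:9200", index_name="venus"):
    """Ask a running ES node how many documents the index currently holds."""
    with urlopen(count_url(host, index_name)) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling document_count() a few times while logstash is running should show the number rising; it needs ES to be reachable at the given host.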

Original article: https://www.cnblogs.com/ttzsqwq/p/11077574.html

Time: 2024-12-09 16:28:54
