1. Using WaterDrop to consume data from Kafka and write it to ClickHouse
1.1 Environment
- SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179
- clickhouse-1.1.54236-4.el7.x86_64
- waterdrop-1.4.2
WaterDrop configuration file:
spark {
  spark.streaming.batchDuration = 5
  spark.app.name = "Waterdrop"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
}

input {
  kafkaStream {
    topics = "waterdrop"
    consumer.bootstrap.servers = "cdh01:9092,cdh02:9092,cdh03:9092"
    consumer.group.id = "waterdrop_group"
  }
}

filter {
  split {
    fields = ["FlightDate", "Year"]
    delimiter = ","
  }
}

output {
  clickhouse {
    host = "cdh03:8123"
    clickhouse.socket_timeout = 50000
    database = "ontime"
    table = "ontime_test"
    fields = ["FlightDate", "Year"]
    username = "default"
    password = "default"
    bulk_size = 20000
  }
}
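With this config, each Kafka record is expected to be a comma-delimited line that the split filter turns into the FlightDate and Year fields before the ClickHouse output flushes them in batches of 20000. A minimal launch-and-test sketch is shown below; the WaterDrop install path, the config file name kafka_to_clickhouse.conf, and the Kafka CLI location are assumptions for illustration, not taken from the original post:

    # Submit the WaterDrop streaming job on YARN (WaterDrop 1.x launcher; paths assumed)
    cd /opt/waterdrop-1.4.2
    ./bin/start-waterdrop.sh --master yarn --deploy-mode client \
      --config ./config/kafka_to_clickhouse.conf

    # Feed one test record matching the split filter: comma-delimited "FlightDate,Year"
    echo "2019-12-25,2019" | kafka-console-producer \
      --broker-list cdh01:9092,cdh02:9092,cdh03:9092 --topic waterdrop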
Starting WaterDrop fails with the following error:
Exception in thread "main" java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1023)
at ru.yandex.clickhouse.util.ClickHouseHttpClientBuilder.buildClient(ClickHouseHttpClientBuilder.java:59)
at ru.yandex.clickhouse.
Cause:
- The ClickHouse output plugin depends on clickhouse-jdbc (0.2), which in turn depends on httpclient 4.5.2
  (reference: https://www.mvnjar.com/ru.yandex.clickhouse/clickhouse-jdbc/0.2/detail.html)
- SPARK2-2.3.0.cloudera4-1.cdh5.13.3 does not ship httpclient 4.5.2, only httpcore; a quick way to verify this is shown after this list
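A quick check of what the Spark parcel actually puts on the classpath (the jars path is the one quoted in the solution below):

    # List HTTP-related jars bundled with the Spark 2 parcel; httpclient 4.5.2 should be missing
    ls /opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/jars \
      | grep -iE 'httpclient|httpcore'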
Solution:
- Download httpclient 4.5.2
- Put the httpclient jar into Spark's installation directory, e.g. /opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/jars (see the sketch after this list)
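A minimal sketch of the fix, assuming the jar is fetched from Maven Central under the standard org.apache.httpcomponents coordinates (verify the URL and copy the jar to every node that runs the driver or executors):

    SPARK_JARS=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/jars

    # Download httpclient 4.5.2 from Maven Central (standard coordinates; URL assumed, verify before use)
    wget https://repo1.maven.org/maven2/org/apache/httpcomponents/httpclient/4.5.2/httpclient-4.5.2.jar

    # Place it on Spark's classpath, then restart the WaterDrop job
    cp httpclient-4.5.2.jar "$SPARK_JARS"/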
Original article: https://www.cnblogs.com/wuning/p/12121158.html