Fixing a startup error when integrating Spark Streaming with Flume (pull-based approach)

Flume configuration file:

simple-agent.sources = netcat-source
simple-agent.sinks = spark-sink
simple-agent.channels = memory-channel

# Describe/configure the source
simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind = centos
simple-agent.sources.netcat-source.port = 44444

# Describe the sink
simple-agent.sinks.spark-sink.type = org.apache.spark.streaming.flume.sink.SparkSink
simple-agent.sinks.spark-sink.hostname = centos
simple-agent.sinks.spark-sink.port = 41414

simple-agent.channels.memory-channel.type = memory
simple-agent.channels.memory-channel.capacity = 1000
simple-agent.channels.memory-channel.transactionCapacity = 100

simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.spark-sink.channel = memory-channel
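
For reference, an agent with this configuration is typically started through the `flume-ng` launcher. The following is a minimal sketch, not the author's exact command: the helper name `start_simple_agent`, the Flume installation directory, and the location of the properties file are all assumptions. Only the agent name `simple-agent` comes from the configuration above.

```shell
# Sketch: start the agent defined in the properties file above.
# start_simple_agent is a hypothetical helper; both arguments are assumptions
# about where Flume and the config file live on your machine.
start_simple_agent() {
  local flume_home="$1"   # Flume installation directory
  local conf_file="$2"    # path to the properties file shown above

  # --name must match the agent name used as the key prefix in the config.
  "$flume_home/bin/flume-ng" agent \
    --name simple-agent \
    --conf "$flume_home/conf" \
    --conf-file "$conf_file" \
    -Dflume.root.logger=INFO,console
}
```

It could be called as `start_simple_agent /opt/flume /opt/flume/conf/simple-agent.conf`, where both paths are placeholders. Logging to the console makes startup errors visible immediately.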

However, starting Flume fails with the following error:

2019-10-16 11:35:14,559 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:142)] Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to load sink type: org.apache.spark.streaming.flume.sink.SparkSink, class: org.apache.spark.streaming.flume.sink.SparkSink
    at org.apache.flume.sink.DefaultSinkFactory.getClass(DefaultSinkFactory.java:71)
    at org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:43)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:410)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.flume.sink.SparkSink
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.flume.sink.DefaultSinkFactory.getClass(DefaultSinkFactory.java:69)
    ... 11 more
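
The `ClassNotFoundException` at the bottom of the trace indicates that the sink class is not on the Flume agent's classpath. One quick way to confirm this is to check whether a `spark-streaming-flume-sink` jar exists in Flume's `lib` directory. The sketch below assumes a standard Flume layout with a `lib` directory under the installation root; `has_spark_sink_jar` is a hypothetical helper name.

```shell
# Sketch: report whether Flume's lib directory contains the Spark sink jar.
# has_spark_sink_jar is a hypothetical helper; the argument is the Flume
# installation directory (an assumption about your environment).
has_spark_sink_jar() {
  local flume_home="$1"
  local jar

  # If the glob matches nothing it stays literal, so the -f test fails.
  for jar in "$flume_home"/lib/spark-streaming-flume-sink_*.jar; do
    if [ -f "$jar" ]; then
      echo "found: $jar"
      return 0
    fi
  done

  echo "no spark-streaming-flume-sink jar in $flume_home/lib" >&2
  return 1
}
```

A non-zero return value here is consistent with the error shown in the log.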

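Solution:

Because the agent's sink is of type org.apache.spark.streaming.flume.sink.SparkSink, spark-streaming-flume-sink_2.11-2.4.3.jar must be copied into Flume's lib directory. Otherwise, Flume cannot find the org.apache.spark.streaming.flume.sink.SparkSink class and reports the error above.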
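
The copy step can be sketched as a small shell function. This is an illustration under stated assumptions: `install_sink_jar` is a hypothetical helper name, and the location of the downloaded jar and of the Flume installation will differ on your machine.

```shell
# Sketch: copy the Spark sink jar into Flume's lib directory.
# install_sink_jar is a hypothetical helper; both paths are assumptions.
install_sink_jar() {
  local jar_path="$1"     # e.g. /path/to/spark-streaming-flume-sink_2.11-2.4.3.jar
  local flume_home="$2"   # Flume installation directory

  if [ ! -f "$jar_path" ]; then
    echo "jar not found: $jar_path" >&2
    return 1
  fi
  if [ ! -d "$flume_home/lib" ]; then
    echo "Flume lib directory not found: $flume_home/lib" >&2
    return 1
  fi

  # Flume adds every jar under lib/ to the agent's classpath at startup.
  cp "$jar_path" "$flume_home/lib/" &&
    echo "copied $(basename "$jar_path") to $flume_home/lib"
}
```

After copying, restart the agent so the new jar is picked up. The Spark Streaming + Flume integration guide also lists the Scala library jar and the commons-lang3 jar as requirements for the sink; if a different `ClassNotFoundException` appears after this fix, those are worth checking.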


Original article: https://www.cnblogs.com/skywp/p/11684598.html

Time: 2024-08-29 21:27:09
