mllib:Exception in thread "main" org.apache.spark.SparkException: Input validation failed.

当我们使用mllib做分类,用到逻辑回归或线性支持向量机做分类时,可能会出现下面的错误:

15/04/09 21:27:25 ERROR DataValidators: Classification labels should be 0 or 1. Found 3000000 invalid labels
Exception in thread "main" org.apache.spark.SparkException: Input validation failed.

由于做调试时,debug输出的信息很多,我们常常忽略了上边的ERROR,而特别关注Input validation failed。

寻找源码,先关校验数据代码如下:

// Check the data properties before running the optimizer
    if (validateData && !validators.forall(func => func(input))) {
      throw new SparkException("Input validation failed.")
    }

源码仅此而已,并未能得到解决问题的办法。然后,后来才发现错误信息还有上边儿的error。

错误信息的意思是分类标签应该是0或者1,而不能是其他值。当时我的类别标签中包含了2,正好是3000000条信息;于是将类别标签替换成0或1,编译通过。

这也证明了为什么说线性支持向量机适合做二分类的数据。当然,修改算法它也可以支持三种类别,网上有大量相关实现。

时间: 2024-10-10 22:13:07

mllib:Exception in thread "main" org.apache.spark.SparkException: Input validation failed.的相关文章

Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=Mypc, access=WRITE, inode="/":fan:supergroup:drwxr-xr-x

在window上编程提示没有写Hadoop的权限 Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=Mypc, access=WRITE, inode="/":fan:supergroup:drwxr-xr-x 曾经踩过的坑: 保存结果到hdfs上没有写的权限* 通过修改权限将文件写入到指定的目录下* * $HAD

MyBatis笔记----报错:Exception in thread "main" org.apache.ibatis.binding.BindingException: Invalid bound statement (not found)解决方法

报错 Exception in thread "main" org.apache.ibatis.binding.BindingException: Invalid bound statement (not found): com.ij34.model.UserMapper.selectarticle at org.apache.ibatis.binding.MapperMethod$SqlCommand.<init>(MapperMethod.java:230) at or

MyBatis笔记----报错Exception in thread &quot;main&quot; org.apache.ibatis.binding.BindingException: Invalid bound statement (not found): com.ij34.model.UserMapper.selectUser

信息: Refreshing org[email protected]41cf53f9: startup date [Wed Apr 05 16:48:12 CST 2017]; root of context hierarchy 四月 05, 2017 4:48:12 下午 org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions 信息: Loading XML bean definiti

spark 插入数据到mysql时遇到的问题 org.apache.spark.SparkException: Task not serializable

报错问题:Exception in thread "main" org.apache.spark.SparkException: Task not serializableCaused by: java.io.NotSerializableException: org.apache.commons.dbcp2.PoolingDataSource$PoolGuardConnectionWrapper 出错的代码: def saveMonthToMysql(everymonth_avg:R

spark程序异常:Exception in thread &quot;main&quot; java.io.IOException: No FileSystem for scheme: hdfs

命令: java -jar myspark-1.0-SNAPSHOT.jar myspark-1.0-SNAPSHOT.jar hdfs://single:9000/input/word.txt hdfs://single:9000/output/out1 错误信息: .......... 14/11/23 06:14:18 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141123061418-0011/0 on hos

spark运行java-jar:Exception in thread &quot;main&quot; java.io.IOException: No FileSystem for scheme: hdfs

今天碰到的一个 spark问题,困扰好久才解决 首先我的spark集群部署使用的部署包是官方提供的 spark-1.0.2-bin-hadoop2.tgz 部署在hadoop集群上. 在运行java jar包的时候使用命令 java -jar chinahadoop-1.0-SNAPSHOT.jar  chinahadoop-1.0-SNAPSHOT.jar  hdfs://node1:8020/user/ning/data.txt /user/ning/output 出现了如下错误 14/08

Exception in thread &quot;main&quot; java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

MyEclipse运行的时候报错,菜鸟不理解是什么意思,最后找了一些资料才知道是因为缺少commons-logging.jar包 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory at org.apache.commons.httpclient.HttpClient.<clinit>(HttpClient.java:66) at c

MyEclipse8.5集成Tomcat7时的启动错误:Exception in thread “main” java.lang.NoClassDefFoundError org/apache/commons/logging/LogFactory

今天,安装Tomcat7.0.21后,单独用D:\apache-tomcat-7.0.21\bin\startup.bat启动web服务正常.但在MyEclipse8.5中集成配置Tomcat7后,在MyEclipse启动Tomcat服务则出现如下错误提示: Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/juli/logging/LogFactoryat org.apache.catalina.star

Exception in thread &quot;main&quot; java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/util/Apps Hadoop2.6.0编程问题与解决

从hadoop 1.2.1升级到 Hadoop2.6.0,调试写代码,还是遇到一些问题的.这里记录一下,后续如果自己再遇到类似问题,那也好找原因了. 在eclipse里编译运行 WordCount,出现以下错误. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/util/Apps at java.lang.ClassLoader.defineClass1(Native M