When working with open-source frameworks (Spark, Hadoop, Lucene, and the like), you occasionally hit errors claiming that some concrete implementation class cannot be found, or that a piece of configuration (for example Spark's akka settings) has gone missing.
Some examples:
[Lucene] An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [es090, completion090, XBloomFilter]
[Hadoop] An exception occured while performing the indexing job : java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
In a minority of cases this really is an unresolved configuration or dependency problem, so the first step is to check those. For the MapReduce error above, start by verifying that mapreduce.framework.name in mapred-site.xml is set to yarn. If the configuration or dependency is confirmed to be inside the packaged artifact, however, the root cause is essentially a same-named file being overwritten at packaging time, something maven assembly/shade runs into frequently.
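To see which value the client actually resolves (rather than what you think the XML says), a minimal sketch along these lines can help. The class name is mine; it assumes a Hadoop 2.x client on the classpath, where JobConf's static initializer registers mapred-site.xml and yarn-site.xml as default resources:

import org.apache.hadoop.mapred.JobConf;

public class CheckFrameworkName {
    public static void main(String[] args) {
        // JobConf statically loads mapred-default.xml, mapred-site.xml,
        // yarn-default.xml and yarn-site.xml as default resources, so this
        // prints the value the MapReduce client resolves at submit time.
        JobConf conf = new JobConf();
        // "local" is the mapred-default.xml fallback; a YARN cluster needs "yarn"
        System.out.println(conf.get("mapreduce.framework.name", "local"));
    }
}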
Symptom: the configuration does exist and is correct, but it was overwritten; the SPI class was registered, but its manifest/services entry was overwritten.
Fix: work backwards from the failing code or configuration to identify the resource involved, inspect what configuration and manifest actually ended up inside the packaged jar (a sketch for this follows below), and then rebuild with the shade plugin, adding the appropriate transformers.
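For the inspection step, `unzip -p <jar> <entry>` works fine; the following JDK-only sketch (class name is mine) does the same, dumping a single entry from the packaged jar so you can see which copy of a config, manifest, or services file actually survived:

import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarPeek {
    public static void main(String[] args) throws Exception {
        // args[0] = path to the shaded jar
        // args[1] = entry to dump, e.g. "META-INF/MANIFEST.MF", "reference.conf",
        //           or "META-INF/services/org.apache.lucene.codecs.PostingsFormat"
        try (JarFile jar = new JarFile(args[0])) {
            JarEntry entry = jar.getJarEntry(args[1]);
            if (entry == null) {
                System.err.println("entry not found: " + args[1]);
                return;
            }
            try (InputStream in = jar.getInputStream(entry)) {
                in.transferTo(System.out); // Java 9+
            }
        }
    }
}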
A shade plugin configuration I use often:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>jar-with-dependencies</shadedClassifierName>
      </configuration>
    </execution>
  </executions>
</plugin>
The reference.conf under the AppendingTransformer is there specifically to fix Spark's akka configuration being overwritten when multiple same-named reference.conf files collide; the other two, Manifest and Services, address the overwritten manifest/SPI registrations behind the es and Lucene errors. The common idea of these transformers for resolving file conflicts is to combine multiple same-named files into one according to some rule: the services transformer concatenates the SPI entries, the AppendingTransformer appends raw file contents, and the manifest transformer decides which manifest wins and lets you override its entries.
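To make the SPI side concrete, a small sketch of why the services merge matters (class name is mine). Java's ServiceLoader, and Lucene's own named SPI loader which reads the same files, discover implementations from every META-INF/services/<interface> file on the classpath; if shading let one artifact's copy overwrite the rest, the names from the lost copies are simply gone:

import java.util.ServiceLoader;
import org.apache.lucene.codecs.PostingsFormat;

public class ListPostingsFormats {
    public static void main(String[] args) {
        // Each provider jar ships its own
        // META-INF/services/org.apache.lucene.codecs.PostingsFormat file.
        // With ServicesResourceTransformer those files are concatenated, so
        // every registered name survives; without it, only the last-written
        // copy's entries show up -- hence "with name 'Lucene50' does not exist".
        for (PostingsFormat pf : ServiceLoader.load(PostingsFormat.class)) {
            System.out.println(pf.getName());
        }
    }
}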