Fixing the error Solr 3.4 throws when parsing xlsx files
While running search tests with Solr 3.4, parsing an Excel 2007 (.xlsx) file threw the following exception:
2012-3-20 10:06:02 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@4f14b0
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:291)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@4f14b0
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:213)
    ... 18 more
Caused by: java.lang.NullPointerException
    at org.apache.poi.ss.usermodel.DataFormatter.getFormat(DataFormatter.java:183)
    at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:536)
    at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:516)
    at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:106)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:88)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:83)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:67)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    ... 21 more
2012-3-20 10:06:02 org.apache.solr.core.SolrCore execute
This exception comes from an early bug in the Tika project that has since been fixed. For details, see: https://issues.apache.org/jira/browse/TIKA-348
Solution 1: If the project allows it, upgrade Solr to 3.5 (the latest release at the time of writing), which resolves the problem.
Solution 2: Update only the file-parsing libraries bundled with Solr 3.4. Take the Tika 0.10 and POI 3.8 jars from Solr 3.5 and use them to replace the corresponding jars in Solr 3.4.
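The jar swap in Solution 2 can be scripted. The following is a minimal Python sketch, not a tested procedure for any particular installation. The function name swap_parser_jars, the "tika-" and "poi-" file-name prefixes, and the directory paths in the usage comment are all assumptions made for illustration; check where your Solr 3.4 deployment actually loads its extraction jars from before running anything like this.

```python
import shutil
from pathlib import Path
from typing import List

# Assumption: the parsing-related jars can be recognised by these name prefixes.
PARSER_JAR_PREFIXES = ("tika-", "poi-")


def swap_parser_jars(old_lib: Path, new_lib: Path, backup_dir: Path) -> List[str]:
    """Replace the Tika/POI jars in old_lib with those found in new_lib.

    The jars removed from old_lib are moved into backup_dir so the change
    can be rolled back. Returns the names of the jars copied in.
    """
    def parser_jars(lib: Path) -> List[Path]:
        return sorted(
            p for p in lib.glob("*.jar")
            if p.name.startswith(PARSER_JAR_PREFIXES)
        )

    new_jars = parser_jars(new_lib)
    if not new_jars:
        # Refuse to strip the old jars when there is nothing to put in their place.
        raise FileNotFoundError(f"no Tika/POI jars found in {new_lib}")

    backup_dir.mkdir(parents=True, exist_ok=True)
    for jar in parser_jars(old_lib):
        shutil.move(str(jar), str(backup_dir / jar.name))

    for jar in new_jars:
        shutil.copy2(str(jar), str(old_lib / jar.name))

    return [jar.name for jar in new_jars]


# Hypothetical usage; these paths are placeholders, not verified locations:
# swap_parser_jars(
#     Path("/opt/solr-3.4/extraction-lib"),
#     Path("/opt/solr-3.5/extraction-lib"),
#     Path("/opt/solr-3.4/extraction-lib-backup"),
# )
```

Stop Solr (or its servlet container) before swapping and restart it afterwards so the new jars are loaded. Other libraries that the newer Tika and POI depend on may also need updating; this sketch does not handle them.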