ModeShape / MODE-1810

ClassCastException when using the Tika text extractor


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: 3.1.2.Final, 3.2.0.Final
    • Affects Version/s: 3.1.0.Final
    • Component/s: None
    • Labels: None

    Description

      I am getting the following exception when using the Tika text extractor to extract the contents of an Excel document.

      Exception in thread "modeshape-text-extractor-7-thread-1" java.lang.ExceptionInInitializerError
      at org.apache.poi.openxml4j.opc.internal.unmarshallers.PackagePropertiesUnmarshaller.<clinit>(PackagePropertiesUnmarshaller.java:49)
      at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:154)
      at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
      at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
      at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)
      at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207)
      at org.apache.tika.parser.pkg.ZipContainerDetector.detectOfficeOpenXML(ZipContainerDetector.java:194)
      at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:134)
      at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:77)
      at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
      at org.modeshape.jcr.mimetype.TikaMimeTypeDetector.mimeTypeOf(TikaMimeTypeDetector.java:126)
      at org.modeshape.jcr.mimetype.MimeTypeDetectors.mimeTypeOf(MimeTypeDetectors.java:74)
      at org.modeshape.jcr.value.binary.AbstractBinaryStore.getMimeType(AbstractBinaryStore.java:161)
      at org.modeshape.jcr.value.binary.StoredBinaryValue.getMimeType(StoredBinaryValue.java:69)
      at org.modeshape.jcr.TextExtractors$Worker.run(TextExtractors.java:175)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      at java.lang.Thread.run(Thread.java:722)
      Caused by: java.lang.ClassCastException: org.dom4j.DocumentFactory cannot be cast to org.dom4j.DocumentFactory
      at org.dom4j.DocumentFactory.getInstance(DocumentFactory.java:97)
      at org.dom4j.tree.AbstractNode.<clinit>(AbstractNode.java:39)
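      The `Caused by` line reports a cast between two classes with identical names. One common explanation, offered here only as an assumption because the issue was closed as Cannot Reproduce without a recorded root cause, is that `org.dom4j.DocumentFactory` was defined by two different class loaders (for example, one copy of dom4j visible to the container and another bundled with the application). The JVM identifies a runtime class by its name together with its defining loader, so two such copies are unrelated types. The minimal sketch below reproduces only that mechanism: `Factory` and `ChildFirstLoader` are hypothetical names, and no dom4j, Tika, or ModeShape code is involved.

```java
import java.io.IOException;
import java.io.InputStream;

public class DuplicateClassDemo {

    /** Hypothetical stand-in for a library class such as org.dom4j.DocumentFactory. */
    public static class Factory {
        public Factory() {}
    }

    /**
     * Hypothetical child-first loader: it defines its own copy of one named class from the
     * same .class bytes instead of delegating to its parent, and delegates everything else.
     */
    static class ChildFirstLoader extends ClassLoader {
        private final String isolatedName;

        ChildFirstLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Read the same bytecode the parent would use, but define it in this loader.
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes(); // requires Java 9+
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Loads a second copy of Factory and returns the ClassCastException message (null if the cast succeeds). */
    static String castAcrossLoaders() {
        ClassLoader appLoader = DuplicateClassDemo.class.getClassLoader();
        ClassLoader isolated = new ChildFirstLoader(appLoader, Factory.class.getName());
        try {
            Class<?> copy = isolated.loadClass(Factory.class.getName());
            Object instance = copy.getDeclaredConstructor().newInstance();
            Factory f = (Factory) instance; // same name, different defining loader
            return null;
        } catch (ClassCastException e) {
            return e.getMessage();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(castAcrossLoaders());
    }
}
```

      On recent JDKs, running `main` should print a message along the lines of `class DuplicateClassDemo$Factory cannot be cast to class DuplicateClassDemo$Factory (...)`, naming the two loaders involved; older JDKs print the shorter `X cannot be cast to X` form seen in the trace above. The sketch assumes the compiled classes are on an ordinary classpath so that their `.class` bytes can be read back as resources.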

      Steps to reproduce
      1. Start reading/parsing an Excel spreadsheet.
      2. While the read/parse is still in progress, save another Excel spreadsheet as an attachment into the JCR repository.


          People

            Assignee: Horia Chiorean (hchiorean) (Inactive)
            Reporter: Satya M (satyakishor.m) (Inactive)
            Votes: 0
            Watchers: 3
