FUSE Mediation Router / MR-920

Camel hdfs2 component reads a normal file from HDFS twice when acting as a consumer


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Affects Version/s: 2.14.0.redhat-62-xx
    • Fix Version/s: 2.14.0.redhat-62-xx

      Steps to reproduce:

      1. Install Hadoop and configure an HDFS filesystem – a single local node is fine for this test, we don't need a cluster
      2. Download the test case camel_hadoop.zip and expand it to reveal a Maven project
      3. Build the Maven project and run it with mvn exec:java
      4. Use Hadoop utilities to copy a large (more than chunk size) file into the HDFS filesystem at root level, with name test.txt
      5. Look at the output file temp/test.txt – it will be twice the size of the original file

      Note that it will be necessary to delete the file temp/test.txt between tests, else it will get appended to, which will confuse the results. The hdfs2 component renames the file on HDFS so that it ends in .read. To repeat the test it will be necessary to delete this .read file from HDFS, otherwise any later one will not be consumed.
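The cleanup between runs and the size check in step 5 can be sketched as follows. The HDFS commands in the comments require a running HDFS and assume the renamed source file is /test.txt.read; the size check below uses stand-in files so the logic is self-contained, where a real run would compare the original against temp/test.txt.

```shell
# Between test runs (real environment; requires a running HDFS):
#   hdfs dfs -rm /test.txt.read   # remove the renamed source file on HDFS
#   rm temp/test.txt              # remove the appended local output file
#
# Step 5's size check, demonstrated with stand-in files:
printf '0123456789' > original.txt           # stand-in for the source file
cat original.txt original.txt > copied.txt   # stand-in for the doubled output
orig=$(wc -c < original.txt)
copy=$(wc -c < copied.txt)
if [ "$copy" -eq $((2 * orig)) ]; then
  echo "output is exactly twice the original ($copy vs $orig bytes)"
fi
```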


      The Camel hdfs2: component reads a normal file twice when used as a consumer endpoint in a Camel route. For example, consider this route:

      from("hdfs2://127.0.0.1:9000/test.txt")
      .to("log:XXX?showAll=true&multiline=true")
      .to("file://temp?fileName=test.txt&fileExist=Append");

      This should copy the file test.txt from HDFS. However, the copied file ends up exactly twice the expected length, with the content duplicated. The log output shows that twice as many messages are read as would be expected from the file size and the chunk size (the hdfs2: component delivers one message per chunk).
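The "twice as many messages" symptom can be checked against the log with simple chunk arithmetic. This is a plain-Java sketch; the 4096-byte chunk size is an assumed default, and the endpoint's actual chunkSize option governs the real count.

```java
// Sanity check for the log output: expected Camel messages for a file
// consumed in fixed-size chunks (one message per chunk).
public class ChunkCount {
    static long expectedMessages(long fileSizeBytes, long chunkSizeBytes) {
        // Ceiling division: a partial final chunk still produces one message.
        return (fileSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    public static void main(String[] args) {
        long fileSize = 10_000_000L;  // hypothetical size for test.txt
        long chunkSize = 4096L;       // assumed default chunk size
        long expected = expectedMessages(fileSize, chunkSize);
        System.out.println("expected messages: " + expected);
        // With this bug, the log shows roughly twice that many messages,
        // and the output file is twice fileSize bytes.
        System.out.println("observed with bug: ~" + (2 * expected));
    }
}
```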

            Assignee: Paolo Antinori (pantinor@redhat.com)
            Reporter: Kevin Boone (rhn-support-kboone)
            Votes: 0
            Watchers: 2
