Teiid
  1. Teiid
  2. TEIID-2450

Like criteria with 'Z' before wildcards throw an exception

    Details

    • Type: Bug Bug
    • Status: Closed Closed (View Workflow)
    • Priority: Major Major
    • Resolution: Done
    • Affects Version/s: 8.2
    • Fix Version/s: 8.4
    • Component/s: Query Engine
    • Labels:
      None
    • Similar Issues:
      Show 10 results 

      Description

      If query contains letter 'Z' before wildcard (for example, query contains following part: MY_COLUMN LIKE 'BAZ_BAR') then exception is thrown:

      Caused by: java.lang.IllegalArgumentException: fromKey > toKey
      at java.util.TreeMap$NavigableSubMap.<init>(TreeMap.java:1240)
      ...
      at org.teiid.dqp.internal.process.RecordTable.processQuery(RecordTable.java:189)
      ...

      The reason of the problem is following.
      BaseIndexInfo.processCriteria() in order to build "less then" conditon from LIKE increments the code of last character in condition prefix:

      this.addCondition(i, new Constant(prefix.substring(0, prefix.length() -1) + (char) (prefix.charAt(prefix.length()-1)+1)), CompareCriteria.LE);

      If prefix happens to end in uppercase letter 'Z' then after increment it becomes '['.
      But TempMetadataStore uses String.CaseInsensitiveComparator:
      public TempMetadataStore()

      { this(new TreeMap<String, TempMetadataID>(String.CASE_INSENSITIVE_ORDER)); }

      And String API documentation clearly says that comparison is performed against lowercase versions. Therefore 'Z' is converted to 'z'.
      And later when trying to get subtree with lower bound = 'Z' and upper bound = '[' and exception is thrown because in fact 'z' is lesser then '['

      Other than this exception it also has another problem. If we do query MY_COLUMN LIKE 'BA@_BAR' then upper condition would be 'A' (next character after '@').
      But due to lowercase comparison it will be converted to 'a', so values like 'BAB_BAR', 'BAC_BAR', ... 'BA^_BAR' all would be returned because 'B', 'C', ..., '^' are between '@' and 'a'.

      I suggest either using case sensitive map (with prior conversion of all values to upper case) or converting values in BaseIndexInfo.processCriteria() into lower case.

      I also worry about CompareCriteria.LE in BaseIndexInfo.processCriteria(). Shouldn't it be strictly less? with less or equal when querying 'AAA_' you will get 'AAB_' values as well.

        Activity

        Hide
        Steven Hawkins
        added a comment -

        Yes there is a regression here with TEIID-2181. However only the exception with the Z case is an issue.

        > Other than this exception it also has another problem. If we do query MY_COLUMN LIKE 'BA@_BAR' then upper condition would be 'A' (next character after '@').
        But due to lowercase comparison it will be converted to 'a', so values like 'BAB_BAR', 'BAC_BAR', ... 'BA^_BAR' all would be returned because 'B', 'C', ..., '^' are between '@' and 'a'.

        That will not happen for two reasons. The first is just as you have stated above that the map comparison will be done using effectively lower case, thus upper B, C, and '^' are not between @ and A. The next is that the predicate will be reevaluated (RecordTable.isValid) to ensure only case sensitive results apply (for example if Ba@ABAR exists it will match the partial index condition, but will not be returned).

        I have already confirmed this locally, but if you are not speaking hypothetically here, can you provide a test case that shows bad behavior?

        > I also worry about CompareCriteria.LE in BaseIndexInfo.processCriteria(). Shouldn't it be strictly less? with less or equal when querying 'AAA_' you will get 'AAB_' values as well.

        The same reasoning as above applies here as well.

        Show
        Steven Hawkins
        added a comment - Yes there is a regression here with TEIID-2181 . However only the exception with the Z case is an issue. > Other than this exception it also has another problem. If we do query MY_COLUMN LIKE 'BA@_BAR' then upper condition would be 'A' (next character after '@'). But due to lowercase comparison it will be converted to 'a', so values like 'BAB_BAR', 'BAC_BAR', ... 'BA^_BAR' all would be returned because 'B', 'C', ..., '^' are between '@' and 'a'. That will not happen for two reasons. The first is just as you have stated above that the map comparison will be done using effectively lower case, thus upper B, C, and '^' are not between @ and A. The next is that the predicate will be reevaluated (RecordTable.isValid) to ensure only case sensitive results apply (for example if Ba@ABAR exists it will match the partial index condition, but will not be returned). I have already confirmed this locally, but if you are not speaking hypothetically here, can you provide a test case that shows bad behavior? > I also worry about CompareCriteria.LE in BaseIndexInfo.processCriteria(). Shouldn't it be strictly less? with less or equal when querying 'AAA_' you will get 'AAB_' values as well. The same reasoning as above applies here as well.
        Hide
        Steven Hawkins
        added a comment -

        updated the baseindexinfo to check for case insensitive ordering and generally catching illegalargumentexceptions from obtaining submaps to return empty results rather than an error.

        Show
        Steven Hawkins
        added a comment - updated the baseindexinfo to check for case insensitive ordering and generally catching illegalargumentexceptions from obtaining submaps to return empty results rather than an error.
        Hide
        Paul Lysak
        added a comment -

        In practice I've only faced the problem with 'Z'. Other issues are merely hypotetical - what I've spotted from the code.
        If there is some post-query check that eliminates items that don't satisfy LIKE condition then it should be OK.
        But there should be some reason for creating submap. I guess it's speed, am I right?
        So it would be great to have that submap work right if it isn't a complex task.
        If you look at ASCII table you'll see that characters follow in such order:
        ... @ A B ... Y Z [ \ ] ^ _ ` a b ... y z

        { | }

        ~
        I agree that B, C, ... Z will not get into submap with lowercase comparison, that was my mistake.
        But characters [ \ ] ^ _ ` still will be there.

        Thanks for your effort!

        Show
        Paul Lysak
        added a comment - In practice I've only faced the problem with 'Z'. Other issues are merely hypotetical - what I've spotted from the code. If there is some post-query check that eliminates items that don't satisfy LIKE condition then it should be OK. But there should be some reason for creating submap. I guess it's speed, am I right? So it would be great to have that submap work right if it isn't a complex task. If you look at ASCII table you'll see that characters follow in such order: ... @ A B ... Y Z [ \ ] ^ _ ` a b ... y z { | } ~ I agree that B, C, ... Z will not get into submap with lowercase comparison, that was my mistake. But characters [ \ ] ^ _ ` still will be there. Thanks for your effort!
        Hide
        Steven Hawkins
        added a comment -

        > But there should be some reason for creating submap. I guess it's speed, am I right?

        Yes the whole point is to avoid full scans when possible regardless of whether we're targeting a real table or a memory backed construct like the system tables.

        > So it would be great to have that submap work right if it isn't a complex task.

        It is working as intended.

        > If you look at ASCII table you'll see that characters follow in such order:

        Yes, I am aware of the character ordering. Here again it does not matter if if ^ _ etc. are initially included in the submap. What matters is that the partial prefix matching reduces the amount of the table that we scan.

        Show
        Steven Hawkins
        added a comment - > But there should be some reason for creating submap. I guess it's speed, am I right? Yes the whole point is to avoid full scans when possible regardless of whether we're targeting a real table or a memory backed construct like the system tables. > So it would be great to have that submap work right if it isn't a complex task. It is working as intended. > If you look at ASCII table you'll see that characters follow in such order: Yes, I am aware of the character ordering. Here again it does not matter if if ^ _ etc. are initially included in the submap. What matters is that the partial prefix matching reduces the amount of the table that we scan.

          People

          • Assignee:
            Steven Hawkins
            Reporter:
            Paul Lysak
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: