Uploaded image for project: 'Teiid'
  1. Teiid
  2. TEIID-5354

Different results for a query with GROUP BY and SUM(1) (summation of the number 1) when there is a LEFT or INNER JOIN involved

    Details

    • Type: Bug
    • Status: Resolved (View Workflow)
    • Priority: Blocker
    • Resolution: Done
    • Affects Version/s: 10.2
    • Fix Version/s: 11.0, 10.3.2, 10.2.3, 8.12.14.6_4
    • Component/s: Query Engine
    • Labels:
      None
    • Environment:

      teiid-10.2.0 on WildFly Full 11.0.0.Final (WildFly Core 3.0.8.Final)

    • Steps to Reproduce:
      Hide

      1. Create address_pg in PostgreSQL DB and stateprovince in MySQL DB tables (address_pg.sql and stateprovince_mysql.sql scripts are in attachments). Actually these tables are pieces from free and very well known adventureworks database.

      2. Add postgresql (in my example I use test_db database) and mysql (in my case I use adventureworks database) database configurations in standalone-teiid.xml:

      <datasource jndi-name="java:/test_pg" pool-name="test_pg" enabled="true" use-java-context="true">
                          <connection-url>jdbc:postgresql://localhost:5432/test_db?charSet=utf8</connection-url>
                          <driver-class>org.postgresql.Driver</driver-class>
                          <driver>org.postgresql</driver>
                          <pool>
                              <min-pool-size>2</min-pool-size>
                              <max-pool-size>70</max-pool-size>
                              <prefill>false</prefill>
                              <use-strict-min>false</use-strict-min>
                              <flush-strategy>FailingConnectionOnly</flush-strategy>
                          </pool>
                          <security>
                              <user-name>postgres</user-name>
                              <password>xxxxxx</password>
                          </security>
                          <validation>
                              <check-valid-connection-sql>select 0</check-valid-connection-sql>
                          </validation>
                          <timeout>
                              <blocking-timeout-millis>120000</blocking-timeout-millis>
                              <idle-timeout-minutes>5</idle-timeout-minutes>
                          </timeout>
                      </datasource>
      
      <datasource jndi-name="java:/adventureworks" pool-name="adventureworks" enabled="true" use-java-context="true">
                          <connection-url>jdbc:mysql://localhost:3306/adventureworks?zeroDateTimeBehavior=convertToNull</connection-url>
                          <driver>mysql</driver>
                          <new-connection-sql>set SESSION sql_mode = 'ANSI'</new-connection-sql>
                          <pool>
                              <min-pool-size>2</min-pool-size>
                              <max-pool-size>70</max-pool-size>
                          </pool>
                          <security>
                              <user-name>root</user-name>
                              <password>xxxxxx</password>
                          </security>
                          <validation>
                              <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker"/>
                              <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter"/>
                          </validation>
                          <timeout>
                              <blocking-timeout-millis>120000</blocking-timeout-millis>
                              <idle-timeout-minutes>5</idle-timeout-minutes>
                          </timeout>
                      </datasource>
      

      3. Add in test-vdb.xml java:/test_pg and java:/adventureworks configured in previous step as data sources:

      <model name="test_pg">
              <property name="importer.useFullSchemaName" value="false"/>
      	<property name="importer.tableTypes" value="TABLE,VIEW"/>
      	<property name="importer.importKeys" value="false"/>
              <source name="test_pg" translator-name="myPg" connection-jndi-name="java:/test_pg"/>
          </model>
      
      <model name="adventureworks">
              <property name="importer.useFullSchemaName" value="false"/>
              <property name="importer.tableTypes" value="TABLE,VIEW"/>
              <property name="importer.importKeys" value="false"/>
              <source name="adventureworks" translator-name="mylobs" connection-jndi-name="java:/adventureworks"/>
          </model>
      
      <translator name="mylobs" type="mysql5">
              <property name="CopyLobs" value="true" />
              <property name="SupportsNativeQueries" value="true"/>
              <!--<property name="collationLocale" value="UTF-8"/>-->
          </translator>
      
          <translator name="myPg" type="postgresql">
              <property name="SupportsNativeQueries" value="true"/>
              <!--<property name="collationLocale" value="UTF-8"/>-->
          </translator>
      

      4. Comment out the line in MergeJoinStrategy.compareToPrevious method where Teiid generates the TEIID31202 exception. Otherwise Teiid will show the following error message:

      Unable to find source-code formatter for language: noformat. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml
      Error: TEIID31202 Detected that an already sorted set of values was not in the expected order (typically UTF-16 / UCS-2).  Please check the translator settings to ensure character columns used for joining are sorted as expected.
      

      The check was introduced in scope of TEIID-4129 issue but there was a case related to join of two varchar fields, in our case we have a join through two integer fields. So I tried to turn on/off the org.teiid.assumeMatchingCollation system property, also tried to set the collationLocale property in MySQL and PostgreSQL translators - nothing helped.

      5. Run the following queries and compare theSum and theCount column values:

      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      
      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	inner join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      

      but the following queries return correct results:

      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	right join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      

      the second one is also correct (LEFT JOIN) but uses window function:

      select  distinct city,
      		sum(1) OVER (PARTITION BY city) as theSum
      		,count(*) OVER (PARTITION BY city) as theCount
      from "dsp.address_pg" r
      	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      order by r.city ;;
      
      Show
      1. Create address_pg in PostgreSQL DB and stateprovince in MySQL DB tables (address_pg.sql and stateprovince_mysql.sql scripts are in attachments). Actually these tables are pieces from free and very well known adventureworks database. 2. Add postgresql (in my example I use test_db database) and mysql (in my case I use adventureworks database) database configurations in standalone-teiid.xml: <datasource jndi-name= "java:/test_pg" pool-name= "test_pg" enabled= "true" use-java-context= "true" > <connection-url> jdbc:postgresql://localhost:5432/test_db?charSet=utf8 </connection-url> <driver-class> org.postgresql.Driver </driver-class> <driver> org.postgresql </driver> <pool> <min-pool-size> 2 </min-pool-size> <max-pool-size> 70 </max-pool-size> <prefill> false </prefill> <use-strict-min> false </use-strict-min> <flush-strategy> FailingConnectionOnly </flush-strategy> </pool> <security> <user-name> postgres </user-name> <password> xxxxxx </password> </security> <validation> <check-valid-connection-sql> select 0 </check-valid-connection-sql> </validation> <timeout> <blocking-timeout-millis> 120000 </blocking-timeout-millis> <idle-timeout-minutes> 5 </idle-timeout-minutes> </timeout> </datasource> <datasource jndi-name= "java:/adventureworks" pool-name= "adventureworks" enabled= "true" use-java-context= "true" > <connection-url> jdbc:mysql://localhost:3306/adventureworks?zeroDateTimeBehavior=convertToNull </connection-url> <driver> mysql </driver> <new-connection-sql> set SESSION sql_mode = 'ANSI' </new-connection-sql> <pool> <min-pool-size> 2 </min-pool-size> <max-pool-size> 70 </max-pool-size> </pool> <security> <user-name> root </user-name> <password> xxxxxx </password> </security> <validation> <valid-connection-checker class-name= "org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker" /> <exception-sorter class-name= "org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter" /> </validation> <timeout> <blocking-timeout-millis> 120000 </blocking-timeout-millis> <idle-timeout-minutes> 5 </idle-timeout-minutes> </timeout> </datasource> 3. Add in test-vdb.xml java:/test_pg and java:/adventureworks configured in previous step as data sources: <model name= "test_pg" > <property name= "importer.useFullSchemaName" value= "false" /> <property name= "importer.tableTypes" value= "TABLE,VIEW" /> <property name= "importer.importKeys" value= "false" /> <source name= "test_pg" translator-name= "myPg" connection-jndi-name= "java:/test_pg" /> </model> <model name= "adventureworks" > <property name= "importer.useFullSchemaName" value= "false" /> <property name= "importer.tableTypes" value= "TABLE,VIEW" /> <property name= "importer.importKeys" value= "false" /> <source name= "adventureworks" translator-name= "mylobs" connection-jndi-name= "java:/adventureworks" /> </model> <translator name= "mylobs" type= "mysql5" > <property name= "CopyLobs" value= "true" /> <property name= "SupportsNativeQueries" value= "true" /> <!--<property name= "collationLocale" value= "UTF-8" /> --> </translator> <translator name= "myPg" type= "postgresql" > <property name= "SupportsNativeQueries" value= "true" /> <!--<property name= "collationLocale" value= "UTF-8" /> --> </translator> 4. Comment out the line in MergeJoinStrategy.compareToPrevious method where Teiid generates the TEIID31202 exception. Otherwise Teiid will show the following error message: Unable to find source-code formatter for language: noformat. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml Error: TEIID31202 Detected that an already sorted set of values was not in the expected order (typically UTF-16 / UCS-2). Please check the translator settings to ensure character columns used for joining are sorted as expected. The check was introduced in scope of TEIID-4129 issue but there was a case related to join of two varchar fields, in our case we have a join through two integer fields. So I tried to turn on/off the org.teiid.assumeMatchingCollation system property, also tried to set the collationLocale property in MySQL and PostgreSQL translators - nothing helped. 5. Run the following queries and compare theSum and theCount column values: select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r left join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r inner join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; but the following queries return correct results: select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r right join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; the second one is also correct (LEFT JOIN) but uses window function: select distinct city, sum (1) OVER ( PARTITION BY city) as theSum , count (*) OVER ( PARTITION BY city) as theCount from "dsp.address_pg" r left join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" order by r.city ;;
    • Bugzilla Update:
      Perform

      Description

      Teiid returns different results for a query with GROUP BY and SUM(1) (summation of the number 1) when there is a LEFT or INNER JOIN involved.
      There are two problems: a) incorrect check in MergeJoinStrategy.compareToPrevious method which generates TEIID31202 exception and b) a bug in algorithm of join itself which leads to incorrect results.
      To reproduce the bug, please, run the following queries and compare theSum and theCount column values:

      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      
      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	inner join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      

      but the following queries return correct results:

      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	right join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      

      the second one is also correct (LEFT JOIN) but uses window function:

      select  distinct city,
      		sum(1) OVER (PARTITION BY city) as theSum
      		,count(*) OVER (PARTITION BY city) as theCount
      from "dsp.address_pg" r
      	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      order by r.city ;;
      

        Gliffy Diagrams

          Attachments

            Activity

              People

              • Assignee:
                shawkins Steven Hawkins
                Reporter:
                dalex005 Dmitrii Pogorelov
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: