Uploaded image for project: 'Teiid'
  1. Teiid
  2. TEIID-5354

Different results for a query with GROUP BY and SUM(1) (summation of the number 1) when there is a LEFT or INNER JOIN involved

    • Hide

      1. Create address_pg in PostgreSQL DB and stateprovince in MySQL DB tables (address_pg.sql and stateprovince_mysql.sql scripts are in attachments). Actually these tables are pieces from free and very well known adventureworks database.

      2. Add postgresql (in my example I use test_db database) and mysql (in my case I use adventureworks database) database configurations in standalone-teiid.xml:

      <datasource jndi-name="java:/test_pg" pool-name="test_pg" enabled="true" use-java-context="true">
                          <connection-url>jdbc:postgresql://localhost:5432/test_db?charSet=utf8</connection-url>
                          <driver-class>org.postgresql.Driver</driver-class>
                          <driver>org.postgresql</driver>
                          <pool>
                              <min-pool-size>2</min-pool-size>
                              <max-pool-size>70</max-pool-size>
                              <prefill>false</prefill>
                              <use-strict-min>false</use-strict-min>
                              <flush-strategy>FailingConnectionOnly</flush-strategy>
                          </pool>
                          <security>
                              <user-name>postgres</user-name>
                              <password>xxxxxx</password>
                          </security>
                          <validation>
                              <check-valid-connection-sql>select 0</check-valid-connection-sql>
                          </validation>
                          <timeout>
                              <blocking-timeout-millis>120000</blocking-timeout-millis>
                              <idle-timeout-minutes>5</idle-timeout-minutes>
                          </timeout>
                      </datasource>
      
      <datasource jndi-name="java:/adventureworks" pool-name="adventureworks" enabled="true" use-java-context="true">
                          <connection-url>jdbc:mysql://localhost:3306/adventureworks?zeroDateTimeBehavior=convertToNull</connection-url>
                          <driver>mysql</driver>
                          <new-connection-sql>set SESSION sql_mode = 'ANSI'</new-connection-sql>
                          <pool>
                              <min-pool-size>2</min-pool-size>
                              <max-pool-size>70</max-pool-size>
                          </pool>
                          <security>
                              <user-name>root</user-name>
                              <password>xxxxxx</password>
                          </security>
                          <validation>
                              <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker"/>
                              <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter"/>
                          </validation>
                          <timeout>
                              <blocking-timeout-millis>120000</blocking-timeout-millis>
                              <idle-timeout-minutes>5</idle-timeout-minutes>
                          </timeout>
                      </datasource>
      

      3. Add in test-vdb.xml java:/test_pg and java:/adventureworks configured in previous step as data sources:

      <model name="test_pg">
              <property name="importer.useFullSchemaName" value="false"/>
      	<property name="importer.tableTypes" value="TABLE,VIEW"/>
      	<property name="importer.importKeys" value="false"/>
              <source name="test_pg" translator-name="myPg" connection-jndi-name="java:/test_pg"/>
          </model>
      
      <model name="adventureworks">
              <property name="importer.useFullSchemaName" value="false"/>
              <property name="importer.tableTypes" value="TABLE,VIEW"/>
              <property name="importer.importKeys" value="false"/>
              <source name="adventureworks" translator-name="mylobs" connection-jndi-name="java:/adventureworks"/>
          </model>
      
      <translator name="mylobs" type="mysql5">
              <property name="CopyLobs" value="true" />
              <property name="SupportsNativeQueries" value="true"/>
              <!--<property name="collationLocale" value="UTF-8"/>-->
          </translator>
      
          <translator name="myPg" type="postgresql">
              <property name="SupportsNativeQueries" value="true"/>
              <!--<property name="collationLocale" value="UTF-8"/>-->
          </translator>
      

      4. Comment out the line in MergeJoinStrategy.compareToPrevious method where Teiid generates the TEIID31202 exception. Otherwise Teiid will show the following error message:

      Unable to find source-code formatter for language: noformat. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml
      Error: TEIID31202 Detected that an already sorted set of values was not in the expected order (typically UTF-16 / UCS-2).  Please check the translator settings to ensure character columns used for joining are sorted as expected.
      

      The check was introduced in scope of TEIID-4129 issue but there was a case related to join of two varchar fields, in our case we have a join through two integer fields. So I tried to turn on/off the org.teiid.assumeMatchingCollation system property, also tried to set the collationLocale property in MySQL and PostgreSQL translators - nothing helped.

      5. Run the following queries and compare theSum and theCount column values:

      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      
      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	inner join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      

      but the following queries return correct results:

      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	right join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      

      the second one is also correct (LEFT JOIN) but uses window function:

      select  distinct city,
      		sum(1) OVER (PARTITION BY city) as theSum
      		,count(*) OVER (PARTITION BY city) as theCount
      from "dsp.address_pg" r
      	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      order by r.city ;;
      
      Show
      1. Create address_pg in PostgreSQL DB and stateprovince in MySQL DB tables (address_pg.sql and stateprovince_mysql.sql scripts are in attachments). Actually these tables are pieces from free and very well known adventureworks database. 2. Add postgresql (in my example I use test_db database) and mysql (in my case I use adventureworks database) database configurations in standalone-teiid.xml: <datasource jndi-name= "java:/test_pg" pool-name= "test_pg" enabled= "true" use-java-context= "true" > <connection-url> jdbc:postgresql://localhost:5432/test_db?charSet=utf8 </connection-url> <driver-class> org.postgresql.Driver </driver-class> <driver> org.postgresql </driver> <pool> <min-pool-size> 2 </min-pool-size> <max-pool-size> 70 </max-pool-size> <prefill> false </prefill> <use-strict-min> false </use-strict-min> <flush-strategy> FailingConnectionOnly </flush-strategy> </pool> <security> <user-name> postgres </user-name> <password> xxxxxx </password> </security> <validation> <check-valid-connection-sql> select 0 </check-valid-connection-sql> </validation> <timeout> <blocking-timeout-millis> 120000 </blocking-timeout-millis> <idle-timeout-minutes> 5 </idle-timeout-minutes> </timeout> </datasource> <datasource jndi-name= "java:/adventureworks" pool-name= "adventureworks" enabled= "true" use-java-context= "true" > <connection-url> jdbc:mysql://localhost:3306/adventureworks?zeroDateTimeBehavior=convertToNull </connection-url> <driver> mysql </driver> <new-connection-sql> set SESSION sql_mode = 'ANSI' </new-connection-sql> <pool> <min-pool-size> 2 </min-pool-size> <max-pool-size> 70 </max-pool-size> </pool> <security> <user-name> root </user-name> <password> xxxxxx </password> </security> <validation> <valid-connection-checker class-name= "org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker" /> <exception-sorter class-name= "org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter" /> </validation> <timeout> <blocking-timeout-millis> 120000 </blocking-timeout-millis> <idle-timeout-minutes> 5 </idle-timeout-minutes> </timeout> </datasource> 3. Add in test-vdb.xml java:/test_pg and java:/adventureworks configured in previous step as data sources: <model name= "test_pg" > <property name= "importer.useFullSchemaName" value= "false" /> <property name= "importer.tableTypes" value= "TABLE,VIEW" /> <property name= "importer.importKeys" value= "false" /> <source name= "test_pg" translator-name= "myPg" connection-jndi-name= "java:/test_pg" /> </model> <model name= "adventureworks" > <property name= "importer.useFullSchemaName" value= "false" /> <property name= "importer.tableTypes" value= "TABLE,VIEW" /> <property name= "importer.importKeys" value= "false" /> <source name= "adventureworks" translator-name= "mylobs" connection-jndi-name= "java:/adventureworks" /> </model> <translator name= "mylobs" type= "mysql5" > <property name= "CopyLobs" value= "true" /> <property name= "SupportsNativeQueries" value= "true" /> <!--<property name= "collationLocale" value= "UTF-8" /> --> </translator> <translator name= "myPg" type= "postgresql" > <property name= "SupportsNativeQueries" value= "true" /> <!--<property name= "collationLocale" value= "UTF-8" /> --> </translator> 4. Comment out the line in MergeJoinStrategy.compareToPrevious method where Teiid generates the TEIID31202 exception. Otherwise Teiid will show the following error message: Unable to find source-code formatter for language: noformat. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml Error: TEIID31202 Detected that an already sorted set of values was not in the expected order (typically UTF-16 / UCS-2). Please check the translator settings to ensure character columns used for joining are sorted as expected. The check was introduced in scope of TEIID-4129 issue but there was a case related to join of two varchar fields, in our case we have a join through two integer fields. So I tried to turn on/off the org.teiid.assumeMatchingCollation system property, also tried to set the collationLocale property in MySQL and PostgreSQL translators - nothing helped. 5. Run the following queries and compare theSum and theCount column values: select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r left join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r inner join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; but the following queries return correct results: select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r right join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; the second one is also correct (LEFT JOIN) but uses window function: select distinct city, sum (1) OVER ( PARTITION BY city) as theSum , count (*) OVER ( PARTITION BY city) as theCount from "dsp.address_pg" r left join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" order by r.city ;;

      Teiid returns different results for a query with GROUP BY and SUM(1) (summation of the number 1) when there is a LEFT or INNER JOIN involved.
      There are two problems: a) incorrect check in MergeJoinStrategy.compareToPrevious method which generates TEIID31202 exception and b) a bug in algorithm of join itself which leads to incorrect results.
      To reproduce the bug, please, run the following queries and compare theSum and theCount column values:

      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      
      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	inner join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      

      but the following queries return correct results:

      select  r.city,
      		sum(1) as theSum
      		,count(*) as theCount
      from "dsp.address_pg" r
      	right join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      group by r.city
      order by r.city ;;
      

      the second one is also correct (LEFT JOIN) but uses window function:

      select  distinct city,
      		sum(1) OVER (PARTITION BY city) as theSum
      		,count(*) OVER (PARTITION BY city) as theCount
      from "dsp.address_pg" r
      	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
      order by r.city ;;
      

            [TEIID-5354] Different results for a query with GROUP BY and SUM(1) (summation of the number 1) when there is a LEFT or INNER JOIN involved

            Jan Stastny added a comment -

            rhn-engineering-shawkins Ok, I am able to reproduce the issue. I am getting rows containing number 1 instead of the same value as the count(*) aggregate.

            Jan Stastny added a comment - rhn-engineering-shawkins Ok, I am able to reproduce the issue. I am getting rows containing number 1 instead of the same value as the count(*) aggregate.

            Jan Stastny added a comment -

            Hello rhn-engineering-shawkins,
            > b) a bug in algorithm of join itself which leads to incorrect results.
            what was the issue about? with the example query:

            select  r.city,
            		sum(1) as theSum
            		,count(*) as theCount
            from "dsp.address_pg" r
            	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
            group by r.city
            order by r.city ;;
            

            were there different values for sum(1) and count(*) showing?

            Jan Stastny added a comment - Hello rhn-engineering-shawkins , > b) a bug in algorithm of join itself which leads to incorrect results. what was the issue about? with the example query: select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r left join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; were there different values for sum(1) and count(*) showing?

            rhn-engineering-shawkins thx a lot for the quick fix, I've just checked it and these changes helped me.

            Dmitrii Pogorelov added a comment - rhn-engineering-shawkins thx a lot for the quick fix, I've just checked it and these changes helped me.

            Generalized the logic dealing with aggregates over literals with similar handling to the count star case.

            Steven Hawkins added a comment - Generalized the logic dealing with aggregates over literals with similar handling to the count star case.

            Yes the partial aggregation logic is not considering the break down of an aggregate on a literal when there are other other decompositions. I'll have this addressed today.

            Steven Hawkins added a comment - Yes the partial aggregation logic is not considering the break down of an aggregate on a literal when there are other other decompositions. I'll have this addressed today.

            > 4. Comment out the line in MergeJoinStrategy.compareToPrevious method

            This was corrected by TEIID-5339

            I should be able to access the rest shortly.

            Steven Hawkins added a comment - > 4. Comment out the line in MergeJoinStrategy.compareToPrevious method This was corrected by TEIID-5339 I should be able to access the rest shortly.

              rhn-engineering-shawkins Steven Hawkins
              dalex005 Dmitrii Pogorelov
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated:
                Resolved: