[TEIID-5354] Different results for a query with GROUP BY and SUM(1) (summation of the number 1) when there is a LEFT or INNER JOIN involved

Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: 11.0, 10.3.2, 10.2.3, 8.12.14.6_4
Affects Version/s: 10.2
Component/s: Query Engine
Labels:
None

Steps to Reproduce:
Hide

1. Create address_pg in PostgreSQL DB and stateprovince in MySQL DB tables (address_pg.sql and stateprovince_mysql.sql scripts are in attachments). Actually these tables are pieces from free and very well known adventureworks database.

2. Add postgresql (in my example I use test_db database) and mysql (in my case I use adventureworks database) database configurations in standalone-teiid.xml:

<datasource jndi-name="java:/test_pg" pool-name="test_pg" enabled="true" use-java-context="true"> <connection-url>jdbc:postgresql://localhost:5432/test_db?charSet=utf8</connection-url> <driver-class>org.postgresql.Driver</driver-class> <driver>org.postgresql</driver> <pool> <min-pool-size>2</min-pool-size> <max-pool-size>70</max-pool-size> <prefill>false</prefill> <use-strict-min>false</use-strict-min> <flush-strategy>FailingConnectionOnly</flush-strategy> </pool> <security> <user-name>postgres</user-name> <password>xxxxxx</password> </security> <validation> <check-valid-connection-sql>select 0</check-valid-connection-sql> </validation> <timeout> <blocking-timeout-millis>120000</blocking-timeout-millis> <idle-timeout-minutes>5</idle-timeout-minutes> </timeout> </datasource> <datasource jndi-name="java:/adventureworks" pool-name="adventureworks" enabled="true" use-java-context="true"> <connection-url>jdbc:mysql://localhost:3306/adventureworks?zeroDateTimeBehavior=convertToNull</connection-url> <driver>mysql</driver> <new-connection-sql>set SESSION sql_mode = 'ANSI'</new-connection-sql> <pool> <min-pool-size>2</min-pool-size> <max-pool-size>70</max-pool-size> </pool> <security> <user-name>root</user-name> <password>xxxxxx</password> </security> <validation> <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker"/> <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter"/> </validation> <timeout> <blocking-timeout-millis>120000</blocking-timeout-millis> <idle-timeout-minutes>5</idle-timeout-minutes> </timeout> </datasource>

3. Add in test-vdb.xml java:/test_pg and java:/adventureworks configured in previous step as data sources:

<model name="test_pg"> <property name="importer.useFullSchemaName" value="false"/> <property name="importer.tableTypes" value="TABLE,VIEW"/> <property name="importer.importKeys" value="false"/> <source name="test_pg" translator-name="myPg" connection-jndi-name="java:/test_pg"/> </model> <model name="adventureworks"> <property name="importer.useFullSchemaName" value="false"/> <property name="importer.tableTypes" value="TABLE,VIEW"/> <property name="importer.importKeys" value="false"/> <source name="adventureworks" translator-name="mylobs" connection-jndi-name="java:/adventureworks"/> </model> <translator name="mylobs" type="mysql5"> <property name="CopyLobs" value="true" /> <property name="SupportsNativeQueries" value="true"/>  </translator> <translator name="myPg" type="postgresql"> <property name="SupportsNativeQueries" value="true"/>  </translator>

4. Comment out the line in MergeJoinStrategy.compareToPrevious method where Teiid generates the TEIID31202 exception. Otherwise Teiid will show the following error message:

Unable to find source-code formatter for language: noformat. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml
Error: TEIID31202 Detected that an already sorted set of values was not in the expected order (typically UTF-16 / UCS-2). Please check the translator settings to ensure character columns used for joining are sorted as expected.

The check was introduced in scope of ~~TEIID-4129~~ issue but there was a case related to join of two varchar fields, in our case we have a join through two integer fields. So I tried to turn on/off the org.teiid.assumeMatchingCollation system property, also tried to set the collationLocale property in MySQL and PostgreSQL translators - nothing helped.

5. Run the following queries and compare theSum and theCount column values:

select r.city, sum(1) as theSum ,count(*) as theCount from "dsp.address_pg" r left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" group by r.city order by r.city ;;

select r.city, sum(1) as theSum ,count(*) as theCount from "dsp.address_pg" r inner join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" group by r.city order by r.city ;;

but the following queries return correct results:

select r.city, sum(1) as theSum ,count(*) as theCount from "dsp.address_pg" r right join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" group by r.city order by r.city ;;

the second one is also correct (LEFT JOIN) but uses window function:

select distinct city, sum(1) OVER (PARTITION BY city) as theSum ,count(*) OVER (PARTITION BY city) as theCount from "dsp.address_pg" r left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" order by r.city ;;
Show
1. Create address_pg in PostgreSQL DB and stateprovince in MySQL DB tables (address_pg.sql and stateprovince_mysql.sql scripts are in attachments). Actually these tables are pieces from free and very well known adventureworks database. 2. Add postgresql (in my example I use test_db database) and mysql (in my case I use adventureworks database) database configurations in standalone-teiid.xml: <datasource jndi-name= "java:/test_pg" pool-name= "test_pg" enabled= "true" use-java-context= "true" > <connection-url> jdbc:postgresql://localhost:5432/test_db?charSet=utf8 </connection-url> <driver-class> org.postgresql.Driver </driver-class> <driver> org.postgresql </driver> <pool> <min-pool-size> 2 </min-pool-size> <max-pool-size> 70 </max-pool-size> <prefill> false </prefill> <use-strict-min> false </use-strict-min> <flush-strategy> FailingConnectionOnly </flush-strategy> </pool> <security> <user-name> postgres </user-name> <password> xxxxxx </password> </security> <validation> <check-valid-connection-sql> select 0 </check-valid-connection-sql> </validation> <timeout> <blocking-timeout-millis> 120000 </blocking-timeout-millis> <idle-timeout-minutes> 5 </idle-timeout-minutes> </timeout> </datasource> <datasource jndi-name= "java:/adventureworks" pool-name= "adventureworks" enabled= "true" use-java-context= "true" > <connection-url> jdbc:mysql://localhost:3306/adventureworks?zeroDateTimeBehavior=convertToNull </connection-url> <driver> mysql </driver> <new-connection-sql> set SESSION sql_mode = 'ANSI' </new-connection-sql> <pool> <min-pool-size> 2 </min-pool-size> <max-pool-size> 70 </max-pool-size> </pool> <security> <user-name> root </user-name> <password> xxxxxx </password> </security> <validation> <valid-connection-checker class-name= "org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker" /> <exception-sorter class-name= "org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter" /> </validation> <timeout> <blocking-timeout-millis> 120000 </blocking-timeout-millis> <idle-timeout-minutes> 5 </idle-timeout-minutes> </timeout> </datasource> 3. Add in test-vdb.xml java:/test_pg and java:/adventureworks configured in previous step as data sources: <model name= "test_pg" > <property name= "importer.useFullSchemaName" value= "false" /> <property name= "importer.tableTypes" value= "TABLE,VIEW" /> <property name= "importer.importKeys" value= "false" /> <source name= "test_pg" translator-name= "myPg" connection-jndi-name= "java:/test_pg" /> </model> <model name= "adventureworks" > <property name= "importer.useFullSchemaName" value= "false" /> <property name= "importer.tableTypes" value= "TABLE,VIEW" /> <property name= "importer.importKeys" value= "false" /> <source name= "adventureworks" translator-name= "mylobs" connection-jndi-name= "java:/adventureworks" /> </model> <translator name= "mylobs" type= "mysql5" > <property name= "CopyLobs" value= "true" /> <property name= "SupportsNativeQueries" value= "true" />  </translator> <translator name= "myPg" type= "postgresql" > <property name= "SupportsNativeQueries" value= "true" />  </translator> 4. Comment out the line in MergeJoinStrategy.compareToPrevious method where Teiid generates the TEIID31202 exception. Otherwise Teiid will show the following error message: Unable to find source-code formatter for language: noformat. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml Error: TEIID31202 Detected that an already sorted set of values was not in the expected order (typically UTF-16 / UCS-2). Please check the translator settings to ensure character columns used for joining are sorted as expected. The check was introduced in scope of TEIID-4129 issue but there was a case related to join of two varchar fields, in our case we have a join through two integer fields. So I tried to turn on/off the org.teiid.assumeMatchingCollation system property, also tried to set the collationLocale property in MySQL and PostgreSQL translators - nothing helped. 5. Run the following queries and compare theSum and theCount column values: select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r left join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r inner join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; but the following queries return correct results: select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r right join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; the second one is also correct (LEFT JOIN) but uses window function: select distinct city, sum (1) OVER ( PARTITION BY city) as theSum , count (*) OVER ( PARTITION BY city) as theCount from "dsp.address_pg" r left join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" order by r.city ;;
Bugzilla References:
https://bugzilla.redhat.com/show_bug.cgi?id=1584391

Teiid returns different results for a query with GROUP BY and SUM(1) (summation of the number 1) when there is a LEFT or INNER JOIN involved.
There are two problems: a) incorrect check in MergeJoinStrategy.compareToPrevious method which generates TEIID31202 exception and b) a bug in algorithm of join itself which leads to incorrect results.
To reproduce the bug, please, run the following queries and compare theSum and theCount column values:

select  r.city,
		sum(1) as theSum
		,count(*) as theCount
from "dsp.address_pg" r
	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
group by r.city
order by r.city ;;

select  r.city,
		sum(1) as theSum
		,count(*) as theCount
from "dsp.address_pg" r
	inner join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
group by r.city
order by r.city ;;

but the following queries return correct results:

select  r.city,
		sum(1) as theSum
		,count(*) as theCount
from "dsp.address_pg" r
	right join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
group by r.city
order by r.city ;;

the second one is also correct (LEFT JOIN) but uses window function:

select  distinct city,
		sum(1) OVER (PARTITION BY city) as theSum
		,count(*) OVER (PARTITION BY city) as theCount
from "dsp.address_pg" r
	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
order by r.city ;;

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

address_pg.sql
5.73 MB
2018/05/29 7:00 AM
stateprovince_mysql.sql
19 kB
2018/05/29 7:00 AM

Jan Stastny added a comment - 2018/06/12 6:07 AM

rhn-engineering-shawkins Ok, I am able to reproduce the issue. I am getting rows containing number 1 instead of the same value as the count(*) aggregate.

Jan Stastny added a comment - 2018/06/12 6:07 AM rhn-engineering-shawkins Ok, I am able to reproduce the issue. I am getting rows containing number 1 instead of the same value as the count(*) aggregate.

Jan Stastny added a comment - 2018/06/12 3:51 AM

Hello rhn-engineering-shawkins,
> b) a bug in algorithm of join itself which leads to incorrect results.
what was the issue about? with the example query:

select  r.city,
		sum(1) as theSum
		,count(*) as theCount
from "dsp.address_pg" r
	left join "adventureworks.stateprovince" c on "r.stateprovinceid" = "c.stateprovinceid" 
group by r.city
order by r.city ;;

were there different values for sum(1) and count(*) showing?

Jan Stastny added a comment - 2018/06/12 3:51 AM Hello rhn-engineering-shawkins , > b) a bug in algorithm of join itself which leads to incorrect results. what was the issue about? with the example query: select r.city, sum (1) as theSum , count (*) as theCount from "dsp.address_pg" r left join "adventureworks.stateprovince" c on "r.stateprovinceid" = " c .stateprovinceid" group by r.city order by r.city ;; were there different values for sum(1) and count(*) showing?

Dmitrii Pogorelov added a comment - 2018/05/30 1:25 PM

rhn-engineering-shawkins thx a lot for the quick fix, I've just checked it and these changes helped me.

Dmitrii Pogorelov added a comment - 2018/05/30 1:25 PM rhn-engineering-shawkins thx a lot for the quick fix, I've just checked it and these changes helped me.

Steven Hawkins added a comment - 2018/05/30 10:53 AM

Generalized the logic dealing with aggregates over literals with similar handling to the count star case.

Steven Hawkins added a comment - 2018/05/30 10:53 AM Generalized the logic dealing with aggregates over literals with similar handling to the count star case.

Steven Hawkins added a comment - 2018/05/29 10:45 AM

Yes the partial aggregation logic is not considering the break down of an aggregate on a literal when there are other other decompositions. I'll have this addressed today.

Steven Hawkins added a comment - 2018/05/29 10:45 AM Yes the partial aggregation logic is not considering the break down of an aggregate on a literal when there are other other decompositions. I'll have this addressed today.

Steven Hawkins added a comment - 2018/05/29 9:46 AM

> 4. Comment out the line in MergeJoinStrategy.compareToPrevious method

This was corrected by ~~TEIID-5339~~

I should be able to access the rest shortly.

Steven Hawkins added a comment - 2018/05/29 9:46 AM > 4. Comment out the line in MergeJoinStrategy.compareToPrevious method This was corrected by TEIID-5339 I should be able to access the rest shortly.

Assignee:: Steven Hawkins

Reporter:: Dmitrii Pogorelov

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Created:: 2018/05/29 7:00 AM

Updated:: 2021/10/24 5:59 AM

Resolved:: 2018/05/30 10:53 AM

Details

Description

Attachments

Attachments

Easy Agile Planning Poker

Activity

Collapse comment: Jan Stastny added a comment - 2018/06/12 6:07 AM

Expand comment: Jan Stastny added a comment - 2018/06/12 6:07 AM

Collapse comment: Jan Stastny added a comment - 2018/06/12 3:51 AM

Expand comment: Jan Stastny added a comment - 2018/06/12 3:51 AM

Collapse comment: Dmitrii Pogorelov added a comment - 2018/05/30 1:25 PM

Expand comment: Dmitrii Pogorelov added a comment - 2018/05/30 1:25 PM

Collapse comment: Steven Hawkins added a comment - 2018/05/30 10:53 AM

Expand comment: Steven Hawkins added a comment - 2018/05/30 10:53 AM

Collapse comment: Steven Hawkins added a comment - 2018/05/29 10:45 AM

Expand comment: Steven Hawkins added a comment - 2018/05/29 10:45 AM

Collapse comment: Steven Hawkins added a comment - 2018/05/29 9:46 AM

Expand comment: Steven Hawkins added a comment - 2018/05/29 9:46 AM

People

Dates