Uploaded image for project: 'Teiid'
  1. Teiid
  2. TEIID-5783

OUTER JOIN yields wrong results when used with GROUP BY, CASE and COUNT DISTINCT

    Details

    • Type: Bug
    • Status: Resolved (View Workflow)
    • Priority: Blocker
    • Resolution: Done
    • Affects Version/s: 12.0
    • Fix Version/s: 12.3, 12.1.2, 12.2.2
    • Component/s: Query Engine
    • Labels:
      None
    • Environment:

      teiid-12.0.0 on WildFly Full 14.0.1.Final (WildFly Core 6.0.2.Final)

    • Steps to Reproduce:
      Hide

      1. Add file resource adapter in standalone-teiid.xml:

                      <resource-adapter id="file">
                          <module slot="main" id="org.jboss.teiid.resource-adapter.file"/>
                          <connection-definitions>
                              <connection-definition class-name="org.teiid.resource.adapter.file.FileManagedConnectionFactory" jndi-name="java:/files" enabled="true" pool-name="files">
                                  <config-property name="AllowParentPaths">
                                      true
                                  </config-property>
                                  <config-property name="ParentDirectory">
                                      C:/testdata
                                  </config-property>
                              </connection-definition>
                          </connection-definitions>
                      </resource-adapter>
      

      2. Add MySQL database configuration in standalone-teiid.xml:

                      <datasource jndi-name="java:/test_dwh_my" pool-name="test_dwh_my" enabled="true" use-java-context="true">
                          <connection-url>jdbc:mysql://localhost:3306/dwh?zeroDateTimeBehavior=convertToNull</connection-url>
                          <driver>mysql</driver>
                          <new-connection-sql>set SESSION sql_mode = 'ANSI'</new-connection-sql>
                          <pool>
                              <min-pool-size>2</min-pool-size>
                              <max-pool-size>70</max-pool-size>
                          </pool>
                          <security>
                              <user-name>root</user-name>
                              <password>XXXXXX</password>
                          </security>
                          <validation>
                              <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker"/>
                              <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter"/>
                          </validation>
                          <timeout>
                              <blocking-timeout-millis>120000</blocking-timeout-millis>
                              <idle-timeout-minutes>5</idle-timeout-minutes>
                          </timeout>
                      </datasource>
      

      3. Add in test-vdb.xml java:/test_dwh_my configured in previous step as datasource:

          <model name="dwh_my">
              <property name="importer.useFullSchemaName" value="false"/>
              <property name="importer.tableTypes" value="TABLE,VIEW"/>
              <property name="importer.importKeys" value="false"/>
              <source name="test_dwh_my" translator-name="mylobs" connection-jndi-name="java:/test_dwh_my"/>
          </model>
      
          <translator name="mylobs" type="mysql5">
              <property name="CopyLobs" value="true" />
              <property name="SupportsNativeQueries" value="true"/>
          </translator>
      

      4. Configure in the test-vdb.xml the following virtual view:

          <model visible = "true" type = "VIRTUAL" name = "views">
              <metadata type = "DDL"><![CDATA[
                create view vLeft as
      SELECT
      "csv_table"."created_at",
      "csv_table"."order_id",
      "csv_table"."store_id" 
      FROM
      (call files.getFiles('test1.csv')) f,
      	TEXTTABLE(to_chars(f.file,'UTF-8') 
      		COLUMNS 
      		"created_at" STRING ,
      		"order_id" STRING ,
      		"store_id" STRING 
      		DELIMITER ';' 
      		QUOTE '"' 
      		HEADER 1 
      	)
      "csv_table"
              ]]>
              </metadata>
          </model>
      

      5. Load the data from the other file (test2.csv - attached to this issue) into a MySQL table or a MS SQL table (vRight table name in my case, all fields have varchar types).

      6. Run the statement below (rebuy_check column contains no NULLs, that's correct):

      select 
      count( a."order_id") anzahl_orders, 
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
      FROM views.vLeft a
      left JOIN dwh_my.vRight b on b.order_id=a.order_id
      group by
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
      

      7. Add a DISTINCT to the COUNT (rebuy_check column now contains NULLs, that's incorrect):

      select 
      count(distinct a."order_id") anzahl_orders, 
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
      FROM views.vLeft a
      left JOIN dwh_my.vRight b on b.order_id=a.order_id
      group by
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
      

      8. Remove the COUNT entirely (rebuy_check column now contains NULLs, that's incorrect):

      select 
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
      FROM views.vLeft a
      left JOIN dwh_my.vRight b on b.order_id=a.order_id
      group by
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
      
      Show
      1. Add file resource adapter in standalone-teiid.xml: <resource-adapter id= "file" > <module slot= "main" id= "org.jboss.teiid.resource-adapter.file" /> <connection-definitions> <connection-definition class-name= "org.teiid.resource.adapter.file.FileManagedConnectionFactory" jndi-name= "java:/files" enabled= "true" pool-name= "files" > <config-property name= "AllowParentPaths" > true </config-property> <config-property name= "ParentDirectory" > C:/testdata </config-property> </connection-definition> </connection-definitions> </resource-adapter> 2. Add MySQL database configuration in standalone-teiid.xml: <datasource jndi-name= "java:/test_dwh_my" pool-name= "test_dwh_my" enabled= "true" use-java-context= "true" > <connection-url> jdbc:mysql://localhost:3306/dwh?zeroDateTimeBehavior=convertToNull </connection-url> <driver> mysql </driver> <new-connection-sql> set SESSION sql_mode = 'ANSI' </new-connection-sql> <pool> <min-pool-size> 2 </min-pool-size> <max-pool-size> 70 </max-pool-size> </pool> <security> <user-name> root </user-name> <password> XXXXXX </password> </security> <validation> <valid-connection-checker class-name= "org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker" /> <exception-sorter class-name= "org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter" /> </validation> <timeout> <blocking-timeout-millis> 120000 </blocking-timeout-millis> <idle-timeout-minutes> 5 </idle-timeout-minutes> </timeout> </datasource> 3. Add in test-vdb.xml java:/test_dwh_my configured in previous step as datasource: <model name= "dwh_my" > <property name= "importer.useFullSchemaName" value= "false" /> <property name= "importer.tableTypes" value= "TABLE,VIEW" /> <property name= "importer.importKeys" value= "false" /> <source name= "test_dwh_my" translator-name= "mylobs" connection-jndi-name= "java:/test_dwh_my" /> </model> <translator name= "mylobs" type= "mysql5" > <property name= "CopyLobs" value= "true" /> <property name= "SupportsNativeQueries" value= "true" /> </translator> 4. Configure in the test-vdb.xml the following virtual view: <model visible = "true" type = "VIRTUAL" name = "views" > <metadata type = "DDL" > <![CDATA[ create view vLeft as SELECT "csv_table" . "created_at" , "csv_table" . "order_id" , "csv_table" . "store_id" FROM (call files.getFiles( 'test1.csv' )) f, TEXTTABLE(to_chars(f.file, 'UTF-8' ) COLUMNS "created_at" STRING , "order_id" STRING , "store_id" STRING DELIMITER ';' QUOTE '"' HEADER 1 ) "csv_table" ]]> </metadata> </model> 5. Load the data from the other file (test2.csv - attached to this issue) into a MySQL table or a MS SQL table (vRight table name in my case, all fields have varchar types). 6. Run the statement below (rebuy_check column contains no NULLs, that's correct): select count ( a . "order_id" ) anzahl_orders, ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) rebuy_check FROM views.vLeft a left JOIN dwh_my.vRight b on b.order_id= a .order_id group by ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) ;; 7. Add a DISTINCT to the COUNT (rebuy_check column now contains NULLs, that's incorrect): select count ( distinct a . "order_id" ) anzahl_orders, ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) rebuy_check FROM views.vLeft a left JOIN dwh_my.vRight b on b.order_id= a .order_id group by ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) ;; 8. Remove the COUNT entirely (rebuy_check column now contains NULLs, that's incorrect): select ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) rebuy_check FROM views.vLeft a left JOIN dwh_my.vRight b on b.order_id= a .order_id group by ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) ;;

      Description

      Some queries that involve an OUTER JOIN between objects from different data sources, a GROUP BY with a CASE statement and a COUNT DISTINCT yield inconsistent results on a SELECT expression which is identical to the GROUP BY expression with CASE.

      • Reproduced in combination of <File> LEFT JOIN <MySQL> and <File> LEFT JOIN <MS SQL> and the same for the other OUTER JOIN variants.
      • screenshot for the different queries are attached
      • sample data for reproduction is attached

      The same problem happens with LEFT OUTER JOIN, RIGHT OUTER JOIN and FULL OUTER JOIN.

        Gliffy Diagrams

          Attachments

          1. test1.csv
            13 kB
          2. test2.csv
            4 kB

            Activity

              People

              • Assignee:
                shawkins Steven Hawkins
                Reporter:
                dalex005 Dmitrii Pogorelov
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: