Uploaded image for project: 'Teiid'
  1. Teiid
  2. TEIID-5783

OUTER JOIN yields wrong results when used with GROUP BY, CASE and COUNT DISTINCT

    Details

    • Type: Bug
    • Status: Resolved (View Workflow)
    • Priority: Blocker
    • Resolution: Done
    • Affects Version/s: 12.0
    • Fix Version/s: 12.3, 12.1.2, 12.2.2
    • Component/s: Query Engine
    • Labels:
      None
    • Environment:

      teiid-12.0.0 on WildFly Full 14.0.1.Final (WildFly Core 6.0.2.Final)

    • Steps to Reproduce:
      Hide

      1. Add file resource adapter in standalone-teiid.xml:

                      <resource-adapter id="file">
                          <module slot="main" id="org.jboss.teiid.resource-adapter.file"/>
                          <connection-definitions>
                              <connection-definition class-name="org.teiid.resource.adapter.file.FileManagedConnectionFactory" jndi-name="java:/files" enabled="true" pool-name="files">
                                  <config-property name="AllowParentPaths">
                                      true
                                  </config-property>
                                  <config-property name="ParentDirectory">
                                      C:/testdata
                                  </config-property>
                              </connection-definition>
                          </connection-definitions>
                      </resource-adapter>
      

      2. Add MySQL database configuration in standalone-teiid.xml:

                      <datasource jndi-name="java:/test_dwh_my" pool-name="test_dwh_my" enabled="true" use-java-context="true">
                          <connection-url>jdbc:mysql://localhost:3306/dwh?zeroDateTimeBehavior=convertToNull</connection-url>
                          <driver>mysql</driver>
                          <new-connection-sql>set SESSION sql_mode = 'ANSI'</new-connection-sql>
                          <pool>
                              <min-pool-size>2</min-pool-size>
                              <max-pool-size>70</max-pool-size>
                          </pool>
                          <security>
                              <user-name>root</user-name>
                              <password>XXXXXX</password>
                          </security>
                          <validation>
                              <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker"/>
                              <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter"/>
                          </validation>
                          <timeout>
                              <blocking-timeout-millis>120000</blocking-timeout-millis>
                              <idle-timeout-minutes>5</idle-timeout-minutes>
                          </timeout>
                      </datasource>
      

      3. Add in test-vdb.xml java:/test_dwh_my configured in previous step as datasource:

          <model name="dwh_my">
              <property name="importer.useFullSchemaName" value="false"/>
              <property name="importer.tableTypes" value="TABLE,VIEW"/>
              <property name="importer.importKeys" value="false"/>
              <source name="test_dwh_my" translator-name="mylobs" connection-jndi-name="java:/test_dwh_my"/>
          </model>
      
          <translator name="mylobs" type="mysql5">
              <property name="CopyLobs" value="true" />
              <property name="SupportsNativeQueries" value="true"/>
          </translator>
      

      4. Configure in the test-vdb.xml the following virtual view:

          <model visible = "true" type = "VIRTUAL" name = "views">
              <metadata type = "DDL"><![CDATA[
                create view vLeft as
      SELECT
      "csv_table"."created_at",
      "csv_table"."order_id",
      "csv_table"."store_id" 
      FROM
      (call files.getFiles('test1.csv')) f,
      	TEXTTABLE(to_chars(f.file,'UTF-8') 
      		COLUMNS 
      		"created_at" STRING ,
      		"order_id" STRING ,
      		"store_id" STRING 
      		DELIMITER ';' 
      		QUOTE '"' 
      		HEADER 1 
      	)
      "csv_table"
              ]]>
              </metadata>
          </model>
      

      5. Load the data from the other file (test2.csv - attached to this issue) into a MySQL table or a MS SQL table (vRight table name in my case, all fields have varchar types).

      6. Run the statement below (rebuy_check column contains no NULLs, that's correct):

      select 
      count( a."order_id") anzahl_orders, 
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
      FROM views.vLeft a
      left JOIN dwh_my.vRight b on b.order_id=a.order_id
      group by
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
      

      7. Add a DISTINCT to the COUNT (rebuy_check column now contains NULLs, that's incorrect):

      select 
      count(distinct a."order_id") anzahl_orders, 
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
      FROM views.vLeft a
      left JOIN dwh_my.vRight b on b.order_id=a.order_id
      group by
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
      

      8. Remove the COUNT entirely (rebuy_check column now contains NULLs, that's incorrect):

      select 
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
      FROM views.vLeft a
      left JOIN dwh_my.vRight b on b.order_id=a.order_id
      group by
      (case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
      
      Show
      1. Add file resource adapter in standalone-teiid.xml: <resource-adapter id= "file" > <module slot= "main" id= "org.jboss.teiid.resource-adapter.file" /> <connection-definitions> <connection-definition class-name= "org.teiid.resource.adapter.file.FileManagedConnectionFactory" jndi-name= "java:/files" enabled= "true" pool-name= "files" > <config-property name= "AllowParentPaths" > true </config-property> <config-property name= "ParentDirectory" > C:/testdata </config-property> </connection-definition> </connection-definitions> </resource-adapter> 2. Add MySQL database configuration in standalone-teiid.xml: <datasource jndi-name= "java:/test_dwh_my" pool-name= "test_dwh_my" enabled= "true" use-java-context= "true" > <connection-url> jdbc:mysql://localhost:3306/dwh?zeroDateTimeBehavior=convertToNull </connection-url> <driver> mysql </driver> <new-connection-sql> set SESSION sql_mode = 'ANSI' </new-connection-sql> <pool> <min-pool-size> 2 </min-pool-size> <max-pool-size> 70 </max-pool-size> </pool> <security> <user-name> root </user-name> <password> XXXXXX </password> </security> <validation> <valid-connection-checker class-name= "org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker" /> <exception-sorter class-name= "org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter" /> </validation> <timeout> <blocking-timeout-millis> 120000 </blocking-timeout-millis> <idle-timeout-minutes> 5 </idle-timeout-minutes> </timeout> </datasource> 3. Add in test-vdb.xml java:/test_dwh_my configured in previous step as datasource: <model name= "dwh_my" > <property name= "importer.useFullSchemaName" value= "false" /> <property name= "importer.tableTypes" value= "TABLE,VIEW" /> <property name= "importer.importKeys" value= "false" /> <source name= "test_dwh_my" translator-name= "mylobs" connection-jndi-name= "java:/test_dwh_my" /> </model> <translator name= "mylobs" type= "mysql5" > <property name= "CopyLobs" value= "true" /> <property name= "SupportsNativeQueries" value= "true" /> </translator> 4. Configure in the test-vdb.xml the following virtual view: <model visible = "true" type = "VIRTUAL" name = "views" > <metadata type = "DDL" > <![CDATA[ create view vLeft as SELECT "csv_table" . "created_at" , "csv_table" . "order_id" , "csv_table" . "store_id" FROM (call files.getFiles( 'test1.csv' )) f, TEXTTABLE(to_chars(f.file, 'UTF-8' ) COLUMNS "created_at" STRING , "order_id" STRING , "store_id" STRING DELIMITER ';' QUOTE '"' HEADER 1 ) "csv_table" ]]> </metadata> </model> 5. Load the data from the other file (test2.csv - attached to this issue) into a MySQL table or a MS SQL table (vRight table name in my case, all fields have varchar types). 6. Run the statement below (rebuy_check column contains no NULLs, that's correct): select count ( a . "order_id" ) anzahl_orders, ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) rebuy_check FROM views.vLeft a left JOIN dwh_my.vRight b on b.order_id= a .order_id group by ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) ;; 7. Add a DISTINCT to the COUNT (rebuy_check column now contains NULLs, that's incorrect): select count ( distinct a . "order_id" ) anzahl_orders, ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) rebuy_check FROM views.vLeft a left JOIN dwh_my.vRight b on b.order_id= a .order_id group by ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) ;; 8. Remove the COUNT entirely (rebuy_check column now contains NULLs, that's incorrect): select ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) rebuy_check FROM views.vLeft a left JOIN dwh_my.vRight b on b.order_id= a .order_id group by ( case when b. "order_id" is not null then 'Direct Rebuy' else 'Standard' end ) ;;

      Description

      Some queries that involve an OUTER JOIN between objects from different data sources, a GROUP BY with a CASE statement and a COUNT DISTINCT yield inconsistent results on a SELECT expression which is identical to the GROUP BY expression with CASE.

      • Reproduced in combination of <File> LEFT JOIN <MySQL> and <File> LEFT JOIN <MS SQL> and the same for the other OUTER JOIN variants.
      • screenshot for the different queries are attached
      • sample data for reproduction is attached

      The same problem happens with LEFT OUTER JOIN, RIGHT OUTER JOIN and FULL OUTER JOIN.

        Gliffy Diagrams

          Attachments

          1. test1.csv
            13 kB
            Dmitrii Pogorelov
          2. test2.csv
            4 kB
            Dmitrii Pogorelov

            Activity

              People

              • Assignee:
                shawkins Steven Hawkins
                Reporter:
                dalex005 Dmitrii Pogorelov
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: