[TEIID-3748] Impala translator - SELECT and HAVING statements are translating differently for Case statements Created: 09/Oct/15  Updated: 06/Dec/17  Resolved: 03/Feb/16

Status: Closed
Project: Teiid
Component/s: JDBC Connector
Affects Version/s: 8.11.4
Fix Version/s: 9.0, 8.12.5, 8.13.1

Type: Bug Priority: Major
Reporter: Don Krapohl Assignee: Steven Hawkins
Resolution: Done Votes: 0
Labels: Impala_Translator, Translators
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu Trusty


Attachments: Text File server.log    
Steps to Reproduce:

Execute

SELECT user_key, sum(firstcol),count(distinct case when secondcol >= 0 then 1 end)
FROM sometable
WHERE customer_key=6
GROUP BY user_key
HAVING sum(firstcol)>100
AND count(distinct case when secondcol >= 0 then 1 end)=0


 Description   

Error from Impala-
all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT (CASE WHEN (secondcol >= 0) THEN 1 ELSE CAST(NULL AS STRING) END))
deviating function: count(DISTINCT (CASE WHEN (secondcol >= 0) THEN 1 ELSE NULL END))

Query:
SELECT user_key, sum(firstcol),count(distinct case when secondcol >= 0 then 1 end)
FROM sometable
WHERE customer_key=6
GROUP BY user_key
HAVING sum(firstcol)>100
AND count(distinct case when secondcol >= 0 then 1 end)=0

Query explanation:
For all users
Add up values in the firstcol column (integer column)
count distinct values in secondcol where secondcol value zero or more
otherwise return null (output is string)

Translated Teiid query:
SELECT user_key, SUM(firstcol) as `EXPR_0`, COUNT(DISTINCT (CASE WHEN (secondcol >= 0) THEN '1' ELSE CAST(NULL AS STRING) END)) as `EXPR_1`
FROM sometable
WHERE customer_key` = 6
HAVING (EXPR_0 > 100) AND (COUNT(DISTINCT (CASE WHEN (secondcol >= 0) THEN '1' ELSE NULL END)) = 0))

Note the difference between the select and having for EXPR_1:
Select - THEN '1' ELSE CAST(NULL AS STRING) END
Having - THEN '1' ELSE NULL END

Impala doesn't accept that these are the same aggregate function. Aliases aren't accepted in the HAVING.

One further observation- if we swap the translation and write the statement in the select as
COUNT(DISTINCT (CASE WHEN (secondcol >= 0) THEN '1' ELSE NULL END))

Teiid translates the SELECT to
COUNT(DISTINCT (CASE WHEN (secondcol >= 0) THEN '1' ELSE CAST(NULL AS STRING) END))

So it always makes these mismatched.



 Comments   
Comment by Steven Hawkins [ 09/Oct/15 ]

I am not able to reproduce this on 8.11/12. Would it be possible to provide some of a log that shows the user query and the source query / exception? Generally I don't see why you would have cast(null as string) as there's no need to introduce the string type.

Comment by Don Krapohl [ 13/Oct/15 ]

Agree on the cast(null as string). We're not doing that. We're doing:
count(distinct case when secondcol >= 0 then 1 end)

Since there's no ELSE in the statement the translator adds one (the cast(null as string)).

Will get logs and upload.

Comment by Steven Hawkins [ 13/Oct/15 ]

To reproduce I'm running a similar query against a simple source model and the impala translator, and not seeing the else being added nor the cast. Ideally I'll at least need the query plan (showing the pushdown query) which would clarify if the translator or engine is adding the else. The query planning debug log would clarify if this is happening in the engine. And just to make sure, it would be good to see the incoming query to the server to make sure there isn't something unexpected from you client side.

Comment by Steven Hawkins [ 20/Oct/15 ]

Any update on this?

Comment by Don Krapohl [ 20/Oct/15 ]

Sorry, no. Stuck in my own stories at the moment. We can close and I'll reopen when I have time to collect logs.

Comment by Steven Hawkins [ 20/Oct/15 ]

Marking as not reproducible for now. Will reopen if needed.

Comment by Don Krapohl [ 29/Oct/15 ]

Adding log.

Query executed:
SELECT publisher_key,sum(num_clicks),count(distinct case when order_pub_comm_base >= 0 then orderid end) FROM ActivityAdvertiserTransDate WHERE trans_date_key BETWEEN '2015-09-01' AND '2015-09-10' AND advertiser_key=2417 GROUP BY publisher_key HAVING sum(num_clicks)>100 AND count(distinct case when order_pub_comm_base >= 0 then orderid end)=0

See exception at line 4297

Caused by: java.sql.SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT (CASE WHEN (g_0.order_pub_comm_base >= 0) THEN g_0.orderid ELSE CAST(NULL AS STRING) END)); deviating function: count(DISTINCT (CASE WHEN (g_0.order_pub_comm_base >= 0) THEN g_0.orderid ELSE NULL END))
), Query: SELECT `g_0`.`publisher_key`, SUM(ALL `g_0`.`num_clicks`) as `EXPR_0`, COUNT(DISTINCT (CASE WHEN (`g_0`.`order_pub_comm_base` >= 0) THEN `g_0`.`orderid` ELSE CAST(NULL AS STRING) END)) as `EXPR_1` FROM `mart`.`fact_activity_advertiser_trans_date` `g_0` WHERE ((`g_0`.`advertiser_key` = 2417) AND ((`g_0`.`trans_date_key` >= '2015-09-01') AND (`g_0`.`trans_date_key` <= '2015-09-10'))) GROUP BY `g_0`.`publisher_key` HAVING ((SUM(ALL `g_0`.`num_clicks`) > 100) AND (COUNT(DISTINCT (CASE WHEN (`g_0`.`order_pub_comm_base` >= 0) THEN `g_0`.`orderid` ELSE NULL END)) = 0)).

Comment by Steven Hawkins [ 29/Oct/15 ]

This is not reproducible for me. There is no logic anywhere in the impala or general jdbc translator that will take the Teiid sql source query:

SELECT g_0.publisher_key, SUM(g_0.num_clicks), COUNT(DISTINCT CASE WHEN g_0.order_pub_comm_base >= 0 THEN g_0.orderid END) FROM fact_activity_advertiser_trans_date.fact_activity_advertiser_trans_date AS g_0 WHERE (g_0.trans_date_key >= '2015-09-01') AND (g_0.trans_date_key <= '2015-09-10') AND (g_0.advertiser_key = 2417) GROUP BY g_0.publisher_key HAVING (SUM(g_0.num_clicks) > 100) AND (COUNT(DISTINCT CASE WHEN g_0.order_pub_comm_base >= 0 THEN g_0.orderid END) = 0)

and insert the else clauses.

It would certainly appear that this and TEIID-3794 are due to a custom translator. Can you confirm if you are using the standard impala translator?

Comment by Don Krapohl [ 30/Oct/15 ]

We are using the default translator. Any other logs I can provide?

Comment by Steven Hawkins [ 01/Nov/15 ]

Actually I should have noticed before that more logs aren't necessary. From the log it is clear that the offending sql is not being introduced by Teiid.

The SQL Teiid sends to Impala is:

14:05:33,330 DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue1) VH2vuUmDSwHy Source-specific command: SELECT g_0.publisher_key, SUM(g_0.num_clicks), COUNT(DISTINCT CASE WHEN g_0.order_pub_comm_base >= 0 THEN g_0.orderid END) FROM mart.fact_activity_advertiser_trans_date g_0 WHERE g_0.trans_date_key >= '2015-09-01' AND g_0.trans_date_key <= '2015-09-10' AND g_0.advertiser_key = 2417 GROUP BY g_0.publisher_key HAVING SUM(g_0.num_clicks) > 100 AND COUNT(DISTINCT CASE WHEN g_0.order_pub_comm_base >= 0 THEN g_0.orderid END) = 0

That's the final string that is handed off to the driver. The error message comes back with the else clauses added.

What impala client/server are you using? I've tested a similar scenario with the JDBC Hive Driver 1.2.1 and Impala 2.2 and do not receive an error.

Comment by Steven Hawkins [ 03/Nov/15 ]

Changing issue to a quality risk and pulling from 8.12.2 as it appears this is behavior introduced by the driver/impala. Will need more details to confirm.

Comment by Steven Hawkins [ 03/Feb/16 ]

I was able to reproduce this locally. With the initial query:

SELECT stringkey, sum(intnum),count(distinct case when floatnum >= 0 then 1 end) FROM smalla WHERE intkey=6 GROUP BY stringkey HAVING sum(intnum)>100 AND count(distinct case when floatnum >= 0 then 1 end)=0

The pushdown is:

SELECT g_0.stringkey, SUM(g_0.intnum),

COUNT(DISTINCT CASE WHEN g_0.floatnum >= 0.0 THEN 1 END)

FROM `smalla` AS g_0 WHERE g_0.intkey = 6

GROUP BY g_0.stringkey HAVING SUM(g_0.intnum) > 100 AND

COUNT(DISTINCT CASE WHEN g_0.floatnum >= convert(0, float) THEN 1 END) = 0

So the non-evaluated cast is then causing an issue for Impala

Comment by Steven Hawkins [ 03/Feb/16 ]

Updated the rewriter logic to also rewrite aggregate arguments - regardless of where they appear in the query.

Generated at Mon Dec 17 10:58:07 EST 2018 using Jira 7.12.1#712002-sha1:609a50578ba6bc73dbf8b05dddd7c04a04b6807c.