Details
-
Feature Request
-
Resolution: Obsolete
-
Major
-
8.10
-
None
Description
Currently, dependent joins create 1 or more IN clauses. Many MPP / NoSQL systems can have drastically better performance by creating temp tables that match key distributions. Two examples I know of would be Netezza and Hive.
In Netezza, if the incoming dependent join (small dimension; here "Customer" using Northwind data model concepts) has a key that will be joined to to a big fact table that is DISTRIBUTED ON or ORGANIZED BY 'ed then creating a temp table that matches this distribution will result in ~100x query performance. Sometimes, if the dimension is small enough, this doesn't make a big difference as Netezza will perform a broadcast join, but it's never a bad idea to create the temp table.
Similarly, Hive DDL has both partitions and buckets (pre-sorted).