Teiid / TEIID-3454

Dependent Join optimizations for Netezza and Hive


    • Type: Feature Request
    • Resolution: Obsolete
    • Priority: Major
    • Fix Version/s: Backlog
    • Affects Version/s: 8.10
    • Component/s: Query Engine
    • Labels: None

      Currently, dependent joins push the join keys to the dependent side as one or more IN clauses. Many MPP and NoSQL systems can perform drastically better if those keys are instead loaded into a temp table whose distribution matches that of the table being joined. Two examples I know of are Netezza and Hive.
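      To make the current IN-clause behavior concrete, here is a minimal Python sketch of how independent-side keys might be turned into a pushed-down predicate. It assumes the distinct keys are split into fixed-size chunks and the resulting IN lists are ORed together; the function names and the `max_in_size` parameter are hypothetical illustrations, not Teiid's actual API or exact pushdown strategy.

```python
from typing import Iterable, List


def quote(value) -> str:
    """Render a key as a SQL literal: strings are quoted (with ' escaped), numbers as-is."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def in_clause_predicate(column: str, keys: Iterable, max_in_size: int) -> str:
    """Hypothetical sketch of the IN-based predicate sent to the dependent side.

    Distinct keys are sorted, split into chunks of at most max_in_size values,
    and each chunk becomes one IN list; the lists are ORed together.
    """
    distinct: List = sorted(set(keys))
    chunks = [distinct[i:i + max_in_size] for i in range(0, len(distinct), max_in_size)]
    parts = [f"{column} IN ({', '.join(quote(k) for k in chunk)})" for chunk in chunks]
    # With no keys at all, no dependent-side row can match.
    return " OR ".join(parts) if parts else "1 = 0"
```

      For example, `in_clause_predicate("CustomerID", ["ALFKI", "ANATR", "ALFKI", "BERGS"], 2)` returns `CustomerID IN ('ALFKI', 'ANATR') OR CustomerID IN ('BERGS')`. The predicate grows with the number of keys and says nothing about how the target system distributes its data, which is the gap the temp-table approach is meant to close.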

      In Netezza, suppose the keys coming into the dependent join come from a small dimension (here "Customer", using Northwind data model concepts) and are joined to a large fact table that is distributed (DISTRIBUTE ON) or organized (ORGANIZE ON) on that key. Creating a temp table that matches this distribution then yields roughly 100x better query performance. If the dimension is small enough, this makes little difference, because Netezza will perform a broadcast join anyway, but creating the temp table is never a bad idea.
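      The Netezza approach could be sketched in Python as SQL generation: create a temp table distributed on the same key as the fact table, load the dependent-join keys into it with a parameterized batch INSERT, and join against it instead of sending IN lists. This is a minimal illustration under stated assumptions: the function name and its parameters are hypothetical, the caller is assumed to know the fact table's distribution key, and ORGANIZE ON is not handled.

```python
from typing import List, Sequence, Tuple


def netezza_dependent_join_sql(
    temp_table: str,
    key_columns: Sequence[Tuple[str, str]],  # (name, SQL type) of the staged dimension keys
    distribution_key: str,                    # assumed to match the fact table's DISTRIBUTE ON column
    fact_table: str,
    fact_columns: Sequence[str],
) -> List[str]:
    """Hypothetical sketch: stage dependent-join keys in a co-distributed temp table."""
    names = [name for name, _ in key_columns]
    if distribution_key not in names:
        raise ValueError("distribution key must be one of the staged key columns")
    col_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in key_columns)
    # Same distribution as the fact table, so matching rows land on the same data slices.
    create = f"CREATE TEMPORARY TABLE {temp_table} ({col_defs}) DISTRIBUTE ON ({distribution_key})"
    # Parameterized INSERT, intended to be executed as a batch with the independent-side keys.
    placeholders = ", ".join("?" for _ in names)
    insert = f"INSERT INTO {temp_table} ({', '.join(names)}) VALUES ({placeholders})"
    # Join against the staged keys instead of pushing IN lists.
    select_list = ", ".join(f"f.{c}" for c in fact_columns)
    query = (
        f"SELECT {select_list} FROM {fact_table} f "
        f"JOIN {temp_table} t ON f.{distribution_key} = t.{distribution_key}"
    )
    return [create, insert, query]
```

      Called as `netezza_dependent_join_sql("dep_customer", [("CustomerID", "VARCHAR(5)")], "CustomerID", "Orders", ["OrderID", "CustomerID"])`, it produces a `CREATE TEMPORARY TABLE dep_customer (CustomerID VARCHAR(5)) DISTRIBUTE ON (CustomerID)` statement, the batch INSERT, and the join query. The table names here are illustrative only.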

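      Similarly, Hive DDL supports both partitions (PARTITIONED BY) and buckets (CLUSTERED BY, optionally pre-sorted with SORTED BY), so a temp table laid out to match the fact table could be used in the same way.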
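      For Hive, a comparable sketch would generate DDL for a staging table bucketed (and optionally sorted) on the join key with the same bucket count as the fact table, so that Hive has the option of a bucketed join. The function name and parameters are hypothetical; the sketch uses a plain `CREATE TABLE` rather than assuming TEMPORARY tables accept these clauses, and whether Hive actually picks a bucketed join depends on its version and configuration.

```python
from typing import Sequence, Tuple


def hive_bucketed_stage_ddl(
    table: str,
    columns: Sequence[Tuple[str, str]],  # (name, Hive type) of the staged keys
    bucket_column: str,                  # assumed to match the fact table's CLUSTERED BY column
    num_buckets: int,                    # assumed to match the fact table's bucket count
    sort: bool = True,                   # mirror a SORTED BY clause on the fact table
) -> str:
    """Hypothetical sketch: DDL for a staging table bucketed like the fact table."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    ddl = f"CREATE TABLE {table} ({col_defs}) CLUSTERED BY ({bucket_column})"
    if sort:
        ddl += f" SORTED BY ({bucket_column})"
    return ddl + f" INTO {num_buckets} BUCKETS"
```

      For instance, `hive_bucketed_stage_ddl("dep_customer", [("customerid", "STRING")], "customerid", 32)` returns `CREATE TABLE dep_customer (customerid STRING) CLUSTERED BY (customerid) SORTED BY (customerid) INTO 32 BUCKETS`. Partitioning is left out because the staged key set is small; matching the bucket layout is the part this sketch aims to show.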

            Assignee: Unassigned
            Reporter: John Muller (Inactive)
            Votes: 1
            Watchers: 3
