Uploaded image for project: 'Teiid'
  1. Teiid
  2. TEIID-5682

Raise aggregation to allow for join pushdown

    • Icon: Enhancement Enhancement
    • Resolution: Won't Do
    • Icon: Major Major
    • None
    • None
    • Query Engine
    • None

      In some plans, for example those created by odata expand see TEIID-5680, the resulting query has the form of:

      select tbl1-cols from tbl1 left outer join (select cols, aggregates ... from tbl2 group by cols) on (some non-aggregate predicate)

      If the aggregates or other intervening constructs are not able to be pushed the result will be a federated join.

      In most instances (where the tbl1 columns are sortable) it's possible to change this to:

      select tbl1-cols aggregates from tbl1 left outer join tbl1 on (...) group by tbl1-cols, cols

      which can allow the pushdown of the join to proceed.

            [TEIID-5682] Raise aggregation to allow for join pushdown

            Adding this into the optimizer seems too special purpose. A sketch of what this looks like in RuleRaiseAccess when you can't raise over a grouping:

            //perform the pullup of aggregation if necessary
                            PlanNode source = NodeEditor.findParent(accessNode, NodeConstants.Types.SOURCE, NodeConstants.Types.JOIN);
                            if (source == null) {
                                return null;
                            }
                            
                            //intervening nodes ...
                            //project node must be simple 
                            
                            PlanNode join = source.getParent();
                            if (join == null || join.getType() != NodeConstants.Types.JOIN) {
                                return null;
                            }
                            
                            JoinType type = (JoinType) join.getProperty(NodeConstants.Info.JOIN_TYPE);
                            //what about correlated references / lateral join
                            boolean left = join.getFirstChild() == source;
                            PlanNode other = left?join.getLastChild():join.getFirstChild();
                            if (other.getType() != NodeConstants.Types.ACCESS) {
                                return null;
                            }
                            List<Criteria> criteria = (List<Criteria>) join.getProperty(NodeConstants.Info.JOIN_CRITERIA);
                            if (canRaiseOverJoin(
                                Arrays.asList(left ? accessNode : other, left ? other : accessNode),
                                metadata, capFinder,
                                criteria,
                                type, record, context, false, false) == null) {
                                return null;
                            }
                            // on can't be on the aggregates
                            for (Criteria crit : criteria) {
                                //look at the symbol map to find aggregates, check against the symbols here
                                //...
                            }
            
                            //the other select columns must be sortable
            

            Altering the expand logic sql generation is possible - but adds a lot of additional logic and is only valid for a single expand.

            So I'm marking as deferred.

            Instead I've added a makeind hint to the outer side. For sources that don't support array_agg this ensures a relatively bounded set is used to perform the join. A potential refinement is to look for the inner cardinality and not use the hint if it is "small".

            Steven Hawkins added a comment - Adding this into the optimizer seems too special purpose. A sketch of what this looks like in RuleRaiseAccess when you can't raise over a grouping: //perform the pullup of aggregation if necessary PlanNode source = NodeEditor.findParent(accessNode, NodeConstants.Types.SOURCE, NodeConstants.Types.JOIN); if (source == null ) { return null ; } //intervening nodes ... //project node must be simple PlanNode join = source.getParent(); if (join == null || join.getType() != NodeConstants.Types.JOIN) { return null ; } JoinType type = (JoinType) join.getProperty(NodeConstants.Info.JOIN_TYPE); //what about correlated references / lateral join boolean left = join.getFirstChild() == source; PlanNode other = left?join.getLastChild():join.getFirstChild(); if (other.getType() != NodeConstants.Types.ACCESS) { return null ; } List<Criteria> criteria = (List<Criteria>) join.getProperty(NodeConstants.Info.JOIN_CRITERIA); if (canRaiseOverJoin( Arrays.asList(left ? accessNode : other, left ? other : accessNode), metadata, capFinder, criteria, type, record, context, false , false ) == null ) { return null ; } // on can't be on the aggregates for (Criteria crit : criteria) { //look at the symbol map to find aggregates, check against the symbols here //... } //the other select columns must be sortable Altering the expand logic sql generation is possible - but adds a lot of additional logic and is only valid for a single expand. So I'm marking as deferred. Instead I've added a makeind hint to the outer side. For sources that don't support array_agg this ensures a relatively bounded set is used to perform the join. A potential refinement is to look for the inner cardinality and not use the hint if it is "small".

            This seems a little involved for the problem that it is solving. With an incoming query such as:

            SELECT g0.e1, g0.e2, g0.e3, /*+ MJ */ (SELECT ARRAY_AGG((g1.e1, g1.e2) ORDER BY g1.e1) FROM PM1.G2 AS g1 WHERE g0.e2 = g1.e2) FROM PM1.G1 AS g0 ORDER BY g0.e2

            SELECT g0.e1, g0.e2, g0.e3, X__1.expr1 AS expr4
            FROM PM1.G1 AS g0 LEFT OUTER JOIN
            (SELECT ARRAY_AGG((g1.e1, g1.e2) ORDER BY g1.e1) AS expr1, g1.e2 FROM PM1.G2 AS g1 GROUP BY g1.e2) AS X_1 ON g0.e2 = X_1.e2
            ORDER BY g0.e2

            We need to get to:

            SELECT g0.e1, g0.e2, g0.e3, ARRAY_AGG((X_1.e1, X1.e2) ORDER BY X1.e1) ) FILTER (WHERE X_1.e1 IS NOT NULL) AS expr4
            FROM PM1.G1 AS g0 LEFT OUTER JOIN
            (SELECT g1.e1, g1.e2 FROM PM1.G2 AS g1) AS X_1 ON g0.e2 = X_1.e2
            GROUP BY g0.e1, g0.e2, g0.e3 ORDER BY g0.e2

            Note that all projected g0 columns must be sortable - this is not always the case. It also requires attaching a filter to aggregates that can be null dependent when there is an outer join. And finally it can't have any intervening usage of aggregate values or even complex projection between the logical subquery grouping root and it join parent - this will generally be the case for expand, but can't generally be assumed.

            A simpler approach would either be to either already aggregate the other top level columns: select g0.e2, array_agg(g0.e1 ...)
            Or not use the nested aggregation and process the expand like the older style - but that only makes sense for a single expand.

            Steven Hawkins added a comment - This seems a little involved for the problem that it is solving. With an incoming query such as: SELECT g0.e1, g0.e2, g0.e3, /*+ MJ */ (SELECT ARRAY_AGG((g1.e1, g1.e2) ORDER BY g1.e1) FROM PM1.G2 AS g1 WHERE g0.e2 = g1.e2) FROM PM1.G1 AS g0 ORDER BY g0.e2 SELECT g0.e1, g0.e2, g0.e3, X__1.expr1 AS expr4 FROM PM1.G1 AS g0 LEFT OUTER JOIN (SELECT ARRAY_AGG((g1.e1, g1.e2) ORDER BY g1.e1) AS expr1, g1.e2 FROM PM1.G2 AS g1 GROUP BY g1.e2) AS X_ 1 ON g0.e2 = X _1.e2 ORDER BY g0.e2 We need to get to: SELECT g0.e1, g0.e2, g0.e3, ARRAY_AGG((X_ 1.e1, X 1.e2) ORDER BY X 1.e1) ) FILTER (WHERE X _1.e1 IS NOT NULL) AS expr4 FROM PM1.G1 AS g0 LEFT OUTER JOIN (SELECT g1.e1, g1.e2 FROM PM1.G2 AS g1) AS X_ 1 ON g0.e2 = X _1.e2 GROUP BY g0.e1, g0.e2, g0.e3 ORDER BY g0.e2 Note that all projected g0 columns must be sortable - this is not always the case. It also requires attaching a filter to aggregates that can be null dependent when there is an outer join. And finally it can't have any intervening usage of aggregate values or even complex projection between the logical subquery grouping root and it join parent - this will generally be the case for expand, but can't generally be assumed. A simpler approach would either be to either already aggregate the other top level columns: select g0.e2, array_agg(g0.e1 ...) Or not use the nested aggregation and process the expand like the older style - but that only makes sense for a single expand.

            With cardinalities set this issue is somewhat mitigated as the join will be planned as a dependent join. The rationale for raising the aggregation is if the cardinalties aren't set and the join can be pushed. The grouping and order by operations can then be combined.

            Steven Hawkins added a comment - With cardinalities set this issue is somewhat mitigated as the join will be planned as a dependent join. The rationale for raising the aggregation is if the cardinalties aren't set and the join can be pushed. The grouping and order by operations can then be combined.

              rhn-engineering-shawkins Steven Hawkins
              rhn-engineering-shawkins Steven Hawkins
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Created:
                Updated:
                Resolved: