HIVE-28675: Maximize the removal of redundant columns from GROUP BY clauses #5586

zabetak · 2024-12-20T22:26:48Z

What changes were proposed in this pull request?

Enhance HiveRelFieldTrimmer to remove the maximum number of redundant columns from the GROUP BY clause.

Why are the changes needed?

Generate more efficient plans by pruning as many columns as possible (less CPU/IO/network cost).
Avoid missing optimization opportunities by examining all candidates.

For more details, see HIVE-28675.

Does this PR introduce any user-facing change?

More efficient query plans.

Is the change a dependency upgrade?

No

How was this patch tested?

mvn test -Dtest=TestMiniLlapLocalCliDriver.java -Dqfile="cbo_groupby_remove_key.q"


ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java

sonarqubecloud · 2025-01-03T14:00:54Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

soumyakanti3578

LGTM!

ramesh0201 · 2025-01-03T20:38:39Z

LGTM +1. Just left a minor comment for my understanding, please feel free to merge if this question is irrelevant. :)

ramesh0201 · 2025-01-03T20:40:43Z

ql/src/test/queries/clientpositive/cbo_groupby_remove_key.q

+EXPLAIN CBO SELECT passport, COUNT(1) FROM passenger GROUP BY id, fname, lname, passport;
+EXPLAIN CBO SELECT fname, COUNT(1) FROM passenger GROUP BY id, fname, lname, passport;
+EXPLAIN CBO SELECT lname, COUNT(1) FROM passenger GROUP BY id, fname, lname, passport;
+EXPLAIN CBO SELECT fname, lname, COUNT(1) FROM passenger GROUP BY id, fname, lname, passport;


In this case, is grouping by fname, lname always the better plan, even with a different aggregate function?

zabetak · 2025-01-06

If we change the aggregate function, the query changes as well.

If we assume that the set of redundant columns remains the same after changing the aggregate function (e.g., MAX(fname)), then it's always a good idea to drop unnecessary processing, so grouping on fname, lname is the best option.

If the set of redundant columns changes (e.g., MAX(passport)), then this optimization will not trigger in the same way (the passport column cannot be dropped).
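For readers following the thread, the idea of "examining all candidates" can be sketched in plain Java. This is only an illustration, not the code in HiveRelFieldTrimmer: the class name `GroupByKeyPruning` and the method `maxRemovable` are hypothetical, and the uniqueness constraints on `passenger` (`id` unique, `passport` unique, and the pair `fname`, `lname` unique) are assumed for the example rather than taken from the test file. The sketch tries every unique key that the GROUP BY fully covers and keeps the one that lets the most unused grouping columns be dropped.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; NOT the HiveRelFieldTrimmer implementation.
class GroupByKeyPruning {

    /**
     * Returns the largest set of grouping columns that can be dropped.
     * A column is droppable when the parent operator does not read it and
     * some unique key, fully contained in the GROUP BY, does not include it.
     */
    static Set<String> maxRemovable(Set<String> groupKeys,
                                    Set<String> usedAbove,
                                    List<Set<String>> uniqueKeys) {
        // Grouping columns that nothing above reads; computed once, outside the loop.
        Set<String> unused = new LinkedHashSet<>(groupKeys);
        unused.removeAll(usedAbove);

        Set<String> best = new LinkedHashSet<>();
        for (Set<String> key : uniqueKeys) {
            if (!groupKeys.containsAll(key)) {
                continue; // key not covered by the GROUP BY, so it guarantees nothing
            }
            Set<String> removable = new LinkedHashSet<>(unused);
            removable.removeAll(key); // the columns of the chosen key must stay
            if (removable.size() > best.size()) {
                best = removable;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // GROUP BY id, fname, lname, passport
        Set<String> groupKeys =
            new LinkedHashSet<>(List.of("id", "fname", "lname", "passport"));
        // Assumed (hypothetical) unique keys of the passenger table.
        List<Set<String>> uniqueKeys =
            List.of(Set.of("id"), Set.of("passport"), Set.of("fname", "lname"));

        // SELECT fname, lname, COUNT(1): only fname and lname are read above.
        System.out.println(
            maxRemovable(groupKeys, Set.of("fname", "lname"), uniqueKeys));
    }
}
```

Under these assumptions, stopping at the first candidate key (`id`) would drop only `passport`, while checking every candidate finds that `fname`, `lname` is itself a key, so both `id` and `passport` can go.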

HIVE-28675: Maximize the removal of redundant columns from GROUP BY clauses

0859de2

asf-ci-hive added the tests pending and tests passed labels and removed the tests pending label Dec 20, 2024

zabetak marked this pull request as ready for review December 31, 2024 16:09

soumyakanti3578 reviewed Jan 2, 2025

ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java (outdated, resolved)

HIVE-28675: Compute unused grouping keys outside of the for-loop

e89386d

asf-ci-hive added tests pending and removed tests passed labels Jan 3, 2025

asf-ci-hive added tests passed and removed tests pending labels Jan 3, 2025

soumyakanti3578 approved these changes Jan 3, 2025


ramesh0201 reviewed Jan 3, 2025


zabetak closed this in b7a3e8b Jan 6, 2025

zabetak deleted the gby_remove branch January 6, 2025 09:56
