-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix grouping key reordering during spilling #12395
Conversation
This pull request was exported from Phabricator. Differential Revision: D69860326 |
✅ Deploy Preview for meta-velox canceled.
|
9306ed1
to
d8d8839
Compare
…2395) Summary: When prefix sort is enabled, we sort the grouping keys to maximize the prefixsort benefit as introduced in facebookincubator#11720. However, when spilling happens and when reading the spilled data, there are key order mismatch between the spilled data and the operator output. It can cause segmentation fault when there is RowType mismatch or other type mismatch failures. This PR fixes by adding a spillDataLoader which has the reordered grouping keys, loading the spilled data, and mapping the keys back to result after loading. Differential Revision: D69860326
This pull request was exported from Phabricator. Differential Revision: D69860326 |
…2395) Summary: When prefix sort is enabled, we sort the grouping keys to maximize the prefixsort benefit as introduced in facebookincubator#11720. However, when spilling happens and when reading the spilled data, there are key order mismatch between the spilled data and the operator output. It can cause segmentation fault when there is RowType mismatch or other type mismatch failures. This PR fixes by adding a spillDataLoader which has the reordered grouping keys, loading the spilled data, and mapping the keys back to result after loading. Differential Revision: D69860326
d8d8839
to
d7ded2a
Compare
This pull request was exported from Phabricator. Differential Revision: D69860326 |
…2395) Summary: When prefix sort is enabled, we sort the grouping keys to maximize the prefixsort benefit as introduced in facebookincubator#11720. However, when spilling happens and when reading the spilled data, there are key order mismatch between the spilled data and the operator output. It can cause segmentation fault when there is RowType mismatch or other type mismatch failures. This PR fixes by adding a spillDataLoader which has the reordered grouping keys, loading the spilled data, and mapping the keys back to result after loading. Differential Revision: D69860326
d7ded2a
to
a5bdf09
Compare
This pull request was exported from Phabricator. Differential Revision: D69860326 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zation99 good catch % minors. Thanks!
velox/exec/GroupingSet.cpp
Outdated
@@ -1215,6 +1259,8 @@ bool GroupingSet::mergeNextWithoutAggregates( | |||
// less than 'numDistinctSpillFilesPerPartition_'. | |||
bool newDistinct{true}; | |||
int32_t numOutputRows{0}; | |||
prepareSpillDataLoad(maxOutputRows, result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/prepareSpillDataLoad/prepareSpillResultWithoutAggregates/
velox/exec/GroupingSet.cpp
Outdated
@@ -1239,13 +1285,14 @@ bool GroupingSet::mergeNextWithoutAggregates( | |||
} | |||
if (newDistinct) { | |||
// Yield result for new distinct. | |||
result->copy( | |||
spillDataLoader_->copy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/spillDataLoader_/spillResultWitoutAggregates_/
velox/exec/GroupingSet.cpp
Outdated
@@ -1239,13 +1285,14 @@ bool GroupingSet::mergeNextWithoutAggregates( | |||
} | |||
if (newDistinct) { | |||
// Yield result for new distinct. | |||
result->copy( | |||
spillDataLoader_->copy( | |||
&stream->current(), numOutputRows++, stream->currentIndex(), 1); | |||
} | |||
stream->pop(); | |||
newDistinct = true; | |||
} | |||
result->resize(numOutputRows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spillResultWitoutAggregates_->resize(numOutputRows);
restoreResult(result);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong but do we need to maintain spillResultWitoutAggregates_, since it won't be used until the next merge where we call prepareForReuse
anyway?
velox/exec/GroupingSet.cpp
Outdated
void GroupingSet::prepareSpillDataLoad( | ||
int32_t maxOutputRows, | ||
const RowVectorPtr& result) { | ||
if (!spillConfig_->prefixSortEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do this unconditionally
velox/exec/GroupingSet.cpp
Outdated
|
||
unsigned int size = result->type()->size(); | ||
if (spillDataLoader_ == nullptr) { | ||
std::vector<std::string> names(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shall set the row type once in GroupingSet::createHashTable()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated types construction by directly using table_->rows()->keyTypes()
.
a5bdf09
to
0bb230f
Compare
…2395) Summary: When prefix sort is enabled, we sort the grouping keys to maximize the prefixsort benefit as introduced in facebookincubator#11720. However, when spilling happens and when reading the spilled data, there are key order mismatch between the spilled data and the operator output. It can cause segmentation fault when there is RowType mismatch or other type mismatch failures. This PR fixes by adding a spillDataLoader which has the reordered grouping keys, loading the spilled data, and mapping the keys back to result after loading. Differential Revision: D69860326
This pull request was exported from Phabricator. Differential Revision: D69860326 |
…2395) Summary: When prefix sort is enabled, we sort the grouping keys to maximize the prefixsort benefit as introduced in facebookincubator#11720. However, when spilling happens and when reading the spilled data, there are key order mismatch between the spilled data and the operator output. It can cause segmentation fault when there is RowType mismatch or other type mismatch failures. This PR fixes by adding a spill data loader (`spillResultWitoutAggregates_`) which has the reordered grouping keys, loading the spilled data, and mapping the keys back to result after loading. Differential Revision: D69860326
0bb230f
to
33b21b9
Compare
This pull request was exported from Phabricator. Differential Revision: D69860326 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zation99 thanks for the update % minors
velox/exec/GroupingSet.h
Outdated
// In case of grouping key reordering, spilled data is first loaded into | ||
// 'spillResultWitoutAggregates_', which is then reordered back and load to | ||
// result. | ||
RowVectorPtr spillResultWitoutAggregates_{nullptr}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: s/spillResultWitoutAggregates_/spillResultWithoutAggregates_/
velox/exec/GroupingSet.h
Outdated
@@ -336,6 +345,11 @@ class GroupingSet { | |||
// First row in remainingInput_ that needs to be processed. | |||
vector_size_t firstRemainingRow_; | |||
|
|||
// In case of grouping key reordering, spilled data is first loaded into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// In case of distinct aggregation without aggregates and the grouping key reordered, the spilled data ..., and is then projected for output.
velox/exec/GroupingSet.h
Outdated
|
||
// If prefixsort is enabled, loads the read data from spillDataLoader_ into | ||
// result. | ||
void restoreResult(const RowVectorPtr& result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void projectResult(const RowVectorPtr& result);
velox/exec/GroupingSet.h
Outdated
@@ -189,6 +189,15 @@ class GroupingSet { | |||
// index for this aggregation), otherwise it returns reference to activeRows_. | |||
const SelectivityVector& getSelectivityVector(size_t aggregateIndex) const; | |||
|
|||
// Prepare spillDataLoader_ for loading spilled data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the comments as we don't use spillDataLoader_?
velox/exec/GroupingSet.cpp
Outdated
void GroupingSet::prepareSpillResultWithoutAggregates( | ||
int32_t maxOutputRows, | ||
const RowVectorPtr& result) { | ||
unsigned int size = result->type()->size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const auto numColumns = result->type()->size();
velox/exec/GroupingSet.cpp
Outdated
std::vector<std::string> names(size); | ||
std::vector<TypePtr> types{table_->rows()->keyTypes()}; | ||
|
||
for (auto i = 0; i < size; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const auto& resultType = dynamic_cast(result->type());
and use in the loop
velox/exec/GroupingSet.cpp
Outdated
maxOutputRows, | ||
&pool_); | ||
} else { | ||
VectorPtr spillDataLoader = std::move(spillResultWitoutAggregates_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/spillDataLoader/spillResultWitoutAggregates/
velox/exec/GroupingSet.cpp
Outdated
&stream->current(), numOutputRows++, stream->currentIndex(), 1); | ||
} | ||
stream->pop(); | ||
newDistinct = true; | ||
} | ||
restoreResult(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do
spillResultWitoutAggregates_->resize(numOutputRows);
projectResult(result); which resize result based on spillResultWitoutAggregates_ size internally?
Thanks!
…2395) Summary: When prefix sort is enabled, we sort the grouping keys to maximize the prefixsort benefit as introduced in facebookincubator#11720. However, when spilling happens and when reading the spilled data, there are key order mismatch between the spilled data and the operator output. It can cause segmentation fault when there is RowType mismatch or other type mismatch failures. This PR fixes by adding a spill data loader (`spillResultWitoutAggregates_`) which has the reordered grouping keys, loading the spilled data, and mapping the keys back to result after loading. Differential Revision: D69860326
33b21b9
to
d4725b8
Compare
This pull request was exported from Phabricator. Differential Revision: D69860326 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zation99 LGTM % nit. Thanks!
This pull request has been merged in 4adec18. |
Summary:
When prefix sort is enabled, we sort the grouping keys to maximize the prefixsort benefit as introduced in #11720.
However, when spilling happens and when reading the spilled data, there are key order mismatch between the spilled data and the operator output. It can cause segmentation fault when there is RowType mismatch or other type mismatch failures.
This PR fixes by adding a spillDataLoader which has the reordered grouping keys, loading the spilled data, and mapping the keys back to result after loading.
Differential Revision: D69860326