fix: Fix grouping key reordering during spilling #12395

Closed
wants to merge 1 commit into from

zation99

Summary:
When prefix sort is enabled, the grouping keys are reordered to maximize the benefit of prefix sort, as introduced in #11720.

However, when spilling happens, reading the spilled data back exposes a key-order mismatch between the spilled data and the operator output. This can cause a segmentation fault on a RowType mismatch, or other type-mismatch failures.

This PR fixes the problem by adding a spillDataLoader that has the reordered grouping keys: the spilled data is loaded into it, and the keys are mapped back to the result after loading.
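
To make the "load in stored order, then map back" idea concrete, here is a minimal standalone sketch. It does not use Velox types: `Columns`, `projectToOutputOrder`, and the `storedToOutput` mapping are hypothetical names chosen for illustration, and the actual change in this PR may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a row batch: one vector of values per column.
using Columns = std::vector<std::vector<std::string>>;

// 'stored' holds the key columns in the reordered (spilled) order.
// storedToOutput[i] is the output channel of the i-th stored key column.
// Returns the same columns rearranged into the operator's output order.
Columns projectToOutputOrder(
    const Columns& stored,
    const std::vector<std::size_t>& storedToOutput) {
  assert(stored.size() == storedToOutput.size());
  Columns output(stored.size());
  for (std::size_t i = 0; i < stored.size(); ++i) {
    assert(storedToOutput[i] < output.size());
    output[storedToOutput[i]] = stored[i];
  }
  return output;
}
```

Reading spilled rows straight into the output layout would put a column under a channel that expects a different key, which is the mismatch described above; going through the mapping avoids that.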

Differential Revision: D69860326

@facebook-github-bot added the CLA Signed label Feb 20, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D69860326

zation99 added a commit to zation99/velox that referenced this pull request Feb 20, 2025

@xiaoxmeng xiaoxmeng left a comment

@zation99 good catch % minors. Thanks!

@@ -1215,6 +1259,8 @@ bool GroupingSet::mergeNextWithoutAggregates(
// less than 'numDistinctSpillFilesPerPartition_'.
bool newDistinct{true};
int32_t numOutputRows{0};
prepareSpillDataLoad(maxOutputRows, result);

s/prepareSpillDataLoad/prepareSpillResultWithoutAggregates/

@@ -1239,13 +1285,14 @@ bool GroupingSet::mergeNextWithoutAggregates(
}
if (newDistinct) {
// Yield result for new distinct.
result->copy(
spillDataLoader_->copy(

s/spillDataLoader_/spillResultWitoutAggregates_/

@@ -1239,13 +1285,14 @@ bool GroupingSet::mergeNextWithoutAggregates(
}
if (newDistinct) {
// Yield result for new distinct.
result->copy(
spillDataLoader_->copy(
&stream->current(), numOutputRows++, stream->currentIndex(), 1);
}
stream->pop();
newDistinct = true;
}
result->resize(numOutputRows);

spillResultWitoutAggregates_->resize(numOutputRows);
restoreResult(result);

Contributor Author

I might be wrong but do we need to maintain spillResultWitoutAggregates_, since it won't be used until the next merge where we call prepareForReuse anyway?

void GroupingSet::prepareSpillDataLoad(
int32_t maxOutputRows,
const RowVectorPtr& result) {
if (!spillConfig_->prefixSortEnabled()) {

Let's do this unconditionally


unsigned int size = result->type()->size();
if (spillDataLoader_ == nullptr) {
std::vector<std::string> names(size);

I think we shall set the row type once in GroupingSet::createHashTable()

Contributor Author

Updated types construction by directly using table_->rows()->keyTypes().
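
For readers following along, a rough sketch of what building the intermediate row schema could look like: the types are taken as-is in stored key order (analogous to using table_->rows()->keyTypes() directly), while the names are looked up from the output schema. `Schema`, `makeStoredSchema`, and `storedToOutput` are hypothetical names for this illustration, and types are plain strings rather than Velox TypePtr values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified schema: parallel lists of column names and types.
struct Schema {
  std::vector<std::string> names;
  std::vector<std::string> types;
};

// Builds the schema of the intermediate (stored-order) vector.
// storedKeyTypes is already in stored order, so it is used unchanged.
// storedToOutput[i] is the output channel of the i-th stored key column,
// which is where that column's name comes from.
Schema makeStoredSchema(
    const std::vector<std::string>& outputNames,
    std::vector<std::string> storedKeyTypes,
    const std::vector<std::size_t>& storedToOutput) {
  assert(outputNames.size() == storedKeyTypes.size());
  assert(outputNames.size() == storedToOutput.size());
  Schema schema;
  schema.types = std::move(storedKeyTypes);
  schema.names.reserve(outputNames.size());
  for (std::size_t i = 0; i < storedToOutput.size(); ++i) {
    assert(storedToOutput[i] < outputNames.size());
    schema.names.push_back(outputNames[storedToOutput[i]]);
  }
  return schema;
}
```

Since the key order does not change after the table is created, a schema like this only needs to be computed once and can be cached, which is in the spirit of the suggestion above.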

zation99 added a commit to zation99/velox that referenced this pull request Feb 20, 2025
…2395)

Summary:

When prefix sort is enabled, the grouping keys are reordered to maximize the benefit of prefix sort, as introduced in facebookincubator#11720.

However, when spilling happens, reading the spilled data back exposes a key-order mismatch between the spilled data and the operator output. This can cause a segmentation fault on a RowType mismatch, or other type-mismatch failures.

This PR fixes the problem by adding a spill data loader (`spillResultWitoutAggregates_`) that has the reordered grouping keys: the spilled data is loaded into it, and the keys are mapped back to the result after loading.

Differential Revision: D69860326

@xiaoxmeng xiaoxmeng left a comment

@zation99 thanks for the update % minors

// In case of grouping key reordering, spilled data is first loaded into
// 'spillResultWitoutAggregates_', which is then reordered back and load to
// result.
RowVectorPtr spillResultWitoutAggregates_{nullptr};

typo: s/spillResultWitoutAggregates_/spillResultWithoutAggregates_/

@@ -336,6 +345,11 @@ class GroupingSet {
// First row in remainingInput_ that needs to be processed.
vector_size_t firstRemainingRow_;

// In case of grouping key reordering, spilled data is first loaded into

// In case of distinct aggregation without aggregates and the grouping key reordered, the spilled data ..., and is then projected for output.


// If prefixsort is enabled, loads the read data from spillDataLoader_ into
// result.
void restoreResult(const RowVectorPtr& result);

void projectResult(const RowVectorPtr& result);

@@ -189,6 +189,15 @@ class GroupingSet {
// index for this aggregation), otherwise it returns reference to activeRows_.
const SelectivityVector& getSelectivityVector(size_t aggregateIndex) const;

// Prepare spillDataLoader_ for loading spilled data.

update the comments as we don't use spillDataLoader_?

void GroupingSet::prepareSpillResultWithoutAggregates(
int32_t maxOutputRows,
const RowVectorPtr& result) {
unsigned int size = result->type()->size();

const auto numColumns = result->type()->size();

std::vector<std::string> names(size);
std::vector<TypePtr> types{table_->rows()->keyTypes()};

for (auto i = 0; i < size; ++i) {

const auto& resultType = dynamic_cast(result->type());

and use in the loop

maxOutputRows,
&pool_);
} else {
VectorPtr spillDataLoader = std::move(spillResultWitoutAggregates_);

s/spillDataLoader/spillResultWitoutAggregates/

&stream->current(), numOutputRows++, stream->currentIndex(), 1);
}
stream->pop();
newDistinct = true;
}
restoreResult(result);

Let's do

spillResultWitoutAggregates_->resize(numOutputRows);
projectResult(result);

where projectResult resizes result internally based on the size of spillResultWitoutAggregates_?

Thanks!
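
A minimal sketch of the pattern suggested here, under simplifying assumptions: `Batch` is a hypothetical column-major container of integer keys (not a Velox vector), and `storedToOutput` is an assumed mapping from stored key position to output channel. The caller shrinks the intermediate batch to the number of rows actually produced, and projectResult sizes the result from it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major batch of integer keys; not a Velox type.
struct Batch {
  std::vector<std::vector<long long>> columns;
  std::size_t numRows = 0;

  // Resizes every column to 'rows' and records the new row count.
  void resize(std::size_t rows) {
    for (auto& column : columns) {
      column.resize(rows);
    }
    numRows = rows;
  }
};

// Copies 'intermediate' into 'result'. The result takes its row count from
// the intermediate batch, and stored column i lands at output channel
// storedToOutput[i].
void projectResult(
    const Batch& intermediate,
    const std::vector<std::size_t>& storedToOutput,
    Batch& result) {
  assert(intermediate.columns.size() == storedToOutput.size());
  result.columns.resize(intermediate.columns.size());
  result.resize(intermediate.numRows);
  for (std::size_t i = 0; i < storedToOutput.size(); ++i) {
    assert(storedToOutput[i] < result.columns.size());
    result.columns[storedToOutput[i]] = intermediate.columns[i];
  }
}
```

With this shape the merge loop only ever resizes the intermediate batch, so the result cannot end up with a row count that disagrees with the data that was loaded.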

@xiaoxmeng xiaoxmeng left a comment

@zation99 LGTM % nit. Thanks!

@facebook-github-bot

This pull request has been merged in 4adec18.

Labels
CLA Signed, fb-exported, Merged
3 participants