subquery unnesting duplicate row bug #858

Open
skyzh opened this issue Dec 1, 2024 · 3 comments

skyzh commented Dec 1, 2024

> create table t1(v1 int, v2 int);
created
in 0.028s
> create table t2(v3 int, v4 int);
created
in 0.009s
> insert into t1 values (1, 100), (2, 200), (3, 300), (3, 300);
2024-12-01T06:02:14.883496Z  INFO executor: risinglight::storage::secondary::transaction: RowSet #0 flushed, DV  flushed id=14 name="insert"
4 rows inserted
in 0.047s
> insert into t2 values (2, 200), (3, 300);
2024-12-01T06:02:29.516292Z  INFO executor: risinglight::storage::secondary::transaction: RowSet #1 flushed, DV  flushed id=11 name="insert"
2 rows inserted
in 0.044s
> select * from t1 where (select sum(v4) from t2 where v3 = v1) > 100;
+---+-----+
| 2 | 200 |
| 3 | 300 |
+---+-----+
in 0.038s

The result should be

2 200
3 300
3 300

which means that rules 8/9 don't seem to be correct.

// Figure 4 Rule (8)
rw!("pushdown-apply-group-agg";
    "(apply inner ?left (hashagg ?keys ?aggs ?right))" =>
    // ?new_keys = ?left || ?keys
    { extract_key("(hashagg ?new_keys ?aggs (apply inner ?left ?right))") }
    // FIXME: this rule is correct only if
    // 1. all aggregate functions satisfy: agg({}) = agg({null})
    // 2. the left table has a key
),
// Figure 4 Rule (9)
rw!("pushdown-apply-scalar-agg";
    "(apply inner ?left (agg ?aggs ?right))" =>
    // ?new_keys = ?left
    { extract_key("(hashagg ?new_keys ?aggs (apply left_outer ?left ?right))") }
    // FIXME: this rule is correct only if
    // 1. all aggregate functions satisfy: agg({}) = agg({null})
    // 2. the left table has a key
),
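
Roughly, rule 9 applied to the query above corresponds to the following SQL. This is a sketch, not the actual plan: it assumes ?new_keys ends up being all of t1's columns (since t1 has no key) and folds the outer filter into a HAVING clause.

-- the correlated scalar aggregate becomes a left outer join plus a
-- group-by on all of t1's columns
select t1.v1, t1.v2
from t1 left outer join t2 on t2.v3 = t1.v1
group by t1.v1, t1.v2
having sum(t2.v4) > 100;

Grouping on (v1, v2) collapses the two identical (3, 300) rows into one group, which is exactly the duplicate loss observed above (and in that merged group the sum is computed over both duplicates, 600 instead of 300, though the filter still passes here).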

skyzh commented Dec 1, 2024

idea: pass the storage row_id through to the executors? (this is hard; remember the RisingWave primary key issues...)

and does passing row_id fully solve the problem? probably not; think about the case where some operator repeats a row...
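
As a sketch of that idea (the row_id column here is hypothetical; the scan does not currently expose one), the rewrite could group on it so the duplicate rows stay apart:

-- hypothetical: group on a unique per-row id exposed by the scan
select v1, v2
from t1 left outer join t2 on t2.v3 = t1.v1
group by t1.row_id, v1, v2
having sum(t2.v4) > 100;

This keeps the two (3, 300) rows in separate groups, but as noted above, an operator that repeats rows would make row_id non-unique again.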

skyzh commented Dec 1, 2024

or, we need another join to join the result set back to the left table so that duplicate rows get duplicate result rows (as in the Hyper unnesting algorithm)

https://github.com/cmu-db/optd/blob/2dd2a318fa78bacf7c9613ac174896510aecfa1f/optd-sqlplannertest/tests/subqueries/subquery_unnesting.planner.sql#L56-L75
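
As a sketch in SQL (the CTE names domain and agg are just for illustration): evaluate the aggregate once per distinct correlation value, then join the result back to t1, so duplicate left rows naturally produce duplicate output rows.

with domain as (
    -- distinct values of the correlated column
    select distinct v1 from t1
),
agg as (
    -- evaluate the scalar aggregate once per domain value
    select d.v1, sum(t2.v4) as s
    from domain d left outer join t2 on t2.v3 = d.v1
    group by d.v1
)
-- join back: each of the two (3, 300) rows matches agg.v1 = 3
select t1.v1, t1.v2
from t1 join agg on agg.v1 = t1.v1
where agg.s > 100;

This yields (2, 200), (3, 300), (3, 300), as expected.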

wangrunji0408 commented Dec 2, 2024

Maybe we can use the window function row_number() to append a unique key column to the left node. But this only works with a single partition. With multiple partitions, perhaps we need a dedicated operator to generate the column. 🤔
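
A sketch of that idea as plain SQL (t1_keyed and rid are illustrative names, not anything risinglight provides today):

with t1_keyed as (
    -- append a synthetic unique key to the left side
    select v1, v2, row_number() over () as rid from t1
)
select v1, v2
from t1_keyed left outer join t2 on t2.v3 = t1_keyed.v1
group by rid, v1, v2
having sum(t2.v4) > 100;

With rid in the group-by, the two (3, 300) rows land in different groups and both survive; the single-partition limitation comes from evaluating row_number() over () without a partition key.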
