subquery unnesting duplicate row bug #858

Open
skyzh opened this issue Dec 1, 2024 · 3 comments

skyzh commented Dec 1, 2024

> create table t1(v1 int, v2 int);
created
in 0.028s
> create table t2(v3 int, v4 int);
created
in 0.009s
> insert into t1 values (1, 100), (2, 200), (3, 300), (3, 300);
2024-12-01T06:02:14.883496Z  INFO executor: risinglight::storage::secondary::transaction: RowSet #0 flushed, DV  flushed id=14 name="insert"
4 rows inserted
in 0.047s
> insert into t2 values (2, 200), (3, 300);
2024-12-01T06:02:29.516292Z  INFO executor: risinglight::storage::secondary::transaction: RowSet #1 flushed, DV  flushed id=11 name="insert"
2 rows inserted
in 0.044s
> select * from t1 where (select sum(v4) from t2 where v3 = v1) > 100;
+---+-----+
| 2 | 200 |
| 3 | 300 |
+---+-----+
in 0.038s

The result should be

2 200
3 300
3 300

which means that rules 8/9 don't seem to be correct.

// Figure 4 Rule (8)
rw!("pushdown-apply-group-agg";
    "(apply inner ?left (hashagg ?keys ?aggs ?right))" =>
    // ?new_keys = ?left || ?keys
    { extract_key("(hashagg ?new_keys ?aggs (apply inner ?left ?right))") }
    // FIXME: this rule is correct only if
    // 1. all aggregate functions satisfy: agg({}) = agg({null})
    // 2. the left table has a key
),
// Figure 4 Rule (9)
rw!("pushdown-apply-scalar-agg";
    "(apply inner ?left (agg ?aggs ?right))" =>
    // ?new_keys = ?left
    { extract_key("(hashagg ?new_keys ?aggs (apply left_outer ?left ?right))") }
    // FIXME: this rule is correct only if
    // 1. all aggregate functions satisfy: agg({}) = agg({null})
    // 2. the left table has a key
),
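
Roughly, rule 9 applied to the query above corresponds to the following SQL. This is a sketch, not the actual plan: it assumes ?new_keys ends up being all of t1's columns (since t1 has no key) and folds the outer filter into a HAVING clause.

-- the correlated scalar aggregate becomes a left outer join plus a
-- group-by on all of t1's columns
select t1.v1, t1.v2
from t1 left outer join t2 on t2.v3 = t1.v1
group by t1.v1, t1.v2
having sum(t2.v4) > 100;

Grouping on (v1, v2) collapses the two identical (3, 300) rows into one group, which is exactly the duplicate loss observed above (and in that merged group the sum is computed over both duplicates, 600 instead of 300, though the filter still passes here).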

skyzh commented Dec 1, 2024

idea: pass the storage row_id through to the executors? (this is hard; remember the RisingWave primary key issues...)

and does passing row_id fully solve the problem? probably not; think about the case where some operator repeats a row...
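
As a sketch of that idea (the row_id column here is hypothetical; the scan does not currently expose one), the rewrite could group on it so the duplicate rows stay apart:

-- hypothetical: group on a unique per-row id exposed by the scan
select v1, v2
from t1 left outer join t2 on t2.v3 = t1.v1
group by t1.row_id, v1, v2
having sum(t2.v4) > 100;

This keeps the two (3, 300) rows in separate groups, but as noted above, an operator that repeats rows would make row_id non-unique again.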

skyzh commented Dec 1, 2024

or, we need another join to join the result set back to the left table so that duplicate rows get duplicate result rows (as in the Hyper unnesting algorithm)

https://github.com/cmu-db/optd/blob/2dd2a318fa78bacf7c9613ac174896510aecfa1f/optd-sqlplannertest/tests/subqueries/subquery_unnesting.planner.sql#L56-L75
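
As a sketch in SQL (the CTE names domain and agg are just for illustration): evaluate the aggregate once per distinct correlation value, then join the result back to t1, so duplicate left rows naturally produce duplicate output rows.

with domain as (
    -- distinct values of the correlated column
    select distinct v1 from t1
),
agg as (
    -- evaluate the scalar aggregate once per domain value
    select d.v1, sum(t2.v4) as s
    from domain d left outer join t2 on t2.v3 = d.v1
    group by d.v1
)
-- join back: each of the two (3, 300) rows matches agg.v1 = 3
select t1.v1, t1.v2
from t1 join agg on agg.v1 = t1.v1
where agg.s > 100;

This yields (2, 200), (3, 300), (3, 300), as expected.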

wangrunji0408 commented Dec 2, 2024

Maybe we can use the window function row_number() to append a unique key column to the left node. But this only works with a single partition. With multiple partitions, perhaps we need a dedicated operator to generate the column. 🤔
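
A sketch of that idea as plain SQL (t1_keyed and rid are illustrative names, not anything risinglight provides today):

with t1_keyed as (
    -- append a synthetic unique key to the left side
    select v1, v2, row_number() over () as rid from t1
)
select v1, v2
from t1_keyed left outer join t2 on t2.v3 = t1_keyed.v1
group by rid, v1, v2
having sum(t2.v4) > 100;

With rid in the group-by, the two (3, 300) rows land in different groups and both survive; the single-partition limitation comes from evaluating row_number() over () without a partition key.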
