Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PullUpCorrelatedPredicateAggRule is wrong for non-null-propagating expression #19835

Closed
maingoh opened this issue Dec 17, 2024 · 8 comments · Fixed by #20012
Closed

PullUpCorrelatedPredicateAggRule is wrong for non-null-propagating expression #19835

maingoh opened this issue Dec 17, 2024 · 8 comments · Fixed by #20012
Labels
type/bug Something isn't working
Milestone

Comments

@maingoh
Copy link

maingoh commented Dec 17, 2024

Describe the bug

I couldn't create a sample as I don't know how to how to create an empty array of rows. But a query like below return a null map when the subquery return no rows. I expected it to return an empty map instead.

Also, couldn't we simplify the syntax by removing the explicit array() ?

Error message/log

No response

To Reproduce

select map_from_entries(array(select row(foo.id, foo.name) from foo where foo.id=unrelated.id))

Expected behavior

Return an empty map instead of null

How did you deploy RisingWave?

No response

The version of RisingWave

version | PostgreSQL 13.14.0-RisingWave-2.0.4 (2d75798)

Additional context

No response

@maingoh maingoh added the type/bug Something isn't working label Dec 17, 2024
@maingoh maingoh changed the title map_from_entries: empty subquery returns null map_from_entries: empty subquery returns null map Dec 17, 2024
@github-actions github-actions bot added this to the release-2.2 milestone Dec 17, 2024
@xxchan
Copy link
Member

xxchan commented Dec 26, 2024

It seems to return empty map, instead of null:

dev=> select * from t;
┌───┐
│ x │
├───┤
└───┘
(0 rows)

dev=> select map_from_entries( array( select row(x,x) from t ));
┌──────────────────┐
│ map_from_entries │
├──────────────────┤
│ {}               │
└──────────────────┘
(1 row)

Could you provide a reproduciable example?

@maingoh
Copy link
Author

maingoh commented Jan 2, 2025

Sorry I wasn't really sure of why I had null values sometimes, but I could reproduce now. I have 3 tables: foo, bar, baz and I am doing the following query:

CREATE TABLE foo (
       id int
);

CREATE TABLE bar (
       id int,
       foo_id int,
       baz_id int
);

CREATE TABLE baz (
       id int
);

INSERT INTO foo (id) VALUES (1);
INSERT INTO bar (id, foo_id, baz_id) VALUES (1, 1, 123);
INSERT INTO bar (id, foo_id, baz_id) VALUES (2, 1, 123);

foo has one entry. bar has one or multiple records linked to foo and baz has no record. Since baz a no record, the map should be empty.

select
    map_from_entries(array(
        select
             row('foo', 'bar') -- could be anything here
        from bar
        inner join baz
            on baz.id
            = bar.baz_id
        where bar.foo_id = foo.id
    )) as my_map
from foo
where foo.id = 1
;
-[ RECORD 1 ]--
my_map | <null>

The fact that I have one condition in the join and one in the where seem to create a null map
If instead I do inner join baz on baz.id = bar.baz_id AND bar.foo_id = foo.id (both condition in the ON close), I get the empty map.

select
    map_from_entries(array(
        select
             row('foo', 'bar') -- could be anything here
        from bar
        inner join baz
            on baz.id
            = bar.baz_id
        and bar.foo_id = foo.id
    )) as my_map
from foo
where foo.id = 1
;
-[ RECORD 1 ]
my_map | {}

In my opinion both expressions are equivalent no ? It actually generates different query plans whether the condition is in the where clause or in the on close.

@maingoh
Copy link
Author

maingoh commented Jan 2, 2025

where clause:

-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | BatchExchange { order: [], dist: Single }
-[ RECORD 2 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | └─BatchProject { exprs: [MapFromEntries($expr1) as $expr2] }
-[ RECORD 3 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |   └─BatchHashJoin { type: LeftOuter, predicate: foo.id = bar.foo_id }
-[ RECORD 4 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |     ├─BatchExchange { order: [], dist: HashShard(foo.id) }
-[ RECORD 5 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |     │ └─BatchFilter { predicate: (foo.id = 1:Int32) }
-[ RECORD 6 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |     │   └─BatchScan { table: foo, columns: [id] }
-[ RECORD 7 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |     └─BatchProject { exprs: [Coalesce(array_agg('(foo,bar)':Struct(StructType { field_names: [], field_types: [Varchar, Varchar] })), ARRAY[]:List(Struct(StructType { field_names: [], field_types: [Varchar, Varchar] }))) as $expr1, bar.foo_id] }
-[ RECORD 8 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |       └─BatchHashAgg { group_key: [bar.foo_id], aggs: [array_agg('(foo,bar)':Struct(StructType { field_names: [], field_types: [Varchar, Varchar] }))] }
-[ RECORD 9 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |         └─BatchExchange { order: [], dist: HashShard(bar.foo_id) }
-[ RECORD 10 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |           └─BatchProject { exprs: ['(foo,bar)':Struct(StructType { field_names: [], field_types: [Varchar, Varchar] }), bar.foo_id] }
-[ RECORD 11 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |             └─BatchHashJoin { type: Inner, predicate: bar.baz_id = baz.id }
-[ RECORD 12 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |               ├─BatchExchange { order: [], dist: HashShard(bar.baz_id) }
-[ RECORD 13 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |               │ └─BatchScan { table: bar, columns: [foo_id, baz_id] }
-[ RECORD 14 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |               └─BatchExchange { order: [], dist: HashShard(baz.id) }
-[ RECORD 15 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |                 └─BatchScan { table: baz, columns: [id] }

on clause:

QUERY PLAN | BatchExchange { order: [], dist: Single }
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | └─BatchProject { exprs: [MapFromEntries($expr1) as $expr2] }
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |   └─BatchHashJoin { type: LeftOuter, predicate: foo.id IS NOT DISTINCT FROM foo.id }
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |     ├─BatchExchange { order: [], dist: HashShard(foo.id) }
-[ RECORD 5 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |     │ └─BatchFilter { predicate: (foo.id = 1:Int32) }
-[ RECORD 6 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |     │   └─BatchScan { table: foo, columns: [id] }
-[ RECORD 7 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |     └─BatchProject { exprs: [foo.id, Coalesce(array_agg('(foo,bar)':Struct(StructType { field_names: [], field_types: [Varchar, Varchar] })) filter(IsNotNull(1:Int32)), ARRAY[]:List(Struct(StructType { field_names: [], field_types: [Varchar, Varchar] }))) as $expr1] }
-[ RECORD 8 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |       └─BatchHashAgg { group_key: [foo.id], aggs: [array_agg('(foo,bar)':Struct(StructType { field_names: [], field_types: [Varchar, Varchar] })) filter(IsNotNull(1:Int32))] }
-[ RECORD 9 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |         └─BatchHashJoin { type: LeftOuter, predicate: foo.id IS NOT DISTINCT FROM bar.foo_id }
-[ RECORD 10 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |           ├─BatchHashAgg { group_key: [foo.id], aggs: [] }
-[ RECORD 11 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |           │ └─BatchExchange { order: [], dist: HashShard(foo.id) }
-[ RECORD 12 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |           │   └─BatchFilter { predicate: (foo.id = 1:Int32) }
-[ RECORD 13 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |           │     └─BatchScan { table: foo, columns: [id] }
-[ RECORD 14 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |           └─BatchExchange { order: [], dist: HashShard(bar.foo_id) }
-[ RECORD 15 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |             └─BatchProject { exprs: [bar.foo_id, '(foo,bar)':Struct(StructType { field_names: [], field_types: [Varchar, Varchar] }), 1:Int32] }
-[ RECORD 16 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |               └─BatchHashJoin { type: Inner, predicate: bar.baz_id = baz.id }
-[ RECORD 17 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |                 ├─BatchExchange { order: [], dist: HashShard(bar.baz_id) }
-[ RECORD 18 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |                 │ └─BatchFilter { predicate: IsNotNull(bar.foo_id) }
-[ RECORD 19 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |                 │   └─BatchScan { table: bar, columns: [foo_id, baz_id] }
-[ RECORD 20 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |                 └─BatchExchange { order: [], dist: HashShard(baz.id) }
-[ RECORD 21 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |                   └─BatchScan { table: baz, columns: [id] }

We can see some IS NOT DISTINCT FROM or IsNotNull predicates in the second plan that are not present in the first.

@xxchan
Copy link
Member

xxchan commented Jan 3, 2025

So the problem is not in map_from_entries, but array subquery:

CREATE TABLE foo (
       id int
);

CREATE TABLE bar (
       id int,
       foo_id int
);

insert into foo values (1),(2);

-- RW returns 2 null rows, but PG returns 2 empty arrays
select
    array(
        select
             1
        from bar
        where bar.foo_id = foo.id
    )
from foo;

PullUpCorrelatedPredicateAggRule is problematic in this case

│ Begin:                                                                                                  │
│                                                                                                         │
│ LogicalProject { exprs: [$expr1] }                                                                      │
│ └─LogicalApply { type: LeftOuter, on: true, correlated_id: 1, max_one_row: true }                       │
│   ├─LogicalScan { table: foo, columns: [id, _row_id] }                                                  │
│   └─LogicalProject { exprs: [Coalesce(array_agg(1:Int32), ARRAY[]:List(Int32)) as $expr1] }             │
│     └─LogicalAgg { aggs: [array_agg(1:Int32)] }                                                         │
│       └─LogicalProject { exprs: [1:Int32] }                                                             │
│         └─LogicalFilter { predicate: (bar.foo_id = CorrelatedInputRef { index: 0, correlated_id: 1 }) } │
│           └─LogicalScan { table: bar, columns: [id, foo_id, baz_id, _row_id] }                          │
│                                                                                                         │
│ Simple Unnesting:                                                                                       │
│                                                                                                         │
│ apply PullUpCorrelatedPredicateAggRule 1 time(s)                                                        │
│ apply MaxOneRowEliminateRule 1 time(s)                                                                  │
│                                                                                                         │
│ LogicalProject { exprs: [$expr1] }                                                                      │
│ └─LogicalJoin { type: LeftOuter, on: (foo.id = bar.foo_id) }                                            │
│   ├─LogicalScan { table: foo, columns: [id, _row_id] }                                                  │
│   └─LogicalProject { exprs: [Coalesce(array_agg(1:Int32), ARRAY[]:List(Int32)) as $expr1, bar.foo_id] } │
│     └─LogicalAgg { group_key: [bar.foo_id], aggs: [array_agg(1:Int32)] }                                │
│       └─LogicalProject { exprs: [1:Int32, bar.foo_id] }                                                 │
│         └─LogicalScan { table: bar, columns: [id, foo_id, baz_id, _row_id] }                            │
│           

@xxchan
Copy link
Member

xxchan commented Jan 3, 2025

PullUpCorrelatedPredicateAggRule has a special treatment for count (non-null output for empty input), array subquery is actually similar here.

@xxchan
Copy link
Member

xxchan commented Jan 3, 2025

Hmmm, another case:

 select foo.id, (select coalesce( max(id), 114514) from bar where bar.foo_id = foo.id) from foo;

-- got
┌────┬──────────┐
│ id │ ?column? │
├────┼──────────┤
│  1 │        ∅ │
│  2 │        ∅ │
└────┴──────────┘
-- expected
+----+----------+
| id | coalesce |
|----+----------|
| 1  | 114514   |
| 2  | 114514   |
+----+----------+

@xiangjinwu
Copy link
Contributor

PullUpCorrelatedPredicateAggRule has a special treatment for count (non-null output for empty input), array subquery is actually similar here.

Sounds similar to #15590?

@xxchan xxchan changed the title map_from_entries: empty subquery returns null map PullUpCorrelatedPredicateAggRule is wrong for non-null-propagating expression Jan 3, 2025
@maingoh
Copy link
Author

maingoh commented Jan 3, 2025

Wow I didn't expect a resolution as fast, thank you @xxchan !

So the problem is not in map_from_entries, but array subquery

Indeed, sorry as the map feature was quite new I thought it was likely the cause. I have a small question though, as the function is called map_from_entries, I expected that we could use it without having to wrap the subquery with array() but it doesn't seem possible:

select
    map_from_entries(
        select row(1, 1)
        from bar
        where bar.foo_id = foo.id
    )
from foo;

Or wrapped by parenthesis:

select
    map_from_entries((
        select row(1, 1)
        from bar
        where bar.foo_id = foo.id
    ))
from foo;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
3 participants