list_concat([list<T>, list<T>]) gives list<T>, not list<list<T>> #17294

Closed
2 tasks done
NickCrews opened this issue Jun 29, 2024 · 10 comments
Labels
invalid A bug report that is not actually a bug python Related to Python Polars

Comments

@NickCrews

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

pl.select(pl.lit([1]))
# Gives [1], of type list[i64], as expected

pl.select(pl.concat_list([pl.lit([1]), pl.lit([2])]))
# Gives [1, 2], of type list[i64]
# I would expect it to be [[1], [2]], of type list[list[i64]]

Log output

<nothing came out>

Issue description

see above

Expected behavior

see above

Installed versions

--------Version info---------
Polars:               1.0.0-rc.2
Index type:           UInt32
Platform:             macOS-14.5-arm64-arm-64bit
Python:               3.11.6 (main, Oct 23 2023, 09:04:08) [Clang 14.0.0 (clang-1400.0.29.202)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         1.6.0
numpy:                <not installed>
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@NickCrews NickCrews added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jun 29, 2024
@ritchie46
Member

It is a concat list operation. It concats the lists. If you want to create an extra level of nesting you need to implode() the arguments.

@ritchie46 ritchie46 added invalid A bug report that is not actually a bug and removed bug Something isn't working needs triage Awaiting prioritization by a maintainer labels Jun 30, 2024
@mcrumiller
Contributor

mcrumiller commented Jun 30, 2024

@ritchie46 I think this is a valid issue. Why do the following two produce the same result?

pl.DataFrame({"a": [1], "b": [2]}).select(
    pl.concat_list("a", "b")
)
# shape: (1, 1)
# ┌───────────┐
# │ a         │
# │ ---       │
# │ list[i64] │
# ╞═══════════╡
# │ [1, 2]    │
# └───────────┘

pl.DataFrame({"a": [[1]], "b": [[2]]}).select(  # note the [[1]]
    pl.concat_list("a", "b")
)
# shape: (1, 1)
# ┌───────────┐
# │ a         │
# │ ---       │
# │ list[i64] │
# ╞═══════════╡
# │ [1, 2]    │
# └───────────┘

The documentation says "Horizontally concatenate columns into a single list column." If the columns are lists, then they should form a list of lists.

Edit: I just noticed that the 1.0 documentation has additional text, not present in the current 0.2 documentation, that clarifies this.

@mcrumiller
Contributor

FYI I feel that pl.list would be a better name if the intended behavior is what's written in the description. I agree that the word "concat" implies that you take existing lists and concatenate them together, but I believe the intent is to take existing columns and concatenate them into a list, in which case the inner dtype of the list should be the common supertype of the columns.

@ritchie46
Member

The initial implosion is because there isn't any list yet. So in order to concat them, they are imploded.

I think we should improve the description here.

@mcrumiller
Contributor

@ritchie46 the description's improved in the 1.0 docs, I had missed that.

@NickCrews
Author

@mcrumiller pinpoints my exact confusion. I agree there should be two functions: one that is a constructor, as in Iterable[T] -> List[T], and another that concatenates, Iterable[List[T]] -> List[T].
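
The two signatures can be sketched in plain Python, with no Polars involved. The names `make_list` and `concat_lists` are hypothetical and exist only to illustrate the distinction; neither is a Polars function.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def make_list(items: Iterable[T]) -> list[T]:
    """Constructor: collect the given values into one new list level."""
    return list(items)


def concat_lists(lists: Iterable[list[T]]) -> list[T]:
    """Concatenation: join existing lists without adding a level."""
    out: list[T] = []
    for lst in lists:
        out.extend(lst)
    return out


row = [[1], [2]]  # two values that each happen to be a list

constructed = make_list(row)      # keeps the inner lists as elements
concatenated = concat_lists(row)  # flattens one level

print(constructed, concatenated)
```

Given the same inputs, the constructor adds a level of nesting while the concatenation does not, which is the gap between the expected and the observed result in the report.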

@NickCrews
Author

I'm looking for the constructor. What could I use for that currently?

@mcrumiller
Contributor

Hi @NickCrews, can you be a little more specific about what you want to do?

@NickCrews
Author

Yes, but I'll get to it in an hour or two once I can get back to my computer

@NickCrews
Author

Actually, I think we should move all discussion to #8510.

Specifically, I responded in this comment in that issue.

gforsyth pushed a commit to ibis-project/ibis that referenced this issue Jul 1, 2024
Workaround for pola-rs/polars#17294

pl.concat_list(Iterable[T]) results in pl.List[T], EXCEPT when T is a
pl.List, in which case pl.concat_list(Iterable[pl.List[T]]) results in
pl.List[T].
If polars ever supports a more consistent array constructor,
we should switch to that.

Found this when working on
#9473