Add padding method to List datatype #10283

Open
balakhaniyan opened this issue Aug 3, 2023 · 11 comments · May be fixed by #20674
Labels
accepted (Ready for implementation), enhancement (New feature or an improvement of an existing feature)

Comments

@balakhaniyan

Problem description

Is it possible to add a method to the list namespace so we can justify (pad) lists, just like str?
Example:

import polars as pl

ser = pl.Series([[1, 2, 3, 4], [2, 3, 4, 5, 6], [3, 2, 1]])
ser.list.pad(5, 0)  # proposed method; this is the API being requested
# output:
# pl.Series([[0, 1, 2, 3, 4], [2, 3, 4, 5, 6], [0, 0, 3, 2, 1]])

It could also have more features, like clipping lists that are longer than the target length.
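
To pin down the requested behaviour, here is a minimal plain-Python sketch of the semantics described above. The name `pad_start`, the `clip` flag, and the choice to keep the last items when clipping are all assumptions made for illustration; none of them come from Polars.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def pad_start(values: Sequence[T], length: int, fill: T, clip: bool = False) -> list[T]:
    """Left-pad `values` with `fill` up to `length` (hypothetical helper)."""
    items = list(values)
    if clip and len(items) > length:
        # Assumption: when clipping, keep the last `length` items,
        # since padding is applied on the left.
        items = items[len(items) - length:]
    # Lists that are already long enough receive no padding.
    missing = max(length - len(items), 0)
    return [fill] * missing + items


rows = [[1, 2, 3, 4], [2, 3, 4, 5, 6], [3, 2, 1]]
padded = [pad_start(row, 5, 0) for row in rows]
```

With `clip=False` a list longer than the target is returned unchanged; with `clip=True` it is truncated to the target length.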

@balakhaniyan balakhaniyan added the enhancement (New feature or an improvement of an existing feature) label Aug 3, 2023
@orlp orlp added the accepted (Ready for implementation) label Aug 4, 2023
@github-project-automation github-project-automation bot moved this to Ready in Backlog Aug 4, 2023
@orlp
Collaborator

orlp commented Aug 4, 2023

I discussed this with @ritchie46 and we think it's a reasonable request.

@orlp orlp changed the title from "just list column" to "Add padding method to List datatype" Aug 4, 2023
@ion-elgreco
Contributor

In the meantime, you could do something like this:

df = pl.DataFrame(ser)
(df
 .with_columns(pl.lit(0).alias('pad_value'))
 .with_columns(
     pl.concat_list(
         pl.col('pad_value').repeat_by(5 - pl.col('column_0').list.lengths()), 
         'column_0')
         .alias('column_0'))
 .drop('pad_value'))

@ion-elgreco
Contributor

Related issue that was suggested by @ritchie46 (extend_constant): #2810

@ritchie46
Member

@reswqa as discussed, feel free to pick this up. :)

@balakhaniyan
Author

@ritchie46 I want to do it myself :) Can I?

@reswqa
Collaborator

reswqa commented Dec 18, 2023

@balakhaniyan Of course, feel free to pick this up. ritchie and I will help review the code then.

@AndhikaWB

AndhikaWB commented Dec 9, 2024

Would love this feature! I often need it for machine learning purposes.

Can't believe strings have pad_end and pad_start but lists don't have them yet.

I may be able to help, but I'm new to this kind of thing and not sure where to look for the _pyexpr code:

return wrap_expr(self._pyexpr.str_pad_start(length, fill_char))

I searched py-polars/polars/expr/expr.py, but there doesn't seem to be a function called str_pad_start there.

@coastalwhite
Collaborator

It is in Rust, under crates/polars-python/src/expr.

@cmdlineluser
Contributor

It calls .str().pad_start() on the Rust side:

pub(super) fn pad_start<'a>(


.list.lengths() has also since been renamed to .list.len(), so the repeat_by approach mentioned above becomes:

import polars as pl

s = pl.Series([[1, 2, 3, 4], [2, 3, 4, 5, 6], [3, 2, 1]])
df = s.to_frame("x")

df.with_columns(
    pl.lit(0).repeat_by(5 - pl.col("x").list.len())
      .list.concat("x")
      .alias("list.pad_start")
)

# shape: (3, 2)
# ┌─────────────────┬─────────────────┐
# │ x               ┆ list.pad_start  │
# │ ---             ┆ ---             │
# │ list[i64]       ┆ list[i64]       │
# ╞═════════════════╪═════════════════╡
# │ [1, 2, 3, 4]    ┆ [0, 1, 2, 3, 4] │
# │ [2, 3, 4, 5, 6] ┆ [2, 3, 4, 5, 6] │
# │ [3, 2, 1]       ┆ [0, 0, 3, 2, 1] │
# └─────────────────┴─────────────────┘

@AndhikaWB

Sorry, can't help with that, I'm not familiar enough with Rust 😅

I posted my previous comment on impulse, forgetting that the Polars core is written in Rust.

@ion-elgreco
Contributor

@AndhikaWB if you try, you will learn it ;)

@etiennebacher etiennebacher linked a pull request Jan 12, 2025 that will close this issue
8 participants