
Make regression recognize strings/enums/categoricals and convert them to dummies, AND also add a helper function for interactions. #312

Open
deanm0000 opened this issue Jan 17, 2025 · 0 comments

@deanm0000

It'd be a nice feature if the regression functions recognized str/enum/cat columns and automatically converted them to dummies.

You'd need a loop over the inputs that checks each one's type and, if it is one of those types, converts it to dummies. Since `df.to_dummies` already exists, that shouldn't be too bad.
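A rough plain-Python sketch of that loop, assuming columns are modeled as a dict of lists rather than a Polars frame. `expand_dummies` and `is_categorical` are hypothetical names; the `is_categorical` check stands in for a dtype check against String/Enum/Categorical, and the `{column}_{level}` naming mirrors the default `_` separator of `df.to_dummies`.

```python
from typing import Any


def is_categorical(values: list[Any]) -> bool:
    """Stand-in for a dtype check (String/Enum/Categorical in Polars):
    treat a column as categorical when every value is a string."""
    return len(values) > 0 and all(isinstance(v, str) for v in values)


def expand_dummies(
    columns: dict[str, list[Any]], drop_first: bool = False
) -> dict[str, list[float]]:
    """Replace each categorical column with one 0/1 column per level;
    numeric columns pass through unchanged (cast to float)."""
    out: dict[str, list[float]] = {}
    for name, values in columns.items():
        if not is_categorical(values):
            out[name] = [float(v) for v in values]
            continue
        levels = sorted(set(values))  # sorted so the column order is deterministic
        if drop_first:
            levels = levels[1:]  # drop a reference level to avoid collinearity with a bias term
        for level in levels:
            out[f"{name}_{level}"] = [1.0 if v == level else 0.0 for v in values]
    return out


data = {"color": ["red", "green", "red"], "height": [1.0, 2.0, 3.0]}
print(expand_dummies(data))
# {'color_green': [0.0, 1.0, 0.0], 'color_red': [1.0, 0.0, 1.0], 'height': [1.0, 2.0, 3.0]}
```

With `add_bias=True`, keeping every level makes the dummy columns of one variable sum to the intercept column, so dropping one level per variable (as `drop_first=True` does in `df.to_dummies`) is probably the safer default.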

For interactions, on the user side I'm thinking of something like:

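```python
import polars_ds as pds
from polars_ds import interaction_helper as ih  # proposed helper; does not exist yet

# "color" and "height" are passed once inside ih() for the interaction and again as main effects
df.select(pds.lin_reg_report(ih("color", "height"), "color", "height", target="width", add_bias=True))
```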

The only way I can think of to implement that would be to have `ih` wrap the columns to be interacted in a `pl.struct`. Then, on the Rust side, we can check whether an input is a struct column; if it is, we first check whether any of its sub-columns should be turned into dummies, and then multiply the columns in the struct together to form a single new column.
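The struct idea can be sketched in plain Python, with a tuple of column names standing in for the `pl.struct` that `ih` would build. `interaction_columns` and `field_group` are hypothetical helpers: each field is first expanded into dummies if it is categorical (a numeric field is a group of one column), and then one product column is formed for every combination of one column per field. A categorical field therefore yields one interaction column per level rather than a single column.

```python
from itertools import product
from typing import Any


def field_group(name: str, values: list[Any]) -> dict[str, list[float]]:
    """Dummy-expand a categorical (all-string) field; a numeric field becomes a group of one."""
    if values and all(isinstance(v, str) for v in values):
        return {
            f"{name}_{level}": [1.0 if v == level else 0.0 for v in values]
            for level in sorted(set(values))
        }
    return {name: [float(v) for v in values]}


def interaction_columns(
    columns: dict[str, list[Any]], fields: tuple[str, ...]
) -> dict[str, list[float]]:
    """Multiply the (dummy-expanded) fields of one interaction term."""
    groups = [field_group(f, columns[f]) for f in fields]
    n_rows = len(columns[fields[0]])
    out: dict[str, list[float]] = {}
    # One output column per combination of one column from each field's group.
    for combo in product(*(g.items() for g in groups)):
        name = "_".join(col_name for col_name, _ in combo)
        values = [1.0] * n_rows
        for _, col in combo:
            values = [a * b for a, b in zip(values, col)]
        out[name] = values
    return out


data = {"color": ["red", "green", "red"], "height": [1.0, 2.0, 3.0]}
print(interaction_columns(data, ("color", "height")))
# {'color_green_height': [0.0, 2.0, 0.0], 'color_red_height': [1.0, 0.0, 3.0]}
```

Multiplying by a 0/1 dummy simply masks the other column, so `color_red_height` equals `height` where the color is red and 0 elsewhere.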

But wait, there's more: the helper function wouldn't merely be a wrapper for `pl.struct`. It could have parameters that let you type the columns only once, so instead of the above you could do:

```python
df.select(pds.lin_reg_report(*ih("color", "height", include_main=True), target="width", add_bias=True))
```

Additionally, it could take a `product` parameter (or maybe call it `cartesian`): if the user passes 3 or more variables, `product=True` would create every interaction among them, while `product=False` would only create the single interaction of all 3 together.

That would look like:

```python
df.select(pds.lin_reg_report(*ih("color", "shape", "height", include_main=True, product=True), target="width", add_bias=True))
```

which would have the regressors `red_round_height`, `red_round`, `red_square_height`, `red_square`, `red_height`, ..., `green_height`. (I don't want to type out any more examples, but the colors turn into dummies, the shapes turn into dummies, height stays a float, and every possible combination between them becomes its own interaction.)
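A rough sketch of how `include_main` and `product` could decide which terms become regressors, assuming the semantics described above. `interaction_terms` and `regressor_names` are hypothetical; the levels are passed in explicitly instead of being read from the data, and `None` marks a numeric column.

```python
import itertools
from typing import Optional


def interaction_terms(
    *cols: str, include_main: bool = False, product: bool = False
) -> list[tuple[str, ...]]:
    """Which column combinations become regressors (hypothetical `ih` semantics)."""
    if product:
        # Every combination of 2 or more of the columns, largest first.
        terms = [
            combo
            for size in range(len(cols), 1, -1)
            for combo in itertools.combinations(cols, size)
        ]
    else:
        # Only the single term with all columns together.
        terms = [tuple(cols)]
    if include_main:
        terms += [(c,) for c in cols]  # main effects
    return terms


def regressor_names(
    terms: list[tuple[str, ...]], levels: dict[str, Optional[list[str]]]
) -> list[str]:
    """Expand each term into names; a categorical column contributes one part per level."""
    names: list[str] = []
    for term in terms:
        parts = [levels[c] if levels[c] is not None else [c] for c in term]
        names += ["_".join(p) for p in itertools.product(*parts)]
    return names


levels = {"color": ["red", "green"], "shape": ["round", "square"], "height": None}
terms = interaction_terms("color", "shape", "height", include_main=True, product=True)
names = regressor_names(terms, levels)
print(len(names))  # 17
print(names[:4])   # ['red_round_height', 'red_square_height', 'green_round_height', 'green_square_height']
```

The names use bare level names to match the list above; a real implementation would likely keep the column prefix (as `to_dummies` does) so two columns that share a level name don't clash.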
