-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add linear_space
#20678
base: main
Are you sure you want to change the base?
feat: Add linear_space
#20678
Conversation
|
||
#[derive(Copy, Clone, Debug, Hash, Eq, PartialEq, Default, IntoStaticStr)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved ClosedInterval
into linear_space
since it's feature-gated here and I wanted to re-use it. I'm not sure if there's a better place for it.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #20678 +/- ##
========================================
Coverage 79.73% 79.74%
========================================
Files 1561 1564 +3
Lines 221913 222097 +184
Branches 2530 2531 +1
========================================
+ Hits 176943 177105 +162
- Misses 44388 44410 +22
Partials 582 582 ☔ View full report in Codecov by Sentry. |
I wasn't sure about that one myself. Will update. |
@mcrumiller I changed my mind right after and deleted my comment, but I guess I wasn't fast enough. I think it's alright to have - equidistant points over the range, not including either start or end. I'm not sure about the name |
I was thinking about this as well. A better-suited parameter name might be something like I grabbed the interval enumeration from |
Let's keep using the established closed terminology then. |
Sure. Do you have any comments on improvements? Should we include a return 'dtype' parameter like in numpy's implementation? |
Rust side mostly looks good to me. I don't think we need a dtype parameter at this time. However I think there's still a couple things missing around dtypes:
|
d64b15c
to
97bbae8
Compare
@orlp, latest round of updates addressing your suggestions:
I've added separate floating implementations (I originally went for generics but it felt messy to me, so I just made two separate implementations). Included new tests.
I've added a temporal path. start/end points with dates produces a
Done--only numeric/temporal work. Again, the start/end must match, and anything outside of these dtypes throws an Exception. |
closed: ClosedInterval = "both", | ||
eager: bool = False, | ||
) -> Expr | Series: | ||
"""Linearly-spaced elements.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the documentation could be a bit better, perhaps with an example for start = 0, end = 1
, num_samples = 4
for each of the closed
options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep forgot docstrings.
@orlp I think we may want |
@mcrumiller Yes, I agree, but it must resolve to a single value, similar to this error: >>> df = pl.DataFrame({"x": [1, 2, 3]})
>>> df.select(pl.int_range(0, pl.col.x))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/orlp/.localpython/lib/python3.11/site-packages/polars/dataframe/frame.py", line 9323, in select
return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/orlp/.localpython/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2057, in collect
return wrap_df(ldf.collect(callback))
^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: `end` must contain exactly one value, got 3 values |
2f39ae6
to
6627366
Compare
@orlp updated |
@@ -70,6 +75,7 @@ impl RangeFunction { | |||
match self { | |||
IntRange { dtype, .. } => mapper.with_dtype(dtype.clone()), | |||
IntRanges => mapper.with_dtype(DataType::List(Box::new(DataType::Int64))), | |||
LinearSpace { closed: _ } => mapper.with_dtype(DataType::Float64), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer correct, right? The output dtype isn't just Float64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Orson, should have used LazyFrames in my tests. Fixed.
I now assume in the actual implementation that the schema has been validated already, which lets me remove a match arm. Is this ok to do? In other words, do I need to re-check the dtypes in the linear space implementation, or can I assume that an invalid input schema would have already been caught in the dsl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mcrumiller Please verify the types in the implementation as extra backup, I don't think eliding it makes much difference for performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've reverted to include the type verification in the implementation.
start | ||
Lower bound of the range. | ||
end | ||
Upper bound of the time range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this say "time range"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
0.333333 | ||
0.666667 | ||
1.0 | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a closed="both"
example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a closed="none"
example which was also missing. Do you think I should be explicit in the base example with the closed="both"
parameter? This is the default so it's not strictly necessary.
>>> pl.linear_space(start=0, end=1, num_samples=3, eager=True)
shape: (3,)
Series: 'literal' [f64]
[
0.0
0.5
1.0
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I have now explicitly provided closed="both"
in the example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant closed="none"
, sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I've removed the closed="both"
explicit declaration since it shows the default usage a bit better.
e513cca
to
29b6992
Compare
@orlp I think I have addressed all of your concerns. Thanks for the thorough review. |
First pass at #5255.
Update: linear space with temporals is now implemented, along with an
f32
path, and stronger testing.I kept the name
linear_space
since that's a more canonical polars name, althoughlinspace
is far more recognizable. The usage is fairly simple, and theclosed
is one ofboth
(default),left
,right
, ornone
. In each case, the number of samples asked for is provided.