Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Datetime type for ranges outside datetime.[MIN|MAX]YEAR #310

Closed

Conversation

sylwiaszunejko
Copy link
Collaborator

@sylwiaszunejko sylwiaszunejko commented Mar 12, 2024

In Python datetime.datetime type year has to be in range [MINYEAR, MAXYEAR]. This range is not the same as possible timestamps in scylla. Previously if timestamp was outside this range it made driver raise an Exception. It was not correct behavior. There was a work around implemented in cqlsh.

This PR introduces a Datetime type to accommodate ranges outside datetime.[MIN|MAX]YEAR. For Datetimes that cannot be represented as a datetime.datetime (because datetime.MINYEAR, datetime.MAXYEAR), this type falls back to printing milliseconds_from_epoch offset.

Something similar was introduced before for datetime.date type - 4f3c77c.

Fixes: #255

@sylwiaszunejko
Copy link
Collaborator Author

I'm not sure about a few things:

  • is using milliseconds the right choice?
  • should I add support for passing strings as an argument to the Datetime constructor?

@sylwiaszunejko sylwiaszunejko marked this pull request as draft March 12, 2024 16:38
@sylwiaszunejko sylwiaszunejko force-pushed the timestamp_fail branch 2 times, most recently from 4644ed8 to be49277 Compare March 15, 2024 13:20
@sylwiaszunejko
Copy link
Collaborator Author

@avelanarius @Lorak-mmk I made some extra changes to make CI work, also I removed one test test_row_error_message because this test required driver to raise an Exception when timestamp is outside the range [MIN|MAX]YEAR, but after my change this is not a correct behavior anymore.

@nyh
Copy link

nyh commented Mar 18, 2024

Maybe you can add the two examples I had in #255 (comment) and #255 (comment) as tests? These two cases used to show broken behavior before you patch, and we'll probably want to know it was fixed by the patch.

@sylwiaszunejko
Copy link
Collaborator Author

@nyh I added the test you asked about

@avelanarius
Copy link

You should also update the docs page about this change (https://github.com/scylladb/python-driver/blob/master/docs/dates-and-times.rst) and possibly other pages if they mention the old way the driver parsed datetimes.

@avelanarius
Copy link

As for some additional testing, I think you should add an additional small test to https://github.com/scylladb/python-driver/blob/27122038b9477eb30ad43f9afdb9f0d79e25182f/tests/integration/standard/test_types.py.

@sylwiaszunejko
Copy link
Collaborator Author

As for some additional testing, I think you should add an additional small test to https://github.com/scylladb/python-driver/blob/27122038b9477eb30ad43f9afdb9f0d79e25182f/tests/integration/standard/test_types.py.

What kind of test do you mean? .util.Datetime is tested every time get_sample is called on timestamp and it happens multiple times in test_types.py for example in test_can_insert_primitive_datatypes

In Python `datetime.datetime` type year has to be in range
[MINYEAR, MAXYEAR]. This range is not the same as possible timestamps
in scylla. Previously if timestamp was outside this range it made
driver raise an Exception. It was not correct behavior. There was a
work around implemented in cqlsh.

This commit introduces a `Datetime` type to accommodate ranges outside
datetime.[MIN|MAX]YEAR. For Datetimes that cannot be represented as
a datetime.datetime (because datetime.MINYEAR, datetime.MAXYEAR),
this type falls back to printing milliseconds_from_epoch offset.

Fixes: scylladb#255
@fruch
Copy link

fruch commented Mar 20, 2024

Two things,

this change might break users code, since it's returning a different type than it used to.
Duck typing and all, people that have code doing things based on return types, might get broken.

We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)

@Lorak-mmk
Copy link

Two things,

this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken.

We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)

This is absolutely a breaking change, I wonder if we shouldn't bump our major version number because of it. The downside is that syncing with upstream would become even more problematic...

@sylwiaszunejko
Copy link
Collaborator Author

Two things,
this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken.
We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)

This is absolutely a breaking change, I wonder if we shouldn't bump our major version number because of it. The downside is that syncing with upstream would become even more problematic...

@Lorak-mmk What do you suggest? Because this issue was discussed in upstream years ago, but they decided to do workaround in cqlsh instead of fixing it https://issues.apache.org/jira/browse/CASSANDRA-10625. @nyh was against this solution and said we should fix it, maybe we should have a discussion about that? I see there are a lot of things to consider. @avelanarius @fruch

@roydahan
Copy link
Collaborator

@sylwiaszunejko thanks for owning this.
I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start.

When we have that feedback, we can get back to it.
IIRC this is an edge cases that we rarly meet in testing and never seen in the field.
If that’s the case, there shouldn’t be an urgency to merge it for now.

@fruch
Copy link

fruch commented Mar 21, 2024

@sylwiaszunejko thanks for owning this.
I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start.

When we have that feedback, we can get back to it.
IIRC this is an edge cases that we rarly meet in testing and never seen in the field.

I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.

And once when scylla-bench was randomizing dates with all 64bits, which was fixed.

I don't think we got a single report for a user about this.

If that’s the case, there shouldn’t be an urgency to merge it for now.

@nyh
Copy link

nyh commented Mar 21, 2024

@sylwiaszunejko thanks for owning this.
I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start.
When we have that feedback, we can get back to it.
IIRC this is an edge cases that we rarly meet in testing and never seen in the field.

I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.

And once when scylla-bench was randomizing dates with all 64bits, which was fixed.

I don't think we got a single report for a user about this.

This is not surprising, given that the problem is about dates millions of years into the past or future. Maybe not a lot of paleontologists or science-fiction writers use Scylla :-)

@Lorak-mmk
Copy link

If the problem is not encountered by users and is generally not severe then I don't think it's worth it to break compatibility to fix it - and I suspect upstream will say the same.

@fruch
Copy link

fruch commented Mar 21, 2024

@sylwiaszunejko thanks for owning this.
I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start.
When we have that feedback, we can get back to it.
IIRC this is an edge cases that we rarly meet in testing and never seen in the field.

I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.

And once when scylla-bench was randomizing dates with all 64bits, which was fixed.

I don't think we got a single report for a user about this.

This is not surprising, given that the problem is about dates millions of years into the past or future. Maybe not a lot of paleontologists or science-fiction writers use Scylla :-)

You are correct it doesn't fully adhere to the CQL spec in that

I think those really need that kind of dates, can override instruct the driver not to convert, or do the same hack done in cqlsh.

Still we can suggest it upstream, and maybe they would be convinced

@sylwiaszunejko
Copy link
Collaborator Author

sylwiaszunejko commented Mar 25, 2024

@avelanarius @fruch @nyh @Lorak-mmk What should be my next steps? How should I contact upstream? Maybe you have any idea how to fix the issue without breaking compatibility?

@fruch
Copy link

fruch commented Mar 25, 2024

@avelanarius @fruch @nyh @Lorak-mmk What should be my next steps? How should I contact upstream

Raise an issue, and open a PR similar to this one.

it might take some time until they would react to those, but that how we were doing those so far.

I guess you can find some cassandra mailing to discuss as well

@Lorak-mmk
Copy link

Can this PR be closed now that upstream PR is open?

@Lorak-mmk Lorak-mmk removed their request for review April 29, 2024 15:54
@roydahan
Copy link
Collaborator

We shall assume that the upstream PR won't get noticed for quite a while, so the question is what to do with the issue this PR claims to fix: #255

It's ok to provide just an answer there or reference to relevant docs and close the issue if we believe this is what's need to be done.

@sylwiaszunejko
Copy link
Collaborator Author

We are not merging it to not break backwards compatibility, but if someone needs datetimes outside datetime.[MIN|MAX]YEAR feel free to cherry-pick.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants