Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timestamps and estimateIncidence #12

Open
tleht opened this issue Jan 10, 2025 · 0 comments
Open

Timestamps and estimateIncidence #12

tleht opened this issue Jan 10, 2025 · 0 comments

Comments

@tleht
Copy link

tleht commented Jan 10, 2025

Describe the bug
Currently estimateIncidence accepts cohort start and end dates in denominator and outcome tables in both date and timestamp data types. The function performs as.Date calls on these variables when it calls IncidencePrevalence:::getIncidence. This can lead to incorrect incidences when input tables contain start and end dates in different data types and the system time zone is not UTC+0

To Reproduce
Can't figure a simple example here because DuckDb only supports UTC+0 timestamps.

Additional context
In a recent DARWIN project our node ran into an issue where the number of events and incidences we got from this function were far lower than they should have been in a single cohort. What turned out to be causing this issue was the fact that the denominatorTable for specifically this cohort had cohort_start_date as a date and cohort_end_date as a timestamp due to unintended behaviour in cohort generation (both were meant to be dates by the writers of the script, but this ended up not being the case specifically in PostgreSQL). The time stamps were of the form YYYY-MM-DD 00:00:00 and had our system time zone UTC+2. The default behaviour of as.Date is to convert the time zone of POSIX* timestamps into UTC+0. This means that when as.Date calls are made in IncidencePrevalence:::getIncidence, all the cohort_end_dates ended up moving back a day (e.g. 2023-01-01 00:00:00 UTC+2 -> 2022-12-31 22:00:00 UTC+0 -> 2022-12-31). This lead to vast majority of the cases being excluded in the incidence calculations.

To prevent this, the function should probably at least check that all time variables given to it as input have the same data type. There might also be something to be said about having the option to either give relevant tz argument to the as.Date calls and/or using the system timezone by default by having the calls be as.Date(..., tz = "") when timestamp data is being used

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant