I've discovered that mismo.lib.geo.CoordinateBlocker doesn't handle missing values as I'd expect.
If a record has a missing coordinate value, I would not expect it to be blocked as the returned distance would be NaN.
The following example shows that records with a null coordinate value are indeed blocked together:
    from mismo.lib.geo import CoordinateBlocker
    import ibis

    ibis.options.interactive = True
    con = ibis.get_backend()
    data = [
        {"record_id": 1, "lat": 1, "lon": 1},
        {"record_id": 2, "lat": 2, "lon": None},
        {"record_id": 3, "lat": 3, "lon": None},
    ]
    table = con.create_table("test", ibis.memtable(data), overwrite=True)
    blocker = CoordinateBlocker(lat="lat", lon="lon", distance_km=1000)
    blocker(table, table)

    ┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
    ┃ record_id_l ┃ record_id_r ┃ lat_l ┃ lat_r ┃ lon_l   ┃ lon_r   ┃
    ┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
    │ int64       │ int64       │ int64 │ int64 │ float64 │ float64 │
    ├─────────────┼─────────────┼───────┼───────┼─────────┼─────────┤
    │           2 │           3 │     2 │     3 │    NULL │    NULL │
    └─────────────┴─────────────┴───────┴───────┴─────────┴─────────┘
In this case, I can see that mismo.lib.geo.distance_km evaluates to NULL.
I think this can be resolved by modifying the logic here so that it returns null if either lat or lon is null.
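To make the behavior I'd expect concrete, here is a rough sketch in plain Python rather than ibis. It is not mismo's implementation: `distance_km` and `block_pairs` below are hypothetical stand-ins I wrote for illustration, and the haversine formula with a 6371 km Earth radius is my own assumption about how the distance might be computed. The point is only that a missing coordinate yields a missing distance, and a missing distance never passes the threshold check.

```python
import math
from itertools import combinations
from typing import Optional

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from mismo


def distance_km(lat1, lon1, lat2, lon2) -> Optional[float]:
    """Haversine distance in km, or None if any coordinate is missing.

    Hypothetical stand-in for illustration, not mismo.lib.geo.distance_km.
    """
    if any(v is None for v in (lat1, lon1, lat2, lon2)):
        return None
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def block_pairs(records, max_km):
    """Return (id_l, id_r) pairs whose distance is known and within max_km."""
    pairs = []
    for left, right in combinations(records, 2):
        d = distance_km(left["lat"], left["lon"], right["lat"], right["lon"])
        # A missing distance must not count as "close enough".
        if d is not None and d <= max_km:
            pairs.append((left["record_id"], right["record_id"]))
    return pairs


records = [
    {"record_id": 1, "lat": 1, "lon": 1},
    {"record_id": 2, "lat": 2, "lon": None},
    {"record_id": 3, "lat": 3, "lon": None},
]
print(block_pairs(records, max_km=1000))  # []
```

With the same three records as in the example above, no pair is returned, because every pair includes at least one record with a null lon.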
Thanks! Indeed looks like a bug. Expected behavior is that a record where either lat is null or lon is null should be blocked with no other records.
I'll play around when I'm at a computer. That fix you suggest seems promising, thanks!