Street Name Misspellings #26

Open
stdavis opened this issue Oct 31, 2019 · 9 comments
Labels
addresses Related to the addresses sweeper

Comments

@stdavis
Member

stdavis commented Oct 31, 2019

From @steveoh

Get the unique street names from our roads data and address points. Then parse the addresses into their parts and check whether each road name exists in our data, or whether something similar does, using Levenshtein distance to catch misspellings.
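A minimal sketch of the Levenshtein idea above, not the sweeper's actual implementation. It assumes the street names have already been parsed out of the addresses and treats the roads list as the reference, which this thread has not settled. The function names and the `max_distance` threshold are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def find_suspect_names(address_names, road_names, max_distance=2):
    """Map each address street name missing from roads to the road names near it.

    Assumption: roads is the reference list. An empty list means no road name
    is within max_distance edits of the address street name.
    """
    known = {name.upper() for name in road_names}
    suspects = {}
    for name in {n.upper() for n in address_names}:
        if name in known:
            continue
        suspects[name] = sorted(
            road for road in known if levenshtein(name, road) <= max_distance
        )
    return suspects
```

Comparing every unmatched name against every road name is quadratic, so a real run would probably want to group names by county or first letter before comparing.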

From @ZachBeck

[Look] for compound word misspellings like Switchback Way vs Switch Back Way

Not sure of the best way to do this. Perhaps compare concatenated multi-word street names against the known list of street names? Or maybe something like Levenshtein distance could handle this.
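The concatenation idea could be sketched roughly as follows: collapse whitespace and case so that Switchback Way and Switch Back Way share one key, then report address names whose key matches a road name with a different spelling. The function names are hypothetical, and the sketch only flags the disagreement. It does not decide which spelling is correct.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Uppercase and squeeze runs of whitespace to single spaces."""
    return " ".join(name.upper().split())


def collapse(name: str) -> str:
    """Uppercase and drop all whitespace: 'Switch Back Way' -> 'SWITCHBACKWAY'."""
    return "".join(name.upper().split())


def find_compound_mismatches(address_names, road_names):
    """Map address street names to road names that differ only in word breaks."""
    roads_by_key = defaultdict(set)
    for name in road_names:
        roads_by_key[collapse(name)].add(normalize(name))

    mismatches = {}
    for name in address_names:
        normalized = normalize(name)
        variants = roads_by_key.get(collapse(name), set())
        # Same letters as a road name, but a different split into words.
        if variants and normalized not in variants:
            mismatches[normalized] = sorted(variants)
    return mismatches


if __name__ == "__main__":
    print(find_compound_mismatches(["Switch Back Way"], ["Switchback Way"]))
```

A plain Levenshtein check would also catch this pair, since a single inserted space is one edit, but the collapsed key isolates word-break differences from other kinds of misspelling.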

@ZachBeck
Member

This one is tough because it's not always clear which version it should be. All you know is that the street name in address points differs from the one in roads.

stdavis added the addresses label Oct 31, 2019
@stdavis
Member Author

stdavis commented Oct 31, 2019

Is the roads feature class the accepted source of truth for road names over the address point data? Or are you saying that it's not clear which is correct? In my mind, we need to just pick one (from the parser perspective at least) as the single source of truth.

@ZachBeck
Member

That's a tough one... I'd say which is better (roads vs. address points) depends on the county. It's not always clear which one is correct.

@stdavis
Member Author

stdavis commented Oct 31, 2019

For this project, we may just need to pick the one that we hope it ends up as and go with it. I'm assuming that would be roads, but I'm not the one to make the call. We can talk more as this project moves forward.

@steveoh
Member

steveoh commented Nov 1, 2019

It would help us choose if we had a little more information. @ZachBeck, can you list which counties have better data in the address points vs. the roads? We could then pick based on the amount of data, etc. I know that Greg, Erik, and Zach are trying to resolve the discrepancies, so hopefully they will match up better in the future.

@stdavis
Member Author

stdavis commented May 6, 2024

@gregbunce or @ZachBeck Do the street names align better between the roads and address point data these days?

@gregbunce
Contributor

For the most part, it's pretty good, but they don't align perfectly. We (Zach) have some code to check for alignment, but we don't run it very often.

@steveoh
Member

steveoh commented May 7, 2024

Is there a reason not to run it after every update?

@ZachBeck
Member

ZachBeck commented May 7, 2024

For the most part, I run it every time I update a county's address points. Reconciling the differences would be a matter of looking at each individual road in Google Street View to figure out where the problem is.
