Do all the toponyms exist in OSM (city, state, region names, etc.)?
yes
If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result?
NA
If the address does not contain city, region, etc., does adding those fields to the input improve the result?
removing the postcode leads to correct parsing
If the address contains apartment/floor/sub-building information or uncommon formatting, does removing that help? Is there any minimum form of the address that gets the right parse?
NA
Here's what I think could be improved
Eircodes are relatively new and only now coming into common use, especially for deliveries.
They are not yet widely found in OpenStreetMap.
Still, the format is easy to identify and the parser should be able to recognize them.
Eircodes were just starting to roll out when the parser was initially trained, but there were very few examples available since most people were still using the old system. For a future version I've thought about adding UK, Irish, Canadian and other similar postcodes directly to the tokenizer. They follow regular patterns that can't be confused with other token types, so the model could treat each one as a single token and handle it with a handful of type features instead of one feature for every normalized postcode word. That would also save space: these postcodes don't require geographic context, so they could be removed from the postcode index, which is stored efficiently as a trie but still clocks in at about 500MB. However, it would change the weights and require retraining the parser, which is not planned for the very near future, though there's some rearchitecting going on in the background.
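A rough Python sketch of the tokenizer idea, purely for illustration: libpostal's real tokenizer is a C scanner and works differently, and the regexes below are loose approximations of the Irish, Canadian and UK formats, not official validation rules. The names `POSTCODE_PATTERNS`, `tokenize` and the `POSTCODE_*` type labels are hypothetical. Each recognized postcode comes out as one token tagged with its pattern type, i.e. a small set of type features rather than one feature per postcode word.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately loose postcode patterns (approximations only).
# Irish Eircode: routing key (letter + two digits, or "D6W") + 4-char identifier.
# Canadian: A1A 1A1. UK: outward code (1-2 letters, digit, optional char) + 1AA.
POSTCODE_PATTERNS = [
    ("ie", re.compile(r"(?:[A-Z]\d{2}|D6W) ?[0-9A-Z]{4}", re.IGNORECASE)),
    ("ca", re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d", re.IGNORECASE)),
    ("gb", re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}", re.IGNORECASE)),
]

# Fallback: a run of word characters, or a single punctuation character.
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<punct>[^\w\s])")


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into (token, type) pairs, keeping each postcode as one token."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for kind, pattern in POSTCODE_PATTERNS:
            m = pattern.match(text, pos)
            # Accept only if the match ends at a token boundary, so that
            # e.g. "P51 KT93X" is not cut in the middle of a word.
            if m and (m.end() == len(text) or not text[m.end()].isalnum()):
                tokens.append((m.group(0), "POSTCODE_" + kind.upper()))
                pos = m.end()
                break
        else:
            # No postcode pattern matched here: emit an ordinary token.
            m = TOKEN_RE.match(text, pos)
            tokens.append((m.group(0), m.lastgroup.upper()))
            pos = m.end()
    return tokens
```

Because the patterns are case-insensitive and loose, something like a unit number followed by a four-letter word ("A12 MAIN") would also be tagged as an Eircode; a real implementation would need stricter character sets and probably some context.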
This style of postcode only partially benefits from the classic NLP features in use, such as word shapes and digit masks, because those would normalize "P51 KT93" to something like ["pDD", "ktDD"]. With enough training data that can work even without observing every possible postcode, but the data would need to cover every pattern once the digits are masked out. (For the UK and Canada there were also training examples built from a somewhat exhaustive list of postcodes, which then gets normalized to word/digit shapes.)
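To make the point about shape features concrete, here is a small Python illustration of a digit mask of the kind described above (lowercase the token, replace each digit with "D"), which turns "P51 KT93" into ["pDD", "ktDD"]. The helper names are hypothetical, and libpostal's own feature extraction is written in C and may differ in detail. Eircodes from different areas keep their letters, so they end up with different masks, which is why the training data has to cover each letter/digit pattern.

```python
import re
from typing import List


def digit_mask(token: str) -> str:
    """Lowercase a token and replace every digit with 'D'."""
    # Lowercasing first keeps real letters distinct from the uppercase 'D' mask.
    return re.sub(r"\d", "D", token.lower())


def postcode_masks(postcode: str) -> List[str]:
    """Apply the digit mask to each whitespace-separated part of a postcode."""
    return [digit_mask(part) for part in postcode.split()]


if __name__ == "__main__":
    # Sample strings in Eircode shape: same routing-key shape, different identifiers.
    for code in ["P51 KT93", "P51 XK27", "A65 F4E2"]:
        print(code, postcode_masks(code))
```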
One workaround is simply to extract or remove Eircodes with a regex before parsing, since they follow regular patterns.
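A minimal Python sketch of that workaround, assuming a loose Eircode pattern: a letter plus two digits (or the special routing key "D6W"), then an optional space and four letters/digits. The pattern is an approximation rather than the official format rules, and `split_eircode` is a hypothetical helper. The remaining text could then be handed to the parser (for example via pypostal's `parse_address`) with the extracted Eircode attached as the postcode component.

```python
import re
from typing import Optional, Tuple

# Loose Eircode pattern (an approximation, not the official rules):
# routing key = letter + two digits, or the special key "D6W";
# unique identifier = four letters/digits, optionally preceded by a space.
EIRCODE_RE = re.compile(r"\b(?:[A-Z]\d{2}|D6W) ?[0-9A-Z]{4}\b", re.IGNORECASE)


def split_eircode(address: str) -> Tuple[str, Optional[str]]:
    """Remove the first Eircode-like substring from an address.

    Returns (address_without_eircode, eircode), or (address, None)
    when nothing matches.
    """
    match = EIRCODE_RE.search(address)
    if match is None:
        return address, None
    eircode = match.group(0).upper()
    rest = address[: match.start()] + address[match.end():]
    # Collapse the doubled comma left behind (", ,") and trim stray separators.
    rest = re.sub(r"\s*,\s*(?:,\s*)+", ", ", rest).strip(" ,")
    return rest, eircode
```

On the example from this issue, `split_eircode("Riverside House, Doneraile, P51 KT93, Ireland")` returns `("Riverside House, Doneraile, Ireland", "P51 KT93")`. As with any loose pattern, it can also match non-postcode text, so stricter validation may be worth adding.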
Hi!
I was checking out libpostal, and saw something that could be improved.
My country is
Ireland
Here's how I'm using libpostal
Parsing addresses
Here's what I did
Tried to parse Irish addresses that include Eircodes (a relatively new Irish postcode format)
Example:
Riverside House, Doneraile, P51 KT93, Ireland
Here's what I got
Here's what I was expecting
For parsing issues, please answer "yes" or "no" to all that apply.
no
yes
NA
removing the postcode leads to correct parsing
NA