Easier way to apply to DataFrames #42
Comments
Thanks for the input, Max! Can you elaborate a bit on what the content of what your
It'd be cool if it could instead be:
That is, if a
Separately, since CSVs from sources like IPUMS provide FIPS codes as ints, saving that step could be useful, but I could see it also making it more error-prone, especially for ZIPs, and it sounds like you've thought about that and made a design decision to require preliminary padding.
This would be great to have. Any updates on the status?
Hey there! Great library, I've been using it in several projects at work. This issue caught my eye since we use Pandas a lot in my projects too, so I thought I'd weigh in. Since this library is pretty lightweight (the only dependency I see is on
import pandas as pd
from us.states import STATES
us_df = pd.DataFrame([vars(st) for st in STATES])
From there, I assume you wouldn't be doing a 'lookup' unless you had another dataset you were trying to merge with, but if that's the case then you would use something like
If it so happens that your dataset has numeric FIPS codes (this would be 'dirty' data though, since as @mileswwatkins stated, FIPS codes are officially zero-padded strings), you can either:
1) convert this library's FIPS codes to numbers prior to merging:
us_df['fips'] = pd.to_numeric(us_df['fips'])
or 2) convert your data to padded-string FIPS codes prior to merging (cast to str first, since the .str accessor is not available on a numeric column):
my_df['fips'] = my_df['fips'].astype(str).str.pad(width=2, fillchar='0')
Alternatively, if you know you just need to convert from one representation to another and don't need all of the information from this library as a DataFrame, rather than doing a lookup one at a time, you can use this library's
from us.states import mapping
my_df["abbr"] = my_df["fips"].map(mapping('fips', 'abbr'))
In my opinion: since the zero-padded version is the official designation, that's what this library should maintain, and it shouldn't make assumptions about what libraries downstream consumers are using (consider if someone else were using PySpark or Dask instead), but documentation/examples are always good to have (perhaps in a 'Recipes' guide).
Hope this helps!
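For anyone following along, the padding and mapping steps from the comment above can be tried end to end without installing the library. This is only a sketch: `FIPS_TO_ABBR` is a hand-written, four-state stand-in (an assumption for illustration) for the dict that `mapping('fips', 'abbr')` is described as returning, and `my_df` is made-up example data.

```python
import pandas as pd

# Hypothetical stand-in for the dict that mapping('fips', 'abbr') would return;
# only four states are listed to keep the sketch short.
FIPS_TO_ABBR = {"01": "AL", "06": "CA", "36": "NY", "48": "TX"}

# Made-up data with FIPS codes stored as integers, as in many CSV exports.
my_df = pd.DataFrame({"fips": [1, 6, 36, 48]})

# Cast to str before using the .str accessor, then left-pad to two digits.
my_df["fips"] = my_df["fips"].astype(str).str.zfill(2)

# Vectorized lookup: Series.map accepts a plain dict.
my_df["abbr"] = my_df["fips"].map(FIPS_TO_ABBR)
```

Codes that are missing from the dict come back as NaN rather than raising, so it is worth checking `my_df["abbr"].isna().any()` before relying on the result.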
I've also had this problem and converted from int to a string with leading zeros as a workaround. The issue with this, however, is that I only noticed later on that some of the lookups didn't return any values. There was no error, passing
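One way to guard against the silent misses described here is to wrap the lookup in a small helper that fails loudly. The sketch below is not part of the library: `normalize_fips`, `strict_lookup`, and the two-entry `FIPS_TO_NAME` table are hypothetical names introduced only for illustration, and the table stands in for whatever mapping you actually use.

```python
# Hypothetical two-state table standing in for a real FIPS-to-name mapping.
FIPS_TO_NAME = {"01": "Alabama", "06": "California"}


def normalize_fips(code, width=2):
    """Return a zero-padded FIPS string for an int or a string of digits."""
    text = str(code).strip()
    if not text.isdigit() or len(text) > width:
        raise ValueError(f"not a valid FIPS code: {code!r}")
    return text.zfill(width)


def strict_lookup(codes, table):
    """Map each code through table, raising if any code has no match."""
    normalized = [normalize_fips(c) for c in codes]
    # Collect every unmatched code so the error reports them all at once.
    missing = sorted({c for c in normalized if c not in table})
    if missing:
        raise KeyError(f"no match for FIPS codes: {missing}")
    return [table[c] for c in normalized]


names = strict_lookup([1, "6", "06"], FIPS_TO_NAME)
```

Raising on the first batch with unmatched codes trades convenience for safety; if partial results are acceptable, returning the `missing` list alongside the matches would be a reasonable alternative.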
In mapping a pandas DataFrame's numeric fips index to state name, I currently have to do this:
There are a couple of things going on here that could be other issues*, but it might also be nice to have vectorization built in for pandas. For example, I'd like to be able to just do:
state['state'] = us.states.lookup(state.index)
* This one was verbose because things like us.states.lookup(1) and us.states.lookup('1') fail.