Issues with PubChem name/id mappings #89

Open
bgyori opened this issue Nov 25, 2020 · 1 comment

bgyori commented Nov 25, 2020

Using the PR branch #88, I tried a couple of different ways of getting a pubchem.compound name-to-id mapping data structure.

from pyobo import get_name_id_mapping
get_name_id_mapping('pubchem.compound')

throws

NoOboFoundry: OBO Foundry is missing the prefix: pubchem.compound

Another approach I tried is

from pyobo.sources.pubchem import get_pubchem_id_to_name
pc = get_pubchem_id_to_name()

which throws

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 3: invalid continuation byte

while reading the file ~/.obo/raw/pubchem.compound/2020-11-01/CID-Title.gz.


bgyori commented Nov 25, 2020

For the second error, I tried reproducing this with

with open('CID-Title', 'r') as fh:
    lines = fh.readlines()

and got the same encoding error, whereas the following runs without error:

with open('CID-Title', 'r', encoding='latin-1') as fh:
    lines = fh.readlines()
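
One caveat on the latin-1 result: latin-1 maps every possible byte to a character, so a successful read does not by itself show that the file is latin-1 encoded. To narrow down which entries are the problem, the file can be scanned in binary mode and each line decoded separately. The following is only a rough sketch; the helper names `find_undecodable_lines` and `find_undecodable_lines_gz` are made up for illustration and are not part of pyobo, and the sample bytes are synthetic rather than real PubChem data.

```python
import gzip
import io


def find_undecodable_lines(lines, encoding="utf-8"):
    """Return (line_number, raw_bytes, reason) for each line that fails to decode.

    `lines` is any iterable of bytes objects, e.g. a file opened in binary mode.
    This is a hypothetical helper for debugging, not a pyobo function.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError as exc:
            bad.append((number, line, exc.reason))
    return bad


def find_undecodable_lines_gz(path, encoding="utf-8"):
    """Same check, streaming over a gzipped file such as CID-Title.gz."""
    with gzip.open(path, "rb") as fh:
        return find_undecodable_lines(fh, encoding)


# Synthetic stand-in for a CID-Title file: the second line contains a lone
# 0xe2 byte, which is not valid UTF-8 when followed by a space.
sample = b"1\tfirst title\n2\tsecond \xe2 title\n"

bad_utf8 = find_undecodable_lines(io.BytesIO(sample))
bad_latin1 = find_undecodable_lines(io.BytesIO(sample), encoding="latin-1")

print(bad_utf8)    # only line 2 is reported
print(bad_latin1)  # empty: latin-1 accepts every byte
```

Running `find_undecodable_lines_gz` on the downloaded CID-Title.gz should show whether the invalid bytes are confined to a few titles or spread through the file, which would help in choosing between an explicit encoding and something like `errors='replace'` when opening it.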
