Issues with PubChem name/id mappings #89

bgyori · 2020-11-25T16:32:23Z

Using the branch from PR #88, I tried two ways of getting a name-to-ID mapping for pubchem.compound.

from pyobo import get_name_id_mapping
get_name_id_mapping('pubchem.compound')

throws

NoOboFoundry: OBO Foundry is missing the prefix: pubchem.compound

Another approach I tried is

from pyobo.sources.pubchem import get_pubchem_id_to_name
pc = get_pubchem_id_to_name()

which throws

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 3: invalid continuation byte

while reading the file ~/.obo/raw/pubchem.compound/2020-11-01/CID-Title.gz.


bgyori · 2020-11-25T16:39:30Z

For the second error, I tried reproducing this with

with open('CID-Title', 'r') as fh:
    lines = fh.readlines()

and got the same encoding error, whereas this works without error

with open('CID-Title', 'r', encoding='latin-1') as fh:
    lines = fh.readlines()
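
To make the difference reproducible without downloading the PubChem dump, here is a minimal sketch that writes a tiny gzipped stand-in file and reads it back with both encodings. The sample row and the temporary path are made up for illustration, and the tab-separated CID/title layout is an assumption about the real file; nothing here is pyobo's own code.

```python
import gzip
import os
import tempfile

# Hypothetical sample row (not real PubChem data): CID, tab, title.
# The byte 0xE2 is "â" in Latin-1, but in UTF-8 it starts a multi-byte
# sequence, so following it with "-" makes the row invalid UTF-8.
sample = b"1\t\xe2-example\n"

# Write the row to a temporary gzipped file standing in for CID-Title.gz.
path = os.path.join(tempfile.mkdtemp(), "CID-Title.gz")
with gzip.open(path, "wb") as fh:
    fh.write(sample)

# Reading in text mode as UTF-8 raises UnicodeDecodeError.
try:
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        fh.readlines()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Reading as Latin-1 succeeds, since every byte maps to a character.
with gzip.open(path, "rt", encoding="latin-1") as fh:
    rows = [line.rstrip("\n").split("\t", 1) for line in fh]

print(utf8_failed, rows)
```

Note that Latin-1 decoding can never fail, because every byte value maps to some character, so reading without an error does not by itself confirm that Latin-1 is the file's actual encoding.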

bgyori mentioned this issue Nov 25, 2020

Update PubChem version #88

Merged
