Fix issues in "make_one_canonical_transcript_per_gene.py" script #61

Open
pieterlukasse opened this issue Aug 31, 2022 · 1 comment

pieterlukasse commented Aug 31, 2022

Ticket to make sure the issues found and discussed in a comment on #58 are investigated and fixed. The main suspect is the script named in the title, make_one_canonical_transcript_per_gene.py.

leexgh commented Sep 6, 2022

For the question "some genes do not have a value in ensembl_canonical_gene, but do have ensembl transcript ids set for mskcc_canonical_transcript and uniprot_canonical_transcript", take gene "GAGE5" as an example:

  • ensembl_canonical_gene has no value because the script tries to find a match in ensembl_biomart_geneids by gene_stable_id or hgnc_symbol, and ensembl_biomart_geneids doesn't contain this gene.
  • mskcc_canonical_transcript and uniprot_canonical_transcript have values because the script tries to find a match by hgnc_symbol in the overrides tables, isoform_overrides_at_mskcc_grch38 and isoform_overrides_uniprot. Those tables do contain this gene, so a match is found and a value is set.
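
The behaviour described above can be reproduced with a small sketch. This is not the code from make_one_canonical_transcript_per_gene.py; it only mimics the two lookups with plain Python structures. The table contents, the placeholder IDs (ENSG_EXAMPLE_1, ENST_EXAMPLE_1, ...), the gene GENE_A, and the helper names lookup_gene, summarize and find_inconsistent are all made up for illustration.

```python
# Hypothetical stand-ins for the three tables; all rows and IDs are invented.
ensembl_biomart_geneids = [
    {"gene_stable_id": "ENSG_EXAMPLE_1", "hgnc_symbol": "GENE_A"},
    # Note: no row for GAGE5, mirroring the situation described above.
]
isoform_overrides_at_mskcc_grch38 = {
    "GENE_A": "ENST_EXAMPLE_1",
    "GAGE5": "ENST_EXAMPLE_2",
}
isoform_overrides_uniprot = {
    "GENE_A": "ENST_EXAMPLE_1",
    "GAGE5": "ENST_EXAMPLE_3",
}


def lookup_gene(hgnc_symbol, gene_stable_id=None):
    """Match against ensembl_biomart_geneids by gene_stable_id or hgnc_symbol."""
    for row in ensembl_biomart_geneids:
        if gene_stable_id is not None and row["gene_stable_id"] == gene_stable_id:
            return row["gene_stable_id"]
        if row["hgnc_symbol"] == hgnc_symbol:
            return row["gene_stable_id"]
    return None  # gene is absent from the biomart table


def summarize(hgnc_symbol, gene_stable_id=None):
    """Build one output record; the override lookups use hgnc_symbol only."""
    return {
        "hgnc_symbol": hgnc_symbol,
        "ensembl_canonical_gene": lookup_gene(hgnc_symbol, gene_stable_id),
        "mskcc_canonical_transcript": isoform_overrides_at_mskcc_grch38.get(hgnc_symbol),
        "uniprot_canonical_transcript": isoform_overrides_uniprot.get(hgnc_symbol),
    }


def find_inconsistent(symbols):
    """Return symbols that have an override transcript but no canonical gene."""
    flagged = []
    for symbol in symbols:
        record = summarize(symbol)
        has_transcript = (
            record["mskcc_canonical_transcript"] is not None
            or record["uniprot_canonical_transcript"] is not None
        )
        if record["ensembl_canonical_gene"] is None and has_transcript:
            flagged.append(symbol)
    return flagged


if __name__ == "__main__":
    print(summarize("GAGE5"))
    print(find_inconsistent(["GENE_A", "GAGE5"]))
```

A check along the lines of find_inconsistent could be used to list every gene affected by this mismatch, assuming the real tables can be loaded into comparable structures.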
