Here I share how I obtained the paratope information stored in the hla2paratopeTable_aligned.txt file.
The IMGT 3D database contains abundant information on experimentally solved crystal structures of pMHC1 complexes, which can serve as the source for extracting paratope information (the HLA residues that are likely to interact with the neoepitope).
In the above link, we select human as the species and pMH1 as the type, then click submit. You will be directed to a page listing all the solved complex structures. When I conducted the project, there were 550 entries on the returned page; the number has since increased.
Let's click any one of the structure hyperlinks:
Then click "paratope and epitope"; the contacting residues will be shown on the page:
For each HLA, we want to collect all the available contact-residue information for each of its experimentally validated paired epitopes.
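To make the goal concrete, here is a minimal sketch of how contact residues could be grouped per HLA across its paired epitopes. The record layout, the function names `collect_contacts` and `paratope_positions`, and the example values are all hypothetical placeholders; they are not real IMGT entries, nor the actual format of the scraper output.

```python
from collections import defaultdict


def collect_contacts(records):
    """Group contact residue positions by HLA, then by epitope.

    `records` is an iterable of (hla, epitope, positions) tuples.
    This layout is an assumption made for illustration only.
    """
    table = defaultdict(dict)
    for hla, epitope, positions in records:
        # Merge positions when the same HLA/epitope pair appears more than once.
        table[hla].setdefault(epitope, set()).update(positions)
    return {hla: dict(epitopes) for hla, epitopes in table.items()}


def paratope_positions(table, hla):
    """Return the sorted union of contact positions over all epitopes of `hla`."""
    merged = set()
    for positions in table.get(hla, {}).values():
        merged |= positions
    return sorted(merged)


# Made-up placeholder records, not real IMGT data.
records = [
    ("HLA-X", "PEPTIDEA", [7, 9, 63]),
    ("HLA-X", "PEPTIDEB", [9, 66]),
    ("HLA-Y", "PEPTIDEA", [5, 7]),
]
table = collect_contacts(records)
print(paratope_positions(table, "HLA-X"))  # [7, 9, 63, 66]
```

Taking the union over epitopes is only one possible way to summarize the contacts for an HLA; keeping the per-epitope sets in `table` leaves other summaries open.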
We definitely want to automate this retrieval process, so we use the web scraping tools scrapy and selenium.
The code and instructions for running the scraper can be found in this folder. For more information, please refer to the official scrapy and selenium websites. You can also use any alternative way to retrieve the paratope information.
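As one such alternative, the sketch below pulls table cells out of an already-downloaded HTML page using only the Python standard library. It is not the scrapy/selenium scraper from the folder, and the table markup and values in `sample` are hypothetical: the real "paratope and epitope" page may be laid out differently, so the parser would need adjusting after inspecting the actual HTML.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows of `html` as lists of cell strings."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical fragment with made-up values; the real page layout may differ.
sample = """
<table>
  <tr><th>Position</th><th>Residue</th></tr>
  <tr><td>7</td><td>Y</td></tr>
  <tr><td>63</td><td>E</td></tr>
</table>
"""
rows = extract_rows(sample)
print(rows)
```

If the page content is rendered by JavaScript, the HTML would first have to be obtained from a browser session (for example via selenium) before a parser like this can see the table.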
We perform two rounds of Clustal Omega; concrete examples are shown in our Supplemental Figure 1.
The corresponding text description can be found in our Supplemental method, lines 36-40.
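For readers who have not used Clustal Omega from a script, a single run could be wrapped roughly as follows. This is only a sketch: the file names and the helper `clustalo_command` are hypothetical, and how the two rounds are chained (what goes into each round) is described in the supplement and is not reproduced here.

```python
import shlex


def clustalo_command(in_fasta, out_fasta, threads=1):
    """Build the argument list for one Clustal Omega run (nothing is executed)."""
    return [
        "clustalo",
        "-i", in_fasta,           # unaligned input sequences
        "-o", out_fasta,          # where the alignment is written
        "--outfmt=fasta",         # aligned FASTA output
        f"--threads={threads}",
        "--force",                # overwrite the output file if it exists
    ]


# Hypothetical file names, for illustration only.
cmd = clustalo_command("round1_input.fasta", "round1_aligned.fasta")
print(shlex.join(cmd))
# To actually run it (requires clustalo on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```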
The figure and text should be clear enough, but feel free to contact me if anything is confusing or if you want to know more about the process.