Possible Issue with RNA Chains in Paired MSA Construction #238
I also printed a brief summary of these chains:
Chain A:
Sequence: GGCGACAUUUGUAAUUCCUGGACCGAUACUUCCGUCAGGACAGAGGUUGCC
Unpaired MSA: ['GGCGACAUUUGUAAUUCCUGGACCGAUACUUCCGUCAGGACAGAGGUUGCC']
Paired MSA: ['GGCGACAUUUGUAAUUCCUGGACCGAUACUUCCGUCAGGACAGAGGUUGCC']
Chain B:
Sequence: MSRIITAPHIGIEKLSAISLEELSCGLPDRYALPPDGHPVEPHLERLYPTAQSKRSLWDFASPGYTFHGLHRAQDYRRELDTLQSLLTTSQSSELQAAAALLKCQQDDDRLLQIILNLLHKV
Unpaired MSA: ['MSRIITAPHIGIEKLSAISLEELSCGLPDRYALPPDGHPVEPHLERLYPTAQSKRSLWDFASPGYTFHGLHRAQDYRRELDTLQSLLTTSQSSELQAAAALLKCQQDDDRLLQIILNLLHKV', '-------------------------------ALPPDGHPVEPHLERLYPTAQSKRSLWDFASPGYTFHGLHRAQDYRRELDTLQSLLTTSQSSELQAAAALLKCQQDDDRLLQIILNLLHKV']
Paired MSA: ['MSRIITAPHIGIEKLSAISLEELSCGLPDRYALPPDGHPVEPHLERLYPTAQSKRSLWDFASPGYTFHGLHRAQDYRRELDTLQSLLTTSQSSELQAAAALLKCQQDDDRLLQIILNLLHKV', 'MSRIITAPHIGIEKLSAISLEELSCGLPDRYALPPDGHPVEPHLERLYPTAQSKRSLWDFASPGYTFHGLHRAQDYRRELDTLQSLLTTSQSSELQAAAALLKCQQDDDRLLQIILNLLHKV']
Chain C:
Sequence: MNITLTKRQQEFLLLNGWLQLQCGHAERACILLDALLTLNPEHLAGRRCRLVALLNNNQGERAEKEAQWLISHDPLQAGNWLCLSRAQQLNGDLDKARHAYQHYLELKDHNESP
Unpaired MSA: ['MNITLTKRQQEFLLLNGWLQLQCGHAERACILLDALLTLNPEHLAGRRCRLVALLNNNQGERAEKEAQWLISHDPLQAGNWLCLSRAQQLNGDLDKARHAYQHYLELKDHNESP', '--MTLTERQQAFLLLNGWLQLQYGQAERACILLDALLHLSPDHLAARRCRLVALLKSGQGVRAQQEATWLVLNDDPQPGSWLCLSRAHQLSGELELARHAYQRYLELEEQYES-']
Paired MSA: ['MNITLTKRQQEFLLLNGWLQLQCGHAERACILLDALLTLNPEHLAGRRCRLVALLNNNQGERAEKEAQWLISHDPLQAGNWLCLSRAQQLNGDLDKARHAYQHYLELKDHNESP', 'MNITLTKRQQEFLLLNGWLQLQCGHAERACILLDALLTLNPEHLAGRRCRLVALLNNNQGERAEKEAQWLISHDPLQAGNWLCLSRAQQLNGDLDKARHAYQHYLELKDHNESP']
Chain D:
Sequence: X
Unpaired MSA: ['-']
Paired MSA: ['-']
Another related issue: if the prediction is for a monomeric protein, the MSA features contain the same query sequence in the first two rows. This happens because the unpaired MSA is deduplicated only under the condition shown in `alphafold3/src/alphafold3/model/features.py`, lines 529 to 539 in b380a7c.
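To make the duplicated row concrete, here is a minimal sketch. It assumes that the final MSA is built by stacking the paired rows above the unpaired rows. It also assumes that deduplication drops unpaired rows already present in the paired block. `build_msa` and the toy sequences are hypothetical and are not AlphaFold 3's code.

```python
def build_msa(paired: list[str], unpaired: list[str], deduplicate: bool) -> list[str]:
    """Stack paired rows above unpaired rows (hypothetical helper).

    If deduplicate is True, unpaired rows that already appear in the paired
    block are dropped, so the query sequence is kept only once.
    """
    if deduplicate:
        seen = set(paired)
        unpaired = [row for row in unpaired if row not in seen]
    return paired + unpaired


query = "MKTAYIAKQR"  # toy monomer query
paired = [query]  # a monomer has no partner chain, so only the query row is "paired"
unpaired = [query, "MKTAYLAKQR", "-KTAYIAKQ-"]

with_dedup = build_msa(paired, unpaired, deduplicate=True)
without_dedup = build_msa(paired, unpaired, deduplicate=False)
print(with_dedup[:2])     # ['MKTAYIAKQR', 'MKTAYLAKQR']
print(without_dedup[:2])  # ['MKTAYIAKQR', 'MKTAYIAKQR']  (query repeated in rows 0 and 1)
```

Skipping deduplication for the monomer case reproduces the "same query in the first two rows" pattern.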
Thank you for the detailed report! We are investigating on our side now and will report back once we know more.
Hi Wantao, just want to check something before we go too far: are you running with a custom MSA? Can you share your full input JSON?
Can you share the full
Sorry, you can ignore those two requests; I think I agree this is a bug. We will confirm and make a fix. This has not had much effect because the RNA model's sensitivity to MSA inputs is low. Also note that MSA rows are shuffled, so the first row has no particular importance in the input.
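A minimal sketch of why a shuffled first row carries no special weight. It assumes a uniform random permutation of MSA rows. The helper `shuffle_msa_rows` is hypothetical, and AlphaFold 3's actual shuffling and subsampling logic is not reproduced here. Across many shuffles, each row lands in position 0 roughly equally often.

```python
import random
from collections import Counter


def shuffle_msa_rows(msa: list[str], seed: int) -> list[str]:
    """Return a copy of the MSA with its rows in a random order (illustrative only)."""
    rng = random.Random(seed)
    rows = list(msa)
    rng.shuffle(rows)
    return rows


msa = ["query", "homolog_1", "homolog_2", "homolog_3"]
# Count which row ends up first across many independent shuffles.
first_row_counts = Counter(shuffle_msa_rows(msa, seed)[0] for seed in range(4000))
print(first_row_counts)
```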
Thanks again for reporting! This has been fixed in ea04034.
Yes, this is because the sequence is included once in the unpaired MSA and once in the paired MSA. This is not an issue; the model will deal with this.
While investigating how AlphaFold3 processes MSA pairings, I encountered a problem related to the handling of RNA chains. I used a complex consisting of one RNA chain, two different protein chains, and a magnesium ion (MG) as an example.
In the code at `alphafold3/src/alphafold3/model/features.py`, lines 490 to 495 in b380a7c, AF3 assigns the "protein" molecule type to all chain types, including RNA, when constructing the `paired_msa`. As a result, RNA paired MSAs (which should contain only the query sequence) are converted into amino acid IDs.
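A minimal sketch of this failure mode, using toy vocabularies. The tables and the `encode` helper are hypothetical and do not reproduce AlphaFold 3's actual residue IDs or `_PROTEIN_TO_ID`. Encoding an RNA sequence with the protein table silently turns nucleotides into amino acid IDs: in this toy layout, G maps to glycine, C to cysteine, and A to alanine, while U falls through to the unknown amino acid ID.

```python
# Toy vocabularies (hypothetical layout, not AlphaFold 3's actual tables):
# protein residues use IDs 0-20 (20 = unknown), RNA residues use IDs 21-25 (25 = unknown).
PROTEIN_LETTERS = "ARNDCQEGHILKMFPSTWYV"
PROTEIN_TO_ID = {aa: i for i, aa in enumerate(PROTEIN_LETTERS)}
PROTEIN_UNKNOWN_ID = 20
RNA_TO_ID = {"A": 21, "G": 22, "C": 23, "U": 24}
RNA_UNKNOWN_ID = 25


def encode(sequence: str, chain_type: str) -> list[int]:
    """Map each residue letter to an integer ID using the table for the chain type."""
    if chain_type == "protein":
        table, unknown = PROTEIN_TO_ID, PROTEIN_UNKNOWN_ID
    elif chain_type == "rna":
        table, unknown = RNA_TO_ID, RNA_UNKNOWN_ID
    else:
        raise ValueError(f"unsupported chain type: {chain_type}")
    return [table.get(letter, unknown) for letter in sequence]


rna_query = "GGCGACAUUU"
correct = encode(rna_query, "rna")
buggy = encode(rna_query, "protein")  # every chain treated as protein
print(correct)  # [22, 22, 23, 22, 21, 23, 21, 24, 24, 24]
print(buggy)    # [7, 7, 4, 7, 0, 4, 0, 20, 20, 20]
```

No error is raised in the buggy path, which is why the mistake only shows up when the encoded rows are inspected directly.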
To confirm this behavior, I used `print` to inspect the paired MSA for the RNA chain. The output showed that the RNA sequence had been incorrectly converted into amino acid IDs (by `_PROTEIN_TO_ID`). By contrast, `unpaired_msa` is constructed correctly.

Moreover, since the paired MSA is always placed above the unpaired MSA, this error propagates into the final MSA. I used `print` to inspect the first row of the final MSA. The result is:
However, this error does not seem to significantly affect the prediction results. Could this be because AF3 no longer emphasizes the first row of the MSA (making it order-independent) and relies less on the MSA?
Please let me know if I’m wrong. Thanks!