What does it mean when there is no output data in 'final_design_stats.csv'? #141

nduan1 · 2024-12-19T16:37:00Z

Hi, the pipeline is amazing! I am trying to run a prediction on a ~300-aa peptide. Everything seems to be running smoothly, but there is no output in the 'final_design_stats.csv' file or the 'Accepted' folder after 6 days of processing. It is still running, though.

There is some data in the 'mpnn_design_stats.csv' and 'trajectory_stats.csv' files.

I used the default settings on my command line.
Here is the content of my JSON file:

{
  "design_path": "/data/EmiolaLab/duann2/bin_proj/bindcraft/bcraft/BindCraft/candidate4-2/",
  "binder_name": "can4",
  "starting_pdb": "/data/EmiolaLab/duann2/bin_proj/bindcraft/bcraft/BindCraft/candidate4-2/can4.pdb",
  "chains": "A",
  "target_hotspot_residues": null,
  "lengths": [1, 289],
  "number_of_final_designs": 100
}

What does it mean if there is no final output in the 'final_design_stats.csv' file or the 'Accepted' folder? Should I change any settings? I would greatly appreciate any instructions you can provide.

Thanks!

LennartNickel · 2024-12-23T20:57:38Z

This means that no designs have made it through the filters. Have a look at the trajectories being sampled and see whether they are reasonable, or whether you can find a reason for the failure in the failure.csv file. This can help you adjust the settings. Sometimes a lot of sampling is necessary, and in rare cases the pipeline just doesn't find a solution. However, I would advise trying a few more inputs (AF2 prediction, different crystal structures) and different hotspots.

linuxfold · 2024-12-29T22:15:56Z

I am having a similar issue. From the preprint:

We allow only 2 MPNNsol generated sequences per individual AF2 trajectory to
pass filters to promote interface diversity amongst selected binders.

If the overall success rate is very low, would it make sense to increase the number of MPNNsol sequences per trajectory?

After about 1000 trajectories, there are no accepted designs, but mpnn_design_stats.csv lists 2 different trajectories (one with 5 _mpnn versions and one with 2).

Is there a way to re-run the mpnn pipeline on these trajectories for more than 2 examples until some of them get accepted designs? Would this be beneficial or am I misunderstanding something?

Mikukub · 2024-12-30T03:45:01Z

Looking at the code, the MPNN designs are generated from the trajectory, so this means the filters are too strict for your target. I think the pLDDT and i_pTM thresholds might be too high for your target. It should be possible to redo the MPNN step from an existing trajectory, but that requires some Python skill. I think he might add this function soon.

LennartNickel · 2024-12-31T17:42:47Z

By default, 20 MPNN designs are tried per trajectory, and this number can be increased in the advanced settings. The quote you mention refers to the number of accepted designs allowed for each trajectory. The 20 MPNN sequences are generated, and the pipeline checks them one after the other against both the AF2 and Rosetta filters. The moment 2 have passed all filters, the remaining ones are not checked and a new trajectory is started.
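
To make the logic described above concrete, here is a minimal Python sketch of the "check in order, stop after 2 accepted" behaviour. It is not BindCraft's actual code: `select_accepted`, `passes_filters`, and the toy sequences are hypothetical names used only for illustration, and the real AF2 and Rosetta filters are replaced by a stand-in function.

```python
from typing import Callable, Iterable, List, Tuple


def select_accepted(
    sequences: Iterable[str],
    passes_filters: Callable[[str], bool],
    max_accepted: int = 2,
) -> Tuple[List[str], int]:
    """Check sequences in order and stop once max_accepted have passed.

    Returns the accepted sequences and how many sequences were checked.
    All names here are hypothetical, not part of BindCraft.
    """
    accepted: List[str] = []
    n_checked = 0
    for seq in sequences:
        n_checked += 1
        if passes_filters(seq):
            accepted.append(seq)
            if len(accepted) >= max_accepted:
                break  # remaining sequences are never checked
    return accepted, n_checked


# Toy stand-in for 20 MPNN sequences from one trajectory.
toy_sequences = [f"seq{i}" for i in range(20)]


def toy_filter(seq: str) -> bool:
    # Stand-in for the AF2 + Rosetta filters: every third sequence passes.
    return int(seq[3:]) % 3 == 0


accepted, n_checked = select_accepted(toy_sequences, toy_filter)
print(accepted, n_checked)

# If nothing passes, all 20 are checked and the trajectory yields no designs.
none_accepted, all_checked = select_accepted(toy_sequences, lambda s: False)
print(none_accepted, all_checked)
```

Under this reading, raising the number of MPNN sequences only helps when some sequences are close to passing; if every sequence fails the same filter, the loop simply checks more candidates before moving on.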

Mikukub · 2024-12-31T18:11:51Z

I want to reuse trajectories but would rather not wrestle with the Python code. I would love to see this implemented in the program, since many people who have tried BindCraft have generated a lot of trajectories.

nduan1 · 2025-01-02T17:40:18Z

Hi @LennartNickel, thanks for your reply. I tried using AF2 to pick the hotspot, but there is still no final output. I copied my failure.csv file. Could you guide me on how to interpret it? Thanks!

Trajectory_logits_pLDDT,Trajectory_softmax_pLDDT,Trajectory_one-hot_pLDDT,Trajectory_final_pLDDT,Trajectory_Contacts,Trajectory_Clashes,Trajectory_WrongHotspot,MPNN_score,MPNN_seq_recovery,pLDDT,pTM,i_pTM,pAE,i_pAE,i_pLDDT,ss_pLDDT,Unrelaxed_Clashes,Relaxed_Clashes,Binder_Energy_Score,Surface_Hydrophobicity,ShapeComplementarity,PackStat,dG,dSASA,dG/dSASA,Interface_SASA_%,Interface_Hydrophobicity,n_InterfaceResidues,n_InterfaceHbonds,InterfaceHbondsPercentage,n_InterfaceUnsatHbonds,InterfaceUnsatHbondsPercentage,Interface_Helix%,Interface_BetaSheet%,Interface_Loop%,Binder_Helix%,Binder_BetaSheet%,Binder_Loop%,InterfaceAAs_A,InterfaceAAs_C,InterfaceAAs_D,InterfaceAAs_E,InterfaceAAs_F,InterfaceAAs_G,InterfaceAAs_H,InterfaceAAs_I,InterfaceAAs_K,InterfaceAAs_L,InterfaceAAs_M,InterfaceAAs_N,InterfaceAAs_P,InterfaceAAs_Q,InterfaceAAs_R,InterfaceAAs_S,InterfaceAAs_T,InterfaceAAs_V,InterfaceAAs_W,InterfaceAAs_Y,HotspotRMSD,Target_RMSD,Binder_pLDDT,Binder_pTM,Binder_pAE,Binder_RMSD
60,7,17,43,0,58,0,0,0,573,117,998,0,1136,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
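
One way to read a file like this is to rank the columns by their value. The sketch below assumes that each column is a filter and each value counts how many times that filter rejected a design; that is an interpretation of the file layout, not something confirmed in this thread. `rank_failures` is a hypothetical helper, and the sample is truncated to the first 14 columns of the file posted above.

```python
import csv
import io


def rank_failures(csv_text):
    """Return (filter_name, count) pairs with count > 0, largest first.

    Assumes a header row of filter names followed by one row of integer
    failure counts (an assumption about the file layout).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    row = next(reader)
    counts = {name: int(value) for name, value in row.items()}
    nonzero = [(name, count) for name, count in counts.items() if count > 0]
    return sorted(nonzero, key=lambda item: item[1], reverse=True)


# First 14 columns of the posted file; the remaining columns are omitted here.
sample = (
    "Trajectory_logits_pLDDT,Trajectory_softmax_pLDDT,Trajectory_one-hot_pLDDT,"
    "Trajectory_final_pLDDT,Trajectory_Contacts,Trajectory_Clashes,"
    "Trajectory_WrongHotspot,MPNN_score,MPNN_seq_recovery,pLDDT,pTM,i_pTM,pAE,i_pAE\n"
    "60,7,17,43,0,58,0,0,0,573,117,998,0,1136\n"
)

ranked = rank_failures(sample)
for name, count in ranked:
    print(f"{name:<28}{count}")
```

If that interpretation holds, the largest counts in this excerpt are i_pAE (1136), i_pTM (998), and pLDDT (573), which would point at the AF2 interface-confidence filters as the main bottleneck rather than the Rosetta interface metrics.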
