Make create_basic_dataset reuse the CMILES from an OptimizationDataset (if present) to avoid issues where cheminformatics toolkit behavior has changed #310

Open
ntBre opened this issue Dec 16, 2024 · 0 comments

Comments

@ntBre
Contributor

ntBre commented Dec 16, 2024

As Lily pointed out here:

there's no guarantee that running a to_smiles will give you the same cmiles for the same molecule across different OpenEye/RDKit versions, or especially if you have one toolkit installed but another was used to generate the source datasets. It would be slightly more robust to use the exact cmiles in the dataset result

Basically, the new version of create_basic_dataset (and likely other parts of qcsubmit) reconstructs a CMILES for each Molecule instead of reusing the CMILES on the OptimizationRecord. I don't think this causes any (known) functional issues, but it could lead to situations where two "identical" datasets have different CMILES.

This call to to_smiles is the root of the issue in this case, and could be replaced by some kind of dict lookup mapping record_id to cmiles extracted from self.entries earlier in the function.

dataset.add_molecule(
    index=base_molecule.to_smiles(
        isomeric=True, explicit_hydrogens=False, mapped=False
    ),
    molecule=None,
    initial_molecules=[rec.final_molecule for rec, _ in records],
    attributes=MoleculeAttributes.from_openff_molecule(base_molecule),
    extras=base_record.extras,
    keywords=base_record.specification.keywords,
)
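
The dict lookup suggested above could look roughly like the following. This is only a sketch with stand-in types, not qcsubmit code: `Entry`, `Record`, `build_cmiles_lookup`, and `choose_index` are hypothetical names, and the `canonical_isomeric_smiles` attribute key is an assumption about where the source dataset stores its CMILES. The idea is to collect the stored CMILES per record id once, then fall back to regenerating a SMILES only when nothing was stored.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple

# Assumed attribute key; check what the source dataset actually stores.
CMILES_KEY = "canonical_isomeric_smiles"


@dataclass
class Entry:
    """Hypothetical stand-in for a dataset entry with stored CMILES attributes."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Record:
    """Hypothetical stand-in for an optimization record."""
    id: int


def build_cmiles_lookup(pairs: Iterable[Tuple[Record, Entry]]) -> Dict[int, str]:
    """Map record id -> CMILES stored on the matching entry, skipping entries without one."""
    lookup: Dict[int, str] = {}
    for record, entry in pairs:
        cmiles = entry.attributes.get(CMILES_KEY)
        if cmiles:
            lookup[record.id] = cmiles
    return lookup


def choose_index(record: Record, lookup: Dict[int, str], regenerate: Callable[[], str]) -> str:
    """Prefer the stored CMILES; call regenerate() only when none was stored."""
    stored = lookup.get(record.id)
    return stored if stored is not None else regenerate()


# Toy data: one entry carries a stored CMILES, the other does not.
pairs = [
    (Record(id=1), Entry("ethanol-0", {CMILES_KEY: "CCO"})),
    (Record(id=2), Entry("unknown-0")),
]
lookup = build_cmiles_lookup(pairs)
# In the real function the fallback would be the existing to_smiles call.
indices = [choose_index(rec, lookup, lambda: "regenerated") for rec, _ in pairs]
```

Keeping the fallback means datasets without stored CMILES would behave as they do today, while datasets that do carry one would no longer depend on the installed toolkit version.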

@j-wags changed the title from "Possible CMILES changes between toolkit/backend versions" to "Make create_basic_dataset reuse the CMILES from an OptimizationDataset (if present) to avoid issues where cheminformatics toolkit behavior has changed" on Jan 9, 2025