PR #80 introduced threading to submissions by chunking the list of newly added entries and passing a chunk to each thread. However, if we do a compute expansion where no new entries are added and we just want to create more tasks for a new specification, no tasks will be made because the entries list is empty. To fix this we should get the list of entries from the dataset instead. This is fine for now, but it might cause issues later if we want to add compute to only a subset of a dataset.
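A minimal sketch of the proposed fix, assuming illustrative helper names (`entries_to_process`, `chunk` are hypothetical, not the actual qcsubmit internals): when no new entries were added in this submission, fall back to the full list of entries already in the dataset so the per-thread chunking still has work to distribute.

```python
def entries_to_process(new_entries, dataset_entries):
    """Return the entry list the threads should be chunked over.

    Fall back to the dataset's existing entries when no new entries
    were added, so a pure compute expansion still creates tasks.
    """
    return new_entries if new_entries else list(dataset_entries)


def chunk(entries, n_threads):
    """Split entries into roughly equal chunks, one per thread."""
    k, r = divmod(len(entries), n_threads)
    chunks, start = [], 0
    for i in range(n_threads):
        size = k + (1 if i < r else 0)
        chunks.append(entries[start:start + size])
        start += size
    # drop empty chunks when there are more threads than entries
    return [c for c in chunks if c]
```

With this fallback, submitting a new specification against an unchanged dataset chunks over every existing entry instead of an empty list.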
It might be better to create a pathway that handles compute expansions explicitly and gives users more options. For example, we could create a new `ComputeExpansion` schema which takes a QCSpecification and a dataset and adds new tasks. We could also add subset support, which is not currently available: users could supply a list of entries or a list of molecules for which they want to add compute. We could also support some of the filters built into qcsubmit and only add tasks for molecules which pass the filter. This would help in cases like adding ani2x support, where we could filter out molecules not covered by the model. I imagine it looking something like this:
```python
from openff.qcsubmit.compute_expansion import OptimizationExpansion, expand_compute
from openff.qcsubmit.common_structures import QCSpec
from openff.qcsubmit.procedures import GeometricProcedure
from openff.qcsubmit.workflow_components import ElementFilter
from openff.toolkit.topology import Molecule

# build the qcspec we want to add for ani2x
ani2x = QCSpec(
    method="ani2x",
    basis=None,
    program="torchani",
    spec_name="ani2x",
    spec_description="testing ani2x",
)

# try out the dlc coord system in geometric
geo = GeometricProcedure(coordsys="dlc")

# build the expansion schema
opt_expand = OptimizationExpansion(qc_specifications=ani2x, optimization_procedure=geo)

# add new tasks for all molecules in a dataset pulled from FractalClient
expand_compute(dataset=dataset, compute_schema=opt_expand)

# add compute to only a few molecules
target_mols = [Molecule.from_smiles("CC"), "CCO"]  # support smiles or openff molecules
opt_expand.target_molecules = target_mols
expand_compute(dataset=dataset, compute_schema=opt_expand)

# add compute to only molecules which pass a filter
el_filter = ElementFilter(allowed_elements=["H", "C", "N", "O", "F", "S", "Cl"])
opt_expand.filters = el_filter
expand_compute(dataset=dataset, compute_schema=opt_expand)
```
This way each compute expansion would be its own schema, which would allow users to submit multiple new specs for different subsets of the molecules. We might also want to record the dataset name in the schema for provenance, so users know which dataset it was applied to. The `expand_compute` method should also return a list of all of the indices which have had new tasks created.
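A rough sketch of how the schema's provenance field and the `expand_compute` return value could fit together. Everything here is hypothetical: the class shape, the `dataset_name` field, and the plain-dict "dataset" stand in for whatever the real qcsubmit/QCFractal objects would be.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComputeExpansion:
    """Hypothetical expansion schema; field names are illustrative only."""
    qc_specifications: dict                       # the new spec(s) to add
    dataset_name: Optional[str] = None            # set for provenance once applied
    target_molecules: List[str] = field(default_factory=list)  # optional subset


def expand_compute(dataset, compute_schema: ComputeExpansion) -> List[str]:
    """Add tasks for the new spec; return indices that received new tasks."""
    # record which dataset the expansion was applied to for provenance
    compute_schema.dataset_name = dataset["name"]
    # honour the subset if one was given, otherwise expand the whole dataset
    subset = compute_schema.target_molecules or list(dataset["entries"])
    # (actual task creation and filter application omitted in this sketch)
    return subset


dataset = {"name": "test-set", "entries": ["CC", "CCO", "c1ccccc1"]}
schema = ComputeExpansion(qc_specifications={"method": "ani2x"})
indices = expand_compute(dataset, schema)
# indices now lists every entry index that received a new task
```

Returning the touched indices also gives callers an easy way to verify a partial expansion did what they expected.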