PR #80 introduced threading to submissions by chunking the list of newly added entries and passing a chunk to each thread. However, if we do a compute expansion where no new entries are added and we just want to create more tasks for a new specification, no tasks will be made because the entries list is empty. To fix this we should get the list of entries from the dataset instead. This is fine for now, but it might cause issues later if we want to add compute to only a subset of a dataset.
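A minimal sketch of the proposed fix, assuming illustrative helper names (`entries_to_process`, `chunk` are hypothetical, not the actual qcsubmit internals): when no new entries were added in this submission, fall back to the full list of entries already in the dataset so the per-thread chunking still has work to distribute.

```python
def entries_to_process(new_entries, dataset_entries):
    """Return the entry list the threads should be chunked over.

    Fall back to the dataset's existing entries when no new entries
    were added, so a pure compute expansion still creates tasks.
    """
    return new_entries if new_entries else list(dataset_entries)


def chunk(entries, n_threads):
    """Split entries into roughly equal chunks, one per thread."""
    k, r = divmod(len(entries), n_threads)
    chunks, start = [], 0
    for i in range(n_threads):
        size = k + (1 if i < r else 0)
        chunks.append(entries[start:start + size])
        start += size
    # drop empty chunks when there are more threads than entries
    return [c for c in chunks if c]
```

With this fallback, submitting a new specification against an unchanged dataset chunks over every existing entry instead of an empty list.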
It might be better to create a pathway that handles compute expansions explicitly and gives users more options. For example, we could create a new `ComputeExpansion` schema which takes a QCSpecification and a dataset and adds new tasks. We could also add subset support, which is not currently available: users could supply a list of entries or a list of molecules for which they want to add compute. We could also support some of the filters built into qcsubmit and only add tasks for molecules which pass the filter. This would help in cases like adding ani2x support, where we could filter out molecules not covered by the model. I imagine it looking something like this:
```python
from openff.qcsubmit.compute_expansion import OptimizationExpansion, expand_compute
from openff.qcsubmit.common_structures import QCSpec
from openff.qcsubmit.procedures import GeometricProcedure
from openff.qcsubmit.workflow_components import ElementFilter
from openff.toolkit.topology import Molecule

# build the qcspec we want to add for ani2x
ani2x = QCSpec(
    method="ani2x",
    basis=None,
    program="torchani",
    spec_name="ani2x",
    spec_description="testing ani2x",
)

# try out the dlc coord system in geometric
geo = GeometricProcedure(coordsys="dlc")

# build the expansion schema
opt_expand = OptimizationExpansion(qc_specifications=ani2x, optimization_procedure=geo)

# add new tasks for all molecules in a dataset pulled from FractalClient
expand_compute(dataset=dataset, compute_schema=opt_expand)

# add compute to only a few molecules
target_mols = [Molecule.from_smiles("CC"), "CCO"]  # support smiles or openff molecules
opt_expand.target_molecules = target_mols
expand_compute(dataset=dataset, compute_schema=opt_expand)

# add compute to only molecules which pass a filter
el_filter = ElementFilter(allowed_elements=["H", "C", "N", "O", "F", "S", "Cl"])
opt_expand.filters = el_filter
expand_compute(dataset=dataset, compute_schema=opt_expand)
```
This way each compute expansion would be its own schema, which would allow users to submit multiple new specs for different subsets of the molecules. We might also want to record the dataset name in the schema for provenance, so users know which dataset it was applied to. The `expand_compute` method should also return a list of all of the indices which have had new tasks created.
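A rough sketch of how the schema's provenance field and the `expand_compute` return value could fit together. Everything here is hypothetical: the class shape, the `dataset_name` field, and the plain-dict "dataset" stand in for whatever the real qcsubmit/QCFractal objects would be.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComputeExpansion:
    """Hypothetical expansion schema; field names are illustrative only."""
    qc_specifications: dict                       # the new spec(s) to add
    dataset_name: Optional[str] = None            # set for provenance once applied
    target_molecules: List[str] = field(default_factory=list)  # optional subset


def expand_compute(dataset, compute_schema: ComputeExpansion) -> List[str]:
    """Add tasks for the new spec; return indices that received new tasks."""
    # record which dataset the expansion was applied to for provenance
    compute_schema.dataset_name = dataset["name"]
    # honour the subset if one was given, otherwise expand the whole dataset
    subset = compute_schema.target_molecules or list(dataset["entries"])
    # (actual task creation and filter application omitted in this sketch)
    return subset


dataset = {"name": "test-set", "entries": ["CC", "CCO", "c1ccccc1"]}
schema = ComputeExpansion(qc_specifications={"method": "ani2x"})
indices = expand_compute(dataset, schema)
# indices now lists every entry index that received a new task
```

Returning the touched indices also gives callers an easy way to verify a partial expansion did what they expected.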