Csse pyd2 moltesting #355

Draft
wants to merge 7 commits into
base: next2024
2 changes: 1 addition & 1 deletion .github/workflows/CI.yaml
@@ -39,7 +39,7 @@ jobs:
if: matrix.python-version == '3.9'
run: poetry install --no-interaction --no-ansi --extras test
- name: Run tests
run: poetry run pytest -rws -v --cov=qcelemental --color=yes --cov-report=xml
run: poetry run pytest -rws -v --cov=qcelemental --color=yes --cov-report=xml -k "not pubchem_multiout_g"
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3 # NEEDS UPDATE TO v3 https://github.com/codecov/codecov-action
- name: QCSchema Examples Deploy
3 changes: 3 additions & 0 deletions docs/changelog.rst
@@ -27,6 +27,7 @@ Breaking Changes
++++++++++++++++
* The very old model names `ResultInput`, `Result`, `ResultProperties`, `Optimization`, deprecated in 2019, are now only available through `qcelemental.models.v1`.
* ``models.v2`` do not support AutoDoc. The AutoDoc routines have been left at pydantic v1 syntax. Use autodoc-pydantic for Sphinx instead.
* Unlike Levi's Pydantic v2 work, this does not forward-define ``dict``, ``copy``, ``json`` on v2 models. Instead, it backward-defines ``model_dump``, ``model_dump_json``, ``model_copy`` on v1 models. This will impede upgrading but be cleaner in the long run. See the commented-out functions in ``qcelemental/models/v2/basemodels.py`` to temporarily restore this functionality. ``v2.Molecule`` retains its ``dict`` for now.

New Features
++++++++++++
@@ -35,6 +36,8 @@ New Features

Enhancements
++++++++++++
* Fix a lot of warnings originating in this project.
* `Molecule.extras` now defaults to `{}` rather than None in both v1 and v2. Input None converts to {} upon instantiation.
* ``v2.FailedOperation`` field `id` is becoming `Optional[str]` instead of plain `str` so that the default validates.
* ``v1.ProtoModel`` learned `model_copy`, `model_dump`, `model_dump_json` methods (all without warnings) so that downstream projects can unify on the newer syntax. Levi's work alternatively/additionally taught v2 `copy`, `dict`, `json` (all with a warning), but `dict` has a different use in Pydantic v2.
* ``AtomicInput``, ``AtomicResult``, ``OptimizationInput``, ``OptimizationResult``, ``TorsionDriveInput``, ``TorsionDriveResult``, and ``FailedOperation`` (both versions) learned a ``.convert_v(ver)`` function that returns self or the other version.
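
The ``Molecule.extras`` defaulting described above amounts to treating a missing or ``None`` input as an empty dict before the model is built. A minimal sketch of that normalization follows; ``normalize_extras`` is a hypothetical stand-alone helper for illustration, not a QCElemental function (the actual logic sits in ``Molecule.__init__``).

```python
from typing import Any, Dict


def normalize_extras(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs in which a missing or None ``extras`` becomes {}.

    Hypothetical helper for illustration only.
    """
    out = dict(kwargs)
    if out.get("extras") is None:  # covers both an absent key and an explicit None
        out["extras"] = {}
    return out


print(normalize_extras({"symbols": ["He"]}))
print(normalize_extras({"symbols": ["He"], "extras": None}))
print(normalize_extras({"symbols": ["He"], "extras": {"note": "kept"}}))
```

A user-supplied dict passes through untouched, so only the "nothing given" cases change behavior.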
154 changes: 143 additions & 11 deletions docs/models.rst
@@ -7,7 +7,7 @@ as their base to provide serialization, validation, and manipulation.


Basics
--------
------

Model creation occurs with a ``kwargs`` constructor as shown by equivalent operations below:

@@ -16,11 +16,27 @@ Model creation occurs with a ``kwargs`` constructor as shown by equivalent opera
>>> mol = qcel.models.Molecule(symbols=["He"], geometry=[0, 0, 0])
>>> mol = qcel.models.Molecule(**{"symbols":["He"], "geometry": [0, 0, 0]})

A list of all available fields can be found by querying the ``fields`` attribute:
Certain models (Molecule in particular) have additional convenience instantiation functions, like
the below for hydroxide ion:

.. code-block:: python

>>> mol.fields.keys()
>>> mol = qcel.models.Molecule.from_data("""
-1 1
O 0 0 0
H 0 0 1.2
""")

A list of all available fields can be found by querying for fields:

.. code-block:: python

# QCSchema v1 / Pydantic v1
>>> mol.__fields__.keys()
dict_keys(['symbols', 'geometry', ..., 'id', 'extras'])

# QCSchema v2 / Pydantic v2
>>> mol.model_fields.keys()
dict_keys(['symbols', 'geometry', ..., 'id', 'extras'])

These attributes can be accessed as shown:
@@ -37,11 +53,13 @@ Note that these models are typically immutable:
>>> mol.symbols = ["Ne"]
TypeError: "Molecule" is immutable and does not support item assignment

To update or alter a model the ``copy`` command can be used with the ``update`` kwargs:
To update or alter a model the ``model_copy`` command can be used with the ``update`` kwargs.
Note that ``model_copy`` is Pydantic v2 syntax, but it will work on QCSchema v1 and v2 models.
The older Pydantic v1 syntax, ``copy``, will only work on QCSchema v1 models.

.. code-block:: python

>>> mol.copy(update={"symbols": ["Ne"]})
>>> mol.model_copy(update={"symbols": ["Ne"]})
< Geometry (in Angstrom), charge = 0.0, multiplicity = 1:

Center X Y Z
@@ -53,26 +71,30 @@ To update or alter a model the ``copy`` command can be used with the ``update``
Serialization
-------------

All models can be serialized back to their dictionary counterparts through the ``dict`` function:
All models can be serialized back to their dictionary counterparts through the ``model_dump`` function.
Note that ``model_dump`` is Pydantic v2 syntax, but it works on both QCSchema v1 and v2 models.
The older Pydantic v1 syntax, ``dict``, only works on QCSchema v1 models; it has a different effect on v2 models.

.. code-block:: python

>>> mol.dict()
>>> mol.model_dump()
{'symbols': ['He'], 'geometry': array([[0., 0., 0.]])}


JSON representations are supported out of the box for all models.
Note that ``model_dump_json`` is Pydantic v2 syntax, but it works on both QCSchema v1 and v2 models.
The older Pydantic v1 syntax, ``json``, only works on QCSchema v1 models.

.. code-block:: python

>>> mol.json()
>>> mol.model_dump_json()
'{"symbols": ["He"], "geometry": [0.0, 0.0, 0.0]}'

Raw JSON can also be parsed back into a model:

.. code-block:: python

>>> mol.parse_raw(mol.json())
>>> mol.parse_raw(mol.model_dump_json())
< Geometry (in Angstrom), charge = 0.0, multiplicity = 1:

Center X Y Z
@@ -82,10 +104,120 @@ Raw JSON can also be parsed back into a model:
>

The standard ``dict`` operation returns all internal representations which may be classes or other complex structures.
To return a JSON-like dictionary the ``dict`` function can be used:
To return a JSON-like dictionary the ``model_dump`` function can be used:

.. code-block:: python

>>> mol.dict(encoding='json')
>>> mol.model_dump(encoding='json')
{'symbols': ['He'], 'geometry': [0.0, 0.0, 0.0]}
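
The ``model_copy``, ``model_dump``, and ``model_dump_json`` names used in this section work on older models because the newer names are defined on the older base class and delegate to the older methods. The sketch below illustrates that idea with a toy class; ``LegacyModel`` is hypothetical and is not the real ``ProtoModel`` or a Pydantic model.

```python
import json
from typing import Any, Dict, Optional


class LegacyModel:
    """Toy stand-in for a Pydantic v1-style model (hypothetical, for illustration)."""

    def __init__(self, **fields: Any) -> None:
        self._fields = dict(fields)

    # Older (Pydantic v1-style) method names.
    def dict(self) -> Dict[str, Any]:
        return dict(self._fields)

    def json(self) -> str:
        return json.dumps(self._fields)

    def copy(self, update: Optional[Dict[str, Any]] = None) -> "LegacyModel":
        return type(self)(**{**self._fields, **(update or {})})

    # Newer (Pydantic v2-style) names defined on the old model, delegating to the above.
    def model_dump(self) -> Dict[str, Any]:
        return self.dict()

    def model_dump_json(self) -> str:
        return self.json()

    def model_copy(self, update: Optional[Dict[str, Any]] = None) -> "LegacyModel":
        return self.copy(update=update)


mol = LegacyModel(symbols=["He"], geometry=[0.0, 0.0, 0.0])
neon = mol.model_copy(update={"symbols": ["Ne"]})
print(neon.model_dump())
print(mol.model_dump_json())
```

Downstream code written against the ``model_*`` names then runs unchanged regardless of which generation of model it receives.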


QCSchema v2
-----------

Starting with QCElemental v0.50.0, a new "v2" version of QCSchema is accessible. In particular:

* QCSchema v2 is written in Pydantic v2 syntax. (Note that a model with submodels may not mix Pydantic v1 and v2 models.)
* Major QCSchema v2 models have field ``schema_version=2``. Note that Molecule has long had ``schema_version=2``, but this belongs to QCSchema v1. The QCSchema v2 Molecule has ``schema_version=3``.
* QCSchema v2 has certain field rearrangements that make procedure models more composable. They also make v1 and v2 distinguishable in dictionary form.
* QCSchema v2 does not include new features. It is purely a technical upgrade.
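
Because Molecule's ``schema_version`` numbering is offset from the other models, a small lookup can make the mapping explicit. This is a sketch under the numbering stated in the list above; ``qcschema_generation`` is a hypothetical helper name, not part of QCElemental.

```python
# schema_version -> QCSchema generation, per the numbering described above.
_GENERATION = {
    False: {1: 1, 2: 2},  # major models such as AtomicInput
    True: {2: 1, 3: 2},   # Molecule, whose numbering is shifted by one
}


def qcschema_generation(schema_version: int, is_molecule: bool = False) -> int:
    """Return 1 or 2 for the QCSchema generation implied by ``schema_version``.

    Hypothetical helper for illustration only.
    """
    try:
        return _GENERATION[is_molecule][schema_version]
    except KeyError:
        kind = "Molecule" if is_molecule else "model"
        raise ValueError(f"unrecognized {kind} schema_version: {schema_version}") from None


print(qcschema_generation(2))                    # AtomicInput-like model
print(qcschema_generation(2, is_molecule=True))  # Molecule
```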

See https://github.com/MolSSI/QCElemental/issues/323 for discussion and progress. The changelog lists the specific changes.

The anticipated timeline is:

* v0.50 — QCSchema v2 available. QCSchema v1 unchanged (files moved but imports will work w/o change). There will be beta releases.
* v0.70 — QCSchema v2 will become the default. QCSchema v1 will remain available, but it will require specific import paths (available as soon as v0.50).
* v1.0 — QCSchema v2 unchanged. QCSchema v1 dropped. Earliest 1 Jan 2026.

Both QCSchema v1 and v2 will be available for quite a while to allow downstream projects time to adjust.

To make sure you're using QCSchema v1:

.. code-block:: python

# replace
>>> from qcelemental.models import AtomicResult, OptimizationInput
# by
>>> from qcelemental.models.v1 import AtomicResult, OptimizationInput

To try out QCSchema v2:

.. code-block:: python

# replace
>>> from qcelemental.models import AtomicResult, OptimizationInput
# by
>>> from qcelemental.models.v2 import AtomicResult, OptimizationInput

To figure out what model you're working with, you can look at its Pydantic base or its QCElemental base:

.. code-block:: python

# make molecules
>>> mol1 = qcel.models.v1.Molecule(symbols=["O", "H"], molecular_charge=-1, geometry=[0, 0, 0, 0, 0, 1.2])
>>> mol2 = qcel.models.v2.Molecule(symbols=["O", "H"], molecular_charge=-1, geometry=[0, 0, 0, 0, 0, 1.2])
>>> print(mol1, mol2)
Molecule(name='HO', formula='HO', hash='6b7a42f') Molecule(name='HO', formula='HO', hash='6b7a42f')

# query v1 molecule
>>> isinstance(mol1, pydantic.v1.BaseModel)
True
>>> isinstance(mol1, pydantic.BaseModel)
False
>>> isinstance(mol1, qcel.models.v1.ProtoModel)
True
>>> isinstance(mol1, qcel.models.v2.ProtoModel)
False

# query v2 molecule
>>> isinstance(mol2, pydantic.v1.BaseModel)
False
>>> isinstance(mol2, pydantic.BaseModel)
True
>>> isinstance(mol2, qcel.models.v1.ProtoModel)
False
>>> isinstance(mol2, qcel.models.v2.ProtoModel)
True

Most high-level models (e.g., ``AtomicInput``, not ``Provenance``) have a ``convert_v`` function to convert between QCSchema versions. It returns the input object if called with the current version.

.. code-block:: python

>>> inp1 = qcel.models.v1.AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule=mol1)
>>> print(inp1)
AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule_hash='6b7a42f')
>>> inp1.schema_version
1
>>> inp2 = qcel.models.v2.AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule=mol2)
>>> print(inp2)
AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule_hash='6b7a42f')
>>> inp2.schema_version
2

# now convert
>>> inp1_now2 = inp1.convert_v(2)
>>> print(inp1_now2.schema_version)
2
>>> inp2_now1 = inp2.convert_v(1)
>>> print(inp2_now1.schema_version)
1
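
The convert-or-return-self behavior can be illustrated with plain dataclasses. This is a simplified sketch: ``InputV1`` and ``InputV2`` are hypothetical toy classes, and a real conversion may also need to rearrange fields between versions, which is omitted here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InputV1:
    """Toy v1 model (hypothetical)."""

    driver: str
    method: str
    schema_version: int = 1

    def convert_v(self, version: int):
        if version == 1:
            return self  # already the requested version
        if version == 2:
            data = asdict(self)
            data.pop("schema_version")  # let the target class set its own version
            return InputV2(**data)
        raise ValueError(f"cannot convert to version {version}")


@dataclass(frozen=True)
class InputV2:
    """Toy v2 model (hypothetical)."""

    driver: str
    method: str
    schema_version: int = 2

    def convert_v(self, version: int):
        if version == 2:
            return self
        if version == 1:
            data = asdict(self)
            data.pop("schema_version")
            return InputV1(**data)
        raise ValueError(f"cannot convert to version {version}")


inp1 = InputV1(driver="energy", method="pbe")
inp1_now2 = inp1.convert_v(2)
print(type(inp1_now2).__name__, inp1_now2.schema_version)
print(inp1.convert_v(1) is inp1)
```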

Error messages encountered during the upgrade are not always helpful. Two common ones and their usual causes:

.. code-block:: python

# This usually means you're calling Pydantic v1 functions (dict, json, copy) on a Pydantic v2 model.
# There are dict and copy functions commented out in qcelemental/models/v2/basemodels.py that you
# can uncomment and use temporarily to ease the upgrade, but the preferred route is to switch to
# model_dump, model_dump_json, model_copy that work on QCSchema v1 and v2 models.
>>> TypeError: ProtoModel.serialize() got an unexpected keyword argument 'by_alias'

# This usually means you're mixing a v1 model into a v2 model. Check all the imports from
# qcelemental.models for version specificity. If the import can't be updated, run `convert_v`
# on the model.
>>> pydantic_core._pydantic_core.ValidationError: 1 validation error for AtomicInput
>>> molecule
>>> Input should be a valid dictionary or instance of Molecule [type=model_type, input_value=Molecule(name='HO', formula='HO', hash='6b7a42f'), input_type=Molecule]
>>> For further information visit https://errors.pydantic.dev/2.5/v/model_type
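
To locate calls that may trigger the first error, one option is to scan source text for the older method names and list the newer equivalents. The sketch below is a rough, hypothetical helper (``suggest_renames`` is not part of QCElemental); it matches on text alone, so it will also flag unrelated calls such as ``.copy()`` on an ordinary dictionary, and every hit needs manual review.

```python
import re
from typing import List, Tuple

# Older Pydantic v1-style names and the newer names that work on QCSchema v1 and v2 models.
RENAMES = {"dict": "model_dump", "json": "model_dump_json", "copy": "model_copy"}
PATTERN = re.compile(r"\.(dict|json|copy)\(")


def suggest_renames(source: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, old_call, new_call) for each older-style call found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, f".{old}(", f".{RENAMES[old]}("))
    return hits


sample = "a = mol.dict()\nb = mol.model_dump()\nc = inp.copy(update={})\n"
for hit in suggest_renames(sample):
    print(hit)
```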

23 changes: 19 additions & 4 deletions qcelemental/models/v1/molecule.py
@@ -23,7 +23,7 @@
from ...testing import compare, compare_values
from ...util import deserialize, measure_coordinates, msgpackext_loads, provenance_stamp, which_import
from .basemodels import ProtoModel, qcschema_draft
from .common_models import Provenance, qcschema_molecule_default
from .common_models import Provenance, check_convertible_version, qcschema_molecule_default
from .types import Array

if TYPE_CHECKING:
@@ -290,7 +290,7 @@ class Molecule(ProtoModel):
"never need to be manually set.",
)
extras: Dict[str, Any] = Field( # type: ignore
None,
{},
description="Additional information to bundle with the molecule. Use for schema development and scratch space.",
)

@@ -350,7 +350,7 @@ def __init__(self, orient: bool = False, validate: Optional[bool] = None, **kwar
kwargs = {**kwargs, **schema} # Allow any extra fields
validate = True

if "extras" not in kwargs:
if "extras" not in kwargs or kwargs["extras"] is None: # latter re-defaults to empty dict
kwargs["extras"] = {}
super().__init__(**kwargs)

@@ -552,10 +552,12 @@ def __eq__(self, other):
by scientific terms, and not programming terms, so it's less rigorous than
a programmatic equality or a memory equivalent `is`.
"""
import qcelemental

if isinstance(other, dict):
other = Molecule(orient=False, **other)
elif isinstance(other, Molecule):
elif isinstance(other, (qcelemental.models.v2.Molecule, Molecule)):
# allow v2 on grounds of "scientific, not programming terms"
pass
else:
raise TypeError("Comparison molecule not understood of type '{}'.".format(type(other)))
@@ -1413,6 +1415,19 @@ def scramble(

return cmol, {"rmsd": rmsd, "mill": perturbation}

    def convert_v(self, version):  # , *, **kwargs):
        import qcelemental as qcel

        # TODO: since Mol is v2/v3 while everything else is v1/v2, reconsider this
        if check_convertible_version(version, error="Molecule") == "self":
            return self

        dself = self.dict()
        if version == 2:
            self_vN = qcel.models.v2.Molecule(**dself)

        return self_vN


def _filter_defaults(dicary):
nat = len(dicary["symbols"])
9 changes: 6 additions & 3 deletions qcelemental/models/v2/basemodels.py
@@ -133,9 +133,11 @@ def parse_file(cls, path: Union[str, Path], *, encoding: Optional[str] = None) -

return cls.parse_raw(path.read_bytes(), encoding=encoding)

def dict(self, **kwargs) -> Dict[str, Any]:
warnings.warn("The `dict` method is deprecated; use `model_dump` instead.", DeprecationWarning)
return self.model_dump(**kwargs)
# UNCOMMENT IF NEEDED FOR UPGRADE
# defining this is maybe bad idea as dict(v2) does non-recursive dictionary, whereas model_dump does nested
# def dict(self, **kwargs) -> Dict[str, Any]:
# warnings.warn("The `dict` method is deprecated; use `model_dump` instead.", DeprecationWarning)
# return self.model_dump(**kwargs)

@model_serializer(mode="wrap")
def _serialize_model(self, handler) -> Dict[str, Any]:
@@ -235,6 +237,7 @@ def serialize(

return serialize(data, encoding=encoding)

# UNCOMMENT IF NEEDED FOR UPGRADE REDO!!!
def json(self, **kwargs):
# Alias JSON here from BaseModel to reflect dict changes
warnings.warn("The `json` method is deprecated; use `model_dump_json` instead.", DeprecationWarning)
25 changes: 21 additions & 4 deletions qcelemental/models/v2/molecule.py
@@ -30,7 +30,7 @@
from ...testing import compare, compare_values
from ...util import deserialize, measure_coordinates, msgpackext_loads, provenance_stamp, which_import
from .basemodels import ProtoModel, qcschema_draft
from .common_models import Provenance, qcschema_molecule_default
from .common_models import Provenance, check_convertible_version, qcschema_molecule_default
from .types import Array

if TYPE_CHECKING:
@@ -334,7 +334,7 @@ class Molecule(ProtoModel):
"never need to be manually set.",
)
extras: Dict[str, Any] = Field( # type: ignore
None,
{},
description="Additional information to bundle with the molecule. Use for schema development and scratch space.",
)

@@ -382,7 +382,7 @@ def __init__(self, orient: bool = False, validate: Optional[bool] = None, **kwar
kwargs = {**kwargs, **schema} # Allow any extra fields
validate = True

if "extras" not in kwargs:
if "extras" not in kwargs or kwargs["extras"] is None: # latter re-defaults to empty dict
kwargs["extras"] = {}
super().__init__(**kwargs)

@@ -588,19 +588,23 @@ def __eq__(self, other):
by scientific terms, and not programming terms, so it's less rigorous than
a programmatic equality or a memory equivalent `is`.
"""
import qcelemental

if isinstance(other, dict):
other = Molecule(orient=False, **other)
elif isinstance(other, Molecule):
elif isinstance(other, (Molecule, qcelemental.models.v1.Molecule)):
# allow v1 on grounds of "scientific, not programming terms"
pass
else:
raise TypeError("Comparison molecule not understood of type '{}'.".format(type(other)))

return self.get_hash() == other.get_hash()

# UNCOMMENT IF NEEDED FOR UPGRADE REDO??
def dict(self, **kwargs):
warnings.warn("The `dict` method is deprecated; use `model_dump` instead.", DeprecationWarning)
return self.model_dump(**kwargs)
# TODO maybe bad idea as dict(v2) does non-recursive dictionary, whereas model_dump does nested

@model_serializer(mode="wrap")
def _serialize_molecule(self, handler) -> Dict[str, Any]:
@@ -1463,6 +1467,19 @@ def scramble(

return cmol, {"rmsd": rmsd, "mill": perturbation}

    def convert_v(self, version):
        import qcelemental as qcel

        # TODO: since Mol is v2/v3 while everything else is v1/v2, reconsider this
        if check_convertible_version(version, error="Molecule") == "self":
            return self

        dself = self.model_dump()
        if version == 1:
            self_vN = qcel.models.v1.Molecule(**dself)

        return self_vN


def _filter_defaults(dicary):
nat = len(dicary["symbols"])