MolSSI · loriab · Oct 7, 2024 · Oct 8, 2024 · Oct 8, 2024 · Oct 8, 2024
diff --git a/.github/workflows/CI.yaml b/.github/workflows/CI.yaml
@@ -39,7 +39,7 @@ jobs:
         if: matrix.python-version == '3.9'
         run: poetry install --no-interaction --no-ansi --extras test
       - name: Run tests
-        run: poetry run pytest -rws -v --cov=qcelemental --color=yes --cov-report=xml
+        run: poetry run pytest -rws -v --cov=qcelemental --color=yes --cov-report=xml -k "not pubchem_multiout_g"
       - name: Upload coverage to Codecov
         uses: codecov/codecov-action@v3 # NEEDS UPDATE TO v3 https://github.com/codecov/codecov-action
       - name: QCSchema Examples Deploy

diff --git a/docs/changelog.rst b/docs/changelog.rst
@@ -27,6 +27,7 @@ Breaking Changes
 ++++++++++++++++
 * The very old model names `ResultInput`, `Result`, `ResultProperties`, `Optimization` deprecated in 2019 are now only available through `qcelelemental.models.v1`
 * ``models.v2`` do not support AutoDoc. The AutoDoc routines have been left at pydantic v1 syntax. Use autodoc-pydantic for Sphinx instead.
+* Unlike Levi's Pydantic v2 work, this does not forward-define ``dict``, ``copy``, ``json`` on v2 models. Instead, it backward-defines ``model_dump``, ``model_dump_json``, ``model_copy`` on v1 models. This will impede upgrading but be cleaner in the long run. See the commented-out functions to temporarily restore this functionality. ``v2.Molecule`` retains its ``dict`` for now.
 
 New Features
 ++++++++++++
@@ -35,6 +36,8 @@ New Features
 
 Enhancements
 ++++++++++++
+* Fix a lot of warnings originating in this project.
+* `Molecule.extras` now defaults to `{}` rather than None in both v1 and v2. Input None converts to {} upon instantiation.
 * ``v2.FailedOperation`` field `id` is becoming `Optional[str]` instead of plain `str` so that the default validates.
 * v1.ProtoModel learned `model_copy`, `model_dump`, `model_dump_json` methods (all w/o warnings) so downstream can unify on newer syntax. Levi's work alternately/additionally taught v2 `copy`, `dict`, `json` (all w/warning) but dict has an alternate use in Pydantic v2.
 * ``AtomicInput`` and ``AtomicResult`` ``OptimizationInput``, ``OptimizationResult``, ``TorsionDriveInput``, ``TorsionDriveResult``, ``FailedOperation`` (both versions) learned a ``.convert_v(ver)`` function that returns self or the other version.

diff --git a/docs/models.rst b/docs/models.rst
@@ -7,7 +7,7 @@ as their base to provide serialization, validation, and manipluation.
 
 
 Basics
---------
+------
 
 Model creation occurs with a ``kwargs`` constructor as shown by equivalent operations below:
 
@@ -16,11 +16,27 @@ Model creation occurs with a ``kwargs`` constructor as shown by equivalent opera
     >>> mol = qcel.models.Molecule(symbols=["He"], geometry=[0, 0, 0])
     >>> mol = qcel.models.Molecule(**{"symbols":["He"], "geometry": [0, 0, 0]})
 
-A list of all available fields can be found by querying the ``fields`` attribute:
+Certain models (``Molecule`` in particular) have additional convenience instantiation functions, such as
+the one below for the hydroxide ion:
 
 .. code-block:: python
 
-    >>> mol.fields.keys()
+    >>> mol = qcel.models.Molecule.from_data("""
+              -1 1
+               O 0 0 0
+               H 0 0 1.2
+        """)
+
+A list of all available fields can be found by querying the fields attribute that matches the model's Pydantic version:
+
+.. code-block:: python
+
+    # QCSchema v1 / Pydantic v1
+    >>> mol.__fields__.keys()
+    dict_keys(['symbols', 'geometry', ..., 'id', 'extras'])
+
+    # QCSchema v2 / Pydantic v2
+    >>> mol.model_fields.keys()
     dict_keys(['symbols', 'geometry', ..., 'id', 'extras'])
 
 These attributes can be accessed as shown:
@@ -37,11 +53,13 @@ Note that these models are typically immutable:
     >>> mol.symbols = ["Ne"]
     TypeError: "Molecule" is immutable and does not support item assignment
 
-To update or alter a model the ``copy`` command can be used with the ``update`` kwargs:
+To update or alter a model the ``model_copy`` command can be used with the ``update`` kwargs.
+Note that ``model_copy`` is Pydantic v2 syntax, but it will work on QCSchema v1 and v2 models.
+The older Pydantic v1 syntax, ``copy``, will only work on QCSchema v1 models.
 
 .. code-block:: python
 
-    >>> mol.copy(update={"symbols": ["Ne"]})
+    >>> mol.model_copy(update={"symbols": ["Ne"]})
     <    Geometry (in Angstrom), charge = 0.0, multiplicity = 1:
 
            Center              X                  Y                   Z
@@ -53,26 +71,30 @@ To update or alter a model the ``copy`` command can be used with the ``update``
 Serialization
 -------------
 
-All models can be serialized back to their dictionary counterparts through the ``dict`` function:
+All models can be serialized back to their dictionary counterparts through the ``model_dump`` function.
+Note that ``model_dump`` is Pydantic v2 syntax, but it works on both QCSchema v1 and v2 models.
+The older Pydantic v1 syntax, ``dict``, only works on QCSchema v1 models; it has a different effect on v2 models.
 
 .. code-block:: python
 
-    >>> mol.dict()
+    >>> mol.model_dump()
     {'symbols': ['He'], 'geometry': array([[0., 0., 0.]])}
 
 
 JSON representations are supported out of the box for all models:
+Note that ``model_dump_json`` is Pydantic v2 syntax, but it will work on QCSchema v1 and v2 models.
+The older Pydantic v1 syntax, ``json``, will only work on QCSchema v1 models.
 
 .. code-block:: python
 
-    >>> mol.json()
+    >>> mol.model_dump_json()
     '{"symbols": ["He"], "geometry": [0.0, 0.0, 0.0]}'
 
 Raw JSON can also be parsed back into a model:
 
 .. code-block:: python
 
-    >>> mol.parse_raw(mol.json())
+    >>> mol.parse_raw(mol.model_dump_json())
     <    Geometry (in Angstrom), charge = 0.0, multiplicity = 1:
 
            Center              X                  Y                   Z
@@ -82,10 +104,120 @@ Raw JSON can also be parsed back into a model:
     >
 
 The standard ``dict`` operation returns all internal representations which may be classes or other complex structures.
-To return a JSON-like dictionary the ``dict`` function can be used:
+To return a JSON-like dictionary the ``model_dump`` function can be used:
 
 .. code-block:: python
 
-    >>> mol.dict(encoding='json')
+    >>> mol.model_dump(encoding='json')
     {'symbols': ['He'], 'geometry': [0.0, 0.0, 0.0]}
 
+
+QCSchema v2
+-----------
+
+Starting with QCElemental v0.50.0, a new "v2" version of QCSchema is accessible. In particular:
+
+* QCSchema v2 is written in Pydantic v2 syntax. (Note that a model with submodels may not mix Pydantic v1 and v2 models.)
+* Major QCSchema v2 models have field ``schema_version=2``. Note that Molecule has long had ``schema_version=2``, but this belongs to QCSchema v1. The QCSchema v2 Molecule has ``schema_version=3``.
+* QCSchema v2 has certain field rearrangements that make procedure models more composable. They also make v1 and v2 distinguishable in dictionary form.
+* QCSchema v2 does not include new features. It is purely a technical upgrade.
+
+See https://github.com/MolSSI/QCElemental/issues/323 for details and progress; the changelog has further details.
+
+The anticipated timeline is:
+
+* v0.50 — QCSchema v2 available. QCSchema v1 unchanged (files moved but imports will work w/o change). There will be beta releases.
+* v0.70 — QCSchema v2 will become the default. QCSchema v1 will remain available, but it will require specific import paths (available as soon as v0.50).
+* v1.0 — QCSchema v2 unchanged. QCSchema v1 dropped. Earliest 1 Jan 2026.
+
+Both QCSchema v1 and v2 will be available for quite a while to allow downstream projects time to adjust.
+
+To make sure you're using QCSchema v1:
+
+.. code-block:: python
+
+    # replace 
+    >>> from qcelemental.models import AtomicResult, OptimizationInput
+    # by
+    >>> from qcelemental.models.v1 import AtomicResult, OptimizationInput
+
+To try out QCSchema v2:
+
+.. code-block:: python
+
+    # replace 
+    >>> from qcelemental.models import AtomicResult, OptimizationInput
+    # by
+    >>> from qcelemental.models.v2 import AtomicResult, OptimizationInput
+
+To figure out what model you're working with, you can look at its Pydantic base or its QCElemental base:
+
+.. code-block:: python
+
+    # make molecules
+    >>> mol1 = qcel.models.v1.Molecule(symbols=["O", "H"], molecular_charge=-1, geometry=[0, 0, 0, 0, 0, 1.2])
+    >>> mol2 = qcel.models.v2.Molecule(symbols=["O", "H"], molecular_charge=-1, geometry=[0, 0, 0, 0, 0, 1.2])
+    >>> print(mol1, mol2)
+    Molecule(name='HO', formula='HO', hash='6b7a42f') Molecule(name='HO', formula='HO', hash='6b7a42f')
+
+    # query v1 molecule
+    >>> isinstance(mol1, pydantic.v1.BaseModel)
+    True
+    >>> isinstance(mol1, pydantic.BaseModel)
+    False
+    >>> isinstance(mol1, qcel.models.v1.ProtoModel)
+    True
+    >>> isinstance(mol1, qcel.models.v2.ProtoModel)
+    False
+
+    # query v2 molecule
+    >>> isinstance(mol2, pydantic.v1.BaseModel)
+    False
+    >>> isinstance(mol2, pydantic.BaseModel)
+    True
+    >>> isinstance(mol2, qcel.models.v1.ProtoModel)
+    False
+    >>> isinstance(mol2, qcel.models.v2.ProtoModel)
+    True
+
+Most high-level models (e.g., ``AtomicInput``, not ``Provenance``) have a ``convert_v`` function to convert between QCSchema versions. It returns the input object if called with the current version.
+
+.. code-block:: python
+
+    >>> inp1 = qcel.models.v1.AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule=mol1)
+    >>> print(inp1)
+    AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule_hash='6b7a42f')
+    >>> inp1.schema_version
+    1
+    >>> inp2 = qcel.models.v2.AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule=mol2)
+    >>> print(inp2)
+    AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule_hash='6b7a42f')
+    >>> inp2.schema_version
+    2
+
+    # now convert
+    >>> inp1_now2 = inp1.convert_v(2)
+    >>> print(inp1_now2.schema_version)
+    2
+    >>> inp2_now1 = inp2.convert_v(1)
+    >>> print(inp2_now1.schema_version)
+    1
+
+Error messages encountered during the upgrade are not always helpful. Two common ones, with their likely causes, are shown below.
+
+.. code-block:: python
+
+    # This usually means you're calling Pydantic v1 functions (dict, json, copy) on a Pydantic v2 model.
+    # There are dict and copy functions commented out in qcelemental/models/v2/basemodels.py that you
+    #   can uncomment and use temporarily to ease the upgrade, but the preferred route is to switch to
+    #   model_dump, model_dump_json, model_copy that work on QCSchema v1 and v2 models.
+    TypeError: ProtoModel.serialize() got an unexpected keyword argument 'by_alias'
+
+    # This usually means you're mixing a v1 model into a v2 model. Check all the imports from
+    #   qcelemental.models for version specificity. If the import can't be updated, run `convert_v`
+    #   on the model.
+    pydantic_core._pydantic_core.ValidationError: 1 validation error for AtomicInput
+    molecule
+      Input should be a valid dictionary or instance of Molecule [type=model_type, input_value=Molecule(name='HO', formula='HO', hash='6b7a42f'), input_type=Molecule]
+        For further information visit https://errors.pydantic.dev/2.5/v/model_type
+
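Downstream code that must run against both older and newer model classes could wrap the advice above in a small helper. The sketch below is a hypothetical illustration and not part of QCElemental: `dump_model`, `OldStyleModel`, and `NewStyleModel` are invented stand-ins, and the only assumption about real models is that they expose either `model_dump` (preferred) or `dict`.

```python
from typing import Any, Dict


def dump_model(model: Any, **kwargs: Any) -> Dict[str, Any]:
    """Serialize a model to a dict, preferring the Pydantic v2-style name."""
    dumper = getattr(model, "model_dump", None)
    if callable(dumper):
        return dumper(**kwargs)
    # Fall back to the Pydantic v1-style name for models that lack model_dump.
    return model.dict(**kwargs)


class OldStyleModel:
    """Hypothetical stand-in exposing only the v1-style ``dict`` method."""

    def __init__(self, symbols):
        self.symbols = list(symbols)

    def dict(self, **kwargs):
        return {"symbols": self.symbols}


class NewStyleModel:
    """Hypothetical stand-in exposing only the v2-style ``model_dump`` method."""

    def __init__(self, symbols):
        self.symbols = list(symbols)

    def model_dump(self, **kwargs):
        return {"symbols": self.symbols}


old_dump = dump_model(OldStyleModel(["He"]))
new_dump = dump_model(NewStyleModel(["Ne"]))
print(old_dump, new_dump)
```

With this PR, `model_dump` is documented as working on both QCSchema v1 and v2 models, so the fallback branch would only matter for model classes that predate it.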
diff --git a/qcelemental/models/v1/molecule.py b/qcelemental/models/v1/molecule.py
@@ -23,7 +23,7 @@
 from ...testing import compare, compare_values
 from ...util import deserialize, measure_coordinates, msgpackext_loads, provenance_stamp, which_import
 from .basemodels import ProtoModel, qcschema_draft
-from .common_models import Provenance, qcschema_molecule_default
+from .common_models import Provenance, check_convertible_version, qcschema_molecule_default
 from .types import Array
 
 if TYPE_CHECKING:
@@ -290,7 +290,7 @@ class Molecule(ProtoModel):
         "never need to be manually set.",
     )
     extras: Dict[str, Any] = Field(  # type: ignore
-        None,
+        {},
         description="Additional information to bundle with the molecule. Use for schema development and scratch space.",
     )
 
@@ -350,7 +350,7 @@ def __init__(self, orient: bool = False, validate: Optional[bool] = None, **kwar
             kwargs = {**kwargs, **schema}  # Allow any extra fields
             validate = True
 
-        if "extras" not in kwargs:
+        if "extras" not in kwargs or kwargs["extras"] is None:  # latter re-defaults to empty dict
             kwargs["extras"] = {}
         super().__init__(**kwargs)
 
@@ -552,10 +552,12 @@ def __eq__(self, other):
         by scientific terms, and not programing terms, so it's less rigorous than
         a programmatic equality or a memory equivalent `is`.
         """
+        import qcelemental
 
         if isinstance(other, dict):
             other = Molecule(orient=False, **other)
-        elif isinstance(other, Molecule):
+        elif isinstance(other, (qcelemental.models.v2.Molecule, Molecule)):
+            # allow v2 on grounds of "scientific, not programming terms"
             pass
         else:
             raise TypeError("Comparison molecule not understood of type '{}'.".format(type(other)))
@@ -1413,6 +1415,19 @@ def scramble(
 
         return cmol, {"rmsd": rmsd, "mill": perturbation}
 
+    def convert_v(self, version):  # , *, **kwargs):
+        import qcelemental as qcel
+
+        # TODO: since Mol is v2/v3 while everything else is v1/v2, reconsider this
+        if check_convertible_version(version, error="Molecule") == "self":
+            return self
+
+        dself = self.dict()
+        if version == 2:
+            self_vN = qcel.models.v2.Molecule(**dself)
+
+        return self_vN
+
 
 def _filter_defaults(dicary):
     nat = len(dicary["symbols"])
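The `extras` handling in the `__init__` hunk above treats a missing key and an explicit `None` the same way. A minimal plain-Python sketch of that normalization follows; `MoleculeSketch` is a hypothetical stand-in, not the real `Molecule`, and it ignores every other field.

```python
from typing import Any, Dict


class MoleculeSketch:
    """Hypothetical stand-in for a model whose ``extras`` must always be a dict."""

    def __init__(self, **kwargs: Any) -> None:
        # Treat both a missing key and an explicit None as "no extras supplied".
        if "extras" not in kwargs or kwargs["extras"] is None:
            kwargs["extras"] = {}
        self.extras: Dict[str, Any] = kwargs["extras"]


missing = MoleculeSketch()
explicit_none = MoleculeSketch(extras=None)
supplied = MoleculeSketch(extras={"note": "scratch"})

# Each instance built without extras receives its own fresh dict,
# so writing to one does not leak into another.
missing.extras["seen"] = True
print(missing.extras, explicit_none.extras, supplied.extras)
```

Normalizing in the constructor means callers can always write `mol.extras[...]` without first checking for `None`.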

diff --git a/qcelemental/models/v2/basemodels.py b/qcelemental/models/v2/basemodels.py
@@ -133,9 +133,11 @@ def parse_file(cls, path: Union[str, Path], *, encoding: Optional[str] = None) -
 
         return cls.parse_raw(path.read_bytes(), encoding=encoding)
 
-    def dict(self, **kwargs) -> Dict[str, Any]:
-        warnings.warn("The `dict` method is deprecated; use `model_dump` instead.", DeprecationWarning)
-        return self.model_dump(**kwargs)
+    # UNCOMMENT IF NEEDED FOR UPGRADE
+    #   defining this may be a bad idea, as dict(model) on a Pydantic v2 model returns a non-recursive dictionary, whereas model_dump returns a nested one
+    # def dict(self, **kwargs) -> Dict[str, Any]:
+    #    warnings.warn("The `dict` method is deprecated; use `model_dump` instead.", DeprecationWarning)
+    #    return self.model_dump(**kwargs)
 
     @model_serializer(mode="wrap")
     def _serialize_model(self, handler) -> Dict[str, Any]:
@@ -235,6 +237,7 @@ def serialize(
 
         return serialize(data, encoding=encoding)
 
+    # UNCOMMENT IF NEEDED FOR UPGRADE REDO!!!
     def json(self, **kwargs):
         # Alias JSON here from BaseModel to reflect dict changes
         warnings.warn("The `json` method is deprecated; use `model_dump_json` instead.", DeprecationWarning)
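The comment in the `dict` hunk above distinguishes a non-recursive dictionary from the nested one produced by `model_dump`. The sketch below illustrates that difference with a hypothetical stand-in class, `SketchModel`, which is not a Pydantic model; it only assumes, as the comment implies, that iterating a model yields `(name, value)` pairs so that the builtin `dict(model)` gives a shallow view.

```python
from typing import Any, Dict, Iterator, Tuple


class SketchModel:
    """Hypothetical stand-in mimicking two dictionary views of a model."""

    def __init__(self, **fields: Any) -> None:
        self._fields = dict(fields)

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        # Yielding (name, value) pairs is what lets the builtin dict(model) work.
        yield from self._fields.items()

    def model_dump(self) -> Dict[str, Any]:
        # Recurse so that nested models are converted to plain dicts as well.
        return {
            name: value.model_dump() if isinstance(value, SketchModel) else value
            for name, value in self._fields.items()
        }


provenance = SketchModel(creator="sketch")
molecule = SketchModel(symbols=["He"], provenance=provenance)

shallow = dict(molecule)        # the nested model is left as an object
nested = molecule.model_dump()  # the nested model becomes a plain dict
print(type(shallow["provenance"]).__name__, nested)
```

A `dict` method that silently forwards to `model_dump` would therefore return something different from what `dict(model)` returns, which is presumably the ambiguity the comment warns about.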

diff --git a/qcelemental/models/v2/molecule.py b/qcelemental/models/v2/molecule.py
@@ -30,7 +30,7 @@
 from ...testing import compare, compare_values
 from ...util import deserialize, measure_coordinates, msgpackext_loads, provenance_stamp, which_import
 from .basemodels import ProtoModel, qcschema_draft
-from .common_models import Provenance, qcschema_molecule_default
+from .common_models import Provenance, check_convertible_version, qcschema_molecule_default
 from .types import Array
 
 if TYPE_CHECKING:
@@ -334,7 +334,7 @@ class Molecule(ProtoModel):
         "never need to be manually set.",
     )
     extras: Dict[str, Any] = Field(  # type: ignore
-        None,
+        {},
         description="Additional information to bundle with the molecule. Use for schema development and scratch space.",
     )
 
@@ -382,7 +382,7 @@ def __init__(self, orient: bool = False, validate: Optional[bool] = None, **kwar
             kwargs = {**kwargs, **schema}  # Allow any extra fields
             validate = True
 
-        if "extras" not in kwargs:
+        if "extras" not in kwargs or kwargs["extras"] is None:  # latter re-defaults to empty dict
             kwargs["extras"] = {}
         super().__init__(**kwargs)
 
@@ -588,19 +588,23 @@ def __eq__(self, other):
         by scientific terms, and not programing terms, so it's less rigorous than
         a programmatic equality or a memory equivalent `is`.
         """
+        import qcelemental
 
         if isinstance(other, dict):
             other = Molecule(orient=False, **other)
-        elif isinstance(other, Molecule):
+        elif isinstance(other, (Molecule, qcelemental.models.v1.Molecule)):
+            # allow v1 on grounds of "scientific, not programming terms"
             pass
         else:
             raise TypeError("Comparison molecule not understood of type '{}'.".format(type(other)))
 
         return self.get_hash() == other.get_hash()
 
+    # UNCOMMENT IF NEEDED FOR UPGRADE REDO??
     def dict(self, **kwargs):
         warnings.warn("The `dict` method is deprecated; use `model_dump` instead.", DeprecationWarning)
         return self.model_dump(**kwargs)
+        # TODO: may be a bad idea, as dict(model) on a Pydantic v2 model returns a non-recursive dictionary, whereas model_dump returns a nested one
 
     @model_serializer(mode="wrap")
     def _serialize_molecule(self, handler) -> Dict[str, Any]:
@@ -1463,6 +1467,19 @@ def scramble(
 
         return cmol, {"rmsd": rmsd, "mill": perturbation}
 
+    def convert_v(self, version):
+        import qcelemental as qcel
+
+        # TODO: since Mol is v2/v3 while everything else is v1/v2, reconsider this
+        if check_convertible_version(version, error="Molecule") == "self":
+            return self
+
+        dself = self.model_dump()
+        if version == 1:
+            self_vN = qcel.models.v1.Molecule(**dself)
+
+        return self_vN
+
 
 def _filter_defaults(dicary):
     nat = len(dicary["symbols"])
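The two `convert_v` methods added in this PR follow the same pattern: return `self` when the requested version is the current one, otherwise round-trip through a dictionary into the other version's class. The sketch below shows that pattern with hypothetical stand-in classes. It does not use `check_convertible_version`, whose behavior is not shown in this diff; instead it assumes an explicit lookup table (`SKETCH_VERSIONS`, an invented name) that raises for unknown versions, so the result is never left unassigned.

```python
from typing import Any, Dict


class _SketchBase:
    """Hypothetical base class; ``version`` is overridden by subclasses."""

    version: int = 0

    def __init__(self, **data: Any) -> None:
        self.data: Dict[str, Any] = dict(data)

    def convert_v(self, version: int) -> "_SketchBase":
        if version == self.version:
            # Already the requested version: hand back the same object.
            return self
        try:
            target = SKETCH_VERSIONS[version]
        except KeyError:
            raise ValueError(f"Cannot convert to unsupported version {version!r}.") from None
        # Round-trip through a plain dict into the other version's class.
        return target(**self.data)


class SketchV1(_SketchBase):
    version = 1


class SketchV2(_SketchBase):
    version = 2


# Assumed registry mapping a version number to its class (illustration only).
SKETCH_VERSIONS = {1: SketchV1, 2: SketchV2}

mol1 = SketchV1(symbols=["O", "H"])
mol2 = mol1.convert_v(2)
print(type(mol2).__name__, mol2.data)
```

Raising on an unknown version is a design choice of this sketch; the real methods delegate that validation to `check_convertible_version`.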