Provide Migration-Story for ZODB with Plone from Python 2 to 3 #2525

Closed
7 of 10 tasks
pbauer opened this issue Sep 27, 2018 · 15 comments

Comments

@pbauer
Member

pbauer commented Sep 27, 2018

ZODB itself is compatible with Python 3, but a database created in Python 2.7 cannot be used in Python 3 without being modified first.

After evaluating different approaches (see https://blog.gocept.com/2018/06/07/migrate-a-zope-zodb-data-fs-to-python-3), zodbupdate (https://github.com/zopefoundation/zodbupdate#migration-to-python-3) seems to be a good approach.

If you want to contribute to the documentation or implementation of ZODB Python 3 migration for Plone this README provides some introduction and background information that helps you to get started.

We need to:

Improvements for zodbupdate that will make migrators' lives easier:

@pbauer pbauer added this to the Plone 5.2 milestone Sep 27, 2018
@pbauer
Member Author

pbauer commented Sep 28, 2018

See zopefoundation/Zope#285 for the zodbupdate_decode_dict in Zope.

@davisagli
Member

davisagli commented Sep 29, 2018

I played with zodbupdate a bit a week or so ago and it seems promising, but will probably take some work to get great results.

Here's the basic path:

  1. Set up a buildout that includes Plone 5.2 in Python 2, plus zodbupdate and the following mr.developer checkouts:
  2. Run bin/zodbupdate --pack --convert-py3 -f path/to/Data.fs. This does an in-place migration of the filestorage, so make sure you do it on a copy if you want to keep using it in Python 2.
  3. Copy the filestorage over to a buildout with the py3.cfg build of Plone 5.2, on Python 3.
  4. Start the site with bin/wsgi and see what works and what doesn't. If you find objects with decode errors, figure out which attributes are the problem (i.e. which ones should not be converted from bytes to str; a pdb in ZODB.serialize is helpful), add a zodbupdate mapping for those attributes, rinse, and repeat.
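
Since the zodbupdate run rewrites the filestorage in place, the "work on a copy" precaution can be scripted. The following is only a minimal sketch: the helper name prepare_migration_copy and the default bin/zodbupdate location are assumptions for illustration, the flags are the ones quoted in the steps above, and blob directories are not handled at all.

```python
import shutil
from pathlib import Path


def prepare_migration_copy(source, workdir, zodbupdate="bin/zodbupdate"):
    """Copy a filestorage and return the zodbupdate command for the copy.

    Hypothetical helper: it copies only the Data.fs file itself (no blobs,
    no index files) and does not execute anything.
    """
    source = Path(source)
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    target = workdir / source.name
    # The original file is left untouched and stays usable from Python 2.
    shutil.copy2(source, target)
    # Same flags as in the steps above, pointed at the copy.
    return [zodbupdate, "--pack", "--convert-py3", "-f", str(target)]
```

The returned list could then be handed to subprocess.run(..., check=True) from inside the Python 2 buildout directory.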

So far I only tried this with a fresh Plone site, so no real data. It would be interesting to try with a real site to get a sense of how long the migration takes in practice.

Some remaining issues I found:

  • After migration, logging in does not work. That's because AuthEncoding.is_encrypted expects bytes but is getting str (the hashed passwords in the ZODBUserManager were migrated). This one is tricky because they are in a BTree and we can't just add a zodbupdate mapping to avoid migrating all BTree values to str. So we don't have a good way to target this particular BTree for avoiding str-ification. Maybe we need to convert the str back to bytes at runtime before passing it to AuthEncoding.
  • I got a weird error on the redirect after saving an edit to the homepage -- so far I couldn't figure out which object is causing the problem. (Something tells me catalog?) But after reloading, I could see my edited text.
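
The "convert the str back to bytes at runtime" idea for the login problem could look roughly like this. It is a sketch under assumptions: ensure_bytes is a hypothetical helper, the hash value is made up, and where exactly such a call would have to be placed in the user manager is not worked out here.

```python
def ensure_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is a str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


# After migration a stored password hash comes back as str; code that
# expects bytes would need it normalized before the check.
stored = "{SSHA}not-a-real-hash"  # made-up value for illustration
normalized = ensure_bytes(stored)
```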

I'm sure there are more issues. Some things that come to mind to pay attention to:

  • Any attributes that are declared as Bytes or BytesLine in a Zope schema, or string/text/lines in a Zope property, may need "binary" entries in a zodbupdate decode mapping.
  • Likewise for any attributes that store binary data without being declared like that. From the above checkouts we already have mappings for OFS.Image.Pdata and the message of a ZopeVersionControl LogEntry (which is actually a pickle stored inside a ZODB pickle!) We don't need to do this for blobs though.
  • Any attributes that store data which should become unicode text may need a zodbupdate decode mapping to say which encoding to use during conversion (probably utf-8 in most cases). Maybe we want to add an option to zodbupdate to specify a default encoding to avoid adding mappings for the common case.
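
A decode mapping for an add-on might be sketched as below. The package, class and attribute names are invented, and the "module ClassName attribute" key form reflects my reading of the zodbupdate README, so check it there (along with how the dictionary gets registered) before relying on it. The small checker is not part of zodbupdate; it only catches typos in such a dictionary.

```python
import codecs

# Hypothetical add-on mapping: "binary" keeps the attribute as bytes,
# any other value names the encoding used to decode it to text.
zodbupdate_decode_dict = {
    "my.addon.content Attachment raw_data": "binary",
    "my.addon.content Attachment title": "utf-8",
}


def check_decode_dict(mapping):
    """Return a list of problems found in a decode mapping (empty if none)."""
    problems = []
    for key, value in mapping.items():
        # Assumed key form: module, class name and attribute, space-separated.
        if len(key.split()) != 3:
            problems.append("bad key: %r" % key)
        if value != "binary":
            try:
                codecs.lookup(value)
            except LookupError:
                problems.append("unknown encoding: %r" % value)
    return problems
```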

@thefunny42

I added the Python 3 migration options to zodbupdate in order to migrate my application to Python 3. That worked fine, and we have been running Python 3 in production since March. There are a few things to know:

  • zodbupdate out of the box will break your blobs if you use the filesystem option: the script goes over the whole database, transforms the records and recommits them in a new transaction. The blob files need to be renamed in order to match the new transaction id.
  • zodbupdate takes care of two big problems: bytes in date/datetime objects and ZODB references. Other strings that should be converted to bytes, or otherwise decoded, you need to identify yourself. Hopefully, if you used unicode everywhere, there will not be any problems.
  • In Python 3, strings are Unicode text and UTF-8 is the default encoding, so you do not need to change anything there.
  • We did have some issues with zope.index.text, which has an optimisation that stores non-Unicode data in Python 2 strings, but uses strings in Python 3 (instead of bytes, which would have been the proper thing to do). We basically made a helper script that would go over the indexes and decode them as "raw-unicode-escape" in the database before running zodbupdate. This would be the strategy to convert strings in BTrees, for instance, or anything else you cannot target with zodbupdate.
  • The only things we actually kept as bytes in our database are password hashes. We used unicode, and therefore strings in Python 3, everywhere, so we did not have much trouble there.
  • We used a different script to run zodbupdate in production: https://github.com/minddistrict/mdtools.relstorage (version 2.0). It does the same thing, except that it rewrites the records directly in PostgreSQL, mostly for speed reasons (zodbupdate is not very fast on large databases).

I doubt migrating a Plone site will be easy. It will depend a lot on the extensions that have been installed and the custom code written, since they should be checked for strings/bytes problems.
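
To illustrate why Python 2 native strings need this per-attribute decision, here is a small standard-library sketch. It does not show how zodbupdate works internally. The bytes are hand-written to match what Python 2 writes for a short str at pickle protocol 2.

```python
import pickle

# Protocol 2 pickle of the Python 2 native str 'caf\xc3\xa9'
# (UTF-8 encoded text stored in a byte string).
PY2_PICKLE = b"\x80\x02U\x05caf\xc3\xa9q\x00."

# By default Python 3 decodes Python 2 str as ASCII, which fails here.
try:
    pickle.loads(PY2_PICKLE)
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

# The value can be kept as bytes ("binary") ...
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")
# ... or decoded to text with an explicit encoding.
as_text = pickle.loads(PY2_PICKLE, encoding="utf-8")
```

Neither choice is right for every attribute, which is why the problematic ones have to be identified one by one.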

@icemac

icemac commented Oct 2, 2018

Although zodb.py3migrate cannot be used to do the actual migration (see my blog post), it has an analysis step which shows the objects that might need a conversion. Maybe this is easier than trying things out and seeing what breaks. See https://zodbpy3migrate.readthedocs.io/doc.html#upgrade-workflow for the documentation of the analysis step.

@frisi frisi self-assigned this Oct 2, 2018
frisi added a commit to plone/buildout.coredev that referenced this issue Oct 2, 2018
@frisi
Contributor

frisi commented Oct 3, 2018

We did have some issues with zope.index.text, which has an optimisation that stores non-Unicode data in Python 2 strings, but uses strings in Python 3 (instead of bytes, which would have been the proper thing to do).
We basically made a helper script that would go over the indexes and decode them as "raw-unicode-escape" in the database before running zodbupdate. This would be the strategy to convert strings in BTrees, for instance, or anything else you cannot target with zodbupdate.

Thanks for sharing your experience, @thefunny42! Could you please email me this script or post the relevant parts here or as a gist, so I can use it for documenting the migration of Plone sites? Thanks a lot!

frisi added a commit to plone/buildout.coredev that referenced this issue Oct 3, 2018
@frisi
Contributor

frisi commented Oct 3, 2018

I prepared a buildout and documented the process of creating a sample Plone site running Python 2 and migrating it to Python 3.

You can find everything under https://github.com/frisi/coredev52multipy/tree/zodbupdate
This should help users new to the topic (e.g. pickles, string handling in Python 2 vs. Python 3) understand the problem and how to debug and fix problems during migration.

I also started to document the Plone-specific problems and possible solutions there. It is pretty much a summary of the writeups by @davisagli, @thefunny42 and @icemac, including some information on where to hook in to fix them.
I'd like to discuss these with you in the hangout today.

@thefunny42

thefunny42 commented Oct 3, 2018

Some additional information:

  • To check whether the migration worked, we unpickled every record in the database. We used the zodbsearch/relsearch script you can find in our mdtools repository.
  • I remember something with zope.schema where ASCII fields are based on zope.schema.Bytes in Python 2 but on zope.schema.Text in Python 3. In our application we replaced BytesLine with ASCIILine (because BytesLine would actually store things in a Python 2 str, and ASCIILine was OK for what we did).
  • Fixing our text index did not require any magic. In Python 2:
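import zope.catalog.text


def fix_text_index(index):
    # Only text indexes keep their per-document word lists as native strings.
    if not zope.catalog.text.ITextIndex.providedBy(index):
        return
    words = index.index._docwords
    count = 0
    # Iterate over a copy, because the BTree is modified inside the loop.
    for k, v in list(words.items()):
        if isinstance(v, str):  # a Python 2 byte string
            count += 1
            words[k] = v.decode('raw-unicode-escape')
    if count:
        print('Updated {} words.'.format(count))
    # True if anything was changed.
    return count != 0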
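
The same pattern can be generalized to any mapping whose byte-string values cannot be targeted with a zodbupdate mapping. This is a sketch, not part of the script described above: decode_str_values is a hypothetical name, and a plain dict stands in for a BTree here.

```python
def decode_str_values(mapping, encoding="raw-unicode-escape"):
    """Decode all byte-string values of a mapping in place.

    Returns the number of values that were converted.
    """
    count = 0
    # Copy the items first so the mapping is not mutated while iterating.
    for key, value in list(mapping.items()):
        if isinstance(value, bytes):
            mapping[key] = value.decode(encoding)
            count += 1
    return count
```

Because bytes is an alias for str in Python 2, the helper behaves the same way when run there before the zodbupdate step.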

@davisagli
Member

@frisi I've only skimmed your writeup so far, but it looks really great! The same results I was discovering, but much more clearly written.

@frisi
Contributor

frisi commented Oct 7, 2018

I removed myself as an assignee, as I won't be able to carry on with the ZODB Python 3 migration in the near future. I hope my current findings and documentation will help other contributors get started.

@thefunny42 thanks for your comments and fixes on the zodbupdate tickets/PR.

@davisagli could you please have a look at the updated ticket description? I tried to add an overview of the currently known migration tasks and created/linked tickets where I summarized the current state.
There is also a PR with a rough draft of the database migration in plone/documentation#1022. If you feel that important information is missing, please add it to the docs or to the list in this ticket description so we do not forget anything.

Thank you all for your help on this topic, and happy migrating ;-)

@jensens
Member

jensens commented Feb 7, 2019

FYI: I added a [zodbupdate] section to buildout.coredev using @davisagli's branch (added to sources and auto-checkout).

I used the script to convert an almost vanilla Plone 5.2 py2 DB (p.a.multilingual installed) to py3 and it worked. I did not do in-depth testing on the DB, but the content is shown, login works, and editing works.

@pbauer
Member Author

pbauer commented Feb 25, 2019

@jensens
Member

jensens commented May 10, 2019

At Saltlabs Sprint @dwt and I worked on the migration story for Plone and ZMS.

  • Now the Zope upgrade guide contains an updated process.
  • The Plone migration guide was updated as well.
  • zodbupdate was enhanced and released.
  • The CMFPlone ZODB verification script verifydb was factored out into a standalone tool, zodbverify.

I would say we now have a well-documented, working migration story.
We could be better at explaining what to do in case of failures and how to fix them, but that's a nice-to-have and may evolve over time as more projects are migrated.

@jensens
Member

jensens commented May 10, 2019

Note: The catalog problem is probably solved. It needs one more check.

@jensens
Member

jensens commented May 10, 2019

ad "write the required zodbupdate_decode_dict for all the packages in Plone that need it":

I would say all is done here. But we may need more real life migrations to verify.

@pbauer
Member Author

pbauer commented Apr 23, 2021

I consider this done. Additional docs are in https://community.plone.org/t/best-practice-documentation-on-zodb-debugging/12778

@pbauer pbauer closed this as completed Apr 23, 2021