Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce our dependencies #20

Open
jamadden opened this issue Sep 18, 2017 · 7 comments
Open

Reduce our dependencies #20

jamadden opened this issue Sep 18, 2017 · 7 comments

Comments

@jamadden
Copy link
Member

jamadden commented Sep 18, 2017

We have a massive dependency tree:

Successfully installed Acquisition-4.4.2 BTrees-4.4.1 ExtensionClass-4.3.0 
PyYAML-3.12 ZConfig-3.2.0 ZEO-5.1.0 ZODB-5.3.0 fudge-1.1.0 isodate-0.5.4 
nti.externalization nti.i18n-1.0.0 nti.property-1.0.0 nti.schema-1.3.0 nti.testing-2.0.0
nti.wref-1.0.0 nti.zodb-1.0.0 perfmetrics-2.0 persistent-4.2.4.2 
pyhamcrest-1.9.0 pytz-2017.2 repoze.zodbconn-0.15 simplejson-3.11.1 six-1.11.0
transaction-2.1.2 zc.lockfile-1.2.1 zc.zlibstorage-1.2.0 zdaemon-4.2.0 zodbpickle-0.6.0
zope.annotation-4.5 zope.browser-2.2.0 zope.browserpage-4.2.0 zope.browserresource-4.2.1
zope.cachedescriptors-4.3.0 zope.component-4.4.0 zope.configuration-4.1.0 
zope.container-4.2.1 zope.contenttype-4.3.0 zope.copy-4.1.0 zope.copypastemove-4.1.0
zope.datetime-4.2.0 zope.deferredimport-4.2.0 zope.deprecation-4.3.0 
zope.dottedname-4.2 zope.dublincore-4.2.0 zope.event-4.3.0 zope.exceptions-4.2.0 
zope.file-1.0.0 zope.filerepresentation-4.2.0 zope.formlib-4.4 zope.hookable-4.1.0 
zope.i18n-4.2.0 zope.i18nmessageid-4.1.0 zope.interface-4.4.2 zope.intid-4.3.0 
zope.keyreference-4.1.0 zope.lifecycleevent-4.2.0 zope.location-4.1.0 
zope.mimetype-2.2.0 zope.minmax-2.2.0 zope.pagetemplate-4.3.0 
zope.preference-4.0.0 zope.processlifetime-2.2.0 zope.proxy-4.3.0 
zope.publisher-4.3.2 zope.schema-4.5.0 zope.security-4.1.1 zope.size-4.2.0 
zope.tal-4.3.0 zope.tales-4.1.1 zope.testing-4.6.2 zope.testrunner-4.7.0
zope.traversing-4.1.0 zope.vocabularyregistry-1.0.0

We can't possibly really need all of those in the core.

For example, we only depend on zope.preference to be able to list it in configure.zcml. That's probably not our job.

For another example, we only depend on BTrees to be able to include OOBTree.OOBTree in isinstance(thing, MAPPING_TYPES) call. Shouldn't that just be isinstance(thing, collections.Mapping) (and allow the user to register mapping types with the collections.Mapping ABC, as it is intended for.

@jamadden
Copy link
Member Author

jamadden commented Sep 28, 2017

The tox 2.7 environment currently contains 76 libraries. I've released zope.mimetype 2.3.0 which adds a browser extra (that we don't need) where a bunch of things moved. Together with removing the dependency on zope.preference we have...still 75 libraries. Narf.

zope.file still pulls in zope.formlib. It needs the same treatment.

@jamadden
Copy link
Member Author

Once I clean up zope.file, zope.container still pulls in zope.publisher and zope.browser. But I think maybe that's the last of it.

@jamadden
Copy link
Member Author

zope.traversing also pulls in zope.publisher.

Collecting zope.configuration (from zope.publisher->zope.traversing>=4.0.0a1->zope.container==4.2.2.dev0)
  Using cached zope.configuration-4.1.0-py2.py3-none-any.whl
Collecting zope.browser (from zope.publisher->zope.traversing>=4.0.0a1->zope.container==4.2.2.dev0)
  Using cached zope.browser-2.2.0-py2.py3-none-any.whl
Collecting zope.contenttype>=4.0.0 (from zope.publisher->zope.traversing>=4.0.0a1->zope.container==4.2.2.dev0)
  Using cached zope.contenttype-4.3.0-py2.py3-none-any.whl
Collecting pytz (from zope.i18n->zope.traversing>=4.0.0a1->zope.container==4.2.2.dev0)

@jamadden
Copy link
Member Author

And...untangling zope.publisher from zope.traversing would be a substantial undertaking. That may be where we leave that part of it for now.

@jamadden
Copy link
Member Author

For another example, we only depend on BTrees to be able to include OOBTree.OOBTree in isinstance(thing, MAPPING_TYPES) call. Shouldn't that just be isinstance(thing, collections.Mapping) (and allow the user to register mapping types with the collections.Mapping ABC, as it is intended for.

Three points about that. First, isinstance with an ABC like collections.Mapping falls down badly with a security proxy. See this issue and the resulting FAQ I wrote. We should probably change to use the recommended zope.security.proxy.isinstance call (we probably don't want to unwrap any security proxy completely on entrance to toExternalObject; I don't think we really use security proxies very much right now, but I could see us moving in that direction, as might other users of this library).

The second is that including an ABC in your isinstance check slows it down. A lot. Especially in the negative case. But it's not quite that simple.

Consider the simplest possible positive isinstance check for a dict in CPython 2.7:

In [34]: %timeit isinstance({}, dict)
The slowest run took 7.65 times longer than the fastest. 
This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 374 ns per loop

(I'll elide that warning about intermediate results from now on. They're all about that order of magnitude.)

The longer the list of types you check gets, the more time it takes, even if the first item in the list is definitely a match.

In [42]: %timeit isinstance({}, (dict, persistent.mapping.PersistentMapping, 
     BTrees.OOBTree.OOBTree))
1000000 loops, best of 3: 651 ns per loop

In [43]: %timeit isinstance({}, (dict, persistent.mapping.PersistentMapping, 
     BTrees.OOBTree.OOBTree, int, long, str))
1000000 loops, best of 3: 807 ns per loop

In [44]: %timeit isinstance({}, (dict, persistent.mapping.PersistentMapping, 
     BTrees.OOBTree.OOBTree, int, long, str, unicode, float))
1000000 loops, best of 3: 862 ns per loop

In [45]: %timeit isinstance({}, (dict, persistent.mapping.PersistentMapping, 
     BTrees.OOBTree.OOBTree, int, long, str, unicode, float, list, type))
1000000 loops, best of 3: 934 ns per loop

In [46]: %timeit isinstance({}, (dict, persistent.mapping.PersistentMapping, 
     BTrees.OOBTree.OOBTree, int, long, str, unicode, float, list, type, collections.Mapping))
1000000 loops, best of 3: 980 ns per loop

That looks bad until you realize that much of that time is spent constructing the tuple to pass in. If you use a constant tuple (as we do), it stays closer to the original time:

In [49]: t =  (dict, persistent.mapping.PersistentMapping, BTrees.OOBTree.OOBTree,
      int, long, str, unicode, float, list, type, collections.Mapping)

In [50]: %timeit isinstance({}, t)
1000000 loops, best of 3: 411 ns per loop

Now, a positive check for a dict against a single collections.Mapping element is about four times longer than against a single dict element:

In [57]: t = collections.Mapping

In [58]: %timeit isinstance({}, t)
1000000 loops, best of 3: 1.47 µs per loop

And if we don't specifically include the dict in our list of types, but just allow collections.Mapping to catch things at the end, we pay the worst price: iterating through a bunch of negatives before finally matching an ABC:

In [59]: t = (persistent.mapping.PersistentMapping, BTrees.OOBTree.OOBTree,
    ...:       int, long, str, unicode, float, list, type, collections.Mapping)

In [60]: %timeit isinstance({}, t)
100000 loops, best of 3: 2.21 µs per loop

A trivial negative check for an ABC is around twice the price of a trivial positive check:

In [51]: %timeit isinstance(1, collections.Mapping)
100000 loops, best of 3: 2.5 µs per loop

But an expensive negative check against a long list ending in an ABC isn't that bad, relatively speaking:

In [63]: t = (persistent.mapping.PersistentMapping, BTrees.OOBTree.OOBTree,
    ...:      str, unicode, float, list, type, collections.Mapping)

In [64]: isinstance(1, t)
Out[64]: False

In [65]: %timeit isinstance(1, t)
100000 loops, best of 3: 2.97 µs per loop

So, basically, order matters, and a long list, so long as its constant, doesn't hurt that much and may be better in performance sensitive code than an ABC. We're going to have a dependency on BTrees as long as we have a dependency on ZODB, so it might as well stay in the list.

The final point is that we're currently not very optimal. Look at how we currently define MAPPING_TYPES and SEQUENCE_TYPES:

SEQUENCE_TYPES = (persistent.list.PersistentList,
                  collections.Set,
                  list,
                  tuple)

MAPPING_TYPES = (persistent.mapping.PersistentMapping,
                 BTrees.OOBTree.OOBTree,
                 collections.Mapping)

We have an ABC in the list of sequence types before the basic types list and tuple, and we don't include dict at all in the list of MAPPING_TYPES. I think we can rearrange those for some nice improvements.

And now, just for the fun of it, lets look at how two of our longest examples in CPython (2.21us, 2.5us, and 2.97 us, respectively) do in PyPy (watch those caching warnings!):

In [3]: t = (persistent.mapping.PersistentMapping, BTrees.OOBTree.OOBTree,
    ...:       int, long, str, unicode, float, list, type, collections.Mapping)

In [4]: %timeit isinstance({}, t)
The slowest run took 122.02 times longer than the fastest. 
This could mean that an intermediate result is being cached.
1000000 loops, best of 7: 236 ns per loop

In [5]: %timeit isinstance(1, collections.Mapping)
The slowest run took 921.64 times longer than the fastest. 
This could mean that an intermediate result is being cached.
1000000 loops, best of 7: 273 ns per loop

In [6]: t = (persistent.mapping.PersistentMapping, BTrees.OOBTree.OOBTree,
    ...:      str, unicode, float, list, type, collections.Mapping)

In [7]: %timeit isinstance(1, t)
The slowest run took 34.06 times longer than the fastest. 
This could mean that an intermediate result is being cached.
1000000 loops, best of 7: 294 ns per loop

So about 10x faster than CPython on the microbenchmark when the JIT kicks in.

@papachoco
Copy link
Contributor

Do we need ZODB? it seems to be only needed to get the external oid.

@jamadden
Copy link
Member Author

jamadden commented Mar 5, 2018

Off hand, I don’t remember, but i would guess we would want that for IConnection. It seems to me external oids are a somewhat important part of this package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants