-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRAFT: Allow lex_attr_getters to take a vocab argument #12244
DRAFT: Allow lex_attr_getters to take a vocab argument #12244
Conversation
|
||
def _get_lex_attr_value(vocab, func, string): | ||
try: | ||
return func(vocab, string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no obvious speed hit with try
/except
in the test suite (unlike for inspect
) and it would be easy to add this with the order swapped in v3. (Some more performance checks would still be necessary.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry a bit about collateral catches here though. It would be pretty unpleasant to have us catch a TypeError
the user's callback legitimately raised, because we'll then lose the original traceback.
If we go this route, maybe we could mitigate the problem by doing something like this?
def _get_lex_attr_value(vocab, func, string):
try:
return func(vocab, string)
except TypeError as first_error:
# TODO: some kind of setting or deprecation warning
try:
return func(string)
except TypeError:
# If the second signature doesn't work either, raise the first
# error, as that will be the one with the user's traceback
# (and don't raise it from the second error; that's not helpful)
raise first_error from None
I think doing the inspection when the callback is assigned would be more reliable though, and it'd remove the question of the runtime performance. It also lets us notify for deprecation when the callback is assigned, rather than when it is invoked, which also seems better to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean about the error traceback.
I agree in theory about tracking the signature of each getter, but modifying the vocab (in a way that doesn't break all existing code and examples) to have some notion of "assigned" is going to be non-trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's a backport for v3, so it's definitely not a problem to just do it this way. For reference, here's the idea I referred to on Slack for the proxying dict. I could've sworn we did this someplace else but I can't find it.
import weakref
class SetterCallbackDict(dict):
"""A dict class that makes callbacks on __setitem__.
The intended use-case is to replace a dictionary attribute
of a class, if some logic should occur when items are
assigned. The parent argument is also passed to the callback,
and stored as a weakref to avoid cycles.
"""
def __init__(self, parent, setter) -> None:
self._parent = weakref.ref(parent)
self._setter = setter
def __setitem__(self, key, value):
parent = self._parent()
return self._setter(parent, key, value)
class Vocab:
def __init__(self, lex_attrs):
self._lex_attrs = SetterCallbackDict(self, set_lex_attr)
for key, value in lex_attrs.items():
self._lex_attrs[key] = value
@property
def lex_attrs(self):
return self._lex_attrs
@lex_attrs.setter
def lex_attrs(self, new_dict):
self._lex_attrs = SetterCallbackDict(self, set_lex_attr)
for key, value in new_dict.items():
self._lex_attrs[key] = value
def set_lex_attr(vocab, key, value):
if key == "foo":
print("key was foo")
dict.__setitem__(vocab.lex_attrs, key, value)
Temporarily closing this PR as we currently don't have the bandwidth to wrap this up properly. |
Description
Allow methods in
lex_attr_getters
to optionally take avocab
argument in order to passvocab.lookups
and the new less-structurevocab.lex_attr_data
in order to provide arbitrary data (like stop words) to the getters in a way that it can be serialized with the pipeline.Types of change
?
Checklist