TEST: Doc.ents as SpanGroup #12380
Overview of required changes to support `SpanGroup` rather than `Tuple[Span]`:

* implement slice for `SpanGroup`
* return `SpanGroup` for `SpanGroup + x` or `x + SpanGroup` rather than refusing to concatenate (currently without good error handling)

Side effects:

* if appending to `Doc.ents`, only `Iterable[Span]` is supported rather than other formats like raw entity tuples from matcher results, but you can still assign mixed data in any of the currently supported formats to `Doc.ents`
* for the `Matcher` case, `as_spans` provides a good alternative
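The concatenation behaviour described above (`SpanGroup + x` and `x + SpanGroup` both returning a group) can be pictured with Python's `__add__` and `__radd__` hooks. This is only a minimal sketch: `ToyGroup` is a hypothetical stand-in that stores plain tuples, not spaCy's `SpanGroup`, and it is not the implementation from this PR.

```python
from typing import Iterable, List


class ToyGroup:
    """Hypothetical stand-in for a span group; items are plain (start, end) tuples."""

    def __init__(self, items: Iterable[tuple] = ()):
        self.items: List[tuple] = list(items)

    def __len__(self) -> int:
        return len(self.items)

    def __iter__(self):
        return iter(self.items)

    def __add__(self, other):
        # group + x: accept any iterable of items and return a new group.
        try:
            return ToyGroup(self.items + list(other))
        except TypeError:
            # Not iterable: let Python raise its usual TypeError.
            return NotImplemented

    def __radd__(self, other):
        # x + group: reached when x (e.g. a list or tuple) cannot add a ToyGroup itself.
        try:
            return ToyGroup(list(other) + self.items)
        except TypeError:
            return NotImplemented


group = ToyGroup([(0, 1), (2, 3)])
left = group + [(4, 5)]        # group on the left, list on the right
right = ((4, 5),) + group      # tuple on the left, group on the right
```

Returning `NotImplemented` for non-iterable operands means something like `group + 5` fails with a regular `TypeError` instead of an unrelated error from inside the method.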
I'm surprised that no one has complained about slicing span groups.

    if isinstance(i, slice):
        start, stop = util.normalize_slice(len(self), i.start, i.stop, i.step)
        spans = [self[i] for i in range(start, stop)]
        return SpanGroup(self.doc, spans=spans)
This is probably not the fastest way to do this.
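To make the slice branch and the speed concern concrete, here is a rough sketch on a toy container. `ToySpanGroup` and this `normalize_slice` are illustrative stand-ins written for this example (they are not spaCy's `SpanGroup` or `util.normalize_slice`). The sketch assumes the helper clamps a step-1 slice to the container length, and it slices the backing list once instead of indexing item by item. Whether an equivalent shortcut applies to the real `SpanGroup` storage is not established here.

```python
from typing import Optional, Tuple


def normalize_slice(length: int, start: Optional[int], stop: Optional[int],
                    step: Optional[int] = None) -> Tuple[int, int]:
    """Illustrative stand-in: clamp a step-1 slice to the range [0, length]."""
    if step not in (None, 1):
        raise ValueError("only step 1 is supported in this sketch")
    if start is None:
        start = 0
    elif start < 0:
        start += length
    start = min(length, max(0, start))
    if stop is None:
        stop = length
    elif stop < 0:
        stop += length
    stop = min(length, max(start, stop))
    return start, stop


class ToySpanGroup:
    """Hypothetical container holding arbitrary items in a Python list."""

    def __init__(self, spans=()):
        self.spans = list(spans)

    def __len__(self):
        return len(self.spans)

    def __getitem__(self, i):
        if isinstance(i, slice):
            start, stop = normalize_slice(len(self), i.start, i.stop, i.step)
            # Slice the backing list once rather than calling self[j] per item.
            return ToySpanGroup(self.spans[start:stop])
        return self.spans[i]


group = ToySpanGroup(["a", "b", "c", "d"])
tail = group[1:]      # a new ToySpanGroup, not a list
last_two = group[-2:]
```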
And I suppose we could support `step` here, but I'm not sure why you'd want to do this.
And if we did this, we'd want to do it for all the
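If `step` support were wanted, one possible approach is Python's built-in `slice.indices()`, which resolves `start`, `stop` and `step` against a given length. The sketch below uses a hypothetical `SteppedGroup` class over a plain list. It is not spaCy code and not something this PR implements.

```python
class SteppedGroup:
    """Hypothetical container, used only to illustrate stepped slicing."""

    def __init__(self, items=()):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # slice.indices() clamps the bounds and fills in defaults,
            # including the correct defaults for a negative step.
            start, stop, step = i.indices(len(self))
            return SteppedGroup(self.items[j] for j in range(start, stop, step))
        return self.items[i]


group = SteppedGroup("abcdef")
every_other = group[::2]
backwards = group[::-1]
```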
So the current

    import spacy

    nlp = spacy.blank("en")
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns([{"label": "A", "pattern": "the"}])
    print(nlp("the cat").ents)

I like it otherwise, but this (by itself) is definitely too breaking.
I see the thinking here. Is it too awkward to make
Temporarily closing this PR as we currently don't have the bandwidth to finish this.