-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix corner case regex colorization #678
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #678 +/- ##
==========================================
- Coverage 92.49% 92.48% -0.02%
==========================================
Files 47 47
Lines 8103 8131 +28
Branches 1935 1940 +5
==========================================
+ Hits 7495 7520 +25
- Misses 354 357 +3
Partials 254 254
☔ View full report in Codecov by Sentry. |
# branch (the compiler may optimize this even more) | ||
subpatternappend((IN, [item[0] for item in items])) | ||
return subpattern | ||
# pydoctor: remove all optimizations for round-tripping issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to edit sre_parse at all ?
I’m definitely not sure of what I’m doing when editing this file…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might indeed need this to remove some optimizations, also we might want to add support for latest python regex features. So there are good reasons to have our own fork of sre_parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any regex guru in twisted contributors that could give me some feedback on how to remove all opmizations of our sre_parse fork and ensure we don’t break the logic ? @adiroiban
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adiroiban @glyph can you help me on that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tristanlatr what do you need here exactly? I'm not super familiar with this code, and I am not sure what the relevant optimizations are. What would give you confidence the logic isn't broken, beyond the existing test suite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so basically there are several issues here. We vendor the sre_parse module from the python 3.6 standard library. First, why the 3.6 version? It is because the non capturing groups are optimized from 3.7 onwards such that the SubPattern instances cannot be converted back to the same string. Then, I suspect there are other optimizations that’ are preventing some regex to roundtrip, the changes up there is one of them. Lastly, some regex feature have been added recently to python 3.11 I believe. So we need to add support for those in our sre_parse module to provide complete colorizing for regexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would give you confidence the logic isn't broken, beyond the existing test suite?
Its not broken from a regex matching perspective, but it is from a regex colorizing perspective (since the roundtrip is impossible for some cases)
And no, we just have our test suite to check that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not broken from a regex matching perspective, but it is from a regex colorizing perspective (since the roundtrip is impossible for some cases)
gotcha, this makes a bit more sense. My plate is honestly kind of full right now, so I don't know how much time I can dedicate to this. Maybe ping the mailing list to see if anyone is interested? Sustainable open source infrastructure is hard :(.
assert color_re(r'^<(?P<descr>.*)>$') == """r<span class="rst-variable-quote">'</span>^<<span class="rst-re-group">(?P<</span><span class="rst-re-ref">descr</span><span class="rst-re-group">></span>.<span class="rst-re-op">*</span><span class="rst-re-group">)</span>>$<span class="rst-variable-quote">'</span>""" | ||
|
||
@pytest.mark.xfail | ||
def test_re_named_groups_weird() -> None: | ||
# This regex triggers some weird behaviour: it adds the ↵ element at the end where it should not be... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should be be fixed
linebreakok:bool = attr.ib(default=False, init=False) | ||
warnings: List[str] = attr.ib(factory=list) | ||
|
||
# state linked to regex colorization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These state trackers should be moved into the state object.
Trying to fix #668