Remove extra evaluations and unescapings #9

Derugon · 2024-11-29T11:48:57Z

Context

Parser functions always take wikitext arguments. Unlike tags, I would expect some behaviors to always apply to the arguments passed, whatever the function does in practice:

- Evaluate variables within the wikitext,
- Remove space-like characters at the start and at the end of the evaluated text.

If the function modifies an argument, it does so after normalizing it. Consequently, if we want to pass wikitext that still looks the same once normalized, we can freeze the wikitext (e.g. by using {{(}} or {{!}}) or use <nowiki/> tags.

Some ParserPower functions apply an additional normalization step:

- Replace escape sequences within the trimmed text.

Then, once all argument modifications have been applied, the function may re-evaluate variables within the unescaped wikitext.

This provides another way to circumvent normalization: we can use escape sequences within the wikitext to prevent variable syntax from being recognized, and spaces from being removed.

An example of the issue

Let Template:Quote be a template that would print its 1st argument within quotes. We do not want spaces inside the quotes, so we would basically want to use:

"{{#trim: {{{1}}} }}"

Suppose we want to pass {{!}} as argument, but we want it to be printed as-is. From the solutions introduced above, we could:

1. freeze the wikitext, e.g.:

   {{quote | {{((}}!{{))}} }}

2. use <nowiki/> tags, e.g.:

   {{quote | <nowiki>{{!}}</nowiki> }}

3. use escape sequences, e.g.:

   {{#uescnowiki: {{quote | <esc>{{!}}</esc> }} }}

Outputs can be found here. While both the 2nd and 3rd approaches yield "{{!}}", the 1st one yields "|". This is because the #trim function evaluates variables within its argument (by expanding the given pre-processor node), then trims spaces, and finally returns the result while telling the parser to evaluate variables again (by setting the 'noparse' flag to false).
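To make the double evaluation concrete, here is a minimal Python sketch of it. It is a toy model, not the extension's PHP code: `evaluate`, `trim_with_reparse` and `trim_once` are hypothetical names, and the variable table only covers the few templates used in this discussion.

```python
import re

# Toy stand-ins for a few variables/templates; real expansion is far richer.
VARIABLES = {"!": "|", "(": "{", ")": "}", "((": "{{", "))": "}}"}
VARIABLE_RE = re.compile(r"\{\{(\(\(|\)\)|\(|\)|!)\}\}")


def evaluate(wikitext: str) -> str:
    """Replace each known {{name}} once; substituted text is not rescanned."""
    return VARIABLE_RE.sub(lambda m: VARIABLES[m.group(1)], wikitext)


def trim_with_reparse(arg: str) -> str:
    """Model of the behaviour described above: evaluate, trim, evaluate again."""
    trimmed = evaluate(arg).strip()
    # Stands in for returning the text with 'noparse' set to false.
    return evaluate(trimmed)


def trim_once(arg: str) -> str:
    """Model of #trim without the extra evaluation."""
    return evaluate(arg).strip()


frozen = " {{((}}!{{))}} "
print(trim_with_reparse(frozen))  # the frozen braces are evaluated a 2nd time
print(trim_once(frozen))          # the frozen braces survive
```

In this model the first evaluation turns the frozen argument into `{{!}}`, and only the second evaluation turns it into `|`, which is the step the 1st approach trips over.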

The issue

Various parser functions and tags in ParserPower evaluate variables within wikitext twice without having unescaped anything in between, or unescape wikitext twice.

While changing this may not let us do more than we already can, each variable evaluation that a parser function performs generates and evaluates additional pre-processor nodes. This takes a small amount of additional parsing time, and artificially makes parser reports larger than they should be.

Proposed changes

Remove extra variable evaluation, trimming, and unescaping steps. More precisely:

Do not evaluate a wikitext string if:
- under normal conditions it should not be standard wikitext with any variables/escape sequences (e.g. a raw error string, a stringified number, an escaped wikitext), or
- variables have already been evaluated, and no following operation added any variables to the string (e.g. unescaping braces or angle brackets may introduce new variables, whereas replacing tokens within evaluated wikitext by evaluated wikitext does not).
Do not unescape a wikitext string if:
- under normal conditions it should not be standard wikitext with any escape sequences (e.g. a raw error string, a stringified number), or
- it has already been unescaped at least once by the parser function/tag.
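The two rules above can be sketched as a small state-tracking wrapper. This is only an illustration in Python under stated assumptions: the `Wikitext` class, its flags, and the `expand`/`unesc` callbacks are hypothetical and do not exist in ParserPower.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Wikitext:
    text: str
    evaluated: bool = False  # variables replaced, and none added since
    unescaped: bool = False  # escape sequences already replaced once


def evaluate(w: Wikitext, expand: Callable[[str], str]) -> Wikitext:
    """Evaluate variables unless that would be a redundant 2nd pass."""
    if w.evaluated:
        return w
    return replace(w, text=expand(w.text), evaluated=True)


def unescape(w: Wikitext, unesc: Callable[[str], str]) -> Wikitext:
    """Unescape at most once; unescaping may reveal new variable syntax."""
    if w.unescaped:
        return w
    new_text = unesc(w.text)
    still_evaluated = w.evaluated and new_text == w.text
    return Wikitext(new_text, evaluated=still_evaluated, unescaped=True)


# Usage with toy callbacks that count how often expansion really runs.
calls = {"expand": 0}


def expand(text: str) -> str:
    calls["expand"] += 1
    return text.replace("{{!}}", "|")


def unesc(text: str) -> str:
    return text.replace("\\{", "{").replace("\\}", "}")


w = evaluate(Wikitext(r"\{\{!\}\}"), expand)  # 1st evaluation, nothing to replace
w = evaluate(w, expand)                       # skipped: already evaluated
w = unescape(w, unesc)                        # reveals {{!}}, so evaluation is due again
w = evaluate(w, expand)                       # needed: new variable syntax appeared
print(w.text, calls["expand"])
```

A raw error string or a stringified number would be created with both flags already set, so neither step would run on it.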

To achieve this, a few otherwise unrelated changes have been made:

- Small code refactorings move parameter evaluation/trimming/unescaping to the outer functions as much as possible.
- For functions that accept named arguments, values of unknown arguments are no longer evaluated.

Below is a list of all potentially breaking changes, i.e. all variable evaluation/unescaping steps removed in this PR that affect wikitext containing frozen variable syntax or wikitext escaped multiple times (such wikitext was previously evaluated/unescaped twice, and no longer is):

{{#trim:}}
- The 1st argument (evaluated a 2nd time once trimmed).
{{#or:}}
- Any argument (evaluated twice if non empty and returned).
{{#follow:}}
- The 1st argument (evaluated twice).
{{#listfilter:}}
- The list and default arguments (evaluated twice).
- The pattern argument (evaluated thrice if the list is not empty).
- Any other argument (evaluated twice if the list is not empty).
- When using a pattern as predicate, the pattern after token replacements (evaluated twice then unescaped twice).
{{#listmap:}}
- The list and default arguments (evaluated twice).
- The pattern argument (evaluated thrice if the list is not empty).
- Any other argument (evaluated twice if the list is not empty).
- When using a pattern to generate keys, the pattern after token replacements (evaluated twice).
{{#lstmap:}}
- The pattern argument (evaluated twice if the list is not empty).
- The pattern after token replacements (evaluated twice).
{{#listunique:}}
- The list and default arguments (evaluated twice).
- The pattern argument (evaluated thrice if the list is not empty).
- Any other argument (evaluated twice if the list is not empty).
- When using a pattern to generate keys, the pattern after token replacements (evaluated twice then unescaped twice).
{{#listsort:}}
- The list and default arguments (evaluated twice).
- The pattern argument (evaluated thrice if the list is not empty).
- Any other argument (evaluated twice if the list is not empty).
- When using a pattern to generate keys, the pattern after token replacements (evaluated twice then unescaped twice).
{{#listmerge:}}
- The list and default arguments (evaluated twice).
- The matchpattern and mergepattern arguments (evaluated thrice if the list is not empty).
- Any other argument (evaluated twice if the list is not empty).
- When using a pattern to merge pairs of values, the pattern after token replacements (evaluated twice then unescaped twice).

Derugon · 2024-11-29T16:37:58Z

It seems list functions that return unescaped wikitext parse it twice, so I'm gonna work on it a little more.

RheingoldRiver · 2024-11-29T19:28:07Z

Sounds good, thanks so much for your contributions already!!

Derugon · 2024-11-29T20:26:10Z

Well, thank you all for still maintaining it, and for taking the time to review these PRs. :)

Derugon · 2024-12-04T16:50:51Z

I wrote that this had no impact on code, but that is not true.

The base issue

Applying the changes from this PR to templates from an existing wiki caused a (subtle) change. It was beneficial in my case, but it is still a breaking change: generated wikitext containing transcluded wikitext syntax will have that syntax evaluated.

For example, {{#trim: {{(}}{{(}}!{{)}}{{)}} }} yields |, while I would expect it to produce the same result as {{(}}{{(}}!{{)}}{{)}}, i.e. {{!}}.

This means that this PR (as of now) is a breaking change for the #trim and #or parser functions (and only these 2).

The issue with unescaping

This is a side-effect we have to deal with when unescaping: after the text is unescaped, we need to parse it again, so we parse {{#uesc: \{\{!\}\} }} the same way as {{#uesc: {{(}}{{(}}!{{)}}{{)}} }}, and both yield |.

Changing it would mean variables can no longer contribute reliably to unescaping, e.g. {{#uesc: {{X}} }} would yield {{!}} (with Template:X containing \{\{!\}\}), which would completely break the purpose of unescaping.

So I'll double down on what I said last week, and will not suggest removing the extra parsing of unescaped text in this PR.
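The unescaping argument above can be illustrated with a small Python model. It is a sketch under assumptions: only the `\{` and `\}` escape sequences are modelled, `Template:X` is the hypothetical template from the example, and `uesc` with its `reparse` switch is an illustrative name rather than the extension's API.

```python
import re

# Toy templates; Template:X holds escaped wikitext, as in the example above.
TEMPLATES = {"!": "|", "(": "{", ")": "}", "X": r"\{\{!\}\}"}
TEMPLATE_RE = re.compile(r"\{\{(\(|\)|!|X)\}\}")


def evaluate(wikitext: str) -> str:
    """Replace each known {{name}} once; substituted text is not rescanned."""
    return TEMPLATE_RE.sub(lambda m: TEMPLATES[m.group(1)], wikitext)


def unescape(wikitext: str) -> str:
    """Only brace escapes are modelled here."""
    return re.sub(r"\\([{}])", r"\1", wikitext)


def uesc(arg: str, reparse: bool = True) -> str:
    """Evaluate, trim, unescape, then optionally evaluate the unescaped text."""
    text = unescape(evaluate(arg).strip())
    return evaluate(text) if reparse else text


print(uesc(r" \{\{!\}\} "))               # escaped braces, parsed after unescaping
print(uesc(" {{(}}{{(}}!{{)}}{{)}} "))    # frozen braces end up parsed the same way
print(uesc(" {{X}} ", reparse=False))     # without the 2nd parse, {{!}} stays unevaluated
```

With `reparse=True` all three inputs collapse to `|` in this model; dropping the second parse keeps the frozen form but also leaves the text coming from `{{X}}` unevaluated, which is the trade-off described above.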

Currently, ParserPower::expand() expands values, which are then expanded again when arguments are retrieved by name. To deduplicate these expansions, ParserPower::expand() no longer expands values (only keys), so we could lazily evaluate some values in the future.

Variables are replaced when arguments are parsed. When iterating over a list, values/patterns are re-evaluated at each iteration although nothing was unescaped: we only replaced evaluated wikitext with evaluated wikitext, so all we have at that point is evaluated wikitext.

Remove parser function output parsing when unneeded

4dcf14b

Derugon marked this pull request as draft November 29, 2024 16:35

Derugon marked this pull request as ready for review December 4, 2024 16:51

Derugon added 5 commits December 4, 2024 17:56

Merge branch 'master' into no-noparse

7a09ff2

Merge branch 'master' of https://github.com/wiki-gg-oss/mediawiki-ext…

5a5b8fa

…ensions-ParserPower into no-noparse

Fix linkpage/linktext output

c31a77b

Mistakenly rolled back some lines in previous branch merge

argmap

59584ee

Merge branch 'master' of https://github.com/wiki-gg-oss/mediawiki-ext…

b08083a

…ensions-ParserPower into no-noparse

Derugon marked this pull request as draft December 30, 2024 09:32

Derugon added 10 commits December 31, 2024 17:34

Replace remaining noparse flags with a dedicated `ParserPower::eval…

ddbb43d

…uateUnescaped` function so we can use the same behaviors when parsing unescaped parameters and unescaped outputs.

Remove evaluation from token replacement functions

3fbc224

This makes it harder to track where wikitext is evaluated; consequently, most of the time it is evaluated or unescaped more times than it should be.

Do not create a child frame if it is not a template

c0fe3b4

If the frame is not a template frame, there are no arguments anyway, so there is no benefit in processing unescaped wikitext within a child frame.

Remove unused expansion from token replacement functions

50ce898

Cleanup token replacement functions

fbbe8d3

Remove duplicated trimmings and unescapings

14bd36c

Various functions trim or unescape their arguments even though these were already trimmed or unescaped. Also expand patterns directly, so we don't have unexpanded nodes or untrimmed strings wandering around.

Merge branch 'master' of https://github.com/wiki-gg-oss/mediawiki-ext…

a1fd642

…ensions-ParserPower into no-noparse

Merge branch 'master' of https://github.com/wiki-gg-oss/mediawiki-ext…

a1cdbc6

…ensions-ParserPower into no-noparse

Merge branch 'master' of https://github.com/wiki-gg-oss/mediawiki-ext…

74dc836

…ensions-ParserPower into no-noparse

Derugon changed the title ~~Remove parser function output parsing when unneeded~~ Remove extra evaluations and unescapings Feb 7, 2025

Derugon added 4 commits February 7, 2025 11:08

Remove duplicated variable evaluation & parameter expansion in `apply…

56170a2

…TwoSetFieldPattern` Missed that function in the 2 previous commits.

Always use ParserPower::evaluateUnescaped after unescaping

a13958a

Some functions make frame arguments available when replacing variables after unescaping. This leads to unexpected results, but changing it is out of scope of this PR, so for now add a `WITH_ARGS` flag to keep this behavior.

Update return types

63c11b8

Derugon marked this pull request as ready for review February 10, 2025 18:39
