Remove extra evaluations and unescapings #9

Open: wants to merge 20 commits into base: master

@Derugon (Contributor) commented Nov 29, 2024

Context

Parser functions always take wikitext arguments. Unlike tags, I would expect some behaviors to always apply to the arguments passed, whatever the function does in practice:

  1. Evaluate variables within the wikitext,
  2. Remove space-like characters at the start and at the end of the evaluated text.

If the function modifies an argument, it does so after the argument has been normalized. Consequently, if we want to pass wikitext that survives normalization unchanged, we can freeze the wikitext (e.g. by using {{(}} or {{!}}) or use <nowiki/> tags.

Some ParserPower functions apply an additional normalization step:

  3. Replace escape sequences within the trimmed text.

Then, once all argument modifications have been applied, the function may re-evaluate variables within the unescaped wikitext.

This provides another way to circumvent normalization: we can use escape sequences within the wikitext to prevent variable syntax from being recognized, and spaces from being removed.
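The normalization steps above can be modeled with a toy evaluator. This is a minimal Python sketch, not ParserPower's PHP code. `evaluate` only knows a few hypothetical helper templates and expands them in a single pass, so, like MediaWiki, it never re-scans its own output. `unescape` handles an assumed subset of ParserPower's escape sequences.

```python
import re

# Hypothetical helper templates; real expansion works on pre-processor nodes.
TEMPLATES = {"!": "|", "(": "{", ")": "}", "((": "{{", "))": "}}"}
# Assumed subset of escape sequences: \\ \{ \} \| \n \_
ESCAPES = {"\\": "\\", "{": "{", "}": "}", "|": "|", "n": "\n", "_": " "}


def evaluate(text: str) -> str:
    # Step 1: single-pass substitution; substituted output is not re-scanned.
    return re.sub(r"\{\{([^{}]*)\}\}",
                  lambda m: TEMPLATES.get(m.group(1).strip(), m.group(0)), text)


def normalize(text: str) -> str:
    # Steps 1 and 2: evaluate variables, then trim surrounding whitespace.
    return evaluate(text).strip()


def unescape(text: str) -> str:
    # Step 3: replace escape sequences in the trimmed text.
    return re.sub(r"\\([\\{}|n_])", lambda m: ESCAPES[m.group(1)], text)


def escaping_function(arg: str) -> str:
    # Normalize and unescape, then re-evaluate, because unescaping may
    # have produced new variable syntax.
    return evaluate(unescape(normalize(arg)))
```

Escaped braces are not seen as variable syntax during the first evaluation. Likewise, `\_` keeps a space from being trimmed, so `escaping_function(r"\_x\_")` returns `" x "`.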

An example of the issue

Let Template:Quote be a template that would print its 1st argument within quotes. We do not want spaces inside the quotes, so we would basically want to use:

"{{#trim: {{{1}}} }}"

Suppose we want to pass {{!}} as an argument, but want it to be printed as-is. Using the solutions introduced above, we could:

  1. freeze the wikitext, e.g.:
{{quote | {{((}}!{{))}} }}
  2. use <nowiki/> tags, e.g.:
{{quote | <nowiki>{{!}}</nowiki> }}
  3. use escape sequences, e.g.:
{{#uescnowiki: {{quote | <esc>{{!}}</esc> }} }}

Outputs can be found here. While both the 2nd and 3rd approaches yield "{{!}}", the 1st one yields "|". This is because the #trim function evaluates variables within its argument (by expanding the given pre-processor node), then trims spaces, and finally returns the result while telling the parser to evaluate variables again (by setting the 'noparse' flag to false).
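The difference between the current and the proposed #trim can be reproduced with a toy single-pass evaluator. This is a hypothetical Python model, not the extension's code. `trim_before` models the second evaluation triggered by returning with noparse=false. `trim_after` stops after trimming. `quote` stands in for Template:Quote.

```python
import re

# Hypothetical helper templates, expanded in a single pass (output not re-scanned).
TEMPLATES = {"!": "|", "(": "{", ")": "}", "((": "{{", "))": "}}"}


def evaluate(text: str) -> str:
    return re.sub(r"\{\{([^{}]*)\}\}",
                  lambda m: TEMPLATES.get(m.group(1).strip(), m.group(0)), text)


def trim_before(arg: str) -> str:
    # Expand, trim, then return with noparse=false: the parser evaluates again.
    return evaluate(evaluate(arg).strip())


def trim_after(arg: str) -> str:
    # Proposed behavior: expand once, trim, and stop.
    return evaluate(arg).strip()


def quote(arg: str, trim=trim_after) -> str:
    # Template:Quote from the example: "{{#trim: {{{1}}} }}"
    return '"' + trim(arg) + '"'
```

With `trim_before`, the frozen `{{((}}!{{))}}` becomes `{{!}}` after the first evaluation and `|` after the second. With `trim_after`, it stays `{{!}}`, which matches the `<nowiki/>` and escape-sequence approaches.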

The issue

Various parser functions and tags in ParserPower either evaluate variables within wikitext twice without unescaping anything in between, or unescape wikitext twice.

While changing this may not let us do anything we cannot already do, each variable evaluation a parser function performs generates and evaluates additional pre-processor nodes. This costs a small amount of additional parsing time, and artificially inflates parser reports.

Proposed changes

Remove extra variable evaluation, trimming, and unescaping steps. More precisely:

  • Do not evaluate a wikitext string if:
    • under normal conditions it should not be standard wikitext with any variables/escape sequences (e.g. a raw error string, a stringified number, an escaped wikitext), or
    • variables have already been evaluated, and no subsequent operation added any variables to the string (e.g. unescaping braces or angle brackets can introduce new variables, while replacing tokens in evaluated wikitext with other evaluated wikitext should not).
  • Do not unescape a wikitext string if:
    • under normal conditions it should not be standard wikitext with any escape sequences (e.g. a raw error string, a stringified number), or
    • it has already been unescaped at least once by the parser function/tag.
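The rules above amount to tracking two facts about each piece of wikitext. The following sketch is a hypothetical model: the `Wikitext` type and its flags are illustrative and are not part of ParserPower. Evaluation is skipped when nothing could have introduced new variables, and unescaping happens at most once.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Wikitext:
    """Hypothetical wrapper tracking what has already been done to a string."""
    text: str
    evaluated: bool = False  # variables evaluated, and nothing added since
    unescaped: bool = False  # escape sequences already replaced once

    @classmethod
    def plain(cls, text: str) -> "Wikitext":
        # Raw error strings, stringified numbers, ...: never evaluated or unescaped.
        return cls(text, evaluated=True, unescaped=True)


def evaluate(w: Wikitext, expand: Callable[[str], str]) -> Wikitext:
    if w.evaluated:
        return w  # nothing could have introduced new variables since
    return Wikitext(expand(w.text), evaluated=True, unescaped=w.unescaped)


def unescape(w: Wikitext, unesc: Callable[[str], str]) -> Wikitext:
    if w.unescaped:
        return w  # already unescaped once by this function
    # Unescaping braces or angle brackets may create new variable syntax.
    return Wikitext(unesc(w.text), evaluated=False, unescaped=True)


def replace_token(w: Wikitext, token: str, value: Wikitext) -> Wikitext:
    # Evaluated wikitext substituted into evaluated wikitext stays evaluated.
    # Keeping the pattern's `unescaped` flag is an assumption of this sketch.
    return Wikitext(w.text.replace(token, value.text),
                    evaluated=w.evaluated and value.evaluated,
                    unescaped=w.unescaped)
```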

To achieve this, a few additional changes have been made:

  • Small code refactorings are made in this PR to move parameter evaluation/trimming/unescaping to the outer functions as much as possible.
  • For functions that accept named arguments, values of unknown arguments are no longer evaluated.
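Evaluating only the arguments a function actually reads can be sketched as follows. `NamedArgs` is a hypothetical helper, not ParserPower's API. Keys are expanded up front so they can be matched against known names. Values are expanded and trimmed on first access and then cached, so values of unknown arguments are never evaluated.

```python
from typing import Callable, Dict, Iterable


class NamedArgs:
    """Hypothetical helper: expand keys eagerly, values lazily and at most once."""

    def __init__(self, raw_args: Iterable[str], expand: Callable[[str], str]):
        self._expand = expand
        self._raw: Dict[str, str] = {}
        self._cache: Dict[str, str] = {}
        for raw in raw_args:
            key, sep, value = raw.partition("=")
            if sep:  # positional arguments are ignored in this sketch
                # Keys must be expanded so they can be matched against known names.
                self._raw[self._expand(key).strip()] = value

    def get(self, name: str, default: str = "") -> str:
        if name not in self._raw:
            return default
        if name not in self._cache:
            # Expanded and trimmed on first access only; unread values never are.
            self._cache[name] = self._expand(self._raw[name]).strip()
        return self._cache[name]
```

Caching also removes the duplicated expansion described in the commit messages, where values were expanded once by `ParserPower::expand()` and again when retrieved by name.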

Below is the list of all removed variable evaluation/unescaping steps that could apply to wikitext containing frozen variable syntax or multiply escaped wikitext (i.e. wikitext that was previously evaluated/unescaped twice and no longer is with this PR). These are all the potentially breaking changes:

  • {{#trim:}}
    • The 1st argument (evaluated a 2nd time once trimmed).
  • {{#or:}}
    • Any argument (evaluated twice if non empty and returned).
  • {{#follow:}}
    • The 1st argument (evaluated twice).
  • {{#listfilter:}}
    • The list and default arguments (evaluated twice).
    • The pattern argument (evaluated thrice if the list is not empty).
    • Any other argument (evaluated twice if the list is not empty).
    • When using a pattern as predicate, the pattern after token replacements (evaluated twice then unescaped twice).
  • {{#listmap:}}
    • The list and default arguments (evaluated twice).
    • The pattern argument (evaluated thrice if the list is not empty).
    • Any other argument (evaluated twice if the list is not empty).
    • When using a pattern to generate keys, the pattern after token replacements (evaluated twice).
  • {{#lstmap:}}
    • The pattern argument (evaluated twice if the list is not empty).
    • The pattern after token replacements (evaluated twice).
  • {{#listunique:}}
    • The list and default arguments (evaluated twice).
    • The pattern argument (evaluated thrice if the list is not empty).
    • Any other argument (evaluated twice if the list is not empty).
    • When using a pattern to generate keys, the pattern after token replacements (evaluated twice then unescaped twice).
  • {{#listsort:}}
    • The list and default arguments (evaluated twice).
    • The pattern argument (evaluated thrice if the list is not empty).
    • Any other argument (evaluated twice if the list is not empty).
    • When using a pattern to generate keys, the pattern after token replacements (evaluated twice then unescaped twice).
  • {{#listmerge:}}
    • The list and default arguments (evaluated twice).
    • The matchpattern and mergepattern arguments (evaluated thrice if the list is not empty).
    • Any other argument (evaluated twice if the list is not empty).
    • When using a pattern to merge pairs of values, the pattern after token replacements (evaluated twice then unescaped twice).
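A simplified #listmap-like function shows where the duplicate evaluations came from. Both functions are hypothetical reductions built on a toy single-pass evaluator: there are no defaults or named arguments, and token substitution is a plain `str.replace`. They are not the extension's code.

```python
import re
from typing import Callable

# Hypothetical helper templates, expanded in a single pass (output not re-scanned).
TEMPLATES = {"!": "|", "((": "{{", "))": "}}"}


def evaluate(text: str) -> str:
    return re.sub(r"\{\{([^{}]*)\}\}",
                  lambda m: TEMPLATES.get(m.group(1).strip(), m.group(0)), text)


def listmap_before(lst: str, sep: str, pattern: str, token: str,
                   expand: Callable[[str], str] = evaluate) -> str:
    items = expand(expand(lst)).split(sep)  # list evaluated twice
    out = []
    for item in items:
        p = expand(pattern)  # pattern re-evaluated on every iteration
        out.append(expand(p.replace(token, item.strip())))  # result evaluated again
    return sep.join(out)


def listmap_after(lst: str, sep: str, pattern: str, token: str,
                  expand: Callable[[str], str] = evaluate) -> str:
    items = [i.strip() for i in expand(lst).split(sep)]  # list evaluated once
    p = expand(pattern).strip()  # pattern evaluated once, before the loop
    # Evaluated values replaced into an evaluated pattern: nothing left to evaluate.
    return sep.join(p.replace(token, i) for i in items)
```

For a two-item list, the first version calls `expand` six times and the second calls it twice. The first version also turns a frozen `{{((}}!{{))}}` item into `|` instead of `{{!}}`.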

@Derugon marked this pull request as draft November 29, 2024 16:35
@Derugon (Contributor, Author) commented Nov 29, 2024

It seems list functions that return unescaped wikitext parse it twice, so I'm gonna work on it a little more.

@RheingoldRiver (Member)

Sounds good, thanks so much for your contributions already!!

@Derugon (Contributor, Author) commented Nov 29, 2024

Well, thank you all for still maintaining it, and for taking the time to review these PRs. :)

@Derugon (Contributor, Author) commented Dec 4, 2024

I wrote that this had no impact on code, but that is not true.

The base issue

Using the changes from this PR with templates from an existing wiki, I noticed a subtle change (a beneficial one in my case, but still a breaking change): generated wikitext with transcluded wikitext syntax will have it evaluated.

For example, {{#trim: {{(}}{{(}}!{{)}}{{)}} }} yields |, while I would expect it to produce the same result as {{(}}{{(}}!{{)}}{{)}}, i.e. {{!}}.

This means that, as of now, this PR is a breaking change for the #trim and #or parser functions (and only these two).

The issue with unescaping

This is a side-effect we have to deal with when unescaping: after the text is unescaped, we need to parse it again, so we parse {{#uesc: \{\{!\}\} }} the same way as {{#uesc: {{(}}{{(}}!{{)}}{{)}} }}, and both yield |.

Changing it would mean variables can no longer contribute reliably to unescaping, e.g. {{#uesc: {{X}} }} would yield {{!}} (with Template:X containing \{\{!\}\}), which would completely break the purpose of unescaping.

So I'll double down on what I said last week and not suggest removing the extra parsing of unescaped text in this PR.
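The need to re-parse unescaped text can be checked with a toy #uesc. This sketch uses a single-pass evaluator and an assumed subset of escape sequences. It also assumes a hypothetical Template:X whose content is `\{\{!\}\}`. Setting `reparse=False` models the alternative that was rejected above.

```python
import re

TEMPLATES = {"!": "|", "(": "{", ")": "}"}
PAGES = {"X": r"\{\{!\}\}"}  # hypothetical Template:X content
ESCAPES = {"\\": "\\", "{": "{", "}": "}", "|": "|"}  # assumed subset


def evaluate(text: str) -> str:
    # Single pass: helper templates first, then (hypothetical) page transclusion.
    def sub(m: re.Match) -> str:
        name = m.group(1).strip()
        return TEMPLATES.get(name, PAGES.get(name, m.group(0)))
    return re.sub(r"\{\{([^{}]*)\}\}", sub, text)


def unescape(text: str) -> str:
    return re.sub(r"\\([\\{}|])", lambda m: ESCAPES[m.group(1)], text)


def uesc(arg: str, reparse: bool = True) -> str:
    text = unescape(evaluate(arg).strip())
    # Unescaped text may contain new variable syntax, so parse it again.
    return evaluate(text) if reparse else text
```

With re-parsing, `\{\{!\}\}`, `{{(}}{{(}}!{{)}}{{)}}` and `{{X}}` all yield `|`. Without it, `{{X}}` would yield `{{!}}`, so a variable could no longer contribute to unescaping.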

@Derugon marked this pull request as ready for review December 4, 2024 16:51
@Derugon marked this pull request as draft December 30, 2024 09:32
…uateUnescaped` function

so we can use the same behaviors when parsing unescaped parameters and unescaped outputs.
This makes it harder to track where wikitext is evaluated, and consequently, most of the time it is evaluated or unescaped more often than it should be.
If the frame is not a template frame, there are no arguments anyway, so there is no benefit in processing unescaped wikitext within a child frame.
Various functions trim or unescape their arguments, even though these were already trimmed or unescaped.

Also expand patterns directly, so we don't have unexpanded nodes or untrimmed strings wandering around.
Currently, ParserPower::expand() expands values, which are expanded again when arguments are retrieved by name.
To deduplicate these expansions, ParserPower::expand() no longer expands values (only keys), so we could lazily evaluate some values in the future.
@Derugon changed the title from "Remove parser function output parsing when unneeded" to "Remove extra evaluations and unescapings" Feb 7, 2025
Variables are replaced when arguments are parsed. When iterating over a list, argument values/patterns are re-evaluated on each iteration even though nothing was unescaped: we only replaced evaluated wikitext with evaluated wikitext, so all we have at that point is evaluated wikitext.
…TwoSetFieldPattern`

Missed that function in the 2 previous commits.
Some functions make frame arguments available when replacing variables after unescaping.
This leads to unexpected results, but changing it is out of the scope of this PR, so for now add a `WITH_ARGS` flag to keep this behavior.
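Below is a rough Python analogue of the `WITH_ARGS` flag. Only the flag name comes from the commit message; `EvalFlags`, `evaluate_unescaped` and the exact semantics are assumptions. Triple-brace parameters in the unescaped text resolve against the frame's arguments only when the flag is set. Otherwise they stay literal, which is how MediaWiki renders an undefined parameter.

```python
import re
from enum import Flag, auto
from typing import Dict


class EvalFlags(Flag):
    NONE = 0
    WITH_ARGS = auto()  # flag name from the PR; semantics here are assumed


def evaluate_unescaped(text: str, frame_args: Dict[str, str],
                       flags: EvalFlags = EvalFlags.NONE) -> str:
    # Hypothetical: frame arguments are only visible when WITH_ARGS is set.
    args = frame_args if EvalFlags.WITH_ARGS in flags else {}
    # {{{name}}} resolves to an argument if available; an undefined
    # parameter is left as literal text.
    return re.sub(r"\{\{\{([^{}|]*)\}\}\}",
                  lambda m: args.get(m.group(1).strip(), m.group(0)), text)
```

Making the legacy behavior opt-in keeps it available to the functions that relied on it, while new call sites default to evaluation without frame arguments.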
@Derugon marked this pull request as ready for review February 10, 2025 18:39