Normative: Added note about sets of locales for web browser implementations needing to not change as a result of user behaviour #780

Open: wants to merge 2 commits into base: main

@ben-allen (Contributor)

fix #588

ben-allen requested a review from ryzokuken May 6, 2023 00:03

@ljharb (Member) commented May 6, 2023

Why would this restriction only apply to web browsers?

@ben-allen (Contributor, Author)

This PR is in response to feedback from the 2021-05-25 TC39 meeting and is meant to address concerns about potential fingerprinting issues that only pertain to browser implementations.

@ljharb (Member) commented May 10, 2023

Is there any reason not to apply the same restrictions to all engines tho? The ideal is that everything applies to everyone equally; having something only apply to a subset of impls is a suboptimal outcome.

ben-allen force-pushed the ships-entire-payload branch from 860f98e to 3feac13 May 10, 2023 15:12

@ben-allen (Contributor, Author)

Updated to apply the restriction to all hosts.

@ljharb (Member) commented May 10, 2023

(please don't land this until 402 editors have reviewed)

ljharb requested a review from a team May 10, 2023 16:21
@@ -72,6 +72,9 @@ <h1>Implementation Dependencies</h1>
<em>Subsets of Unicode:</em> Some operations, such as collation, operate on strings that can include characters from the entire Unicode character set. However, both the Unicode Standard and the ECMAScript standard allow implementations to limit their functionality to subsets of the Unicode character set. In addition, locale conventions typically don't specify the desired behaviour for the entire Unicode character set, but only for those characters that are relevant for the locale. While the Unicode Collation Algorithm combines a default collation order for the entire Unicode character set with the ability to tailor for local conventions, subsets and tailorings still result in differences in behaviour.
</li>
</ul>
<emu-note>
The set of locales made available by ECMAScript hosts must not change as the result of user behaviour, and the set of available locales must not produce observable differences between two users using the same version of the same host on the same platform. As a result, ECMAScript hosts must not allow on-demand installation of new locales.

Review comment (Contributor):

Set of locales, yes, but also the set of all enumerable items including currencies, numbering systems, calendars, etc.

Also, please explain why we're adding this constraint. I imagine that it may be desirable in the future to relax this constraint, and we need to understand why it existed.
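For context, a script can already read most of these enumerable sets directly. The sketch below assumes an engine that implements `Intl.supportedValuesOf` (added in the 2022 edition of ECMA-402); the helper names `enumerableSurface` and `surfaceSizes` are hypothetical and only illustrate how much host data becomes observable. Locales themselves are not enumerable this way; they can only be probed one tag at a time.

```javascript
// Hypothetical helper: collects every enumerable Intl set a script can observe.
// Assumes Intl.supportedValuesOf is available (ECMA-402, 2022 edition and later).
const ENUMERABLE_KEYS = [
  "calendar",
  "collation",
  "currency",
  "numberingSystem",
  "timeZone",
  "unit",
];

function enumerableSurface() {
  const surface = {};
  for (const key of ENUMERABLE_KEYS) {
    // Each call returns an array of canonical identifiers supported by this host.
    surface[key] = Intl.supportedValuesOf(key);
  }
  return surface;
}

// Reduces the surface to set sizes; any difference between two users on the
// same host version and platform would already be a distinguishing bit.
function surfaceSizes(surface) {
  return Object.fromEntries(
    Object.entries(surface).map(([key, values]) => [key, values.length])
  );
}

console.log(surfaceSizes(enumerableSurface()));
```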

@aphillips commented May 11, 2023

The W3C I18N Working Group discussed this PR today in our teleconference. I am adding this comment on our behalf.

We are concerned that this prohibition will disadvantage smaller language/cultural communities who might rely on installation of support to enable locale-based APIs in the browser or JS host.

We feel that precluding the ability to install a locale or the parts of a locale (such as dictionaries for spell check/breaking/etc.) that assist with high-quality presentation on the Web and in JS applications has the potential to negatively impact those communities that cannot depend on support from browser or system vendors. If there is a "fingerprinting" risk associated with such installation, providing a warning to the user might be the most appropriate response.

Also, note that we are not currently aware of runtimes that allow the list of locales to be updated (other than by updating the entire underlying ICU build), so this strikes us as preventing a potentially useful feature from existing. We also note that CLDR releases include new locales twice each year, so presumably browsers would change their list of available locales as updates propagate.

sffc requested a review from codehag May 11, 2023 15:26

@sffc (Contributor) commented May 11, 2023

CC @codehag for feedback on @aphillips' comment above (see issue #588 for a reminder of the problem this PR is trying to solve)

@ljharb (Member) commented May 11, 2023

@aphillips doesn't this prohibition only require refreshing the page after installing a new locale?

@aphillips commented May 12, 2023

@ljharb asked:

> doesn't this prohibition only require refreshing the page after installing a new locale?

Not as far as I can tell? It seems to require that any two distinct users on the same version on the same platform (not the "same machine") should always get the same enumerable set of available values. The change in cb6449b extends this to include anything (numbering systems, calendars, etc.). I note that one frequent cause of such patching would be time zone data (which is not listed but is the regular source of runtime patching outside the normal release cycle).

If I understand the threat here, it's to prevent a bad actor from installing a locale into a user's browser and then using that locale ID (perhaps using a well-formed locale ID like: bad-actr-06b3c) to track the specific user (or to differentiate groups of users). That suggests that there should be no silent installation of locales or locale data, not necessarily an outright prohibition?
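The threat described above can be made concrete with a short sketch. It assumes a probe that asks the host, through the standard `Intl.DateTimeFormat.supportedLocalesOf` method with the `"lookup"` matcher, whether a tracking tag such as the hypothetical `bad-actr-06b3c` is supported; the function name `isLocaleAvailable` is hypothetical. On a host whose available-locale set is identical for all users, the answer carries no per-user information.

```javascript
// Hypothetical probe: does this host report support for a given locale tag?
// The "lookup" matcher follows RFC 4647 Lookup, so the answer depends only on
// which locales the host lists as available.
function isLocaleAvailable(tag) {
  const supported = Intl.DateTimeFormat.supportedLocalesOf([tag], {
    localeMatcher: "lookup",
  });
  return supported.length > 0;
}

// If a bad actor could get this tag installed for one user only, a script
// could tell that user apart from everyone else on the same host version.
const trackingTag = "bad-actr-06b3c"; // hypothetical tag from the comment above
console.log(isLocaleAvailable(trackingTag));
```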

There is a similar issue (w3c/css-drafts#4055) related to fingerprinting based on fonts (which have a much higher level of per-installation variability and which, unlike locales, can actually be installed currently 😃). The challenge here is to support under-served communities of users--particularly when the threat is more-or-less theoretical--without exposing large groups of Web users to abusive behavior.

@justingrant (Contributor)

> I note that one frequent cause of such patching would be time zone data (which is not listed but is the regular source of runtime patching outside the normal release cycle).

Which hosts and engines currently do runtime patching of time zone data? Are these OS patches or browser patches?

ryzokuken added the editorial label (Involves an editorial fix) Jun 1, 2023
dminor requested review from dminor and removed request for codehag June 1, 2023 18:24

@Constellation (Member)

@ben-allen I would like to ensure that this includes the possibility of vastly different behavior within the same browser. For example, iPhone has Lockdown Mode (LDM, https://support.apple.com/en-us/HT212650), which explicitly disables some features to add extra defense against targeted attacks. I don't think Intl-related things are changed by these modes, but I would like to ensure that the statement explicitly allows for this kind of possibility :)

Eemeli pointed out in the TG2 meeting that this can be treated as part of platform difference, in which case this sounds fine. So I would like to ensure that the above possibility counts as part of platform difference.

The other parts look pretty good to me! Thanks for your work.

@ljharb (Member) commented Jun 1, 2023

@Constellation can lockdown mode come into effect without reloading the page? If it forces a reload, then this note wouldn't apply at all, since it wouldn't be observable within the lifetime of a program.

@dminor commented Jun 2, 2023

Our concern is that on-demand installation of locale data could provide an easy fingerprinting vector for members of smaller language/cultural communities who may face discrimination or persecution, for example by dominant cultural groups or by the government of the country in which they live.

Our position is that it is better to ship data for all locales in a single bundle, which ensures that data for smaller communities is available, without exposing them to a fingerprinting risk.

I believe it would be very difficult to create a user warning that would explain the potential risk in a way that would allow a user to make an informed decision about accepting extra locale data. I suspect most people would ignore these warnings. I'd also point out that there's no guarantee that for a small linguistic community the text of the warning would be localized, which would decrease the likelihood of making an informed decision.

The key point is that the set of locales should not change as a result of user behaviour. We're not trying to prevent vendors from shipping a new bundle of locale data to users as part of an update, just data for individual locales. In the case of Firefox, CLDR and timezone updates are done as part of our normal release cycle anyway.

@srl295 (Member) commented Jun 11, 2023

> The W3C I18N Working Group discussed this PR today in our teleconference. I am adding this comment on our behalf.
>
> We are concerned that this prohibition will disadvantage smaller language/cultural communities who might rely on installation of support to enable locale-based APIs in the browser or JS host.

This type of feature was recently requested of me, in the context of ICU4C, and explicitly for the purpose of supporting minority, disadvantaged languages. (Which, yes, could potentially be at risk for fingerprinting of various kinds.)

> We feel that precluding the ability to install a locale or the parts of a locale (such as dictionaries for spell check/breaking/etc.) that assist with high-quality presentation on the Web and in JS applications has the potential to negatively impact those communities that cannot depend on support from browser or system vendors. If there is a "fingerprinting" risk associated with such installation, providing a warning to the user might be the most appropriate response.

Agreed. Default breaking for Thai script, for example, causes problems for minority-script users in current implementations. More here.

> Also, note that we are not currently aware of runtimes that allow the list of locales to be updated (other than by updating the entire underlying ICU build), so this strikes us as preventing a potentially useful feature from existing. We also note that CLDR releases include new locales twice each year, so presumably browsers would change their list of available locales as updates propagate.

CLDR releases include certain locales at "basic" level and above, but other locales are not included.

The ICU4C default build includes certain locales, but not others.

Vendors include certain locales, but not others.

In short, certain locales are already excluded from web implementations. I'm concerned that requiring that these locales cannot be added on the fly could end up negatively impacting users of languages that are already digitally disadvantaged.

@srl295 (Member) commented Jun 12, 2023

> Is there any reason not to apply the same restrictions to all engines tho? The ideal is that everything applies to everyone equally; having something only apply to a subset of impls is a suboptimal outcome.

On another topic, Node.js has, from the first versions that included Intl by default, had the ability to customize at build and runtime the set of locales available, and also to supplement the locales depending on the startup environment. It's also been requested to have some way to add locales at runtime there as well. This language seems to make Node.js v0.12 onwards potentially noncompliant. I don't see the argument for restriction in this type of environment at all.

@ljharb (Member) commented Jun 12, 2023

@srl295 that doesn't imply to me that it can be changed during the lifetime of a program, only at program start time.

My understanding of this requirement is that once a JS program has started, it can't observe further changes to the list of available locales. To that end, anything that requires refreshing or navigating a page, restarting an application, or launching a process in order to observe a different set of locales seems to me to comply with this requirement.
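Under this reading, the property a program can observe is stability within its own lifetime. A minimal sketch, assuming `Intl.supportedValuesOf` is available and using the hypothetical helpers `snapshotIntlSets` and `setsUnchangedSinceStartup`, shows the only kind of comparison a running program could make. It covers only the sets that `Intl.supportedValuesOf` exposes, since locale availability is not enumerable.

```javascript
// Hypothetical helper: serializes the enumerable Intl sets into one string.
function snapshotIntlSets() {
  const keys = ["calendar", "collation", "currency", "numberingSystem", "timeZone", "unit"];
  return JSON.stringify(keys.map((key) => [key, Intl.supportedValuesOf(key)]));
}

// Taken once, when the program starts.
const startupSnapshot = snapshotIntlSets();

// Under the reading above, a conforming host keeps this true for the whole
// lifetime of the program, even if new data is installed on the device in
// the meantime; that data only becomes visible after a reload or restart.
function setsUnchangedSinceStartup() {
  return snapshotIntlSets() === startupSnapshot;
}

console.log(setsUnchangedSinceStartup());
```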

@srl295 (Member) commented Jun 12, 2023

> @srl295 that doesn't imply to me that it can be changed during the lifetime of a program, only at program start time.

OK. So "user behaviour" is scoped to the JS runtime? That's helpful… I then don't see how fingerprinting is mitigated.

> My understanding of this requirement is that once a JS program has started, it can't observe further changes to the list of available locales. To that end, anything that requires refreshing or navigating a page, restarting an application, or launching a process in order to observe a different set of locales seems to me to comply with this requirement.

That would be very different, and wouldn't raise as much concern. Adding locales while running has been discussed as well, but it certainly has a lot of other challenges.

@ryzokuken (Member) left a comment

LGTM

ryzokuken requested a review from gibson042 July 3, 2023 15:33

@ljharb (Member) commented Aug 9, 2023

It might be worth noting that one potential way to satisfy this constraint is to pretend that all on-the-fly-available locales are already installed.
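One way to picture this suggestion is a host-side model in which availability is answered from the full catalog of deliverable locales, while data is fetched lazily on first use. Everything in the sketch (`LocaleHost`, `ensureData`, `fetchLocaleData`, the catalog contents) is hypothetical and not part of any engine's API, and it uses exact tag matching for brevity. It only illustrates that the answer a script sees is independent of what a given user has already installed; a real host would also need to hide timing differences between installed and freshly fetched data, which the sketch does not attempt.

```javascript
// Hypothetical host-side model; none of these names are real engine APIs.
class LocaleHost {
  constructor(catalog, installed) {
    this.catalog = new Set(catalog);     // every locale the vendor can deliver
    this.installed = new Set(installed); // what this particular user already has
  }

  // What a script observes: answered from the catalog only, so two users with
  // different installed sets get identical answers.
  supportedLocalesOf(requested) {
    return requested.filter((tag) => this.catalog.has(tag));
  }

  // Called when a formatter is actually created; installs data on demand.
  ensureData(tag) {
    if (!this.catalog.has(tag)) {
      throw new RangeError(`unsupported locale: ${tag}`);
    }
    if (!this.installed.has(tag)) {
      this.fetchLocaleData(tag);
      this.installed.add(tag);
    }
  }

  fetchLocaleData(tag) {
    // Placeholder for a download of the data for `tag`.
  }
}

const catalog = ["en", "fr", "yrl"];
const userA = new LocaleHost(catalog, ["en"]);
const userB = new LocaleHost(catalog, ["en", "yrl"]);
console.log(userA.supportedLocalesOf(["yrl", "xx"]));
console.log(userB.supportedLocalesOf(["yrl", "xx"]));
```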

@sffc (Contributor) commented Aug 23, 2023

@ben-allen to update the spec text to incorporate the remainder of @Manishearth's feedback.

@@ -73,6 +73,14 @@ <h1>Implementation Dependencies</h1>
</li>
</ul>

<emu-note>
Changes to the set of locales, currencies, calendars, numbering systems, and other enumerable items made available by ECMAScript hosts must not result in two users using the same version of the same host on the same platform becoming distinguishable from each other. This constraint is imposed to reduce the fingerprinting risk inherent in internationalization, and may be relaxed in future revisions.

Review comment:

Should this be normative? The line below should be informative, but it sounds like this is expected to be normative?

@ben-allen (Contributor, Author), Sep 4, 2023

This should indeed be normative, though it's in that fuzzy space where no extant browser implementation is affected by the change.

ben-allen changed the title from "Editorial: Added note about sets of locales for web browser implementations needing to not change as a result of user behaviour" to "Normative: Added note about sets of locales for web browser implementations needing to not change as a result of user behaviour" Sep 4, 2023
ben-allen force-pushed the ships-entire-payload branch from 549567a to 8813f31 September 4, 2023 22:36
ben-allen added the normative label and removed the editorial label (Involves an editorial fix) Sep 6, 2023
@@ -73,6 +73,14 @@ <h1>Implementation Dependencies</h1>
</li>
</ul>

<emu-note>
Changes to the set of locales, currencies, calendars, numbering systems, and other enumerable items made available by ECMAScript hosts must not result in two users using the same version of the same host on the same platform becoming distinguishable from each other. This constraint is imposed to reduce the fingerprinting risk inherent in internationalization, and may be relaxed in future revisions.

Review comment (Contributor):

> Changes to the set of locales, currencies, calendars, numbering systems, and other enumerable items made available by ECMAScript hosts must not result in two users using the same version of the same host on the same platform becoming distinguishable from each other. This constraint is imposed to reduce the fingerprinting risk inherent in internationalization, and may be relaxed in future revisions.

How about:

The initial set of locales, currencies, calendars, numbering systems, and other enumerable items visible to a particular origin must be the same for all users sharing the same user agent string (engine and platform version). Furthermore, dynamic changes to these sets must not result in users becoming distinguishable from each other. This constraint is imposed to reduce the fingerprinting risk inherent in internationalization, and may be relaxed in future revisions.

Reply (Contributor, Author):

Much clearer; will use this.

@ByteEater-pl, Sep 24, 2023

The notion of "user agent string" is foreign to ECMAScript and not universal among implementations. @ljharb, please take a look; it seems to go against your comments above.

Since this is far from the matters typically specified in a programming language standard (or in an API closely related to one, such as an optional extension of a language's standard library), and it pertains only to a subset of implementations, namely Web browsers, shouldn't it live in a spec for those? For example, the one defining what a Web browser is and which requirements it must satisfy as an ECMAScript implementation (if supported) in addition to ECMA-262?

Reply (Member):

I agree that the concept and mention of a user agent don't make any sense in an Ecma specification.

Reply:

In that case I think we need to step back and consider the premise here. The fingerprinting concerns are largely only relevant for web browser environments.

Either it is ECMA-402's job to address these fingerprinting concerns, and thus it must be allowed to refer to mechanisms available in those contexts, or it is not ECMA-402's job, and we don't need to handle this at all. We can't have our fingerprinting cake if we're not planning on eating it.

I suspect the framing here can be refined a bit to be clear that it is talking in a web browser context only. Alternatively, a more general point can be made about "systems where fingerprinting is a concern, like web browsers", and instead of saying UA strings talk about "already distinguishable bits of information (in the case of browsers, this is platform/version/UA string)". Something like that.

Reply:

IMO it really belongs elsewhere. But if a critical mass insists that it's desirable to make the spec longer and also draw attention there to this concern, which is present in some implementations, an informative note with sufficiently generic wording could be added.

@sffc (Contributor) commented Sep 7, 2023

@@ -73,6 +73,14 @@ <h1>Implementation Dependencies</h1>
</li>
</ul>

<emu-note>

Review comment (Contributor):

If this is normative, maybe it shouldn't be a NOTE?

</emu-note>

<emu-note>
Non-normative: As a result of this constraint, the first time a browser implementation that allows on-demand locale installation receives a request from a particular origin that could require installing a new locale, it must not reveal whether or not that locale is already installed.

Review comment (Member):

Are notes in 402 not all non-normative?

@ben-allen (Contributor, Author) commented Oct 12, 2023

The most recent push is a minimal change based on TC39 feedback: all I did was move the text that was in notes into the main text of the Implementation Dependencies section. This may, though, be insufficient or otherwise wrong. @littledan @bakkot @sffc

ben-allen requested review from sffc and removed request for sffc November 27, 2023 16:54
ben-allen force-pushed the ships-entire-payload branch from 7edd7fb to 0a5f4b5 March 7, 2024 13:42
ben-allen force-pushed the ships-entire-payload branch from 3ae5d92 to 4f5f712 May 21, 2024 19:11
ben-allen force-pushed the ships-entire-payload branch from 4f5f712 to 69d17cb July 28, 2024 16:05

@dminor commented Jul 31, 2024

To follow on from the discussion at TC39 yesterday, the SpiderMonkey team still considers this to be very important. I will ask for review on the current text from our privacy team before the next plenary.

Project status: Previously Discussed

Development: successfully merging this pull request may close these issues:
ships the entire payload requirement