Add content scripts section in specification #542

Open: wants to merge 8 commits into base `main`
80 changes: 76 additions & 4 deletions specification/index.bs
@@ -7,6 +7,7 @@ Group: WECG
URL: https://w3c.github.io/webextensions
Editor: Mukul Purohit, Microsoft Corporation https://www.microsoft.com, [email protected]
Editor: Tomislav Jovanovic, Mozilla https://www.mozilla.org/, [email protected]
Editor: Oliver Dunk, Google https://www.google.com, [email protected]
Abstract: [Placeholder] Abstract.
Markup Shorthands: markdown yes
</pre>
@@ -27,11 +28,11 @@ An optional directory containing strings as defined in <a href="#localization">l

### Other files

An extension may also contain other files, such as those referenced in the <a href="#key-content_scripts">content_scripts</a> and <a href="#key-background">background</a> part of the <a href="#manifest">Manifest</a>.
An extension may also contain other files, such as those referenced in the [[#key-content_scripts]] and [[#key-background]] part of the [=manifest=].

## Manifest

A WebExtension must have a manifest file at its root directory.
A WebExtension must have a <dfn>manifest</dfn> file at its root directory.

### Manifest file

@@ -112,7 +113,7 @@ This key may be present.

#### Key `content_scripts`

This key may be present.
The <a href="#key-content_scripts">`content_scripts`</a> key is a [=list=] of items representing [=content scripts=] that should be registered.
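As an illustration only, one `content_scripts` entry using the keys described in this section could look like the following TypeScript value. The `ContentScriptEntry` interface is a hypothetical name for this sketch, and the URL patterns assume the Chrome-style `scheme://host/path` match pattern syntax, which this draft does not define yet.

```typescript
// Hypothetical type for one item of the `content_scripts` list.
// Key names mirror the manifest keys in this section; the interface
// name itself is illustrative only.
interface ContentScriptEntry {
  matches: string[]; // required
  exclude_matches?: string[];
  js?: string[];
  css?: string[];
  all_frames?: boolean; // defaults to false
  match_about_blank?: boolean; // defaults to false
  match_origin_as_fallback?: boolean;
  run_at?: "document_start" | "document_end" | "document_idle";
  include_globs?: string[];
  exclude_globs?: string[];
  world?: "MAIN" | "ISOLATED"; // defaults to "ISOLATED"
}

// The `content_scripts` key is a list of such entries.
const contentScripts: ContentScriptEntry[] = [
  {
    matches: ["https://*.example.com/*"],
    exclude_matches: ["https://admin.example.com/*"],
    js: ["content.js"],
    css: ["content.css"],
    run_at: "document_end",
  },
];

console.log(JSON.stringify({ content_scripts: contentScripts }, null, 2));
```

Keys that are left out (such as `world` here) fall back to their defaults.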

#### Key `content_security_policy`

@@ -154,6 +155,8 @@ Filenames beginning with an underscore (`_`) are reserved for use by user agent.

## Isolated worlds

<dfn>Worlds</dfn> are isolated JavaScript contexts that share access to the same underlying DOM tree, but each has its own set of wrappers around those DOM objects.
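A toy model of this definition, as a TypeScript sketch: every world sees the same underlying node, but each world hands out its own wrapper, so a property that the page sets on its wrapper is invisible to an isolated content script. `DomNode`, `Wrapper`, `World`, and `wrap` are hypothetical names; engines implement wrappers inside their DOM bindings, not like this.

```typescript
// Toy model only: real engines implement wrappers in their DOM bindings.

// The underlying DOM node, shared by every world.
class DomNode {
  readonly tagName: string;
  constructor(tagName: string) {
    this.tagName = tagName;
  }
}

// What script running in one particular world actually holds.
class Wrapper {
  readonly node: DomNode;
  // Properties that script adds ("expandos") live on the wrapper, not the node.
  readonly expandos: Record<string, unknown> = {};
  constructor(node: DomNode) {
    this.node = node;
  }
}

// Each world lazily creates, then reuses, one wrapper per node.
class World {
  readonly name: string;
  private readonly wrappers = new WeakMap<DomNode, Wrapper>();
  constructor(name: string) {
    this.name = name;
  }

  wrap(node: DomNode): Wrapper {
    let wrapper = this.wrappers.get(node);
    if (wrapper === undefined) {
      wrapper = new Wrapper(node);
      this.wrappers.set(node, wrapper);
    }
    return wrapper;
  }
}

const bodyNode = new DomNode("BODY");
const mainWorld = new World("MAIN");
const isolatedWorld = new World("ISOLATED");

const pageView = mainWorld.wrap(bodyNode); // what the page's scripts see
const extensionView = isolatedWorld.wrap(bodyNode); // what an isolated content script sees

pageView.expandos["secret"] = 42;

console.log(pageView.node === extensionView.node); // true: same DOM node
console.log(pageView === extensionView); // false: different wrappers
console.log(extensionView.expandos["secret"]); // undefined: expando not shared
```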

## Unavailable APIs

## The `browser` global
@@ -172,6 +175,12 @@ Issue(62): Specify localization handling.

## Match patterns

A <dfn>match pattern</dfn> is a pattern used to match URLs.
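The draft does not define match pattern syntax yet. Assuming the `<scheme>://<host><path>` form that Chromium and Firefox accept today, a simplified TypeScript matcher might look like this. It treats a `*` scheme as `http` or `https`, a `*.` host prefix as the domain plus its subdomains, and `*` in the path as any run of characters; `<all_urls>`, ports, and `file:` URLs are deliberately left out, and `matchesPattern` is a hypothetical name.

```typescript
// Escape regular-expression metacharacters in a literal string.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Path wildcard: `*` matches any run of characters, everything else is literal.
function pathToRegExp(path: string): RegExp {
  return new RegExp("^" + path.split("*").map(escapeRegExp).join(".*") + "$");
}

// Simplified sketch of match pattern evaluation (assumed syntax, see lead-in).
function matchesPattern(pattern: string, url: string): boolean {
  // Split "<scheme>://<host><path>"; the path must start with "/".
  const parts = /^([^:/]+):\/\/([^/]*)(\/.*)$/.exec(pattern);
  if (parts === null) return false; // pattern form not supported by this sketch
  const scheme = parts[1];
  const host = parts[2];
  const path = parts[3];

  const target = new URL(url);
  const targetScheme = target.protocol.slice(0, -1); // "https:" -> "https"

  // Scheme: "*" stands for http or https.
  const schemeOk =
    scheme === "*" ? targetScheme === "http" || targetScheme === "https" : scheme === targetScheme;
  if (!schemeOk) return false;

  // Host: "*" matches any host; "*.example.com" matches example.com and its subdomains.
  let hostOk: boolean;
  if (host === "*") {
    hostOk = true;
  } else if (host.startsWith("*.")) {
    const base = host.slice(2);
    hostOk = target.hostname === base || target.hostname.endsWith("." + base);
  } else {
    hostOk = target.hostname === host;
  }
  if (!hostOk) return false;

  // Path: compared against the URL's path plus query string.
  return pathToRegExp(path).test(target.pathname + target.search);
}

console.log(matchesPattern("https://*.example.com/*", "https://www.example.com/a")); // true
console.log(matchesPattern("https://*.example.com/*", "https://example.com/")); // true
console.log(matchesPattern("https://*.example.com/*", "http://www.example.com/a")); // false
```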

## Globs

A <dfn>glob</dfn> can be any [=string=]. It can contain any number of wildcards, where `*` matches zero or more characters and `?` matches exactly one character.
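Since a glob is just a string with `*` and `?` wildcards, one straightforward way to evaluate it is to translate it into an anchored regular expression. This TypeScript sketch does that; `globToRegExp` and `matchesGlob` are hypothetical helper names, and treating every other character literally is an assumption of the sketch.

```typescript
// Escape regular-expression metacharacters so they match literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Translate a glob into an anchored RegExp: `*` matches zero or more
// characters, `?` matches exactly one character, everything else is literal.
function globToRegExp(glob: string): RegExp {
  let source = "^";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      source += "[\\s\\S]*";
    } else if (ch === "?") {
      source += "[\\s\\S]";
    } else {
      source += escapeRegExp(ch);
    }
  }
  return new RegExp(source + "$");
}

function matchesGlob(glob: string, value: string): boolean {
  return globToRegExp(glob).test(value);
}

console.log(matchesGlob("https://*.example.com/*", "https://www.example.com/page")); // true
console.log(matchesGlob("https://example.com/page?", "https://example.com/page1")); // true
console.log(matchesGlob("https://example.com/page?", "https://example.com/page")); // false
```

Note that `?` is a wildcard here, so a glob cannot match a literal `?` in a query string except by the wildcard itself.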

## Concepts

### Uniqueness of extension IDs
@@ -190,7 +199,70 @@ Issue(62): Specify localization handling.

### Content scripts

#### Isolated worlds
<dfn>Content scripts</dfn> represent a set of JS and CSS files that should be injected into pages loaded by the user agent.

#### Key `matches`

A [=list=] of [=match patterns=] that are used to decide where the content script runs. This key is required.

#### Key `exclude_matches`

A [=list=] of [=match patterns=] used to exclude URLs: the content script does not run on any URL that matches one of them.

#### Key `js`

A [=list=] of file paths that should be injected as scripts.

#### Key `css`

A [=list=] of file paths that should be injected as stylesheets.

#### Key `all_frames`

If `all_frames` is `true`, the content script is also injected into matching subframes; otherwise it is only injected into the top-level frame. Defaults to `false`.

#### Key `match_about_blank`

If this is `true`, the content script is also injected into an additional, user-agent-defined set of pages used to represent empty frames, but only if the content script matches the page that embedded the frame. Defaults to `false`.
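A sketch of how `match_about_blank` could change which URL is tested against `matches`, assuming (per the review thread below) that the user-agent-defined set is `about:blank` and `about:srcdoc`. `FrameInfo` and `urlForMatching` are hypothetical names; the sketch returns `null` for a top-level `about:blank`, which differs from the Firefox behavior mentioned later in the thread.

```typescript
// Hypothetical frame description for this sketch.
interface FrameInfo {
  url: string;
  // URL of the page that embedded this frame, or null for a top-level frame.
  embedderUrl: string | null;
}

// Assumption: the user-agent-defined "empty frame" pages are about:blank and
// about:srcdoc, the URLs that match_about_blank was designed for.
const EMPTY_FRAME_URLS = new Set<string>(["about:blank", "about:srcdoc"]);

// Returns the URL that `matches` should be tested against, or null when the
// frame should not be considered at all.
function urlForMatching(frame: FrameInfo, matchAboutBlank: boolean): string | null {
  if (!EMPTY_FRAME_URLS.has(frame.url)) {
    return frame.url; // an ordinary page is matched against its own URL
  }
  if (!matchAboutBlank || frame.embedderUrl === null) {
    return null; // empty frame without opt-in, or with no embedding page
  }
  return frame.embedderUrl; // inject only if the embedding page matches
}

console.log(urlForMatching({ url: "about:blank", embedderUrl: "https://example.com/" }, true)); // https://example.com/
console.log(urlForMatching({ url: "about:blank", embedderUrl: "https://example.com/" }, false)); // null
```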
Contributor

Do we (browsers) have different criteria for match_about_blank?

Member

The description here is too vague. match_about_blank was designed for about:blank and about:srcdoc.

If you're looking for clarity, see https://stackoverflow.com/questions/41408936/can-anyone-explain-that-what-is-the-use-of-match-about-blank-in-chrome-extensi, where I previously posted an answer that describes why match_about_blank exists and what it does.

Other documentation:

Member Author

@Rob--W, could you take another look? I've made some tweaks although it's unclear to me what was too vague.


#### Key `match_origin_as_fallback`

Used to match frames with an opaque or otherwise missing origin. The origin to match against is determined in the following order of priority:

1. If the frame has an [=opaque origin=], such as with a [=blob URL=], use the non-opaque origin.
Member Author

@Rob--W @rdcronin Would you be able to take a look at this one and confirm if it is accurate? This was my best understanding based on bugs and documentation in the code.

Contributor

I think I'd rephrase this.

(The issue with the current language is that:
a] it doesn't specify what the "non-opaque" origin is or where it comes from
b] it doesn't always use the origin of the parent; it uses the initiator (or "precursor"))

If the URL of a document has a specified scheme**, the user agent will fall back to the origin of the initiator instead. This is commonly, but not always, the parent or embedding frame.

** In Chrome, these schemes are data:, about:, filesystem:, and blob:. Is that the same in other browsers?

Member

Tagging @Rob--W and @xeenon to request feedback.

Member

@oliverdunk The semantics have been extensively discussed on Chromium's issue tracker, where Devlin and I discussed the API design. If you're interested, the start of the discussion is at https://issues.chromium.org/issues/40443085#comment48. The design that is close to what we have now was sketched in https://issues.chromium.org/issues/40443085#comment61 , with the final name (match_origin_as_fallback) at https://issues.chromium.org/issues/40443085#comment67. Devlin summarized the discussion at https://issues.chromium.org/issues/40443085#comment71

Upon reviewing the proposed texts here, I think that there is some confusion on terminology. The current text mentions blob URLs as an opaque origin, but that is not the case.

Relevant to content script matching is the URL of the document (which can have an origin component) and the origin of the document (as a security principal). There may not always be an obvious relation between the two:

  • URLs may have a visible origin part, as with http(s), blob:, and (Chrome-only) filesystem: URLs (e.g. blob:https://example.com/UUID).
  • URLs may not have a visible origin (about:blank and about:srcdoc) but still have a non-opaque origin: commonly the opener of the frame or window is another http(s) URL, or even any number of about:blank/srcdoc documents, the first of which was initially opened by an http(s) origin.
  • The security principal of a document can be an opaque origin, even if the URL of that document looks like it has a non-opaque origin. This happens with <iframe sandbox> or the sandbox directive in the Content-Security-Policy. A content script can use window.origin to see whether the origin is opaque, as an opaque origin serializes to "null".
    • In case of opaque origins, there is almost always a non-opaque initiator that opened the frame. The term "precursor origin" is used here, e.g. for <iframe sandbox="allow-scripts" src="https://example.com">.
  • The exception is when the initiator of the navigation does not have a non-opaque origin, for example when the user navigates to a data:-URL or to about:blank. Since data:-URL
    • Chrome does not currently run content scripts in these documents.
    • Firefox currently allows content scripts with matches for all URLs AND match_about_blank: true to run scripts in top-level about:blank. This is not documented anywhere, though.

Member Author

Thanks both, writing the description for these two keys has taken by far the most time in this PR. I've given it another attempt and would appreciate any feedback.

As a general note, concepts like the precursor origin and security principal don't appear to be defined in any other specifications. It seems like they are more informal terms used often in implementations and by implementors. With that in mind, I've tried to describe them as best as possible without talking about them by name.

A few additional notes:

  • I've added an informal note to match_about_blank describing the Firefox behavior for top-level about:blank pages.
  • I've added a note that the path must be a wildcard if match_origin_as_fallback is set. This is the behavior today in Chrome. Interestingly, we don't have any restrictions on include_globs or exclude_globs. This feels like an omission to me and I wonder if we should specify something.
  • In Chrome, sandboxing doesn't seem to be relevant. We always apply these fallbacks, even if the parent is inaccessible to the child frame. With that in mind, I haven't mentioned it here.

Clearly there's a lot of detail here so please let me know if I've missed anything or it could be clearer.

1. If available, use the origin of the parent frame.
1. Otherwise, no origin is found and this frame can never be matched.
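The priority order above can be sketched in TypeScript as a function that only runs for frames whose own origin cannot be matched directly. `FrameOrigins` and `fallbackOrigin` are hypothetical names; `null` stands for an opaque or missing origin, and `precursorOrigin` models the initiator's origin discussed in the review thread, whose exact definition is still open.

```typescript
// Hypothetical frame model. `null` stands for an opaque or missing origin.
interface FrameOrigins {
  origin: string | null;
  // Non-opaque origin that an opaque origin was derived from (the initiator,
  // called the "precursor" in the review thread), if known.
  precursorOrigin: string | null;
  // Origin of the parent frame, or null for a top-level frame.
  parentOrigin: string | null;
}

// Returns the origin to match against, or null if the frame can never match.
function fallbackOrigin(frame: FrameOrigins): string | null {
  // 1. Opaque origin: use the non-opaque origin it was derived from.
  if (frame.origin === null && frame.precursorOrigin !== null) {
    return frame.precursorOrigin;
  }
  // 2. Otherwise, use the parent frame's origin if there is one.
  if (frame.parentOrigin !== null) {
    return frame.parentOrigin;
  }
  // 3. No origin is found: this frame can never be matched.
  return null;
}

console.log(fallbackOrigin({ origin: null, precursorOrigin: null, parentOrigin: "https://example.com" })); // https://example.com
```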

#### Key `run_at`

Specifies when the content script should be injected. Valid values are `document_start`, `document_end` and `document_idle`.
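A TypeScript sketch that validates `run_at` and relates each value to `document.readyState`. The timing mapping is an assumption based on Chromium's documented behavior (`document_start` before the page's own DOM is built, `document_end` once the DOM is complete, `document_idle` no earlier than that and possibly after load); this draft does not specify it yet, and `parseRunAt` and `hasReached` are hypothetical names.

```typescript
type RunAt = "document_start" | "document_end" | "document_idle";
type ReadyState = "loading" | "interactive" | "complete";

const RUN_AT_VALUES: readonly string[] = ["document_start", "document_end", "document_idle"];

// Reject anything other than the three valid values.
function parseRunAt(value: string): RunAt {
  if (RUN_AT_VALUES.indexOf(value) === -1) {
    throw new Error(`Invalid run_at value: ${value}`);
  }
  return value as RunAt;
}

// Assumed earliest document.readyState for each value (Chromium-style timing).
// document_idle is only a lower bound: the user agent may wait until after load.
const EARLIEST_STATE: Record<RunAt, ReadyState> = {
  document_start: "loading", // before the page's own DOM and scripts
  document_end: "interactive", // DOM complete, subresources may still be loading
  document_idle: "interactive", // no earlier than document_end
};

const STATE_ORDER: readonly ReadyState[] = ["loading", "interactive", "complete"];

// True once the document has progressed far enough for `runAt`.
function hasReached(runAt: RunAt, state: ReadyState): boolean {
  return STATE_ORDER.indexOf(state) >= STATE_ORDER.indexOf(EARLIEST_STATE[runAt]);
}

console.log(hasReached(parseRunAt("document_end"), "loading")); // false
console.log(hasReached(parseRunAt("document_end"), "interactive")); // true
```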

#### Key `include_globs`

A [=list=] of [=globs=]. If present, a page must also match at least one of them, in addition to `matches`.

#### Key `exclude_globs`

A [=list=] of [=globs=] used to exclude URLs: the content script does not run on any URL that matches one of them.

#### Key `world`

The [=world=] that the content script's JavaScript files are injected into. Valid values are `MAIN` and `ISOLATED`. Defaults to `ISOLATED`.
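The defaults stated in this section (`all_frames` and `match_about_blank` default to `false`, `world` defaults to `ISOLATED`, and `matches` is required) could be applied when reading a manifest along these lines. This is a TypeScript sketch over a subset of the keys; `normalizeContentScript`, `RawEntry`, and `NormalizedEntry` are hypothetical names.

```typescript
type World = "MAIN" | "ISOLATED";

// Subset of a content script entry as read from the manifest.
interface RawEntry {
  matches?: string[];
  all_frames?: boolean;
  match_about_blank?: boolean;
  world?: string;
}

// The same subset with every default applied.
interface NormalizedEntry {
  matches: string[];
  all_frames: boolean;
  match_about_blank: boolean;
  world: World;
}

function isWorld(value: string): value is World {
  return value === "MAIN" || value === "ISOLATED";
}

function normalizeContentScript(raw: RawEntry): NormalizedEntry {
  // `matches` is required.
  if (raw.matches === undefined) {
    throw new Error("content script entry is missing the required key `matches`");
  }
  // `world` defaults to ISOLATED and must be MAIN or ISOLATED.
  const world = raw.world ?? "ISOLATED";
  if (!isWorld(world)) {
    throw new Error(`Invalid world: ${world}`);
  }
  return {
    matches: raw.matches,
    all_frames: raw.all_frames ?? false, // defaults to false
    match_about_blank: raw.match_about_blank ?? false, // defaults to false
    world,
  };
}

console.log(normalizeContentScript({ matches: ["https://example.com/*"] }).world); // ISOLATED
```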

#### Injecting a content script

Issue: If the same extension specifies the same script twice, what should happen? ([bug](https://crbug.com/324096753))

Issue: The below algorithm needs to be updated to include `match_about_blank` and `match_origin_as_fallback`.

To determine whether a content script should be injected into a frame:

1. If the extension does not have access to the origin, return.
1. If the origin does not match any entry in `matches`, return.
1. If `include_globs` is present and the origin does not match any entry in it, return.
1. If the origin matches an entry in `exclude_matches` or `exclude_globs`, return.
1. If the frame is a subframe and `all_frames` is not `true`, return.
1. Otherwise, inject the content script at the time specified by `run_at`.
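The steps above can be sketched in TypeScript as a predicate. The host-access check and the pattern and glob matchers are passed in as hypothetical callbacks, so the sketch stays independent of any particular implementation. It tests the frame's URL string, whereas the steps say "origin", so treat that as an assumption; like the algorithm, it does not yet handle `match_about_blank` or `match_origin_as_fallback`. `ScriptEntry`, `FrameContext`, and `shouldInject` are hypothetical names.

```typescript
// Subset of a content script entry relevant to the decision.
interface ScriptEntry {
  matches: string[];
  exclude_matches?: string[];
  include_globs?: string[];
  exclude_globs?: string[];
  all_frames?: boolean;
}

interface FrameContext {
  url: string;
  isTopLevel: boolean;
}

// Stand-in for a real match pattern or glob implementation.
type Matcher = (pattern: string, url: string) => boolean;

function shouldInject(
  script: ScriptEntry,
  frame: FrameContext,
  hasHostAccess: (url: string) => boolean,
  matchPattern: Matcher,
  matchGlob: Matcher,
): boolean {
  const url = frame.url;
  // 1. The extension must have access to the page.
  if (!hasHostAccess(url)) return false;
  // 2. At least one entry in `matches` must match.
  if (!script.matches.some((p) => matchPattern(p, url))) return false;
  // 3. If `include_globs` is present, at least one glob must match.
  if (script.include_globs !== undefined && !script.include_globs.some((g) => matchGlob(g, url))) {
    return false;
  }
  // 4. Any match in `exclude_matches` or `exclude_globs` excludes the frame.
  if ((script.exclude_matches ?? []).some((p) => matchPattern(p, url))) return false;
  if ((script.exclude_globs ?? []).some((g) => matchGlob(g, url))) return false;
  // 5. Subframes are only considered when `all_frames` is true.
  if (!frame.isTopLevel && script.all_frames !== true) return false;
  // 6. Inject; the caller schedules the injection according to `run_at`.
  return true;
}
```

Returning a boolean rather than injecting directly keeps timing (`run_at`) and world selection separate from the matching decision.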

### Extension pages
