Make Sec-CH-UA-Form-Factor a future-proof way of selecting content #344

Closed
nielsbasjes opened this issue Jul 31, 2023 · 9 comments

@nielsbasjes

nielsbasjes commented Jul 31, 2023

As proposed here by @djmitche (CC: @miketaylr), this is a new issue to discuss my ideas for a possible future-proof design of the Sec-CH-UA-Form-Factor header.

My overall proposal is to make the Sec-CH-UA-Form-Factor header a clear, future-proof indicator of the kind of content and interaction that is suitable for the device/client the user is using at the moment. So in addition to indicating the device classes we see today, it should also be extensible to future developments (like contact-lens screens, brain interfaces, ...).

From the browser's perspective, this header is its way of telling the website what content the device can best consume. I think this should be present on the first request, which means it should be allowed into the low-entropy group and cannot contain any detail about the device.

So in the proposal below I'm trying to stay at a useful level of detail without going into fingerprinting detail (which is a tricky balance).

Note that this also makes Sec-CH-UA-Mobile effectively an extreme simplification of this header (perhaps, in time, even obsolete).

Essentially a site needs to know from the device:

  • What kind of output/content does the device support?
    • The screen size (watch, phone ... TV ... )
    • Can it do VR? Can it mix the surroundings with the VR content (i.e. does it support VR & AR & MR --> XR)?
    • Is it a technically limited screen (e.g. eInk does not support fast changes like animations)?
  • What kind of input/interaction does the device support?
    • Keyboard, Mouse, Touch, Gesture, ...
    • Perhaps also include kinds of sensors: Camera, Orientation, GPS, ...
  • What kind of attention from the user can you expect?
    • High like in a browser or game
    • Medium like in a TV (a question is asked and then I stand up to get the remote)
    • Low like in a car (I'm driving ...)

More extensively the kinds of values I have in mind:

The Output capabilities/ScreenType/Size indicator

  • None: No screen, Headless, Server-to-Server, etc.
  • Watch: A (usually handheld, usually touch) screen < 2"
  • Phone: A (usually handheld, usually touch) screen between 2" and 7"
  • Tablet: A (usually handheld, usually touch) screen between 7" and 14"
  • Desktop: A (usually movable but not handheld, usually no touch) screen between 15" and 30"
  • TV: A fixed (usually wall-mounted, no touch) large screen > 32"
  • VR: A VR headset that CANNOT mix images from the outside world into the view, so it is only suitable for VR content.
  • XR: A VR headset that CAN mix images from the outside world into the view, so it is suitable for all VR/AR/MR content, which is commonly called XR.
  • eInk: A slow, tablet-sized display that only has two colors/greyscale and does not support fast changes like animations, movies, and games.

NOTE: The above list needs discussion as it still mixes screen size and content capabilities.

The Interaction capabilities indicator (multi-valued)

  • None: No screen, Headless, Server-to-Server ... so no human interactions.
  • Keyboard: A keyboard interaction
  • Mouse: A mouse interaction
  • Touch: A touch screen
  • Game: A gamepad-type controller (mini joysticks) as used on PlayStation, Xbox, Nintendo Switch, etc., suitable for fast interactions.
  • Remote: A controller with only arrow keys, OK and Cancel buttons, as used with many TVs and set-top boxes (like the "Google Chromecast with Google TV" and "Apple TV"). Only suitable for slow interactions.
  • Gesture: A device that looks at gestures and motion of the user.
  • Voice: A voice controlled device.
  • Camera: It has a camera
  • Orientation: It has an orientation sensor
  • Location: It has location information (GPS and such)

If multiple interaction capabilities are present, they should always be listed in alphabetical order to reduce fingerprinting.
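
A minimal sketch (Python, purely illustrative; none of these names are part of any spec) of how a user agent could canonicalize the list before serializing it:

```python
# Purely illustrative sketch: canonicalize the interaction capabilities so that
# two devices with the same capabilities always produce the same value,
# regardless of the order in which the capabilities were detected.

def serialize_interactions(capabilities):
    """Return the capabilities as an alphabetically ordered, ';'-separated list."""
    if not capabilities:
        return "None"
    return ";".join(sorted(capabilities))

# Both detection orders yield the same serialization:
assert serialize_interactions({"Touch", "Location", "Orientation"}) == "Location;Orientation;Touch"
assert serialize_interactions({"Orientation", "Location", "Touch"}) == "Location;Orientation;Touch"
```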

The Attention level indicator

  • None: No screen, Headless, Server-to-Server ... so no human attention at all.
  • Low: You can expect not to get a response from the user at all, as they have other priorities. Common usage: Car
  • Medium: The user should be able to respond within a minute. Common usage: TV (time to find the remote)
  • High: The user should be able to respond within 1 or 2 seconds. Common usage: Normal Websites, Gaming

Kinds of field values I would expect:

| Device | Screen/Content | Interaction | Attention |
|---|---|---|---|
| Watch | Watch | Touch;Location;Orientation | High |
| Phone | Phone | Touch;Location;Orientation;Camera | High |
| Tablet | Tablet | Touch;Location;Orientation;Camera | High |
| Amazon Echo | Tablet | Touch | Medium |
| PS5 | TV | Game | High |
| Nintendo Switch | Phone | Game | High |
| Tesla | Tablet | Touch;Location | Low |
| Google TV | TV | Remote | Medium |
| Apple Vision Pro | XR | Gesture;Orientation;Camera | High |
| PS4 VR Headset | VR | Gesture;Orientation | High |
| PS5 VR Headset | XR | Gesture;Orientation;Camera | High |

Note that this may seem to add a lot of entropy, but I think it doesn't, because just about all phones will have the same list here, and from the other headers it is already known to be a phone. The same holds for all tablets, all game consoles, etc.

To be discussed: How to fit this into the header.
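
To make that discussion concrete, here is one hypothetical way the three indicators could be folded into a single value. This is a sketch only; the actual syntax is exactly what still needs to be discussed, and all names below are illustrative:

```python
# Hypothetical sketch only: the real serialization is still to be discussed.

def form_factor_value(screen, interactions, attention):
    """Combine the three indicators into one illustrative header value."""
    parts = sorted(interactions) if interactions else ["None"]
    return f'Screen={screen}; Interaction={";".join(parts)}; Attention={attention}'

# The "Phone" row from the table above:
print(form_factor_value("Phone", {"Touch", "Location", "Orientation", "Camera"}, "High"))
# -> Screen=Phone; Interaction=Camera;Location;Orientation;Touch; Attention=High
```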

@Sora2455

Sora2455 commented Aug 1, 2023

> Note that this may seem to add a lot of entropy, but I think it doesn't, because just about all phones will have the same list here, and from the other headers it is already known to be a phone. The same holds for all tablets, all game consoles, etc.

This logic is backwards: if most people have similar values and a small group has very different values, then it is far easier to identify that small group.

@miketaylr
Collaborator

> I think this should be present on the first request, which means it should be allowed into the low-entropy group

Given the high-entropy nature of this proposal (i.e., there's a lot of info here that can't be passively derived), I don't think we would send it by default. And +1 to what @Sora2455 is saying.

As to "Output capabilities/ScreenType/Size indicator" - this is basically the original proposal of the form-factor hint. Should we really be sending more info/bytes over the wire than this (interaction/attn) for use cases that aren't interested in it (e.g., analytics)?

@djmitche
Contributor

djmitche commented Aug 1, 2023

I like that this is getting down to the "facts" about the UA/device, rather than interpretations or industry trends. And, I think the three dimensions outlined here are good: Output, Interaction, and Attention.

I'd like to get more feedback from other sites that might want to use this value. While that's in progress, a few comments of my own:

  • There are already screen-size hints (https://github.com/WICG/responsive-image-client-hints/blob/main/Viewport-Height-Explainer.md), so to some extent the size attribute of the Output dimension is redundant. I realize those are in pixels, but perhaps there's enough information there? If so, then there can be many fewer Output options.
  • For Interaction, it seems there's a chance for this to bleed into an enumeration of sensor APIs on the device: Temperature? Heart-Rate? etc. Maybe we could limit it to the "primary" interaction methods. In practice, I think that means dropping "Camera", "Orientation", and "Location" from the list above.
  • To @miketaylr's point about Interaction and Attention being separate from form-factor: I agree that those probably don't belong in a hint entitled "form factor", but I can imagine they are equally or more useful to a site than the Output dimension. Some more feedback from people out there building sites would be useful here. We can decide later whether those dimensions are included in Sec-CH-UA-Form-Factor or in some other header(s).
  • I agree with @Sora2455 and @miketaylr regarding the entropy -- every time some new, interesting device is introduced, this hint will become high-entropy while the hundreds or thousands of early adopters are the only users possessing that device. Sites that really need this information on the first request can use the Critical-CH header to get it (at the small penalty of restarting the connection); see the sketch below. But the information certainly doesn't need to be sent automatically with every request the browser makes.
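
For reference, a rough sketch of that opt-in flow (framework-agnostic pseudo-handler; the hint name follows this issue, and nothing about its registration as low- or high-entropy is decided here):

```python
# Rough sketch, not tied to any web framework: a site that needs the hint on the
# first navigation opts in with Accept-CH and marks the hint critical with
# Critical-CH. A browser that supports the hint but omitted it from the request
# will retry the request with the hint included.

def add_form_factor_hint_headers(response_headers):
    response_headers["Accept-CH"] = "Sec-CH-UA-Form-Factor"
    response_headers["Critical-CH"] = "Sec-CH-UA-Form-Factor"
    # Tell caches that responses may differ by form factor.
    response_headers["Vary"] = "Sec-CH-UA-Form-Factor"
    return response_headers
```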

@nielsbasjes
Author

> > Note that this may seem to add a lot of entropy, but I think it doesn't, because just about all phones will have the same list here, and from the other headers it is already known to be a phone. The same holds for all tablets, all game consoles, etc.
>
> This logic is backwards: if most people have similar values and a small group has very different values, then it is far easier to identify that small group.

Ah yes, I see your point.

@miketaylr
Collaborator

Looking forward to the feedback you get, @djmitche. With my editor hat on, my preference is to keep it simple and informed by use cases, and possibly add additional info in the future.

@djmitche
Contributor

Classifying the use-cases I've uncovered so far:

  • Analytics - understanding the distribution of how users are experiencing a site, at a higher level than enumerating UA versions
  • Variants - extending the "mobile vs. desktop" distinction to more than 2 choices
  • Wrappers - a few cases where a particular browser implementation (e.g., Chrome for Android) is embedded in some larger context (e.g., a TV). The default UA for that implementation is misleading, and implementers would like a more "honest" and future-compatible approach to fixing it than just sticking more words into the UA string.

The analytics cases want to cover the universe of user agents contacting a site. The variants and wrappers cases are user agents wanting a way to say "this is new and different in a way that is significant to at least one site and probably others".

To the three dimensions:

  • Output - largely redundant with the existing size-related hints, with the exception of XR and eInk.
  • Interaction - seems like the information needed for the use-case classes above is mostly on this dimension.
  • Attention - reactions are pretty lukewarm on this. It seems neat, but doesn't address any particular use-cases.

To Mike's point about simplicity, it seems reasonable to drop the Output and Attention dimensions, leaving only a single dimension. The spec language should probably describe what that represents as a way of constraining what new values are added.

That leaves:

  • What to call it - maybe "form factor" is good enough, and maybe it's not too important since really only computers will see the value day-to-day
  • Allowed values and how to add more.
  • How to represent it - I think the options are
    • Single value - UA must pick the most suitable of the available options
    • Set of values - UA picks all applicable values, without any ordering between them
    • Ranked values - either a list with order being significant, or using quality values

I have some opinions on those but would be interested to hear reactions to this summary before voicing them.
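
For concreteness, the three representation options might look roughly like this (illustrative values only; no serialization has been decided at this point):

```python
# Illustrative only: how each representation option might appear on the wire.

single_value  = 'Sec-CH-UA-Form-Factor: "XR"'                   # UA picks the most suitable option
set_of_values = 'Sec-CH-UA-Form-Factor: "Desktop", "XR"'        # all applicable values, order not significant
ranked_values = 'Sec-CH-UA-Form-Factor: "XR";q=1.0, "Desktop";q=0.5'  # quality values (or an order-significant list)
```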

@djmitche
Contributor

Perhaps I'm looking for feedback too early (in an issue, before making a PR to a draft spec!). Here are my opinions, which I will formulate into the existing PR, #343.

  • Sec-CH-UA-Form-Factor is a good (enough) name.
  • The list in "Make Sec-CH-UA-Form-Factor a list, add meanings" (#343), based on that from @miketaylr, is still pretty good. In keeping with editor-miketaylr, let's collapse the *R values into just XR and, in the absence of any use-cases for it identified so far in this conversation, drop "TV".
    • "Desktop" refers to a user-agent running on a personal computer, typically equipped with a large screen, keyboard, and pointing device.
    • "Automotive" refers to a user-agent embedded in a vehicle, where the user may be responsible for operating the vehicle and unable to attend to small details.
    • "Mobile" refers to small, touch-oriented device typically carried on a user’s person.
    • "Tablet" refers to a touch-oriented device larger than "Mobile" and not typically carried on a user’s person.
    • "XR" refers to immersive devices that augment or replace the environment around the user.
  • New values can be added when there is
    • a new class of device that users interact with in a meaningfully different way,
    • a compelling use-case where sites would like to change how they interact with users on that device, and
    • no reliable way to identify that new class of device using existing hints.
  • Some values may encompass others. For example, if we later decide to add "AR", it's likely that any device including the "AR" form-factor in the hint would also include "XR". This helps the hint to evolve as we invent more fine-grained form-factors.
  • The hint should contain all applicable values in an sf-list, ordered lexically. Sites should test this list for the presence of values of interest, without regard to the order of the list. Ordering the values lexically prevents distinguishing two user-agents by otherwise-meaningless differences in order. I don't think ranked values help in any of the identified use-cases.

@arichiv gave a nice illustration of how allowing values to encompass others, and representing a set of values, supports future expansion.
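
Assuming the sf-list-of-strings shape described above (the exact grammar lands in #343, not here), a minimal site-side sketch of the intended usage: test for presence, ignore order.

```python
# Minimal sketch assuming an sf-list of strings, e.g.
#   Sec-CH-UA-Form-Factor: "Desktop", "XR"
# A real deployment should use a proper Structured Fields parser; this naive
# split only illustrates the presence test.

def form_factors(header_value):
    return {item.strip().strip('"') for item in header_value.split(",") if item.strip()}

def wants_xr_experience(header_value):
    # Membership test only; the (lexical) order of the list is irrelevant.
    # If a more fine-grained value such as "AR" is added later and such devices
    # also list "XR", this check keeps working unchanged.
    return "XR" in form_factors(header_value)

assert wants_xr_experience('"Desktop", "XR"')
assert not wants_xr_experience('"Desktop"')
```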

@miketaylr
Collaborator

LGTM as a plan @djmitche

@djmitche
Contributor

djmitche commented Sep 6, 2023

Fixed in #343, but I can't close this issue.
