[TOC]
The goal of this document is to improve the understanding of the input event system in the desktop Chrome UI.
The Chrome UI system handles input events (typically key downs or mouse clicks) in three stages:
- At the very beginning, the OS generates a native input event and sends it to a Chrome browser window.
- Then, the Chrome Windowing system receives the event and converts it into a
platform-independent ui::Event. It then sends the event to the Views system.
The interaction with IME (Input Method, for non-English text input) is also handled at this stage.
- Lastly, the Views system sends the event to the control that expects to receive it. For example, a character key event will insert one letter into a focused textfield.
Aura is the window abstraction layer used on Windows, Linux, and ChromeOS. An event goes through several phases in Aura and is eventually passed into views.
Phase 0 - DesktopWindowTreeHost
After the user presses a key or clicks the mouse, the OS generates a low-level input event and pumps it into a message loop. After some low-level OS-specific plumbing, the event is delivered to a DesktopWindowTreeHost, which hosts a native window and handles events in DesktopWindowTreeHost::DispatchEvent().
Phase 1 - EventProcessor pre-dispatch
Next, the event is passed to a WindowEventDispatcher, which is an EventProcessor owned by the DesktopWindowTreeHost. On ChromeOS, some ui::EventRewriters may rewrite the event before it is passed on.
An EventProcessor delivers the event to the right target. It provides a root EventTarget and a default EventTargeter. EventTargeter is responsible for finding the event target. An EventTarget can also provide an EventTargeter and the EventProcessor prefers its root EventTarget’s targeter over the default targeter.
The EventProcessor delivers the event to the first target found by the targeter. If the event is not marked as handled, it will ask the targeter to find the next target and repeat the procedure until the event is handled.
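The retry loop described above can be sketched in a few lines. This is a minimal model, not Chromium code: `Event`, `Target`, `Targeter`, and `Dispatch` are hypothetical stand-ins for ui::Event, EventTarget, EventTargeter, and the EventProcessor, and the assumption that the targeter walks a fixed priority list is made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for ui::Event, ui::EventTarget and
// ui::EventTargeter. They do not match the real Chromium signatures.
struct Event {
  bool handled = false;
};

struct Target {
  std::string name;
  bool consumes;  // Whether this target marks the event as handled.

  void OnEvent(Event& event) {
    if (consumes) event.handled = true;
  }
};

// Assumption: candidates are known up front, in priority order.
class Targeter {
 public:
  explicit Targeter(std::vector<Target*> candidates)
      : candidates_(std::move(candidates)) {
    for (Target* candidate : candidates_) assert(candidate != nullptr);
  }

  // The first candidate, or nullptr if there is none.
  Target* FindTarget() const {
    return candidates_.empty() ? nullptr : candidates_.front();
  }

  // The candidate that follows `previous`, or nullptr when exhausted.
  Target* FindNextBestTarget(const Target* previous) const {
    for (std::size_t i = 0; i + 1 < candidates_.size(); ++i) {
      if (candidates_[i] == previous) return candidates_[i + 1];
    }
    return nullptr;
  }

 private:
  std::vector<Target*> candidates_;
};

// Delivers the event to successive targets until one marks it as handled.
// Returns the names of the targets that saw the event, in order.
std::vector<std::string> Dispatch(const Targeter& targeter, Event& event) {
  std::vector<std::string> visited;
  for (Target* target = targeter.FindTarget();
       target != nullptr && !event.handled;
       target = targeter.FindNextBestTarget(target)) {
    visited.push_back(target->name);
    target->OnEvent(event);
  }
  return visited;
}
```

With three candidates where only the second and third consume events, `Dispatch` visits the first two and stops, so the third never sees the event.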
The EventProcessor can also have pre- and post-dispatch phases that happen before and after the event is dispatched to the target.
In the case of WindowEventDispatcher, it has a pre-dispatch phase for different types of events.
- For mouse move events, it may synthesize and dispatch an ET_MOUSE_EXITED event to notify that the mouse has left the previous UI control.
- For key events, it forwards the key to ui::InputMethod::DispatchKeyEvent() and the event will be handled there. Depending on IME involvement, later phases of WindowEventDispatcher may be SKIPPED. Details are explained later.
If the event is not marked handled in the EventProcessor pre-dispatch phase, it will be passed to the target. For key events, the target is the aura::Window that owns the focus. For mouse events, the target is the aura::Window under the cursor.
Each browser window comes with an aura::Window. It is worth noting that the web content and dialog bubble live in their own aura::Windows. These different aura::Windows treat accelerators differently and the detail will be explained later.
Phase 2 - EventTarget pre-target
Like EventProcessor, an EventTarget consumes the event in three phases. It owns one target handler and optionally multiple pre-target and post-target handlers. An event will first be passed to pre-target handlers, and if not consumed by them, then to the default target handler, and lastly to post-target handlers.
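The three-phase order can be illustrated with a small sketch. `Target`, `Handler`, and the method names here are hypothetical, and the sketch assumes that a handler which consumes the event stops all later handlers, which is a simplification of the real propagation rules.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Event {
  bool handled = false;
};

// Hypothetical handler type: returns true to consume the event.
using Handler = std::function<bool(Event&)>;

class Target {
 public:
  explicit Target(Handler target_handler)
      : target_handler_(std::move(target_handler)) {
    assert(target_handler_ != nullptr);
  }

  void AddPreTargetHandler(Handler handler) {
    pre_target_.push_back(std::move(handler));
  }
  void AddPostTargetHandler(Handler handler) {
    post_target_.push_back(std::move(handler));
  }

  // Runs pre-target, target, and post-target handlers in that order.
  // Assumption: the first handler that consumes the event ends dispatch.
  void Dispatch(Event& event) {
    for (Handler& handler : pre_target_) {
      if (handler(event)) {
        event.handled = true;
        return;
      }
    }
    if (target_handler_(event)) {
      event.handled = true;
      return;
    }
    for (Handler& handler : post_target_) {
      if (handler(event)) {
        event.handled = true;
        return;
      }
    }
  }

 private:
  Handler target_handler_;
  std::vector<Handler> pre_target_;
  std::vector<Handler> post_target_;
};
```

A pre-target handler that returns true models the accelerator case: the target handler and post-target handlers are never called.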
A non-content aura::Window uses a pre-target handler to forward key events to the FocusManager in views. If the key is an accelerator, the event will be intercepted and later phases will be SKIPPED.
Mouse events at present are not processed in pre-handlers. Content aura::Window does not have any pre-handlers, either.
Phase 3 - EventTarget regular
At this phase, non-content aura::Window asks (Desktop)NativeWidgetAura::OnEvent() to handle the event.
DesktopNativeWidgetAura is the native implementation of a top-level Widget. Non-top-level widgets, e.g. dialog bubbles, use NativeWidgetAura instead. The native widget then passes the event to Widget::OnMouseEvent(), Widget::OnKeyEvent(), or other Widget methods depending on the event type. The event is then handled in views, as explained in a later section.
Content aura::Window instead asks RenderWidgetHostViewAura::OnEvent() to handle the event. The event will be sent to Blink and then to BrowserCommandController if not consumed by the webpage. Some important shortcuts, e.g. Ctrl+T, are reserved and will not be sent to the web page.
Phase 4 - EventTarget post-target
This phase has no effect in the window abstraction layer.
Phase 5 - EventProcessor post-dispatch
For touch events, WindowEventDispatcher may recognize the event as a gesture event and dispatch it.
Key event handling and IME interoperability
We mentioned in phase 1 that a key event may be consumed during pre-dispatch and never reach later phases. This is because Chrome needs to interact with the IME through InputMethod::DispatchKeyEvent() in pre-dispatch.
If the IME accepts this key event, Chrome will stop any further event handling because IMEs have their own interpretation of the event. Instead, Chrome exits phase 1 with a fake VKEY_PROCESSKEY event indicating the event has been processed by IME, and waits for new events emitted by IME and handles them accordingly. For example, Chrome on Linux listens for the GTK preedit-changed event that indicates a change in the composition text.
If the IME does not accept this key event, WindowEventDispatcher will re-enter phase 1 but with IME explicitly skipped, so that the event can be passed to phase 2 where accelerators are handled.
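The two outcomes can be sketched as a single function. This is a simplified model under stated assumptions: `FakeIme`, `KeyEvent`, `Result`, and `DispatchKey` are hypothetical names, `kProcessKey` only mirrors the role of VKEY_PROCESSKEY, and the IME is assumed to accept every key while composing.

```cpp
#include <cassert>

// Hypothetical key codes. kProcessKey mirrors the role of VKEY_PROCESSKEY.
enum class KeyCode { kA, kT, kProcessKey };

struct KeyEvent {
  KeyCode code;
  bool skip_ime;  // Set when phase 1 is re-entered after the IME declined.
};

// Assumed IME model: it accepts every key while a composition is active.
struct FakeIme {
  bool composing;
  bool Accepts(const KeyEvent&) const { return composing; }
};

struct Result {
  KeyCode code;             // Key code visible when phase 1 exits.
  bool reached_pre_target;  // Whether phase 2 (accelerators) would run.
  int ime_queries;          // How many times the IME was consulted.
};

// Models phase 1 for a key event, including the re-entry with IME skipped.
Result DispatchKey(const FakeIme& ime, KeyEvent event) {
  int ime_queries = 0;
  if (!event.skip_ime) {
    ++ime_queries;
    if (ime.Accepts(event)) {
      // The IME owns the key now: expose only a placeholder and stop.
      return {KeyCode::kProcessKey, false, ime_queries};
    }
    // The IME declined: re-enter phase 1 with the IME explicitly skipped.
    event.skip_ime = true;
  }
  assert(event.skip_ime);
  return {event.code, true, ime_queries};
}
```

In this model a key typed during composition never reaches the accelerator phase, while a declined key such as Ctrl+T is consulted once by the IME and then continues unchanged.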
MacViews is an umbrella term that covers the broader effort to adopt views in Chrome Mac. Before this, Chrome Mac was using native Cocoa controls. In this document, we use MacViews to refer to the windows abstraction part of Chrome Mac.
Mac does not use Aura. It differs significantly in that it hosts the native NSWindow in RemoteCocoa, which talks to views through a mojo interface. This design, driven largely by the requirements of PWAs on Mac, allows RemoteCocoa to live either within the browser process or in a separate process for PWA mode. [ref]
Mac’s event handling borrows heavily from Cocoa’s Event architecture but applies its own handling where appropriate.
During startup ChromeBrowserMainParts will kick off NSApp’s main run loop that will continue to service Chrome application event messages for the life of the program. These messages are picked up by BrowserCrApplication (NSApplication subclass) and for the most part forwarded to the appropriate NativeWidgetMacNSWindow (NSWindow subclass).
A key departure from how typical Cocoa applications are architected is that Chrome uses a single root NSView (the BridgedContentView) as the contentView for its NSWindow. This view is largely responsible for adapting native NSEvents and funneling them through to the Views framework.
The below two examples demonstrate two key event flows through the Cocoa layers of Chrome through to the Views framework.
The below diagram demonstrates points of interest during dispatch of a right mouse down event on a Chrome browser window button.
Summary:
- The Window Server is responsible for determining which NSWindow a mouse event belongs to.
- Once the NSWindow has been identified the Window Server will place the mouse down event in Chrome’s BrowserCrApplication (NSApplication) event queue.
- BrowserCrApplication’s main run loop reads from the event queue.
- BrowserCrApplication delivers the event to the NativeWidgetMacNSWindow (NSWindow) which delivers the mouseDown event to its root NSView contentView.
- BridgedContentView aggregates all mouse related NSResponder messages (rightMouseDown, mouseMoved, leftMouseUp etc) into the mouseEvent: method.
- The mouseEvent method performs NSEvent conversion into ui::Event and sends the event to the NativeWidgetMacNSWindowHost’s OnMouseEvent() method.
- BridgedContentView communicates to the NativeWidgetMacNSWindowHost via a bridge.
- NativeWidgetMacNSWindowHost implements a Mojo remote remote_cocoa::mojom::NativeWidgetNSWindowHost such that the BridgedContentView and the NativeWidgetMacNSWindowHost can communicate via message passing (needed in the case these exist across process boundaries).
- NativeWidgetMac owns a NativeWidgetMacNSWindowHost instance.
The following demonstrates key points of interest in the event flow that occurs when a user presses a character key with the intention to enter text into the browser’s omnibox.
Summary:
- The Window Server will deliver key events to the BrowserCrApplication’s (NSApplication) event queue.
- Provided the keyDown event is not a key equivalent or keyboard interface control, the BrowserCrApplication sends the event to NativeWidgetMacNSWindow (NSWindow) that is associated with the first responder.
- The window dispatches the event as a keyDown event to its first responder (in this case the BridgedContentView which serves as the NSWindow’s contentView).
- BridgedContentView implements the NSTextInputClient protocol, which is required for Chrome to interact properly with Cocoa’s text input management system.
- BridgedContentView forwards the key event to the interpretKeyEvents: method.
- This invokes Cocoa’s input management system.
- This checks the pressed key against all key-binding dictionaries.
- If there is a match in the keybinding dictionary it sends a doCommandBySelector: message back to the view. (commands include insertTab, insertNewline, insertLineBreak, moveLeft etc).
- If no command matches it sends an insertText: message back to the BridgedContentView.
- BridgedContentView converts the NSString to UTF-16 and sends it through to its TextInputHost.
- TextInputHost implements the remote Mojo interface remote_cocoa::mojom::TextInputHost and BridgedContentView communicates with the TextInputHost via Mojo message passing.
- The TextInputHost calls InsertText() on its ui::TextInputClient.
- This should be the TextInputClient of the currently focused view.
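The branch between doCommandBySelector: and insertText: in the summary above can be modeled as a dictionary lookup. The sketch below is plain C++ rather than Objective-C and does not call any Cocoa API; `KeyBindings`, `Interpretation`, and `InterpretKey` are hypothetical names, and the string keys such as "Tab" are an illustrative encoding, not the format Cocoa uses.

```cpp
#include <cassert>
#include <map>
#include <string>

// What the input management system would send back to the view.
struct Interpretation {
  enum Kind { kCommand, kInsertText } kind;
  std::string payload;  // Selector name for kCommand, text for kInsertText.
};

// Hypothetical key-binding dictionary: key name -> command selector.
using KeyBindings = std::map<std::string, std::string>;

// Mimics the decision described above: a bound key becomes a command,
// anything else is inserted as text.
Interpretation InterpretKey(const KeyBindings& bindings,
                            const std::string& key) {
  assert(!key.empty());
  KeyBindings::const_iterator it = bindings.find(key);
  if (it != bindings.end()) {
    return {Interpretation::kCommand, it->second};
  }
  return {Interpretation::kInsertText, key};
}
```

For instance, with "Tab" bound to insertTab:, pressing Tab yields a command while pressing "a" yields text to insert, which is the path that ends in InsertText() on the focused view.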
The Window Abstraction layer will pass the input event to Views. Views is Chrome’s (mostly) platform-independent UI framework that orchestrates UI elements in a tree structure. Every node in the views tree is a View, which is a UI element similar to an HTML DOM element.
A Widget hosts the views tree and is a window-like surface that draws its content onto a canvas provided by the underlying window abstraction. Every widget can have at most one focused view which is tracked by a FocusManager owned by the widget.
The root of the Views tree is a RootView, a special subclass of View that bridges between child views and the wrapping Widget.
Suppose the omnibox is focused and the user presses down a key, say character ‘a’. How is this key routed through the system and delivered to the omnibox? We will only study the stack after the Window Abstraction layer passes the event to Views.
Surprisingly, the stack that the event needs to go through is not deep. On Aura, it can be summarized as a path of Widget -> RootView -> focused View. RootView will ask FocusManager for the focused view and the event will be directly delivered to it. There is no tree traversal. The details can be broken down into:
- Widget::OnKeyEvent():
The key event will be passed from platform-dependent NativeWidget to Widget.
The event is then processed by EventRewriter(s) attached to the Widget. Rewriters are used only on ChromeOS.
- EventProcessor::OnEventFromSource():
The key event is then passed to the root view of the widget. Note that a
RootView is an EventProcessor.
An EventProcessor is a multicaster. It tries to deliver the event to multiple targets until the event is marked as handled. It delegates the target enumeration task to an EventTargeter.
In the case of RootView, the EventTargeter is a ViewTargeter. A ViewTargeter will return the currently focused view as the target for the key event.
- EventHandler::OnEvent(): OmniboxViewViews receives the event. Every view is an EventHandler and OmniboxViewViews is no exception.
- OmniboxViewViews::HandleKeyEvent(): This is where the event is finally consumed and the text in the omnibox gets updated.
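The Aura path of Widget -> RootView -> focused View can be sketched with a few toy classes. The class names echo the real ones, but every signature here is a hypothetical simplification: event rewriters and the ViewTargeter are left out, and `TextView` stands in for OmniboxViewViews.

```cpp
#include <cassert>
#include <string>

struct KeyEvent {
  char character;
  bool handled;
};

class View {
 public:
  virtual ~View() = default;
  virtual void OnKeyEvent(KeyEvent& event) = 0;
};

// Stand-in for a text control such as the omnibox.
class TextView : public View {
 public:
  void OnKeyEvent(KeyEvent& event) override {
    text_ += event.character;
    event.handled = true;
  }
  const std::string& text() const { return text_; }

 private:
  std::string text_;
};

class FocusManager {
 public:
  View* focused_view() const { return focused_view_; }
  void SetFocusedView(View* view) { focused_view_ = view; }

 private:
  View* focused_view_ = nullptr;
};

// Delivers key events straight to the focused view, with no tree traversal.
class RootView {
 public:
  explicit RootView(FocusManager* focus_manager)
      : focus_manager_(focus_manager) {
    assert(focus_manager_ != nullptr);
  }
  void OnEventFromSource(KeyEvent& event) {
    if (View* target = focus_manager_->focused_view()) {
      target->OnKeyEvent(event);
    }
  }

 private:
  FocusManager* focus_manager_;
};

class Widget {
 public:
  Widget() : root_view_(&focus_manager_) {}
  void OnKeyEvent(KeyEvent& event) { root_view_.OnEventFromSource(event); }
  FocusManager& focus_manager() { return focus_manager_; }

 private:
  FocusManager focus_manager_;  // Declared first so it outlives root_view_.
  RootView root_view_;
};
```

Until a view is focused the key event passes through unhandled. After `SetFocusedView()` the same call appends the character to that view's text.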
On Mac, the event will be funneled directly from the Window Abstraction layer to the focused view, i.e. Widget and RootView are not involved in event routing. This is achieved by having NativeWidgetMacNSWindowHost save a pointer to the focused view in a TextInputHost. NativeWidgetMac registers itself as an observer of focus change in FocusManager and updates NativeWidgetMacNSWindowHost’s TextInputHost on focus change.
In the case when the keystroke is an accelerator, for example, a Ctrl+T to open a new tab, the focused view is not the expected view to handle it.
On Aura, when the focus is not on the web content, accelerators are handled by a pre-handler that runs before the handler for character keys. This pre-handler is hooked on aura::Window in the Window Abstraction layer. If a key event is consumed as an accelerator, the character key handler path will be skipped.
In the accelerator path, the FocusManager relays the event from Window Abstraction to views::AcceleratorManager, and finally to chrome::BrowserCommandController.
Not all accelerators are handled by views::AcceleratorManager. A notable exception is the Tab key which is used to switch the focused view and will be handled directly by the FocusManager.
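A rough sketch of this accelerator path follows. The class names match the prose, but the methods, the string encoding of keys such as "Ctrl+T", and the `KeyOutcome` enum are assumptions made for illustration; the real views::AcceleratorManager API is different.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

enum class KeyOutcome { kFocusTraversal, kAccelerator, kNotAnAccelerator };

// Hypothetical registry mapping an accelerator to a browser command.
class AcceleratorManager {
 public:
  void Register(const std::string& accelerator,
                std::function<void()> command) {
    assert(command != nullptr);
    commands_[accelerator] = std::move(command);
  }

  // Runs the command bound to `accelerator`. Returns false if none exists.
  bool Process(const std::string& accelerator) const {
    auto it = commands_.find(accelerator);
    if (it == commands_.end()) return false;
    it->second();
    return true;
  }

 private:
  std::map<std::string, std::function<void()>> commands_;
};

class FocusManager {
 public:
  AcceleratorManager& accelerator_manager() { return accelerator_manager_; }
  int focus_index() const { return focus_index_; }

  // Tab is handled here directly. Other keys are offered to the
  // AcceleratorManager, and unclaimed keys fall through to the
  // character key path.
  KeyOutcome OnKeyEvent(const std::string& key) {
    if (key == "Tab") {
      ++focus_index_;  // Stand-in for moving focus to the next view.
      return KeyOutcome::kFocusTraversal;
    }
    if (accelerator_manager_.Process(key)) return KeyOutcome::kAccelerator;
    return KeyOutcome::kNotAnAccelerator;
  }

 private:
  AcceleratorManager accelerator_manager_;
  int focus_index_ = 0;
};
```

Registering a callback for "Ctrl+T" plays the role of chrome::BrowserCommandController in this model: the callback runs once per matching key event, and a plain "a" is reported as not an accelerator.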
If the focus is on the web content, the event will be handled by RenderWidgetHostViewEventHandler where the web content has the priority to receive most keys except for important navigation accelerators.
Mac interprets accelerators early in the event pipeline. Key Equivalents like Cmd+T will be handled early in the Window Abstraction layer and the event never goes into Views.
Views like textfield will grab focus on a click event.
After the Widget has received the event from the window abstraction, the RootView traverses the views tree from leaf to root and looks for the first view that accepts the click event. Here, the leaf view is the lowest descendant view that contains the cursor location of the event. The task of searching for this view is delegated to ViewTargeter.
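The leaf-to-root search can be sketched as a hit test followed by a walk up the parent chain. `View`, `Rect`, `FindLeafAt`, and `FindClickTarget` are hypothetical names, and the sketch assumes all bounds are expressed in root coordinates to avoid coordinate conversion.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Rect {
  int x, y, width, height;
  bool Contains(int px, int py) const {
    return px >= x && px < x + width && py >= y && py < y + height;
  }
};

// Hypothetical view node. Bounds are in root coordinates for brevity.
struct View {
  View(std::string view_name, Rect view_bounds, bool accepts)
      : name(std::move(view_name)),
        bounds(view_bounds),
        accepts_click(accepts) {}

  void AddChild(View* child) {
    assert(child != nullptr);
    child->parent = this;
    children.push_back(child);
  }

  std::string name;
  Rect bounds;
  bool accepts_click;
  View* parent = nullptr;
  std::vector<View*> children;
};

// Returns the lowest descendant that contains the point, or nullptr if
// `view` itself does not contain it.
View* FindLeafAt(View* view, int x, int y) {
  if (!view->bounds.Contains(x, y)) return nullptr;
  for (View* child : view->children) {
    if (View* hit = FindLeafAt(child, x, y)) return hit;
  }
  return view;
}

// Walks from the leaf toward the root and returns the first view that
// accepts the click, or nullptr if none does.
View* FindClickTarget(View* root, int x, int y) {
  for (View* view = FindLeafAt(root, x, y); view != nullptr;
       view = view->parent) {
    if (view->accepts_click) return view;
  }
  return nullptr;
}
```

A click on a label that does not accept clicks therefore lands on its nearest accepting ancestor, for example the toolbar that contains it.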
When the click event is dispatched to a Textfield, it will notify the FocusManager to update the focused view.
A focused control receives keyboard events. A focus change can be triggered by a mouse click or by pressing the Tab key to switch to the next focusable view. Chrome usually draws a FocusRing around the focused view. Focus rings are drawn separately from the view itself.
On all desktop platforms other than Mac, all the controls should be focusable by default. However, on Mac, by default, only TextField and List are focusable. Other controls such as Button, Combobox are not focusable. Full keyboard access in system settings needs to be enabled to have navigation of all controls on the screen.
To comply with Mac’s behavior, Chrome sets kDefaultFocusBehavior in the platform style to ACCESSIBLE_ONLY on Mac as opposed to ALWAYS on other platforms. Controls that are marked as ACCESSIBLE_ONLY will be skipped in search of the next view to focus if the Full keyboard access is off.
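The skipping rule can be sketched as a small traversal helper. The enum values follow the names used above, while `Control`, `IsFocusable`, and `NextFocusable` are hypothetical. The sketch assumes that Full keyboard access is the only condition that makes an ACCESSIBLE_ONLY control focusable.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class FocusBehavior { NEVER, ACCESSIBLE_ONLY, ALWAYS };

struct Control {
  std::string name;
  FocusBehavior behavior;
};

// Assumption: ACCESSIBLE_ONLY controls are focusable only when Full
// keyboard access is on.
bool IsFocusable(const Control& control, bool full_keyboard_access) {
  switch (control.behavior) {
    case FocusBehavior::ALWAYS:
      return true;
    case FocusBehavior::ACCESSIBLE_ONLY:
      return full_keyboard_access;
    case FocusBehavior::NEVER:
      return false;
  }
  return false;
}

// Returns the index of the next focusable control after `current`,
// wrapping around, or -1 if no control is focusable. Pass -1 as
// `current` when nothing is focused yet.
int NextFocusable(const std::vector<Control>& controls, int current,
                  bool full_keyboard_access) {
  const int count = static_cast<int>(controls.size());
  assert(current >= -1 && current < count);
  for (int step = 1; step <= count; ++step) {
    const int candidate = (current + step) % count;
    if (IsFocusable(controls[candidate], full_keyboard_access)) {
      return candidate;
    }
  }
  return -1;
}
```

With a textfield, a button, a combobox, and a list in that order, pressing Tab from the textfield jumps to the list when Full keyboard access is off and to the button when it is on.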
Windows and Linux handle focus at the window level in similar ways. They both have a ‘top-level’ window concept. A top-level window is a window that has no parent. Chrome uses only top-level windows and the focus changing event is only triggered between top-level windows.
Windows
Chrome observes WM_ACTIVATE messages to monitor the active window change [ref].
The event is then routed up through the NativeWidget (DesktopNativeWidgetAura) and eventually to Widget. Chrome then uses its own focus management to further route those focus events to the view which is supposed to have the actual keyboard focus.
Linux
Chrome on Linux observes the platform native focus change events FocusIn and FocusOut [ref] to respond to focus change events. Nevertheless, eventually just like on Windows, the events will be interpreted as active window change.
Mac
On Mac, the ‘key window’ is the window that receives keyboard events. Chrome observes the key status change by implementing windowDidBecomeKey on an NSWindow and handles it accordingly [ref], which is similar to Windows’ active window change.
Focus handling in views is almost identical across platforms. In views, a widget owns a FocusManager, which manages the focus for all the widgets in this tree. FocusManager handles proper routing of keyboard events, including the handling of keyboard accelerators. The FocusManager also handles focus traversal among child widgets in addition to between Views.