A computer system with a user agent at the front-end and GPU at the back-end has components working on different timelines in parallel:

Content timeline
Associated with the execution of the Web script. It includes calling all methods described by this specification.
Steps executed on the content timeline look like this.

Device timeline
Associated with the GPU device operations that are issued by the user agent. It includes creation of adapters, devices, and GPU resources and state objects, which are typically synchronous operations from the point of view of the user agent part that controls the GPU, but can live in a separate OS process.
Steps executed on the device timeline look like this.

Queue timeline
Associated with the execution of operations on the compute units of the GPU. It includes actual draw, copy, and compute jobs that run on the GPU.
Steps executed on the queue timeline look like this.





In this specification, asynchronous operations are used when the result value depends on work that happens on any timeline other than the Content timeline. They are represented by callbacks and promises in JavaScript.
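The relationship between the timelines can be sketched with a small model. The following TypeScript is an illustration only: the `Timeline` class, the `mapBufferModel` function, and the log strings are hypothetical names that do not appear in this specification, and a real user agent schedules this work across threads or processes rather than with in-memory queues. The sketch shows why a result that depends on the Device timeline or Queue timeline cannot be available at the moment the call returns on the Content timeline.

```typescript
// Hypothetical model: each timeline is a FIFO queue of tasks.
type Task = () => void;

class Timeline {
  private tasks: Task[] = [];

  enqueue(task: Task): void {
    this.tasks.push(task);
  }

  // Runs every pending task in FIFO order, including tasks added while draining.
  drain(): void {
    let task = this.tasks.shift();
    while (task !== undefined) {
      task();
      task = this.tasks.shift();
    }
  }
}

const contentTimeline = new Timeline();
const deviceTimeline = new Timeline();
const queueTimeline = new Timeline();
const log: string[] = [];
const state = { mapped: false };

// Models an asynchronous request whose result depends on the other timelines.
function mapBufferModel(onMapped: () => void): void {
  log.push("content: map requested");
  deviceTimeline.enqueue(() => {
    log.push("device: buffer in use, check back later");
    queueTimeline.enqueue(() => {
      log.push("queue: GPU finished using the buffer");
      deviceTimeline.enqueue(() => {
        log.push("device: buffer mapped");
        contentTimeline.enqueue(() => {
          log.push("content: result delivered");
          onMapped();
        });
      });
    });
  });
}

mapBufferModel(() => {
  state.mapped = true;
});
// The call has returned, but no other timeline has run yet.
const mappedRightAfterCall = state.mapped;

// Simulate the other timelines making progress, then the content timeline.
deviceTimeline.drain();
queueTimeline.drain();
deviceTimeline.drain();
contentTimeline.drain();
```

In this model the callback plays the role that a promise plays in the API: the content timeline learns the outcome only after the device and queue timelines have done their part.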



  1. User encodes a dispatch command by calling a method of the GPUComputePassEncoder which happens on the Content timeline.
  2. User issues GPUQueue.submit() that hands over the GPUCommandBuffer to the user agent, which processes it on the Device timeline by calling the OS driver to do a low-level submission.
  3. The submit gets dispatched by the GPU invocation scheduler onto the actual compute units for execution, which happens on the Queue timeline.


  1. User fills out a GPUBufferDescriptor and creates a GPUBuffer with it, which happens on the Content timeline.
  2. User agent creates a low-level buffer on the Device timeline.


  1. User requests to map a GPUBuffer on the Content timeline and gets a promise in return.
  2. User agent checks if the buffer is currently used by the GPU and makes a reminder to itself to check back when this usage is over.
  3. After the GPU operating on Queue timeline is done using the buffer, the user agent maps it to memory and resolves the promise.

  • The Y-axis points up in normalized device coordinates (NDC): the point (-1.0, -1.0) is located at the bottom-left corner of NDC. In addition, x and y in NDC should be between -1.0 and 1.0 inclusive, while z in NDC should be between 0.0 and 1.0 inclusive. Vertices outside this range in NDC do not introduce any errors, but they are clipped.

  • The Y-axis points down in framebuffer coordinates, viewport coordinates, and fragment/pixel coordinates: the origin (0, 0) is located at the top-left corner in these coordinate systems.

  • Window/present coordinates match framebuffer coordinates.

  • The UV origin (0, 0) in texture coordinates represents the first texel (the lowest byte) in texture memory.


Note: WebGPU’s coordinate systems match DirectX and Metal’s coordinate systems in a graphics pipeline.
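The difference in Y direction between NDC and framebuffer coordinates can be made concrete with a short sketch. The function names `ndcToFramebuffer` and `isInsideNdcRange` are hypothetical, and the conversion assumes a viewport that starts at (0, 0) and covers the whole framebuffer; other viewports add an offset and use the viewport size instead.

```typescript
interface Point2D {
  x: number;
  y: number;
}

// Converts NDC x/y to framebuffer coordinates.
// Assumption: the viewport starts at (0, 0) and spans width x height.
function ndcToFramebuffer(ndc: Point2D, width: number, height: number): Point2D {
  return {
    x: (ndc.x + 1) * 0.5 * width,
    // Y is flipped: NDC +1 (top) maps to framebuffer 0 (top row).
    y: (1 - ndc.y) * 0.5 * height,
  };
}

// True when a position lies inside the NDC range and therefore is not clipped.
function isInsideNdcRange(x: number, y: number, z: number): boolean {
  return x >= -1 && x <= 1 && y >= -1 && y <= 1 && z >= 0 && z <= 1;
}

const bottomLeft = ndcToFramebuffer({ x: -1, y: -1 }, 800, 600);
const topLeft = ndcToFramebuffer({ x: -1, y: 1 }, 800, 600);
const center = ndcToFramebuffer({ x: 0, y: 0 }, 800, 600);
```

With an 800 by 600 framebuffer, the NDC bottom-left corner lands at (0, 600), the NDC top-left corner lands at the framebuffer origin (0, 0), and the NDC center lands at (400, 300).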


An internal object may be invalid. If an object is successfully created, it is valid at that moment. It may become invalid during its lifetime, but once invalid it will never become valid again.


Invalid objects result from a number of situations, including:
  • If there is an error in the creation of an object, it is immediately invalid. This can happen, for example, if the object descriptor doesn’t describe a valid object, or if there is not enough memory to allocate a resource.
  • If an object is explicitly destroyed (e.g. GPUBuffer.destroy()), it becomes invalid.
  • If the device that owns an object is lost, the object becomes invalid.


To determine if a given GPUObjectBase object is valid to use with a targetObject, run the following steps:
  1. If any of the following conditions are unsatisfied, return false:
       • object is valid.
       • object.[[device]] is valid.
       • object.[[device]] is targetObject.[[device]].
  2. Return true.
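The steps above can be sketched as a small model. The interfaces `DeviceModel` and `ObjectModel` and the function `isValidToUseWith` are hypothetical stand-ins for internal objects and their [[device]] slot; they are not part of the API and only illustrate the order of the checks.

```typescript
// Hypothetical stand-ins for internal objects; not part of the WebGPU API.
interface DeviceModel {
  valid: boolean;
}

interface ObjectModel {
  valid: boolean;
  device: DeviceModel; // models the [[device]] internal slot
}

function isValidToUseWith(object: ObjectModel, targetObject: ObjectModel): boolean {
  if (!object.valid) return false; // the object itself must be valid
  if (!object.device.valid) return false; // its owning device must be valid
  if (object.device !== targetObject.device) return false; // same owning device
  return true;
}

const deviceA: DeviceModel = { valid: true };
const deviceB: DeviceModel = { valid: true };
const bufferOnA: ObjectModel = { valid: true, device: deviceA };
const encoderOnA: ObjectModel = { valid: true, device: deviceA };
const textureOnB: ObjectModel = { valid: true, device: deviceB };

const sameDevice = isValidToUseWith(bufferOnA, encoderOnA);
const otherDevice = isValidToUseWith(textureOnB, encoderOnA);

// Destroying an object makes it invalid; it never becomes valid again.
bufferOnA.valid = false;
const afterDestroy = isValidToUseWith(bufferOnA, encoderOnA);
```

Here `sameDevice` is true, while `otherDevice` and `afterDestroy` are false.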


3.1.1. Dot Syntax

In this specification, the . (“dot”) syntax, common in programming languages, is used. The phrasing “Foo.Bar” means “the Bar member of the value (or interface) Foo.”


The ?. (“optional chaining”) syntax, adopted from JavaScript, is also used. The phrasing “Foo?.Bar” means “if Foo is null or undefined, Foo; otherwise, Foo.Bar”.


For example, where buffer is a GPUBuffer, buffer?.[[device]].[[adapter]] means “if buffer is null or undefined, then undefined; otherwise, the [[adapter]] internal slot of the [[device]] internal slot of buffer”.
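Because the operator is adopted from JavaScript, the example can be mirrored with plain objects. This is a sketch only: internal slots are not accessible from script, so `BufferModel`, `DeviceSlotModel`, `AdapterModel`, and `adapterOf` are hypothetical stand-ins in which ordinary properties play the role of the [[device]] and [[adapter]] slots.

```typescript
// Hypothetical stand-ins; real internal slots are not reachable from script.
interface AdapterModel {
  description: string;
}

interface DeviceSlotModel {
  adapter: AdapterModel; // models [[adapter]]
}

interface BufferModel {
  device: DeviceSlotModel; // models [[device]]
}

// Mirrors buffer?.[[device]].[[adapter]]: the chain short-circuits to
// undefined when buffer is null or undefined.
function adapterOf(buffer: BufferModel | null | undefined): AdapterModel | undefined {
  return buffer?.device.adapter;
}

const exampleBuffer: BufferModel = {
  device: { adapter: { description: "example adapter" } },
};

const fromBuffer = adapterOf(exampleBuffer);
const fromNull = adapterOf(null);
const fromUndefined = adapterOf(undefined);
```

In this sketch `fromBuffer` is the adapter object, and both `fromNull` and `fromUndefined` are undefined.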


3.1.2. Internal Objects

An internal object is a conceptual, non-exposed WebGPU object. Internal objects track the state of an API object and hold any underlying implementation. If the state of a particular internal object can change in parallel from multiple agents, those changes are always atomic with respect to all agents.


Note: An “agent” refers to a JavaScript “thread” (i.e. main thread, or Web Worker).


3.1.3. WebGPU Interfaces

A WebGPU interface is an exposed interface which encapsulates an internal object. It provides the interface through which the internal object’s state is changed.


As a matter of convention, if a WebGPU interface is referred to as invalid, it means that the internal object it encapsulates is invalid.


Any interface which includes GPUObjectBase is a WebGPU interface.


interface mixin GPUObjectBase {
    attribute USVString? label;
};

GPUObjectBase has the following attributes:


label, of type USVString, nullable
A label which can be used by development tools (such as error/warning messages, browser developer tools, or platform debugging utilities) to identify the underlying internal object to the developer. It has no specified format, and therefore cannot be reliably machine-parsed.

In any given situation, the user agent may or may not choose to use this label.



GPUObjectBase has the following internal slots:

[[device]], of type device, readonly
An internal slot holding the device which owns the internal object.


3.1.4. Object Descriptors

An object descriptor holds the information needed to create an object, which is typically done via one of the create* methods of GPUDevice.


dictionary GPUObjectDescriptorBase {
    USVString label;
};

GPUObjectDescriptorBase has the following members:

label, of type USVString
The initial value of GPUObjectBase.label.



2.8.1. Machine-specific limits

WebGPU can expose a lot of detail on the underlying GPU architecture and the device geometry. This includes available physical adapters, many limits on the GPU and CPU resources that could be used (such as the maximum texture size), and any optional hardware-specific capabilities that are available.


User agents are not obligated to expose the real hardware limits; they are in full control of how much of the machine specifics is exposed. One strategy to reduce fingerprinting is binning all the target platforms into a small number of bins. In general, the privacy impact of exposing the hardware limits matches that of WebGL.


The default limits are also deliberately high enough to allow most applications to work without requesting higher limits. All usage of the API is validated according to the requested limits, so the actual hardware capabilities are not exposed to users by accident.
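The binning strategy mentioned in this section could look roughly like the following sketch. The function `binLimit` and the bin values are hypothetical; this specification does not prescribe how user agents choose bins, and the numbers are illustrative only.

```typescript
// Returns the largest bin that does not exceed the real hardware value,
// or undefined when the hardware is below every bin.
// Assumption: bins are chosen by the user agent; the values below are made up.
function binLimit(actual: number, bins: readonly number[]): number | undefined {
  let chosen: number | undefined = undefined;
  for (const bin of bins) {
    if (bin <= actual && (chosen === undefined || bin > chosen)) {
      chosen = bin;
    }
  }
  return chosen;
}

// Hypothetical bins for a texture-size limit.
const textureSizeBins = [8192, 16384];

const reportedForMidRange = binLimit(12000, textureSizeBins);
const reportedForHighEnd = binLimit(16384, textureSizeBins);
const reportedForLowEnd = binLimit(4096, textureSizeBins);
```

A device whose real limit is 12000 reports 8192, so it is indistinguishable from every other device in the same bin. A device below the lowest bin yields undefined here, which a caller would treat as "cannot meet the minimum".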


2.8.2. Machine-specific artifacts

There are some machine-specific rasterization/precision artifacts and performance differences that can be observed roughly in the same way as in WebGL. This applies to rasterization coverage and patterns, interpolation precision of the varyings between shader stages, compute unit scheduling, and more aspects of execution.


Generally, rasterization and precision fingerprints are identical across most or all of the devices of each vendor. Performance differences are relatively intractable, but also relatively low-signal (as with JS execution performance).


Privacy-critical applications and user agents should utilize software implementations to eliminate such artifacts.


2.8.3. Machine-specific performance

Another factor for differentiating users is measuring the performance of specific operations on the GPU. Even with low-precision timing, repeated execution of an operation can show whether the user’s machine is fast at specific workloads. This is a fairly common vector (present in both WebGL and JavaScript), but it is also low-signal and relatively intractable to truly normalize.


WebGPU compute pipelines expose access to the GPU unobstructed by the fixed-function hardware. This poses an additional risk of unique device fingerprinting. User agents can take steps to dissociate logical GPU invocations from actual compute units to reduce this risk.


2.8.4. User Agent State

This specification doesn’t define any additional user-agent state for an origin. However, it is expected that user agents will have compilation caches for the results of expensive compilation, such as GPUShaderModule, GPURenderPipeline, and GPUComputePipeline. These caches are important to improve the loading time of WebGPU applications after the first visit.


For the specification, these caches are indistinguishable from incredibly fast compilation, but for applications it would be easy to measure how long createComputePipelineAsync() takes to resolve. This can leak information across origins (like “did the user access a site with this specific shader”), so user agents should follow the best practices in storage partitioning.


The system’s GPU driver may also have its own cache of compiled shaders and pipelines. User agents may want to disable these when at all possible, or add per-partition data to shaders in ways that will make the GPU driver consider them different.


WebGPU provides access to constrained global resources shared between different programs (and web pages) running on the same machine. An application can try to indirectly probe how constrained these global resources are, in order to reason about workloads performed by other open web pages, based on the patterns of usage of these shared resources. These issues are generally analogous to issues with JavaScript, such as system memory and CPU execution throughput. WebGPU does not provide any additional mitigations for this.


2.7.1. Memory resources

WebGPU exposes fallible allocations from machine-global memory heaps, such as VRAM. This allows for probing the size of the system’s remaining available memory (for a given heap type) by attempting to allocate and watching for allocation failures.


GPUs internally have one or more (typically only two) heaps of memory shared by all running applications. When a heap is depleted, WebGPU would fail to create a resource. This is observable, which may allow a malicious application to guess what heaps are used by other applications, and how much they allocate from them.


2.7.2. Computation resources

If one site uses WebGPU at the same time as another, it may observe the increase in time it takes to process some work. For example, if a site constantly submits compute workloads and tracks completion of work on the queue, it may observe that something else also started using the GPU.


A GPU has many parts that can be tested independently, such as the arithmetic units, texture sampling units, atomic units, etc. A malicious application may sense when some of these units are stressed, and attempt to guess the workload of another application by analyzing the stress patterns. This is analogous to the realities of CPU execution of JavaScript.


WebGPU applications have access to GPU memory and compute units. A WebGPU implementation may limit the GPU memory available to an application, in order to keep other applications responsive. For GPU processing time, a WebGPU implementation may set up a “watchdog” timer that makes sure an application doesn’t cause GPU unresponsiveness for more than a few seconds. These measures are similar to those used in WebGL.


When uploading floating-point data from CPU to GPU, or generating it on the GPU, we may end up with a binary representation that doesn’t correspond to a valid number, such as infinity or NaN (not-a-number). The GPU behavior in this case is subject to the accuracy of the GPU hardware implementation of the IEEE-754 standard. WebGPU guarantees that introducing invalid floating-point numbers only affects the results of arithmetic computations and has no other side effects.
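As an illustration of how raw bytes can decode to such values, the sketch below reinterprets 32-bit patterns as IEEE-754 single-precision floats on the CPU. The helper name `float32FromBits` is hypothetical; the bit patterns themselves are defined by IEEE-754. The sketch says nothing about how a particular GPU treats these values.

```typescript
// Reinterprets a 32-bit unsigned integer as an IEEE-754 single-precision float.
function float32FromBits(bits: number): number {
  const view = new DataView(new ArrayBuffer(4));
  view.setUint32(0, bits); // big-endian by default
  return view.getFloat32(0); // read back with the same byte order
}

const one = float32FromBits(0x3f800000); // an ordinary number: 1.0
const positiveInfinity = float32FromBits(0x7f800000); // exponent all ones, mantissa zero
const notANumber = float32FromBits(0x7fc00000); // exponent all ones, mantissa non-zero

// NaN propagates through arithmetic, which is the kind of effect that stays
// confined to the results of computations.
const propagated = notANumber + 1;
```

Any buffer contents are therefore acceptable input from the point of view of the API; the values above only influence what arithmetic in the shader produces.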


2.5.1. Driver bugs

GPU drivers are subject to bugs like any other software. If a bug occurs, an attacker could possibly exploit the incorrect behavior of the driver to get access to unprivileged data. In order to reduce the risk, the WebGPU working group will coordinate with GPU vendors to integrate the WebGPU Conformance Test Suite (CTS) as part of their driver testing process, as was done for WebGL. WebGPU implementations are expected to have workarounds for some of the discovered bugs, and to disable WebGPU on drivers with known bugs that can’t be worked around.


2.5.2. Timing attacks

WebGPU is designed for multi-threaded use via Web Workers. As such, it is designed not to open the users to modern high-precision timing attacks. Some of the objects, like GPUBuffer or GPUQueue, have shared state which can be simultaneously accessed. This allows race conditions to occur, similar to those of accessing a SharedArrayBuffer from multiple Web Workers, which makes the thread scheduling observable.


WebGPU addresses this by limiting the ability to deserialize (or share) objects to the agents inside the agent cluster, and only if the cross-origin isolated policies are in place. This restriction matches the mitigations against malicious SharedArrayBuffer use. Similarly, the user agent may also serialize the agents sharing any handles to prevent concurrency entirely.


In the end, the attack surface for races on shared state in WebGPU will be a small subset of the SharedArrayBuffer attacks.


WebGPU also specifies the “timestamp-query” feature, which provides high precision timing of GPU operations. The feature is optional, and a WebGPU implementation may limit its exposure only to those scenarios that are trusted. Alternatively, the timing query results could be processed by a compute shader and aligned to a lower precision.
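Aligning timing results to a lower precision amounts to rounding each value down to a coarser granularity. The sketch below does this on the CPU for readability; the paragraph above describes doing the equivalent in a compute shader. The function name `quantizeTimestamp` and the 100-microsecond granularity are assumptions chosen for illustration, not values required by this specification.

```typescript
// Rounds a timestamp (in nanoseconds) down to a multiple of the granularity.
// Assumption: values fit in a JavaScript number; real query results are
// 64-bit and would need BigInt for very large values.
function quantizeTimestamp(timestampNs: number, granularityNs: number): number {
  if (granularityNs <= 0) {
    throw new RangeError("granularity must be positive");
  }
  return Math.floor(timestampNs / granularityNs) * granularityNs;
}

// Hypothetical granularity of 100 microseconds, expressed in nanoseconds.
const GRANULARITY_NS = 100_000;

const coarseStart = quantizeTimestamp(1_234_567, GRANULARITY_NS);
const coarseEnd = quantizeTimestamp(1_299_999, GRANULARITY_NS);
const coarseDuration = coarseEnd - coarseStart;
```

Both timestamps above fall into the same 100-microsecond bucket, so the observable duration is zero even though the raw values differ by about 65 microseconds.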


2.5.3. Row hammer attacks

Row hammer is a class of attacks that exploit the leaking of states in DRAM cells. It could be used on a GPU. WebGPU does not have any specific mitigations in place, and relies on platform-level solutions, such as reduced memory refresh intervals.


Shaders can access physical resources either directly (for example, as a “uniform” GPUBufferBinding), or via texture units, which are fixed-function hardware blocks that handle texture coordinate conversions. Validation on the API side can only guarantee that all the inputs to the shader are provided and that they have the correct usage and types. The host API side cannot guarantee that the data is accessed within bounds if the texture units are not involved.


In order to prevent the shaders from accessing GPU memory an application doesn’t own, the WebGPU implementation may enable a special mode (called “robust buffer access”) in the driver that guarantees that the access is limited to buffer bounds.


Alternatively, an implementation may transform the shader code by inserting manual bounds checks. When this path is taken, the out-of-bound checks only apply to array indexing. They aren’t needed for plain field access of shader structures due to the minBindingSize validation on the host side.


If the shader attempts to load data outside of physical resource bounds, the implementation is allowed to:
  1. return a value at a different location within the resource bounds
  2. return a value vector of “(0, 0, 0, X)” with any “X”
  3. partially discard the draw or dispatch call


If the shader attempts to write data outside of physical resource bounds, the implementation is allowed to:
  1. write the value to a different location within the resource bounds
  2. discard the write operation
  3. partially discard the draw or dispatch call
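Two of the permitted behaviors can be sketched on the CPU as follows. This is a model, not an implementation: `robustLoad` picks the "different location within the resource bounds" option by clamping the index, and `robustStore` picks the "discard the write operation" option. Both function names are hypothetical, indices are assumed to be integers, and an actual implementation may choose any of the allowed behaviors.

```typescript
// Out-of-bounds load: clamp the index so the access stays inside the resource.
function robustLoad(data: Float32Array, index: number): number {
  if (data.length === 0) {
    return 0; // nothing to read; model this as zero
  }
  const clamped = Math.min(Math.max(index, 0), data.length - 1);
  return data[clamped];
}

// Out-of-bounds store: discard the write. Returns whether the write happened.
function robustStore(data: Float32Array, index: number, value: number): boolean {
  if (index < 0 || index >= data.length) {
    return false;
  }
  data[index] = value;
  return true;
}

const resource = new Float32Array([1, 2, 3]);
const loadedPastEnd = robustLoad(resource, 10); // reads the last element
const loadedBeforeStart = robustLoad(resource, -5); // reads the first element
const storedPastEnd = robustStore(resource, 7, 9); // discarded
```

In neither case does the access touch memory outside `resource`, which is the property that matters for security.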


Generally, allocating new memory may expose the leftover data of other applications running on the system. In order to address that, WebGPU conceptually initializes all the resources to zero, although in practice an implementation may skip this step if it sees the developer initializing the contents manually. This includes variables and shared workgroup memory inside shaders.


The precise mechanism of clearing the workgroup memory can differ between platforms. If the native API does not provide facilities to clear it, the WebGPU implementation transforms the compute shader to first do a clear across all invocations, synchronize them, and then continue executing the developer’s code.
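The shape of that transformation can be modeled sequentially. The sketch below is an assumption-laden illustration: `runWithClearedWorkgroupMemory` is a hypothetical name, invocations run one after another instead of in parallel, and the end of the first loop stands in for the synchronization barrier that a real shader would need.

```typescript
type InvocationBody = (invocationId: number, shared: Uint32Array) => void;

function runWithClearedWorkgroupMemory(
  shared: Uint32Array,
  invocationCount: number,
  body: InvocationBody,
): void {
  // Phase 1: every invocation clears a strided slice of the shared memory.
  for (let id = 0; id < invocationCount; id++) {
    for (let i = id; i < shared.length; i += invocationCount) {
      shared[i] = 0;
    }
  }
  // Phase 2: barrier. In this sequential model, finishing the loop above
  // guarantees that all clears are complete before any developer code runs.

  // Phase 3: the developer's code runs for every invocation.
  for (let id = 0; id < invocationCount; id++) {
    body(id, shared);
  }
}

// Shared memory starts with leftover data from a previous user.
const workgroupMemory = new Uint32Array(10).fill(0xdeadbeef);

runWithClearedWorkgroupMemory(workgroupMemory, 4, (id, shared) => {
  shared[id] += 1; // observes zero, never the leftover data
});
```

After the call, the first four elements hold 1 and the remaining six hold 0, so the leftover contents are never observable by the developer's code.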
