Support for 16-bit floating-point values in WGSL
In WGSL, the f16
type is the set of 16-bit floating-point values of the IEEE-754 binary16 (half precision) format. It means that it uses 16 bits to represent a floating-point number, as opposed to 32 bits for conventional single-precision floating-point (f32
). This smaller size can lead to significant performance improvements, especially when processing large amounts of data.
For comparison, on an Apple M1 Pro device, the f16
implementation of Llama2 7B models used in the WebLLM chat demo is significantly faster than the f32
implementation, with a 28% improvement in prefill speed and a 41% improvement in decoding speed as shown in the following screenshots.
Not all GPUs support 16-bit floating-point values. When the "shader-f16"
feature is available in a GPUAdapter
, you can now request a GPUDevice
with this feature and create a WGSL shader module that takes advantage of the half-precision floating-point type f16
. This type is valid to use in the WGSL shader module only if you enable the f16
WGSL extension with enable f16;
. Otherwise, createShaderModule() will generate a validation error. See the following minimal example and issue dawn:1510.
const adapter = await navigator.gpu.requestAdapter();
if (!adapter.features.has("shader-f16")) {
throw new Error("16-bit floating-point value support is not available");
}
// Explicitly request 16-bit floating-point value support.
const device = await adapter.requestDevice({
requiredFeatures: ["shader-f16"],
});
const code = `
enable f16;
@compute @workgroup_size(1)
fn main() {
const c : vec3h = vec3<f16>(1.0h, 2.0h, 3.0h);
}
`;
const shaderModule = device.createShaderModule({ code });
// Create a compute pipeline with this shader module
// and run the shader on the GPU...
It's possible to support both f16
and f32
types in the WGSL shader module code with an alias
depending on the "shader-f16"
feature support as shown in the following snippet.
const adapter = await navigator.gpu.requestAdapter();
const hasShaderF16 = adapter.features.has("shader-f16");
const device = await adapter.requestDevice({
requiredFeatures: hasShaderF16 ? ["shader-f16"] : [],
});
const header = hasShaderF16
? `enable f16;
alias min16float = f16;`
: `alias min16float = f32;`;
const code = `
${header}
@compute @workgroup_size(1)
fn main() {
const c = vec3<min16float>(1.0, 2.0, 3.0);
}
`;
Push the limits
The maximum number of bytes necessary to hold one sample (pixel or subpixel) of render pipeline output data, across all color attachments, is 32 bytes by default. It is now possible to request up to 64 by using the maxColorAttachmentBytesPerSample
limit. See the following example and issue dawn:2036.
const adapter = await navigator.gpu.requestAdapter();
if (adapter.limits.maxColorAttachmentBytesPerSample < 64) {
// When the desired limit isn't supported, take action to either fall back to
// a code path that does not require the higher limit or notify the user that
// their device does not meet minimum requirements.
}
// Request highest limit of max color attachments bytes per sample.
const device = await adapter.requestDevice({
requiredLimits: { maxColorAttachmentBytesPerSample: 64 },
});
The maxInterStageShaderVariables
and maxInterStageShaderComponents
limits used for inter-stage communication have been increased on all platforms. See issue dawn:1448 for details.
For each shader stage, the maximum number of bind group layout entries across a pipeline layout which are storage buffers is 8 by default. It is now possible to request up to 10 by using the maxStorageBuffersPerShaderStage
limit. See issue dawn:2159.
A new maxBindGroupsPlusVertexBuffers
limit has been added. It consists of the maximum number of bind group and vertex buffer slots used simultaneously, counting any empty slots below the highest index. Its default value is 24. See issue dawn:1849.
Changes to depth-stencil state
To improve the developer experience, the depth-stencil state depthWriteEnabled
and depthCompare
attributes are not always required anymore: depthWriteEnabled
is required only for formats with depth, and depthCompare
is not required for formats with depth if not used at all. See issue dawn:2132.
Adapter information updates
Non-standard type
and backend
adapter info attributes are now available upon calling requestAdapterInfo() when the user has enabled the "WebGPU Developer Features" flag at chrome://flags/#enable-webgpu-developer-features
. The type
can be "discrete GPU", "integrated GPU", "CPU", or "unknown". The backend
is either "WebGPU", "D3D11", "D3D12", "metal", "vulkan", "openGL", "openGLES", or "null". See issue dawn:2112 and issue dawn:2107.
The optional unmaskHints
list parameter in requestAdapterInfo() has been removed. See issue dawn:1427.
Timestamp queries quantization
Timestamp queries allow applications to measure the execution time of GPU commands with nanosecond precision. However, the WebGPU specification makes timestamp queries optional due to timing attack concerns. The Chrome team believes that quantizing timestamp queries provides a good compromise between precision and security, by reducing the resolution to 100 microseconds. See issue dawn:1800.
In Chrome, users can disable timestamp quantization by enabling the "WebGPU Developer Features" flag at chrome://flags/#enable-webgpu-developer-features
. Note that this flag alone does not enable the "timestamp-query"
feature. Its implementation is still experimental and therefore requires the "Unsafe WebGPU Support" flag at chrome://flags/#enable-unsafe-webgpu
.
In Dawn, a new device toggle called "timestamp_quantization" has been added and is enabled by default. The following snippet shows you how to allow the experimental "timestamp-query" feature with no timestamp quantization when requesting a device.
wgpu::DawnTogglesDescriptor deviceTogglesDesc = {};
const char* allowUnsafeApisToggle = "allow_unsafe_apis";
deviceTogglesDesc.enabledToggles = &allowUnsafeApisToggle;
deviceTogglesDesc.enabledToggleCount = 1;
const char* timestampQuantizationToggle = "timestamp_quantization";
deviceTogglesDesc.disabledToggles = ×tampQuantizationToggle;
deviceTogglesDesc.disabledToggleCount = 1;
wgpu::DeviceDescriptor desc = {.nextInChain = &deviceTogglesDesc};
// Request a device with no timestamp quantization.
myAdapter.RequestDevice(&desc, myCallback, myUserData);
Spring-cleaning features
The experimental "timestamp-query-inside-passes" feature has been renamed to "chromium-experimental-timestamp-query-inside-passes" to make it clear to developers that this feature is experimental and available only in Chromium-based browsers for now. See issue dawn:1193.
The experimental "pipeline-statistics-query" feature, which was only partially implemented, has been removed because it is no longer being developed. See issue chromium:1177506.
This covers only some of the key highlights. Check out the exhaustive list of commits.
What's New in WebGPU
A list of everything that has been covered in the What's New in WebGPU series.
Chrome 131
- Clip distances in WGSL
- GPUCanvasContext getConfiguration()
- Point and line primitives must not have depth bias
- Inclusive scan built-in functions for subgroups
- Experimental support for multi-draw indirect
- Shader module compilation option strict math
- Remove GPUAdapter requestAdapterInfo()
- Dawn updates
Chrome 130
- Dual source blending
- Shader compilation time improvements on Metal
- Deprecation of GPUAdapter requestAdapterInfo()
- Dawn updates
Chrome 129
Chrome 128
- Experimenting with subgroups
- Deprecate setting depth bias for lines and points
- Hide uncaptured error DevTools warning if preventDefault
- WGSL interpolate sampling first and either
- Dawn updates
Chrome 127
- Experimental support for OpenGL ES on Android
- GPUAdapter info attribute
- WebAssembly interop improvements
- Improved command encoder errors
- Dawn updates
Chrome 126
- Increase maxTextureArrayLayers limit
- Buffer upload optimization for Vulkan backend
- Shader compilation time improvements
- Submitted command buffers must be unique
- Dawn updates
Chrome 125
Chrome 124
- Read-only and read-write storage textures
- Service workers and shared workers support
- New adapter information attributes
- Bug fixes
- Dawn updates
Chrome 123
- DP4a built-in functions support in WGSL
- Unrestricted pointer parameters in WGSL
- Syntax sugar for dereferencing composites in WGSL
- Separate read-only state for stencil and depth aspects
- Dawn updates
Chrome 122
- Expand reach with compatibility mode (feature in development)
- Increase maxVertexAttributes limit
- Dawn updates
Chrome 121
- Support WebGPU on Android
- Use DXC instead of FXC for shader compilation on Windows
- Timestamp queries in compute and render passes
- Default entry points to shader modules
- Support display-p3 as GPUExternalTexture color space
- Memory heaps info
- Dawn updates
Chrome 120
- Support for 16-bit floating-point values in WGSL
- Push the limits
- Changes to depth-stencil state
- Adapter information updates
- Timestamp queries quantization
- Spring-cleaning features
Chrome 119
- Filterable 32-bit float textures
- unorm10-10-10-2 vertex format
- rgb10a2uint texture format
- Dawn updates
Chrome 118
- HTMLImageElement and ImageData support in
copyExternalImageToTexture()
- Experimental support for read-write and read-only storage texture
- Dawn updates
Chrome 117
- Unset vertex buffer
- Unset bind group
- Silence errors from async pipeline creation when device is lost
- SPIR-V shader module creation updates
- Improving developer experience
- Caching pipelines with automatically generated layout
- Dawn updates
Chrome 116
- WebCodecs integration
- Lost device returned by GPUAdapter
requestDevice()
- Keep video playback smooth if
importExternalTexture()
is called - Spec conformance
- Improving developer experience
- Dawn updates
Chrome 115
- Supported WGSL language extensions
- Experimental support for Direct3D 11
- Get discrete GPU by default on AC power
- Improving developer experience
- Dawn updates
Chrome 114
- Optimize JavaScript
- getCurrentTexture() on unconfigured canvas throws InvalidStateError
- WGSL updates
- Dawn updates