Published: February 26, 2025
Improve machine-learning workloads with subgroups
After a year of development and trials, the subgroups WebGPU feature enabling SIMD-level parallelism is now available. It allows threads in a workgroup to communicate and execute collective math operations, such as calculating a sum of numbers, and offers an efficient method for cross-thread data sharing. See the original proposal and chromestatus entry.
For reference, Google Meet saw 2.3-2.9 times speed increases when benchmarking subgroups against packed integer dot products for matrix-vector multiply shaders on some devices during the origin trial.
When the "subgroups" feature is available in a GPUAdapter, request a GPUDevice with this feature to get subgroups support in WGSL. It's helpful to check the subgroupMinSize and subgroupMaxSize adapter info values, for example if you have a hardcoded algorithm that requires a subgroup of a certain size.
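The feature check and size check described above could be sketched as follows. The interfaces and both helper names are hypothetical stand-ins introduced for illustration (they mirror only the parts of GPUAdapter used here so the snippet compiles without WebGPU type definitions). Treat it as a minimal sketch, not the only way to structure the check.

```typescript
// Minimal structural stand-ins for the parts of GPUAdapter used here, so the
// sketch compiles without @webgpu/types. These interfaces are assumptions.
interface AdapterInfoLike {
  subgroupMinSize?: number;
  subgroupMaxSize?: number;
}
interface AdapterLike {
  features: { has(name: string): boolean };
  info: AdapterInfoLike;
}

// Hypothetical helper: builds the descriptor to pass to requestDevice(),
// or throws when the adapter does not expose the "subgroups" feature.
function subgroupsDeviceDescriptor(
  adapter: AdapterLike
): { requiredFeatures: string[] } {
  if (!adapter.features.has("subgroups")) {
    throw new Error("Subgroups support is not available");
  }
  return { requiredFeatures: ["subgroups"] };
}

// Hypothetical helper: true only when the adapter reports that every subgroup
// has exactly `size` invocations, which a hardcoded algorithm might require.
function hasFixedSubgroupSize(info: AdapterInfoLike, size: number): boolean {
  return info.subgroupMinSize === size && info.subgroupMaxSize === size;
}

// Usage in a browser (not executed here):
//   const adapter = await navigator.gpu.requestAdapter();
//   if (!adapter) throw new Error("WebGPU is not available");
//   const device = await adapter.requestDevice(subgroupsDeviceDescriptor(adapter));
//   const canUseFixed32 = hasFixedSubgroupSize(adapter.info, 32);
```

When the minimum and maximum sizes differ, a shader generally has to read subgroup_size at runtime rather than assume a constant.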
You also need to explicitly enable this extension in your WGSL code with enable subgroups; to get access to the following built-in values in both compute and fragment shader stages:
- subgroup_invocation_id: A built-in value for the index of the thread within the subgroup.
- subgroup_size: A built-in value for the size of the subgroup.
The numerous subgroup built-in functions (for example, subgroupAdd(), subgroupBallot(), subgroupBroadcast(), subgroupShuffle()) enable efficient communication and computation between invocations within a subgroup. These subgroup operations are classified as single-instruction multiple-thread (SIMT) operations. Additionally, the quad built-in functions, which operate on a quad of invocations, facilitate data communication within the quad.
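To build intuition for what these built-ins return, here is a small CPU-side model in TypeScript. It is only an illustration: the model* function names are hypothetical, every invocation is assumed to be active, and nothing here runs on the GPU or reflects how a driver implements the operations.

```typescript
// CPU-side model of a few subgroup built-ins. `values[i]` is the argument
// supplied by invocation i; every invocation is assumed to be active.

// subgroupAdd(e): every invocation receives the sum over the subgroup.
function modelSubgroupAdd(values: number[]): number[] {
  const sum = values.reduce((a, b) => a + b, 0);
  return values.map(() => sum);
}

// subgroupBroadcast(e, id): every invocation receives invocation `id`'s value.
function modelSubgroupBroadcast(values: number[], id: number): number[] {
  return values.map(() => values[id]);
}

// subgroupShuffle(e, id): invocation i receives the value of invocation ids[i].
function modelSubgroupShuffle(values: number[], ids: number[]): number[] {
  return ids.map((id) => values[id]);
}

// subgroupBallot(pred): a bitmask with bit i set when invocation i passed true.
// WGSL returns a vec4<u32>; this model keeps only the first 32 invocations.
function modelSubgroupBallotLow32(preds: boolean[]): number {
  let mask = 0;
  preds.slice(0, 32).forEach((p, i) => {
    if (p) mask |= 1 << i;
  });
  return mask >>> 0; // reinterpret as unsigned 32-bit
}
```

On real hardware these results are exchanged between invocations without a round trip through workgroup or storage memory, which is where the efficiency comes from.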
You can use f16 values with subgroups when you request a GPUDevice with both "shader-f16" and "subgroups" features.
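As a rough sketch of that combination, the helper below picks the feature list to request, and the WGSL string shows a subgroup operation on an f16 value. The helper name, the FeatureSetLike interface, the data buffer, and the workgroup size are all assumptions made for illustration.

```typescript
// Structural stand-in for GPUAdapter.features (an assumption for this sketch).
interface FeatureSetLike {
  has(name: string): boolean;
}

// Hypothetical helper: returns the features to pass as requiredFeatures.
// "shader-f16" is added only when the adapter exposes it alongside "subgroups".
function subgroupFeaturesToRequest(features: FeatureSetLike): string[] {
  if (!features.has("subgroups")) {
    return [];
  }
  return features.has("shader-f16")
    ? ["subgroups", "shader-f16"]
    : ["subgroups"];
}

// WGSL sketch: both extensions are enabled so subgroupAdd() can take an f16.
const f16SubgroupShader = /* wgsl */ `
enable f16;
enable subgroups;

@group(0) @binding(0) var<storage, read_write> data : array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid : vec3u) {
  let x = f16(data[gid.x]);
  // Sum of x across the subgroup, computed on f16 values.
  data[gid.x] = f32(subgroupAdd(x));
}`;
```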
The following sample is a good starting point for exploring subgroups: it shows a shader that uses the subgroupExclusiveMul() built-in function to compute factorials without reading or writing memory to communicate intermediate results.
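A minimal sketch of such a shader might look like the WGSL string below. The buffer name, binding layout, and workgroup size are assumptions, and exclusiveMulU32 is a hypothetical CPU reference included only to show which values the shader is expected to produce.

```typescript
// WGSL sketch. The `data` buffer, its binding, and the workgroup size are
// assumptions made for illustration.
const factorialShader = /* wgsl */ `
enable subgroups;

@group(0) @binding(0) var<storage, read_write> data : array<u32>;

@compute @workgroup_size(64)
fn main(@builtin(subgroup_invocation_id) id : u32) {
  // Invocation i receives the product of (j + 1) for all j < i, which is i!.
  // Invocation 0 receives the multiplicative identity, 1.
  // Every subgroup writes the same values, and results wrap past 12! in u32.
  data[id] = subgroupExclusiveMul(id + 1);
}`;

// Hypothetical CPU reference for an exclusive multiplicative scan with u32
// wraparound: out[i] is the product of values[0..i-1], and out[0] is 1.
function exclusiveMulU32(values: number[]): number[] {
  const out: number[] = [];
  let running = 1;
  for (const v of values) {
    out.push(running);
    running = Math.imul(running, v) >>> 0; // keep the low 32 bits, unsigned
  }
  return out;
}

// Feeding 1, 2, 3, ... mirrors the shader's `id + 1` argument.
const firstFactorials = exclusiveMulU32([1, 2, 3, 4, 5]); // 0! through 4!
```

The scan happens entirely within the subgroup, so the only memory access in the shader is the final write of each result.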
Remove float filterable texture types support as blendable
Now that 32-bit float texture blending is available with the "float32-blendable" feature, the incorrect support for float filterable texture types as blendable has been removed. See issue 364987733.
Dawn updates
Dawn now requires macOS 11 and iOS 14 and only supports Metal 2.3+. See issue 381117827.
The new GetWGSLLanguageFeatures() method of wgpu::Instance now replaces EnumerateWGSLLanguageFeatures(). See issue 368672124.
The following binding types have an Undefined value, and their default values in binding layout have been changed. See issue 377820810.
- wgpu::BufferBindingType::Undefined is now Uniform
- wgpu::SamplerBindingType::Undefined is now Filtering
- wgpu::TextureSampleType::Undefined is now Float
- wgpu::StorageTextureAccess::Undefined is now WriteOnly
This covers only some of the key highlights. Check out the exhaustive list of commits.
What's New in WebGPU
A list of everything that has been covered in the What's New in WebGPU series.
Chrome 134
- Improve machine-learning workloads with subgroups
- Remove float filterable texture types support as blendable
- Dawn updates
Chrome 133
- Additional unorm8x4-bgra and 1-component vertex formats
- Allow unknown limits to be requested with undefined value
- WGSL alignment rules changes
- WGSL performance gains with discard
- Use VideoFrame displaySize for external textures
- Handle images with non-default orientations using copyExternalImageToTexture
- Improving developer experience
- Enable compatibility mode with featureLevel
- Experimental subgroup features cleanup
- Deprecate maxInterStageShaderComponents limit
- Dawn updates
Chrome 132
- Texture view usage
- 32-bit float textures blending
- GPUDevice adapterInfo attribute
- Configuring canvas context with invalid format throw JavaScript error
- Filtering sampler restrictions on textures
- Extended subgroups experimentation
- Improving developer experience
- Experimental support for 16-bit normalized texture formats
- Dawn updates
Chrome 131
- Clip distances in WGSL
- GPUCanvasContext getConfiguration()
- Point and line primitives must not have depth bias
- Inclusive scan built-in functions for subgroups
- Experimental support for multi-draw indirect
- Shader module compilation option strict math
- Remove GPUAdapter requestAdapterInfo()
- Dawn updates
Chrome 130
- Dual source blending
- Shader compilation time improvements on Metal
- Deprecation of GPUAdapter requestAdapterInfo()
- Dawn updates
Chrome 129
Chrome 128
- Experimenting with subgroups
- Deprecate setting depth bias for lines and points
- Hide uncaptured error DevTools warning if preventDefault
- WGSL interpolate sampling first and either
- Dawn updates
Chrome 127
- Experimental support for OpenGL ES on Android
- GPUAdapter info attribute
- WebAssembly interop improvements
- Improved command encoder errors
- Dawn updates
Chrome 126
- Increase maxTextureArrayLayers limit
- Buffer upload optimization for Vulkan backend
- Shader compilation time improvements
- Submitted command buffers must be unique
- Dawn updates
Chrome 125
Chrome 124
- Read-only and read-write storage textures
- Service workers and shared workers support
- New adapter information attributes
- Bug fixes
- Dawn updates
Chrome 123
- DP4a built-in functions support in WGSL
- Unrestricted pointer parameters in WGSL
- Syntax sugar for dereferencing composites in WGSL
- Separate read-only state for stencil and depth aspects
- Dawn updates
Chrome 122
- Expand reach with compatibility mode (feature in development)
- Increase maxVertexAttributes limit
- Dawn updates
Chrome 121
- Support WebGPU on Android
- Use DXC instead of FXC for shader compilation on Windows
- Timestamp queries in compute and render passes
- Default entry points to shader modules
- Support display-p3 as GPUExternalTexture color space
- Memory heaps info
- Dawn updates
Chrome 120
- Support for 16-bit floating-point values in WGSL
- Push the limits
- Changes to depth-stencil state
- Adapter information updates
- Timestamp queries quantization
- Spring-cleaning features
Chrome 119
- Filterable 32-bit float textures
- unorm10-10-10-2 vertex format
- rgb10a2uint texture format
- Dawn updates
Chrome 118
- HTMLImageElement and ImageData support in copyExternalImageToTexture()
- Experimental support for read-write and read-only storage texture
- Dawn updates
Chrome 117
- Unset vertex buffer
- Unset bind group
- Silence errors from async pipeline creation when device is lost
- SPIR-V shader module creation updates
- Improving developer experience
- Caching pipelines with automatically generated layout
- Dawn updates
Chrome 116
- WebCodecs integration
- Lost device returned by GPUAdapter requestDevice()
- Keep video playback smooth if importExternalTexture() is called
- Spec conformance
- Improving developer experience
- Dawn updates
Chrome 115
- Supported WGSL language extensions
- Experimental support for Direct3D 11
- Get discrete GPU by default on AC power
- Improving developer experience
- Dawn updates
Chrome 114
- Optimize JavaScript
- getCurrentTexture() on unconfigured canvas throws InvalidStateError
- WGSL updates
- Dawn updates