> ..here is the summary of the additions in version 2.0 of the language:
Vector instructions: With a massive 236 new instructions — more than the total number Wasm had before — it now supports 128-bit wide SIMD (single instruction, multiple data) functionality of contemporary CPUs, like Intel’s SSE or ARM’s SVE. This helps speeding up certain classes of compute-intense applications like audio/video codecs, machine learning, and some cryptography.
Bulk memory instructions: A set of new instructions allows faster copying and initialization of regions of memory or ranges of tables.
Multi-value results: Instructions, blocks, and functions can now return more than one result value, sometimes supporting faster calling conventions and avoiding indirections. In addition, block instructions now also can have inputs, enabling new program transformations.
Reference types: References to functions or pointers to external objects (e.g., JavaScript values) become available as opaque first-class values. Tables are repurposed as a general storage for such reference values, and new instructions allow accessing and mutating tables in Wasm code. In addition, modules now may define multiple tables of different types.
Non-trapping conversions: Additional instructions allow the conversion from float to integer types without the risk of trapping unexpectedly.
Sign extension instructions: A new group of instructions allows directly extending the width of signed integer value. Previously that was only possible when reading from memory.
> Instructions, blocks, and functions can now return more than one result value, sometimes supporting faster calling conventions and avoiding indirections.
Unfortunately, despite being "enabled", Rust+LLVM don't take advantage of this because of an ABI compatibility mess. I don't know whether the story on Clang's side is similar.
Between functions there might be a performance advantage, but as wasm VMs do more things like runtime inlining (which becomes more and more important with wasm GC and the languages that compile to it), that benefit goes away.
I figured out a way to get multi-value results on GCC for 32-bit ARM. Use a union to pack two 32-bit values into a 64-bit value. Return the 64-bit value. Then use a union to split the 64-bit value into two 32-bit values. I haven't tested it on other 32-bit architectures though.
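For anyone who wants to try the union trick, here is a minimal sketch in C. The names `pair_t`, `divmod_packed` and `divmod_split` are made up for illustration and are not from any library. It relies on union type punning, which is fine in C but not in C++. Whether the two halves actually travel in two registers depends on the target ABI, so treat that part as an assumption to check in the generated assembly.

```c
#include <stdint.h>

/* Hypothetical helper type: two 32-bit values overlaid on one 64-bit value. */
typedef union {
    uint64_t packed;
    struct {
        uint32_t first;   /* which half of `packed` this occupies depends on endianness */
        uint32_t second;
    } parts;
} pair_t;

/* "Returns" quotient and remainder at once by packing them into 64 bits. */
uint64_t divmod_packed(uint32_t n, uint32_t d) {
    pair_t r;
    r.parts.first = n / d;
    r.parts.second = n % d;
    return r.packed;
}

/* Caller side: split the 64-bit value back into the two 32-bit results. */
void divmod_split(uint64_t v, uint32_t *quotient, uint32_t *remainder) {
    pair_t r;
    r.packed = v;
    *quotient = r.parts.first;
    *remainder = r.parts.second;
}
```

Because the same union is used to pack and to unpack, the round trip gives the same answer on either endianness.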
"As a result there is no longer any possible method of writing a function in Rust that returns multiple values at the WebAssembly function type level."
You can have an ISA sufficiently generic to run on any CPU, or one sufficiently specific to efficiently exploit SIMD on a particular CPU. Never both. That's why some platforms provide higher-level operations, like element-wise multiplication of packed arrays. I can't see whether the actual WASM2 SIMD instructions are sufficiently generic because apparently I'm rate-limited on GitHub (???) and therefore can't see the spec.
Values are hardwired to 128 bits which can be i8x16/i16x8/i32x4/i64x2 or f32x4/f64x2, so that already limits the 'feature surface' drastically.
IMHO as long as it covers the most common use cases (e.g. vec4 / mat4x4 floating point math used in games and a couple of common ALU and bit-twiddling operations on integers) that's already quite a bit better than having to fall back to scalar math.
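To make the lane shapes concrete, here is a rough scalar model in plain C of one 128-bit value viewed as the different lane layouts, plus two lane-wise operations. This is only a sketch of the semantics, not the real intrinsics; `v128_sketch`, `i32x4_add_sketch` and `f32x4_mul_sketch` are names invented for this example.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector: the same 16 bytes, six lane views. */
typedef union {
    int8_t  i8x16[16];
    int16_t i16x8[8];
    int32_t i32x4[4];
    int64_t i64x2[2];
    float   f32x4[4];
    double  f64x2[2];
} v128_sketch;

/* Lane-wise 32-bit integer add. The unsigned arithmetic makes overflow wrap;
   converting back to int32_t is implementation-defined in C but wraps on
   mainstream compilers. */
v128_sketch i32x4_add_sketch(v128_sketch a, v128_sketch b) {
    v128_sketch r;
    for (int i = 0; i < 4; i++)
        r.i32x4[i] = (int32_t)((uint32_t)a.i32x4[i] + (uint32_t)b.i32x4[i]);
    return r;
}

/* Lane-wise float multiply, the kind of vec4 operation used in game math. */
v128_sketch f32x4_mul_sketch(v128_sketch a, v128_sketch b) {
    v128_sketch r;
    for (int i = 0; i < 4; i++)
        r.f32x4[i] = a.f32x4[i] * b.f32x4[i];
    return r;
}
```

A real build would replace each loop with a single vector instruction; the point here is just that the width is fixed and only the lane interpretation changes.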
They were sufficient for me to implement most of `string.h` and get speedups between 4 and 16x vs “portable musl C code,” including sophisticated algorithms such as this one:
http://0x80.pl/notesen/2016-11-28-simd-strfind.html
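For readers who don't want to click through, the core idea of that kind of substring search, as I understand it, is: compare the needle's first and last byte against 16 haystack positions at once, AND the two results into a bitmask, and only run a full comparison where a bit is set. Below is a scalar C sketch of that filter. The inner loop stands in for what would be two vector compares and a bitmask extraction, and `find_sketch` is a made-up name, not my actual implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the index of the first occurrence of needle in hay, or (size_t)-1. */
size_t find_sketch(const char *hay, size_t n, const char *needle, size_t k) {
    if (k == 0) return 0;
    if (k > n) return (size_t)-1;

    const char first = needle[0];
    const char last = needle[k - 1];
    const size_t limit = n - k;                  /* last valid start index */
    const size_t middle = k >= 2 ? k - 2 : 0;    /* bytes between first and last */

    for (size_t base = 0; base <= limit; base += 16) {
        /* Scalar stand-in for the SIMD step: bit j is set when both the
           first and the last needle byte match for start position base + j. */
        uint32_t mask = 0;
        for (size_t j = 0; j < 16 && base + j <= limit; j++) {
            size_t i = base + j;
            if (hay[i] == first && hay[i + k - 1] == last)
                mask |= (uint32_t)1 << j;
        }
        /* Verify candidates from lowest position upward. */
        for (size_t j = 0; mask != 0; j++, mask >>= 1) {
            if ((mask & 1u) && memcmp(hay + base + j + 1, needle + 1, middle) == 0)
                return base + j;
        }
    }
    return (size_t)-1;
}
```

The first-and-last-byte filter rejects most positions cheaply, so the expensive `memcmp` runs rarely on typical text.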
> apparently I'm rate-limited on GitHub (???) and therefore can't see the spec.
Are you also on Firefox? I've been getting those 429s a lot over the past week or so. I haven't changed my configuration other than I'm religious about the "check for updates" button, but I cannot imagine a world in which my release-branch browser is a novelty. No proxies, yes I run UBO but it is disabled for GH
Premature optimization is the root of all evil, and this SIMD mess could have been implemented so much more elegantly if they had just followed the general variable-size flexible vector proposal.
One can always compile CL to CPS; then "returning" is just calling the current continuation, and passing multiple values to the current continuation is trivial (since that's always possible). Since WASM is single-threaded there is no concurrency risk in using closures so extensively, though one pays the full price of call/cc when implementing it this way, which means that the stack becomes a heap, which is not great for performance.
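A tiny illustration of the CPS idea, written in C only to keep it concrete. C has no closures, so a hypothetical `env` pointer stands in for the captured variables, and C does not guarantee tail calls, so this shows the shape of the technique rather than a real CPS compilation. `cont2`, `divmod_cps` and `store_sum` are invented names.

```c
/* A continuation that receives two "returned" values plus its environment. */
typedef void (*cont2)(void *env, int quotient, int remainder);

/* Instead of returning, the function passes both results to its continuation. */
void divmod_cps(int n, int d, cont2 k, void *env) {
    k(env, n / d, n % d);
}

/* Example continuation: stores the sum of both values through env. */
void store_sum(void *env, int quotient, int remainder) {
    *(int *)env = quotient + remainder;
}
```

Calling `divmod_cps(17, 5, store_sum, &out)` hands 3 and 2 to the continuation as ordinary arguments, so no multi-value return is needed at the function type level.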
They also say at the end "In a future post we will take a look at Wasm 3.0, which is already around the corner at this point!" so I suppose Wasm 3.0 is coming very soon?
Wasm 2.0 Completed - https://webassembly.org/news/2025-03-20-wasm-2.0/