I recently spent a weekend learning about an exciting new web technology called WebAssembly, and built a hand simulator for Texas Hold'em. I wrote the simulator in C++ and then ported it to WebAssembly for use in the browser. I wanted to document what it's like working with WebAssembly and walk through a project from start to finish.

Basics

WebAssembly (wasm) is a new binary format for executing code on the web that can, in some cases, run much faster than equivalent JavaScript. As of right now, you can port code written in C and C++ to run inside current web browsers. WebAssembly is very attractive, but it doesn't make sense for every project; take a look at some of its limitations to get a better idea.

WebAssembly is being created as an open standard to be fast, efficient, and portable. WebAssembly code can be executed at near-native speed across different platforms by taking advantage of common hardware capabilities. It is specified to be run in a safe, sandboxed execution environment. Like other web code, it will enforce the browser's same-origin and permissions policies. If you're not familiar with the concepts of WebAssembly, I would start here for excellent resources and tutorials.
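
To make "binary format" concrete, here is a small sketch (runnable in a browser console or in Node) showing that the smallest valid wasm binary is just an 8-byte header: the magic bytes \0asm followed by format version 1. Nothing here comes from the simulator; it only uses the standard WebAssembly.validate and WebAssembly.Module APIs.

```javascript
// The smallest valid WebAssembly binary: the magic number "\0asm" followed by
// the format version (1) encoded as a little-endian 32-bit integer.
const EMPTY_MODULE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
]);

// Same header but claiming version 2, which current engines reject.
const BAD_VERSION = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x02, 0x00, 0x00, 0x00]);

console.log(WebAssembly.validate(EMPTY_MODULE)); // true
console.log(WebAssembly.validate(BAD_VERSION));  // false

// The empty module compiles, but it declares no imports or exports.
const emptyModule = new WebAssembly.Module(EMPTY_MODULE);
console.log(WebAssembly.Module.exports(emptyModule).length); // 0
```

Everything a real module contains (types, imports, functions, exports, code) follows this header as numbered sections.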

Setup

The best tool I could find for compiling to wasm is Emscripten. It takes LLVM bitcode (which can be generated from C/C++) and compiles it into code that runs on the web: asm.js JavaScript, or, with the WASM=1 option used below, a wasm module plus the JavaScript that loads it. With Emscripten, C/C++ developers don't pay the high cost of porting code to JavaScript by hand, or even have to learn JavaScript at all. Web developers benefit too, since they can use the many thousands of pre-existing native utilities and libraries in their sites.


        C++  =>  LLVM  =>  Emscripten  =>  JS
      

Setting up the Emscripten SDK is extremely simple, and the instructions are easy to follow. After installing the SDK, you should be able to run emcc from your command line.

Building Code

There isn't really anything difficult or confusing about compiling the C++ code into wasm. Built-in support is available for a number of standard libraries: libc, libc++, and SDL. These are linked automatically when you compile code that uses them (you don't even need to add -lSDL). If your project uses other libraries, such as zlib or glib, you will need to build and link them yourself. The normal approach is to build the libraries to bitcode and then compile the library and main program bitcode together into the final output.

JavaScript and the compiled WebAssembly module need a large amount of "glue" code to work with each other. Emscripten generates this JavaScript for you: a runtime wrapper that is responsible for loading the wasm file, setting up and growing memory, and supplying the supporting functionality that makes the C standard library work. Part of its job is to set up the resizable ArrayBuffer that holds the linear array of bytes read and written by WebAssembly's low-level memory access instructions. Without the generated Emscripten JavaScript you would need to do this yourself, which gets a little tricky.
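
Here is a short sketch of what that resizable buffer looks like from JavaScript, using only the standard WebAssembly.Memory API (the sizes are arbitrary, not the simulator's). Memory is measured in 64 KiB pages, and growing it swaps in a new ArrayBuffer and detaches the old one, which is one reason generated glue code has to refresh its typed-array views of the heap (such as HEAPU8) after memory grows.

```javascript
// Sizes are arbitrary for illustration: start at 1 page, allow growth to 2.
// One WebAssembly page is 64 KiB (65,536 bytes).
const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 });
console.log(memory.buffer.byteLength); // 65536

// Write a byte through a typed-array view over the linear memory.
const before = memory.buffer;
new Uint8Array(before)[0] = 42;

// grow() takes a number of pages and returns the previous size in pages.
const oldPages = memory.grow(1);
console.log(oldPages);                         // 1
console.log(memory.buffer.byteLength);         // 131072
console.log(before.byteLength);                // 0: the old ArrayBuffer is detached
console.log(new Uint8Array(memory.buffer)[0]); // 42: contents are preserved
```

Any view created over the old buffer is now useless, so code that caches views must recreate them from memory.buffer after a grow.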


        emcc sim.cpp -O3 -s WASM=1 -s EXPORTED_FUNCTIONS="['_run']" -o sim.js
      

emcc accepts many command-line arguments; the command above uses only the most common ones, which are described below. (-O3 enables aggressive optimizations, just as it does with gcc and clang.)

  • -s EXPORTED_FUNCTIONS="['_run']" tells emcc which functions we'd like to access from JavaScript. The names come from the source and are prefixed with an underscore; the "glue" code refers to these names so they can be called from JavaScript. In my C++ code, run is the function that runs a Texas Hold'em hand simulation. Because the source is C++, run has to be declared extern "C" so its name isn't mangled and the export can find it.
  • -s WASM=1 specifies that we want wasm output, instead of asm.js source.
  • -o sim.js sets the output file name. With a .js output and WASM=1, emcc generates both the wasm module (sim.wasm) and the JavaScript "glue" code (sim.js) that compiles and instantiates it for use in the web environment.

One very useful environment variable is EMCC_DEBUG. Setting it (for example, EMCC_DEBUG=1) makes the Emscripten compiler log all of its build steps, which helps when you see seemingly random failures and want to figure out what is breaking. The debug logs and intermediate files are written to TEMP_DIR/emscripten_temp, where TEMP_DIR is /tmp by default (it is defined in the .emscripten configuration file).

Poker hand simulations can be split across multiple threads using Web Workers. The worker code is fairly straightforward: in each worker we extend the Emscripten-generated Module object to handle calls from our C++ code, and from our HTML page we create the workers and keep track of them so we can send and receive messages.


        const MAX_WORKERS = Math.min(navigator.hardwareConcurrency || 4, 8);
        const WORKERS = [];

        // Load the workers
        for (let i = 0; i < MAX_WORKERS; i+=1) WORKERS.push(new Worker("worker.js"));
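
The page code above only creates the workers; it doesn't show how the simulations are divided among them. One plausible approach, sketched here as a hypothetical helper (splitSimulations is not part of the original project), is to give each worker an even share of the total and spread the remainder over the first few workers.

```javascript
// Hypothetical helper: split `total` simulations across `workerCount` workers
// as evenly as possible. The first (total % workerCount) workers get one extra.
function splitSimulations(total, workerCount) {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError('workerCount must be a positive integer');
  }
  const base = Math.floor(total / workerCount);
  const extra = total % workerCount;
  return Array.from({ length: workerCount }, (_, i) => base + (i < extra ? 1 : 0));
}

console.log(splitSimulations(10000, 3)); // → [3334, 3333, 3333]
```

Each share would then be posted to WORKERS[i] along with the hand data; the shares always add up to the requested total, so the per-worker counts returned through cc can simply be summed on the main thread.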
      

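The worker script further below defines a method, cc, which the C++ code calls to hand back its results. Another, more efficient way to load the workers is to compile the wasm once on the main thread and post the compiled WebAssembly.Module to each worker, so the workers don't each have to fetch and compile it separately.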


        fetch('sim.wasm')
          .then(response => response.arrayBuffer())
          .then(bytes => WebAssembly.compile(bytes))
          .then(mod => {
            // Send the same compiled module to every worker created above
            WORKERS.forEach(worker => worker.postMessage(mod));
          });
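
On the receiving side, a worker that gets a WebAssembly.Module only has to instantiate it. The sketch below assumes the worker instantiates the raw module itself with an import object it controls; a tiny hand-assembled add module stands in for sim.wasm, and instantiateFromModule is a hypothetical helper name. When WebAssembly.instantiate is given a compiled Module rather than bytes, it resolves to an Instance directly instead of a { module, instance } pair.

```javascript
// Hand-assembled module exporting add(a, b) -> a + b; it stands in for sim.wasm.
const ADD_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code: 1 body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

// Hypothetical worker-side helper: the payload is already compiled, so it only
// needs instantiating. Given a Module (not bytes), WebAssembly.instantiate
// resolves to an Instance rather than a { module, instance } pair.
async function instantiateFromModule(mod, importObject = {}) {
  if (!(mod instanceof WebAssembly.Module)) {
    throw new TypeError('expected a compiled WebAssembly.Module');
  }
  return WebAssembly.instantiate(mod, importObject);
}

// In worker.js this could be wired up roughly as:
//   onmessage = async (e) => {
//     const instance = await instantiateFromModule(e.data, importObject);
//     // ... call instance.exports ...
//   };

// Local check without a worker: compile once, then instantiate as a worker would.
const compiled = new WebAssembly.Module(ADD_WASM);
instantiateFromModule(compiled).then((instance) => {
  console.log(instance.exports.add(2, 3)); // 5
});
```

With Emscripten's generated glue, the usual route is instead the Module.instantiateWasm hook, which lets you supply your own instantiation; check the Module object documentation for the exact callback signature in your Emscripten version.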
      

In order to use any Emscripten macros in your code you need to include the Emscripten header (first line). For more information see here.


        #include "emscripten.h"

        // This calls the JavaScript worker which in turn calls postMessage with the data back to the main thread
        EM_ASM({
          Module.cc([$0]);
        }, totalSimulations);
      

Another important thing to note is that each worker must load the generated module code from Emscripten itself. Communication between the main thread and the workers is done through the postMessage API. Keep in mind that you can only pass values and objects handled by the structured clone algorithm: it copies cyclical references correctly, but it cannot copy functions, so you cannot pass them.
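
The structured clone rules are easy to check directly. This sketch uses structuredClone (available in current browsers and in Node 17+), which applies the same algorithm postMessage uses: a cyclic object survives the copy, while an object containing a function is rejected with a DataCloneError. The card data is made up for illustration.

```javascript
// Made-up hand data with a cycle: the object refers back to itself.
const hand = { cards: ['Ah', 'Kd'] };
hand.self = hand;

// Cyclic references survive a structured clone; the copy points to itself.
const copy = structuredClone(hand);
console.log(copy.self === copy); // true
console.log(copy === hand);      // false: it is a new object

// Functions cannot be cloned; the error is a DOMException named DataCloneError.
function cloneErrorName(value) {
  try {
    structuredClone(value);
    return null; // cloned without error
  } catch (err) {
    return err.name;
  }
}
console.log(cloneErrorName({ score: () => 42 })); // "DataCloneError"
console.log(cloneErrorName([1, 2, 3]));           // null
```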


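        let Simulate;

        // onmessage runs whenever the main thread posts a message to this worker
        onmessage = (e) => {
          const args = e.data;

          // Assumed message layout: the five simulation inputs followed by the number of simulations
          const simCount = args[5];

          Simulate(args[0], args[1], args[2], args[3], args[4], simCount); // Pass the data into the C++ function
        };

        // Overrides for the generated emcc script, which picks up this existing Module object.
        // Declare it with var, not let: sim.js declares Module with var, and a top-level let
        // of the same name would make sim.js fail to load with a SyntaxError.
        var Module = {
          cc: (data) => {
            // Called from the C++ code via EM_ASM; forward the results to the main thread
            postMessage(data);
          },

          onRuntimeInitialized: () => {
            // 'run' is the function exported as '_run' via EXPORTED_FUNCTIONS.
            // Newer Emscripten versions also require cwrap in EXPORTED_RUNTIME_METHODS.
            Simulate = Module.cwrap('run', 'number', ['array', 'array', 'number', 'array', 'number', 'number']);
          }
        };

        // This loads the wasm generated glue code
        importScripts('sim.js');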
      

Loading WebAssembly

Since we are not building a standalone wasm module, but letting emcc generate the JavaScript glue code for us, loading the WebAssembly is as easy as including the generated JavaScript in our page. If we were using Emscripten's SIDE_MODULE option, we would load the WebAssembly ourselves, which involves setting up the memory for our application.


        async function createWebAssembly(path, importObject) {
          const bytes = await window.fetch(path).then(x => x.arrayBuffer());
          return WebAssembly.instantiate(bytes, importObject);
        }
      

You would also need to specify an import object, which provides the environment WebAssembly runs in as well as any other values needed for instantiation. For more information, see here.
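
As an illustration of an import object supplying the environment, here is a sketch that hand-assembles a tiny module which imports its memory as env.memory and exports a hypothetical load(address) function that reads a 32-bit integer. This is not Emscripten output; the bytes are written out by hand so the example runs on its own in a browser or Node.

```javascript
// Hand-assembled module: imports env.memory (at least 1 page) and exports
// load(addr) -> i32, which reads a 32-bit integer at that byte address.
const LOAD_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm", version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,       // type 0: (i32) -> i32
  0x02, 0x0f, 0x01, 0x03, 0x65, 0x6e, 0x76,             // import from "env"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,             //   named "memory"
  0x02, 0x00, 0x01,                                     //   a memory, min 1 page
  0x03, 0x02, 0x01, 0x00,                               // func 0 uses type 0
  0x07, 0x08, 0x01, 0x04, 0x6c, 0x6f, 0x61, 0x64, 0x00, 0x00, // export "load" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code: 1 body, no locals
  0x20, 0x00, 0x28, 0x02, 0x00, 0x0b,                   // local.get 0, i32.load, end
]);

async function readThroughImportedMemory(address, value) {
  // The import object supplies the memory the module expects as env.memory.
  const memory = new WebAssembly.Memory({ initial: 1 });
  const importObject = { env: { memory } };

  // Given raw bytes, instantiate resolves to { module, instance }.
  const { instance } = await WebAssembly.instantiate(LOAD_WASM, importObject);

  // JavaScript writes into the shared buffer (wasm memory is little-endian)...
  new DataView(memory.buffer).setInt32(address, value, true);
  // ...and the wasm code reads the same bytes back.
  return instance.exports.load(address);
}

readThroughImportedMemory(16, 1234).then((v) => console.log(v)); // 1234
```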

Emscripten has an option called ONLY_MY_CODE which can be used on the command line. It tells Emscripten not to link any standard libraries, so every undefined reference to a function in the C++ file becomes a wasm import that you must hook up to a JavaScript function yourself. Normally Emscripten provides these for you, because JavaScript either lacks a direct equivalent with the same name or signature (e.g., Math.atan in JavaScript vs atan in C) or handles the concept differently (think malloc vs JavaScript's objects and garbage collection).
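
The sketch below shows what hooking up such an undefined reference looks like. A hand-assembled module (standing in for ONLY_MY_CODE output, not produced by Emscripten) imports a function env.atan and exports a hypothetical angle function that simply calls it; the import object maps that import to JavaScript's Math.atan.

```javascript
// Hand-assembled module: imports env.atan : (f64) -> f64 as func 0 and exports
// angle(x), defined as func 1, which just returns atan(x).
const ATAN_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm", version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7c, 0x01, 0x7c,       // type 0: (f64) -> f64
  0x02, 0x0c, 0x01, 0x03, 0x65, 0x6e, 0x76,             // import from "env"
  0x04, 0x61, 0x74, 0x61, 0x6e, 0x00, 0x00,             //   "atan" as func 0, type 0
  0x03, 0x02, 0x01, 0x00,                               // func 1 uses type 0
  0x07, 0x09, 0x01, 0x05, 0x61, 0x6e, 0x67, 0x6c, 0x65, 0x00, 0x01, // export "angle" = func 1
  0x0a, 0x08, 0x01, 0x06, 0x00,                         // code: 1 body, no locals
  0x20, 0x00, 0x10, 0x00, 0x0b,                         // local.get 0, call 0, end
]);

let calls = 0;
const importObject = {
  env: {
    // The C code calls atan(); the JavaScript equivalent lives at Math.atan.
    atan: (x) => {
      calls += 1;
      return Math.atan(x);
    },
  },
};

function instantiateAtan() {
  return WebAssembly.instantiate(ATAN_WASM, importObject).then(({ instance }) => instance);
}

instantiateAtan().then((instance) => {
  console.log(instance.exports.angle(1)); // same value as Math.atan(1)
});
```

If the import object had an env entry but no atan, instantiation would reject with a WebAssembly.LinkError, so a missing hookup shows up immediately rather than at call time.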


        const memory = new WebAssembly.Memory({initial: 256, maximum: 256});
        const env = {
          'abortStackOverflow': _ => { throw new Error('overflow'); },
          'table': new WebAssembly.Table({initial: 0, maximum: 0, element: 'anyfunc'}),
          'tableBase': 0,
          'memory': memory,
          'memoryBase': 1024,
          'STACKTOP': 0,
          'STACK_MAX': memory.buffer.byteLength,
        };
        const importObject = {env};
      

This environment configures the memory available to WebAssembly: 256 pages of 64 KiB each, or 16 MiB, with no room to grow. You would also need to set up the table. Tables make function pointers possible without being vulnerable to attacks that reference memory locations directly. A table is an array that lives outside of WebAssembly's memory, and its values are references to functions. Tables were added to the spec to support function pointers, because C and C++ rely on them.
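
Tables can also be created and filled from JavaScript. In this sketch, which uses a hand-assembled add module rather than anything from the simulator, an exported wasm function is stored in a WebAssembly.Table and called back out of it. Only exported WebAssembly functions (or null) can go into a function table; a plain JavaScript function is rejected with a TypeError, reflecting that table entries are references to wasm functions rather than arbitrary values.

```javascript
// Hand-assembled module exporting add(a, b) -> a + b (not simulator code).
const ADD_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code: 1 body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);
const addInstance = new WebAssembly.Instance(new WebAssembly.Module(ADD_WASM), {});

// A function table with two empty slots; 'anyfunc' matches the env example above.
const table = new WebAssembly.Table({ initial: 2, element: 'anyfunc' });
table.set(1, addInstance.exports.add);

// The slot now holds a callable reference to the wasm function.
console.log(table.get(1)(2, 3)); // 5
console.log(table.get(0));       // null: empty slots hold null

// Plain JavaScript functions are not allowed in a function table.
let rejectedPlainFunction = false;
try {
  table.set(0, (a, b) => a + b);
} catch (err) {
  rejectedPlainFunction = err instanceof TypeError;
}
console.log(rejectedPlainFunction); // true
```

Inside wasm, the C/C++ side would call through such a slot with call_indirect using the slot's index, which is what a compiled function pointer becomes.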

Conclusion

WebAssembly is a very exciting new technology, and tools like Emscripten make working with it extremely simple. Not every project is suited for porting to WebAssembly yet, but that will likely change. I'm looking forward to seeing how WebAssembly evolves and what developers will create with it. There are already many resources online describing WebAssembly in great detail; one of the best I've come across is from Mozilla. If you want to see the code for the Texas Hold'em simulator referenced in this post, please take a look here.