Box2D Physics & WebGL



For a while now I’ve been meaning to spend some time looking into JavaScript physics engines and the possibility of connecting them to WebGL for rendering. After some research I came to the conclusion that Box2DWeb was the physics engine most likely to yield interesting results, and the one most likely to let me build something more complex easily once I’d got to know the basics.

Box2DWeb is available from here and I found this short tutorial very useful when getting started.

I’ll quickly describe what I’ve done here…

First we have to pull a lot of stuff into the global namespace; otherwise we end up using very long names for everything and our code becomes fairly unreadable. Placing this block of code at the top of our script simplifies things somewhat.

  var b2Vec2 = Box2D.Common.Math.b2Vec2
    , b2BodyDef = Box2D.Dynamics.b2BodyDef
    , b2Body = Box2D.Dynamics.b2Body
    , b2FixtureDef = Box2D.Dynamics.b2FixtureDef
    , b2Fixture = Box2D.Dynamics.b2Fixture
    , b2World = Box2D.Dynamics.b2World
    , b2MassData = Box2D.Collision.Shapes.b2MassData
    , b2PolygonShape = Box2D.Collision.Shapes.b2PolygonShape
    , b2CircleShape = Box2D.Collision.Shapes.b2CircleShape
    , b2DebugDraw = Box2D.Dynamics.b2DebugDraw

The physics objects exist in a world, so the first thing any Box2D app needs to do is create the world. Making a world is fairly simple as it happens.

  world = new b2World(new b2Vec2(0, 10), true); // gravity, allowSleep

The first parameter when building a world is the gravity vector, which we set to (0, 10) to approximate real-world gravity (with positive y pointing down in the coordinate system used here); the second parameter allows bodies that come to rest to be put to sleep and skipped by the simulation.

Once we have a world we can create and add rigid bodies to it. In this simple example we have a fixed (static) box at the bottom of the scene and a dynamic ball at the top. The code for these two doesn’t look all that different as it happens. Each sets up a fixture definition and a body definition and then issues calls to create them both. The box uses a different shape in its fixture definition (a polygon rather than a circle, a fixture being Box2D’s term for a shape plus its material properties) and is flagged as b2_staticBody rather than b2_dynamicBody, meaning it cannot move. A sketch of the box setup follows the ball code below.

For each rigid body we create we also get to add some user data, which can be anything we like. I’m going to use that to bind the objects to our WebGL rendering so we know how to draw the various bodies when the time comes. In this case I record the scale and a reference to the mesh we want to render, so we take a unit circle (radius = 1.0) and apply a scale of 0.1, matching the radius of the circle we ask Box2D to create.

This is code we use to create the ball in this example…

  function createBall() {
    var fixDef = new b2FixtureDef();
    var bodyDef = new b2BodyDef();

    fixDef.density = 1.0;
    fixDef.friction = 0.5;
    fixDef.restitution = 0.2;

    // *** create dynamic circle object ***
    bodyDef.type = b2Body.b2_dynamicBody;
    // user data binding the body to our WebGL rendering
    bodyDef.userData = { };
    bodyDef.userData.scaleX = 0.1;
    bodyDef.userData.scaleY = 0.1;
    bodyDef.userData.mesh = circleMesh;
    // position (clip space coordinates)
    bodyDef.position.x = 0.0;
    bodyDef.position.y = -1.0;
    fixDef.shape = new b2CircleShape(bodyDef.userData.scaleX);
    // make-it!
    sphereBody = world.CreateBody(bodyDef);
    sphere = sphereBody.CreateFixture(fixDef);
  }
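
The static box at the bottom of the scene is built in much the same way. The following is a sketch rather than the exact code from the demo; boxMesh and the half extents used here are assumptions, analogous to circleMesh above.

  function createGround() {
    var fixDef = new b2FixtureDef();
    var bodyDef = new b2BodyDef();

    fixDef.density = 1.0;
    fixDef.friction = 0.5;
    fixDef.restitution = 0.2;

    // *** create static box object ***
    bodyDef.type = b2Body.b2_staticBody;
    // user data binding the body to our WebGL rendering (values are illustrative)
    bodyDef.userData = { scaleX: 0.8, scaleY: 0.1, mesh: boxMesh };
    // position near the bottom of the clip space canvas
    bodyDef.position.x = 0.0;
    bodyDef.position.y = 0.9;
    // SetAsBox takes half extents, matching the scale stored in the user data
    fixDef.shape = new b2PolygonShape();
    fixDef.shape.SetAsBox(bodyDef.userData.scaleX, bodyDef.userData.scaleY);
    // make-it!
    groundBody = world.CreateBody(bodyDef);
    groundBody.CreateFixture(fixDef);
  }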

Bear in mind that we need to think about coordinate systems here. We want the coordinate systems of the rendering and the physics to match. Our rendering is going to use untransformed (clip space) vertex positions, so we just use the same range in Box2D, meaning the top and left edges of the canvas are -1.0 and the right and bottom are 1.0.

Now we need to ‘tick’ the world to trigger physics processing. This block of code does that for us; my sample uses a fixed update of 60Hz, so we can simply hard-code the time step.

  var frameRate = 1.0 / 60.0;    // fixed time step of 1/60th of a second
  world.Step(frameRate, 10, 10); // time step, velocity iterations, position iterations
  world.ClearForces();
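
In my sample this sits inside the animation loop. A rough sketch of how you might drive it from requestAnimationFrame follows, where render() is a hypothetical function doing the WebGL drawing described below (note that requestAnimationFrame only approximates 60Hz on most displays).

  function tick() {
    world.Step(frameRate, 10, 10);
    world.ClearForces();
    render(); // hypothetical WebGL draw function, see the rendering code below
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);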

Finally we want to render the contents of the world. Box2D provides a mechanism for iterating over the bodies in the world, getting the position and rotation of each, and querying for any user data. This gives us all we need: if a body has user data we can go ahead and draw it. The position and rotation, combined with the scale we stored in our user data, give us the information we need to build a matrix representing the transformations the shader must apply in order to place the vertices in the correct positions. Once we have that we can bind our matrix to a shader, grab the mesh from the user data and issue a draw call.

  var obj = world.GetBodyList();
  while (obj) {
    var pos = obj.GetPosition();
    var angle = obj.GetAngle();
    var userData = obj.GetUserData();
    if (userData != null) {
      mat4.identity(modelMtx);
      mat4.translate(modelMtx, [pos.x, pos.y, 0.0], modelMtx);
      mat4.rotateZ(modelMtx, angle, modelMtx);
      mat4.scale(modelMtx, [userData.scaleX, userData.scaleY, 1.0], modelMtx);
      // set shader constant here...
      var mesh = userData.mesh;
      // draw mesh here...
    }
    obj = obj.GetNext();
  }
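
For completeness, the two elided steps might look roughly like the following. This is a sketch only; program, positionAttribLoc, the u_modelMtx uniform and the buffer/index-count fields on the mesh object are assumptions rather than names from the demo.

  gl.useProgram(program);
  gl.uniformMatrix4fv(gl.getUniformLocation(program, "u_modelMtx"), false, modelMtx);

  // bind the mesh stored in the user data and draw it
  gl.bindBuffer(gl.ARRAY_BUFFER, mesh.vertexBuffer);
  gl.vertexAttribPointer(positionAttribLoc, 2, gl.FLOAT, false, 0, 0);
  gl.enableVertexAttribArray(positionAttribLoc);

  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, mesh.indexBuffer);
  gl.drawElements(gl.TRIANGLES, mesh.indexCount, gl.UNSIGNED_SHORT, 0);

In practice you would look up the uniform location once at startup rather than on every draw call.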

Deferred Lighting in WebGL




This post describes an attempt to build a simple form of deferred lighting in WebGL. To keep things simple I'm going to stick to a single directional light and will instead focus on the render-target setup and how we feed data to our lighting shader.

Populating a GBuffer

First we need a GBuffer. This is a set of buffers that together hold the parameters needed by our lighting shaders. In its simplest form it holds an unlit surface colour (albedo), a surface normal, and a depth value, which for now gives us all we need. There are other properties that are useful, such as roughness, metalness and emissive lighting, but I won't focus on them here.

We need a representation of depth within our GBuffer so we can reconstruct positions in our lighting shader. WebGL doesn't support depth render targets out of the box, but we can work around that by setting up a standard 32 bit RGBA target and using a shader to pack the depth value into the RGB channels as a 24 bit value. The following two functions would allow us to achieve that.

vec3 packFloat8bitRGB(float val) { 
  // 24bit encoding, 0-1 depth
  vec3 pack = vec3(1.0, 255.0, 65025.0) * val;
  pack = fract(pack);
  pack -= vec3(pack.yz / 255.0, 0.0);
  return pack;
}
float unpackFloat8bitRGB(vec3 pack) {
  return dot(pack, vec3(1.0, 1.0 / 255.0, 1.0 / 65025.0));
}

Rather than do that, though, I'm going to use a WebGL extension. WebGL supports an extension called WEBGL_depth_texture which provides a depth render target that can be attached to an FBO and used like a standard depth buffer. Not every device supports it, but a lot do. We can access the extension like so...

var glDepthTextureExt = gl.getExtension("WEBGL_depth_texture");

The next two snippets of code show how we build a depth texture and then how we attach it to an FBO.

  texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.DEPTH_STENCIL, width, height, 0, 
    gl.DEPTH_STENCIL, glDepthTextureExt.UNSIGNED_INT_24_8_WEBGL, null);
		
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.DEPTH_STENCIL_ATTACHMENT, gl.TEXTURE_2D, texture, 0); 
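
The colour and normal targets attached alongside it are just standard RGBA textures. A sketch of that setup follows, assuming the FBO has already been created and bound; the variable names are mine, not from the demo.

  var colourTexture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, colourTexture);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0,
    gl.RGBA, gl.UNSIGNED_BYTE, null);

  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, colourTexture, 0);

  // sanity check the attachment combination before rendering into it
  if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) !== gl.FRAMEBUFFER_COMPLETE) {
    console.log("GBuffer FBO is incomplete");
  }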

With that in place we can look to populate the GBuffer. WebGL doesn't allow us to write to multiple render targets at the same time (MRT) without an extension, so we end up with two passes, one populating an RGBA target with colour data and one with normal data, each with the depth target attached. Colour values can simply be read from a texture and copied into the colour target; for now we don't need any more than that. The normals need something a little more complex. First, we store them in eye space (relative to the camera orientation), as this simplifies some of the work we'll do later. We also sample a normal map and fold it into the final normal along with the interpolated vertex normals. For this I'm going to use the following shader code, where v_ indicates a varying passed from the vertex shader and u_ indicates a uniform (shader constant).

  vec3 N = v_eyeNormal.xyz;
  vec4 normalMap = texture2D(normalSampler, v_tc0.xy);
  vec3 Npixel = normalMap.xyz * 2.0 - 1.0;
  vec3 biNormal = cross(v_eyeNormal, v_eyeTangent);
  Npixel = (v_eyeTangent * Npixel.x + biNormal * Npixel.y + v_eyeNormal * Npixel.z);
  N += (Npixel * u_normalMapScale);
  N = normalize(N);

Note that we also need to apply a scale and offset to the normal before storing it into the target, in order to pack the -1 to 1 range into the 0 to 1 range of our texture.
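
A minimal sketch of that remapping, assuming the normal is written straight out as the fragment colour of the normal pass:

  // remap each component from [-1, 1] into the [0, 1] range of the render target
  gl_FragColor = vec4(N * 0.5 + 0.5, 1.0);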

The Lighting Shader

For our deferred directional light all we need to do is render a quad across the viewport and then, for each pixel, pick up our GBuffer parameters and calculate the lighting contribution. For a directional light the only additional data we need are some properties of the light, say direction and colour, and the inverse of the projection matrix so that we can reconstruct a position from a depth value. The light direction needs to be transformed into eye space before we upload it to the shader, to match the GBuffer parameters, like this...

  mat4.multiplyVec3(viewMtx3x3, worldSpaceLightDirection, eyeSpaceLightDirection);

The shader needs to be able to turn a depth value into an eye space position. To do this we can reconstruct a clip space position and back project into view space. We already uploaded the inverse projection matrix, so that code looks like this...

  float clipDepth = -1.0 + texture2D(depthSampler, texCoord.xy).r * 2.0;
  vec4 eyePos = u_projInvMtx * vec4(clipPosXY, clipDepth, 1.0);
  eyePos.xyz = eyePos.xyz / eyePos.w;

Even though I've not really got any meaningful data to throw at it, I'm going to try to implement a PBR-compatible lighting shader. I won't go into the detail of how this works, but there is a lot of very detailed documentation available online covering the various parts of this implementation, for example here and here. My own code looks like this...

  float NdotV = max(dot(N, V), 10e-5); // 10e-5 to avoid /0 for Fs where NdotV == 0
  float NdotL = max(dot(N, L), 0.0);
  vec3 BRDF = vec3(0.0, 0.0, 0.0);
  if (NdotL > 0.0)
  {
    vec3 F = F_schlick(VdotH, F0);
    // 0.005 to avoid sub pixel highlights where M == 0
    float D = D_trowbridgeReitzCGX(NdotH, max(Msquared, 0.005));
    float G = G_smithSchlick(NdotL, NdotV, M);
    vec3 Fs = (F * D * G) / (4.0 * NdotL * NdotV);
    BRDF = (Kd + Fs) * lightColour * NdotL;
  }

In that code snippet L is the light direction, V is a vector pointing towards the viewer, N is the surface normal, and H (used in the dot products) is the half vector between L and V. M is the material roughness, F0 the specular reflectance at normal incidence, and Kd the diffuse colour.

For reference the implementations of the 3 parts of the PBR lighting look like this:

vec3 F_schlick(float VdotH, vec3 F0) {
  return F0 + (1.0 - F0) * pow(1.0 - VdotH, 5.0);		
}
float D_trowbridgeReitzCGX(float NdotH, float Msquared) {
  float alpha = Msquared * Msquared;
  float t = ((NdotH * NdotH) * (alpha - 1.0) + 1.0);
  return alpha / (3.14159265359 * t * t);
}
float G_smithSchlick(float NdotL, float NdotV, float M) {
  float k = (0.8 + 0.5 * M);
  k *= k;
  k *= 0.5;
  float gv = NdotV / (NdotV * (1.0 - k) + k);
  float gl = NdotL / (NdotL * (1.0 - k) + k);
  return gv * gl;
}

One final thing to note is that the material textures used here were sourced from freepbr.com.

Decoding RW2 Raw Image Files

After owning a Panasonic Lumix digital camera for a while, and recently deciding to shoot raw rather than JPEG, I was fairly surprised to find that Panasonic opted to store their images in a non-standard and, as best I can tell, undocumented format. Obviously this makes the images harder to work with than I would like. I’d never really considered this before, but it appears there are large numbers of RAW formats using similar conventions, and a whole world of format converters, plugins and other tools.

The problem is that these files don’t ‘just work’ in lots of apps, even when those apps have good image format support generally, so I feel I need to convert them into some other format that I’ll find easier to work with. In particular I’d like to be able to load the images into Photoshop CS2, as this version is available for free download, and I’d like to see thumbnails in Windows Explorer. Somewhat surprisingly, thumbnails do already appear in Windows Explorer (on Windows 10), but the files don’t open in Photoshop, so some research is needed.

So what is an RW2 file? Well, it turns out that RW2 files are basically TIFF files that carry various custom tags, some of which are important for pulling the raw image data out and some of which are not, and that store sensor data in place of the usual image data.

LibTIFF

My first approach to decoding them was to attempt to lean on libtiff for much of the work. This is an open source TIFF parser that can be found here. At first this seemed promising. I quickly learned that RW2 uses a custom file signature: the first 4 bytes of a TIFF file are almost always the same, but RW2 replaces them with something different. We can set libtiff up with callbacks for file I/O, which made it pretty easy to swap out the data before the library saw it, and that gets us into the parser. From there I quickly determined that libtiff wasn’t going to cut it, though. After faking the header it complains that the image height tag is missing, and after some analysis of the file I concluded that it really was missing, as was lots of other important data the library appeared to depend on. At this point I decided that the format was too ‘non-standard’ for libtiff to be of much use, and gave up.

Building my own RW2 parser

Building a parser for the TIFF-style format is fairly easy. A TIFF file starts with a small 8 byte header containing a signature and the offset to a list of tags; one of the tags then points to a further list of tags, and some tags point to blocks of data. This format is documented in various places and is easy to parse, so getting to the point where we have all the tags is fairly straightforward. Making sense of them and finding the image data is much harder.
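
As a rough sketch of that first step (assuming the little-endian byte order these files use, and ignoring the custom RW2 signature check), walking the header and the first tag list from an ArrayBuffer might look like this:

function readTags(buffer) {
  var view = new DataView(buffer);
  // bytes 0-3: byte order + magic (RW2 uses a non-standard value here)
  // bytes 4-7: offset to the first tag list (IFD)
  var ifdOffset = view.getUint32(4, true);
  var tagCount = view.getUint16(ifdOffset, true);
  var tags = [];
  for (var i = 0; i < tagCount; i++) {
    var entry = ifdOffset + 2 + i * 12; // each tag entry is 12 bytes
    tags.push({
      id: view.getUint16(entry, true),
      type: view.getUint16(entry + 2, true),
      count: view.getUint32(entry + 4, true),
      valueOrOffset: view.getUint32(entry + 8, true)
    });
  }
  return tags;
}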

As I found when trying to read these files with libtiff, the image dimensions were missing. This turns out to be because Panasonic store the sensor dimensions using custom tags, which makes sense when you realize that instead of storing image data, the file quite literally stores the values delivered by the sensor as the photo is taken. The sensor data is compressed using what appears to be a fairly common RAW compression scheme, and there are a few examples of how to decompress it if you look for similar projects on GitHub or elsewhere. Once decompressed, it turns out the sensor isn’t an array of RGB sensors, so we don’t get RGB pixel values, but rather a mosaic of individual R, G and B sensors. Typically the color values come back laid out in a repeating 2×2 grid containing one R, one B and two G values per cell. From this data we can reconstruct an RGB image: the pair of G values provide the detail, and combined with the lower-frequency R and B values we get the colours too. This is all sounding more complex than I’d like, so before I put a lot of effort into parsing the format myself, I’m hoping to find that someone has done the work for me.
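
As an illustration of the simplest possible reconstruction (assuming an RGGB cell layout, which is a guess here, and skipping proper demosaicing), each 2×2 cell can be collapsed into one half-resolution RGB pixel:

function demosaicHalfRes(sensor, width, height) {
  var outW = width >> 1, outH = height >> 1;
  var rgb = new Float32Array(outW * outH * 3);
  for (var y = 0; y < outH; y++) {
    for (var x = 0; x < outW; x++) {
      var sx = x * 2, sy = y * 2;
      var r  = sensor[sy * width + sx];
      var g1 = sensor[sy * width + sx + 1];
      var g2 = sensor[(sy + 1) * width + sx];
      var b  = sensor[(sy + 1) * width + sx + 1];
      var o = (y * outW + x) * 3;
      rgb[o]     = r;
      rgb[o + 1] = (g1 + g2) * 0.5; // average the two green samples
      rgb[o + 2] = b;
    }
  }
  return rgb;
}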

dcraw

After searching online for a bit I found dcraw. This seems to be a fairly well maintained tool, in widespread use, that converts raw images in various formats to either PPM or TIFF files, both containing 16 bit colour data. The tool is available for download as a Windows executable from a few places. Here are some examples of how this works. The first converts my RW2 file to a TIFF and the second two to a PPM file with various options applied.

dcraw -T IMAGE.RW2
dcraw -6 IMAGE.RW2
dcraw -6 -4 IMAGE.RW2

The first command produces a TIFF file which both Windows Explorer and Photoshop CS2 understand. The last two produce PPM files, which at least Photoshop CS2 is happy to consume. -6 tells the tool to write 16 bit data and -4 to write linear data; I think I want both of these settings. Unfortunately -4 only seems to have an effect when outputting PPM files, or if not it does something different for TIFF files: if I add it to the command line used to produce a TIFF the result doesn’t appear to change, or at least that’s how it seems.

Something else I’m noticing, now that I have images I can load in Photoshop, is that even with the higher precision there are large sections of the image that have hit the upper limit of the range. The image in question shows some trees with a bright sky behind them, and the RGB values for the sky show in Photoshop as 32767 or 32768 across all channels. I’d hoped that working in RAW would reveal a wider range, rather than just more precision, but I’m now wondering if that’s not the case. Am I hitting the limits of the sensor, or has the software applied some sort of normalization to bring the sensor data into a 0-1 range? If so, I wonder if there is some value in intentionally under-exposing my photos so that this doesn’t happen, as that would give me maximum freedom to adjust the photos later; if a region of the photo hits the limits of the range and ‘whites out’ then the true data has in effect been lost at the point the image was captured.

libraw and other solutions

At this point I still don’t have as full an understanding of the data being produced as I would like, or what I consider to be a complete solution should I wish to batch reconvert the images to another format with minimal loss of data and maximum flexibility.

I was also really hoping to get the data into a format that supports a wider color range, assuming the raw file contains data beyond the range I’m seeing in the TIFF file. A format like EXR would be ideal: where TIFF only appears to give us extra precision, I’m fairly sure EXR gives us range too, assuming the source data had that range to start with.

There are libraries like libraw for example that might allow me to extract the data myself with a little work.

I’ll settle for TIFFs, but I’m wondering if I need to go back to writing a tool of my own.

HDR Rendering With WebGL

8bit buffer

8bit buffer + RGBM encoding + tonemapping

float buffer + tonemapping

Each of the three blocks of four rows above shows data in a roughly 0-10 range when this page loads. Hover the mouse over them to see that range animate from a maximum of 1 up to a maximum of 19 and back again. Hopefully the top rows, which are rendered more or less directly to the canvas, make it clear how the limitations of the standard 8 bit encoding create problems for HDR content. The next four rows show the same content rendered via an intermediate RGBM-encoded buffer and then pushed through a tone mapper, and the final four rows show the same content again, this time rendered to a floating point buffer with a linear encoding and also pushed through a tone mapper.

The lack of good support for floating point render targets in WebGL creates a potential problem for anyone wanting to implement an HDR rendering pipeline. Typically a modern pipeline would use such a buffer to store lighting data, and as the lights get brighter we don’t want to end up clamping the colors when we run out of range in a standard 8 bit per channel buffer. Floating point targets allow those values to break out of that range and go much higher, which gives a better representation of how lighting works in the real world and also provides greater freedom for adjustment in post-processing.

WebGL in theory supports an extension for floating point render targets; see here, using the OES_texture_float extension. Perhaps we don’t want to depend on extensions if we can help it, though at the time of writing this one does appear to be very widely supported: there are some good stats available at webglstats.com that seem to show around 96% of users have support for it. The extension doesn’t come with support for linear filtering out of the box though, meaning you then also need the OES_texture_float_linear extension, which appears less widely supported at around 91%.
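
For reference, requesting the extensions and creating a floating point colour target might look roughly like the following sketch; the surrounding FBO setup and the fallback path are assumed rather than taken from the demo.

var floatExt = gl.getExtension("OES_texture_float");
var floatLinearExt = gl.getExtension("OES_texture_float_linear"); // optional, for linear filtering

if (floatExt) {
  var hdrTexture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, hdrTexture);
  var filter = floatLinearExt ? gl.LINEAR : gl.NEAREST;
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, filter);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, filter);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0,
    gl.RGBA, gl.FLOAT, null);
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, hdrTexture, 0);
  // some devices expose the extension but still can't render to float, so check completeness
  if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) !== gl.FRAMEBUFFER_COMPLETE) {
    // fall back to the 8 bit RGBM path described below
  }
}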

There are a few factors, leading to a few options for how we handle this, which I’m going to briefly explore here…

Color Encoding

Aside from deciding on the format of our HDR buffers we also need to decide on an encoding for the data. A fairly standard choice is sRGB vs linear. sRGB is commonly used in 8 bit per channel textures in order to make better use of the limited precision: it stores color values raised to the power 1/2.2, the point, I believe, being to better match the distribution of the range to the way people see, so you have more precision where the viewer is most likely to perceive a lack of it. sRGB has become the standard encoding for image files across all computer displays. A linear encoding, on the other hand, stores raw color (lighting) values. If we have the precision and range to work with a linear encoding it can make certain operations in computer graphics easier to manage. For example, adding together two sRGB values involves converting to linear, adding, then converting back again, which isn’t something a standard blend unit is necessarily going to handle for us, and adding values together happens to be exactly what we want to do when accumulating the lighting data in a scene.
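
To make that concrete, here is a small sketch of the conversions using the simple 2.2 power approximation (the exact sRGB curve is piecewise, but this is close enough to illustrate the cost):

vec3 sRGBToLinear(vec3 c) { return pow(c, vec3(2.2)); }
vec3 linearToSRGB(vec3 c) { return pow(c, vec3(1.0 / 2.2)); }

// adding two sRGB-encoded colors correctly:
// vec3 sum = linearToSRGB(sRGBToLinear(a) + sRGBToLinear(b));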

RGBM

RGBM is another type of encoding we could use. Where sRGB allowed us to get better precision out of three 8 bit values, RGBM allows us to get a larger range. How it works is that we re-purpose the alpha channel, which would often be unused anyway, as a multiplier to be applied to the RGB channels, and we are free to choose what range the 0-255 multiplier represents. We then translate from RGBM to linear and back again in order to use the data held in our targets in our shaders.

There are examples of RGBM encode and decode shader code here. The code I use actually looks very similar to this, where I’ve also chosen a scale of 6 for my encoding, though in theory you can go a fair bit higher if you want to.
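
For reference, a typical encode/decode pair looks something like this sketch, using the same scale of 6; it mirrors the commonly published RGBM code rather than being copied verbatim from my shaders.

const float RGBM_SCALE = 6.0;

vec4 encodeRGBM(vec3 color) {
  color /= RGBM_SCALE;
  float m = clamp(max(max(color.r, color.g), max(color.b, 1e-5)), 0.0, 1.0);
  m = ceil(m * 255.0) / 255.0; // quantise the multiplier as it will be stored in 8 bits
  return vec4(color / m, m);
}

vec3 decodeRGBM(vec4 rgbm) {
  return rgbm.rgb * rgbm.a * RGBM_SCALE;
}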

Another trick we can use to get even more range is to encode into sRGB before encoding into RGBM. The sRGB encoding works just as it always does for values below 1, but above 1 it drags down our largest values; for example, with a power of 1/2 (a gamma of 2.0), 36 becomes 6, so our 0-6 multiplier can in theory be mapped onto a larger range of input values.

Tone Mapping

The final consideration is tone mapping. If we are pushing the data in our buffers outside of the 0-1 displayable range, we need to decide how to interpret that data when displaying it. The common approach is to run a tone mapper to map the data in a meaningful way for display. There are a few fairly standard approaches to this.

The examples above use Naughty Dog’s ‘filmic tone mapping’ as described by John Hable here and here [e.g. pg. 140].

This is how the GLSL shader code ends up looking. Note that this curve includes the pow(x, 1/2.2) adjustment for the monitor’s gamma, so we don’t need to do that separately; we can take the linear input, push it through this code and put the result directly into the canvas.

  vec3 c0 = vec3(0.0, 0.0, 0.0);
  vec3 c1 = vec3(0.004, 0.004, 0.004);
  vec3 c2 = vec3(6.2, 6.2, 6.2);
  vec3 c3 = vec3(0.5, 0.5, 0.5);
  vec3 c4 = vec3(1.7, 1.7, 1.7);
  vec3 c5 = vec3(0.06, 0.06, 0.06);
  vec3 x = max(c0, rgb - c1);
  rgb = (x * (c2 * x + c3) / (x * (c2 * x + c4) + c5));

There are of course other approaches, for example ACES tone mapping, described here, and also Reinhard tone mapping, but I’m happy enough with this.