In a perfect world we would have a draw call for each render object and set the uniforms unique to that object, i.e. the color, the MVP matrix, the object's material etc. However, once the scene holds more than a handful of render objects, the per-draw-call overhead adds up and our program would slow to an unplayable crawl. So we want to share as many things as possible across render objects. The only hard constraints are that everything in one draw call must use the same vertex and index arrays and the same shader. The rest is up for grabs. Even with these two constraints there are workarounds, like having an uber mesh or uber shader. The good news is that with 2D games we only use one vertex buffer in the game, a quad, and in this game one shader for sprites. Given this, there is no reason why we can’t render our whole scene with one draw call. And this is what we’re going to do!
To make our game code as simple as possible, the game code knows nothing about instancing. We can call ‘pushSprite’ or ‘pushMesh’ anywhere in the code, which gets added to a render buffer. Then when we are ready to issue the draw commands to the GPU we call ‘executeRenderGroup’ which does all the heavy lifting.
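As a rough illustration of that split, here is a minimal sketch of the "push now, draw later" idea. The struct layout, the fixed capacity and the `sketch` names are my assumptions for illustration; the engine's real `pushSprite` and render group are not shown in this post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified render item; the engine's real struct is not shown. */
typedef struct {
    unsigned int bufferHandles; /* vertex/index buffer (the quad) */
    unsigned int program;       /* shader program */
    unsigned int textureHandle; /* 0 means untextured */
    float color[4];
} SketchItem;

#define SKETCH_MAX_ITEMS 1024

typedef struct {
    SketchItem items[SKETCH_MAX_ITEMS];
    size_t count;
} SketchGroup;

/* Game code can call this anywhere. Nothing is drawn here; the item is only recorded. */
void sketchPushSprite(SketchGroup *group, unsigned int quad, unsigned int program,
                      unsigned int texture, const float color[4]) {
    assert(group->count < SKETCH_MAX_ITEMS);
    SketchItem *item = &group->items[group->count++];
    item->bufferHandles = quad;
    item->program = program;
    item->textureHandle = texture;
    for (int i = 0; i < 4; ++i) {
        item->color[i] = color[i];
    }
}
```

The point of the design is that the game code only appends to a list, so the sorting and batching can change later without touching any call site.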
Since the render objects arrive in no particular order with respect to the sprite or shader they use, we first have to sort them. This is the first part of our instancing algorithm. Our sort function is a quicksort that groups the objects based on a set of criteria. These criteria depend on how your game engine is set up, but for my game they are:
1. Same vertex handle (our quad)
2. Same shader program handle
3. Same texture handle (where a texture atlas comes into play)
The sort puts render objects that match on all of these criteria next to each other. After this we can loop through them and collect the information unique to each render object, like its position and color. This is part two of our instancing algorithm. The loop looks like this:
```c
for(int i = 0; i < renderGroup->items.count; ++i) {
    // `info` is the render item that starts the current run; its setup is not part of this excerpt
    bool collecting = true;
    while(collecting) {
        RenderItem *nextItem = getRenderItem(group, i + 1);
        if(nextItem) {
            if(info->bufferHandles == nextItem->bufferHandles && info->textureHandle == nextItem->textureHandle && info->program == nextItem->program) {
                //collect data
                addElementInifinteAllocWithCount_(&pvms, nextItem->PVM.val, 16);
                addElementInifinteAllocWithCount_(&colors, nextItem->color.E, 4);
                if(nextItem->textureHandle) {
                    addElementInifinteAllocWithCount_(&uvs, nextItem->textureUVs.E, 4);
                } else {
                    assert(uvs.count == 0);
                }
                i++;
            } else {
                collecting = false;
            }
        } else {
            collecting = false;
        }
    }
    // ... rest of the loop body not shown ...
}
```
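To make the sort-then-group idea concrete, here is a small standalone sketch. It is not the engine's sort: it swaps in the standard library's `qsort` for the hand-written quicksort, and `BatchKey`, `compareBatchKeys` and `sortAndCountBatches` are hypothetical names. It orders items by the three handles and counts how many runs of identical handles come out, which is the number of draw calls needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical key: only the three handles that decide whether items can share a draw call. */
typedef struct {
    unsigned int bufferHandles;
    unsigned int program;
    unsigned int textureHandle;
} BatchKey;

static int cmpUint(unsigned int a, unsigned int b) {
    return (a > b) - (a < b);
}

/* qsort comparator: order by vertex buffer, then shader, then texture,
   so items that can be drawn together end up adjacent. */
int compareBatchKeys(const void *pa, const void *pb) {
    const BatchKey *a = pa;
    const BatchKey *b = pb;
    int c = cmpUint(a->bufferHandles, b->bufferHandles);
    if (c == 0) c = cmpUint(a->program, b->program);
    if (c == 0) c = cmpUint(a->textureHandle, b->textureHandle);
    return c;
}

/* Sorts the keys in place and returns the number of runs of equal keys,
   i.e. how many instanced draw calls would be issued. */
size_t sortAndCountBatches(BatchKey *keys, size_t count) {
    assert(keys != NULL || count == 0);
    if (count == 0) return 0;
    qsort(keys, count, sizeof(BatchKey), compareBatchKeys);
    size_t batches = 1;
    for (size_t i = 1; i < count; ++i) {
        if (compareBatchKeys(&keys[i - 1], &keys[i]) != 0) ++batches;
    }
    return batches;
}
```

With a single quad, a single sprite shader and a single texture atlas, every key is identical and the count comes out as one.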
So here we are looping through, collecting all the unique information and putting it into a stretchy array. In our engine the three things we need to send to the shader are:
1. PVM matrix to position the object on screen
2. Color tint
3. The texture UV coords since we are using a texture atlas
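For readers who haven't used a stretchy array, here is a minimal sketch of collecting those three pieces of data per instance. `FloatArray`, `floatArrayAppend` and `appendInstance` are hypothetical stand-ins for the engine's `InfiniteAlloc` helpers, whose real implementation isn't shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable float array, standing in for the post's InfiniteAlloc. */
typedef struct {
    float *memory;
    size_t count;    /* floats in use */
    size_t capacity; /* floats allocated */
} FloatArray;

/* Appends n floats to the end, doubling the capacity whenever it runs out. */
void floatArrayAppend(FloatArray *arr, const float *values, size_t n) {
    if (n == 0) return;
    if (arr->count + n > arr->capacity) {
        size_t newCap = arr->capacity ? arr->capacity : 64;
        while (newCap < arr->count + n) newCap *= 2;
        float *grown = realloc(arr->memory, newCap * sizeof(float));
        assert(grown != NULL);
        arr->memory = grown;
        arr->capacity = newCap;
    }
    memcpy(arr->memory + arr->count, values, n * sizeof(float));
    arr->count += n;
}

/* One instance contributes 16 floats of PVM, 4 of color and 4 of UVs. */
void appendInstance(FloatArray *pvms, FloatArray *colors, FloatArray *uvs,
                    const float pvm[16], const float color[4], const float uv[4]) {
    floatArrayAppend(pvms, pvm, 16);
    floatArrayAppend(colors, color, 4);
    floatArrayAppend(uvs, uv, 4);
}
```

Keeping each kind of data in its own tightly packed array means instance N always starts at float 16*N in the PVM array and float 4*N in the other two.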
The final part of our instancing algorithm is converting these arrays into a form that the GPU can access. Since uniform arrays can’t be very big in GLSL, I’ve gone for storing this info in a texture buffer. What this looks like is:
```c
typedef struct {
    GLuint tbo; // this is attached to the buffer
    GLuint buffer;
} BufferStorage;

BufferStorage createBufferStorage(InfiniteAlloc *array) {
    BufferStorage result = {};

    glGenBuffers(1, &result.tbo);
    glBindBuffer(GL_TEXTURE_BUFFER, result.tbo);
    glBufferData(GL_TEXTURE_BUFFER, array->sizeOfMember*array->count, array->memory, GL_DYNAMIC_DRAW);

    glGenTextures(1, &result.buffer);
    glBindTexture(GL_TEXTURE_BUFFER, result.buffer);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, result.tbo);

    return result;
}
```
So it creates a buffer object bound to GL_TEXTURE_BUFFER, copies the array data into it, and then attaches it to a buffer texture with glTexBuffer. We create one of these for each kind of unique info (one for PVM, one for color and one for UV coords), then point the matching sampler uniform in the shader at each texture. We delete these buffers the next frame, to make sure we aren’t deleting anything in the middle of rendering.
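The "delete it next frame" part can be sketched as a small queue. This is an illustration of the idea and not the engine's code: `PendingDeletes`, `deferDelete` and `flushDeletes` are hypothetical names, and `fakeDelete` is a stand-in so the sketch runs without a GL context. In a real renderer the callback would wrap glDeleteBuffers and glDeleteTextures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 64

/* Hypothetical callback type; a GL renderer would wrap its delete calls in one of these. */
typedef void (*DeleteFn)(unsigned int handle);

typedef struct {
    unsigned int handles[MAX_PENDING];
    size_t count;
} PendingDeletes;

/* Called after issuing a draw: remember the handle instead of deleting it right away. */
void deferDelete(PendingDeletes *pending, unsigned int handle) {
    assert(pending->count < MAX_PENDING);
    pending->handles[pending->count++] = handle;
}

/* Called at the start of the next frame: release everything queued during the last one. */
void flushDeletes(PendingDeletes *pending, DeleteFn deleteHandle) {
    for (size_t i = 0; i < pending->count; ++i) {
        deleteHandle(pending->handles[i]);
    }
    pending->count = 0;
}

/* Stand-in for the real delete call; it only counts how many handles were released. */
unsigned int deletedCount = 0;
void fakeDelete(unsigned int handle) {
    (void)handle;
    ++deletedCount;
}
```

Each frame would call `flushDeletes` once before building new buffers, then `deferDelete` for every buffer it creates.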
There is one more piece to the puzzle here. To do things the way I’ve done, I use a modified version of glDrawElements called glDrawElementsInstanced. It is made specifically for instancing: we pass the same arguments as glDrawElements, plus one more argument specifying how many times we want to draw this vertex array. Then in our shader we have a handy built-in variable called gl_InstanceID. This gives us the index of the instance being drawn, which we use to look up the info in our arrays.
For accessing our PVM matrix we use the following code:
```glsl
int offset = 4 * int(gl_InstanceID);
vec4 a = texelFetch(PVMArray, offset + 0);
vec4 b = texelFetch(PVMArray, offset + 1);
vec4 c = texelFetch(PVMArray, offset + 2);
vec4 d = texelFetch(PVMArray, offset + 3);
mat4 PVM = mat4(a, b, c, d);
```
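The indexing is easier to check on the CPU. The sketch below mimics the shader snippet in plain C, assuming the buffer is read as RGBA32F so that one texel is four consecutive floats. `Vec4`, `Mat4`, `fetchTexel` and `fetchInstanceMatrix` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { Vec4 col[4]; } Mat4; /* columns, matching GLSL's mat4(a, b, c, d) */

/* CPU stand-in for texelFetch on an RGBA32F buffer texture:
   texel i covers floats 4*i through 4*i + 3. */
Vec4 fetchTexel(const float *buffer, size_t texelIndex) {
    const float *p = buffer + 4 * texelIndex;
    Vec4 v = { p[0], p[1], p[2], p[3] };
    return v;
}

/* Mirrors the shader: each instance owns four consecutive texels (16 floats). */
Mat4 fetchInstanceMatrix(const float *pvmBuffer, int instanceID) {
    assert(instanceID >= 0);
    size_t offset = 4 * (size_t)instanceID;
    Mat4 m;
    for (size_t i = 0; i < 4; ++i) {
        m.col[i] = fetchTexel(pvmBuffer, offset + i);
    }
    return m;
}
```

The color and UV arrays need no multiplier, since each instance owns exactly one texel in those buffers.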
This wraps up how I’m doing instancing for the game. There are many ways to make use of instancing, and which one fits comes down to how your engine is set up and what you can share amongst render objects. Can you use a texture atlas instead of separate textures? Can you use an uber shader instead of separate shaders? And for unique data, what’s the best way to retrieve it on the GPU?
Some more info on the topic:
OpenGL instancing tutorial
Randy Gaul lecture at DigiPen
NOTE:
I tried using a standard glTexImage2D(GL_TEXTURE_2D… to store the unique data instead of a GL_TEXTURE_BUFFER, but I couldn’t get it to work. I think this is because of the implicit remapping of the values behind the scenes, which GL_TEXTURE_BUFFER doesn’t do.
NOTE: To my dismay GL_TEXTURE_BUFFER isn’t supported on iOS. However, this isn’t applicable now since they’ve moved to Metal (even more dismay :().