Instancing

Oliver  —  1 week ago [Edited 1 hour, 21 minutes later]
When we ask the GPU to render a sprite or mesh to the screen, the biggest bottleneck is often not the actual rendering of the pixels, but sending the data from the CPU to the GPU. What this comes down to is how many draw calls we issue each frame (in OpenGL, a draw call is a call such as glDrawElements). Instancing is an optimisation technique in which we bundle as many render objects as possible (a render object being a sprite or mesh in our game world) into the same draw call.

In a perfect world we would have a draw call for each render object and set the uniforms unique to that render object, i.e. the color, the MVP matrix, the object’s material etc. However, once a scene has more than a handful of render objects, the per-draw-call overhead adds up and the program slows to an unplayable crawl. So we want to share as many things as possible across render objects. The only things that can’t differ within one draw call are the vertex and index arrays we are drawing and the shader we are drawing with. The rest is up for grabs. Even for these two hard constraints there are workarounds, like having an uber mesh or an uber shader. The good news is that 2D games only need one vertex buffer, a quad, and this game uses a single shader for sprites. Given this, there is no reason why we can’t render our whole scene with one draw call. And this is what we’re going to do!

To make our game code as simple as possible, the game code knows nothing about instancing. We can call ‘pushSprite’ or ‘pushMesh’ anywhere in the code, which gets added to a render buffer. Then when we are ready to issue the draw commands to the GPU we call ‘executeRenderGroup’ which does all the heavy lifting.

Since the render objects arrive in no particular order with respect to the sprite or shader they use, we first have to sort them. This is the first part of our instancing algorithm. Our sort function is a quick sort that groups the objects based on a set of criteria. The criteria depend on how your game engine is set up, but my game looks for:

1. Same vertex handle (our quad)
2. Same shader program handle
3. Same texture handle (where a texture atlas comes into play)
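
As a rough illustration of sorting on these three criteria, here is a minimal sketch using the standard library's qsort. The SortKey struct and the function names are hypothetical and stand in for the engine's real render item; the field names are borrowed from the loop shown later in the post. It also counts how many draw calls the sorted list would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down render item: only the three handles used for grouping. */
typedef struct {
    unsigned int bufferHandles; /* vertex/index buffer handle (the quad) */
    unsigned int program;       /* shader program handle */
    unsigned int textureHandle; /* texture atlas handle, 0 when untextured */
} SortKey;

/* Order by buffer, then program, then texture, so items with equal keys end up adjacent. */
static int compareSortKeys(const void *a, const void *b) {
    const SortKey *x = (const SortKey *)a;
    const SortKey *y = (const SortKey *)b;
    if (x->bufferHandles != y->bufferHandles) return x->bufferHandles < y->bufferHandles ? -1 : 1;
    if (x->program != y->program) return x->program < y->program ? -1 : 1;
    if (x->textureHandle != y->textureHandle) return x->textureHandle < y->textureHandle ? -1 : 1;
    return 0;
}

static void sortRenderKeys(SortKey *keys, size_t count) {
    assert(keys != NULL);
    qsort(keys, count, sizeof(SortKey), compareSortKeys);
}

/* One draw call is needed per run of equal keys in the sorted list. */
static size_t countBatches(const SortKey *keys, size_t count) {
    size_t batches = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || compareSortKeys(&keys[i - 1], &keys[i]) != 0) {
            batches++;
        }
    }
    return batches;
}
```

Note that qsort is not a stable sort, so items with equal keys may change their relative order; whether that matters depends on how the engine handles draw order.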

The sort places render objects that match on all of these criteria next to each other. After this we can loop through them and collect the information unique to each render object, like its position and color. This is part two of our instancing algorithm. This loop looks like this:
for(int i = 0; i < group->items.count; ++i) {
    // First item of the batch. Its own PVM, color and UVs are assumed to be
    // added to the arrays before collecting (not shown in this excerpt).
    RenderItem *info = getRenderItem(group, i);

    bool collecting = true;
    while(collecting) {
        RenderItem *nextItem = getRenderItem(group, i + 1);

        // Keep collecting while the next item shares buffers, texture and shader
        if(nextItem &&
           info->bufferHandles == nextItem->bufferHandles &&
           info->textureHandle == nextItem->textureHandle &&
           info->program == nextItem->program) {

            // collect the per-instance data
            addElementInifinteAllocWithCount_(&pvms, nextItem->PVM.val, 16);
            addElementInifinteAllocWithCount_(&colors, nextItem->color.E, 4);

            if(nextItem->textureHandle) {
                addElementInifinteAllocWithCount_(&uvs, nextItem->textureUVs.E, 4);
            } else {
                assert(uvs.count == 0);
            }
            i++;
        } else {
            collecting = false;
        }
    }

    // ... upload the arrays and issue one instanced draw call for this batch (omitted)
}



So here we are looping through, collecting all the unique information and putting it into a stretchy (growable) array. In our engine the three things we need to send to the shader are:
1. PVM matrix to position the object on screen
2. Color tint
3. The texture UV coords since we are using a texture atlas
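
To make the layout of these three arrays concrete, here is a small sketch that packs the per-instance data into flat float arrays. It uses fixed-size arrays instead of the stretchy array from the post, and the InstanceBatch type, the pushInstance function and the MAX_INSTANCES cap are all assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define MAX_INSTANCES 256 /* hypothetical cap; the original uses a growable array */

typedef struct {
    float pvms[MAX_INSTANCES * 16];  /* 16 floats per instance: the PVM matrix */
    float colors[MAX_INSTANCES * 4]; /* 4 floats per instance: RGBA tint */
    float uvs[MAX_INSTANCES * 4];    /* 4 floats per instance: atlas UV rectangle */
    int count;                       /* number of instances collected so far */
} InstanceBatch;

/* Append one instance; instance i ends up at index i*16 (PVM) and i*4 (color, UVs). */
static void pushInstance(InstanceBatch *batch, const float pvm[16],
                         const float color[4], const float uv[4]) {
    assert(batch->count < MAX_INSTANCES);
    memcpy(batch->pvms + batch->count * 16, pvm, 16 * sizeof(float));
    memcpy(batch->colors + batch->count * 4, color, 4 * sizeof(float));
    memcpy(batch->uvs + batch->count * 4, uv, 4 * sizeof(float));
    batch->count++;
}
```

Because every instance occupies a fixed number of floats in each array, the shader can find an instance's data from its index alone.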

The final part of our instancing algorithm is converting this array into a form that the GPU can access. Since uniform arrays in GLSL are limited in size, I’ve gone for storing this info in a texture buffer. What this looks like is:

typedef struct {
    GLuint tbo;    // buffer object that holds the raw data
    GLuint buffer; // buffer texture that exposes tbo to the shader
} BufferStorage;

BufferStorage createBufferStorage(InfiniteAlloc *array) {
    BufferStorage result = {0};

    // Create the buffer object and copy the array into it
    glGenBuffers(1, &result.tbo);
    glBindBuffer(GL_TEXTURE_BUFFER, result.tbo);
    glBufferData(GL_TEXTURE_BUFFER, array->sizeOfMember*array->count, array->memory, GL_DYNAMIC_DRAW);

    // Create the buffer texture and attach the buffer object to it.
    // GL_RGBA32F means each texel is read as four floats (one vec4).
    glGenTextures(1, &result.buffer);
    glBindTexture(GL_TEXTURE_BUFFER, result.buffer);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, result.tbo);

    return result;
}


So it creates a buffer object, copies the array data into it, and attaches it to a GL_TEXTURE_BUFFER texture. We create one of these for each kind of unique info (one for PVM, one for color and one for UV coords), then bind each texture to a texture unit and point the matching samplerBuffer uniform at that unit. We delete the buffers on the next frame, to make sure we aren’t deleting anything in the middle of rendering.

There is one more piece to the puzzle here. To do things the way I’ve done them, I use a modified version of glDrawElements called glDrawElementsInstanced. It is made specifically for instancing: we pass the same arguments as glDrawElements, plus one more argument specifying how many times we want to draw this vertex array. Then in our shader we have a handy predefined variable called gl_InstanceID. This gives us the index of the instance being drawn, which we use to look up the info in our arrays.

For accessing our PVM matrix we use the following code:

	int offset = 4 * int(gl_InstanceID);
	vec4 a = texelFetch(PVMArray, offset + 0);
	vec4 b = texelFetch(PVMArray, offset + 1);
	vec4 c = texelFetch(PVMArray, offset + 2);
	vec4 d = texelFetch(PVMArray, offset + 3);
	
	mat4 PVM = mat4(a, b, c, d);
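
To show what that lookup does to the data, here is a CPU-side sketch in C that mimics it, assuming the PVM array is a flat run of floats exposed as GL_RGBA32F texels (four floats per texel). The Texel and Mat4Columns types and both function names are hypothetical; this is not code from the engine.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Texel;     /* one GL_RGBA32F texel */
typedef struct { Texel col[4]; } Mat4Columns;   /* mat4(a, b, c, d): four columns */

/* Mirrors texelFetch(PVMArray, texelIndex): texel i covers floats [4*i, 4*i + 3]. */
static Texel fetchTexel(const float *data, int texelIndex) {
    const float *p = data + 4 * texelIndex;
    Texel t = { p[0], p[1], p[2], p[3] };
    return t;
}

/* Mirrors the shader: a matrix is four consecutive texels starting at 4 * instanceID. */
static Mat4Columns fetchInstanceMatrix(const float *pvmData, int instanceCount, int instanceID) {
    assert(instanceID >= 0 && instanceID < instanceCount);
    int offset = 4 * instanceID;
    Mat4Columns m;
    for (int c = 0; c < 4; ++c) {
        m.col[c] = fetchTexel(pvmData, offset + c);
    }
    return m;
}
```

Since GLSL's mat4(a, b, c, d) constructor treats each vec4 as a column, the 16 floats per instance need to be stored in column-major order for this to reproduce the matrix.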


This wraps up how I’m doing instancing for the game. There are many ways to make use of instancing, and it comes down to how your engine is set up and what you can share amongst render objects. Can you use a texture atlas instead of separate textures? Can you use an uber shader instead of separate shaders? And for unique data, what’s the best way to retrieve it on the GPU?

Some more info on the topic:
OpenGL instancing tutorial
Randy Gaul lecture at DigiPen

NOTE:
I tried using a standard glTexImage2D(GL_TEXTURE_2D… to store the unique data instead of a GL_TEXTURE_BUFFER, but I couldn’t get it to work. I think this is because of the implicit remapping of the values behind the scenes, which GL_TEXTURE_BUFFER doesn’t do.

NOTE: To my dismay GL_TEXTURE_BUFFER isn’t supported on iOS. However, this matters less now since they’ve moved to Metal (even more dismay :().

#17195
Mārtiņš Možeiko  —  1 week ago [Edited 2 minutes later]

IMHO a much cleaner way to use matrices in instancing is glVertexAttribDivisor. Stick the matrices in a regular vertex array buffer, one for each instance, and call glVertexAttribDivisor (4 times, because a mat4 takes 4 attribute slots) with the divisor set to 1. Then you'll have a regular "in mat4 PVM;" attribute in the vertex shader. No need to deal with texture fetches. Same for other non-PVM attributes that are unique per mesh/model.
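
A sketch of what that setup could look like, kept free of a GL context so it can be run on its own: it only computes the four attribute slots a mat4 needs, and the comment shows the GL calls each slot would map to. The InstanceData layout, the AttribSlot struct and the function name are assumptions made for the example, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-instance record, interleaved in one GL_ARRAY_BUFFER. */
typedef struct {
    float pvm[16];
    float color[4];
    float uvAtlas[4];
} InstanceData;

typedef struct {
    int location;   /* attribute slot index */
    int components; /* 4 floats per slot (one matrix column) */
    size_t stride;  /* bytes from one instance record to the next */
    size_t offset;  /* byte offset of this column inside the record */
    int divisor;    /* 1 = advance once per instance, not per vertex */
} AttribSlot;

/* Describe the four consecutive slots a mat4 attribute occupies. Returns slots written. */
static int describeMat4Attrib(int baseLocation, size_t baseOffset, AttribSlot out[4]) {
    assert(baseLocation >= 0);
    for (int c = 0; c < 4; ++c) {
        out[c].location = baseLocation + c;
        out[c].components = 4;
        out[c].stride = sizeof(InstanceData);
        out[c].offset = baseOffset + (size_t)c * 4 * sizeof(float);
        out[c].divisor = 1;
        /* With a GL context and the instance buffer bound, each slot maps to:
             glEnableVertexAttribArray(location);
             glVertexAttribPointer(location, 4, GL_FLOAT, GL_FALSE,
                                   (GLsizei)stride, (const void *)offset);
             glVertexAttribDivisor(location, 1);                          */
    }
    return 4;
}
```

With this layout the color starts 16 floats into the record and uvAtlas 20 floats in, so a single vec4 attribute such as the color needs only one slot with the same stride and divisor.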
#17196
Oliver  —  6 days, 10 hours ago [Edited 0 minutes later]
That does sound cleaner. So is that a separate VAO buffer from the one that is used for the mesh?
#17197
Mārtiņš Možeiko  —  6 days, 10 hours ago
VAO does not matter here. What you should be thinking about is the GL_ARRAY_BUFFER buffer. You can put the matrices in the same buffer as your main mesh data as long as you set up the glVertexAttribPointer offset correctly. Or just put them in a new buffer at offset 0, then update the whole buffer and remember to bind it before setting the attrib pointers.
#17199
Oliver  —  5 days, 17 hours ago [Edited 0 minutes later]
And each frame would I do a new glBufferData call for the same VBO to update it with the new matrix values? And am I right to assume this new glBufferData call deletes the old buffer data? Thanks mmozeiko for the help.
#17200
ratchetfreak  —  5 days, 17 hours ago
OliverMarsh
And each frame would I do a new glBufferData call for the same VBO to update it with the new matrix values? And am I right to assume this new glBufferData call deletes the old buffer data? Thanks mmozeiko for the help.


Yes, glBufferData will delete the old data. glBufferSubData will overwrite the old data while keeping the allocation (though the buffer may get a shadow copy to avoid needing to wait on the GPU).
#17202
Oliver  —  5 days, 14 hours ago [Edited 14 minutes later]
I was also wondering with this approach: since a VAO might be used to draw more than once in the frame, would I have to unbind the last VBO and bind a new one before each glDrawElementsInstanced call, since if I use the same VBO handle I would overwrite the last instance batch’s data?
When is a safe place to delete the buffer data: after the glDrawElements call or after the swapWindow call?
#17203
ratchetfreak  —  5 days, 14 hours ago
You can safely overwrite the previous draw call's data like that in OpenGL. That guarantee is part of its immediate mode roots.

If the draw call isn't done yet the driver will duplicate the buffer transparently.

#17205
Oliver  —  5 days, 9 hours ago
Thank you for the help ratchetfreak, I’ll see how I go.
#17207
Raytio  —  4 days, 9 hours ago [Edited 0 minutes later]
ratchetfreak
You can safely overwrite the previous draw call's data like that in OpenGL. That guarantee is part of its immediate mode roots.

If the draw call isn't done yet the driver will duplicate the buffer transparently.



Also, you can take this approach and add several buffers that you rotate through each frame, so you don't get sucky perf hits when OpenGL starts to make copies behind your back. Although I am not sure you can avoid this easily in OpenGL.
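
A minimal sketch of that rotation, assuming a ring of three buffers. The BufferRing type, the nextBuffer function and the ring size are made up for the example; whether three buffers is enough to stay ahead of the GPU depends on the driver and is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFER_RING_SIZE 3 /* hypothetical: frames the GPU may still be reading from */

typedef struct {
    unsigned int handles[BUFFER_RING_SIZE]; /* would be filled in by glGenBuffers */
    unsigned int frameIndex;                /* increases by one per frame */
} BufferRing;

/* Return the buffer to write into this frame, then advance to the next one. */
static unsigned int nextBuffer(BufferRing *ring) {
    assert(ring != NULL);
    unsigned int handle = ring->handles[ring->frameIndex % BUFFER_RING_SIZE];
    ring->frameIndex++;
    return handle;
}
```

Each frame the instance data would be uploaded into the handle returned by nextBuffer, so a buffer is only written again after BUFFER_RING_SIZE frames have passed.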
#17208
ratchetfreak  —  4 days, 6 hours ago
godratio


Also, you can take this approach and add several buffers that you rotate through each frame, so you don't get sucky perf hits when OpenGL starts to make copies behind your back. Although I am not sure you can avoid this easily in OpenGL.


Avoiding that overhead means diving into AZDO (Approaching Zero Driver Overhead). But it kinda means poking at a black box and hoping the internals work out like you want them to.
#17224
Oliver  —  6 hours, 8 minutes ago
I moved over to using the above method, and it works fine. However, I did run into an interesting bug. I draw with two different shaders, one for drawing textures and one for drawing colored quads. The colored quad shader doesn't have an attribute for the UV coordinates (used for looking up in the texture atlas), but they both have one for the PVM matrix and the color tint.

in mat4 PVM;
in vec4 color1;
in vec4 uvAtlas; //not in the colored quad shader


On my Windows computer the scene rendered fine and there were no problems. But then I ran it on my Mac and the textures were rendering but the colored quads weren't. The only difference is that I branch on whether the item is a texture or not. If it is, I set the UV attribute; if it isn't, I don't.

if(isTexture) {
    GLint UVattrib = getAttribFromProgram(program, "uvAtlas").handle;
    glEnableVertexAttribArray(UVattrib);
    // Offset of 20 floats: presumably the 16 PVM floats plus the 4 color floats that come first
    glVertexAttribPointer(UVattrib, 4, GL_FLOAT, GL_FALSE, offsetForStruct, ((char *)0) + sizeof(float)*20);
    glVertexAttribDivisor(UVattrib, 1);
}


So with this branch it didn't draw the quads on mac. So I put a dummy attribute in the quad shader for the uv coordinates and commented out the if statement and it worked.
in mat4 PVM;
in vec4 color1;
in vec4 uvAtlas; //now in both shaders

So I'm not sure what caused the bug.