AtomicCounters & IndirectBufferCommands

I’ve made use of atomic counters and indirect buffers in the past, but always in the most straightforward manner: create a dedicated buffer for the atomic counter and another for the indirect command buffer, increment the counter in a shader, then write the counter value into the indirect command buffer using the image API. The result is a shader that looks something like the one below.

#version 420

layout(location = 0) in ivec3 inputBuffer;

layout(r32ui, binding = 0) uniform uimageBuffer outputBuffer;
layout(r32ui, binding = 1) uniform uimageBuffer indirectArrayCommand;
layout(       binding = 0) uniform atomic_uint  atomicCounter;

void main()
{
	// ...
	// do some stuff
	// ...

	if(someCondition == true)
	{
		//increment counter
		int index = int(atomicCounterIncrement(atomicCounter));

		//store stuff in output buffer
		imageStore(outputBuffer, index, uvec4(someStuff));
	}

	memoryBarrier();

	//Store the atomicCounter value to the count (the first element) of the DrawArraysIndirect command
	imageStore(indirectArrayCommand, 0, uvec4(atomicCounter(atomicCounter)));
}

This works fine, but one annoying thing about this approach is that the indirect buffer consumes an extra image unit (of the maximum of 8 available). Fortunately, it turns out that the dedicated atomic counter buffer and the copy into the indirect draw command are unnecessary: the appropriate element of the indirect draw buffer can be bound directly as the atomic counter.

// This binds the count element of the Indirect Array Command Buffer directly as an atomic counter in the shader
// (no need for copy from dedicated atomic counter)
glBindBufferRange(GL_ATOMIC_COUNTER_BUFFER,        // Target is the atomic counter buffer binding
                  0,                               // Binding point, must match the shader
                  IndirectArrayCommandBuffer_id,   // Source buffer is the Indirect Draw Command Buffer
                  0,                               // Byte offset: 0 for count, sizeof(GLuint) for primCount (instances), etc...
                  sizeof(GLuint));                 // Size of the bound range: a single GLuint
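
Because the offset passed to glBindBufferRange is measured in bytes, it can help to derive it from the command layout rather than hard-coding it. The following is a minimal sketch in C, assuming the OpenGL 4.2 layout of the draw-arrays indirect command. `uint32_t` stands in for `GLuint` so it compiles without GL headers, and the helper `count_offset_for_command` is a hypothetical name for illustration, not part of OpenGL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of the command consumed by glDrawArraysIndirect (OpenGL 4.2).
   uint32_t is used in place of GLuint (assumption: both are 32 bits). */
typedef struct {
    uint32_t count;         /* number of vertices to draw   */
    uint32_t primCount;     /* number of instances          */
    uint32_t first;         /* index of the first vertex    */
    uint32_t baseInstance;  /* base instance                */
} DrawArraysIndirectCommand;

/* The command must be four tightly packed 32-bit values. */
static_assert(sizeof(DrawArraysIndirectCommand) == 4 * sizeof(uint32_t),
              "indirect command must be tightly packed");

/* Byte offsets usable as the offset argument of glBindBufferRange. */
enum {
    COUNT_OFFSET      = offsetof(DrawArraysIndirectCommand, count),
    PRIM_COUNT_OFFSET = offsetof(DrawArraysIndirectCommand, primCount)
};

/* Hypothetical helper: byte offset of the count field of the i-th command
   when several commands are stored back to back in one buffer. */
static size_t count_offset_for_command(size_t command_index)
{
    return command_index * sizeof(DrawArraysIndirectCommand)
         + offsetof(DrawArraysIndirectCommand, count);
}
```

With this in place, the call above would pass `COUNT_OFFSET` to count vertices or `PRIM_COUNT_OFFSET` to count instances.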

This lets us drop the indirect buffer’s image unit binding, which simplifies the shader as shown below. The main reason I’ve found to do this is to reduce the number of image units the shader requires, as it’s very easy to hit the limit of 8.

#version 420

layout(location = 0) in ivec3 inputBuffer;

layout(r32ui, binding = 0) uniform uimageBuffer outputBuffer;
layout(       binding = 0) uniform atomic_uint  atomicCounter;

void main()
{
	// ...
	// do some stuff
	// ...

	if(someCondition == true)
	{
		//increment counter
		int index = int(atomicCounterIncrement(atomicCounter));

		//store stuff in output buffer
		imageStore(outputBuffer, index, uvec4(someStuff));
	}
}
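
To make the data flow concrete, here is a rough CPU-side analogy in C of what the shader above does once the count field doubles as the atomic counter. It is only a sketch: the struct, the `invoke` and `run_pass` functions, and the even-number test standing in for `someCondition` are all hypothetical, and the reset at the start of each pass is an assumption about usage that the post itself does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* CPU stand-in for the indirect command buffer. The count field is the
   atomic counter itself, so no separate counter or copy step exists. */
typedef struct {
    atomic_uint count;
    uint32_t    primCount;
    uint32_t    first;
    uint32_t    baseInstance;
} IndirectCommand;

/* Mirrors one shader invocation. */
static void invoke(IndirectCommand *cmd, uint32_t *output, uint32_t value)
{
    if (value % 2u == 0u) {  /* hypothetical stand-in for someCondition */
        /* Like atomicCounterIncrement, atomic_fetch_add returns the value
           held before the increment, which serves as the output slot. */
        uint32_t index = atomic_fetch_add(&cmd->count, 1u);
        output[index] = value;  /* stand-in for imageStore */
    }
}

/* Runs n invocations and returns the count an indirect draw would read. */
static uint32_t run_pass(IndirectCommand *cmd, uint32_t *output, uint32_t n)
{
    atomic_store(&cmd->count, 0u);  /* assumed: counter is reset per pass */
    for (uint32_t i = 0; i < n; ++i) {
        invoke(cmd, output, i);
    }
    return atomic_load(&cmd->count);
}
```

After a pass, `count` already holds the number of elements written, which is exactly the value the indirect draw consumes.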

3 thoughts on “AtomicCounters & IndirectBufferCommands”

    Joe said:
    July 17, 2014 at 9:40 am

    Hi there,

    i want to use something similar – maybe you can help me out…
    I have some thousand samples, search for close neighbours, and later each sample will process neighbours in the next compute shader.

    Now i have 3 versions of this next shader with local sizes of 64, 128, 256, and i want to process each sample according to number of neighbours, so a sample with 200 neighbours should go to shader no. 2.

    My idea is to write 3 lists, atomically incrementing the x element of an indirect dispatch command for each.
    Later i can indirect dispatch the 3 processing shaders without any data transfer to the host.

    But then i realised a problem:
    It is not guaranteed that all of the 3 lists have any content, and it is not allowed to do glDispatchCompute (0, 1, 1), so i assume it’s also not allowed to do this with glDispatchComputeIndirect.

    I concluded it is better to transfer the 3 list counters to the host after the neighbour searching, so i can check and avoid to process an empty list. But then i don’t see any need to use indirect dispatch at all 😦

    Maybe you have any idea about that problem case – i’m new to GPUs and might miss something…

    Great blog!
    Joe

    randallr responded:
    July 18, 2014 at 4:55 pm

    I’m not sure I understand your problem here, but the number of work groups launched is the product of the values passed into glDispatchCompute, so you are essentially telling it to launch 0 work groups.

      Joe said:
      July 19, 2014 at 1:49 am

      Yes i’ve tried it and it works for me. My worries are that if you launch 0 work groups using glDispatchCompute you get a GL error. glDispatchComputeIndirect has no error checking, so i can’t be 100% sure it works on every driver, but i’ll just assume no one has a problem with it.
