CUDA 5 and OpenGL Interop and Dynamic Parallelism


I seem to revisit this topic every time Nvidia releases a new version of CUDA.

The good news…

The old methods still work. The whole register, map, bind, etc. process I described in my now two-year-old post, Writing to 3D OpenGL textures in CUDA 4.1 with 3D Surface writes, still works. Ideally, a new version number shouldn’t introduce any new problems…

The bad news…

Unfortunately, if you try to write to a globally scoped CUDA surface from a kernel launched device-side (i.e. a dynamic kernel), nothing will happen. You’ll scratch your head and wonder why code that works perfectly fine when launched from the host fails silently when launched from the device.

I only discovered the reason when I decided to read, word for word, the CUDA Dynamic Parallelism Programming Guide. On page 14, in the “Textures & Surfaces” section is this note:

NOTE: The device runtime does not support legacy module-scope (i.e. Fermi-style)
textures and surfaces within a kernel launched from the device. Module-scope (legacy)
textures may be created from the host and used in device code as for any kernel, but
may only be used by a top-level kernel (i.e. the one which is launched from the host).

So now the old way of dealing with textures is considered “legacy”, but apparently not quite deprecated yet. Don’t use them if you plan on using dynamic parallelism. Additional note: if a device-launched kernel so much as calls a function that attempts to perform a “Fermi-style” surface write, your kernel will fail silently, so I highly recommend removing all “Fermi-style” textures and surfaces if you plan on using dynamic parallelism.

So what’s the “new style” of textures and surfaces? Well, also on page 14 is a footnote saying:

Dynamically created texture and surface objects are an addition to the CUDA memory model
introduced with CUDA 5.0. Please see the CUDA Programming Guide for details.

So I guess they’re called “Dynamically created textures and surfaces”, which is a mouthful so I’m going to refer to them as “Kepler-style” textures and surfaces.  In the actual API they are cudaTextureObject_t and cudaSurfaceObject_t, and you can pass them around as parameters instead of having to declare them at file scope.
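As a rough sketch of why this matters for dynamic parallelism: because a cudaSurfaceObject_t is just a value, a parent kernel can hand it to a child kernel that it launches from the device. The example below is an illustration under assumptions, not code from the original post. The kernel and helper names (childWrite, parentLaunch, writeFromChildKernel) are hypothetical, and it uses a plain cudaMallocArray allocation rather than an OpenGL texture so that it stands alone.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Child kernel: receives the surface object as an ordinary parameter.
__global__ void childWrite(cudaSurfaceObject_t surf, int width, int height, float value)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // The x coordinate of a surface write is given in bytes, not elements.
        surf2Dwrite(value, surf, x * (int)sizeof(float), y);
    }
}

// Parent kernel: a single thread launches the child from the device.
__global__ void parentLaunch(cudaSurfaceObject_t surf, int width, int height, float value)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        childWrite<<<grid, block>>>(surf, width, height, value);
    }
}

// Hypothetical host helper: fills a width x height float array via the
// device-launched child kernel and returns a host copy of the result.
std::vector<float> writeFromChildKernel(int width, int height, float value)
{
    std::vector<float> host(static_cast<size_t>(width) * height, 0.0f);

    // The array must be created with the surface load/store flag.
    cudaChannelFormatDesc fmt = cudaCreateChannelDesc<float>();
    cudaArray_t array = nullptr;
    if (cudaMallocArray(&array, &fmt, width, height, cudaArraySurfaceLoadStore) != cudaSuccess)
        return host;

    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;

    cudaSurfaceObject_t surf = 0;
    if (cudaCreateSurfaceObject(&surf, &resDesc) == cudaSuccess) {
        parentLaunch<<<1, 1>>>(surf, width, height, value);
        cudaDeviceSynchronize();  // waits for the parent and its child grid

        cudaMemcpy2DFromArray(host.data(), width * sizeof(float), array, 0, 0,
                              width * sizeof(float), height, cudaMemcpyDeviceToHost);
        cudaDestroySurfaceObject(surf);
    }
    cudaFreeArray(array);
    return host;
}
```

Calling writeFromChildKernel(8, 4, 1.0f) should return 32 floats that are all 1.0f if the child kernel's writes landed. Device-side launches need relocatable device code and the device runtime library, so a build line along the lines of nvcc -rdc=true example.cu -lcudadevrt, with an -arch setting for compute capability 3.5 or higher, is assumed.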

OpenGL Interop

So now we have two distinct methods for dealing with textures and surfaces, “Fermi-style” and “Kepler-style”, but we only know how graphics interoperability works with the old, might-as-well-be-deprecated, “Fermi-style” textures and surfaces.

And while there are some samples showing how the new “Kepler-style” textures and surfaces work (see the Bindless Texture sample), all the interop information still seems to target the old “Fermi-style” textures and surfaces.  Fortunately, there is some common ground between “Kepler-style” and “Fermi-style” textures and surfaces, and that common ground is the cudaArray.

Really, all we have to do is replace Step 6 (binding a cudaArray to a globally scoped surface) from the previous tutorial with the creation of a cudaSurfaceObject_t. That entails creating a CUDA resource description (cudaResourceDesc) and setting its array member to our cudaArray. We then use that cudaResourceDesc to create our cudaSurfaceObject_t, which we can pass to our kernels and use to write to our registered and mapped OpenGL textures.

// Create the cuda resource description
struct cudaResourceDesc resourceDescription;
memset(&resourceDescription, 0, sizeof(resourceDescription));
resourceDescription.resType = cudaResourceTypeArray;	// be sure to set the resource type to cudaResourceTypeArray
resourceDescription.res.array.array = yourCudaArray;	// this is the important bit

// Create the surface object
cudaSurfaceObject_t writableSurfaceObject = 0;
cudaCreateSurfaceObject(&writableSurfaceObject, &resourceDescription);
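For context, here is one way the snippet above might sit inside a per-frame interop routine. This is a sketch under assumptions rather than the original tutorial's code. It assumes a 2D RGBA float texture that was already registered with cudaGraphicsGLRegisterImage using the cudaGraphicsRegisterFlagsSurfaceLoadStore flag, and the helper names (makeSurfaceObject, writeToMappedTexture, clearSurface) are hypothetical. The CUDA documentation notes that the array returned by cudaGraphicsSubResourceGetMappedArray may change every time the resource is mapped, so the sketch creates the surface object after each map and destroys it before unmapping.

```cuda
#include <cuda_runtime.h>
#include <cstring>

// Example kernel: writes one color to every texel through the surface object.
__global__ void clearSurface(cudaSurfaceObject_t surf, int width, int height, float4 color)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // The x coordinate of a surface write is given in bytes.
        surf2Dwrite(color, surf, x * (int)sizeof(float4), y);
    }
}

// Hypothetical helper: wraps a cudaArray in a surface object.
cudaError_t makeSurfaceObject(cudaArray_t array, cudaSurfaceObject_t* surf)
{
    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;
    return cudaCreateSurfaceObject(surf, &resDesc);
}

// Hypothetical per-frame routine. `resource` is assumed to have been
// registered earlier (register step of the previous tutorial).
cudaError_t writeToMappedTexture(cudaGraphicsResource_t resource,
                                 int width, int height, float4 color)
{
    cudaError_t err = cudaGraphicsMapResources(1, &resource, 0);
    if (err != cudaSuccess) return err;

    // Fetch the array for this mapping; the handle may differ between maps.
    cudaArray_t array = nullptr;
    err = cudaGraphicsSubResourceGetMappedArray(&array, resource, 0, 0);

    cudaSurfaceObject_t surf = 0;
    if (err == cudaSuccess) err = makeSurfaceObject(array, &surf);

    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        clearSurface<<<grid, block>>>(surf, width, height, color);
        err = cudaDeviceSynchronize();
        cudaDestroySurfaceObject(surf);
    }

    // Always unmap so OpenGL can use the texture again.
    cudaGraphicsUnmapResources(1, &resource, 0);
    return err;
}
```

Recreating the surface object on every map is a conservative choice. Caching it while the array handle stays the same would also be possible, at the cost of tracking the handle yourself.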

And that’s it! Here’s hoping the API doesn’t change again anytime soon.

3 thoughts on “CUDA 5 and OpenGL Interop and Dynamic Parallelism”

    […] Edit: For how this works in CUDA 5 see my new post CUDA 5 and OpenGL Interop and Dynamic Parallelism. […]

      Ulf said:
      October 24, 2014 at 9:50 am

      I just came across your posting about “Writing to 3D OpenGL textures…”. That is exactly what I am looking for. Very nice.
      However, I did not get it running correctly on CUDA 6.5. (always returning with error message “Not 1.0f, failed writing to texture”).
      Maybe you are able to post a complete working example. That would be great.

    jo said:
    August 30, 2013 at 5:59 am

    What do you do about cudaGraphicsSubResourceGetMappedArray() returning a different array handle each time you call it? Docs say it can. Somehow rebind the texture object, or create a new one each time it’s a different handle?
