I seem to revisit this every time Nvidia releases a new version of CUDA.
The good news…
The old methods still work: the whole register, map, bind, etc. process I described in my now two-year-old post Writing to 3D OpenGL textures in CUDA 4.1 with 3D Surface writes still works. Ideally, a new version number shouldn’t introduce any new problems…
The bad news…
Unfortunately, if you try to write to a globally scoped CUDA surface from a device-side launched kernel (i.e. a dynamic kernel), nothing will happen. You’ll scratch your head and wonder why code that works perfectly fine when launched from the host side fails silently when launched from the device side.
I only discovered the reason when I decided to read, word for word, the CUDA Dynamic Parallelism Programming Guide. On page 14, in the “Textures & Surfaces” section is this note:
NOTE: The device runtime does not support legacy module-scope (i.e. Fermi-style)
textures and surfaces within a kernel launched from the device. Module-scope (legacy)
textures may be created from the host and used in device code as for any kernel, but
may only be used by a top-level kernel (i.e. the one which is launched from the host).
So now the old way of dealing with textures is considered “legacy” but apparently not quite deprecated yet. Don’t use them if you plan on using dynamic parallelism. Additional note: if you so much as call a function that attempts to perform a “Fermi-style” surface write, your kernel will fail silently, so I highly recommend removing all “Fermi-style” textures and surfaces if you plan on using dynamic parallelism.
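To illustrate the failure mode, here’s a minimal sketch of the legacy pattern that silently does nothing when the write happens inside a device-launched kernel (the kernel names are hypothetical, and this assumes the file-scope surface was bound on the host as in the old tutorial):

```cuda
// Legacy "Fermi-style": surface reference declared at module (file) scope
surface<void, cudaSurfaceType3D> legacySurface;

__global__ void childKernel()
{
    // Works when childKernel is launched from the host, but silently
    // does nothing when childKernel is launched from another kernel.
    float value = 1.0f;
    surf3Dwrite(value, legacySurface, 0, 0, 0);
}

__global__ void parentKernel()
{
    // Device-side (dynamic) launch: the surface write above is lost
    childKernel<<<1, 1>>>();
}
```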
So what’s the “new style” of textures and surfaces? Well, also on page 14 is a footnote saying:
Dynamically created texture and surface objects are an addition to the CUDA memory model
introduced with CUDA 5.0. Please see the CUDA Programming Guide for details.
So I guess they’re called “dynamically created textures and surfaces”, which is a mouthful, so I’m going to refer to them as “Kepler-style” textures and surfaces. In the actual API they are cudaSurfaceObject_t, and you can pass them around as parameters instead of having to declare them at file scope.
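As a sketch of what that looks like, a kernel can now take the surface object as an ordinary argument (the kernel name and dimension parameters here are my own, not from the API):

```cuda
__global__ void writeToSurface(cudaSurfaceObject_t surface,
                               int width, int height, int depth)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;

    if (x < width && y < height && z < depth)
    {
        float value = 1.0f;
        // Note: the x coordinate is given in bytes for surface writes
        surf3Dwrite(value, surface, x * sizeof(float), y, z);
    }
}
```

Because the surface object is just a parameter, it works the same whether the kernel is launched from the host or from another kernel.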
So now we have two distinct methods for dealing with textures and surfaces, “Fermi-style” and “Kepler-style”, but we only know how graphics interoperability works with the old, might-as-well-be-deprecated, “Fermi-style” textures and surfaces.
And while there are some samples showing how the new “Kepler-style” textures and surfaces work (see the Bindless Texture sample), all the interop information still seems to target the old “Fermi-style” textures and surfaces. Fortunately, there is some common ground between “Kepler-style” and “Fermi-style” textures and surfaces, and that common ground is the cudaArray.
Really, all we have to do is replace Step 6 (binding a cudaArray to a globally scoped surface) from the previous tutorial with the creation of a cudaSurfaceObject_t. That entails creating a CUDA resource description (cudaResourceDesc); all we have to do is appropriately set the array portion of the cudaResourceDesc to our cudaArray, then use that cudaResourceDesc to create our cudaSurfaceObject_t, which we can then pass to our kernels and use to write to our registered and mapped OpenGL textures.
// Create the CUDA resource description
struct cudaResourceDesc resourceDescription;
memset(&resourceDescription, 0, sizeof(resourceDescription));
resourceDescription.resType = cudaResourceTypeArray; // be sure to set the resource type to cudaResourceTypeArray
resourceDescription.res.array.array = yourCudaArray; // this is the important bit

// Create the surface object
cudaSurfaceObject_t writableSurfaceObject = 0;
cudaCreateSurfaceObject(&writableSurfaceObject, &resourceDescription);
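A hedged sketch of how the pieces fit together once the OpenGL texture has been registered and mapped and its cudaArray retrieved (the kernel name writeToSurface, the launch dimensions, and yourCudaArray are placeholders, not part of the API):

```cuda
// Launch a kernel that takes the surface object as a plain parameter
writeToSurface<<<gridDim, blockDim>>>(writableSurfaceObject, width, height, depth);
cudaDeviceSynchronize();

// Clean up when done, before unmapping the OpenGL resource
cudaDestroySurfaceObject(writableSurfaceObject);
```

Since the surface object is created fresh each frame from the mapped cudaArray, there’s no file-scope state to bind, which is exactly what makes it safe to use with dynamic parallelism.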
And that’s it! Here’s hoping the API doesn’t change again anytime soon.
I finally got a GPU capable of dynamic parallelism, so I decided to mess around with CUDA 5. But I discovered a couple of configuration options that are required if you want to enable dynamic parallelism. You’ll know you haven’t configured things correctly if you attempt to call a kernel from the device and you get the following error message:
ptxas : fatal error : Unresolved extern function 'cudaGetParameterBuffer'
Note: this assumes you have already selected the appropriate CUDA 5 build customizations for your project.
Open the project properties
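For reference, the equivalent command-line build outside Visual Studio needs a compute capability 3.5+ target, relocatable device code, and the device runtime library. A sketch, assuming your source file is named kernel.cu:

```shell
nvcc -arch=sm_35 -rdc=true kernel.cu -o app -lcudadevrt
```

Without -rdc=true and -lcudadevrt, device-side launches produce exactly the unresolved cudaGetParameterBuffer error shown above.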