Wednesday, March 3, 2010

Mixing NVIDIA Technologies thanks to CgFX



I recently found it very interesting to push the limits of the NVIDIA CgFX runtime.
Cg is the equivalent of HLSL, but it targets OpenGL (GLSL and other profiles) as well as DirectX (9, 10, 11), and it is multi-platform (Windows, Mac OS X, Linux).
Unfortunately, Cg never got the success it deserved, despite offering genuinely interesting features. But I am confident that things will change...

On top of simple shader declarations, Cg offers a more general layer: CgFX. This layer can be considered the equivalent of what Microsoft exposes as "D3D Effects", but more powerful.

I decided to push the limits of CgFX and see how we could take full advantage of its flexibility.

Spotlight on a few cool CgFX features

CgFX offers some interesting features that very few developers have explored:
  • Interfaces: very useful for adding an abstraction layer between what we want to compute and how it will be computed. For example, lighting can be done this way: lights are exposed in the shader code as interfaces. The implementation of these lights (spot, point, directional...) is hidden and decoupled from the shader that uses the interface (see the sketch after this list).
  • Unsized arrays of parameters and/or interfaces: a good example of this feature is storing the light interfaces in such an array. The shader doesn't have to know how many lights are in the array, and the array doesn't have to care about the implementation of the lights it holds. So the array is a good way to store an arbitrary number of lights of various types.
  • Custom states: states are used in the passes of the techniques. CgFX allows you to customize states and to create new ones. You can easily define special states that have no meaning in D3D or OpenGL, but have a meaning for what your application needs to do in the pass. These states are a perfect way to tell the application what to do in each pass. For example, you could create a state "projection_mode" that would allow the effect to tell the application to switch from perspective to orthographic projection... And by the way, the regular OpenGL & D3D states in Cg are also custom states, except that they are shipped and implemented by the Cg runtime :-)
  • Semantics: a very simple feature... but it allowed me to simulate handles of objects (more details below)
  • A hairy API... but one that gives you lots of ways to dig into the loaded effect. People often complain that the Cg API looks bad or is too complex. I somewhat agree. But my recent experience showed that this API in fact offers a lot of interesting ways to do what you need. As usual, there is no magic: flexibility often leads to a complex API.
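To make the interface and unsized-array features more concrete, here is a minimal sketch of what such an effect fragment could look like (the names Light, PointLight, gLights and shadeSurface are purely illustrative, not taken from a real sample):

    // Abstract light: material shaders only see this interface.
    interface Light
    {
        float3 illuminate(float3 worldPos, float3 normal);
    };

    // One possible implementation; spot, directional, etc. would be other structs.
    struct PointLight : Light
    {
        float3 position;
        float3 color;
        float3 illuminate(float3 worldPos, float3 normal)
        {
            float3 L = normalize(position - worldPos);
            return color * max(dot(normal, L), 0);
        }
    };

    // Unsized array: the application sizes it (cgSetArraySize) and fills it with
    // concrete light implementations; the shader never knows how many there are.
    Light gLights[];

    float3 shadeSurface(float3 worldPos, float3 normal)
    {
        float3 result = 0;
        for (int i = 0; i < gLights.length; ++i)   // array size resolved once the app sizes the array
            result += gLights[i].illuminate(worldPos, normal);
        return result;
    }
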
Pushing the limits

The cool features enumerated above allowed me to start thinking about flexible approaches to complex scene rendering.

Scene-level effect
Something that has always bothered me is that DirectX and Cg samples only ever show how to use effects (so, techniques and passes) for objects; more precisely, how to use effects for object materials.
I considered another approach: why not use two levels of effects:
  • The first level is the common way: provide an effect for each material. For example, an effect would be available for plastic shading, for objects that need a plastic look, etc.
  • The second level of effects acts at the scene level: prior to rendering any object, why not use the technique/pass paradigm to tell the application how it needs to process the scene?
This scene-level effect makes total sense. Here is an example.
If the application needs to render a complex scene (made of intermediate passes, like post-processing passes), the technique/pass paradigm matches perfectly what we want.
So, let's decide to create an effect for the whole scene, where many techniques will be available for many kinds of scene-level processing. Let's say we created a technique for transparent & opaque objects, with some special post-processing (say, depth of field); a sketch of such a technique follows the list:
  • Pass 1: render the opaque objects into a specific render target; let's also store the camera distance of each pixel in another render target (a texture)
  • Pass 2: render the transparent objects into another render target (a texture), but keep the depth buffer from the opaque objects so that we get a consistent result
  • Pass 3: perform a fullscreen-quad operation that reads the opaque and transparent textures and does the compositing. This operation makes it possible, for example, to fake some refraction on transparent triangles thanks to a bump map
  • Pass 4: use the camera distance stored earlier to perform another pass that blurs the scene according to some depth-of-field equations
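Just as a rough illustration of the idea (not actual working code), such a scene-level technique could look like the sketch below. All the pass state names here (RenderGroup, RenderTargetColor, SharedDepthBuffer, BlitFullscreenQuad) are custom states that the application would have to register and interpret, and the referenced fragment programs and render-target parameters are assumed to be declared elsewhere in the effect:

    technique OpaqueTransparentWithDOF
    {
        pass renderOpaque            // Pass 1
        {
            RenderGroup        = "opaque";            // ask the app to traverse opaque objects only
            RenderTargetColor  = "sceneColorRT";      // off-screen color target
            RenderTargetAux    = "cameraDistanceRT";  // per-pixel camera distance
        }
        pass renderTransparent       // Pass 2
        {
            RenderGroup        = "transparent_back_to_front";
            RenderTargetColor  = "transparentRT";
            SharedDepthBuffer  = "sceneDepth";        // reuse the opaque depth buffer
        }
        pass composite               // Pass 3
        {
            BlitFullscreenQuad = true;
            FragmentProgram    = compile latest compositeOpaqueTransparentPS();
        }
        pass depthOfField            // Pass 4
        {
            BlitFullscreenQuad = true;
            FragmentProgram    = compile latest depthOfFieldPS();
        }
    }
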
It is important to mention here that the job of Pass 1 and Pass 2 is simply to set up how the application is going to render the scene graph (or whatever you call it): Pass 1 asks the application to render the scene graph with opaque objects only, into a specific render target. Pass 2 asks the scene graph to render only the transparent objects (from back to front...).

Another important point: Pass 1 and Pass 2 will trigger the rendering of complex scenes which are themselves made of other CgFX effects for materials. For example, Pass 1 will trigger a scene graph where many CgFX effects are needed to render good-looking materials (plastic, car paint...).
As you can see, a Pass of an effect can nest other effects made of other techniques and passes...

Making things consistent

The main problem I had to solve was: how do we make sure that the nested material effects are consistent with the scene-level effects?

Let's use an example to explain the issue:
Let's assume a scene-level effect asked to render the whole scene as usual (RGBA colors...), but this pass also asked to store the camera distance for each rendered pixel (so that we can do a depth-of-field post-process in a later pass). How can the underlying effects be aware that they need to output more than RGBA? How can I tell a material effect that it needs to store the distance to another output, keeping in mind that a material effect is a self-sufficient effect file and doesn't know anything about the scene-level effect file?

The solution is to use annotations and interfaces (a sketch follows the list):
  • The scene-level pass uses an annotation to tell the underlying effects which interface implementation should be used to generate the pixels. In other words, a special interface is defined, and material effects have to go through this interface rather than writing their fragments by themselves.
  • The material effects receive the correct implementation of this "pixel output" interface, so that all the render targets get properly updated. The material effect has no knowledge of what will finally be written to the outputs: it only provides all the data that could be required. The implementation chosen by the scene-level effect does the rest...
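Here is a minimal sketch of the idea, assuming an interface called PixelOutput and an MRT-style output struct; all names and the annotation convention are illustrative, not taken from a real effect:

    // Output struct covering everything a material could have to write.
    struct FragmentsOut
    {
        float4 color : COLOR0;
        float4 aux   : COLOR1;   // e.g. camera distance, possibly unused
    };

    // The abstraction the material effects write through.
    interface PixelOutput
    {
        FragmentsOut write(float4 color, float cameraDistance);
    };

    // Implementation the scene-level pass selects when it also needs the distance.
    struct ColorAndDistanceOutput : PixelOutput
    {
        float unused;   // no real state needed here
        FragmentsOut write(float4 color, float cameraDistance)
        {
            FragmentsOut o;
            o.color = color;
            o.aux   = float4(cameraDistance, 0, 0, 0);
            return o;
        }
    };

    // In each material effect: declared, never implemented locally.
    PixelOutput gOutput;

    // In the scene-level effect, the pass annotation tells the application which
    // implementation to bind to gOutput in every nested material effect:
    technique SceneWithDOF
    {
        pass renderOpaque
        <
            string PixelOutputImplementation = "ColorAndDistanceOutput";
        >
        {
        }
        // ...other passes
    }
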
Resource creation

Usually, developers who use shaders or effects need to make sure the resources required by an effect are created and available. For example, if an effect uses intermediate render targets for RGBA and camera distance, the application needs to allocate the buffers (in OpenGL: textures or renderbuffers, put together in an FBO - Frame Buffer Object) and bind them.

I found a way to let the CgFX effect source code define the resources needed for the effect to work properly. My feeling is that a scene-level effect that cannot describe its mandatory resources is incomplete: there is no point in having a flexible tool if it still requires developers to write specific code for each specific case.

The solution again came from the annotation and semantic features. Let's use a very simple example: assume the scene-level effect needs to render some intermediate RGBA colors to a 32-bit floating-point buffer (say, for further post-processing of HDR...):
  • The scene-level effect will contain a simple integer parameter whose name represents the texture target and whose semantic explicitly tells the application that this parameter is a texture resource
  • This parameter will carry annotations: one annotation will tell, for example, that the format of the texture is RGBA32F; other annotations will give the width and height of the texture
At initialization time, the application simply walks through the parameters involved in the current technique and creates the resources using the details provided by the annotations, as sketched below.
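For illustration, such a parameter could look like the following sketch; the TEXTURE_RESOURCE semantic and the annotation names are conventions that the application defines and interprets, not built-in CgFX keywords:

    // "sceneColorHDR" is simply the name the application will give to the GL texture.
    int sceneColorHDR : TEXTURE_RESOURCE
    <
        string format = "RGBA32F";             // 32-bit floating-point RGBA
        int    width  = 1024;                  // could also be expressed relative to the viewport
        int    height = 1024;
        string bindTo = "sceneColorSampler";   // sampler parameter that should receive this texture
    > = 0;

The application can then walk the effect parameters with cgGetFirstEffectParameter / cgGetNextParameter, test the semantic with cgGetParameterSemantic, read the annotations with cgGetNamedParameterAnnotation, and create the GL texture/FBO accordingly.
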

____

As you can see, there is a way to make a scene-effect layer live together with a material-effect layer, and there is a way to attach resource creation to the effects.

This approach turned out to be very successful and very flexible. I could implement many scene-level effects, like image-based lighting, HDR with radial blur, toon shading, etc.

The beauty of this implementation is that no additional C++ coding is required to implement many different effects: the whole effect written in CgFX is self-sufficient.

Adding other NVIDIA technologies

This approach turned out to be so flexible that I wondered how I could add more of our technologies. I am of course thinking about two famous ones:
  • OptiX ray tracing (to give the web link...)
  • CUDA (to give the web link...)
OptiX in a CgFX pass

So I started to figure out how I could give CgFX all the details on how to set up OptiX. My feeling was that CgFX is flexible enough to describe what to load and what to associate with what. I am not sure I can give many details here without losing the reader, so I will stick to the very basic concept.
OptiX is a very powerful toolkit that allows the developer to perform any sort of work that involves casting rays and gathering intersections in a scene. Of course, the famous "ray tracing" is the main use case, but in fact you can do many other things.
OptiX requires you to write some CUDA code and compile it to a format called PTX (a pseudo-assembly code for 'compute' operations). This PTX code is what is used to compute intersections with any primitive (triangle, sphere...), the shading of any surface, etc. (the reader should read the documentation in order to fully understand OptiX).

My idea was to find a way to generically include an OptiX pass in the rendering process of an application. I wanted OptiX to run inside a pass and produce its results in a resource (texture or buffer), so that I could use these results in later passes (compositing, etc.).

A typical case: hybrid rendering for inter-reflections.

To clarify: hybrid rendering is a way to render a scene by mixing fast OpenGL/D3D rendering with slower (but more accurate) ray tracing. The best use case is inter-reflection.
There is no way to solve complex inter-reflections with the typical OpenGL/D3D pipeline. The best we can do is handle water surfaces or flat mirrors, because the topology of such objects is simple. But reflections of an object onto itself are definitely hard.
Ray tracing, however, is able to do this properly, although it can be expensive. OptiX ray tracing is definitely good at this, since it takes advantage of the GPU's computational strength.

The idea of hybrid rendering is to keep OpenGL for the fast work, while deferring the complex rendering work, like inter-reflections, to OptiX. Finally, the two images are put together, with any kind of post-processing work on top.

I implemented a way for a CgFX effect to declare what is required in order to have OptiX ready for a pass. This implementation (again, using semantics and annotations!) allowed me to (a purely hypothetical sketch of such declarations follows the list):
  • declare references to the PTX code that we will need to send to OptiX
  • declare an OptiX context
  • associate the references to PTX code with the various OptiX layers
  • declare arbitrary variables with default values that OptiX will gather and send to the PTX code (some ray-tracing code would, for example, need fine-tuning parameters for precision or recursion limits...)
  • declare the resources that OptiX needs: input resources (the pixel information needed to start the ray tracing), the output resource where the final image of inter-reflections is written, intermediate buffers if needed...
  • tell a CgFX pass that it needs to trigger the OptiX rendering, rather than the usual OpenGL/D3D scene-graph rendering
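To give an idea of the flavor of these declarations, here is a purely hypothetical sketch; every semantic and annotation name (OPTIX_CONTEXT, OPTIX_PTX_MODULE, OPTIX_VARIABLE, OPTIX_INPUT_BUFFER, OPTIX_OUTPUT_BUFFER) is a convention the application defines, not part of CgFX or of the OptiX API:

    // One OptiX context for the technique.
    int rtContext : OPTIX_CONTEXT
    <
        int entryPointCount = 1;
        int rayTypeCount    = 2;   // e.g. radiance + shadow rays
    > = 0;

    // Reference to a PTX module and the program inside it.
    int rtRayGen : OPTIX_PTX_MODULE
    <
        string ptxFile  = "reflection_raygen.ptx";
        string function = "reflection_raygen";
    > = 0;

    // Arbitrary variables forwarded to the PTX code.
    float rtSceneEpsilon : OPTIX_VARIABLE = 0.001;
    int   rtMaxDepth     : OPTIX_VARIABLE = 3;      // recursion limit for reflections

    // Input/output resources: the textures named here are the render targets
    // produced and consumed by the surrounding passes.
    int rtWorldPosInput  : OPTIX_INPUT_BUFFER  < string texture = "worldPosRT"; >   = 0;
    int rtNormalInput    : OPTIX_INPUT_BUFFER  < string texture = "normalRT"; >     = 0;
    int rtReflectionsOut : OPTIX_OUTPUT_BUFFER < string texture = "reflectionRT"; > = 0;
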
Here is an example of a scene-level technique that contains an OptiX pass (a sketch of it follows the list):
  • Pass 1: render the scene with OpenGL into 3 targets: colors, normals and the world-space position of each fragment (pixel)
  • Pass 2: trigger the OptiX rendering. The starting points of this rendering are the pixels of the previous render targets: given the camera position, it is possible to compute the reflections... The final reflection colors are then stored in a texture
  • Pass 3: read the texture from the previous OptiX pass and merge it with the RGBA colors from Pass 1
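Again as a sketch only (OptixTrace, RenderGroup, RenderTargetColor and BlitFullscreenQuad are custom states the application implements, and the programs and render targets are assumed to be declared elsewhere in the effect), the technique could look like this:

    technique HybridInterReflections
    {
        pass gbuffer             // Pass 1: OpenGL rendering into 3 targets
        {
            RenderGroup       = "all";
            RenderTargetColor = "colorRT";
            RenderTargetAux0  = "normalRT";
            RenderTargetAux1  = "worldPosRT";
        }
        pass traceReflections    // Pass 2: launch OptiX, rays start from the G-buffer pixels
        {
            OptixTrace = "rtContext";   // results end up in reflectionRT
        }
        pass composite           // Pass 3: merge colorRT and reflectionRT
        {
            BlitFullscreenQuad = true;
            FragmentProgram    = compile latest mergeReflectionsPS();
        }
    }
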
Another case study is "pure ray tracing": with the exact same integration in CgFX, I can produce a 100% ray-traced image. In such a case, the technique would look as follows:
  • Pass 1: trigger the OptiX rendering. The RGBA scene result will be stored in a texture
  • Pass 2: draw the texture with a fullscreen quad
The results are promising and open the door to a lot more freedom in how to implement scene-level effects with complex rendering features.

However, many details need to be addressed in order to make such an approach fully work:
  • The 3D scene is mirrored inside the OptiX environment (and hashed into 'acceleration structures', according to some fancy rules). This necessary redundancy can be tricky and consumes video memory...
  • The shaders that are written in Cg should somehow also be written in CUDA code for OptiX.
  • CUDA doesn't provide texture sampling as good as a pixel shader's: mipmapping isn't available, and neither are cubemaps...
  • Many other concerns.
What about performance when adding ray tracing? I haven't done any strict measurements yet, but on a good board the rendering is still interactive: I got ~6 fps for some inter-reflections on top of an OpenGL rendering.
Our new next-gen GPU can be up to 3x faster, so I could get hybrid rendering at ~16 to 18 fps...

CUDA in a CgFX pass

When I saw how interesting the integration of OptiX was, I realized that we could get the same benefit from a CUDA integration in CgFX.


CUDA is of course very generic. I decided to integrate it into the technique/pass process for any sort of post-processing. This means the typical grid/block division will be used to partition some processing over a rendered image... I am, for example, thinking of:
  • a simple glow post-process
  • convolution post-processing
  • a Bokeh filter (accurate depth of field)
  • deferred shading (for example, check the GDC material about deferred shading using compute...)
  • etc.
I think that CUDA integration in a pass could also be good for some particle generation that the next pass could exploit... More complex uses of CUDA will require deeper investigation.

No need to go into much detail about the CUDA integration: the idea is the same as for OptiX. I needed to find a way to define resource I/O and temporary buffers; to define kernel modules and how to attach the CUDA-compiled code to them; and to define a way to connect the kernel arguments to CgFX parameters. A hypothetical sketch follows.
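As with OptiX, the semantics (CUDA_KERNEL, CUDA_KERNEL_ARG), annotation names and the CudaLaunch pass state below are conventions my application would implement, shown only to illustrate the flavor of the declaration:

    // Kernel module: which compiled CUDA code to load, which function to call,
    // and the launch configuration (the grid is derived from the target resolution).
    int cuGlowKernel : CUDA_KERNEL
    <
        string moduleFile      = "postprocess.ptx";   // PTX loaded through the CUDA driver API
        string function        = "glowKernel";
        string gridFromTexture = "sceneColorRT";
        int    blockSizeX      = 16;
        int    blockSizeY      = 16;
    > = 0;

    // CgFX parameter connected to a kernel argument.
    float glowThreshold : CUDA_KERNEL_ARG
    <
        string kernel = "cuGlowKernel";
    > = 0.8;

    technique GlowPostProcess
    {
        pass renderScene { RenderGroup = "all"; RenderTargetColor = "sceneColorRT"; }
        pass glow        { CudaLaunch  = "cuGlowKernel"; }   // run the kernel over the image
        pass present     { BlitFullscreenQuad = true; FragmentProgram = compile latest presentPS(); }
    }
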

The early results looked promising: without specializing the C++ code of the application, I could instantiate some CUDA code and make it part of the rendering process for some final post-processing.

Next time, I will try to give more concrete details on what the CgFX code looks like. I should also be able to post a few pictures...

To conclude, I was myself amazed by the strength of CgFX in such extreme situations. Microsoft's HLSL/Effects would never have been able to give me such flexibility.
The Cg team did a good job at defining an API and developing a runtime that, despite its hairy look, answers almost every fancy need when pushing CgFX beyond its limits!

Note: I am doing this work as part of my job at NVIDIA. The purpose of such a project is to write some interesting examples of how to use our software technologies and how to take full advantage of them to leverage NVIDIA chips. I should be able to publish all the source code in the SDK that we (the Devtech group) deliver on a regular basis.

Note 2: keep in mind that Cg/CgFX is multi-vendor and must also work on competitors' hardware, as long as you choose a profile that works on that hardware (generally, the GLSL profile).

Note 3: OptiX and CUDA are still pretty much NVIDIA-specific. But they go through PTX assembly code, and the PTX specification is open and could be used by any other company. This is not the case today, though.

More pictures (models courtesy of Watershot Digital Imaging):


3 comments:

  1. Good stuff! it would be nice to see what the syntax of the annotations looks like. Hopefully it's better than the much dreaded SAS annotations...

  2. Thanks. I will post some samples of the effects in a next Post. I didn't want to overload too much this one, for now.

  3. Very interesting reading, especially about the integration of OptiX and CgFX. I might follow up on this for my thesis regarding ray tracing in the next-gen AAA-engine. Would be interesting to implement some other ray-tracing-renderings, perhaps photon mapping for caustics and perhaps radiosity for color bleeding. This could be implemented in a CUDA-pass (or OpenCL for that matter if I might dare to mention).
