Monday, March 15, 2010

Looking for animation technology at GDC...

Last weekend, on Saturday, I stopped by GDC again for my personal entertainment (essentially to see the Independent Games, the best part of GDC... for me).
I wanted to see what the status of animation in games is. It turned out that I found very few interesting things.
On the exhibition floor, I only found two middleware products and/or services:


They showcased something very similar to the tests I posted earlier... They told me their solver is more efficient when the number of characters to IK-correct is high. They also showed that their solver works for any kind of skeleton, as opposed to others that may focus on human skeletons.
Unfortunately, I didn't see much, and the demo on the booth wasn't very attractive (although I don't doubt they did interesting work).





Autodesk "HumanIK"


Right before the end of the show, I discovered that Autodesk released a middleware called "HumanIK". This middleware allows you to do exactly what is needed to correct animations and prevent human characters from interacting badly with the floor or any other obstacle.

For ~$50,000 per project (if I recall correctly the price I was told), they offer a few tools to set up the character rigging for IK; a set of C++ libraries for those who want to integrate the middleware into their engine; and a UE3 module that allows you to set up IK with the UE3 graph editor.
The demo in UE3 was rather convincing. They also had another demo available that I really didn't like: its quality didn't showcase the expected result.
But I guess the problem was essentially an artistic choice, because the games using this middleware, like Prince of Persia and Assassin's Creed (see here for another article), did a great job.
Still, this tool is a bit expensive and too human-centric.

NaturalMotion Morpheme was not at GDC... but I just wanted to mention that they also do some good work around this topic.

An interesting takeaway for me: when asking people why IK corrections aren't integrated into games more often, the answer is always that it can easily become computationally intensive once you have more than one character in a scene. Let's call the GPU for help (I will ;-)!

I found 3 talks dealing with animation and possibly IK (and physics): Avalanche Studios' Just Cause, where they mix PhysX, IK and animation; Naughty Dog's Uncharted; and EA's FIFA Soccer. I only attended the Just Cause talk (the physics integration was interesting but painful)...

To conclude, I have the feeling that IK-corrected animation is out there, but not broadly adopted. I also feel that this approach is going to be adopted more and more, now that multi-core machines are commonly used; now that developers have started to understand the PS3's SPUs well; and now that we have GPU compute available through DX11 and CUDA.

Sunday, March 7, 2010

My 1990's hand-made co-processor for my Amiga



Yet another prehistoric thing I took out of my boxes...
In 1990, when I was a student at my "engineering school" in Paris (EFrEI), someone managed to grab a bunch of Motorola DSP56001 micro-controllers. I bought one in the hope of doing something with it.
I was lucky at the time: I was part of a club in Melun (France) where lots of people joined to share their passions. I managed to gather all the equipment to do the job in my bedroom: I borrowed a signal analyzer so that I could make sure the data and address buses were properly synchronized; and I borrowed everything needed to wire-wrap the thing by hand and to make sure the signals weren't disturbed too much by electrical noise...
At the same time, the good thing with the Amiga was that lots of literature was available to understand how the motherboard was built and how the chips worked.

But nothing would have been possible if I hadn't had the chance to get the full development kit from Motorola. It is one thing to wire-wrap a board together; it is another story to write assembly code, check it, simulate it, send it to the card and finally trigger the DSP to run it all.
In fact, I am not sure it was only about luck. At the time I was passionate about anything related to computers, so I just picked up the phone and called Motorola many times until someone agreed to listen to me. I guess I convinced the guy (or he was desperate to get rid of me at any cost ;-) because he broke the rules by sending me the whole package for free, so I could work with this DSP. I am very grateful to him. It was wonderful to see that some people in such big companies are still willing to help young people in their adventures.

The board is made of three parts: static memory (easier to use because, although fragile, it doesn't require any special refresh system...); an address mapper and a bus switch, so that I could expose the memory to the Amiga in order to upload programs and read/write data; and of course the DSP.

So, when everything was connected, I started triggering some basic operations that I knew how to identify on the data and address buses. This allowed me to check and fix the signals on the wires.

Once everything worked, I started to write fancier things. I confess I didn't do that much, but the little I did was very cool.
The way I used this micro-controller was as follows: write my code; compile it; simulate it; then an application written in assembly, running fullscreen, would send the compiled code to the board's memory; send some data to another area; trigger the DSP with an interrupt signal; then read back some memory that you can consider an offscreen buffer (a render buffer...) and display the result on screen.
This DSP was running at 16 MHz, if I recall correctly, which wasn't much. But it was way better than the Amiga, especially when you know that a DSP has specialized opcodes that help with certain computations.

The main application I wrote was a realtime Mandelbrot fractal generator. I could scroll in realtime (~15 fps) through the Mandelbrot set by computing the new incoming lines depending on the scroll. Even the whole set didn't take much time (a few seconds). This was amazing when you know that, at the time, people got the same result in ~25 minutes (!) using the computer alone.
While I was showing the 3D demo (discussed in my previous post), I took the opportunity to show the board. But first, I had fun hiding the board and showing off the performance to some fellows who were demoing the same Mandelbrot set computation on their plain Amiga.

The second test I wrote was a... rasterizer! I could render shaded triangles (no textures...) into my offscreen buffer, so I could really speed up the rendering of some of my assembly tests with filled triangles... I confess that I started the rasterizer from reverse-engineering a game I hacked: Starglider 2 (this game was a reference for me, and I spent lots of energy extracting the whole rendering path to reproduce it on my DSP... I was a bit of a nerd at the time...)

Then time passed and I switched to other topics... but this was fun. I didn't want to make money out of this thing (although I received some offers to develop the idea); I just wanted to make it happen as a challenging experience. This was during my 2nd year of a 5-year school program. My teachers were quite amazed to see how I could make this thing work: some parts were quite clunky. Furthermore, students usually only attempt this kind of project after their 5 years of study...

1991's vector Amiga 500 demo

I just discovered that someone posted a video capture of an old demo I wrote in 1991.

It is funny to look back at these things, which now seem so primitive...
But those were good times. I remember coding the whole thing in assembly in my bedroom after school...
The demo showcases a few things:

1) Bird flock. Birds are flying; their physics is computed depending on how fast their wings are flapping and on their orientation. These birds have a short-range perception of their neighbours and can adjust their flight according to the others or to some target.

2) Character animation. I integrated into this demo an editor that allowed me to edit and build a whole sequence of animations on various skeletons. The demo was then of course able to interpolate smoothly between key poses...

3) Camera tracking. I could tag, in the animation sequence, which target the camera had to lock onto.

4) Music. I integrated a sequencer, sampled a few instruments and wrote a song (I think I got the inspiration from a Lenny Kravitz song).

At the time, the challenge was to write everything in assembly. But the Amiga was such an amazing computer that I really enjoyed it (although I am not sure I would write in assembly nowadays... ;-)
Some good friends from a computer club pushed me to enter a demo contest, which I won... I guess thanks to this, the demo got posted and can now be found on the web.

Anyhow, this was a great personal experience that I would recommend to anyone eager to get started in computer graphics: stay in your room for a few days; work on ideas from top to bottom; and get out of the room only when you're done (and don't listen to anybody who tells you that your idea isn't worth it or has already been done by someone else ;-)

Wednesday, March 3, 2010

Mixing NVIDIA Technologies thanks to CgFX



I recently found it very interesting to push the limits of the NVIDIA CgFX runtime.
Cg is the equivalent of HLSL, targeting OpenGL (GLSL and other profiles) and DirectX (9, 10, 11), and it is multi-platform (Windows, Mac OS X, Linux).
Unfortunately, Cg didn't get the success it deserved, despite the fact that it offers really interesting features. But I am confident that things will change...

On top of simple shader declarations, Cg offers a more general layer: CgFX. This layer can be considered the equivalent of what Microsoft exposes as "D3D Effects", but more powerful.

I decided to push the limits of CgFX and see how we could take full advantage of its flexibility.

Spotlight on a few cool CgFX features

CgFX offers some interesting features that very few developers have explored (a minimal code sketch follows this list):
  • Interfaces: very useful for having an abstraction layer between what we want to compute and how it will be computed. For example, lighting can be done this way: lights are exposed in the shader code as interfaces, and the implementation of these lights (spot, point, directional...) is hidden and decoupled from the shader that uses the interface.
  • Unsized arrays of parameters and/or interfaces: a good example of this feature is storing the light interfaces in such an array. The shader doesn't have to know how many lights the array contains, and the array doesn't have to care about the implementation of the lights it holds. So the array is a good way to store an arbitrary number of lights of various types.
  • Custom states: states are used in the passes of the techniques. CgFX allows you to customize states and to create new ones. You can easily define special states that have no meaning in D3D or OpenGL but do have a meaning for what your application needs to do in the pass. These states are a perfect way to tell the application what to do in each pass. For example, you could create a "projection_mode" state that lets the effect tell the application to switch from perspective to orthographic projection... and by the way, the regular OpenGL & D3D states in Cg are also custom states, except that they are shipped and implemented by the Cg runtime :-)
  • Semantics: a very simple feature... but one that allowed me to simulate handles to objects (more details below).
  • A hairy API... but one that gives you lots of ways to dig into the loaded effect. People often complain that the Cg API is ugly or too complex. I somewhat agree. But my recent experience showed that this API in fact offers a lot of interesting ways to do what you need. As usual, there is no magic: flexibility often leads to a complex API.
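To make these features more concrete, here is a minimal CgFX sketch of how they can look in an effect file. The names (ILight, SpotLight, gLights, gColorTarget) and the custom state / semantic conventions (ScenePass, RENDERTEXTURE) are hypothetical choices of mine, not something predefined by the Cg runtime:

    // Interface: abstracts "what a light does" from how it is implemented
    interface ILight {
        float3 illuminate(float3 P, float3 N);
    };

    // One possible implementation, hidden from the shaders that use ILight
    struct SpotLight : ILight {
        float3 position;
        float3 color;
        float3 illuminate(float3 P, float3 N) {
            float3 L = normalize(position - P);
            return color * max(dot(N, L), 0.0);
        }
    };

    // Unsized array: the shader doesn't know how many lights it will receive
    ILight gLights[];

    // Semantic + annotations: the application detects this parameter,
    // creates the matching texture resource and stores its handle here
    int gColorTarget : RENDERTEXTURE
    <
        string format = "RGBA8";
        int    width  = 1024;
        int    height = 1024;
    >;

    technique SceneExample
    {
        pass renderOpaque
        {
            // Regular state, implemented by the Cg runtime
            DepthTestEnable = true;
            // Custom state, registered and interpreted by the application
            ScenePass = "render_opaque_objects";
        }
    }

A material pixel shader can then simply loop over gLights and call illuminate(), without ever knowing which light types the application actually bound into the array.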
Pushing the limits

The cool features I enumerated above allowed me to start thinking about flexible approaches for complex scene rendering.

Scene-level effect
Something that has always bothered me is that DirectX and Cg samples only show how to use effects (so, techniques and passes) on objects; more precisely, how to use effects for the objects' materials.
I considered another approach: why not use two levels of effects?




  • The first layer is the usual one: provide effects for each material. For example, an effect would be available for plastic shading, for objects that need a plastic look... etc.
  • The second layer of effects acts at the scene level: prior to rendering any object, why not use the technique/pass paradigm to tell the application how it needs to process the scene?
This scene-level effect makes total sense. Here is an example.
If the application needs to render a complex scene (made of intermediate passes, like post-processing passes), the technique/pass paradigm matches exactly what we want.
So let's create an effect for the whole scene, where many techniques are available for many kinds of scene-level processing. Let's say we created a technique for transparent & opaque objects, with some special post-processing (say, depth of field...):
  • Pass 1: render the opaque objects into a specific render target; let's also store the camera distance of each pixel in another render target (a texture)
  • Pass 2: render the transparent objects into another render target (a texture), but keep the depth buffer from the opaque objects so that we get a consistent result
  • Pass 3: perform a fullscreen-quad operation that reads the opaque and transparent textures and does the compositing. This operation can, for example, fake some refraction on the transparent triangles thanks to a bump map
  • Pass 4: use the camera distance stored earlier to perform another pass that blurs the scene according to some depth-of-field equations
It is important to mention here that the job of Passes 1 and 2 is simply to set up how the application is going to render the scene graph (or whatever you call it): Pass 1 just asks the application to render the opaque-only objects of the scene graph into a specific render target, and Pass 2 asks the scene graph to render only the transparent objects (from back to front...).
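As an illustration, this scene-level technique could be sketched as below. The texture parameters and the custom states (ScenePass, RenderTarget0/1) are hypothetical conventions shared between the effect and the application, not built-in CgFX states:

    texture gOpaqueColorTex;       // color of opaque objects
    texture gCameraDistTex;        // per-pixel camera distance
    texture gTransparentColorTex;  // color of transparent objects
    texture gCompositeTex;         // result of the compositing pass

    technique OpaqueTransparentDOF
    {
        // Pass 1: the application renders the opaque objects only
        pass renderOpaque
        {
            ScenePass     = "render_opaque";
            RenderTarget0 = <gOpaqueColorTex>;
            RenderTarget1 = <gCameraDistTex>;
        }
        // Pass 2: transparent objects, back to front, reusing the opaque depth buffer
        pass renderTransparent
        {
            ScenePass     = "render_transparent";
            RenderTarget0 = <gTransparentColorTex>;
        }
        // Pass 3: fullscreen-quad compositing of opaque + transparent
        // (a real effect would also bind a FragmentProgram here)
        pass composite
        {
            ScenePass     = "fullscreen_quad";
            RenderTarget0 = <gCompositeTex>;
        }
        // Pass 4: fullscreen depth-of-field blur; no target, so the back buffer is used
        pass depthOfField
        {
            ScenePass = "fullscreen_quad";
        }
    }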

Another important point: Passes 1 and 2 trigger the rendering of complex scenes which are themselves made of other CgFX effects for the materials. For example, Pass 1 triggers a scene-graph traversal where many CgFX effects are needed to render good-looking materials (plastic, car paint...).
As you can see, a pass of an effect can nest other effects made of other techniques and passes...

Making things consistent

The main problem I had to solve was: how do we make sure that nested material effects stay consistent with the scene-level effects?

Let's use an example to explain the issue:
Let's assume a scene-level pass asks to render the whole scene as usual (RGBA colors...), but also asks to store the camera distance for each rendered pixel (so that we can do a depth-of-field post-process in a later pass). How can the underlying effects be aware that they need to output more than RGBA? How can I tell a material effect that it needs to store the distance in another output (keeping in mind that a material effect is a self-sufficient effect file and doesn't know anything about the scene-level effect file)?

The solution is to use annotations and interfaces (see the sketch after this list):
  • The scene-level pass uses an annotation to tell the underlying effects which interface implementation should be used to generate the pixels. In other words, a special interface is defined, and material effects have to write their results through this interface rather than writing fragments by themselves.
  • The material effects receive the correct implementation of this "pixel output" interface, so that all the render targets get properly updated. The material effect has no knowledge of what will finally be written to the outputs: it only provides all the data that could be required, and the implementation chosen by the scene-level effect does the rest...
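Here is a minimal sketch of the idea, assuming a hypothetical IPixelOut interface with two implementations and reusing the hypothetical ScenePass custom state; the names and the annotation convention are mine:

    // ---- shared header, included by the material effects ----
    interface IPixelOut {
        // What the material should write to the secondary render target
        float4 target1(float4 color, float camDist);
    };
    struct NoExtraOut : IPixelOut {          // plain RGBA pass
        float4 target1(float4 color, float camDist) { return float4(0, 0, 0, 0); }
    };
    struct DistanceOut : IPixelOut {         // pass that also wants camera distance
        float4 target1(float4 color, float camDist) { return float4(camDist, 0, 0, 1); }
    };

    // ---- in a material effect: the implementation is injected from outside ----
    IPixelOut gPixelOut;

    void materialPS(float3 N        : TEXCOORD0,
                    float3 camVec   : TEXCOORD1,
                    out float4 col0 : COLOR0,
                    out float4 col1 : COLOR1)
    {
        float4 shaded = float4(abs(N), 1);   // placeholder shading
        col0 = shaded;
        col1 = gPixelOut.target1(shaded, length(camVec));
    }

    // ---- in the scene-level effect: the pass names the required implementation ----
    // technique ... {
    //     pass renderOpaque < string pixelOutImpl = "DistanceOut"; >
    //     { ScenePass = "render_opaque"; }
    // }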
Resource creation

Usually, developers using shaders or effects need to make sure the resources needed by the effect are created and available. For example, if an effect uses intermediate render targets for RGBA and camera distance, the application needs to allocate the buffers (in OpenGL: textures and renderbuffers, put together with an FBO, Frame Buffer Object) and bind them.

I found a way to let the CgFX effect source code define the resources needed for the effect to work properly. My view is that a scene effect that is unable to describe its mandatory resources is incomplete: there is no point in having a flexible tool if it still requires developers to write specific code for each specific case.

The solution again comes from the annotation and semantic features. Let's use a very simple example: assume the scene-level effect needs to render some intermediate RGBA colors into a 32-bit floating-point buffer (say, for further HDR post-processing):
  • The scene-level effect then contains a simple integer parameter whose name represents the texture target and whose semantic explicitly tells the application that this integer parameter is a texture resource
  • This integer also carries annotations: one annotation tells that the format of the texture is RGBA32F; other annotations give the width and height of the texture
At initialization time, the application simply walks through the parameters involved in the current technique and creates the resources using the details provided by the annotations, as in the sketch below.
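For example, the declaration could look like this; the RENDERTEXTURE semantic and the annotation names are conventions I chose between the effect and the application, not something predefined by CgFX:

    // The application scans the technique's parameters for the RENDERTEXTURE
    // semantic, creates the corresponding OpenGL texture (and FBO attachment)
    // from the annotations, and writes the resulting handle into the parameter.
    int gHDRColorTarget : RENDERTEXTURE
    <
        string format = "RGBA32F";   // 32-bit floating-point RGBA
        int    width  = 1024;
        int    height = 1024;
    >;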

____

As you can see, there is a way to make a scene-effect layer coexist with a material-effect layer, and there is a way to attach resource creation to the effects.

This approach turned out to be very successful and very flexible. I could implement many scene-level effects, like image-based lighting; HDR with radial blur; toon shading; etc.

The beauty of this implementation is that no additional C++ coding is required to implement many different effects: the whole effect written in CgFX is self-sufficient.

Adding other NVIDIA technologies

This approach turned out to be so flexible that I wondered how I could add more of our technologies. I am of course thinking of two famous things:
  • Optix ray tracing (to give the web link...)
  • CUDA (to give the web link...)
OptiX in a CgFX pass

So I started to figure out how I could give CgFX all the details on how to set up OptiX. My idea was that CgFX is flexible enough to express what to load and what to associate. I am not sure I can give many details here without losing the reader, so I will stick to the very basic concept.
OptiX is a very powerful toolkit that allows the developer to perform any sort of work involving rays gathering intersections in a scene. Of course, the famous "ray tracing" is the main use case, but in fact you can do many other things.
OptiX requires you to write some CUDA code and compile it to a format called PTX (a pseudo-assembly language for 'compute' operations). This PTX code is what computes the intersections with any primitive (triangle, sphere...), the shading of any surface, etc. (the reader should consult the documentation to fully understand OptiX).

My idea was to find a way to generically include an OptiX pass in the rendering process of an application. I wanted an OptiX pass to write its results into a resource (texture or buffer), so that I could use these results in later passes (compositing, etc.).

A typical case: hybrid rendering for inter-reflections.

To clarify, hybrid rendering is a way to render a scene by mixing fast OpenGL/D3D rendering with slower (but more accurate) ray tracing. The best use case is "inter-reflection".
There is no way to solve complex inter-reflections with the typical OpenGL/D3D pipeline. The best we can do is water surfaces or flat mirrors, because the topology of such objects is simple. But the reflection of an object onto itself is definitely hard.
Ray tracing, however, is able to do this properly, although it can be expensive. OptiX ray tracing is definitely good at this since it takes advantage of the GPU's computational strength.

The idea of hybrid rendering is to keep OpenGL for the fast work, while deferring the complex rendering, like inter-reflections, to OptiX. The two images are then put together, possibly with some additional post-processing.

I implemented a way to let CgFX declare what is required to have OptiX ready for a pass. This implementation (again, using semantics and annotations!) allowed me to (a sketch follows the list):
  • Declare references to the PTX code that we need to send to OptiX
  • Declare an OptiX context
  • Associate the PTX code references with the various OptiX programs
  • Declare arbitrary variables with default values that OptiX will gather and send to the PTX code (some ray-tracing code would, for example, need fine-tuning parameters for precision or recursion limits...)
  • Declare the resources that OptiX needs: input resources (the pixel information needed to start the ray tracing); the output resource where the final image of inter-reflections is written; intermediate buffers if needed...
  • Tell a CgFX pass that it needs to trigger the OptiX rendering, rather than the usual OpenGL/D3D scene-graph rendering
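Here is a sketch of what such declarations could look like in the effect file; the semantics (PTX_MODULE, OPTIX_CONTEXT, OPTIX_OUTPUT) and the annotation names are conventions I invented for my application, not part of CgFX or of the OptiX API:

    // PTX module the application loads and hands over to OptiX,
    // with the entry points to associate to the OptiX programs
    int gReflectionPTX : PTX_MODULE
    <
        string file       = "reflection_programs.ptx";
        string rayGen     = "camera_raygen";
        string closestHit = "reflective_hit";
        string miss       = "environment_miss";
    >;

    // OptiX context to create, with a few tuning variables forwarded to the PTX code
    int gRayContext : OPTIX_CONTEXT
    <
        int   maxRecursionDepth = 4;
        float sceneEpsilon      = 0.001;
    >;

    // Output buffer where OptiX writes the inter-reflection image
    int gReflectionTex : OPTIX_OUTPUT
    <
        string format = "RGBA32F";
        int    width  = 1024;
        int    height = 1024;
    >;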
Here is an example of a scene-level technique that would contain an OptiX pass (sketched in code after the list):
  • Pass 1: render the scene with OpenGL into 3 targets: the color, the normal and the world-space position of each fragment (pixel)
  • Pass 2: trigger the OptiX rendering. The starting points are the pixels of the previous render targets: given the camera position, it is possible to compute the reflections... the final reflection colors are then stored in a texture
  • Pass 3: read the texture from the previous OptiX pass and merge it with the RGBA color from Pass 1
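Expressed with the same hypothetical custom states as in the previous sketches (ScenePass, RenderTargetN, plus OptixInputN/OptixOutput), the hybrid technique could look like this:

    texture gColorTex;        // RGBA color from the raster pass
    texture gNormalTex;       // per-pixel normal
    texture gWorldPosTex;     // per-pixel world-space position
    texture gReflectionTex;   // written by the OptiX pass

    technique HybridInterReflections
    {
        // Pass 1: regular OpenGL rendering into three targets
        pass rasterGBuffer
        {
            ScenePass     = "render_all";
            RenderTarget0 = <gColorTex>;
            RenderTarget1 = <gNormalTex>;
            RenderTarget2 = <gWorldPosTex>;
        }
        // Pass 2: the application launches OptiX instead of the scene graph
        pass traceReflections
        {
            ScenePass   = "optix";
            OptixInput0 = <gNormalTex>;
            OptixInput1 = <gWorldPosTex>;
            OptixOutput = <gReflectionTex>;
        }
        // Pass 3: fullscreen quad merging gColorTex and gReflectionTex
        pass composite
        {
            ScenePass = "fullscreen_quad";
        }
    }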
Another case study is "pure ray tracing": with the exact same CgFX integration, I can produce a 100% ray-traced image. In such a case, the technique would look as follows:
  • Pass 1: trigger the OptiX rendering. The RGBA result of the scene is stored in a texture
  • Pass 2: draw the texture with a fullscreen quad
The results are promising and open the door to a lot more freedom in how to implement scene-level effects with complex rendering features.

However, many details need to be addressed to make such an approach fully workable:
  • The 3D scene has to be mirrored inside the OptiX environment (and organized into 'acceleration structures' according to some fancy rules). This necessary redundancy can be tricky to manage and consumes video memory...
  • The shaders that are written in Cg somehow also have to be rewritten as CUDA code for OptiX.
  • CUDA doesn't provide texture sampling as good as a pixel shader's: mipmapping isn't available, and neither are cubemaps...
  • And many other concerns.
What about performance when adding ray tracing? I haven't done any strict measurements yet, but on a good board the rendering is still interactive: I got ~6 fps for some inter-reflections on top of an OpenGL rendering.
Our next-gen GPU can be up to 3x faster for this; I could get hybrid rendering at ~16 to 18 fps...

CUDA in a CgFX pass

When I saw how interesting the OptiX integration was, I realized that we could get the same benefits out of a CUDA integration in CgFX.


CUDA is of course very generic. I decided to integrate it into the technique/pass process for any sort of post-processing. This means that the typical grid/block division is used to partition some processing over a rendered image... I am, for example, thinking of:





  • simple Glow post-processing
  • convolution post-processing
  • Bokeh filter (accurate depth of field)
  • deferred shading (for example, see the GDC talks about deferred shading using compute shaders...)
  • etc.
I think that CUDA integration in a pass could also be good for generating particles that the next pass could exploit... more complex uses of CUDA will require deeper investigation.

No need to go into much detail for the CUDA integration: the idea is the same as for OptiX. I needed a way to define resource I/O and temporary buffers; to define kernel modules and how to attach the CUDA-compiled code to them; and a way to connect the kernel arguments to CgFX parameters. A sketch of such declarations follows.
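Again with hypothetical semantics and annotation names of my own (CUDA_MODULE, CUDA_KERNEL, the ScenePass/CudaKernel custom states), the declarations could look like this:

    texture gCompositeTex;        // input image, produced by earlier passes
    texture gFinalTex;            // output image written by the kernel
    float   gGlowStrength = 1.5;  // regular CgFX parameter fed to the kernel

    // CUDA module the application loads (e.g. through the CUDA driver API)
    int gPostFXModule : CUDA_MODULE
    <
        string file = "postprocess_kernels.ptx";
    >;

    // Kernel entry point, launch configuration, and mapping of its arguments
    // to the CgFX parameters declared above
    int gGlowKernel : CUDA_KERNEL
    <
        string module     = "gPostFXModule";
        string entryPoint = "glow_filter";
        int    blockDimX  = 16;
        int    blockDimY  = 16;
        string arg0       = "gCompositeTex";
        string arg1       = "gFinalTex";
        string arg2       = "gGlowStrength";
    >;

    technique GlowPostProcess
    {
        // The application launches the kernel instead of rendering geometry
        pass runGlow
        {
            ScenePass  = "cuda";
            CudaKernel = <gGlowKernel>;
        }
    }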

The early results looked promising: without specializing the C++ code of the application, I could instantiate some CUDA code and make it part of the rendering process for some final post-processing.

Next time, I will try to give more concrete details on what the CgFX code looks like. I should also be able to post a few pictures...

To conclude, I was amazed myself by the strength of CgFX in such extreme situations. Microsoft's HLSL/Effects would never have been able to give me such flexibility.
The Cg team did a good job of defining an API and developing a runtime that, despite its hairy look, answers almost every fancy need to push CgFX even beyond its limits!

Note: I am doing this work as part of my job at NVIDIA. The purpose of such a project is to write interesting examples of how to use our software technologies and how to take full advantage of them to leverage NVIDIA chips. I should be able to publish all the source code in the SDK that we (the Devtech group) deliver on a regular basis.

Note 2: Keep in mind that Cg/CgFX is multi-vendor and also works on competitors' hardware, as long as you choose a profile that the other hardware supports (generally, a GLSL profile).

Note 3: OptiX and CUDA are still pretty much NVIDIA-specific, but they go through PTX assembly code, and the PTX specification is open and could be implemented by any other company. This is not the case today, though.

More pictures (models courtesy of Watershot Digital Imaging):


Monday, March 1, 2010

Calderon Dolphin Slaughter in Denmark


Every year in Denmark, specifically in the Faroe Islands, innocent and helpless Calderon dolphins are brutally slaughtered by the Danes.
These poor dolphins are stabbed a number of times and, as if that weren't enough, bleed to death, probably in excruciating pain, while the whole town watches.
Very surprisingly, young people are supposed to participate in this shame so that they can show they have become "real civilized men". Needless to say, killing a defenseless animal is no proof of anybody's manhood.
It is certainly not our fault if the rest of the world now considers them the opposite of what they think they are... brutal, backward and uncivilized.
I am concerned by the fact that they are somehow part of Europe...

A few notes (to try to find excuses for their acts):
  • Do these people need to act like this in order to maintain their local way of life and industry? Possibly. Maybe the local economy requires hunting these cetaceans. But I remain amazed that such a massive slaughter is really needed.
  • We should check how things really happen these days, because I had no way to know when these pictures were taken.
I also found an interesting discussion about this here: http://honeyreyes.com/tag/massacre-of-calderon-dolphins-in-denmark/

"I forwarded the email to my students and one of them (from Germany) replied:

Hello Honey,
yes, this is sad!
But I think there is something wrong with the message.
Faroe Island is an autonomous province of Denmark.

I think this island is nearer Norway, and Norway is known for disagreeing to the international agreement against whaling.

This island is far away from all of the other continents.
Yes, it’s bad, that they are hunting whales and kill them in that way, but I think you must have a look to the whole situation of that island.

They live from the whales and fish, they are hunting and they have no other industry.
Another thing is, that you didn’t know how old these photos are…

warm regards

Birger"