Tuesday, January 5, 2010

Baking 3D assets


When using complex file formats (Collada, FBX...), the application ends up spending a lot of time re-organizing data before it can really use them. In a game, formats like OBJ or FBX are only used in the production pipeline. But when the game needs to be packaged for shipping, art assets are "baked" into a specific file format, most of the time one that is almost ready to use.

For my use cases, I wanted to take this idea of a baked format so that any sort of sample would be fast to load. For example, say you want to test some shader: loading an OBJ file could lead to a long period where the application has to re-compute the tangent and binormal attributes, make vertex attributes unique by duplicating vertices that have many normals, etc. I remember waiting 40 seconds before my sample really started... not acceptable. The baked-format approach is not only good for speed: it also makes the sample application simpler, with fewer data-processing algorithms, because everything is already baked. All we need is to load the data in memory and resolve pointers.

The idea behind this binary format isn't to define yet another file format, but rather to provide some sort of memory snapshot of the data that is needed: a cooked (or baked) version of the meshes, 99.9% ready to be used in the most optimal way.

In order to make this happen, I decided to shift most of the work outside of the application, into the exporter (currently I use Maya). The more work the exporter does, the less the application has to do.

For example :

  • attribute packing for the GPU;
  • how vertex attributes are interleaved;
  • the way attributes can be spread over different slots;
  • the topology (primitive type) needed to render some vertex buffers;
  • even some arguments related to specific graphics APIs...
  • ...


This baked data format is just made of a bunch of nested structures containing data and pointers to other structures or pools of data (vertices...). The only issue with this method is addressing 64-bit architectures: I had to reserve 64 bits for each pointer, even if in the end only 32 are used in x86 mode...
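The fixed-size pointer slot can be sketched as a small union. This is a hypothetical illustration of what the PTR64 macro used in the structures below achieves; the actual macro may be written differently:

```cpp
#include <cstdint>

// Sketch of a 64-bit pointer slot: the raw 64-bit field fixes the size of
// the slot in the file layout, while the pointer member is what the
// application uses once relocation has filled it in. On x86, only the low
// 32 bits of the slot end up being meaningful.
template <typename T>
union Ptr64 {
    T*       p;    // live pointer, valid after relocation
    uint64_t raw;  // 64 bits always reserved in the file layout
};

static_assert(sizeof(Ptr64<int>) == 8, "pointer slot must stay 8 bytes");
```

Because the slot is always 8 bytes, the same file offsets are valid whether the loader is built for x86 or x64.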


Example of structures:


struct Node
{
    char nodeType;
    char name[32-1+4];
    unsigned int nodeByteSize;
    PTR64(Node *nextNode);
};
//------------------------------------------------
struct FileHeader : public Node
{
    unsigned int magic;   // unsigned int "MESH"
    unsigned int version; // version of this file

    PTR64(MeshPool *pMeshes);
    PTR64(TransformPool *pTransforms);
    PTR64(MayaCurvePool *pMayaCurves);
    PTR64(MaterialPool *pMaterials);
    PTR64(RelocationTable *pRelocationTable);
    FileHeader();
    void init();
    float* findComponentf(const char *compname, bool **pDirty);
    void resolvePointers();
    void debugDumpAll();
};
//----------------------------------------
struct RelocationTable : public Node
{
    long numRelocationOffsets;
    int : 32;
    PTR64(unsigned long *pRelocationOffsets);
};
//----------------------------------------
struct Mesh : public Node
{
    PTR64(SlotPool *pSlots);
    PTR64(PrimGroupPool *pPrimGroups);
    PTR64(AttributePool *pAttributes);
    PTR64(AttributePool *pBSAttributes);

    PTR64(SlotPool *pBSSlots);
    PTR64(TransformRefs *pTransforms); // Transform references: the ones the mesh is using (many refs for the skinning case, 0 or 1 for the basic case)

    BSphere bsphere;
    AABBox aabbox;

    PTR64(FloatArrayPool *pFloatArrays); // arrays of floats coming from curves or anything else derived from FloatArray (for blendshape animation, for example)
};
... etc

To make the data from a file available, I just have to do the following:

  • Load the file and check the version so we know our structures will match
  • Jump to the very end of the file and get the "relocation table": a list of offsets used to compute the pointers. Yes, the data in the file doesn't contain pointers, of course. Instead, the pointer slots are left empty, and the relocation table tells you how to turn specific offsets into pointer values.
  • Optional processing: modern graphics APIs need to create resources, and there is nothing we can prepare ahead of time here. So this is the only processing required after loading the data: create index/vertex buffers, input layouts (in DX), VBOs (in OpenGL)...
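The relocation step can be sketched as follows. The names and the exact table layout are hypothetical; the idea is that each entry is the byte offset of a pointer slot inside the loaded blob, the slot itself stores the byte offset of the target data, and resolving is one addition per entry:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch of pointer relocation: at each pointer slot, the file
// stores the byte offset of the target data; the relocation table lists
// where those slots live. Resolving turns every offset into a real address.
void resolvePointers(unsigned char* blob,
                     const std::vector<uint64_t>& relocationOffsets)
{
    for (uint64_t slotOffset : relocationOffsets) {
        uint64_t targetOffset;
        std::memcpy(&targetOffset, blob + slotOffset, sizeof(targetOffset));
        uint64_t address = reinterpret_cast<uint64_t>(blob + targetOffset);
        std::memcpy(blob + slotOffset, &address, sizeof(address));
    }
}
```

After this single pass, the nested structures can be used in place, with no further parsing.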

Features I expose:

  • The whole thing available through very little code: only 3 or 4 include files. No library, no DLL
  • Multiple streams (or slots in DX10/11), with or without interleaved attributes. It is sometimes useful to keep optional attributes in separate slots so we can disable them (say, for the shadow pass...). The way attributes are interleaved is specified at export time. This sounds odd, but it allowed me to have the data ready as fast as possible.
  • A pool of transformations + dependencies for skeletons. Transformations reflect the whole Maya matrix stack and contain Bindposes for skinning.
  • Various attributes that I can export depending on my needs (of course position, normals... but also skinning weights and offsets, tangents, binormals...)
  • Some special Streams/Slots that contain Blendshape attributes (delta values for position and/or normals and/or other attributes).
  • Maya animation curves as a list of Bezier curves with a reference to where they are connected (most of the time they are connected to some levels of transformations, like translation... rotation...)
  • Basic Material details : diffuse; specular; texture names...
  • More specific material details : effect/shader name and optionally technique/pass name (if using an effect)
  • Attribute details that are needed by APIs to create resources (like InputLayout) or needed to invoke rendering (vertex array size, topology like triangle list; fans etc.). If needed, I will even have exported the flags for DX9/10/11 and OpenGL so that I don't have to bother about it.
  • Ability to read the file when gzip'ed
  • Multiple Meshes available in a single file
  • Multiple primitive groups for each mesh: once the vertex buffers are in place, we may invoke more than one draw call...
  • Bounding boxes, bounding spheres...
  • Stripifier ready to use: the exporter exposes an option to turn the triangle list into a bunch of triangle strips, saving some vertex cache on the GPU
  • Ability to have various primitives (topology, in DX10/11), like Triangles, lines, points, strips, fans, quads etc.
  • Every single piece of information nested in a Node of arbitrary size, so we can iterate through the file like a list
As I mentioned earlier, even if the structure is almost ready to use, APIs need resource creation (VBOs, DX buffers...). So I have one helper file for each API (OpenGL/DX9-10-11)... these helpers contain the code for input layout creation, OpenGL VBO/DX buffer creation, etc.

Brief overview of the structures that are used in this binary file:

Simplest case

The simplest case is when using basic OpenGL. Say we want to display a mesh with very basic OpenGL usage (no VBOs, etc.):

FileHeader *loadedFile = load("myfile.ck3d");
NVCooked::Mesh *pMesh = loadedFile->pMesh[0];
...
glVertexPointer(
    pMesh->layout.attribs[0].numComp,
    pMesh->layout.attribs[0].formatGL,
    pMesh->layout.attribs[0].strideBytes,
    pMesh->layout.attribs[0].pAttributeBufferData);
glNormalPointer(
    pMesh->layout.attribs[1].formatGL,
    pMesh->layout.attribs[1].strideBytes,
    pMesh->layout.attribs[1].pAttributeBufferData);
...
glDrawElements(
    pMesh->primGroup[0].topologyGL,
    pMesh->primGroup[0].indexCount,
    pMesh->primGroup[0].indexFormatGL,
    pMesh->primGroup[0].pIndexBufferData);


Here we assume we know that attribs[0] is the position and attribs[1] is the normal (the structure contains the string names of these attributes, so it would be possible to check...)
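Since the attribute names are stored in the file, a safer approach than hard-coding indices could be a small lookup. The field names here are hypothetical, not the actual ones from the format:

```cpp
#include <cstring>

// Sketch: find an attribute index by its exported name instead of assuming
// attribs[0] is the position. The Attribute layout is illustrative only.
struct Attribute {
    char name[32];
    // format, component count, stride, data pointer... omitted here
};

int findAttribute(const Attribute* attribs, int count, const char* name)
{
    for (int i = 0; i < count; ++i)
        if (std::strcmp(attribs[i].name, name) == 0)
            return i;
    return -1; // not found: the asset wasn't baked with this attribute
}
```

Returning -1 also gives the sample a cheap way to detect that a required attribute was not baked into the asset.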

A more complex example

This example shows how I do things when I need to create resources. I go through very simple helpers (they aren't even in a library: just one include file per API):

  • initialize the meshes
    • create buffers (DX10/9 buffers, VBOs...)
    • declare misc interfaces (input layout...)
    • change the way attributes are used: for example, if the model contains a lot of attributes that you don't need, you probably want to use only the relevant ones and ignore the others. Therefore the input layout (for example) would be different...
  • matrices
    • Compute the matrices depending on the hierarchy. A 'dirty' bit avoids recomputing them when they didn't change
  • render the meshes
    • Apply buffers, input layout (if needed)
    • call the proper drawcall, depending on topology, index buffers...
  • Blendshapes: these layers of data can be used in different ways... there are helpers to bind them as buffers so the vertex shader can use them as buffer templates...
  • skinning...
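The 'dirty' bit mentioned for matrices can be sketched like this. The structure and names are hypothetical, and invalidating children when a parent changes is omitted for brevity:

```cpp
#include <cstring>

// Sketch of the dirty-flag idea: a transform recomputes its world matrix
// only when flagged dirty, otherwise it returns the cached result.
struct Transform {
    float local[16];             // local matrix, row-major
    float world[16];             // cached world matrix
    Transform* parent = nullptr;
    bool dirty = true;

    // Plain 4x4 row-major matrix multiply: out = a * b
    static void multiply(float* out, const float* a, const float* b) {
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 4; ++c) {
                float s = 0.f;
                for (int k = 0; k < 4; ++k)
                    s += a[r * 4 + k] * b[k * 4 + c];
                out[r * 4 + c] = s;
            }
    }

    const float* getWorld() {
        if (dirty) {
            if (parent)
                multiply(world, parent->getWorld(), local);
            else
                std::memcpy(world, local, sizeof(world));
            dirty = false; // stays cached until the next change
        }
        return world;
    }
};
```

A real implementation would also mark every child dirty when a parent's local matrix changes; the caching idea stays the same.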
Initialization:
meshHelper.load("obj.ck3d"); 
meshHelper.getMesh()->initBuffers(); // getMesh() == getMesh(0). Could be getMesh(10) to get mesh 10...


Map attributes we really need (in DX10, this will lead to a specific input layout...):

meshHelper.getMesh()->resetAttributeMapping(); 
meshHelper.getMesh()->setAttribute(POSITION, MESH_POSITION);
meshHelper.getMesh()->setAttribute(NORMAL, MESH_NORMAL);
meshHelper.getMesh()->setAttribute(TEXCOORD0, MESH_TEXCOORD0);
meshHelper.getMesh()->setAttribute(TEXCOORD1, MESH_BONESOFFSETS);
meshHelper.getMesh()->setAttribute(TEXCOORD2, MESH_BONESWEIGHTS);


Update transformations:

// compute curves
meshHelper.updateAnim(g_time);
...
// propagate the results to the targets (pos, rotation...)
meshHelper.updateTransforms();
// get data as packed array: good for shaders
meshHelper.getMesh()->getTransformArray(matrices);
cgGLSetMatrixParameterArrayfc(
    bones_param, 0,
    meshHelper.getMesh()->getNumTransforms(), matrices);


Draw:

... bind your shader... etc.

meshHelper.getMesh()->bindBuffers();

// simple case where we have one primitive
meshHelper.getMesh()->draw();

meshHelper.getMesh()->unbindBuffers();



Note that we are still low level here. The application still needs to do the state management, like cycling into techniques/passes (if using effects) etc.

Blendshapes

Blendshapes are exported in a specific way : they are considered as additional slots (Streams).

Imagine that our GPU could handle an infinite number of attributes per vertex; then you would simply consider each blendshape as an attribute in the vertex layout. 50 blendshapes would lead to 50 additional vertex attributes...

However, it is not possible to inject them all into the GPU pipeline, so I use these slots in another way:

  • Some can be used as attributes (often up to 4 can be appended)
  • or they can be read through buffer templates (a DX10 feature where you can fetch them with a Load() from the vertex shader).
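To illustrate what such a delta slot contains, here is a CPU-side sketch of applying one blendshape; the GPU paths (extra attributes or a Load() in the vertex shader) perform the same weighted add per vertex. The function name and layout are illustrative:

```cpp
// Sketch: a blendshape slot stores one delta per vertex (positions only,
// here); applying the shape is a weighted add onto the base attribute.
void applyBlendshape(float* positions,     // base xyz, xyz, ...
                     const float* deltas,  // same layout as positions
                     int vertexCount,
                     float weight)
{
    for (int i = 0; i < vertexCount * 3; ++i)
        positions[i] += weight * deltas[i];
}
```

Several active shapes just repeat this add, one slot per shape, each with its own weight.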

Status

Currently, all of this is still under testing and still evolving. I do not pretend to make something that would compete with any available engine or any available tool suite. I just wanted to find a way to make my life easier when working on samples.

So far so good: this binary format helped me to make things go faster and to make the code lighter.

The bad things about a baked format are obvious:

  • Less flexibility. Once things are "baked", you pretty much have to use them as they are. It is still possible to compute additional data from the original data, although this may defeat the whole idea...
  • No backward compatibility: since I do a simple mapping of defined structures onto the binary, there is no flexibility for reading older formats... unless we keep older structure declarations in the code and pick the right ones depending on the version...
  • If you want to replace an art asset with another one, you have to bake the new data before using it... and you have to make sure all the required data is available. For example, if your application needs tangents and binormals, the baked data really has to provide them...
  • This baked format is not made to be loaded back into your favorite DCC app. I was thinking of improving the exporter and turning it into an exporter/importer...
Next steps

  • I am thinking of exporting what is needed to perform good GPU tessellation. Again, the biggest part of the job would be done by the exporter (or the library...);
  • Rewrite my Maya exporter. The current one is too dirty. I am thinking of making a library that I would then use to write the exporter. This would also allow me to create some "conditioners" to bake data from sources other than Maya. For example, given a list of OBJ files, I would call a command line with specific options to bake the data in a specific way for my specific purpose... like some do in game production... nothing new here...
  • Write a library so that misc exporters/conditioners can be created from it
  • etc.
Takeaway:
I think that baking data is not a task reserved for games or big applications. It is a good way to split the problem into two separate parts: the pre-processing and the use of the data. Unless you are writing a DCC app, your application never really needs to do what an exporter could have baked for you.
