Designing a GPU-oriented geometry abstraction – Part One.

One of the inputs to rendering via programmable shading on a modern graphics card is a collection of vertices, each associated with some per-vertex properties used in shader computations. When programming the GPU, this collection of vertices is commonly abstracted as a vertex buffer, which is essentially just a bag of bytes. The collection of vertices describes a point, line, or triangle primitive topology, along with an optional index buffer (for triangles) that describes vertex sharing. Again, the abstractions for describing these topologies are very weak and essentially amount to flags and arrays, although we can informally refer to the vertex buffer plus its primitive topology description as a geometry. Each vertex is first processed individually in a vertex shader, and then (optionally) each primitive (point, line, or triangle) is processed in a geometry shader, where a new geometry can be emitted with fewer or more vertices or even a completely different topology. By the end of vertex and geometry shading, the position of each vertex must have been specified. Finally, each pixel of the rasterized primitive is processed by a pixel shader to determine its color. This is a simplification: the GPU supports other features such as instancing, where we reuse the vertex buffer multiple times in a single rendering, and geometry shader stream-out, where we save the result of vertex/geometry shading for later reuse.
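
To make “flags and arrays” concrete, here is a minimal, API-agnostic sketch of what the raw GPU-side abstraction amounts to; the type and field names are mine and do not belong to any particular graphics API:

    using System;

    // The raw abstraction: a declared vertex layout over a bag of bytes, a topology flag,
    // and an optional index buffer describing vertex sharing.
    enum PrimitiveTopology { PointList, LineList, TriangleList }

    struct RawVertex             // one possible per-vertex layout; the GPU just sees the bytes
    {
        public float X, Y, Z;    // position
        public float Nx, Ny, Nz; // normal
    }

    class RawGeometry
    {
        public RawVertex[] VertexBuffer;    // the "bag of bytes"
        public ushort[] IndexBuffer;        // optional; describes vertex sharing between triangles
        public PrimitiveTopology Topology;  // essentially just a flag
    }

    class RawGeometryDemo
    {
        static void Main()
        {
            // The face of a cube: four vertices shared by two triangles.
            RawGeometry face = new RawGeometry();
            face.VertexBuffer = new RawVertex[4];   // four corners of a quad (positions omitted for brevity)
            face.IndexBuffer = new ushort[] { 0, 1, 2, 2, 1, 3 };
            face.Topology = PrimitiveTopology.TriangleList;
            Console.WriteLine("{0} vertices, {1} triangles",
                face.VertexBuffer.Length, face.IndexBuffer.Length / 3);
        }
    }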

Bling does a good job of abstracting vertices and pixels: in Bling shader code, one value represents both the vertex and the pixel, and the vertex position and pixel color are specified with respect to this value. Bling infers whether the code is referring to a vertex or a pixel based on a simple dependency analysis: we are referring to a pixel whenever we refer to anything that is only available in the pixel shader; otherwise we are referring to a vertex, and the computation’s result (if used in the pixel shader) will be interpolated from the analogous results at the primitive’s other vertices. However, Bling’s vertex/pixel abstraction breaks down when dealing with geometry: the input of a rendering is simply the vertex count and primitive data, and Bling has no high-level geometry abstraction. This prevents geometries from being manipulated, composed, and constructed at a high level, and prevents geometry shaders from being expressed at all unless the shader partitioning is specified explicitly. And this is a pity, because geometry shaders can do so many cool things, such as dynamically forming fur, hair, feathers, sparkles, blobs, and so on. We definitely need to make manipulating geometries easier.
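
The classification rule behind that dependency analysis can be sketched in a few lines; the types below are made up for illustration and are not Bling’s implementation. An expression must run in the pixel shader as soon as any of its inputs is pixel-only; otherwise it runs per vertex and its result is interpolated across the primitive:

    using System;

    enum ShaderStage { Vertex, Pixel }

    // Every expression is tagged with the latest pipeline stage its inputs require.
    abstract class Expr
    {
        public abstract ShaderStage Stage { get; }
    }

    // A leaf input, e.g. a vertex position (vertex stage) or a texture sample (pixel stage).
    class Input : Expr
    {
        private readonly ShaderStage stage;
        public Input(ShaderStage stage) { this.stage = stage; }
        public override ShaderStage Stage { get { return stage; } }
    }

    // A computation over sub-expressions: it must run in the latest stage among its arguments.
    class Op : Expr
    {
        private readonly Expr[] args;
        public Op(params Expr[] args) { this.args = args; }
        public override ShaderStage Stage
        {
            get
            {
                foreach (Expr arg in args)
                    if (arg.Stage == ShaderStage.Pixel) return ShaderStage.Pixel;
                return ShaderStage.Vertex;
            }
        }
    }

    class StageDemo
    {
        static void Main()
        {
            Input position = new Input(ShaderStage.Vertex);   // per-vertex data
            Input texSample = new Input(ShaderStage.Pixel);   // only available per pixel
            Console.WriteLine(new Op(position, position).Stage);   // Vertex: computed per vertex, then interpolated
            Console.WriteLine(new Op(position, texSample).Stage);  // Pixel: forced into the pixel shader
        }
    }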

So here is my new realization: give geometries their own abstraction in Bling, defined as one of the following (a rough code sketch follows the list):

  • The atomic case is a collection of adjacent primitives. Actually, the only things that need to be specified for this primitive geometry are the number of vertices, the vertex adjacency (if more than one primitive is defined), and the primitive topology. Vertex properties such as positions and normals are not included, as they will be inferred via dependency analysis within the shader. Examples: the face of a cube (four vertices defining two triangles) or the result of sampling a parametric surface such as a sphere.
  • A composition of two different geometries (gn = g0 + g1). The geometries composed must have the same primitive topology. Example: we could compose a face with four simple triangles to form a pyramid.
  • A duplication of the same geometry N times (gn = N * g0). We define this separately from geometry composition as it could correspond (but might not, depending on efficiency) to geometry instancing in the GPU API. For duplication to be meaningful, each duplicated geometry will always undergo a transformation based on the duplication index. Examples: a face is duplicated six times to form a cube, or a cube is duplicated N times to do some crude voxel rendering.
  • A transformation of one geometry’s properties (gn = f(g0)). This is a bit difficult to define since there is no explicit common set of vertex properties; indeed, a vertex might not need a normal or even a position if it is not going to be rendered. However, the most common transformations will be on position and normal, as the geometry is translated, scaled, and rotated. Example: rotating a face in six different ways to represent each face of a cube.
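
Here is the promised sketch of these cases as a small class hierarchy; the names are mine, not a committed Bling design. Vertex properties are deliberately absent: only vertex counts, adjacency, and topology live here, and the transformation case gn = f(g0) is omitted precisely because, without a common property set, there is nothing yet for f to act on:

    using System;

    enum Topology { PointList, LineList, TriangleList }

    abstract class Geometry
    {
        public abstract Topology Topology { get; }
        public abstract int VertexCount { get; }

        // gn = g0 + g1: composition of two geometries with the same topology.
        public static Geometry operator +(Geometry g0, Geometry g1)
        {
            return new Composition(g0, g1);
        }

        // gn = N * g0: duplication, a candidate for GPU instancing.
        public static Geometry operator *(int n, Geometry g0)
        {
            return new Duplication(n, g0);
        }
    }

    // The atomic case: a collection of adjacent primitives.
    class Primitives : Geometry
    {
        private readonly Topology topology;
        private readonly int vertexCount;
        public readonly int[] Adjacency;   // optional; describes vertex sharing

        public Primitives(Topology topology, int vertexCount, int[] adjacency)
        {
            this.topology = topology;
            this.vertexCount = vertexCount;
            this.Adjacency = adjacency;
        }

        public override Topology Topology { get { return topology; } }
        public override int VertexCount { get { return vertexCount; } }
    }

    class Composition : Geometry
    {
        public readonly Geometry First, Second;

        public Composition(Geometry g0, Geometry g1)
        {
            if (g0.Topology != g1.Topology)
                throw new ArgumentException("Composed geometries must share a primitive topology.");
            First = g0;
            Second = g1;
        }

        public override Topology Topology { get { return First.Topology; } }
        public override int VertexCount { get { return First.VertexCount + Second.VertexCount; } }
    }

    class Duplication : Geometry
    {
        public readonly int Count;
        public readonly Geometry Inner;

        public Duplication(int count, Geometry inner) { Count = count; Inner = inner; }

        public override Topology Topology { get { return Inner.Topology; } }
        public override int VertexCount { get { return Count * Inner.VertexCount; } }
    }

With these pieces, the cube example becomes something like 6 * face (duplication) and the pyramid becomes base + sides (composition), with the topology check enforced when the composition is built.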

The benefit of this approach is immediately apparent: depending on our needs, we can define geometries mathematically or load them from a mesh. As properties aren’t included explicitly in the geometry, we can mix and match separate geometries as long as they share the same topology. We can then synthesize the index buffer (if needed) automatically (sketched below), removing this burden from the user, and infer where geometry instancing occurs.
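
The index-buffer synthesis for a composition amounts to shifting the second geometry’s indices by the first geometry’s vertex count; the helper below is hypothetical and assumes each atomic geometry already carries its own local indices:

    static class IndexSynthesis
    {
        public static int[] Concatenate(int[] indices0, int vertexCount0, int[] indices1)
        {
            int[] result = new int[indices0.Length + indices1.Length];
            indices0.CopyTo(result, 0);
            for (int i = 0; i < indices1.Length; i++)
                result[indices0.Length + i] = indices1[i] + vertexCount0;  // shift into the combined buffer
            return result;
        }
    }

    // Example: composing two quad faces, each indexed { 0,1,2, 2,1,3 } over four vertices,
    // yields { 0,1,2, 2,1,3, 4,5,6, 6,5,7 } over eight vertices.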

Unfortunately, vertex properties are more difficult to specify. Before, each vertex had one vertex ID and an optional instance ID, which we would use to compute the vertex’s position and the pixel’s color. These could be used as indexes into a table (in the case of a mesh) to look up initial values for properties like position and normal, or they could be used in richer computations; e.g., computing a per-pixel normal from the derivative of a parametric surface. At any rate, we inferred what properties to include in the vertex buffer in a way that allowed an expressed computation to span the CPU and GPU, which is a very good feature from an optimization and reuse perspective. Now, if we allow geometries to be composed, duplicated, and transformed, vertex addressing obviously becomes much more difficult when inferring properties. There are a few directions we could go in to solve this problem:

  1. We could assign an ID to a vertex based on the order in which it was defined in a composition, but vertex properties are often defined relative to their geometries, and such a scheme would basically ruin the reuse of code that computes properties for a given composed geometry.
  2. We could come up with some hierarchical ID scheme. However, vertices are generally thrown all together (in the vertex shader at least), and as such it would be very difficult to reason about vertices whose IDs are not homogeneous.
  3. The other obvious option is to embed properties in geometries, which would solve the geometry reuse problem but would tie geometries down to a common set of properties that they have to share in order to be combined. Yet this might be reasonable if the set of per-vertex properties is relatively fixed and exceptions are easy to deal with. Whatever properties are declared do not necessarily correspond to the properties that go in the vertex buffer; e.g., the position and normal properties of a sampled parametric surface can refer to dynamic CPU-only values (such as a slider’s position), where inference would synthesize the vertex buffer to include only static values. I’m still thinking about this problem, but I guess I’m working in the direction of somehow re-introducing per-vertex properties, although I might have to throw away dynamic typing to make it work in C#’s type system (hopefully, generics will be enough); a rough sketch follows this list.
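
Here is that rough sketch of option 3 under C# generics; it is my own guess at a direction, not a committed design, and it revisits the earlier geometry sketch with a type parameter. The per-vertex property record becomes the type parameter, so the compiler only lets us combine geometries that agree on their property set, and properties are functions of the vertex index rather than stored data, so they can mix static values (baked into the vertex buffer) with dynamic CPU-only ones:

    struct PositionNormal
    {
        public float X, Y, Z;       // position
        public float Nx, Ny, Nz;    // normal
    }

    abstract class Geometry<TVertex> where TVertex : struct
    {
        public abstract int VertexCount { get; }

        // Per-vertex properties as a function of the vertex index, not stored data.
        public abstract TVertex Vertex(int index);

        // Composition is only defined for geometries sharing the same property record.
        public static Geometry<TVertex> operator +(Geometry<TVertex> g0, Geometry<TVertex> g1)
        {
            return new Composed<TVertex>(g0, g1);
        }
    }

    class Composed<TVertex> : Geometry<TVertex> where TVertex : struct
    {
        private readonly Geometry<TVertex> first, second;

        public Composed(Geometry<TVertex> g0, Geometry<TVertex> g1) { first = g0; second = g1; }

        public override int VertexCount
        {
            get { return first.VertexCount + second.VertexCount; }
        }

        public override TVertex Vertex(int index)
        {
            return index < first.VertexCount
                ? first.Vertex(index)
                : second.Vertex(index - first.VertexCount);
        }
    }

A mesh face would then be a Geometry<PositionNormal>, while a geometry whose vertices are never rendered could use an empty property record.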

Any solution that I come up with has to play nicely with geometry shaders, which have the ability to transform geometries based on dynamic information. I’ll get into this in my next post.