]]>

So what is a **geometric query**, anyway? It’s a generic term that covers all questions involving shapes, positions, orientations, and anything else geometry-related. Here are some common examples from various parts of a game engine:

- Collision – Did the player collide with anything this frame?
- Graphics – Is a particular object visible to the camera?
- Audio – Can the player hear a particular sound? How loud, and in what direction?
- Input – Does the touch point or mouse click location “select” an object?
- AI – Is an enemy soldier’s path to the room unobstructed?

It’s possible to build more complex geometric queries by combining simpler ones. For that reason, I think it’s important to build a set of basic primitive tests that you can use to solve all kinds of problems. These primitive queries are so fundamental that I view them as an extension of the core math library.

Geometric queries come in a variety of forms. Some ask a Boolean question, others ask for specific details such as distance or angles. There are countless queries we can come up with, but we’ll focus on a few key types of basic queries. They are:

- Boolean overlap query – Do two shapes overlap in any fashion? No additional data is returned, only that they overlap or that they don’t. Overlap is defined as sharing any region of space with each other, so even if one is completely contained in the other, they’re considered overlapping.
- Overlap query – This is similar to the Boolean overlap query, except that it also returns information about the overlapping region.
- Intersection query – This is a class of queries that can return more detailed intersection data.
- Ray cast query – If you shoot a ray out from a point in a direction, does it hit anything? Where? How far?
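
As a concrete sketch of the simplest of these, the Boolean overlap query, here is what a sphere-versus-sphere test might look like (the `Vec3` and `Sphere` types and the function name are my own, purely for illustration):

```cpp
struct Vec3 { float x, y, z; };

struct Sphere {
    Vec3  center;
    float radius;
};

// Boolean overlap query: true if the spheres share any region of space.
// Comparing squared distances avoids taking a square root entirely.
bool SpheresOverlap(const Sphere& a, const Sphere& b) {
    float dx = a.center.x - b.center.x;
    float dy = a.center.y - b.center.y;
    float dz = a.center.z - b.center.z;
    float distSq    = dx * dx + dy * dy + dz * dz;
    float radiusSum = a.radius + b.radius;
    return distSq <= radiusSum * radiusSum;
}
```

Note that this returns no contact details at all; a full overlap or intersection query would additionally compute things like the overlapping region or penetration depth.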

Shapes can range from very simple (points, lines, triangles) to very complex (arbitrary polyhedra, smooth curvy surfaces, etc…). For the purposes of this blog, we’ll start with some simple shapes as our building blocks, and in later posts will cover a few complex ones. However, we’ll generally stick with convex shapes, because queries involving convex shapes tend to have simpler solutions, both mathematically (so we can understand them) and computationally (so they are efficient). Most non-convex shapes can be described in terms of multiple convex ones, so we can usually use a combination of smaller queries to answer the non-convex question.

This post was just an introduction to the concept of geometric queries, giving a reason for why we care about them. In the coming posts, we’ll explore some basic shapes and common queries involving them. I’ll do my best to come up with real-world examples to use for the motivation of each test. I’m currently planning on covering points, lines, triangles, spheres, and axis aligned boxes as the simple shapes. Then, the plan is to talk about using a *change of basis* and the separating axis test as a way to solve more complicated problems, using the non-axis aligned box (also called oriented bounding box) as an example. Once shapes are covered, we’ll talk about higher level algorithms for visibility culling, collision detection, space partitioning, etc…

Also, if you have specific shapes or queries you’re interested in, please let me know and I’ll try to include them.

]]>I spent the past 2 years working on various OS components for the Xbox One, including much of the multitasking work for the system (what allows you to switch quickly from app to app, or from game to app and back, and to run multiple apps side by side), the live rendering of the game you see in the dash when you pop back with the Guide button, and various other things throughout. That’s also the reason why my blog has been pretty quiet; things were intense trying to get a high quality product out the door on time. Now that the Xbox has shipped, I’m moving on to work on the core OS graphics kernel team, so I can help improve graphics performance and capabilities across all of our products (Xbox, Phone, and Windows). Additionally, I’m going to be setting aside time to start making regular posts to this blog again. Given that my new role is in graphics, you can expect more graphics-themed posts. However, that doesn’t mean I’m abandoning the physics engine series (I need to dust off the work I was doing for that, and get it cleaned up enough to post).

Looking forward to having some great discussions on here.

-Reza

]]>

The next natural choice is to represent translation, rotation, and scale separately. This makes it trivial to manipulate the values independently. However, since most graphics packages require the final transformation as a matrix, we’ll need to combine the elements later. This isn’t very difficult to do, and we’ve already seen how we can do that in previous posts. Translation and scale are trivially represented as 3 element vectors, which maps well to how we use them, including combining them with matrices. But what about rotation? How can we represent that?

One way to represent the rotation is to just leave it as a pure orthonormal rotation matrix. It is easy to combine this form with position and scale, since the rotation is already in matrix form. We can add more rotation to this matrix by concatenating other rotation matrices using matrix multiplication. This is somewhat expensive, but conceptually simple.

However, as more and more rotations are applied, our matrix could start to stray from being orthonormal (due to rounding errors, etc…). If it strays far enough, it can no longer be considered a rotation, and applying it to an object may have unexpected side effects like shearing or deformation. Therefore, we need to regularly check and correct the matrix to maintain the orthonormal property, which is itself a somewhat expensive task.

Finally, we still have the problem of requiring 9 floats (16 if you only support 4x4 matrices) to store the data. Despite all this, rotation matrices are still a common form of storing rotations.

Another common way to represent rotation is by using 3 angles (called the Euler angles) which represent a rotation around the x, y, and z axes. This representation only requires 3 floating point numbers to store, and allows us to easily adjust rotations around each axis independently. When the axes of rotation are the local coordinate axes of an object, the rotations are sometimes called yaw (rotation about the y axis), pitch (rotation about the x axis), and roll (rotation about the z axis).

While this scheme sounds simple to implement, it has some serious drawbacks. First of all, in which order do you apply the rotations? Do you first rotate about the x axis, then the y, then the z? Or perhaps a different order? You’ll notice that the result is different for each combination. There is also a condition called gimbal lock, which can occur when two axes become collinear as you rotate. For instance, if we first rotate about the y axis, and this brings our z axis in line with where our x axis would have been, then the z rotation would look more like an x rotation, and would likely not give us the result we’re looking for.

Even with these issues, Euler angles are a common way of representing rotations. They are certainly intuitive, which is likely why they are one of the most common representations in 3D modeling tools, game and level editors, CAD programs, and many other software applications where reading and typing in rotation values is necessary. But for our internal representation in code, we can probably do better.

For any set of rotations, the final orientation can be represented as some final axis of rotation and an angle. This is called axis-angle representation. For example, imagine that your head is at the origin, and that you are looking forward along some axis. If you rotate your head about the y axis by 90 degrees to the left, then rotate about the z axis by 90 degrees so that you are facing upward, you end up looking up and to the left, with your chin held high to your left. This final orientation could have also been achieved by using a single axis, which is at a 45 degree angle going in the up & right direction from where you started. Return your head to the starting position, and imagine that there is an axis going up and to the right. Now rotate your head about that axis and you should see that you end up in the same final location as the first pair of rotations.

The axis and angle are normally stored as a single 3 element vector. The direction of the vector represents the axis of rotation, and the magnitude of the vector is equal to the angle of rotation. This keeps the storage requirements minimal, requiring only 3 floats. Unfortunately, adding two rotations in this form is not a simple task. You can’t just add the two vectors together, as rotations aren’t really vectors. This difficulty of combining them makes these less ideal for use in games, as one would normally convert them to another form, combine them, and then convert back.

The final representation for rotations that we’ll consider is the quaternion. Quaternions are a set of numbers (usually said to be in the space *H*) and are an extension of the complex numbers, and they have a lot of uses in mathematics. For our purposes, however, we are only concerned with *unit quaternions.* A unit quaternion can be used to represent rotations. For a complete discussion of how unit quaternions relate to rotation, and how you can visualize them using a hypersphere, refer to this wiki page.

Before we get into the mathematics and operations of quaternions, let’s examine why they are a better choice than the options we’ve previously discussed. Firstly, they require only 4 floats to store, which makes them much more compact than a rotation matrix, and pretty close in footprint to the other representations. Secondly, combining quaternions is far simpler than for axis-angle, and they represent a continuous space, so there is no risk of gimbal lock as with Euler angles. While it’s true that we have to normalize quaternions often to prevent rounding error, the normalization process is simple and much more efficient than orthonormalizing a rotation matrix. Lastly, converting to matrix representation is simple and requires no complex operations, so building our composite transform is trivial as well.

How exactly do we represent a quaternion? What do they look like? The quaternion space, *H*, is a 4 dimensional vector space, so we can represent quaternions using a 4 element vector. Quaternions are an extension of complex numbers, and have 1 real component and 3 imaginary components. They are written in the form w + x*i* + y*j* + z*k*. To store them as a 4 element vector, we just put the coefficients of i, j, and k into x, y, and z, and use w to represent the real portion.

The usual convention for writing quaternions is to write the 4 element vector with the real number first:

We can also write it as the real component and the vector component:

Unit quaternions can be thought of as a modified axis-angle (axis r, angle theta), where:

The identity quaternion is:
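
Reconstructing those forms in standard notation (with **r** a unit-length rotation axis and theta the angle):

```latex
q = (w, x, y, z) = w + x\mathbf{i} + y\mathbf{j} + z\mathbf{k}
q = (w, \mathbf{v}), \quad \mathbf{v} = (x, y, z)
w = \cos\tfrac{\theta}{2}, \quad \mathbf{v} = \sin\tfrac{\theta}{2}\,\mathbf{r}
q_{\mathrm{identity}} = (1, 0, 0, 0)
```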

Quaternion addition is just like any 4 element vector:

Scalar multiplication also works as it does for vectors:

Normalizing a quaternion should also look familiar:
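
In standard notation, those three operations are:

```latex
q_1 + q_2 = (w_1 + w_2,\; x_1 + x_2,\; y_1 + y_2,\; z_1 + z_2)
s\,q = (s w,\; s x,\; s y,\; s z)
\hat{q} = \frac{q}{\lVert q \rVert}, \qquad \lVert q \rVert = \sqrt{w^2 + x^2 + y^2 + z^2}
```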

Quaternions have a *conjugate*, which is written as **q***. The *reciprocal* or *multiplicative inverse* of the quaternion can be defined in terms of the conjugate, and is written as **q**^{-1}:
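
In standard notation (for a unit quaternion the denominator is 1, so the inverse is simply the conjugate):

```latex
q^{*} = (w, -\mathbf{v}) = (w, -x, -y, -z)
q^{-1} = \frac{q^{*}}{\lVert q \rVert^{2}}
```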

Multiplying, or *concatenating*, two quaternions takes the form:

or:
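
The two equivalent forms, reconstructed in standard notation:

```latex
q_1 q_2 = (w_1 w_2 - x_1 x_2 - y_1 y_2 - z_1 z_2,\;
           w_1 x_2 + x_1 w_2 + y_1 z_2 - z_1 y_2,\;
           w_1 y_2 + y_1 w_2 + z_1 x_2 - x_1 z_2,\;
           w_1 z_2 + z_1 w_2 + x_1 y_2 - y_1 x_2)
q_1 q_2 = (w_1 w_2 - \mathbf{v}_1 \cdot \mathbf{v}_2,\;
           w_1 \mathbf{v}_2 + w_2 \mathbf{v}_1 + \mathbf{v}_1 \times \mathbf{v}_2)
```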

It is important to note that quaternion multiplication is not commutative, and is written and performed like matrix multiplication, from right to left. Finally, rotating a vector by a quaternion can be achieved by using:
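
In standard form:

```latex
\mathbf{v}_r = q \, v \, q^{-1}
```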

where **v**_{r} is the resulting rotated vector, and **v** is the vector being rotated, written as a quaternion with a w component of 0.

Converting to matrix form is also very important for us, since we’ll need to do this to build our transform. Additionally, creating quaternions from a given axis and angle will prove to be much more convenient than trying to determine rotation quaternions directly. For instance, if I want to rotate an object about the y axis by theta degrees, it would be convenient to specify my rotation in those terms.

Converting from an axis **v** and angle theta to a quaternion, which can then get applied to the orientation, is straightforward:

Creating a matrix, which represents the quaternion rotation, takes the form of:
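
Putting those two conversions into code, here is a minimal sketch (the types and function names are mine; the matrix is row-major and assumes the column-vector convention):

```cpp
#include <cmath>

struct Quat { float w, x, y, z; };

// Build a unit quaternion from a unit-length axis and an angle in radians:
// q = (cos(theta/2), sin(theta/2) * axis)
Quat QuatFromAxisAngle(float ax, float ay, float az, float theta) {
    float half = 0.5f * theta;
    float s = std::sin(half);
    return { std::cos(half), s * ax, s * ay, s * az };
}

// Expand a unit quaternion into a 3x3 rotation matrix (row-major,
// for multiplying column vectors).
void QuatToMatrix(const Quat& q, float m[9]) {
    float xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    float xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    float wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;
    m[0] = 1 - 2 * (yy + zz);  m[1] = 2 * (xy - wz);      m[2] = 2 * (xz + wy);
    m[3] = 2 * (xy + wz);      m[4] = 1 - 2 * (xx + zz);  m[5] = 2 * (yz - wx);
    m[6] = 2 * (xz - wy);      m[7] = 2 * (yz + wx);      m[8] = 1 - 2 * (xx + yy);
}
```

For example, a 90 degree rotation about the z axis produces the matrix that maps the x axis onto the y axis.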

With all of these operations, we can now handle all of our scenarios, and we’ve met our criteria for choosing a representation for rotations. To summarize:

- Requires only 4 floating point numbers to store
- Manipulation is simplified by using axis-angle rotations as input
- Vectors are trivially rotated using the formula above
- Easily converted into matrix form for building transform
- Normalizing to maintain unit quaternion qualities is trivially performed

Given all of these properties and benefits, it’s easy to see why quaternions have become a very popular choice for games to use. As we start to explore the physics engine more in the coming posts, we’ll be using quaternions as our representation of rotation as well, and will be referring back to many of these concepts and formulas for that discussion.

]]>**Topics I have planned:**

- Finish the math primer (rotations & quaternions)
- Collision detection infrastructure
- Common collision detection algorithms in detail (multiple posts)
- Physics simulation infrastructure
- Common physics integration and solver algorithms (multiple posts)
- Architecture and building of a full physics and collision engine (multi part series)
- If there’s enough interest in non-physics topics, I can go into graphics, scripting and general engine design

By the way, I recently wrote an article about collision detection that got published in the April issue of GameCoderMag. **Check it out!**

An *affine transform* is a transformation which maps points and vectors in an affine space to another affine space. This allows us to transform positions now, in addition to just vectors. An affine transformation can be thought of as a combination of a basis matrix and a translation to an origin, which is analogous to how affine spaces compare to vector spaces, since affine spaces add an origin. When we look at an object, we can define a right, up, and forward vector for it, relative to some frame of reference. But furthermore, we can describe its *position* in space. Not the position of a particular point on the object, but the position of the whole object. What we’re actually doing is saying that all the points on the object can be thought of relative to some origin on the object, and that origin has a location in space (relative to some frame of reference, of course). This means that even though we move the object’s origin around, we’re not changing the orientation (the right, up, and forward vectors). The reverse is also true: we can rotate or scale our object all we want, but it won’t move the object’s origin. In this way, we see that the origin of the object and its basis are independent, and the formal affine definition reflects this separation. Affine transformations are formally defined as multiplying a point by a matrix component (to define its orientation) and then adding a vector component (to define its origin’s position):
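
Written out (reconstructing the standard form, with p the point being transformed and p' the result):

```latex
\mathbf{p}' = M\,\mathbf{p} + \mathbf{t}
```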

Here, M is the matrix portion of the transformation, and is sometimes referred to as the linear portion. This can consist of rotation, scaling, or any other linear transformation. The t in the equation is called the *translation*, and is the vector defining where the object’s origin is compared to some frame of reference. Let’s take a look at an example to get a better grasp of this:

In Figure 1, we see a frame of reference shown (origin O, with right, up, and forward defining our frame). We also see an object (the square). If we were locally sitting inside that object, then x and y would have been our right and up vectors, and P our local origin. But since we’re examining the object from a different frame of reference, it appears rotated and located at some other location. This placement and orientation can be described as a single affine transformation, as shown in the equation above, with M being the linear portion and t being the translation portion. Using our example above, and assuming only rotation (to simplify), our matrix M would likely look like:

This assumes that x, y, and z (which we can’t see in the image) are defined relative to our coordinate space and are unit length. Look back to this post if you’re unsure how we got that. We would then add our translation vector (dashed line t in the figure).

While this works out well enough, the extra addition must be remembered and there’s not a convenient way to store this information. Fortunately, there’s an alternate way to express and use affine transformations which is much more convenient and consistent with what we’ve already seen:
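
The combined matrix, written in block form:

```latex
A = \begin{bmatrix} M & \mathbf{t} \\ \mathbf{0}^{T} & 1 \end{bmatrix}
```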

Note that this is now a 4x4 matrix, as opposed to the usual 3x3 we’ve seen so far. It contains the linear part of the affine transformation in the upper left 3x3 block, and the translation vector as the last column. The bottom row contains 0s with the exception of the last entry which is a 1. Here’s an example of the expanded matrix from the previous example:

Before we go on, I wanted to call out that I’ve been using column matrices so far in this post. The same concepts apply if you’re using row matrices, just remember the layout will be transposed from what I’m showing here, with the translation vector as the bottom row of the matrix, and the 0s as the last column, and transposing the upper left block.

Now, we claimed earlier that affine transformations could be used to transform both points and vectors from one affine space to another. Points and vectors are quite different: one is a location and one is a direction. It doesn’t make any sense to translate a direction, so how can we possibly use this matrix to transform vectors? The answer lies in the fact that we have to augment a point or vector to be compatible with this matrix. The matrix is 4x4, and both points and vectors have only 3 components. Without some augmentation, the points and vectors are not compatible with this new matrix. How we augment them depends on whether they are points or vectors, and dictates whether the translation component of the affine matrix is applied or not.

If you recall from this post, points are sometimes written as 4 component vectors with a 1 in the last component (sometimes called w). Vectors can likewise be written with 4 components, with the last component equal to 0. The reasons this worked for arithmetic were shown in the post, but we now see there is another critical reason for that distinction. When you multiply a 4 element vector with the affine matrix above, the math works out to:
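
Reconstructing that multiplication in block form, with p augmented by a w component:

```latex
\begin{bmatrix} M & \mathbf{t} \\ \mathbf{0}^{T} & 1 \end{bmatrix}
\begin{bmatrix} \mathbf{p} \\ w \end{bmatrix}
=
\begin{bmatrix} M\mathbf{p} + w\,\mathbf{t} \\ w \end{bmatrix}
```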

The result is exactly what you’d get by applying the normal 3x3 linear transform M to p and then adding t scaled by p’s w component. If the w component is 0, as in the vector case, the translation portion cancels out. If the w component is 1, then we translate by exactly t, which is what we expect for points. Notice also that the last element of the new vector is exactly equal to the w component of the starting vector. This means that vectors stay vectors under affine transformation, and points stay points. This is pretty important: it means that an affine transformation will never turn points into vectors or vice versa. The augmented points and vectors which contain the w component are said to be in *homogeneous space*, and another term often used for the affine matrix (since it requires homogeneous vectors) is a *homogeneous matrix*.
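
A small sketch of this in code (layout and names are mine; the matrix is row-major and multiplies column vectors): the same 4x4 multiply translates points (w = 1) but leaves directions (w = 0) untranslated.

```cpp
// Multiply a 4x4 homogeneous matrix (row-major) by a homogeneous
// 4-vector: (x, y, z, 1) for points, (x, y, z, 0) for vectors.
void TransformH(const float m[16], const float in[4], float out[4]) {
    for (int row = 0; row < 4; ++row) {
        out[row] = 0.0f;
        for (int col = 0; col < 4; ++col)
            out[row] += m[row * 4 + col] * in[col];
    }
}
```

With a pure translation of (5, 0, 0), the point (1, 2, 3, 1) maps to (6, 2, 3, 1), while the vector (1, 2, 3, 0) is unchanged.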

In the next and final matrix-related math primer post, we’ll examine some other operations which can be performed on matrices. Specifically, we’ll take a look at the determinant, adjugate, and inverse. Afterwards, we’ll quickly look at some of the problems with rotations, and introduce quaternions as an alternative way to express rotations which overcomes many of the traditional problems. I’m going to be continuing the physics posts as well.

]]>A transformation is any operation that maps values from one space to another. In the context of standard algebra, we are usually introduced to transformations in the form of *functions*. Functions and transformations are really one and the same, though the term transformation is more common in linear algebra. In fact, sometimes you may even see transformations or functions referred to as *mappings*. Again, for our purposes these are all equivalent ways of conveying the same meaning. We take in a set of values, called the domain, and we produce another set of values, called the range.

When discussing transformations of affine and vector spaces, it is quite convenient to represent the transformations in the form of a matrix. This allows for succinct notation, as well as easily combining transformations through matrix concatenation (multiplication). For example, take the following transformation:
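
For example, transforming a vector **u** by a matrix M (the vector name is mine):

```latex
\mathbf{u}' = M\,\mathbf{u}
```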

Here, we say that the matrix **M** is multiplied with the vector, and the product is the transformed vector.

A *linear* transformation is a transformation that maps from vector space to vector space, maintaining the vector addition and scalar multiplication properties of the vectors. The first part of the definition, mapping from vector space to vector space, means that our transformation matrix must be exactly *n* x *n* in size, where *n* is the dimension of the vector space. The number of columns must be n in order to be compatible for multiplication with the vector. The number of rows must be n to give us an n-dimensional vector as a result. For 2 dimensional space, that means a 2x2 matrix. For three dimensional space, that means a 3x3 matrix.

The second part of the definition means that the addition of two vectors, and scalar multiplication of a vector, must be maintained after the transformation. Put in equation form:
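
In standard notation, the two properties are:

```latex
T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})
T(c\,\mathbf{u}) = c\,T(\mathbf{u})
```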

The first equation says that the transformation of the sum of two vectors is equal to the sum of the transformation of the original vectors. This is what we mean when we say that the vector addition property of the vectors is maintained. The second equation says that the transformation of a scaled vector is equal to scaling the transformation of the original vector. This is what’s meant by maintaining the scalar multiplication property of vectors. Because of this, an easy way to determine if a transformation is linear is to combine the above equations and try and prove that they hold true for a particular transformation:
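
The combined criterion, in standard notation:

```latex
T(a\,\mathbf{u} + b\,\mathbf{v}) = a\,T(\mathbf{u}) + b\,T(\mathbf{v})
```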

If you can prove that the above does not hold true for a given transformation, then that transformation is not linear.

Next, we’ll look at two of the most common linear transformations encountered in games: scaling and rotation. In addition to describing each transformation, I’ll show that they’re linear by using the equations above, and also use each one with some real numbers plugged in as an example. Hopefully, this will make them easier to understand.

The first linear transformation we’ll look at is scaling. Three dimensional scaling can be represented with the following matrix:
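
Reconstructed, that matrix is the diagonal:

```latex
S = \begin{bmatrix} s_i & 0 & 0 \\ 0 & s_j & 0 \\ 0 & 0 & s_k \end{bmatrix}
```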

Where s_{i} represents the amount of scale along the **i** vector, s_{j} the amount of scale along the **j** vector, and s_{k} the amount of scale along the **k** vector.
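
Multiplying this matrix by a vector scales each component independently:

```latex
S\,\mathbf{u} = \begin{bmatrix} s_i & 0 & 0 \\ 0 & s_j & 0 \\ 0 & 0 & s_k \end{bmatrix}
\begin{bmatrix} u_x \\ u_y \\ u_z \end{bmatrix}
= \begin{bmatrix} s_i u_x \\ s_j u_y \\ s_k u_z \end{bmatrix}
```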

Which is what we’d expect. Now, let’s show that this is linear by plugging in the two sides of the equation for showing linearity from above and seeing that the results are equal in both cases:

We can see that in either case, we ended up with the same answer. This shows that the scaling transformation is indeed a linear one.

Now, let’s look at an example you may encounter in a game. For this example, let’s say that we are trying to scale up a triangle with vertices (-1, 0, 0), (0, 1, 0), and (1, 0, 0) by a factor of 3 in all directions. I’ve intentionally used simple numbers to make this easier to visualize and show in a blog post. The same concept applies regardless of the numbers. This gives us a matrix with 3 all along the diagonal:

To scale our triangle, we would need to multiply each vertex by the scale matrix:

This gives us the 3 scaled vertices we expect: (-3, 0, 0), (0, 3, 0), and (3, 0, 0).
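
In code, that per-vertex operation is just a matrix-vector multiply; a minimal sketch (array layout and function name are mine):

```cpp
// Multiply a 3x3 matrix (row-major) by a column vector.
void Mat3MulVec3(const float m[9], const float v[3], float out[3]) {
    for (int r = 0; r < 3; ++r)
        out[r] = m[r * 3 + 0] * v[0] + m[r * 3 + 1] * v[1] + m[r * 3 + 2] * v[2];
}
```

Running each of the three vertices through this with the scale matrix above yields (-3, 0, 0), (0, 3, 0), and (3, 0, 0).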

The other linear transformation we’ll look at is rotation. Rotation consists of an *axis of rotation*, and the *angle of rotation*. When looking directly into the axis of rotation (the axis is pointing towards you), the rotation angle is measured *counterclockwise*. We’ll start by examining the matrix representations of rotations about the axes of the standard basis (**i**, **j**, and **k**).

Let’s start by looking at the matrices for rotation about the **i**, **j**, and **k** vectors:
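
Reconstructed for the counterclockwise, column-vector convention described above:

```latex
R_{\mathbf{i}}(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}
\qquad
R_{\mathbf{j}}(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}
\qquad
R_{\mathbf{k}}(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
```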

We can then combine these through concatenation to form a general rotation matrix representing the combined rotations. However, we’ve seen that concatenation is order dependent, so we must be careful which order we combine them in. Combining pitch, then yaw, then roll may give a different result from combining as yaw, then pitch, then roll. There’s not really any standard for the order in which they’re combined, so be sure to stay consistent.

So, are rotations linear transformations? I’ve said that they are, but let’s show that this is the case. We’ll show this for one of the rotations above, but the same thing can be done for each of them:

As we can see, both ways lead to the same result, which confirms that rotations are in fact linear transformations.

Finally, let’s consider an example with real numbers. Let’s say we have a vector** u** = (3, 4, 5). We want to rotate this vector by 45^{o} counterclockwise about the ** i** vector, then 30

And that gives us the new vector after the two rotations: (5.7795, –0.707, 2.3285).

To wrap up, we’ve discussed transformations in general and looked at a special class of transformations known as linear transformations. We’ve looked at the two most common linear transformations in video games, scale and rotation. In the next installment of the math primer series, we’ll discuss affine transformations. As always, please let me know if there’s anything that I can help to explain better.

]]>

Today, we’ll look at some of the core concepts around physics simulations in video games. We’ll go over some terminology and approaches so that we’re all on the same page when discussing more involved topics in later posts. First off, we’ll discuss what it means to have physics in a video game, and how that compares with physical simulation in other software fields. Then, we’ll discuss the basic steps in simulating physics in a game, and how it integrates with the rest of the game code.

Video games have become incredibly complicated pieces of software over the past 20 years. I recall a time when PC games could fit on a single density 720KB 5.25” floppy disk. Several years later when CDs were initially becoming available, you could find a single CD with over 1000 games on it! These games ran at measly 320x240 resolutions with 8 or 16 colors.

Today, games easily weigh in at several gigabytes, with many being tens of gigabytes. They have nearly photo-realistic graphics with incredibly high resolutions. Along with that extra visual realism comes an expectation of *immersion *from the user. The point of having incredibly realistic graphics is to make the player *feel* like they’re actually in the game. However, for the illusion to hold, the virtual world around them must also *behave *in a believable manner. The best graphics in the world wouldn’t make a player feel like they’re actually immersed if the characters move unnaturally, pass through walls or floors, or things don’t fall and topple as if they had realistic mass and material properties. These are just some of the challenges of making the world feel believable.

Physical simulation software is nothing new, and has been used for decades for scientific research and study. Even the use of physics in games isn’t entirely new, though it’s gotten a lot more attention in recent years. Even some of the earliest games, such as Pong, had very primitive physics simulations in the sense that the trajectory of the ball bouncing off the paddles was determined in a way that would seem intuitive and consistent with our understanding of physics. What is new, however, is the recent adoption of general purpose physics simulations in video games. The physics in early games was very much written for the specific scenarios of that game. For example, it could handle projecting the trajectory of a ball off of a paddle, but nothing else. This meant that for each new effect the game wanted to have, a new piece of physics code would have to be written to handle it. Additionally, each new game would need its own physics code written as well. As game worlds became more complicated, the need for more physical scenarios increased, and the cost of writing specialized functions for each scenario became too prohibitive. During this time, the concept of having a general purpose piece of code that, given some overall parameters, could simulate arbitrary physical conditions and scenarios came to be. These general purpose physics *engines* allow game developers to create games with many more interesting scenarios, and to do it much more quickly.

Before moving on to discussing the anatomy of these physics engines, there’s one more important point to consider. Physics simulations are just that, simulations. Physics, *in general*, is very complex and not possible to replicate exactly in code. However, if we take certain core aspects or principles of physics, and simulate them with some acceptable level of accuracy, then performing this in code becomes much more reasonable. The most common class of physics simulated in games is called Rigid Body Dynamics. Rigid body simulations use Newtonian principles of movement and mass to simulate rigid objects (spheres, boxes, a tower, a boulder, a wall, etc…). In games, the physical accuracy of the simulation only needs to be high enough to be believable, and is oftentimes reduced to improve performance. A smooth running, albeit slightly less accurate, game is much more immersive than a perfectly accurate game that runs poorly and constantly drops frames.

There are many different physics engines out on the market today. Each one works somewhat differently than the next, and each uses different approaches to solving the general physics problem for games. However, in order to solve the problem for games, there are certain things that every engine must do. For the purposes of this post, we’ll be focusing on rigid body dynamics as it’s the most common type of physics in games. I’ll try and include posts later to discuss other types of simulations such as cloth, fluid, or gas. All of the components of the physics engine can be summarized in a single general operation called the *simulation step*.

Video games display many rendered frames per second. Comparing this to animation, many small in-between images are shown quickly to give the impression of movement. The de facto standard for most PC and console games is to target 60fps, which means that each frame represents about 0.01667s of time. For the physics simulation to match, only 0.01667s worth of physics should happen each frame. That means an object moving forward at a velocity of 1m/s should only move 0.01667 meters that frame, and then again the next frame, and then again, and finally after 60 frames have gone by, the object would have covered 1 meter of distance.

Each of these intervals, in which objects are moved along their trajectories, is called the physics simulation step. Almost every physics simulation equation involves time, and the time used for solving these equations is this time slice determined by the game. Most engines will take in the time as a value each frame, since it could change from frame to frame. Some games use variable frame rates, so a static number should never be assumed.

So, what exactly *happens* during the simulation step?

A typical physics engine tracks all the objects it’s simulating in a data structure somewhere. For simplicity let’s call it a list, though this isn’t always the case. For each object, the physics engine needs to know some important information, such as its mass, current velocity, current position and orientation, and the outside forces acting on it. Each step, the *integration* portion of the physics code *solves* for the new positions and velocities of every object. This involves using equations of motion from physics to approximate the positions and velocities of an object at some future time (current time plus the time slice for the frame), using the current positions and velocities as starting points. In addition, outside forces such as gravity, springs, friction, or anything else relevant to the game are also considered to ensure the newly computed positions and velocities make sense.
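As a sketch of what the integration portion might look like, here is one common choice, semi-implicit Euler; the post doesn’t prescribe a particular integrator, and the names here are invented for the example:

```python
def integrate(position, velocity, force, mass, dt):
    """One semi-implicit Euler step: derive acceleration from the
    net outside force, update velocity, then update position using
    the new velocity."""
    acceleration = force / mass
    velocity = velocity + acceleration * dt
    position = position + velocity * dt
    return position, velocity

# A 2 kg object dropped from 10 m, stepped forward one 60fps frame
# under gravity alone.
p, v = integrate(position=10.0, velocity=0.0,
                 force=2.0 * -9.8, mass=2.0, dt=1.0 / 60.0)
```

Semi-implicit Euler is popular in games because it is cheap and more stable than plain explicit Euler, though real engines may use more sophisticated integrators.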

If there were only a single object moving in a vacuum, then we’d be done! However, most games involve more than one object, and these objects move in interesting environments. Inevitably, two objects end up moving toward each other or running into each other. What happens? If we don’t do anything, these objects will just pass right through each other. The renderer certainly has no knowledge that these are separate, solid objects that shouldn’t overlap; it will happily draw them intersecting each other. In most games, however, you don’t want the objects to pass through each other. An asteroid colliding with a ship should crash into it, possibly destroying the ship. A character walking on top of a floor should stay above it; gravity shouldn’t pull the character down through the ground. In order to handle these scenarios, the game needs to know that two objects are overlapping. The process of identifying these scenarios is called *collision detection*, and it is normally one of the other major tasks the physics code must perform. Generally, the job of the collision detection code is to determine all such pairs of overlapping objects, possibly compute some additional data such as how far they overlap and in what orientation, and provide this data to the game for further processing.
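The cheapest overlap test of all, sphere versus sphere, gives a flavor of what collision detection code looks like; this sketch is my own and not taken from any particular engine:

```python
import math

def spheres_overlap(center_a, radius_a, center_b, radius_b):
    """Two spheres overlap when the distance between their centers
    is no greater than the sum of their radii."""
    dx = center_a[0] - center_b[0]
    dy = center_a[1] - center_b[1]
    dz = center_a[2] - center_b[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return distance <= radius_a + radius_b

# Centers 1.5 apart, radii summing to 2.0: overlapping.
hit = spheres_overlap((0, 0, 0), 1.0, (1.5, 0, 0), 1.0)
```

A real broad phase would run a test like this (or cheaper bounding-volume checks) over every candidate pair before doing more expensive work.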

Once the physics engine has identified that a pair (or many pairs) of objects are overlapping, then what do we do? In many cases, this is something specific to the rules of the game. For instance, in a space shooter game, when an asteroid overlaps the player’s ship, the game may decide to stop drawing the player’s ship and instead draw an explosion animation. Following that, the game would probably start the level over and reduce the number of lives of the player. All of these reactions to the collision are driven entirely by the game itself, and not the physics engine. This is because they are very specific to the game in question. However, there are certainly cases where the game doesn’t care to be involved. For instance, in a first person shooter, if the player knocks over a chair in a room, the game doesn’t need to do anything specific for this case. There are no game rules invoked, and therefore the game just wants the motion of this chair and player to continue to be simulated in a realistic fashion. This likely means the chair falls over in response to being bumped by the player, and when it strikes the floor, it probably rolls or tumbles slightly. This class of reaction is another common, and arguably the most complex, job of the physics engine and is generally referred to as *collision response* or *collision resolution*.
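As the barest taste of collision response, one common building block is reflecting the velocity along the contact normal, scaled by a coefficient of restitution. This toy sketch (invented names, one dimension only) ignores the impulses, friction, and rotation a real resolver must handle:

```python
def resolve_bounce(approach_speed, restitution):
    """Reflect the approach velocity off the contact surface.
    restitution = 1.0 is perfectly bouncy; 0.0 kills the bounce."""
    return -restitution * approach_speed

# A chair hitting the floor at 4 m/s with a fairly dead material
# rebounds at only 0.8 m/s in the opposite direction.
v_after = resolve_bounce(4.0, 0.2)
```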

We’ve covered the basics of all physics engines. Every simulation suitable for games will have these general components, though it’s common to include many more features such as joints, cloth simulation, physics-based animation, and other interesting things. I’ll be going into a lot more detail on each component as we continue to discuss physics in games. I’m planning on presenting a sample architecture for a physics engine, and then drilling into each section piece by piece to discuss the design, why it was designed that way, what algorithms are used, what alternate algorithms exist, and what optimizations were made and why. I’ll be providing sample code snippets and probably entire posts dedicated to certain algorithms and optimizations. Please note that the architecture used will be something of my own invention, and not that of any other physics engine out there. While there will certainly be similarities due to the fact that we’re solving the same problem, all of the code and design presented will have been created specifically for the purpose of education on this blog.

Few things are as ubiquitous in game and graphics programming as a matrix. In this installment of the math primer, we take a look at these structures, investigating not only their numerical significance, but also what they represent visually. Next time, we’ll see how to combine them with vectors and with other matrices to form complex transformations, which we rely heavily on in game code.

When we refer to a coordinate system, we represent the system with a central point (origin), and 2 (for 2D) or 3 (for 3D) linearly independent vectors, also called *axes*. Look back to the earlier primer articles for a refresher if you need it. To measure a point with respect to this coordinate system, we use an ordered 2- or 3-tuple, such as (3, 5, 1), which represents starting at the origin, then moving 3 units along the first vector, 5 units along the second vector’s direction, and finally 1 unit along the third vector’s direction. The three vectors used in this way are called a *basis*. A basis is the set of linearly independent vectors used to define a coordinate system (or more accurately, a coordinate *frame* or *frame of reference*). If we label these vectors **u**, **v**, and **w**, then we can write the 3-tuple above as 3**u** + 5**v** + 1**w**. We can even define left or right handedness of the basis by looking at the way we orient the third vector with respect to the first two. Specifically, third here means the one that we write last when expressing the tuple. When someone mentions the *standard basis*, they’re referring to the set of vectors (normally labeled **i**, **j**, and **k**), where the values are (1, 0, 0), (0, 1, 0), and (0, 0, 1) respectively, normally centered around the origin **O** (0, 0, 0). Let’s take a look at Figure 1 for a more visual representation.

Something important to note here is that we can express any basis *in terms of another basis*. That means that we can express any basis in terms of the standard basis. If we look at Figure 1 a little more closely, we see that even though **u**, **v**, and **w** are vectors making up a basis, they’re still just vectors, and that means we can express them in terms of **i**, **j**, and **k**. For instance, if we use the basis consisting of **u**, **v**, and **w** as our frame of reference, then **u** would most likely be written as (1, 0, 0). However, using the standard basis as our point of reference, we might measure **u** to be some vector like (0.6, 0.6, 0). This is a key concept to keep in mind: basis vectors can all be measured and represented in terms of another basis, including the standard one.
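A quick sketch of this idea in plain Python (the basis vectors below are invented for the example): given **u**, **v**, and **w** measured in the standard basis, a tuple of coordinates given relative to **u**, **v**, **w** can be converted into standard-basis components:

```python
def combine(basis, coords):
    """Evaluate a*u + b*v + c*w, returning the point's components
    measured in whatever basis u, v, and w are expressed in."""
    u, v, w = basis
    a, b, c = coords
    return tuple(a * u[i] + b * v[i] + c * w[i] for i in range(3))

# A hypothetical basis, measured in terms of the standard basis i, j, k.
u = (0.6, 0.8, 0.0)
v = (-0.8, 0.6, 0.0)
w = (0.0, 0.0, 1.0)

# The point written (3, 5, 1) relative to u, v, w, i.e. 3u + 5v + 1w,
# has entirely different component values in the standard basis.
p = combine((u, v, w), (3, 5, 1))
```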

It’s getting a little tedious to keep referring to a basis in a manner such as “the basis consisting of basis vectors **u**, **v**, and **w**”. Wouldn’t it be nice if there were a succinct notation to capture that? Well, there is! We can write the basis vectors in block form, like:

Where **u**, **v**, and **w** are expressed as vectors in terms of the standard basis. This is a matrix, and the expanded form looks like this:

Now, we need to clarify a few things here before continuing. The way I’ve laid out the matrix above is not the only correct way to lay out a basis. Just as coordinate systems have left handed and right handed variations, which are just a matter of preference and convention, matrices too have some convention considerations to make. Matrices may be used in what’s called row-major or column-major form, which directly correlates with how you wish to write vectors. Vectors can be written as row-vectors or column-vectors. See Figure 2 for a comparison of the two. It’s important to realize that just as picking one handedness of coordinate system over another had no bearing on the result, just the shape of the formulae, the same is true of how we express our vectors and matrices.

**Figure 2: The row-vector on the left is written with the 3 components horizontal. The column vector on the right is written with the 3 components laid out vertically.**

How we write out matrices is directly related to what kind of vectors we use. If we use row vectors, then we lay down our basis vectors as *rows* of the matrix. If we are using column vectors, then we lay down the basis vectors as *columns* in the matrix. See Figure 3 below:

**Figure 3: The row-major matrix on the left consists of the basis vectors written as rows of the matrix. The column-major matrix on the right consists of the basis vectors written as columns in the matrix.**

Again, which we choose doesn’t make a difference as long as we are consistent with our convention and form our formulae appropriately for the form we’ve chosen. In mathematics, and in most texts on mathematics and graphics, column form is more widespread. However, in the actual game and graphics world things are a little more divided. For instance, DirectX and XNA both use row form, but OpenGL uses column form. You could write an entire library in column form and still use DirectX or XNA to render; you just need to convert between the forms at the appropriate places. The conversion between the two, called a transpose, will be covered a bit later in this post.

I will choose to use column vector form even though most of my coding examples will likely be either DirectX or XNA. You might think that seems counterintuitive, but I believe writing out mathematics in a manner which is more consistent with academia is more natural, and will match what you find in research papers, math books, and other references around the web. My coding examples are just that: examples. I don’t think it’s justified to form my mathematical explanations around my examples’ choice of graphics API.

NOTE: It is important to realize that this discussion of row versus column major matrices is only relevant when talking about vectors. As we’ll see in a moment, matrices can be used for many other things besides vectors, and in those cases there is only a single form of a matrix. In other words: row major and column major matrices are still the same matrix; we’ve just chosen to impose a convention on *how we write vectors in terms of matrices*.

While it’s convenient to write vectors as single row or column matrices, and basis vectors as the rows or columns of a 9 element square matrix, these are far from the only uses of a matrix. In fact, most formal definitions of a matrix are something along the lines of “a rectangular arrangement of numbers, organized into rows and columns”, which says nothing about vectors.

Let’s try to define matrices a little more generally now. We know that they look like blocks of numbers, and that they are considered to have rows and columns, so let’s add that we can refer to the number of rows and columns as “*n* by *m*”, or *n* x *m*, where *n* is the number of rows and *m* is the number of columns. This is summarized in Figure 4.

**Figure 4: Matrices are labeled using row x column. From left to right, the matrix sizes are: 2x3, 1x2, 3x1, and 3x3.**

We can see from the first matrix in the figure that elements within the matrix are always labeled using the row, then the column number, and are 1-based. The subscripts in the first matrix show the row and column numbers of element a_{ij}. The second and third matrices could be interpreted as row and column vectors, respectively. The final matrix in the figure leads us to a few more definitions.

If the number of rows and columns in a matrix are the same, the matrix is said to be a *square matrix*. Since the number of rows and columns are the same, we can refer to square matrices with a single size dimension. For instance, we can say something is a dimension 3 square matrix, meaning it’s a 3x3. The set of all elements in the matrix for which the row and column indices are the same (a_{11}, a_{22}, etc.) is called the *diagonal* of the matrix. For example, in the rightmost matrix above, the diagonal is the set of numbers (4, 3, 8). If the only nonzero elements in a matrix are within the diagonal, then the matrix is called a *diagonal matrix*. So, to summarize, our final matrix above can be called a *square, diagonal matrix*. Finally, as a matter of notation, I’ll use capital, bold, italic letters to represent a matrix. This should make it easier to tell them apart from vectors (bold lowercase) and points (italic capital). For example, a matrix **M**.

There is a special matrix, called the *identity matrix*, which is a *square, diagonal matrix* with only 1s in the diagonal. It can be any size, and is normally written as **I** or

Now that we have notation and terminology out of the way, let’s start looking at operations we can do with matrices!

The most trivial operation we can perform on a matrix is taking its transpose. This swaps all the rows of the matrix with all of the columns. If the matrix begins as an n x m matrix, then the transpose is an m x n. The transpose of **M** is written as

There are a few important observations to make. Firstly, if we look back to our discussion on row and column vectors and matrices, we can see clearly now that to convert between the two we take the transpose. Secondly, it’s important to note that the diagonal of a matrix remains the same after taking its transpose. This will always be the case. In fact, many like to think of taking the transpose as reflecting the non-diagonal elements across the diagonal.
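A transpose sketch in plain Python, representing a matrix as a list of rows (the helper name is mine):

```python
def transpose(M):
    """Swap rows with columns: an n x m matrix becomes m x n."""
    rows, cols = len(M), len(M[0])
    return [[M[r][c] for r in range(rows)] for c in range(cols)]

M = [[1, 2, 3],
     [4, 5, 6]]       # 2 x 3
T = transpose(M)      # 3 x 2: [[1, 4], [2, 5], [3, 6]]
# Transposing twice recovers the original, and for a square matrix
# the diagonal is unchanged.
```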

Addition and subtraction of two matrices doesn’t come up quite as often in games, but no discussion of matrix operations would be complete without them. Matrix addition can only be done between two matrices of the same exact dimensions, and is a trivial operation of summing the elements at the same location in each:

Subtraction is done in the exact same way, again requiring the matrices be of the same dimension and shape.
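Using the same list-of-rows representation, addition and subtraction are one-liners over matching elements (a sketch with invented helper names):

```python
def mat_add(A, B):
    """Element-wise sum; A and B must have identical dimensions."""
    assert len(A) == len(B) and len(A[0]) == len(B[0])
    return [[A[r][c] + B[r][c] for c in range(len(A[0]))]
            for r in range(len(A))]

def mat_sub(A, B):
    """Element-wise difference, with the same shape requirement."""
    assert len(A) == len(B) and len(A[0]) == len(B[0])
    return [[A[r][c] - B[r][c] for c in range(len(A[0]))]
            for r in range(len(A))]

C = mat_add([[1, 2], [3, 4]], [[10, 20], [30, 40]])  # [[11, 22], [33, 44]]
```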

Scalar multiplication is as straightforward as can be. Multiplying a matrix **M** by a scalar

Multiplying, also called *concatenating*, two matrices is by far the most common operation done on matrices in game code. We’ll explore why when we talk about transformations later in this post. Multiplying matrices isn’t as straightforward as adding or subtracting, but with a little help visualizing what it is we’re doing, it’s not too bad. Let’s take the following two matrices, **A** and

Multiplication of matrices requires that the number of *columns* of the first matrix match the number of *rows* of the second. We can refer to this dimension as *d*. We can see that our matrices **A** and

Using our example matrices **A** and

By this definition, matrix multiplication is not commutative, since the column and row requirements may not be met, and even if they were, the result would be different. The only matrices which meet the row and column requirements when reversed are square matrices, but again the result of the multiplication is different.

While we can certainly think of multiplication in this way, I find it far easier to visualize the multiplication in a more vector-oriented way. If we imagine the rows of the first matrix as vectors (like a row major basis), and the columns of the second matrix as vectors (like a column major basis), then what we’re really doing is taking the dot product of each possible pair of vectors, with the dot product of the *i*th row of the first matrix and the *j*th column of the second matrix making up the element m_{ij} in the resulting matrix. In other words:
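That row-dot-column rule (m_{ij} is the dot product of row *i* of the first matrix with column *j* of the second) translates almost directly into code; here’s a generic sketch in plain Python, my own helper rather than a library function:

```python
def mat_mul(A, B):
    """m_ij is the dot product of row i of A with column j of B.
    Requires the column count of A to equal the row count of B."""
    assert len(A[0]) == len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
AB = mat_mul(A, B)   # [[19, 22], [43, 50]]
BA = mat_mul(B, A)   # [[23, 34], [31, 46]] -- not commutative!
```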

We can now see what the purpose of the identity matrix, **I**, is. Multiplying any matrix by an identity matrix of the appropriate size yields the original matrix:

Multiplying a vector by a matrix is called *transforming* the vector. We’ll look at transformations in the next post in this series, but let’s understand the math behind it first. This is where our discussion of column versus row vectors becomes most relevant. To multiply a vector and a matrix, we treat the vector as a single row or column matrix and multiply as usual. This implies that which side of the matrix the vector goes on is important, and must satisfy the row and column requirements of the multiplication. If our vector is a row vector, then we could write a 3-vector and 3x3 matrix multiplication as:

We could similarly write a column vector multiplied by the same matrix as:

Notice that the vector is on the other side of the matrix. This is required to make the multiplication work. Looking at the expressions above, the product of the multiplication would be different in each case. However, we know intuitively that transforming a vector by a matrix can only have a single answer, and that our choice to use row or column vectors shouldn’t impact the result. In order to move the vector from one side of the matrix to the other to satisfy the multiplication requirements, we must also *transpose* our matrix to ensure that the *result* remains the same. If we do that, the product of the multiplication will be the same regardless of whether we choose row vectors or column vectors. This is exactly what we were talking about up above in the basis section. So the correct form for the second equation becomes:

Which will ensure we get the same result as the first case.
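We can check this equivalence numerically with a small sketch (the helpers are redefined inline so the example stands alone; the matrix values are invented):

```python
def transpose(M):
    return [[M[r][c] for r in range(len(M))] for c in range(len(M[0]))]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

M = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
v = [2, 5, 7]

# Row-vector form: v (1x3) times M (3x3).
row_result = mat_mul([v], M)[0]

# Column-vector form: M transposed (3x3) times v (3x1).
col_result = [r[0] for r in mat_mul(transpose(M), [[x] for x in v])]

# Both orderings produce the same transformed vector, because we
# transposed M when moving v to the other side.
```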

There are many more operations we can do with matrices, and I’ll cover some as we go through the next few blog posts. But for now, we have enough covered that we can start to look at linear transformations. The next installment will start our exploration of transformations, beginning with linear transformations. I hope you enjoyed the introduction of matrices, and as always let me know if there’s anything you’d like to see me explain in more detail.

I know it’s been a little while since my last post, and I apologize. I’ll try to keep the posts a little more frequent moving forward.

In the last post, we briefly encountered barycentric coordinates and loosely defined them as the coefficients of an affine combination. While that’s true, we can do better. We can define a more precise definition, and we can take a closer look at what they really mean, both numerically and geometrically. That’s the topic of this post, as well as taking a brief look at how we could use them in a real world game programming context.

First, let’s take a look at the formal definition of the coordinates, then we’ll consider a slightly refactored version that works better for our situations. Consider a triangle ABC (see Figure 1), and then imagine a little weight at each vertex. We can assign each weight as having 100% weight contribution from that vertex, and 0 contribution from the other vertices. So, for point A that’s (1, 0, 0), for point B it’s (0, 1, 0), and for point C it’s (0, 0, 1). This means that if you’re at the vertex, you only get the weight from that 1 vertex. Anywhere else within the triangle would be a combination of these weights. The *barycenter* of the triangle is the point inside the triangle where you could balance the weights. In other words, the weights would be contributing evenly to that point. Assuming that the weights at each vertex are equal (1), this would be the mean of the weights (or vertices):
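In symbols, with an equal weight of 1 at each vertex, that mean works out to:

```latex
\text{barycenter} = \frac{1\,A + 1\,B + 1\,C}{1 + 1 + 1} = \tfrac{1}{3}A + \tfrac{1}{3}B + \tfrac{1}{3}C
```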

The barycentric coordinates for the barycenter then become (1/3, 1/3, 1/3) given our equation above. Notice that this is exactly the equation of an affine combination, which is why we stated that the coefficients of an affine combination are also the barycentric coordinates. The dashed lines in the image represent the barycentric axes. A barycentric axis starts on a triangle edge, where the weight for the opposite vertex is 0, and extends through the barycenter of the triangle to the opposite vertex, where the weight for that vertex is 1. Notice that the other coordinates at the base of each axis are 1/2. This isn’t actually a coincidence of the equilateral triangle shown here: the axis meets its edge at the edge’s midpoint, because a median of any triangle bisects the opposite side, so those 1/2 values hold for other kinds of triangles too.

Now, what else can we observe? Well, for each axis, our values only extend from 0 to 1, since they are weights of that vertex. In other words, a value less than 0 or greater than 1 would be outside of our simplex (triangle in this case). Furthermore, the sum of the coordinates is necessarily equal to 1. Since these coordinates represent the amounts of each weight that you observe at that point, they are percentages, and therefore must sum up to the total. Otherwise, you’d be missing some of the weight from the system. These two observations will be extremely helpful as we examine uses in game code. The other thing I wanted to mention here is that if the restriction that coefficients of an affine combination must sum up to 1 didn’t make sense after my explanation before, I hope that looking at it from the perspective of barycentric coordinates helps to justify the restriction.

Now that we’ve examined the formal definition and use of barycentric coordinates, let’s take a slightly modified look at them, and see how we can make them more useful to us as game developers. We saw in the last post that any affine combination could be refactored and expressed as a single point, or origin, added to a linear combination. Let’s take our barycentric equation (which is an affine combination) and refactor it in this way now, using s, r, and t as our barycentric coordinates (coefficients):
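Spelled out, using s + r + t = 1 to eliminate s, the refactoring goes:

```latex
\begin{aligned}
P &= sA + rB + tC \\
  &= (1 - r - t)A + rB + tC \\
  &= A + r(B - A) + t(C - A) \\
  &= A + r\mathbf{u} + t\mathbf{v}
\end{aligned}
```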

We’ve substituted u and v for (B-A) and (C-A), respectively. Relating our result to the figure above, we see that we’ve picked A as our local origin, and two of the triangle’s edges as u and v. Our barycentric coordinates have been reduced to just r and t, and are now expressed relative to the local origin A. It’s important to note that mathematically, this ‘reduced’ form of the barycentric coordinates is equivalent to the formal version, but this format is much more usable to us. Because r and t are still barycentric coordinates, they must still fall between 0 and 1, and their sum still cannot exceed 1. Notice that we said exceed this time, and not sum up to 1. This is a subtle difference from before, and it can be explained like this: previously, we had the sum of s, r, and t equal to 1. This still must hold true. However, we’ve made s *implicit* in our new reduced form, and therefore it is not directly included in our sum. If r and t sum to 1, then s is 0. However, r and t can sum to less than 1, and s will be equal to the remainder (see the second and third steps of how we arrived at our reduced form). In summary:

Now let’s see how we can use this. Imagine we’re trying to write a function in our game that determines whether or not a point is contained within a triangle. This is actually quite a common problem to solve in collision detection. In two dimensions, it might be a direct collision detection query. In three dimensions, we normally first determine whether a point is in the plane of a triangle, and if so, reduce it to a two dimensional problem exactly like the normal 2D case. In either situation, we need a robust way to determine whether a point is contained within a triangle. We can use barycentric coordinates in the reduced form to solve this, by taking the following steps:

1. Pick a vertex of the triangle to be our local origin, which we’ll refer to as A

2. Compute u and v using the differences of the other two vertices and our origin as we did above (ex: u = B-A, v = C-A)

3. Compute the r and t barycentric values for our point P with respect to A

4. Check that r and t are both within 0 and 1, and that their sum is less than or equal to 1. If so, return true. Otherwise, the point is outside, return false.

It’s actually quite straightforward, with the exception that we haven’t yet discussed how you’d complete step 3, computing the barycentric coordinates. Let’s take a look at that now, and then with r and t computed, we’ll be able to complete the function. There are a couple of approaches to finding the barycentric coordinates in this step. The simpler of the two uses some identities and properties of vectors we’ve covered up to this point, which is why I’ll show that one now. However, after we take a good look at matrices and solving systems of linear equations with them, we’ll see that there’s a more efficient way to compute them. See Figure 2 to see the problem we’re trying to solve.

Let’s start by refactoring our equation slightly, and move the A over to the other side:

We’ve replaced P-A with the vector w. We need to now solve for r and t. If we look at some of the vector rules that we covered earlier, we’ll remember that any vector crossed with itself (or any collinear vector) will result in the 0 vector. So, to eliminate t, let’s take the cross product of both sides with v:

We can see that by having the cross product of v with itself go to the 0 vector, we were able to eliminate t from the equation and solve for r. We can repeat this same process for t to obtain:
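In symbols, starting from w = ru + tv, crossing with v removes the t term (v × v = 0), and crossing with u removes the r term:

```latex
\begin{aligned}
\mathbf{v} \times \mathbf{w} &= r(\mathbf{v} \times \mathbf{u}) + t(\mathbf{v} \times \mathbf{v}) = r(\mathbf{v} \times \mathbf{u}) \\
\mathbf{u} \times \mathbf{w} &= r(\mathbf{u} \times \mathbf{u}) + t(\mathbf{u} \times \mathbf{v}) = t(\mathbf{u} \times \mathbf{v})
\end{aligned}
```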

At this point, we can determine whether or not r and t are going to be greater than 0. This is the first requirement of our test. The equations for r and t above are each a ratio of two cross products. Each cross product is a vector, and in each equation the numerator and denominator are parallel vectors, so the ratio is a well-defined scalar. The only way r or t can be negative is if those vectors point in opposite directions. So, let’s use the dot product to determine whether the numerator and denominator in each case point in the same direction:

The sign() of each dot product will be > 0 if the vectors are pointing in the same direction, or < 0 if they are pointing away from each other. At this point, if either is < 0, we can exit the function with a false result.

The next requirement we need to meet is that r and t must each be no greater than 1, and that their sum must also be no greater than 1. To check this, let’s take the norm of each side of the equations above. Since we already know that r and t are positive, the norm of r or t is just r or t.
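Since r passes through the norm unchanged, this gives:

```latex
\lVert \mathbf{v} \times \mathbf{w} \rVert = r\,\lVert \mathbf{v} \times \mathbf{u} \rVert
\quad\Longrightarrow\quad
r = \frac{\lVert \mathbf{v} \times \mathbf{w} \rVert}{\lVert \mathbf{v} \times \mathbf{u} \rVert}
```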

Again, we can repeat this same process to solve for t, and we’ll get:

Also, since swapping the order of a cross product doesn’t change the magnitude of the resulting vector, only its direction, we can also say that

Which gives us a common denominator in our r and t formulas, so we only have to compute the value once. And there we have it: we now have r and t computed, which means we can complete our function. A sample implementation, written in C# and using XNA, might look like this:

///<summary>
/// Determine whether a point P is inside the triangle ABC. Note, this function
/// assumes that P is coplanar with the triangle.
///</summary>
///<returns>True if the point is inside, false if it is not.</returns>
public static bool PointInTriangle(ref Vector3 A, ref Vector3 B, ref Vector3 C, ref Vector3 P)
{
    // Prepare our barycentric variables
    Vector3 u = B - A;
    Vector3 v = C - A;
    Vector3 w = P - A;

    Vector3 vCrossW = Vector3.Cross(v, w);
    Vector3 vCrossU = Vector3.Cross(v, u);

    // Test sign of r
    if (Vector3.Dot(vCrossW, vCrossU) < 0)
        return false;

    Vector3 uCrossW = Vector3.Cross(u, w);
    Vector3 uCrossV = Vector3.Cross(u, v);

    // Test sign of t
    if (Vector3.Dot(uCrossW, uCrossV) < 0)
        return false;

    // At this point, we know that r and t are both > 0.
    // Therefore, as long as their sum is <= 1, each must be <= 1
    float denom = uCrossV.Length();
    float r = vCrossW.Length() / denom;
    float t = uCrossW.Length() / denom;

    return (r + t <= 1);
}

And that concludes this post on barycentric coordinates, and one of their uses in game code. I hope that this post also served to solidify some of the previous information about affine combinations. Finally, it gave us a chance to use what we’ve covered so far to do something useful. I hope you enjoyed it, and I’ll be moving on to matrices next.
