VGO is Un-Intuitive

Submitted by epreisz on Sat, 11/17/2007 - 03:09.

Intuition can be your enemy when optimizing games. This makes VGO very interesting, yet also somewhat frustrating. VGO is intuitive only when you understand every element of what you are optimizing, which is sometimes very difficult or nearly impossible. This is especially true when video games rely on proprietary hardware and software. Using your intuition means making assumptions, and the inability to make reliable assumptions about how your code executes is the bane of optimizing video games.

Simply using a third party’s API and getting correct results is not sufficient to conclude that we are getting optimal results. When using a third party’s API or hardware, we must understand the assumptions the third-party developer made while designing their interface. For graphics APIs, these assumptions are not intuitive, and a programmer who doesn’t understand them is unlikely to use the API efficiently. When we use an API to achieve correct yet suboptimal results, we are abusing the API.

Hardware developers sometimes design implementations that are unintuitive. Graphics processing hardware can contradict years of knowledge accumulated from traditional software processing. Aside from the overhead required to set up the hardware, hardware implementations will typically execute much faster than software implementations.

Write-combined buffers provide an excellent example of how a hardware implementation can change the way we write our software. Dynamic vertex buffers use write-combined memory. Writes to write-combined memory do not travel through the cache system the way writes to ordinary cached memory do. Instead, they go to an intermediary buffer, similar to a cache line, called a write-combined buffer.

The write-combined buffer eventually writes to system memory in bursts. If we update all 64 bytes of a write-combined buffer, it writes to memory in one burst. If we update only a portion of the buffer, it writes in 8-byte increments. Is it intuitive to believe that writing more data in C++ can be faster than writing less? This is the case with write-combined buffers. Therefore, if we are updating a vertex buffer using the CPU, it can be faster to update all values of a vertex even if they haven’t changed.

Guessing where your bottlenecks and hotspots reside does not work. At least, it does not work well. It is an interesting challenge to look at an application and try to determine the likely performance bottleneck or hotspot using only a visual inspection of the rendered scene. Although you might identify the limiting factor some of the time, you are unlikely to be consistently accurate without using tools, reviewing the source code, or following a process.
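
One simple way to replace a guess with a measurement is to time the suspect code directly. The sketch below is a minimal example using the standard std::chrono::steady_clock. The helper measureMicroseconds and the stand-in workload sumOfSquares are hypothetical names. A real investigation would normally rely on a profiler and many repeated runs, not a single timing.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in microseconds.
template <typename Work>
double measureMicroseconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}

// Stand-in workload representing a suspected hotspot.
double sumOfSquares(const std::vector<double>& values) {
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

A call such as measureMicroseconds([&] { result = sumOfSquares(values); }) returns the time spent in that one call. A single sample is noisy, so comparing averages over many runs gives a more trustworthy picture.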