Optimizing CPU Processing
Submitted by epreisz on Fri, 02/23/2007 - 02:39.

Modern CPUs have an amazing ability to process both arithmetic and logic. Games stress the CPU with physics, culling, AI, and other tasks. Sometimes the biggest limit on our game’s performance is the processing of arithmetic and logic.

When are we Application Processing bound?

To figure out whether we are processor bound, we must first determine whether we are CPU limited or GPU limited. We can do this by measuring GPU utilization with PIX, NVPerfHUD, or gDEBugger. If GPU utilization is below 90%, it is likely that we are CPU bound, because the GPU is spending 10% or more of its time idle. Why is there idle time? Because the CPU isn’t able to generate commands fast enough to keep the GPU busy.

Smart Design & Flexibility

I have heard it said that the fastest triangle you can draw is the one you don’t draw; the same is true for CPU processing. Consider the following example.

Solutions

Application Processing Bound

When your game’s performance is bound by the math or logic that you wrote, your game is application processing bound. There are many system, application, and micro level optimizations that will increase the performance of applications bound by their processing. Because we have access to the code of games that are application processing bound, we have the opportunity to rewrite that code directly; however, a bad habit of many game developers is to optimize code that we think is slow, rather than code that we have determined to be slow with a proper detection phase.

System Level Solutions

When games are limited by application processing, we have two system level solutions. One solution is multi-threading, a technique that greatly increases the difficulty of coding.
It is important to note that multi-threading is a solution for instruction and logic processing, but not for applications that are memory bound (at least with the current memory architecture). We will cover multi-threading in a later reading.

Another solution involves balancing. Balancing is when we move code from over-utilized resources to under-utilized resources. It is common for the CPU to be over-utilized, especially with OpenGL and DX9. Moving work from the CPU to the GPU can free the CPU for other work that is not well suited to the GPU. Since GPU performance is scaling faster than CPU performance, code I write today for the GPU will gain more performance over the next 18 months than it would if it were running on the CPU. Animations and particle systems are common code modules that are ported from the CPU to the GPU.

Application Level Solutions

There are hundreds of application level solutions, but those are outside the scope of these readings. Why? Because those resources are available all over the internet. Application level solutions that reduce application processing are what game engines are all about. Systems such as quadtrees, octrees, and BSP trees are all examples of application level solutions for games that are limited by application processing.

One concept overlooked in game engine design is memoization. Memoization, which sounds similar to memorization, is when a function or algorithm saves its result in anticipation of being called again with the same parameters.
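
As a rough illustration of memoization, here is a minimal C++ sketch. The names `ExpensiveCost`, `MemoizedCost`, and `g_expensiveCalls` are hypothetical and exist only for this example; the "expensive" function is a stand-in for whatever pure computation your engine repeats. The sketch assumes single-threaded use and an unbounded cache, neither of which may suit a real engine.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Counts how often the underlying computation actually runs, so the effect
// of the cache can be observed. (Hypothetical, for illustration only.)
static int g_expensiveCalls = 0;

// Hypothetical stand-in for an expensive, pure computation:
// the sum of squares 1*1 + 2*2 + ... + n*n.
std::uint64_t ExpensiveCost(std::uint32_t n) {
    ++g_expensiveCalls;
    std::uint64_t sum = 0;
    for (std::uint32_t i = 1; i <= n; ++i) {
        sum += static_cast<std::uint64_t>(i) * i;
    }
    return sum;
}

// Memoized wrapper: the first call with a given argument computes and stores
// the result; later calls with the same argument return the stored value.
// Not thread-safe, and the cache grows without bound.
std::uint64_t MemoizedCost(std::uint32_t n) {
    static std::unordered_map<std::uint32_t, std::uint64_t> cache;
    const auto it = cache.find(n);
    if (it != cache.end()) {
        return it->second;  // cache hit: no recomputation
    }
    const std::uint64_t result = ExpensiveCost(n);
    cache.emplace(n, result);
    return result;
}

// Example usage: the second call is answered from the cache.
void DemoMemoization() {
    const std::uint64_t first = MemoizedCost(1000);
    const std::uint64_t second = MemoizedCost(1000);
    assert(first == second);
}
```

Whether a hash map is the right container depends on the key range and access pattern; for small, dense keys a plain array is likely to be friendlier to the cache.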
Subsequent calls of the function perform a look-up to see whether the answer has already been stored. Memoization is essentially a dynamic look-up table. Look-up tables, described in detail in the next section, are solutions that trade memory for performance; however, with modern cache designs, using too much memory, or accessing it in a non-cache-friendly way, may cause cache misses. Trading instruction processing for cache misses is a fool’s errand.

Since our game runs in a loop, there are many opportunities to use memoization. Many aspects of our game may not change every frame. For example, a vehicle driving on the terrain is not likely to teleport across terrain tiles. Therefore, we should first assume that the vehicle is still in the same leaf node of our culling system that it occupied last frame, and only search further when that check fails. Only rarely will a vehicle move out of one node and into another.

Micro Level Solutions

Micro level application processing techniques are the most recognized and discussed techniques in all of optimization. Message boards and mailing lists are full of people arguing over these techniques. Micro optimizations are usually easier to add to an engine than system or application optimizations. There are some downsides to micro optimizations, and in my opinion, one worthwhile system or application optimization is more valuable than many micro optimizations.

One of the problems with micro optimizations is the number of PC configurations that exist. A micro optimization that increases performance on one configuration may not increase performance on another. I have heard a programmer say, “it doesn’t crash on my machine”. The same is true of some micro optimizations: you may hear someone say, “it runs fast on my machine”. Micro optimizations are less preferred than system and application optimizations, but they do have their place. Below are some common strategies for speeding up instruction processing with micro optimizations.
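
Because a micro optimization may help on one configuration and not on another, it is worth timing the code before and after the change on each machine you care about. The following is a minimal sketch of such a measurement, not a full profiler; `AverageNanoseconds` is a hypothetical helper name, and the approach assumes the work being timed is long enough (or repeated often enough) to rise above the clock’s resolution.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Runs `work` `iterations` times and returns the average wall-clock time per
// call in nanoseconds. steady_clock is used because it is monotonic.
// (Hypothetical helper, for illustration only.)
template <typename Fn>
double AverageNanoseconds(Fn&& work, std::uint32_t iterations) {
    assert(iterations > 0 && "need at least one iteration to average over");
    const auto start = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Timing the original and the optimized version with the same iteration count, in a release build, gives a rough comparison. The compiler may remove work whose result is never used, so make sure the timed code has an observable effect.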

Replacing Strings with Integers

One mistake many entry-level programmers make is using a string when they should be using an integer. Comparing strings is much slower than comparing integers. When you compare two strings, you must compare many values, one per character, to determine whether they differ. Since a string requires more memory, it is also going to require more work. A string is a great interface for a human, not for a computer. We need strings for debugging and for presenting to a user (as we do when we put the player’s name above their head).

Microsoft’s FX Framework provides a useful example of mapping a string to an integer. When referencing a variable in a shader, you can refer to the variable by a string or by a handle. A handle is essentially an integer that is mapped to the shader constant’s address. The most efficient way to set a variable using the Microsoft FX Framework is to first use the string to generate a handle, then use that handle for the remaining lifetime of your application. By doing this, you are replacing a string compare with an integer compare.

This method is also important in network programming. Instead of sending the string through the network every time a client needs it, register the string on the server with an integer. Then, the first time you send that string to the client, send along the global integer id for that string. From that point on, you only need to send the id across the network and resolve that value on the client.

In-lining Functions
Coming soon…

Look-up Tables
Coming soon…

Assembly
Coming soon…

Loop Unrolling
Coming soon…

Performance of Operations Latency and Throughput
Coming soon…

Memory Alignment
Coming soon…

SSE, SSE2, SSE3, SSE4 Instructions
Coming soon…

Write Combined Buffers
Coming soon…
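
To make the string-to-integer idea from the Replacing Strings with Integers section concrete, here is a minimal C++ sketch of a registry that hands out an integer id for each distinct string. `StringRegistry`, `Register`, and `Resolve` are hypothetical names chosen for this example; this is not the Microsoft FX Framework API, only an illustration of the same pattern: pay for the string look-up once, then compare and pass integers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical registry that maps each distinct string to a small integer id.
class StringRegistry {
public:
    using Id = std::uint32_t;

    // Returns the id for `name`, registering it on first use. This is the
    // only place where the string itself is hashed and compared.
    Id Register(const std::string& name) {
        const auto it = ids_.find(name);
        if (it != ids_.end()) {
            return it->second;
        }
        const Id id = static_cast<Id>(names_.size());
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }

    // Resolves an id back to its string, e.g. for debugging or display.
    const std::string& Resolve(Id id) const {
        assert(id < names_.size());
        return names_[id];
    }

private:
    std::unordered_map<std::string, Id> ids_;  // string -> id
    std::vector<std::string> names_;           // id -> string
};
```

In the networking case described above, the server would own the registry and send each (id, string) pair to a client once; after that, only the id needs to travel, and the client resolves it locally from its own copy of the table.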