PC Graphics System Overview
Submitted by epreisz on Sun, 02/11/2007 - 07:49.
The foundation of graphics optimization is an understanding of the hardware. On a fixed platform, like the Xbox 360 or the PS3, exploiting the hardware is the best way to optimize an application. On the PC, however, there are so many configurations that targeting any single one may lead to a slow game on the majority of systems.

There are many systems that we must consider when writing a video game. Of all our resources, we will mostly be observing the CPU, the GPU, the AGP/PCIe bus, system memory, and video memory. The hard drive and/or DVD drive and the networking hardware also play key roles in certain game genres.

CPU

The CPU is a major focus in video game applications. I have been to many presentations by ATI and NVIDIA, and a common theme of their optimization discussions is CPU-bound video games. When your video game is CPU limited, it doesn’t matter how good the graphics card is; players will see a better frame rate only if they buy a faster CPU. Hardcore gamers will likely be a target market of your video game, so don’t waste their hard-earned money.

When we discuss the CPU, we will often include a discussion of how memory operates. A CPU can be slow because we are doing too much math or because we are loading and storing too much system memory.

A CPU can also be slow because of incorrect usage of an API or driver. Your graphics drivers run on your CPU, and without proper tools you may never realize that they are causing a performance issue. It is very common to have large hotspots in your drivers if you are using the graphics API incorrectly. Since you didn’t write the API (I’m guessing you didn’t write OpenGL or DirectX), you must understand its rules. Documentation does not always expose these rules well, and merely seeing the right results on the screen doesn’t mean you have an optimized scene.

Your CPU does calculations on integers and floating-point numbers. The key to reducing performance issues is to ensure that you are doing as much math as possible every clock tick.
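One way to get more math done per clock tick is to avoid long chains in which each operation has to wait for the result of the one before it. The C++ sketch below is only an illustration: the function names `sum_single` and `sum_unrolled` are made up for this example, and whether the second version is actually faster depends on the CPU and the compiler, so profile before relying on it.

```cpp
#include <cstddef>
#include <vector>

// Single accumulator: every addition depends on the previous one,
// so the additions form one long serial chain.
float sum_single(const std::vector<float>& v) {
    float total = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Four independent accumulators: the four chains do not depend on each
// other, which gives the CPU the option of overlapping the additions.
float sum_unrolled(const std::vector<float>& v) {
    float a = 0.0f, b = 0.0f, c = 0.0f, d = 0.0f;
    const std::size_t n4 = v.size() - (v.size() % 4);
    std::size_t i = 0;
    for (; i < n4; i += 4) {
        a += v[i];
        b += v[i + 1];
        c += v[i + 2];
        d += v[i + 3];
    }
    float total = (a + b) + (c + d);
    for (; i < v.size(); ++i) {  // leftover elements when size is not a multiple of 4
        total += v[i];
    }
    return total;
}
```

Because floating-point addition is not associative, the two versions can round differently on general input, which is also why a compiler will usually not make this change for you unless you relax its floating-point rules.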
The architecture of the P4 CPU allows for parallel processing of floating-point operations, integer operations, and memory reads and writes. Tricks such as reducing data dependencies can help increase the performance of calculations.

Your CPU also does the work required to maintain your operating system while your game is running. Although the OS is rarely a bottleneck, it does happen, and understanding this will help you during the detection phase of optimization.

GPU

The GPU should be the heart of your vertex and pixel processing. There are some hoops to jump through to make sure that happens, but that is your goal. The CPU will create the triangles, store them in system memory, and pass them to the graphics card when necessary.

The major stages of the GPU include vertex transfer, vertex fetching, vertex processing, rasterizing, pixel processing, texture processing, and framebuffer operations. The stages operate in parallel, meaning the GPU can work on more than one element at a time, but each triangle still passes from one end of the pipeline to the other, with the final result being pixels on the screen. Since the stages form a parallelized pipeline, the pipeline as a whole runs at the speed of its slowest stage.

Bus

Three bus types have been used for moving data from the CPU and memory to the GPU: PCI, AGP, and PCIe. Originally, the PCI bus was the only way to move data. PCI graphics cards are pretty much no longer in use. Because the PCI bus was such a bottleneck, motherboard and graphics card vendors added the AGP bus. AGP stands for Accelerated Graphics Port, and since its only job was to move data to the graphics card, it could be optimized for that purpose. There are several versions of AGP, each faster than the last: AGP 2x, AGP 4x, and AGP 8x. When the computer uses AGP, it blocks off a section of system memory for the exclusive use of the AGP bus. We can set the size of this memory by setting the AGP aperture in the BIOS.
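To get a feel for what the AGP rates mean in practice, here is a rough back-of-the-envelope sketch in C++. It assumes the peak theoretical rate of a 32-bit bus at about 66.67 MHz scaled by the AGP multiplier; real transfers are slower than this peak. The function names are hypothetical and exist only for this example.

```cpp
#include <cstdint>

// Peak theoretical AGP rate: a 32-bit (4-byte) bus at about 66.67 MHz,
// multiplied by the AGP speed grade (1, 2, 4, or 8).
double agp_peak_bytes_per_second(int multiplier) {
    const double base_clock_hz = 66.67e6;
    const double bus_width_bytes = 4.0;
    return base_clock_hz * bus_width_bytes * multiplier;
}

// Best-case time, in milliseconds, to move `bytes` across the bus.
double transfer_time_ms(std::uint64_t bytes, int multiplier) {
    return 1000.0 * static_cast<double>(bytes) /
           agp_peak_bytes_per_second(multiplier);
}
```

For example, `transfer_time_ms(4 * 1024 * 1024, 8)` puts a 4 MB texture at roughly 2 ms on AGP 8x even in the best case, and about four times that on AGP 2x. Against a frame budget of about 16.7 ms at 60 frames per second, a handful of such uploads in one frame is enough to be noticeable.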
Although there are many systems using AGP 4x and 8x, their days are numbered. PCI Express, or PCIe, offers a significant improvement over AGP. First, PCIe is faster than AGP: a first-generation x16 slot has roughly twice the peak bandwidth of AGP 8x. Also, PCIe is fast at both uploading and downloading data, whereas AGP is only fast at uploading.

Sometimes you will notice a performance issue related to the bus when playing a game. If you position your camera in a single location and spin on one axis, you may notice a hitch. That hitching is a result of the flood of vertex buffers and textures streaming across the bus. There are several techniques to reduce this limitation.

Memory

If your CPU limitation isn’t algorithmic computation, it’s possible that you are performance limited by memory. Everyone knows that RAM is faster than a hard drive, but when we look at memory from the standpoint of optimization, memory is slow. Because reading and writing memory is slow, computer architects created memory caches. Caches are small, fast, expensive pieces of memory. When used correctly, a cache can give you a considerable performance boost. When used incorrectly, an application would probably run faster without a cache. Luckily, the rules for maximum cache efficiency are simple. Writing your code to be cache friendly is not always so simple.

There are several layers of cache: L1, L2, and an optional L3. When the CPU loads memory, it looks in L1 first. If the data is there, the load executes quickly. If the data isn’t in L1, the CPU needs to search the L2 cache. If the address is not in L2 either, it needs to search the optional L3 cache or fetch the data from system memory. To get the data to the CPU, it then has to travel back through the cache system. The CPU accesses memory in cache lines, so when you grab one piece of memory you perform a cache line fill and load other nearby pieces of memory along with it.
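The effect of cache line fills can be seen by walking the same data in two different orders. This is a minimal C++ sketch, assuming a grid stored row by row in one contiguous array; the function names are hypothetical. Both functions return the same sum, but the first touches neighboring addresses and so uses every element of each cache line it loads, while the second jumps `cols` elements between reads and may pull in a new cache line on almost every access once the grid is larger than the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The grid is stored row by row in one contiguous vector:
// element (r, c) lives at index r * cols + c.

// Walks memory in address order, so consecutive reads share cache lines.
long long sum_row_order(const std::vector<int>& grid,
                        std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += grid[r * cols + c];
        }
    }
    return total;
}

// Walks down each column, so consecutive reads are `cols` elements apart.
long long sum_column_order(const std::vector<int>& grid,
                           std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += grid[r * cols + c];
        }
    }
    return total;
}
```

How large the gap is depends on the cache sizes and the grid dimensions, so time both versions on the hardware you care about rather than assuming a particular speedup.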
Also, when you access memory in a predictable pattern, the hardware can guess what memory you are about to use and have it waiting in cache before you attempt to read it. Keeping your memory access sequential can therefore be much faster than grabbing memory at random locations. When the CPU guesses which cache lines to fill, we call that an automatic hardware prefetch.

Sometimes the solution to increasing performance on a CPU is to introduce multi-threading. If your application is limited by memory access, however, you may cause more cache misses as two or more threads fight for the finite space of your caches. Multi-threading is unlikely to solve a performance problem caused by poor use of memory. In fact, it is likely to make the problem worse.

Video Memory

Your graphics card has its own memory system. There is not much you can do to directly increase its performance, but improper use of the API can cause more transfers across the bus. Graphics resources all share the same memory: textures, vertex buffers, stencil buffers, depth buffers, color buffers, and render targets all share this memory space. Increasing the level of full-scene anti-aliasing directly reduces your available memory.
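A rough estimate shows how quickly anti-aliasing eats into video memory. The C++ sketch below uses a deliberately simplified model: it assumes every anti-aliasing sample stores its own color and depth/stencil value, and it ignores driver padding, compression, and additional back buffers. The function name and parameters are hypothetical.

```cpp
#include <cstdint>

// Simplified estimate of the memory used by one color buffer plus one
// depth/stencil buffer at a given number of samples per pixel.
std::uint64_t render_buffer_bytes(std::uint32_t width, std::uint32_t height,
                                  std::uint32_t bytes_per_color,
                                  std::uint32_t bytes_per_depth,
                                  std::uint32_t samples) {
    const std::uint64_t pixels = static_cast<std::uint64_t>(width) * height;
    return pixels * samples * (bytes_per_color + bytes_per_depth);
}
```

Under these assumptions, a 1280x1024 target with 4 bytes of color and 4 bytes of depth/stencil per sample takes 10 MB with no anti-aliasing (`render_buffer_bytes(1280, 1024, 4, 4, 1)`) and 40 MB at 4 samples per pixel. Every extra megabyte spent there is one that textures and vertex buffers cannot use.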