CPU

Submitted by epreisz on Sun, 02/11/2007 - 08:37.

Understanding the basics of a CPU will give you insight into performing micro-optimizations in your application code. For our examples, we will focus on the Intel P4 architecture. Although no two processors are the same, understanding the basics of one processor will help in the discussion of optimization for the CPU.

Three Stage Operations

The P4 processes instructions in three stages: the front-end, execution, and the back-end. The front-end decodes instructions and turns them into micro-ops. The execution stage contains an instruction pool where the micro-ops are stored for execution. After the micro-ops execute, they are retired in the back-end stage, where their results are committed to the machine state.

Front End

The front-end stage's main tasks are to fetch instructions from memory, decode them into micro-ops, and predict branches. The front-end stage runs in-order, which means that it processes instructions in the same order in which your compiler wrote them.

Fetching and decoding instructions can be slow, and on the P4 the performance of this step is extremely important. The biggest boost in performance on the P4 occurs in the execution stage; however, if the front-end stage cannot supply the execution stage with enough micro-ops, that boost goes unused.

To increase performance, the front-end stage contains a cache of decoded micro-ops (the P4's trace cache). Instead of decoding an instruction twice, the front-end first checks this cache to see if it has recently decoded that instruction.

Something interesting occurs at this stage. Programmers are familiar with if-then-else statements, which are one type of conditional branch. When the front-end stage encounters a conditional branch, it guesses which way the branch will go and sends the decoded micro-ops for that path to the execution stage. What happens if it guesses incorrectly? We will answer that question in a few more paragraphs.

But first, let’s talk a bit more about the guess. Luckily, the guess is not random. Branch target buffers (BTBs) track the history of each conditional branch to estimate which way it is likely to go next. Branch prediction is surprisingly accurate.
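To make the idea of a history-based guess concrete, here is a minimal sketch of a two-bit saturating counter, a common textbook prediction scheme. It is an illustration only and an assumption on our part: it is not the P4's actual predictor, and the names TwoBitPredictor, count_correct, and loop_branch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook two-bit saturating counter: states 0-1 predict "not taken",
// states 2-3 predict "taken". Hypothetical illustration only; this is
// not the P4's real prediction hardware.
class TwoBitPredictor {
public:
    bool predict() const { return state_ >= 2; }

    // Feed back the real outcome once the branch has resolved.
    void update(bool taken) {
        if (taken && state_ < 3) ++state_;
        if (!taken && state_ > 0) --state_;
        assert(state_ >= 0 && state_ <= 3);  // the counter saturates at both ends
    }

private:
    int state_ = 2;  // start at "weakly taken" (an arbitrary choice)
};

// Counts how many outcomes in `history` the predictor guesses correctly.
std::size_t count_correct(const std::vector<bool>& history) {
    TwoBitPredictor predictor;
    std::size_t correct = 0;
    for (bool taken : history) {
        if (predictor.predict() == taken) ++correct;
        predictor.update(taken);
    }
    return correct;
}

// Outcomes of a loop-closing branch: taken on every pass except the last.
std::vector<bool> loop_branch(std::size_t iterations) {
    std::vector<bool> history(iterations, true);
    if (!history.empty()) history.back() = false;
    return history;
}
```

In this toy model, a loop that runs ten times is mispredicted only on its final pass, while a branch that strictly alternates between taken and not taken is guessed correctly only half the time. That is the general pattern to keep in mind: regular branches are cheap, erratic ones are not.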

Execution Stage

The execution stage is the sweet spot for fast execution on the P4. Unlike the front-end stage, the execution stage runs out of order. That means this stage can process its operations in a different order from the one created by the compiler.

The execution stage achieves its performance by operating in parallel. That’s right, even a single processor can execute several operations at once. The execution stage contains an instruction pool where micro-ops are buffered for parallel execution. The dispatch unit selects micro-ops from the instruction pool so that the execution unit can perform as many operations as possible at once. Let’s take a more detailed look at the execution unit.

Getting overly specific about the execution unit would probably not be useful to the average game optimization guru, so let’s look at a high-level view of the execution unit. The execution unit, when simplified for this discussion, contains six execution units. Two of the units are for integer operations, two are for floating-point operations, and two are for memory operations. Of the two memory units, one is for loading and one is for storing.

The dispatch unit acts like we do when trying to optimize our applications. If it were to schedule only one unit at a time, utilization of the execution units would be only about 17% (one unit in six). Like us application developers, the dispatch unit does its best to keep utilization near 100%. Since the dispatch unit chooses micro-ops from the pool out of order, it is more likely to achieve this goal.
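The utilization arithmetic can be sketched with a toy model of the six units described above. The model is a deliberate simplification and an assumption on our part: every micro-op takes one cycle and no micro-op depends on another, which is not true of real code. The names kUnits, cycles_parallel, cycles_serial, and utilization are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Units per kind in the simplified model, in the order:
// integer, floating-point, load, store.
constexpr std::array<std::size_t, 4> kUnits = {2, 2, 1, 1};
constexpr std::size_t kTotalUnits = 6;

// Cycles needed when the dispatcher can fill every unit each cycle.
// Toy assumption: one cycle per micro-op and no dependencies.
std::size_t cycles_parallel(const std::array<std::size_t, 4>& pool) {
    std::size_t cycles = 0;
    for (std::size_t k = 0; k < pool.size(); ++k) {
        const std::size_t needed = (pool[k] + kUnits[k] - 1) / kUnits[k];  // ceiling division
        cycles = std::max(cycles, needed);
    }
    return cycles;
}

// Cycles needed when only one micro-op is dispatched per cycle.
std::size_t cycles_serial(const std::array<std::size_t, 4>& pool) {
    return pool[0] + pool[1] + pool[2] + pool[3];
}

// Fraction of available unit-cycles that did useful work.
double utilization(std::size_t ops, std::size_t cycles) {
    assert(cycles > 0);
    return static_cast<double>(ops) / static_cast<double>(cycles * kTotalUnits);
}
```

For a pool of four integer, four floating-point, two load, and two store micro-ops, the serial schedule takes twelve cycles at one-sixth utilization, while the parallel schedule takes two cycles with every unit busy. Real workloads land somewhere in between, because dependencies and an uneven mix of operation types leave units idle.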

By writing our programs with this architecture in mind, we can help ensure that the instruction pool holds enough independent micro-ops for the dispatch unit to choose from. Your compiler, if reasonably modern, will be aware of this architecture and will do its best to create this environment as well. But relying on the compiler is not a guarantee of performance.
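One common way to write with this architecture in mind is to break a long chain of dependent operations into several independent chains. The sketch below sums an array two ways; the function names sum_single and sum_four are hypothetical, and whether the second form is actually faster depends on your compiler and processor, so treat it as something to measure rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: every addition must wait for the previous one,
// so only one addition can be in flight at a time.
std::uint64_t sum_single(const std::vector<std::uint32_t>& values) {
    std::uint64_t total = 0;
    for (std::uint32_t x : values) total += x;
    return total;
}

// Four accumulators: four independent dependency chains that an
// out-of-order execution stage is free to overlap.
std::uint64_t sum_four(const std::vector<std::uint32_t>& values) {
    std::uint64_t a = 0, b = 0, c = 0, d = 0;
    const std::size_t n = values.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += values[i];
        b += values[i + 1];
        c += values[i + 2];
        d += values[i + 3];
    }
    for (; i < n; ++i) a += values[i];  // leftover elements
    assert(i == n);                     // every element was consumed exactly once
    return (a + b) + (c + d);
}
```

Both functions return the same result for unsigned integers. With floating-point values the regrouped additions can round differently, which is one reason a compiler will not always make this transformation for you.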

Back End

The back-end stage retires micro-ops from the instruction pool, updates the machine state, updates the BTBs, and raises exceptions. The back-end stage executes in-order, returning the flow of operation processing to the order originally defined by the compiler that created the application.

The back-end stage is where the pipeline is notified of anything that didn’t go as expected. If the front-end stage incorrectly guessed the result of a branch, the back-end stage notifies the front-end of the mistake, the work done on the wrong path is discarded, and processing of the correct branch begins. The back-end stage updates the BTBs to reflect the new statistics for future branch predictions. Sometimes, an exception can occur.
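Because a wrong guess costs the pipeline work, one technique worth knowing is to rewrite a hard-to-predict branch as plain arithmetic. The sketch below shows both forms of the same loop; the function names are hypothetical. Note the assumption: a compiler may turn either version into the other, so the source form is only a hint, and the only reliable check is to measure.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Branchy form: the `if` is a conditional branch the front-end must predict.
// With unpredictable data, many of those guesses will be wrong.
std::int64_t sum_at_least_branchy(const std::vector<std::int32_t>& values,
                                  std::int32_t threshold) {
    std::int64_t total = 0;
    for (std::int32_t x : values) {
        if (x >= threshold) total += x;
    }
    return total;
}

// Branch-free form: the comparison yields 0 or 1, which scales the addend,
// so there is no data-dependent branch left to mispredict.
std::int64_t sum_at_least_branchless(const std::vector<std::int32_t>& values,
                                     std::int32_t threshold) {
    std::int64_t total = 0;
    for (std::int32_t x : values) {
        const std::int64_t keep = (x >= threshold);  // 1 if kept, 0 otherwise
        total += keep * x;
    }
    return total;
}
```

The branch-free version does a little extra arithmetic on every element, so it tends to pay off only when the branch really is unpredictable. On data where the condition almost always goes the same way, the branchy version is usually fine.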

Exceptions occur for many reasons, but in general an exception occurs because you, via your compiled code, asked the processor to perform an operation in a way it cannot carry out. One example of an exception is trying to divide an integer by zero. Dividing by zero has no defined result, and even though processors are smart, they can’t divide by zero.

Exceptions are signs that something incorrect is happening. They are a cause for concern, and if you find exceptions occurring, you should work to remove them.
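As one way of removing the divide-by-zero case, the operands can be checked before the division is attempted. This is a minimal sketch, and the helper name checked_div is hypothetical. It also guards the one other integer division that faults on x86 processors: dividing the most negative int by -1, whose result does not fit in an int.

```cpp
#include <cassert>
#include <limits>

// Writes the quotient and returns true, or returns false (leaving
// `quotient` untouched) when the division would fault.
bool checked_div(int numerator, int denominator, int& quotient) {
    if (denominator == 0) {
        return false;  // would raise a divide error on x86
    }
    if (numerator == std::numeric_limits<int>::min() && denominator == -1) {
        return false;  // quotient overflows int; also a divide error on x86
    }
    quotient = numerator / denominator;
    return true;
}
```

The check costs a compare and a branch that is almost always predicted correctly, which is cheap next to the division itself. In a hot loop it is usually better still to validate the divisor once, outside the loop.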
