A look at Direct3D 12 – Part 1

Need for a new API

Direct3D, like any 3D API on PC, has two main goals:

  1. Provide a low overhead graphics API
  2. Provide a single API that works on different hardware

With APIs and GPUs becoming more and more complex, it has become difficult to achieve these goals. D3D’s abstraction layer (HAL) requires extra work from the driver, and leads to higher CPU overhead compared to fixed hardware platforms like consoles.

Performance on PC is not optimal. In fact, it’s very likely you’ll be CPU bound if you naïvely port your console game to PC.

We need an API with console-level efficiency: lower CPU overhead and better CPU parallelism. The latter is not handled well by D3D11; in practice, most of the work still ends up on a single core.

D3D12 intends to be a low CPU overhead API that makes it more efficient to

  • generate rendering commands
  • reuse rendering commands
  • generate commands across multiple threads.


Pipeline State Object (PSO)

API calls can be costly. Each call introduces some CPU and driver overhead.
D3D10 reduced CPU overhead compared to D3D9 by introducing Render State Objects, which let the application set a group of related state values in a single API call.

DX9 Style, 1 call sets 1 state value


DX10/11 Style, 1 call sets all blend state values

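// RGBA blend factor, only used by blend modes that reference it
float blendFactors[] = {0.0f, 0.0f, 0.0f, 0.0f};
// One call binds the whole blend state object (in D3D11 this method lives on the device context); 0xffffffff is the sample mask
Device->OMSetBlendState(BlendStateObject, blendFactors, 0xffffffff);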
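
To make the difference in call count concrete, here is a small self-contained model. It is not Direct3D code: `MockDevice`, `SetRenderState`, `SetBlendState` and the state keys are hypothetical stand-ins that only count how many API calls are needed to apply the same four blend values in each style.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical state keys, for illustration only.
enum BlendKey { kBlendEnable, kSrcBlend, kDestBlend, kBlendOp };

using BlendValues = std::map<int, std::uint32_t>;

// Hypothetical stand-in for a device: it stores values and counts API calls.
struct MockDevice {
    int apiCalls = 0;
    BlendValues blendValues;

    // D3D9 style: one call sets one state value.
    void SetRenderState(int key, std::uint32_t value) {
        ++apiCalls;
        blendValues[key] = value;
    }

    // D3D10/11 style: one call applies a whole group of related values.
    void SetBlendState(const BlendValues& blendStateObject) {
        assert(!blendStateObject.empty());
        ++apiCalls;
        blendValues = blendStateObject;
    }
};

// Applies four blend values one by one and returns the number of API calls.
int CallsPerValueStyle() {
    MockDevice device;
    device.SetRenderState(kBlendEnable, 1);
    device.SetRenderState(kSrcBlend, 2);
    device.SetRenderState(kDestBlend, 3);
    device.SetRenderState(kBlendOp, 4);
    return device.apiCalls;
}

// Applies the same four values through one prebuilt state object.
int CallsStateObjectStyle() {
    // Built once, typically at load time, then reused every frame.
    const BlendValues blendStateObject = {
        {kBlendEnable, 1}, {kSrcBlend, 2}, {kDestBlend, 3}, {kBlendOp, 4}};
    MockDevice device;
    device.SetBlendState(blendStateObject);
    return device.apiCalls;
}
```

In this model the per-value style costs four calls and the state-object style costs one, for the same resulting state.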

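In D3D11, states are usually recorded into a set of commands, and resolved by the driver into a set of GPU commands at Draw/Dispatch time. This is called hardware mismatch overhead: the pipeline states exposed by the API do not map one-to-one onto the states the hardware actually uses.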

State Overhead

If we take a look at the diagram above [1], we can see that each GPU state (on the right) depends on multiple pipeline states (on the left). At draw time, the driver has to check all these pipeline states in order to generate a set of GPU commands that reflects the states set by the app.
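
The draw-time resolution described above can be modeled with a short sketch. Everything here is an assumption made for illustration: the `MockDriver` type, the three GPU state blocks, and which pipeline states feed which block are invented, not taken from a real driver. The point is only that resolution runs on every draw, and that one pipeline state can feed several GPU states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified pipeline states as set by the application.
struct PipelineStates {
    std::uint32_t pixelShader = 0;
    std::uint32_t blend = 0;
    std::uint32_t depthStencil = 0;
    std::uint32_t rasterizer = 0;
};

// Hypothetical GPU state blocks; the dependencies below are assumptions.
struct GpuStates {
    std::uint32_t colorOutput = 0;    // assumed: pixelShader + blend
    std::uint32_t depthControl = 0;   // assumed: pixelShader + depthStencil
    std::uint32_t rasterControl = 0;  // assumed: rasterizer + depthStencil
};

struct MockDriver {
    PipelineStates app;
    GpuStates hw;
    int resolveCount = 0;

    // Placeholder for whatever work the driver does to merge two states.
    static std::uint32_t Combine(std::uint32_t a, std::uint32_t b) {
        return a * 31u + b;
    }

    // D3D11-style model: the driver re-derives GPU state at every draw.
    void Draw() {
        hw.colorOutput = Combine(app.pixelShader, app.blend);
        hw.depthControl = Combine(app.pixelShader, app.depthStencil);
        hw.rasterControl = Combine(app.rasterizer, app.depthStencil);
        ++resolveCount;
    }
};

// Number of state resolutions performed for a given number of draws.
int ResolvesForDraws(int drawCount) {
    assert(drawCount >= 0);
    MockDriver driver;
    for (int i = 0; i < drawCount; ++i) driver.Draw();
    return driver.resolveCount;
}

// Number of GPU state blocks affected by changing only the depth-stencil state.
int GpuBlocksChangedByDepthStencil() {
    MockDriver driver;
    driver.Draw();
    const GpuStates before = driver.hw;
    driver.app.depthStencil = 5;
    driver.Draw();
    int changed = 0;
    changed += (before.colorOutput != driver.hw.colorOutput) ? 1 : 0;
    changed += (before.depthControl != driver.hw.depthControl) ? 1 : 0;
    changed += (before.rasterControl != driver.hw.rasterControl) ? 1 : 0;
    return changed;
}
```

In this model the resolution cost grows with the number of draws, and a single depth-stencil change touches two of the three GPU state blocks.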

Engineers at Microsoft observed that a typical modern game uses around 200 to 400 complete pipeline states per frame. What if we let the app create them up front and switch from one to another when needed? This is the same idea as the move from D3D9 single state values to D3D10 render state objects, taken one step further.

This leads to a new design, a single Pipeline State Object [1].

Pipeline State Optimized

D3D12 replaces the separate Render State Objects with a single Pipeline State Object (PSO) that groups them together.
To keep the number of unique PSOs low, states that tend to change very frequently (viewport, scissor) are kept out of the PSO and are called non-PSO states.

A Pipeline State Object includes all the shaders that are set and a significant number of the state objects. The only way to change one of the states in a PSO is to set a new PSO.
With such a design, the driver knows exactly how to program the hardware for a given PSO, and can preprocess the GPU commands that set up the hardware states.
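
A minimal sketch of that idea, under stated assumptions: the types and functions below (`PsoDesc`, `Pso`, `MockDevice`, `MockCommandList`) are hypothetical and are not the D3D12 API. The model resolves the hardware state once, when the PSO is created, so that binding a PSO at draw time is only a pointer swap, while the viewport stays outside the PSO as a non-PSO state.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical description of a complete pipeline state.
struct PsoDesc {
    std::uint32_t pixelShader, blend, depthStencil, rasterizer;
};

// Hypothetical PSO: holds hardware state already resolved at creation time.
struct Pso {
    std::uint32_t colorOutput = 0;
    std::uint32_t depthControl = 0;
    std::uint32_t rasterControl = 0;
};

struct Viewport { float x, y, width, height; };

struct MockDevice {
    int resolveCount = 0;

    // All the expensive resolution happens here, once per PSO.
    std::shared_ptr<const Pso> CreatePipelineState(const PsoDesc& d) {
        ++resolveCount;
        auto pso = std::make_shared<Pso>();
        pso->colorOutput = d.pixelShader * 31u + d.blend;
        pso->depthControl = d.pixelShader * 31u + d.depthStencil;
        pso->rasterControl = d.rasterizer * 31u + d.depthStencil;
        return pso;  // handed out as read-only: changing state means a new PSO
    }
};

struct MockCommandList {
    std::shared_ptr<const Pso> current;
    Viewport viewport{};
    int drawCount = 0;

    // Switching pipeline state is a pointer swap; nothing is re-resolved.
    void SetPipelineState(std::shared_ptr<const Pso> pso) { current = std::move(pso); }

    // Non-PSO state: changes often, so it is set independently of the PSO.
    void SetViewport(const Viewport& v) { viewport = v; }

    void Draw() {
        assert(current && "a PSO must be bound before drawing");
        ++drawCount;
    }
};

// Creates two PSOs up front, alternates between them for every draw,
// and returns how many state resolutions were needed in total.
int ResolvesWithPso(int drawCount) {
    MockDevice device;
    auto opaque = device.CreatePipelineState({1, 0, 1, 0});
    auto blended = device.CreatePipelineState({1, 1, 1, 0});

    MockCommandList list;
    list.SetViewport({0.0f, 0.0f, 1280.0f, 720.0f});
    for (int i = 0; i < drawCount; ++i) {
        list.SetPipelineState(i % 2 == 0 ? opaque : blended);
        list.Draw();
    }
    return device.resolveCount;
}
```

In this model the number of resolutions depends only on how many PSOs were created, not on how many draws are issued or how often the app switches between PSOs.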

Stay tuned for part 2.


[1] Max McMullen, “Direct3D 12 API Preview”, BUILD 2014.
