During my discussion sessions at TechEd 2011 this year (VIR471-INT - Hyper-V FAQs, Tips, and Tricks), one of the first things we discussed was the topic of C-States. I've been recommending that people disable C-States ever since we started seeing the technology on our systems (at both the desktop and server level), but it has taken some time for much information to become publicly available outside of the hearsay realm. Over the past year or two, information has finally begun to accumulate on the web, much of which now shows up in KB format from Microsoft. Before we go any further, let's start with the first question:
So what are C-States?
In short, C-States are power saving states that your CPU can enter into to save electricity (and hopefully some money, if you pay for your own electricity).
Hey, that sounds great! I love money! Why would I disable them then?
In short, because while C-States sound great in theory, they don't always work as well in practice. On almost any system we've ever deployed, both at the desktop and server level, it's just a matter of time before a system with C-States enabled begins to manifest problems, ranging from performance issues to bug checks. Without going into too much more detail here, the gist of it is that once the processor starts to enter deeper states of sleep (like C3), it doesn't wake up as quickly as it should, and then things start to go wrong. For more information about C-States in general, you can check out this article from Hardware Secrets, but if you just want to know where the problems lie, you can skip that and read on in this post.
OK, you've got my attention. So tell me about what can go wrong.
The first major sign of something going wrong with C-States appeared right after Windows Server 2008 R2 shipped in September 2009, when lots of people saw their servers crash after they enabled the Hyper-V role on machines with C-States turned on. Microsoft quickly released KB974598 - “You receive a "Stop 0x0000007E" error on the first restart after you enable Hyper-V on a Windows Server 2008 R2-based computer” for this particular issue.
However, within a couple of days, a bigger issue began to emerge among early adopters: Hyper-V systems seemed to crash randomly and intermittently, but somewhat regularly, on any system with a Nehalem processor. Microsoft responded with a patch for this particular issue by mid-October 2009: KB975530 - Stop error message on a Windows Server 2008 R2-based computer that has the Hyper-V role installed and that uses one or more Intel CPUs that are code-named Nehalem: "0x00000101 - CLOCK_WATCHDOG_TIMEOUT"
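If you're triaging crash reports from a batch of Hyper-V hosts, a small lookup like the one below can flag which bug checks line up with the two KBs above. This is only a minimal sketch in Python: the table and function names are made up for illustration, and a matching stop code is just a hint, since 0x0000007E in particular is a generic bug check that many unrelated faults can raise.

```python
# Hypothetical triage helper: maps the two stop codes named in this post
# to the Microsoft KB articles that address them. Names are illustrative.
KNOWN_CSTATE_STOPS = {
    0x0000007E: ("KB974598", "Stop error on first restart after enabling the Hyper-V role"),
    0x00000101: ("KB975530", "CLOCK_WATCHDOG_TIMEOUT on Nehalem CPUs with the Hyper-V role"),
}


def lookup_stop_code(code):
    """Return (kb, summary) for a stop code, or None if it is not in the table.

    Accepts an int (0x101) or a hex string ("0x00000101").
    """
    if isinstance(code, str):
        code = int(code, 16)  # base 16 accepts an optional "0x" prefix
    return KNOWN_CSTATE_STOPS.get(code)


if __name__ == "__main__":
    for raw in ("0x0000007E", "0x00000101", "0x00000050"):
        match = lookup_stop_code(raw)
        if match:
            print(f"{raw}: see {match[0]} ({match[1]})")
        else:
            print(f"{raw}: not one of the C-State related KBs listed here")
```

A hit only tells you which KB to read first; you still need to confirm that the host matches the conditions the KB describes.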
At this point, many had already decided to just leave C-States disabled, but those who kept them enabled with the patches applied still noticed intermittent performance issues, such as those described in the two KBs below:
OK, OK, I get your point. So is this issue unique to Hyper-V?
C-State problems are definitely not limited to Hyper-V. This just happened to be a Hyper-V centered discussion. Do a couple of web searches on C-States, and you'll see that people have reported performance problems with C-States using VMware, OpenSolaris (search on C-States, and you'll find where they finally pointed their fingers at C-States as the root cause), and IP over InfiniBand.
So... Now that you're armed with the links, you can make the call on your servers, but we're going to continue disabling C-States for the foreseeable future.
Good luck, and happy virtualizing.