The right time to assess Windows Vista's performance

Measuring the performance of an operating system is a tricky thing. At the same time, it's the right and necessary thing to do, because performance is one of many criteria important to customers. Part of the trick is timing the tests against the product cycle so that the results are as meaningful as possible for customers, helping them make a better-informed decision with the full array of available information. As one example, about a year ago we commissioned a firm called Principled Technologies to conduct a study comparing Windows XP SP2 to Windows Vista RTM. That study found that the two operating systems performed within the same range on many tasks that home and business users frequently perform under real-world conditions.

My point is that we waited to conduct these benchmarking tests until Windows Vista had reached the RTM milestone in the product cycle, as this allowed us to provide our customers the most meaningful data available at the time -- the data most likely to directly affect their decision to upgrade to Windows Vista. We run a whole range of performance tests at every stage of the OS development process, but, as a general rule, we avoid sharing benchmark results for software that hasn't gone RTM (i.e., final code). This explains why we have not to date published any findings of benchmark tests (nor commissioned anyone to do so) on the performance improvements brought by Windows Vista SP1. Publishing benchmarks of Windows Vista SP1's performance now wouldn't be a worthwhile exercise for our customers, as the code is still in development and, as far as benchmarking is concerned, remains a moving target.

Aside from that point, let me also emphasize that there are a variety of ways to benchmark the performance of a PC, and different techniques can yield different results. Some benchmark techniques simply test PC hardware performance by running a series of tasks at superhuman speed. Such tests tend to exaggerate small differences between test platforms and consequently have fallen out of favor, supplanted by benchmarks that run tasks at human speed with realistic waits and data entry. Benchmarks that run at superhuman speeds often deliver results that don't tell the whole story. In fact, we made deliberate choices during the development of Windows Vista to focus on real-world scenarios affecting the user experience, rather than on shaving microseconds off operations imperceptible to the user. In addition, many operations in Windows require additional processing time for work done on the customer's behalf, such as the security, reliability, or application compatibility checks conducted when a program launches. These checks may add microseconds to an individual application's launch -- a delay that under real usage is imperceptible. But when thousands of such operations are strung together through automation, those few microseconds have a cumulative effect on the benchmark result, causing performance to appear much better or worse than a user would actually experience.
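
To make the cumulative-microseconds point concrete, here's a minimal sketch in Python -- my own illustration, not one of our actual test harnesses. The launch_and_close stub and its 200-microsecond delay are hypothetical stand-ins for a real operation and its per-launch checks:

```python
import time

def launch_and_close():
    """Stand-in for one window-open, window-close cycle.

    The 200-microsecond sleep is a hypothetical stand-in for per-launch
    work such as security, reliability, or compatibility checks.
    """
    time.sleep(0.0002)

# "Superhuman" style: thousands of cycles back to back. The per-cycle
# overhead, invisible to any one user action, dominates the total.
start = time.perf_counter()
for _ in range(10_000):
    launch_and_close()
print(f"automated loop: {time.perf_counter() - start:.2f} s")

# Human-paced style: the same operation with realistic think time.
# The overhead disappears into the seconds between clicks.
start = time.perf_counter()
for _ in range(3):
    launch_and_close()
    time.sleep(1.0)  # simulated pause while the user reads or types
print(f"human-paced loop: {time.perf_counter() - start:.2f} s")
```

Run back to back ten thousand times, the stand-in overhead adds up to whole seconds; spaced out at human pace, it vanishes into the pauses.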

I've included below a video we captured depicting a "benchmark test" running a window-open, window-close routine at accelerated speed. You can see that it isn't representative of real-world user behavior and hence isn't an accurate gauge of the actual end-user experience. Further, tests like these only measure a very small set of Windows capabilities and so aren't representative of the user's overall day-to-day experience of working with Windows and running applications.
[Video: the window-open, window-close "benchmark" running at accelerated speed]

Methods like Principled Technologies', which approximate the experience of actually using the PC by taking the OS through real tasks at roughly the pace a user would click through them, tend to provide results far more useful to our customers. The typical Windows customer wants to know how his or her actual computing experience will change (read: improve) with an upgrade. The Principled Technologies tests answer exactly that.
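
For the curious, here's an equally hypothetical sketch of what scenario-style timing looks like: each user-visible step is measured on its own, with realistic think time in between. The task names and delays are made up for illustration; a real harness would drive actual applications:

```python
import time

def run_task(name, action, think_time=2.0):
    """Time one user-visible step, then pause the way a person would."""
    start = time.perf_counter()
    action()
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed * 1000:.0f} ms")
    time.sleep(think_time)  # realistic gap before the next step

# Hypothetical workload standing in for real application steps.
run_task("open document",  lambda: time.sleep(0.12))
run_task("apply edit",     lambda: time.sleep(0.03))
run_task("save and close", lambda: time.sleep(0.08))
```

The numbers that come out of a run like this describe the waits a person actually sits through, which is why results in this style map so much more directly onto the day-to-day experience of an upgrade.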

For what it's worth, I can personally attest that I prefer to get my work done on Windows Vista SP1 RC bits. I run Windows Vista RTM on two production machines and SP1 RC bits on two others; in fact, I'm writing this post on a machine with SP1 RC bits installed. Through our internal SP1 testing program, I know that we continue to develop and improve SP1 every day, in large part based on feedback and bug submissions from external and internal Beta-test program members. IMO, the perceived gains in performance between the SP1 Beta and SP1 RC code are significant. As I said at the beginning, though, performance is only part of the story -- don't forget that SP1 also brings support for new types of hardware and several emerging standards, and further eases an IT administrator's deployment and management efforts.

But don't take my word alone for it. We'll broaden the testing pool of SP1 RC bits soon (very soon), so when I post that notice here on the blog, you'll be able to put Windows Vista SP1 RC through its paces yourself. I think you'll find the experience worthwhile and satisfying.

Posted by Nick White on Windows Vista Blog
