Measuring performance

written 21 Dec 2008
TL;DR: Dates are terrible; use timestamps. Don't expect good resolution from Safari/Opera/IE on Windows.


A common testing method in benchmarks, including our own Talos tests, is to measure and compare performance by recording how long it takes to run a given set of tests. In JavaScript, this is often done by using new Date() to record the start time and the end time. The duration is computed by subtracting the start from the end. This seemingly simple method has hidden complexities and imposes undue requirements on the implementation.


Let's take a step back and consider the assumptions this testing method makes about Date. For clarity and simplicity, I assume that time travel is impossible and relativity doesn't come into play. Thus if we had some function that returns the current time as a number and called it continuously, its values would be continuous and increasing with a constant slope. (Ok, this is starting to fall into the realm of philosophy and physics.) These are the two properties we want in the function that records the start and end times for our benchmark.


Dates do not have these properties. In the notation humans use for dates, the representations are ambiguous, and finding the duration between two dates in that notation can be complicated. Even with the notation the computer uses internally, issues still arise.


In the United States, on days when daylight saving time changes, 2am occurs twice or never. This breaks the continuous and increasing properties. Switching to UTC avoids this, but having to convert from UTC into your local timezone is burdensome. We can have the computer automate this, sure, but local times are still ambiguous if that's how we look at them. It's hard to tell duration when you have to take DST into account. This is further complicated by the fact that the laws governing DST change frequently and vary from country to country. Converting from UTC thus depends on having a system with up-to-date DST information and, of course, a correct implementation. So if you are looking at performance results on an out-of-date system, you may see different durations than on an up-to-date system. How many Linux/UNIX/Java/PHP users have up-to-date zoneinfo? Microsoft pushes out DST updates via Windows Update and Apple presumably does the same via Software Update. Older systems like Windows 98, OS X 10.2, and most Linux distributions from more than a few years ago no longer receive updates.

So when doing performance tests (or checkins around the DST change), dates can be problematic because you cannot assume that there are 24 hours in a day. Thanks to leap seconds, you cannot even assume 60 seconds per minute. I'm not sure whether any operating system supports leap seconds, though any box using Unix time does not.
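To make the DST hazard concrete, here is a small sketch (it uses a spring-forward transition from well after this post was written, with the UTC offsets spelled out so the result does not depend on the system timezone):

```javascript
// Two instants around the US spring-forward transition on 2021-03-14,
// when clocks jump from 02:00 EST straight to 03:00 EDT.
const before = new Date('2021-03-14T01:59:00-05:00'); // EST
const after  = new Date('2021-03-14T03:01:00-04:00'); // EDT

// Naive wall-clock reading: 01:59 to 03:01 looks like 62 minutes.
// The actual elapsed time is only 2 minutes.
const elapsedMs = after - before;
console.log(elapsedMs / 60000); // 2
```

A benchmark that compared local wall-clock readings across this boundary would report a duration off by a full hour.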

This matters why?

But computers don't use human notation for dates. Windows, OS X, and Linux all represent dates as an offset from an epoch, so why bother concerning ourselves with the human representation? User interface matters. Start/stop dates for the tinderbox are important to get right to track regressions and connect them to checkins (though with hg, figuring out the changeset is much easier). Like DST, timezones are problematic for date representation, but so long as they are clearly marked and the user is aware of the timezone, they are not too problematic. Still, many apps are not aware of timezone changes (sadly, Thunderbird included), so this small burden remains on the user to be aware of which timezone apps are using.

Still not enough

Supposing that the DST problems are overcome, there remains the problem of clock drift. Nearly every system routinely synchronizes its time via NTP with master time servers. This results in small and usually unnoticeable corrections to the system time, which break our desired properties. One might argue that such offsets are rare and small and thus hardly noticeable, but they are on the order of milliseconds to hundreds of milliseconds, which means they can come up and make a difference. Turning off NTP sync is undesirable, since clock sync between computers is important for coordinating distributed tasks and for having a reliable clock to tell whether or not you're late to a meeting.


Which brings up the next concern: time resolution. Put yourself in an operating system architect's shoes for a moment. You're designing an API and internal structures to keep track of the time. Let's look at some use cases, using Windows as an example:

1) The taskbar clock - this updates every minute or so.
2) The analog clock in Vista - this updates once a second to draw the second hand.
3) A fancy analog clock - this updates at a continuous 60 frames per second to draw a smoothly animating second hand.

I cannot think of any other use cases that need to know the whole date at a rate faster than 60 fps (~16 ms). Coincidentally (or maybe not), the Windows system clock updates roughly every 15.6ms on most NT-based systems, and given the use cases above that is a reasonable rate. However, once we start using Date for performance tests, 15.6ms is not fast enough. Some of the SunSpider benchmarks from Apple run in just a few ms. Either these benchmarks need to be made longer or the resolution of Date needs to increase. There are drawbacks to both. Increasing the test length means that the Talos boxes take longer to run, so regressions take longer to find and we need more machines to keep up with the rate of checkins. Also, sitting around for hours waiting to find out if your changesets regressed performance is no fun. Looking at the second option, it's not immediately clear how to increase the resolution of Windows' GetSystemTimeAsFileTime API. This was my challenge in bug 363258. OS X and Linux have 1ms resolution via gettimeofday(), so this is a Windows-only issue. Just as Windows/OS X/UNIX store dates as an offset from an epoch, so can we. There are a few methods for measuring durations in Windows: GetTickCount, timeGetTime, and QueryPerformanceCounter. There are drawbacks to all of them.
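You can observe the platform's Date resolution directly from script (a sketch; the number you get depends on OS, hardware, and load):

```javascript
// Estimate Date resolution: spin until the reported time changes and
// keep the smallest observed jump. On 2008-era Windows this tends to
// be ~15-16ms; on OS X and Linux, 1ms.
function dateResolution(samples) {
  let best = Infinity;
  for (let i = 0; i < samples; i++) {
    const start = Date.now();
    let next = start;
    while (next === start) next = Date.now(); // busy-wait for one tick
    best = Math.min(best, next - start);
  }
  return best; // smallest observed increment, in ms
}
console.log(dateResolution(10));
```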


GetTickCount returns the number of milliseconds since the NT kernel booted. A tick is the timer interrupt from the APIC on x86 systems (and since Mozilla doesn't run on the other platforms supported by NT, I won't consider them). At each tick, the system time is updated by a certain amount, as specified by Get/SetSystemTimeAdjustment or "the system's internal adjustment mechanisms." This computed time since boot is stored in a shared read-only page mapped into each process, so the API call is fast since it just reads from that page, but resolution is still limited by the kernel tick rate. Furthermore, the value returned is a DWORD (32-bit int), so every 49.7 days the value will overflow. There is a GetTickCount64 which does not have that problem, but it is Vista-only.
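The 49.7 days is just 2^32 milliseconds. A duration that spans one rollover is still recoverable if the subtraction is done in unsigned 32-bit arithmetic; a sketch (the >>> 0 trick coerces to uint32 in JS):

```javascript
// Wrap-safe duration between two 32-bit tick counts (GetTickCount-style).
// Unsigned 32-bit subtraction gives the right answer across one rollover.
function tickDelta(startTicks, endTicks) {
  return (endTicks - startTicks) >>> 0; // subtraction modulo 2^32
}

// 16 ticks before the wrap ... 16 ticks after the wrap = 32 ms elapsed:
console.log(tickDelta(0xFFFFFFF0, 0x00000010)); // 32
// Naive signed subtraction gives a bogus huge negative number:
console.log(0x00000010 - 0xFFFFFFF0); // -4294967264
```

Of course this only works if no more than one wrap occurred between the two samples, which is why the rollover is still a real problem for long-running measurements.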


timeGetTime() suffers from the same problems as GetTickCount by default. By calling timeBeginPeriod(1), the kernel tick rate is increased to 1ms sometime after the call returns. In my brief testing with this function, I found it took effect immediately on Vista and after one or two ticks on XP. This also increases the resolution of GetTickCount. Ignoring the overflow problem, why can't we just call timeBeginPeriod and be done with it? Well, increasing the kernel tick rate increases the system load; more work is being done just to maintain a higher resolution clock. This means that leaving Firefox running while your laptop is on battery could decrease the battery's performance by up to 25% (see this Intel article or this Microsoft presentation). There is a corresponding timeEndPeriod call that can restore the old tick rate, but using it would require us to know when the last high-precision call to Date is made.

QueryPerformanceCounter and RDTSC

QueryPerformanceCounter returns a 64-bit time stamp, so it does not suffer from the overflow problem. This timestamp increases at the rate returned by QueryPerformanceFrequency. The API states that this rate does not change while the system is running. That's not quite true. On older systems, QPC is implemented using the rdtsc instruction on x86. This returns the number of clock cycles since the processor booted. It is very high resolution, and on uniprocessor systems it is very accurate. But newer processors, such as Intel's Centrino and Core lines, dynamically change the clock speed of the processor in response to the workload, and certain HALs (Hardware Abstraction Layers) for certain versions of Windows still use rdtsc to implement QPC. Needless to say, this makes it difficult to obtain accurate timings. Suppose we could force the processor into its highest speed; this can be done by the user fairly easily, especially if we're running a CPU-intensive benchmark.

Note that I stated rdtsc returns the number of cycles since the processor booted. On multiprocessor systems (including multi-core), the cores don't boot at the same time, so there is sometimes a noticeable difference between their counters, depending on your OS. Windows attempts to keep the CPU TSCs roughly in sync, but the keyword there is roughly. Even in single-threaded programs there is the issue of reading different time stamps when the thread switches cores. There is no way to tell reliably which core you are on when the TSC is read, because context switches can happen between any two instructions (in user space). On newer systems, there is a dedicated timer on the motherboard which QPC reads. This involves a system call and a read from the device, so it is far more expensive than the TSC read, but it is stable. To add to the challenge, neither the TSC nor the hardware timer is reliable after suspend/hibernate.
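Converting QPC ticks to time is simple arithmetic once you have the frequency. A sketch of that conversion (the frequency below is an example value for a legacy 3.579545 MHz ACPI timer, not something read from the real API; BigInt is used because the tick counts are 64-bit and can exceed what a double holds exactly):

```javascript
// Convert a span of QPC-style ticks to milliseconds, given the rate
// from QueryPerformanceFrequency. Integer (truncating) division.
function qpcToMs(startTicks, endTicks, ticksPerSecond) {
  return (endTicks - startTicks) * 1000n / ticksPerSecond;
}

const freq = 3579545n; // hypothetical: 3.579545 MHz ACPI PM timer
console.log(qpcToMs(0n, 3579545n, freq)); // 1000n, i.e. one second
```

The caveats in the text apply to the inputs, not the arithmetic: if the frequency silently changes with CPU speed, or the two samples come from cores with skewed TSCs, the computed duration is wrong no matter how careful the division is.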


So to summarize: we have GetTickCount/timeGetTime, which suffer from resolution and rollover issues, and we have QPC, which has superb resolution but lousy consistency in some cases (though it is very fast then). How can we use these to build a reasonably reliable, high performance (and multithreaded) implementation? Read the bug or the source, or perhaps I'll write another blog post describing the implementation. If you were hoping for a clever and perfect solution, there is none right now. My point is that calculating the date with high precision on Windows is tricky and sometimes unreliable. So let's revisit the duration calculation.

Duration revisited

duration = end - start, where start and end are calculated as an offset from some epoch. Expanding with the known implementations of gettimeofday(), we get duration = (epoch + offset_end) - (epoch + offset_start), which reduces to duration = offset_end - offset_start, where both offsets are measured in seconds. As you probably already knew, the current date doesn't matter; it's all relative. Calculating the full date to high precision just wastes time and effort. What we really care about is the offsets, and it doesn't matter what epoch they're from, so long as it's consistent for the lifetime of the process. Looking back at our choices on Windows, QPC is the only viable option, since we don't know when Date needs to be high precision and when it doesn't. It would be much nicer to have a separate API, like Timestamp.sample(). To use timeGetTime or GetTickCount, we could extend this to include Timestamp.begin() and Timestamp.end(). This would leave us with fairly fast implementations for Date and the Timestamp functions, so that JS users who would like to do performance tests or animations could do so without sacrificing performance. This API is fairly trivial to implement for other systems, since they can reuse their gettimeofday() functions for Timestamp.sample() or use higher performance timers (librt, for example) the same as Windows does.

Moving forward

Of course, there are some concerns with allowing a web page to alter the tick rate of the user's computer, so this should be discussed and more thoroughly thought out before committing to an implementation. And it should be standardized. It doesn't seem right to have a Mozilla-only API (though for Talos, it wouldn't be the end of the world).

Points of interest

And for those who are curious about other browsers' implementations, John Resig had a nice article comparing JS time resolution on Windows. See also:

- the SpiderMonkey/TraceMonkey implementation
- NSPR's PR_Now()
- NSPR's interval API
- Chrome's implementation
- V8's implementation (look for Time::SetToCurrentTime())

Note that Chrome and V8 don't have the multithread-safe requirement that SpiderMonkey does. As for Safari, IE and Opera, I don't know, but it looks like they just call GetSystemTimeAsFileTime, based on John Resig's performance results.