Time 🟡¶

With webgpu.hpp

Resulting code: step095-timestamp-queries

Vanilla webgpu.h

Resulting code: step095-timestamp-queries-vanilla

We start by measuring compute time, which is often the most valuable resource.

Warning

As of September 6, 2023, wgpu-native does not support timestamp queries yet. I suggest you follow this chapter with Dawn only for now.

Asynchronicity¶

Measuring GPU time is quite different from measuring CPU time because, as you may recall, we only interact with the GPU through remote calls issued from our CPU-side C++ code.

In imperative CPU code, measuring time looks like this:

# Pseudocode of a simple CPU benchmark
start_time = get_current_time()
do_on_cpu(something)
end_time = get_current_time()
elapsed = end_time - start_time

But when doing this on the GPU, we only submit operations that run on a different timeline:

# Pseudocode of a **wrong** GPU benchmark
start_time = get_current_time()
submit_to_do_on_gpu(something)
end_time = get_current_time()
elapsed = end_time - start_time # wrong!

The “something” may not even have started at this point. What we measure is the time it takes to submit instructions, not to actually execute them!
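To make this concrete, here is a minimal CPU-only sketch of the problem. FakeQueue is a hypothetical stand-in for a GPU queue (it is not part of WebGPU): submitted commands are only recorded, and they run later, when flush() is called. All names in this block are made up for the illustration, and the "GPU work" is just a sleep.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Milliseconds elapsed between two time points, as a floating-point value.
inline double elapsedMs(Clock::time_point begin, Clock::time_point end) {
    return std::chrono::duration<double, std::milli>(end - begin).count();
}

// Hypothetical stand-in for a GPU queue (NOT a WebGPU type): submitted
// commands are only recorded, and run later when flush() is called.
class FakeQueue {
public:
    void submit(std::function<void()> command) {
        m_pending.push_back(std::move(command));
    }
    void flush() {
        for (auto& command : m_pending) command();
        m_pending.clear();
    }
private:
    std::vector<std::function<void()>> m_pending;
};

struct Measurement {
    double submitMs;   // what a CPU timer around submit() sees
    double executeMs;  // what the work really took, on the queue's timeline
};

inline Measurement measure(std::chrono::milliseconds workload) {
    FakeQueue queue;
    Measurement result{0.0, 0.0};

    auto start = Clock::now();
    queue.submit([&result, workload]() {
        // This part runs on the queue's timeline, during flush()
        auto begin = Clock::now();
        std::this_thread::sleep_for(workload);  // pretend to do GPU work
        result.executeMs = elapsedMs(begin, Clock::now());
    });
    auto end = Clock::now();
    result.submitMs = elapsedMs(start, end);  // wrong: nothing has run yet

    queue.flush();  // the work only happens now
    return result;
}
```

Calling measure(std::chrono::milliseconds(20)) should report a submitMs close to zero, while executeMs is at least the duration of the simulated work. Only the timer that lives inside the submitted command sees the real cost.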

Timestamp Queries¶

We must instruct the GPU to run some equivalent of get_current_time() on its own timeline. The result of this operation is stored in a dedicated object called a timestamp query.

# Pseudocode of a correct GPU benchmark
start_timestamp_query = create_timestamp_query()
end_timestamp_query = create_timestamp_query()
submit_to_do_on_gpu(write_current_time, start_timestamp_query)
submit_to_do_on_gpu(something)
submit_to_do_on_gpu(write_current_time, end_timestamp_query)

We must then fetch the timestamp values back to the CPU, through a mapped buffer as we saw in Playing with buffers.

🫡 Okay, got it, so what about actual C++ code?

Whether they measure timestamps or other things, GPU queries are stored in a QuerySet. We typically store both the start and end time in the same set:

With webgpu.hpp

// Create timestamp queries
QuerySetDescriptor querySetDesc;
querySetDesc.type = QueryType::Timestamp;
querySetDesc.count = 2; // start and end
QuerySet timestampQueries = device.createQuerySet(querySetDesc);

Vanilla webgpu.h

// Create timestamp queries
WGPUQuerySetDescriptor querySetDesc = {}; // zero-initialize the fields we do not set
querySetDesc.nextInChain = nullptr;
querySetDesc.type = WGPUQueryType_Timestamp;
querySetDesc.count = 2; // start and end
WGPUQuerySet timestampQueries = wgpuDeviceCreateQuerySet(device, &querySetDesc);

Note

I base this example on step095, from the Simple GUI chapter.

Note

I created an initBenchmark() method (called in onInit) to initialize benchmark-related objects like the query set. I also created a terminateBenchmark() method to release these resources.

However, if you try to add the code block above to your application, you will face an error:

Dawn

Device error: Timestamp queries are disallowed because they may expose precise timing information.

wgpu-native

Device error: Features(TIMESTAMP_QUERY) are required but not enabled on the device.

Enabling Timestamp Feature¶

Dawn toggles¶

Let us start with Dawn: for privacy reasons, Dawn disables access to precise timing information by default. This is relevant when running on the Web, but not in our native application context. Fortunately, this safeguard can easily be disabled.

Dawn has a list of so-called “toggles” that can be turned on or off at the scale of the whole WebGPU instance: the list is available in Toggles.cpp.

To enable toggles, we use the Dawn-specific DawnTogglesDescriptor extension, which can be chained to the instance descriptor:

With webgpu.hpp

// At the very beginning of onInit()
InstanceDescriptor instanceDesc;
#ifdef WEBGPU_BACKEND_DAWN
// Dawn-specific extension to enable/disable toggles
DawnTogglesDescriptor dawnToggles;
// [...]
instanceDesc.nextInChain = &dawnToggles.chain;
#endif
m_instance = createInstance(instanceDesc);

Vanilla webgpu.h

// At the very beginning of onInit()
WGPUInstanceDescriptor instanceDesc = {}; // nextInChain stays null on non-Dawn backends
#ifdef WEBGPU_BACKEND_DAWN
// Dawn-specific extension to enable/disable toggles
WGPUDawnTogglesDescriptor dawnToggles;
// [...]
instanceDesc.nextInChain = &dawnToggles.chain;
#endif
m_instance = wgpuCreateInstance(&instanceDesc);

We then specify that we want to enable the allow_unsafe_apis toggle:

With webgpu.hpp

#ifdef WEBGPU_BACKEND_DAWN
DawnTogglesDescriptor dawnToggles;
dawnToggles.chain.next = nullptr;
dawnToggles.chain.sType = SType::DawnTogglesDescriptor;

std::vector<const char*> enabledToggles = {
    "allow_unsafe_apis",
};
dawnToggles.enabledToggles = enabledToggles.data();
dawnToggles.enabledTogglesCount = enabledToggles.size();
dawnToggles.disabledTogglesCount = 0;

instanceDesc.nextInChain = &dawnToggles.chain;
#endif

Vanilla webgpu.h

#ifdef WEBGPU_BACKEND_DAWN
WGPUDawnTogglesDescriptor dawnToggles;
dawnToggles.chain.next = nullptr;
dawnToggles.chain.sType = WGPUSType_DawnTogglesDescriptor;

std::vector<const char*> enabledToggles = {
    "allow_unsafe_apis",
};
dawnToggles.enabledToggles = enabledToggles.data();
dawnToggles.enabledTogglesCount = enabledToggles.size();
dawnToggles.disabledTogglesCount = 0;

instanceDesc.nextInChain = &dawnToggles.chain;
#endif

Note

The toggles descriptor can also be used as an extension of the adapter or device request options. In that case, device toggles supersede adapter toggles, which supersede instance toggles.

The error message we get is now slightly different:

Device error: (Dawn) Timestamp query set created without the feature being enabled.

This is in substance the same error as the one reported by wgpu-native above; we address both in the next section.

Feature request¶

When creating our WebGPU device, we already mentioned that we can set up specific limits. We can also request specific features from the WGPUFeatureName enum. In particular, we need to enable FeatureName::TimestampQuery.

With webgpu.hpp

std::vector<FeatureName> requiredFeatures = {
    FeatureName::TimestampQuery,
};
deviceDesc.requiredFeatures = (const WGPUFeatureName*)requiredFeatures.data();
deviceDesc.requiredFeaturesCount = (uint32_t)requiredFeatures.size();

Vanilla webgpu.h

std::vector<WGPUFeatureName> requiredFeatures = {
    WGPUFeatureName_TimestampQuery,
};
deviceDesc.requiredFeatures = requiredFeatures.data();
deviceDesc.requiredFeaturesCount = (uint32_t)requiredFeatures.size();

The error messages should now be fixed! You may also want to check that the adapter, and then the device, support this feature:

With webgpu.hpp

std::vector<FeatureName> requiredFeatures;
if (m_adapter.hasFeature(FeatureName::TimestampQuery)) {
    requiredFeatures.push_back(FeatureName::TimestampQuery);
}
// [...] Create device
if (!m_device.hasFeature(FeatureName::TimestampQuery)) {
    std::cout << "Timestamp queries are not supported!" << std::endl;
}

Vanilla webgpu.h

std::vector<WGPUFeatureName> requiredFeatures;
if (wgpuAdapterHasFeature(m_adapter, WGPUFeatureName_TimestampQuery)) {
    requiredFeatures.push_back(WGPUFeatureName_TimestampQuery);
}
// [...] Create device
if (!wgpuDeviceHasFeature(m_device, WGPUFeatureName_TimestampQuery)) {
    std::cout << "Timestamp queries are not supported!" << std::endl;
}

Note

Timestamp queries are specified as an explicit feature because some devices/adapters may not support them.

Writing timestamps¶

There are different ways of writing timestamps into queries. The closest one to our pseudocode above is commandEncoder.writeTimestamp(), which writes the GPU-side time into a query when the command is executed on the GPU timeline.

If you more specifically want to measure the time taken by a render or compute pass, you can also pass timestamp queries to the pass descriptor:

With webgpu.hpp

// In Application::onFrame()
std::vector<RenderPassTimestampWrite> timestampWrites(2);
timestampWrites[0].location = RenderPassTimestampLocation::Beginning;
timestampWrites[0].querySet = m_timestampQueries;
timestampWrites[0].queryIndex = 0; // first query = start time
timestampWrites[1].location = RenderPassTimestampLocation::End;
timestampWrites[1].querySet = m_timestampQueries;
timestampWrites[1].queryIndex = 1; // second query = end time

renderPassDesc.timestampWriteCount = (uint32_t)timestampWrites.size();
renderPassDesc.timestampWrites = timestampWrites.data();
RenderPassEncoder renderPass = encoder.beginRenderPass(renderPassDesc);

Vanilla webgpu.h

// In Application::onFrame()
std::vector<WGPURenderPassTimestampWrite> timestampWrites(2);
timestampWrites[0].location = WGPURenderPassTimestampLocation_Beginning;
timestampWrites[0].querySet = m_timestampQueries;
timestampWrites[0].queryIndex = 0; // first query = start time
timestampWrites[1].location = WGPURenderPassTimestampLocation_End;
timestampWrites[1].querySet = m_timestampQueries;
timestampWrites[1].queryIndex = 1; // second query = end time

renderPassDesc.timestampWriteCount = (uint32_t)timestampWrites.size();
renderPassDesc.timestampWrites = timestampWrites.data();
WGPURenderPassEncoder renderPass = wgpuCommandEncoderBeginRenderPass(encoder, &renderPassDesc);

Note

I initialize the query set only once and store it in an attribute m_timestampQueries of the Application class.

Using timestamp values¶

Display¶

We can finally manipulate timestamp values on the CPU! As a first step, we can display them in the terminal, from within the map callback, once mapping has succeeded:

// Use timestampData
uint64_t begin = timestampData[0];
uint64_t end = timestampData[1];
uint64_t nanoseconds = (end - begin);
float milliseconds = (float)nanoseconds * 1e-6;
std::cout << "Render pass took " << milliseconds << "ms" << std::endl;

In the end, you get a little less than one log line per frame:

Render pass took 0.484128ms
Render pass took 0.538624ms
Render pass took 0.49056ms
Render pass took 0.490912ms
Render pass took 0.504864ms
Render pass took 0.491872ms
Render pass took 0.487808ms
Render pass took 0.587872ms
Render pass took 0.493504ms
Render pass took 0.498112ms
Render pass took 0.547136ms
Render pass took 0.452928ms

Statistics¶

Usually, I am not interested in one line per frame, but rather in showing the mean and standard deviation of the measurement in the UI. For this I use my TinyTimer.h:

// In Application.h
#include "TinyTimer.h"

class Application {
    // [...]
    TinyTimer::PerformanceCounter m_perf;
};

// In fetch timestamp callback
m_perf.add_sample(milliseconds * 1e-3);

// In Application::updateGui()
ImGui::Text("Application average [...]", /* [...] */);
ImGui::Text("Render pass duration on GPU: %s", m_perf.summary().c_str());
ImGui::End();

Figure: Our GPU timer displayed in the application.

Note

In this example, we can see that the render pass takes much less time than a frame. This is because the limiting factor here is the VSync that caps the number of frames per second to 60 (the maximum refresh rate of my display).

Important

When reporting and comparing benchmark values, and statistics in general, it is important to look at the standard deviation, but also at the number of samples from which these statistics are estimated.
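If you are curious about what such a counter has to keep track of, here is a minimal sketch that reports the mean, the standard deviation and the number of samples. It is a hypothetical stand-in, not TinyTimer's actual code: the class name and methods are made up, and I chose Welford's online algorithm so that samples do not need to be stored.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a performance counter (NOT TinyTimer's code).
// Accumulates samples one at a time with Welford's online algorithm.
class RunningStats {
public:
    void addSample(double x) {
        ++m_count;
        const double delta = x - m_mean;
        m_mean += delta / static_cast<double>(m_count);
        m_m2 += delta * (x - m_mean);  // uses the updated mean on purpose
    }

    std::size_t count() const { return m_count; }
    double mean() const { return m_mean; }

    // Sample standard deviation (divides by n - 1).
    // Returns 0 when there are fewer than 2 samples.
    double stddev() const {
        if (m_count < 2) return 0.0;
        return std::sqrt(m_m2 / static_cast<double>(m_count - 1));
    }

private:
    std::size_t m_count = 0;
    double m_mean = 0.0;
    double m_m2 = 0.0;  // sum of squared deviations from the running mean
};
```

Exposing count() next to mean() and stddev() makes it easy to display all three together, so that a mean estimated from a handful of samples is not mistaken for a stable value.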

Conclusion¶

You are now able to use precise GPU-side timers, which is essential to evaluate the performance of your application and identify its bottlenecks. Remember that:

GPU timers don’t live on the same timeline as CPU timers.
You need to create timestamp queries, then write to them, resolve them, and finally fetch them back asynchronously.
You must be careful not to fetch before the resolve/copy operations have been not only encoded but also submitted to the GPU.

I would suggest creating a little class solely responsible for managing the timers in your application, so that the boilerplate is isolated from your application’s logic.

Note

If you want to measure performance for events that do not happen every frame, you should keep for each such counter a boolean telling whether the counter has been updated, so that the fetch callback calls add_sample only when the timestamps were actually written.
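The idea of this note can be sketched as follows. The class and method names are hypothetical, and the sample storage is a plain vector standing in for whatever counter you use. The sketch also assumes that at most one fetch is in flight per counter; with several frames in flight, a single boolean would not be enough.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: a counter for an event that does not happen every
// frame. It records a sample only if the timestamps were written since
// the previous fetch.
class OccasionalCounter {
public:
    // Call when the commands writing the timestamps are encoded.
    void markUpdated() { m_updated = true; }

    // Call from the fetch callback with the measured duration.
    // Returns true if the sample was recorded, false if it was stale.
    bool onFetched(double seconds) {
        if (!m_updated) return false;  // timestamps not rewritten: ignore
        m_samples.push_back(seconds);
        m_updated = false;
        return true;
    }

    std::size_t sampleCount() const { return m_samples.size(); }

private:
    bool m_updated = false;
    std::vector<double> m_samples;
};
```

A frame in which the event does not occur simply skips markUpdated(), so the following onFetched() call returns false and the statistics are not polluted by a repeated value.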
