Playing with buffers π’ΒΆ
Resulting code: step017-next
We have seen how to send instructions to the GPU, and we now see how to send data.
In this chapter:
We see how to allocate data buffers on the GPU.
We see our first useful GPU command to copy buffers.
Memory allocationΒΆ
Allocating memory on the GPU (in the VRAM) is slightly more complex than allocating memory on the CPU (in the RAM).
Recall about CPU-side allocationΒΆ
When we talk about (dynamic) memory allocation on CPU, we may think of something like this:
// Allocation in RAM
char* buffer = (char*)malloc(256); // C
char* buffer = new char[256]; // C++
auto buffer = std::vector<char>(256); // C++ using STL
The malloc
function, and everything that is built on top of is like the C++ new[]
operator or the default allocator from the standard template library (STL), is seen as a low-level black box that asks the OS for some contiguous memory buffer.
In most cases, the level of abstraction of these is enough, and we do not need control about where/how this memory is allocated.
It is nonetheless possible to go lower level and for instance implement a custom std::allocator
so that we can inform the runtime about our intended usage of this memory, which it can use to find a more appropriate place in its memory.
We will not do custom allocation on the C++ side, but I mention all this to help justifying how things work on the GPU side.
Note
We talk here about dynamic allocation, where the allocation may depend on runtime inputs. This is opposed to static allocation, like when declaring char buffer[256];
, which is known at compile time and thus handled differently.
In case of GPU memory, static allocation happens when compiling shaders (see next chapter), but our C++ code can only interact with dynamically allocated data.
GPU-side allocationΒΆ
Because GPU programs are highly parallel, they process a lot of data, to a point where data transfers within the GPU (between computing units and VRAM) is often the limiting bottleneck of our programs.
For this reason, we always specify an intended usage when allocating data on the GPU!
Let us get practical: to allocate data, we create a WGPUBuffer
object, which follows the usual object creation idiom:
// 1. We build a descriptor (called 'A' because we will have multiple buffers)
WGPUBufferDescriptor bufferDescA = WGPU_BUFFER_DESCRIPTOR_INIT;
{{Fill in buffer descriptor A}}
// 2. We create the buffer from its descriptor
WGPUBuffer bufferA = wgpuDeviceCreateBuffer(device, &bufferDescA);
First of all, we of course need to specify the size of the buffer, like we do when calling malloc
:
bufferDescA.size = 256;
Then, as described above, we tell the device how we intend to use this buffer. The usage is given as a bitmask, i.e., an integer where each bit is a flag, that we can combine with others. The following bits are defined (comments are mine):
// Definition of WGPUBufferUsage bit flags values in webgpu.h
// The buffer can be *mapped* to be *read* on the CPU side
static const WGPUBufferUsage WGPUBufferUsage_MapRead = 0x0000000000000001;
// The buffer can be *mapped* to be *written* on the CPU side
static const WGPUBufferUsage WGPUBufferUsage_MapWrite = 0x0000000000000002;
// The buffer can be used as the *source* of a GPU-side copy operation
static const WGPUBufferUsage WGPUBufferUsage_CopySrc = 0x0000000000000004;
// The buffer can be used as the *destination* of a GPU-side copy operation
static const WGPUBufferUsage WGPUBufferUsage_CopyDst = 0x0000000000000008;
// The buffer can be used as an Index buffer when doing indexed drawing in a render pipeline
static const WGPUBufferUsage WGPUBufferUsage_Index = 0x0000000000000010;
// The buffer can be used as an Vertex buffer when using a render pipeline
static const WGPUBufferUsage WGPUBufferUsage_Vertex = 0x0000000000000020;
// The buffer can be bound to a shader as a uniform buffer
static const WGPUBufferUsage WGPUBufferUsage_Uniform = 0x0000000000000040;
// The buffer can be bound to a shader as a storage buffer
static const WGPUBufferUsage WGPUBufferUsage_Storage = 0x0000000000000080;
// The buffer can store arguments for an indirect draw call
static const WGPUBufferUsage WGPUBufferUsage_Indirect = 0x0000000000000100;
// The buffer can store the result of a timestamp or occlusion query
static const WGPUBufferUsage WGPUBufferUsage_QueryResolve = 0x0000000000000200;
Note
Only the first usages make sens to us for now, we will progressively discover what the others mean in later chapters.
The rule of thumb is simple: only specify the flags you really need. If you miss one, WebGPU will complain with a message that should hint you about the missing usage flag.
At this point, we need a little scenario for our example, so that we can determine a usage.
Simple exampleΒΆ
In our simple example scenario, we copy a slice of Buffer A into Buffer B.ΒΆ
Let us say we want to create 2 buffers, write data in the first one (bufferA
), copy it into the second one (bufferB
) on the GPU-side, and finally read bufferB
back on the CPU.
So, we create a second buffer, with a second descriptor:
// We build a second buffer, called B
WGPUBufferDescriptor bufferDescB = WGPU_BUFFER_DESCRIPTOR_INIT;
{{Fill in buffer descriptor B}}
WGPUBuffer bufferB = wgpuDeviceCreateBuffer(device, &bufferDescB);
Note
We create buffers before the command encoding test from previous chapter:
// Before encoding commands:
{{Create buffer A}}
{{Create buffer B}}
{{Write initial value in Buffer A}}
To makes things slightly more interesting, I will make buffer B shorter than buffer A, so that we only copy a slice of buffer A into buffer B.
// buffer B is shorter than buffer A in this example
bufferDescB.size = 32;
Note
We could also reuse the same WGPUBufferDescriptor
struct for both buffer creations, but using a separate one is clearer for the context of this guide.
UsageΒΆ
Back to the question of usage, we now know what to specify:
// Buffer A is *written* on CPU, and used as *source* of a GPU-side copy
bufferDescA.usage = WGPUBufferUsage_MapWrite | WGPUBufferUsage_CopySrc;
// Buffer B is *read* on CPU, and used as *destination* of a GPU-side copy
bufferDescB.usage = WGPUBufferUsage_MapRead | WGPUBufferUsage_CopyDst;
Note how the pipe operator (|
, also called bitwise OR) is used to combine the usage flags together.
LabelsΒΆ
We could stop here with descriptors: specifying a byte size and a usage is all the device requires to allocate a buffer.
It is however good practice to take benefit from the possibility to name our objects, especially now that we have 2 objects of the same nature (2 buffers). This greatly helps understanding error messages.
bufferDescA.label = toWgpuStringView("Buffer A");
bufferDescB.label = toWgpuStringView("Buffer B");
Initial mapping stateΒΆ
There is one last field in the buffer descriptor which tells whether the buffer is mapped to the CPU side upon its creation. Mapping a buffer means to make it temporarily available on the CPU side, although it is a GPU buffer.
The mappedAtCreation
field is of course only relevant for buffers that have declared a mapping usage (MapWrite
or MapRead
), and is only really useful for the MapWrite
case.
In order to write the initial value of our bufferA
, we are interested in having it mapped at creation:
// Map Buffer A at creation in order to upload its initial value
bufferDescA.mappedAtCreation = true;
Note
Technically, the field mappedAtCreation
has type WGPUBool
, which is actually a uint32_t
and not a bool
. The boolean type is not a built-in type of C and it sometimes induces special behaviors in C++ (e.g., std::vector<bool>
is something special) so it is usually not used in C APIs. Use 1
instead of true
if you compiler complains.
In newer versions of WebGPU, macros WGPU_TRUE
and WGPU_FALSE
are defined to clarify this.
Freeing memoryΒΆ
If you ever played with manual allocation using malloc
or new
, you must know that we always free our dynamically allocated memory.
In the case of WGPUBuffer
, its associated memory in VRAM is freed as soon as all references to it are released. We thus simply release our buffers:
wgpuBufferRelease(bufferA);
wgpuBufferRelease(bufferB);
Sometimes, we want to force freeing VRAM memory even if there may remain references to our buffers somewhere else in our program (for instance through a bind group, an object that we will discover later). For this, we may call wgpuBufferDestroy(buffer)
. Other references to the buffer are then no longer usable.
Copying buffersΒΆ
Our GPU buffers are allocated in VRAM, we can now populate Buffer A and proceed with the GPU-side copy operation.
Initial valueΒΆ
Before copying, we need something to copy, so we will set the initial value of bufferA
. Given that we had it mapped at creation, we can get the address in (CPU-side) RAM where it is mapped and simply write in there!
To get this address, we use the following function:
// The signature of the wgpuBufferGetMappedRange function as it is in webgpu.h
void * wgpuBufferGetMappedRange(WGPUBuffer buffer, size_t offset, size_t size);
As you can see, this can be used to map only a sub range of the buffer, but in our case we want to map the whole buffer, so we can use the special sentinel value WGPU_WHOLE_MAP_SIZE
to mean exactly that:
// Get a pointer to the entire mapped buffer and interpret it as 8-bit unsigned integers
uint8_t* bufferDataA = static_cast<uint8_t*>(
wgpuBufferGetMappedRange(bufferA, 0, WGPU_WHOLE_MAP_SIZE)
);
Note that I also cast the returned void*
pointer into something we can actually write to, e.g., a uint8_t
in this example. We can now simply write whatever we want:
// Write 0, 1, 2, 3, ... in bufferA
for (size_t i = 0 ; i < 256 ; ++i) {
bufferDataA[i] = static_cast<uint8_t>(i);
}
Caution
In this simple example, we write elements of type uint8_t
whose byte size is exactly 1 (and can have values up to 255). If we would use a larger type we would need to adapt the byte size of the buffer. For instance, we should allocate 256 * sizeof(int)
bytes if we want our buffer to store 256 integers.
Importantly, we must unmap the buffer once we no longer need it on the CPU side, before doing anything else with it!
wgpuBufferUnmap(bufferA);
// Do NOT use bufferDataA beyond this point!
Note
There are other ways to set the initial value of a buffer. If you want to copy from another CPU buffer, you can use wgpuBufferWriteMappedRange
when the buffer is mapped, or even wgpuQueueWriteBuffer
, which only requires usage CopyDst
.
Copy operationΒΆ
Okey, we have two buffers bufferA
and bufferB
, the former contains some values, and we want to copy a slice of it into the latter.
Copying buffers is done using wgpuCommandEncoderCopyBufferToBuffer
. From the name of this function, we know that this instruction is sent through the command encoder object that we created in the previous chapter.
Note
This will replace the calls to wgpuCommandEncoderInsertDebugMarker
that we used previously to mock instructions.
The full buffer copy command defines a source and a destination buffer, but also the offset at which the copy starts in both source and offset, and the overall number of bytes to copy:
// The signature of the wgpuCommandEncoderCopyBufferToBuffer function as it is in webgpu.h
void wgpuCommandEncoderCopyBufferToBuffer(
WGPUCommandEncoder commandEncoder,
WGPUBuffer source,
uint64_t sourceOffset,
WGPUBuffer destination,
uint64_t destinationOffset,
uint64_t size
);
The command copies size
bytes from byte sourceOffset
of buffer source
into the bytes starting at destinationOffset
in destination
.ΒΆ
In our case, we copy bufferDescB.size
bytes (because buffer B is smaller than buffer A), and let us say we start at an offset of 16 in the source:
wgpuCommandEncoderCopyBufferToBuffer(
encoder,
bufferA,
16, // sourceOffset
bufferB,
0, // destinationOffset
bufferDescB.size
);
Did this work? The only way to know is to read back the content of buffer B and check that it contains \((16, 17, \dots, 47)\)!
Reading backΒΆ
As you may have expected, the only way to read data back from the GPU is through an asynchronous operation. This is by design, because reading may take a lot of time (relatively to simple CPU operations).
Asynchronous mappingΒΆ
Reading from a buffer is done by mapping it onto the CPU. This is similar to the way we filled in our input buffer A above, except that this time buffer B is not mapped by default, so we need to ask for this with the following asynchronous operation:
// The signature of the wgpuBufferMapAsync function as it is in webgpu.h
WGPUFuture wgpuBufferMapAsync(
WGPUBuffer buffer,
WGPUMapMode mode, // can be WGPUMapMode_Read or WGPUMapMode_Write
size_t offset,
size_t size,
WGPUBufferMapCallbackInfo callbackInfo
);
We first introduced asynchronous operations in chapter The Adapter, so you should already guess how to use this and in particular the callbackInfo
.
Launching the operationΒΆ
We call wgpuBufferMapAsync
for our buffer B, with a map mode of WGPUMapMode_Read
(read-only). Like we did above, we can use WGPU_WHOLE_MAP_SIZE
to mean that we want to map the entire buffer:
{{Define callback handling the mapped buffer B}}
{{Build callback info for mapping buffer B}}
// And finally we launch the asynchronous operation
wgpuBufferMapAsync(
bufferB,
WGPUMapMode_Read,
0, // offset
WGPU_WHOLE_MAP_SIZE,
callbackInfo
);
What should we do in the callback? We can check that the mapping is successful and anyways let the rest of the program know that the operation ended:
// Context passed to `onBufferBMapped` through theuserdata pointer:
struct OnBufferBMappedContext {
bool operationEnded = false; // Turned true as soon as the callback is invoked
bool mappingIsSuccessful = false; // Turned true only if mapping succeeded
};
// This function has the type WGPUBufferMapCallback as defined in webgpu.h
auto onBufferBMapped = [](
WGPUMapAsyncStatus status,
struct WGPUStringView message,
void* userdata1,
void* /* userdata2 */
) {
OnBufferBMappedContext& context = *reinterpret_cast<OnBufferBMappedContext*>(userdata1);
context.operationEnded = true;
if (status == WGPUMapAsyncStatus_Success) {
context.mappingIsSuccessful = true;
} else {
std::cout << "Could not map buffer B! Status: " << status << ", message: " << toStdStringView(message) << std::endl;
}
};
Note
Instead of remembering the status of the map operation, we could retrieve this information later with wgpuBufferGetMapState()
, e.g.:
if (wgpuBufferGetMapState(bufferB) == WGPUBufferMapState_Mapped) {
// [...]
}
// We create an instance of the context shared with `onBufferBMapped`
OnBufferBMappedContext context;
// And we build the callback info:
WGPUBufferMapCallbackInfo callbackInfo = WGPU_BUFFER_MAP_CALLBACK_INFO_INIT;
callbackInfo.mode = WGPUCallbackMode_AllowProcessEvents;
callbackInfo.callback = onBufferBMapped;
callbackInfo.userdata1 = &context;
Waiting for mappingΒΆ
We can follow the same pattern than before, using wgpuInstanceProcessEvents
.
Note
We could also use the approach proposed in the Futures and asynchronous operations appendix by using the WGPUFuture
object returned by wgpuBufferMapAsync()
.
// Process events until the map operation ended
wgpuInstanceProcessEvents(instance);
while (!context.operationEnded) {
sleepForMilliseconds(200);
wgpuInstanceProcessEvents(instance);
}
Once the operation ended, and if it was successful, we can read and display our output buffer B on the CPU. To recap, here is the outline of our read back process:
{{Map buffer B}}
{{Wait for mapping of buffer B}}
if (context.mappingIsSuccessful) {
{{Display buffer B}}
}
Note
We add the mapping at the end of the main body, right before releasing things. Note that this replaces the βWait for completionβ of the previous chapter, since the mapping will always happen after the previously submitted commands.
// Removed
Displaying output bufferΒΆ
Now that the buffer B is mapped, we can get its CPU-side address like we did to fill in the input buffer A.
The only difference is that instead of wgpuBufferGetMappedRange
we use wgpuBufferGetConstMappedRange
, which returns a const
pointer, because the buffer is mapped in read-only mode.
// Get a (read-only) pointer to the buffer content
const uint8_t* bufferDataB = static_cast<const uint8_t*>(
wgpuBufferGetConstMappedRange(bufferB, 0, WGPU_WHOLE_MAP_SIZE)
);
With this pointer in hand, we can simply print its content:
// Display
std::cout << "Buffer B: [";
for (size_t i = 0 ; i < bufferDescB.size ; ++i) {
if (i > 0) std::cout << ", ";
std::cout << static_cast<int>(bufferDataB[i]); // cast to display as int rather than char
}
std::cout << "]" << std::endl;
And finally we unmap the buffer once we no longer need it:
// Unmap
wgpuBufferUnmap(bufferB);
// Do NOT use bufferDataB beyond this point!
Finally, you should see the following output:
Buffer B: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
Like we expected, the values from index 16 to 47 (= offset 16 + size 32 - 1) were copied from buffer A to buffer B, congrats!
ConclusionΒΆ
We know how to allocate, populate, *copy and read back values from GPU-side buffers. This is an important base because it allows us to transfer significant information back and forth with the GPU.
Note that buffers are not the only type of memory object on the GPU: there are also textures, which we will see later on. Think of textures as a chunk of memory with even more detailed information about its intended usage, which enables the use of dedicated circuits in the hardware.
Buffer are nonetheless enough to start running operations on the GPU, which we will do in Our first shader. But one last chapter before that: let us learn to deal with errors!
Resulting code: step017-next