Wgpu Instancing

Our scene right now is very simple: we have one object centered at (0,0,0). What if we wanted more objects? This is where instancing comes in.

Instancing allows us to draw the same object multiple times with different properties (position, orientation, size, color, etc.). There are multiple ways of doing instancing. One way would be to modify the uniform buffer to include these properties and then update it before we draw each instance of our object.

We don’t want to use this method for performance reasons. Updating the uniform buffer for each instance would require multiple buffer copies each frame. On top of that, our method to update the uniform buffer currently requires us to create a new buffer to store the updated data. That’s a lot of time wasted between draw calls.

If we look at the parameters for the draw_indexed function in the wgpu docs, we can see a solution to our problem.

pub fn draw_indexed(
    &mut self,
    indices: Range<u32>,
    base_vertex: i32,
    instances: Range<u32> // <-- This right here
)

The instances parameter takes a Range. This parameter tells the GPU how many copies, or instances, of our model we want to draw. Currently we are specifying 0..1, which instructs the GPU to draw our model once, and then stop. If we used 0..5, our code would draw 5 instances.

The fact that instances is a Range may seem odd, since using 1..2 for instances would still draw just one instance of our object. It seems like it would be simpler to use a u32, right? It’s a range because sometimes we don’t want to draw all of our objects. Sometimes we want to draw only a selection of them, because the others are not in frame, or because we are debugging and want to look at a particular set of instances.
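
To make the range semantics concrete, here is a minimal sketch in plain Rust with no wgpu involved. Both helper names (instance_count and clamp_instance_range) are made up for this illustration and are not part of wgpu. The first shows how many instances a given range draws, and the second shows one way to pick a sub-range without running past the end of our instance list.

```rust
use std::ops::Range;

/// Number of instances a draw call would produce for a given instance range.
/// (Hypothetical helper, for illustration only.)
fn instance_count(instances: &Range<u32>) -> u32 {
    instances.end.saturating_sub(instances.start)
}

/// Hypothetical helper: build a range of `count` instances beginning at
/// `start`, clamped so it never goes past `total` instances.
fn clamp_instance_range(total: u32, start: u32, count: u32) -> Range<u32> {
    let start = start.min(total);
    let end = start.saturating_add(count).min(total);
    start..end
}
```

With these, 0..5 counts as five instances and 1..2 as one, and asking for ten instances starting at index 95 out of 100 yields 95..100.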

Now that we know how to draw multiple instances of an object, how do we tell wgpu which particular instance to draw? We are going to use something known as an instance buffer.

The Instance Buffer

We’ll create an instance buffer in a similar way to how we create a uniform buffer. First we’ll create a struct called Instance.

// main.rs
// ...

// NEW!
struct Instance {
    position: cgmath::Vector3<f32>,
    rotation: cgmath::Quaternion<f32>,
}

A Quaternion is a mathematical structure often used to represent rotation. The math behind them is beyond me (it involves imaginary numbers and 4D space) so I won’t be covering them here. If you really want to dive into them, here’s a Wolfram Alpha article.

Using these values directly in the shader would be a pain as quaternions don’t have a WGSL analog. I don’t feel like writing the math in the shader, so we’ll convert the Instance data into a matrix and store it into a struct called InstanceRaw.

// NEW!
#[repr(C)]
#[derive(Copy, Clone, bytemuck::Pod, bytemuck::Zeroable)]
struct InstanceRaw {
    model: [[f32; 4]; 4],
}

This is the data that will go into the wgpu::Buffer. We keep these separate so that we can update the Instance as much as we want without needing to mess with matrices. We only need to update the raw data before we draw.

Let’s create a method on Instance to convert to InstanceRaw.

// NEW!
impl Instance {
    fn to_raw(&self) -> InstanceRaw {
        InstanceRaw {
            model: (cgmath::Matrix4::from_translation(self.position)
                * cgmath::Matrix4::from(self.rotation))
            .into(),
        }
    }
}
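
If you are curious what that translation * rotation product actually contains, here is a rough sketch that builds the same kind of column-major matrix by hand, without cgmath. It assumes the quaternion is already normalized and is passed as (w, x, y, z). The function name model_matrix and the Mat4 alias are just for this illustration, and this is not how cgmath implements it internally.

```rust
/// Column-major 4x4 matrix, the same shape as InstanceRaw::model.
type Mat4 = [[f32; 4]; 4];

/// Builds translation * rotation from a position and a unit quaternion
/// given as [w, x, y, z]. Assumes the quaternion is normalized.
fn model_matrix(position: [f32; 3], q: [f32; 4]) -> Mat4 {
    let [w, x, y, z] = q;
    [
        // First three columns: where the rotation sends the x, y, and z axes.
        [1.0 - 2.0 * (y * y + z * z), 2.0 * (x * y + w * z), 2.0 * (x * z - w * y), 0.0],
        [2.0 * (x * y - w * z), 1.0 - 2.0 * (x * x + z * z), 2.0 * (y * z + w * x), 0.0],
        [2.0 * (x * z + w * y), 2.0 * (y * z - w * x), 1.0 - 2.0 * (x * x + y * y), 0.0],
        // Last column: the translation, applied after the rotation.
        [position[0], position[1], position[2], 1.0],
    ]
}
```

With the identity quaternion (1, 0, 0, 0) the first three columns are the identity and the last column is just the position, which is a handy sanity check.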

Now we need to add 2 fields to State: instances, and instance_buffer.

struct State {
    instances: Vec<Instance>,
    instance_buffer: wgpu::Buffer,
}

We’ll create the instances in new(). We’ll use some constants to simplify things. We’ll display our instances in 10 rows of 10, and they’ll be spaced evenly apart.

const NUM_INSTANCES_PER_ROW: u32 = 10;
const NUM_INSTANCES: u32 = NUM_INSTANCES_PER_ROW * NUM_INSTANCES_PER_ROW;
const INSTANCE_DISPLACEMENT: cgmath::Vector3<f32> = cgmath::Vector3::new(
    NUM_INSTANCES_PER_ROW as f32 * 0.5,
    0.0,
    NUM_INSTANCES_PER_ROW as f32 * 0.5,
);

Now we can create the actual instances.

impl State {
    async fn new(window: &Window) -> Self {
        // ...
        // is_zero(), normalize(), and from_axis_angle() are trait methods;
        // `use cgmath::prelude::*;` brings the traits into scope.
        let instances = (0..NUM_INSTANCES_PER_ROW).flat_map(|z| {
            (0..NUM_INSTANCES_PER_ROW).map(move |x| {
                let position = cgmath::Vector3 { x: x as f32, y: 0.0, z: z as f32 }
                    - INSTANCE_DISPLACEMENT;

                let rotation = if position.is_zero() {
                    // this is needed so an object at (0, 0, 0) won't get scaled to zero
                    // as Quaternions can affect scale if they're not created correctly
                    cgmath::Quaternion::from_axis_angle(cgmath::Vector3::unit_z(), cgmath::Deg(0.0))
                } else {
                    cgmath::Quaternion::from_axis_angle(position.normalize(), cgmath::Deg(45.0))
                };

                Instance {
                    position, rotation,
                }
            })
        }).collect::<Vec<_>>();
        // ...
    }
}
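
To see where the instances actually end up, here is a small cgmath-free sketch of just the position part of that loop. The grid_positions function and the DISPLACEMENT constant are made up for this example. Positions are plain [f32; 3] arrays and rotation is left out entirely.

```rust
const NUM_INSTANCES_PER_ROW: u32 = 10;
// Same value as the x and z components of INSTANCE_DISPLACEMENT.
const DISPLACEMENT: f32 = NUM_INSTANCES_PER_ROW as f32 * 0.5;

/// Returns the grid positions in the same order as the nested
/// flat_map/map above: z is the outer loop, x is the inner loop.
fn grid_positions() -> Vec<[f32; 3]> {
    (0..NUM_INSTANCES_PER_ROW)
        .flat_map(|z| {
            (0..NUM_INSTANCES_PER_ROW)
                .map(move |x| [x as f32 - DISPLACEMENT, 0.0, z as f32 - DISPLACEMENT])
        })
        .collect()
}
```

The grid runs from (-5, 0, -5) to (4, 0, 4), and the instance at x = 5, z = 5 lands exactly on (0, 0, 0). That is the one case the is_zero() branch handles, since a zero-length vector can't be normalized into a rotation axis.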

Now that we have our data, we can create the actual instance_buffer.

let instance_data = instances.iter().map(Instance::to_raw).collect::<Vec<_>>();
let instance_buffer = device.create_buffer_init(
    &wgpu::util::BufferInitDescriptor {
        label: Some("Instance Buffer"),
        contents: bytemuck::cast_slice(&instance_data),
        usage: wgpu::BufferUsage::VERTEX,
    }
);

We’re going to need to create a new VertexBufferLayout for InstanceRaw.

impl InstanceRaw {
    fn desc<'a>() -> wgpu::VertexBufferLayout<'a> {
        use std::mem;
        wgpu::VertexBufferLayout {
            array_stride: mem::size_of::<InstanceRaw>() as wgpu::BufferAddress,
            // We need to switch from using a step mode of Vertex to Instance
            // This means that our shaders will only change to use the next
            // instance when the shader starts processing a new instance
            step_mode: wgpu::InputStepMode::Instance,
            attributes: &[
                wgpu::VertexAttribute {
                    offset: 0,
                    // While our vertex shader only uses locations 0 and 1 now, in later tutorials
                    // we'll be using 2, 3, and 4 for Vertex. We'll start at slot 5 so that we
                    // don't conflict with them later
                    shader_location: 5,
                    format: wgpu::VertexFormat::Float32x4,
                },
                // A mat4 takes up 4 vertex slots as it is technically 4 vec4s. We need to define
                // a slot for each vec4. We'll have to reassemble the mat4 in the shader.
                wgpu::VertexAttribute {
                    offset: mem::size_of::<[f32; 4]>() as wgpu::BufferAddress,
                    shader_location: 6,
                    format: wgpu::VertexFormat::Float32x4,
                },
                wgpu::VertexAttribute {
                    offset: mem::size_of::<[f32; 8]>() as wgpu::BufferAddress,
                    shader_location: 7,
                    format: wgpu::VertexFormat::Float32x4,
                },
                wgpu::VertexAttribute {
                    offset: mem::size_of::<[f32; 12]>() as wgpu::BufferAddress,
                    shader_location: 8,
                    format: wgpu::VertexFormat::Float32x4,
                },
            ],
        }
    }
}
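
The offsets and the stride in that layout follow a simple pattern, which the sketch below spells out in plain Rust. AttributeSketch is a hypothetical stand-in for wgpu::VertexAttribute (it leaves out the format field), and mat4_attributes and instance_buffer_size are illustration-only helpers, not wgpu functions.

```rust
use std::mem::size_of;

/// Same memory layout as the tutorial's InstanceRaw (without the bytemuck derives).
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct InstanceRaw {
    model: [[f32; 4]; 4],
}

/// Hypothetical stand-in for wgpu::VertexAttribute, minus the format field.
#[derive(Debug, PartialEq)]
struct AttributeSketch {
    offset: u64,
    shader_location: u32,
}

/// One attribute per matrix column: each column is a vec4<f32> (16 bytes),
/// so column n starts 16 * n bytes into the instance.
fn mat4_attributes(first_location: u32) -> Vec<AttributeSketch> {
    (0..4u32)
        .map(|column| AttributeSketch {
            offset: column as u64 * size_of::<[f32; 4]>() as u64,
            shader_location: first_location + column,
        })
        .collect()
}

/// Bytes needed in the instance buffer for a given number of instances.
fn instance_buffer_size(instance_count: usize) -> usize {
    instance_count * size_of::<InstanceRaw>()
}
```

Starting at location 5, this gives offsets 0, 16, 32, and 48 at locations 5 through 8, with a stride of 64 bytes per instance. Our 100 instances therefore need a 6400 byte buffer.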

We need to add this descriptor to the render pipeline so that we can use it when we render.

let render_pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
    // ...
    vertex: wgpu::VertexState {
        // ...
        // UPDATED!
        buffers: &[Vertex::desc(), InstanceRaw::desc()],
    },
    // ...
});

Don’t forget to return our new variables!

Self {
    // ...
    // NEW!
    instances,
    instance_buffer,
}

The last change we need to make is in the render() method. We need to bind our instance_buffer and we need to change the range we’re using in draw_indexed() to include the number of instances.

render_pass.set_pipeline(&self.render_pipeline);
render_pass.set_bind_group(0, &self.diffuse_bind_group, &[]);
render_pass.set_bind_group(1, &self.camera_bind_group, &[]);
render_pass.set_vertex_buffer(0, self.vertex_buffer.slice(..));
// NEW!
render_pass.set_vertex_buffer(1, self.instance_buffer.slice(..));
render_pass.set_index_buffer(self.index_buffer.slice(..), wgpu::IndexFormat::Uint16);

// UPDATED!
render_pass.draw_indexed(0..self.num_indices, 0, 0..self.instances.len() as _);

If you add new instances to the Vec, make sure that you recreate the instance_buffer as well as the camera_bind_group, otherwise your new instances won’t show up correctly.

We need to reference the parts of our new matrix in shader.wgsl so that we can use it for our instances. Add the following to the top of shader.wgsl.

struct InstanceInput {
    [[location(5)]] model_matrix_0: vec4<f32>;
    [[location(6)]] model_matrix_1: vec4<f32>;
    [[location(7)]] model_matrix_2: vec4<f32>;
    [[location(8)]] model_matrix_3: vec4<f32>;
};

We need to reassemble the matrix before we can use it.

[[stage(vertex)]]
fn main(
    model: VertexInput,
    instance: InstanceInput,
) -> VertexOutput {
    let model_matrix = mat4x4<f32>(
        instance.model_matrix_0,
        instance.model_matrix_1,
        instance.model_matrix_2,
        instance.model_matrix_3,
    );
    // Continued...
}

We’ll apply the model_matrix before we apply camera_uniform.view_proj. We do this because the camera_uniform.view_proj changes the coordinate system from world space to camera space. Our model_matrix is a world space transformation, so we don’t want to be in camera space when using it.
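
If the ordering feels abstract, here is a small CPU-side sketch in plain Rust showing that swapping the two matrices gives a different result. Everything in it is illustrative: the helpers are hand-rolled, and a uniform scale stands in for the camera matrix. It is not a real view-projection matrix. It only needs to be some transform that does not commute with the model's translation.

```rust
/// Column-major 4x4 matrix: m[column][row], like WGSL's mat4x4<f32>.
type Mat4 = [[f32; 4]; 4];

/// Matrix * column vector.
fn mul_mat_vec(m: &Mat4, v: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for col in 0..4 {
        for row in 0..4 {
            out[row] += m[col][row] * v[col];
        }
    }
    out
}

/// Matrix * matrix: column j of (a * b) is a applied to column j of b.
fn mul_mat_mat(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[0.0; 4]; 4];
    for j in 0..4 {
        out[j] = mul_mat_vec(a, b[j]);
    }
    out
}

/// Translation matrix, playing the role of model_matrix.
fn translation(x: f32, y: f32, z: f32) -> Mat4 {
    [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [x, y, z, 1.0],
    ]
}

/// Uniform scale, used here only as a stand-in for the camera matrix.
fn uniform_scale(s: f32) -> Mat4 {
    [
        [s, 0.0, 0.0, 0.0],
        [0.0, s, 0.0, 0.0],
        [0.0, 0.0, s, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
}
```

Take a vertex at the origin, a model matrix that translates by (1, 0, 0), and a stand-in camera matrix that scales by 2. Computing camera * model * vertex moves the vertex first and then scales it, giving (2, 0, 0). Computing model * camera * vertex scales first and then moves, giving (1, 0, 0). The matrix closest to the vector is applied first, which is why model_matrix sits right next to the vertex position in the shader.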

[[stage(vertex)]]
fn main(
    model: VertexInput,
    instance: InstanceInput,
) -> VertexOutput {
    // ...
    var out: VertexOutput;
    out.tex_coords = model.tex_coords;
    out.clip_position = camera.view_proj * model_matrix * vec4<f32>(model.position, 1.0);
    return out;
}

With all that done, we should have a forest of trees!

[Image: trees]

Challenge

Modify the position and/or rotation of the instances every frame.

Check out the code!