1. Pain points

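Yesterday, in my blog notes on compute shaders (CS), I looked into Graphics.DrawMeshInstanced and noted that a single draw supports at most 1023 instances; any more and you have to add draw calls. On top of that, if the data is dynamic, the CPU has to recompute the instance transforms every frame and send them to the GPU again. So there are two pain points: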

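More than 1023 transforms means splitting the array and adding draw calls: 1023*1024 objects cost 1024 draw calls.
Passing data between the CPU and the GPU every frame wastes time.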
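
To put a number on the first pain point, here is a minimal sketch of the draw-call count for the batched approach. The class and method names (InstancingMath, DrawCallsNeeded) are made up for illustration and are not part of Unity; only the 1023 limit comes from the text above.

```csharp
using System;

// Hypothetical helper, not a Unity API: counts the batches needed when
// one instanced draw can carry at most 1023 instances.
public static class InstancingMath
{
    public const int MaxInstancesPerDraw = 1023;

    public static int DrawCallsNeeded(int instanceCount)
    {
        if (instanceCount < 0)
            throw new ArgumentOutOfRangeException(nameof(instanceCount));

        // Ceiling division: a partly filled last batch still costs a draw call.
        return (instanceCount + MaxInstancesPerDraw - 1) / MaxInstancesPerDraw;
    }
}
```

DrawCallsNeeded(1023 * 1024) gives 1024, matching the figure above, and the 1024 * 1024 objects used in the demo later in this post would need 1026.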

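There is a newer method that solves both pain points. To be honest, what Baidu turns up on this topic really is not good enough; you have to get past the wall and search elsewhere.
API: Graphics.DrawMeshInstancedIndirect
How it works: per-mesh data such as the position is packed into an array and uploaded to GPU memory, and the shader uses instanceID as the array index to fetch the data for a given mesh directly (as far as I understand it).
In other words, you add a StructuredBuffer<> to the shader and index it by instanceID to tell the individual mesh draws apart; it is something of a stitched-together hybrid of CS and shader.
That takes a lot of work off our hands.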

Shader "Custom/UnlitInstanceIndirectShader"
{
    SubShader
    {
        Tags { "RenderType"="Opaque" }
        LOD 100

        Pass
        {
            CGPROGRAM

            #pragma vertex vert
            #pragma fragment frag
            // StructuredBuffer needs shader model 4.5 (compute-capable hardware).
            #pragma target 4.5
            #pragma multi_compile_instancing
            
            #include "UnityCG.cginc"

            struct MeshProperties 
            {
                float4x4 mat; // world matrix
                float4 color;
            };
            StructuredBuffer<MeshProperties> _Properties;

            struct appdata
            {
                float4 vertex : POSITION;
            };

            struct v2f
            {
                float4 vertex : SV_POSITION;   
                float4 color : COLOR;
            };

            v2f vert(appdata v, uint instanceID : SV_InstanceID)
            {
                v2f o;
                o.vertex = mul(_Properties[instanceID].mat, v.vertex);
                o.vertex = mul(UNITY_MATRIX_VP, o.vertex);
                o.color = _Properties[instanceID].color;
                return o;
            }

            fixed4 frag(v2f o) : SV_Target
            {
                return o.color;
            }
            ENDCG
        }
    }
}

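This way the CPU only has to upload the data to the GPU once, at initialization, and then hands all the work over to the GPU: the GPU holds the data, the CS updates it, and the shader renders it. Perfect.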

2. How to use it

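CS, shader and C# all have a part to play. Let's try writing a demo in which a single position pushes a million objects aside.
The shader is already posted above; the C# comes next.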

2.1 CPU side

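Parameters of Graphics.DrawMeshInstancedIndirect:

Parameter	Description
mesh	The mesh to draw
submeshIndex	Which submesh of the mesh to draw
material	The material to use
bounds	Bounding volume. If it lies outside the view frustum, Unity skips the whole draw call (it does not cull individual instances); culling inside the batch is still up to you
bufferWithArgs	A ComputeBuffer holding five uint arguments for the draw
argsOffset	Not needed for now (the same goes for everything below)
properties	MaterialPropertyBlock; of no use here, since it can hold at most 1023 entries
castShadows	Not used in this demo
receiveShadows	Not used in this demo
layer	Not used in this demo
camera	Not used in this demo
lightProbeUsage	Not used in this demo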

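At first I could not figure out what bufferWithArgs was for. It turns out to be the argument buffer of the indirect draw: five uints giving the index count per instance, the instance count, the start index, the base vertex and the start instance. For this demo you can simply copy how the official example fills it in; the names are self-explanatory, and the only value you really choose yourself is the instance count.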
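
The five slots can be made explicit with named indices. This is only a sketch: the IndirectArgs class and its Build method are hypothetical names introduced here, not part of Unity, and the code deliberately avoids any Unity types so it can be read on its own.

```csharp
// Hypothetical helper, not a Unity API: names the five uint slots that
// DrawMeshInstancedIndirect reads from bufferWithArgs.
public static class IndirectArgs
{
    public const int IndexCountPerInstance = 0; // indices in the (sub)mesh
    public const int InstanceCount = 1;         // how many copies to draw
    public const int StartIndexLocation = 2;    // first index of the submesh
    public const int BaseVertexLocation = 3;    // vertex offset of the submesh
    public const int StartInstanceLocation = 4; // left at 0 in this demo
    public const int Length = 5;

    public static uint[] Build(uint indexCount, uint instanceCount,
                               uint startIndex, uint baseVertex)
    {
        var args = new uint[Length];
        args[IndexCountPerInstance] = indexCount;
        args[InstanceCount] = instanceCount;
        args[StartIndexLocation] = startIndex;
        args[BaseVertexLocation] = baseVertex;
        return args; // slot 4 keeps its default value of 0
    }
}
```

In a Unity script the four inputs would come from mesh.GetIndexCount(0), the population size, mesh.GetIndexStart(0) and mesh.GetBaseVertex(0), and the resulting array goes into a ComputeBuffer created with ComputeBufferType.IndirectArguments.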

using UnityEngine;

public class TestComputeShader : MonoBehaviour
{
    const int Range = 1024;
    const int Population = Range * Range;
    const int ThreadsPerGroup = 64;  // must match [numthreads(64,1,1)] in the compute shader

    public Mesh mesh;     // drag in one of Unity's built-in meshes
    public Material mat;  // drag in

    private ComputeBuffer meshPropertiesBuffer;
    private ComputeBuffer argsBuffer;

    public Transform pusher;  // drag in
    public ComputeShader CS;  // drag in

    private int kernelHandler;

    // Must match the MeshProperties struct in the shader and the compute shader.
    private struct MeshProperties
    {
        public Matrix4x4 mat;
        public Vector4 color;

        // Stride in bytes: 16 floats for the matrix + 4 floats for the color.
        public static int Size()
        {
            return sizeof(float) * 4 * 4 + sizeof(float) * 4;
        }
    }

    private void Start()
    {
        Init();
    }

    private void Update()
    {
        UpdateWorldMatAndDraw();
    }

    private void OnDisable()
    {
        // ComputeBuffers are not garbage collected; release them explicitly.
        meshPropertiesBuffer?.Release();
        meshPropertiesBuffer = null;

        argsBuffer?.Release();
        argsBuffer = null;
    }

    private void Init()
    {
        // Indirect draw arguments: index count, instance count,
        // start index, base vertex, start instance.
        uint[] args = new uint[5] { 0, 0, 0, 0, 0 };
        args[0] = (uint)mesh.GetIndexCount(0);
        args[1] = (uint)Population;
        args[2] = (uint)mesh.GetIndexStart(0);
        args[3] = (uint)mesh.GetBaseVertex(0);
        argsBuffer = new ComputeBuffer(1, args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
        argsBuffer.SetData(args);

        // Random transform and a grayscale color for every instance.
        MeshProperties[] properties = new MeshProperties[Population];
        for (int i = 0; i < Population; ++i)
        {
            properties[i].mat = Matrix4x4.TRS(
                new Vector3(Random.Range(-Range, Range), Random.Range(-Range, Range), Random.Range(-Range, Range)),
                Quaternion.Euler(Random.Range(-180, 180), Random.Range(-180, 180), Random.Range(-180, 180)),
                Vector3.one);
            float color = (float)i / (float)Population;
            properties[i].color = new Vector4(color, color, color, 1);
        }

        // Upload once; the shader and the compute shader share this buffer.
        meshPropertiesBuffer = new ComputeBuffer(Population, MeshProperties.Size());
        meshPropertiesBuffer.SetData(properties);
        mat.SetBuffer("_Properties", meshPropertiesBuffer);

        kernelHandler = CS.FindKernel("CSMain");
        CS.SetBuffer(kernelHandler, "_Properties", meshPropertiesBuffer);
    }

    private void UpdateWorldMatAndDraw()
    {
        // Let the compute shader push the instances away from the pusher.
        CS.SetVector("_ColliderPosition", pusher.position);
        CS.Dispatch(kernelHandler, Population / ThreadsPerGroup, 1, 1);

        // One indirect draw call for the whole population.
        const float BoundSize = 10000.0f;
        Graphics.DrawMeshInstancedIndirect(mesh, 0, mat,
            new Bounds(Vector3.zero, new Vector3(BoundSize, BoundSize, BoundSize)),
            argsBuffer);
    }
}

Just grit your teeth and read the code through once; it is all API calls, with no real algorithm in it.

2.2 Updating positions with the CS

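Leaving the CS aside, the C# code above together with the shader only renders a million static objects. To make them move, the CS has to modify their positions.
Note that the CS and the shader use the very same ComputeBuffer, so the data is fully shared in GPU memory.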

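One distinction first: the computation in the CS is per mesh, because the data is at mesh granularity, whereas the computation in the shader is per vertex. Updating object positions from the shader is possible too (Graphics.SetRandomWriteTarget), but there is really no need for it in this demo.
Second, the fragment shader in this demo cannot tell instances apart by itself: only the vertex shader receives instanceID, so anything per-instance (such as the color here) has to be passed down through v2f.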

#pragma kernel CSMain

#define thread_group_x 16384
#define thread_group_y 1
#define thread_x 64
#define thread_y 1

struct MeshProperties
{
    float4x4 mat;
    float4 color;
};

float3 _ColliderPosition;
RWStructuredBuffer<MeshProperties> _Properties;

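[numthreads(64,1,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    // One thread per instance: id.x indexes the shared buffer.
    float4x4 mat = _Properties[id.x].mat;
    // The translation lives in the last column of the world matrix.
    float3 position = float3(mat[0][3], mat[1][3], mat[2][3]);
    
    float dist = distance(position, _ColliderPosition);
    
    // Push strength: 5 at the collider, falling to 0 at a distance of 5 or more.
    // clamp takes (value, min, max).
    dist = 5.0f - clamp(dist, 0.0f, 5.0f);
    
    // Push straight away from the collider.
    float3 push = normalize(position - _ColliderPosition) * dist;
    
    float4x4 translation = float4x4(
    1, 0, 0, push.x,
    0, 1, 0, push.y,
    0, 0, 1, push.z,
    0, 0, 0, 1);

    // Apply the push in world space and write the matrix back.
    _Properties[id.x].mat = mul(translation, mat);
}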

With the CS updating the positions and the shader rendering the objects, the CPU no longer has to worry about any of it.
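
To check what the kernel's push math actually does, here is a CPU-side sketch of the same offset calculation. It uses System.Numerics instead of Unity types, and the PushMath class, the PushRadius constant and the zero-distance guard are assumptions added for illustration; the kernel above has no such guard.

```csharp
using System;
using System.Numerics;

// Hypothetical CPU reference for the push computed in CSMain.
public static class PushMath
{
    public const float PushRadius = 5.0f; // the 5.0f used in the kernel

    public static Vector3 PushOffset(Vector3 position, Vector3 collider)
    {
        Vector3 away = position - collider;
        float dist = away.Length();

        // Assumption: skip the degenerate case where normalizing would fail.
        if (dist == 0.0f) return Vector3.Zero;

        // Same strength as the kernel: radius minus the clamped distance.
        float strength = PushRadius - Math.Clamp(dist, 0.0f, PushRadius);
        return (away / dist) * strength;
    }
}
```

An object 2 units from the collider gets an offset of length 3, so it ends up exactly on the radius-5 sphere around the collider, while anything 5 or more units away gets a zero offset. Because the kernel writes the new matrix back into the buffer, pushed objects stay where they were pushed.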

Reposted from: https://blog.csdn.net/MaxLykoS/article/details/117024411
