Performance Optimization of Continuous Memory Block Data Operations in CLR Source Code

July 20, 2021

This article introduces several binary-processing types in .NET, including Buffer, BinaryPrimitives (in the System.Buffers.Binary namespace), BitConverter, MemoryMarshal, and Marshal, together with basic usage of Span<T>. These types form the foundation on which higher-level code handles binary data. Mastering them makes it easy to convert between typed values and raw bytes, and can improve program performance.

C# Primitive Types

C# distinguishes between value types and reference types based on memory allocation.

When classifying by base types, C# contains built-in types, generic types, custom types, anonymous types, tuple types, and CTS types (Common Type System).

The basic types in C# include:

  1. Integer types: sbyte, byte, short, ushort, int, uint, long, ulong
  2. Floating-point types: float, double, decimal
  3. Character type: char
  4. Boolean type: bool
  5. String type: string

The primitive types in C# are the value types among these basic types, so string is not included. The size in bytes of a primitive type can be obtained with sizeof(), and every type except bool exposes two constants: MaxValue and MinValue.

sizeof(uint);
uint.MaxValue
uint.MinValue

We can also distinguish them in generics; except for string, the other types are struct.

// Method is a placeholder name; the constraint rejects string at compile time.
void Method<T>() where T : struct
{
}

For more information, please click here: https://www.programiz.com/csharp-programming/variables-primitive-data-types
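As a rough illustration of the points above, the sketch below prints the sizes and ranges of a few built-in types and shows the struct constraint at work. The class name PrimitiveDemo and the helper IsDefault are hypothetical names chosen for this example only.

```csharp
using System;

public static class PrimitiveDemo
{
    // Hypothetical helper: the struct constraint admits int, double, bool, etc.,
    // but rejects string at compile time because string is a reference type.
    public static bool IsDefault<T>(T value) where T : struct, IEquatable<T>
    {
        return value.Equals(default(T));
    }

    public static void Main()
    {
        // sizeof works on built-in value types without an unsafe context.
        Console.WriteLine(sizeof(byte));    // 1
        Console.WriteLine(sizeof(char));    // 2
        Console.WriteLine(sizeof(int));     // 4
        Console.WriteLine(sizeof(double));  // 8
        Console.WriteLine(sizeof(decimal)); // 16

        Console.WriteLine(uint.MinValue);   // 0
        Console.WriteLine(uint.MaxValue);   // 4294967295

        Console.WriteLine(IsDefault(0));    // True
        Console.WriteLine(IsDefault(3.5));  // False
        // IsDefault("text");               // does not compile: string is not a struct
    }
}
```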

1. Optimizing Array Performance with Buffer

Buffer can operate on arrays of primitive types (int, byte, etc.). By utilizing the Buffer class in .NET, we can enhance application performance by accessing data in memory more quickly. Buffer can retrieve a specified number of bytes directly from an array of primitive types or set the value of a specific byte.

Buffer is mainly used to manipulate memory directly and to work with unmanaged memory. Its array-based methods are safe and fast; only the MemoryCopy overloads take pointers and therefore require an unsafe context.

| Method | Description |
| --------------------------------------------- | ------------------------------------------------------------ |
| BlockCopy(Array, Int32, Array, Int32, Int32) | Copies a specified number of bytes from a source array starting at a specific offset to a destination array starting at a specific offset. |
| ByteLength(Array) | Returns the number of bytes in a specified array. |
| GetByte(Array, Int32) | Retrieves the byte at a specified position in a specified array. |
| MemoryCopy(Void*, Void*, Int64, Int64) | Copies a number of bytes, specified as a long integer value, from one memory address to another. This API is not CLS-compliant. |
| MemoryCopy(Void*, Void*, UInt64, UInt64) | Copies a number of bytes, specified as an unsigned long integer value, from one memory address to another. This API is not CLS-compliant. |
| SetByte(Array, Int32, Byte) | Assigns a specified value to a byte at a specific position in the specified array. |

CLS refers to the Common Language Specification. Please refer to https://www.cnblogs.com/whuanle/p/14141213.html#5,clscompliantattribute

Next, let's introduce some usage methods of Buffer.

BlockCopy can copy part of an array to another array, as shown below:

        int[] arr1 = new int[] { 1, 2, 3, 4, 5 };
        int[] arr2 = new int[10] { 0, 0, 0, 0, 0, 6, 7, 8, 9, 10 };

        // int = 4 byte
        // index:       0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 ... ...
        // arr1:        01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00 05 00 00 00
        // arr2:        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00 09 00 00 00 0A 00 00 00

        // Buffer.ByteLength(arr1) == 20 ,
        // Buffer.ByteLength(arr2) == 40


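        // The last argument is a byte count, not an element count.
        // Only 19 of arr1's 20 bytes are copied here; the skipped 20th byte is 00
        // in both arrays, so arr2 still ends up as 1,2,3,4,5,6,7,8,9,10.
        Buffer.BlockCopy(arr1, 0, arr2, 0, 19);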

        for (int i = 0; i < arr2.Length; i++)
        {
            Console.Write(arr2[i] + ",");
        }
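Because BlockCopy counts in bytes rather than elements, it is easy to pass the wrong numbers. The following is a minimal sketch of one way to wrap it; BlockCopyDemo and CopyInts are hypothetical names, not part of the Buffer API.

```csharp
using System;

public static class BlockCopyDemo
{
    // Copies `count` ints from src[srcIndex..] to dst[dstIndex..] by converting
    // element indexes to the byte offsets that Buffer.BlockCopy expects.
    public static void CopyInts(int[] src, int srcIndex, int[] dst, int dstIndex, int count)
    {
        Buffer.BlockCopy(src, srcIndex * sizeof(int), dst, dstIndex * sizeof(int), count * sizeof(int));
    }

    public static void Main()
    {
        int[] src = { 1, 2, 3, 4, 5 };
        int[] dst = new int[5];

        CopyInts(src, 1, dst, 0, 3);
        Console.WriteLine(string.Join(",", dst)); // 2,3,4,0,0

        // BlockCopy also works across primitive types: copy the ints out as raw bytes.
        byte[] raw = new byte[Buffer.ByteLength(src)];
        Buffer.BlockCopy(src, 0, raw, 0, raw.Length);
        Console.WriteLine(raw.Length); // 20
    }
}
```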

.SetByte() allows for fine-grained control over the values in the array, enabling you to set arbitrary values at specific positions in the array, as shown below:

        // source data:
        // 0000,0001,0002,0003,0004
        // 00 00 00 00 01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
        int[] a = new int[] { 0, 1, 2, 3, 4 };
        foreach (var item in a)
        {
            Console.Write(item + ",");
        }

        Console.WriteLine("\n------\n");

        // see : https://stackoverflow.com/questions/26455843/how-are-array-values-stored-in-little-endian-vs-big-endian-architecture
        // the bytes as stored in memory (little-endian):
        // 0000    1000    2000    3000    4000
        for (int i = 0; i < Buffer.ByteLength(a); i++)
        {
            Console.Write(Buffer.GetByte(a, i));
            if (i != 0 && (i + 1) % 4 == 0)
                Console.Write("    ");
        }

        // Hexadecimal
        // 0000    1000    2000    3000    4000

        Console.WriteLine("\n------\n");

        Buffer.SetByte(a, 0, 4);
        Buffer.SetByte(a, 4, 3);
        Buffer.SetByte(a, 8, 2);
        Buffer.SetByte(a, 12, 1);
        Buffer.SetByte(a, 16, 0);

        foreach (var item in a)
        {
            Console.Write(item + ",");
        }

        Console.WriteLine("\n------\n");

It is recommended to copy the code for personal testing, use breakpoints for debugging, and observe the process.
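To see why the byte offset matters, here is a small sketch that dumps an int array byte by byte and then changes a byte other than the lowest one. ByteAccessDemo and Dump are hypothetical names, and the outputs in the comments assume a little-endian machine.

```csharp
using System;

public static class ByteAccessDemo
{
    // Returns the raw bytes of an int array, in memory order.
    public static byte[] Dump(int[] values)
    {
        byte[] bytes = new byte[Buffer.ByteLength(values)];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Buffer.GetByte(values, i);
        }
        return bytes;
    }

    public static void Main()
    {
        int[] values = { 1, 2 };
        Console.WriteLine(BitConverter.ToString(Dump(values)));
        // On a little-endian machine: 01-00-00-00-02-00-00-00

        // Byte offset 1 is the second-lowest byte of values[0] on little-endian
        // hardware, so writing 1 there adds 256 to the element.
        Buffer.SetByte(values, 1, 1);
        Console.WriteLine(values[0]); // 257 on a little-endian machine
    }
}
```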

2. Fine-grained Byte Array Operations with BinaryPrimitives

System.Buffers.Binary.BinaryPrimitives reads and writes primitive values at precise positions in a sequence of bytes. It operates only on bytes, taking a Span<byte> or ReadOnlySpan<byte> (a byte[] converts implicitly), and it has a wide range of applications.

BinaryPrimitives relies on BitConverter only for BitConverter.IsLittleEndian: in the .NET Core source it reads the raw value with MemoryMarshal.Read<T>() and reverses the bytes when the requested byte order differs from the machine's. Its primary use is reading values from a byte sequence in a specific byte order.

For example, BinaryPrimitives can read four bytes from a byte array all at once, with the sample code as follows:

        // source data:  00 01 02 03 04
        // binary data:  00000000 00000001 00000010 00000011 00000100
        byte[] arr = new byte[] { 0, 1, 2, 3, 4 };

        // read one int (4 bytes)
        int head = BinaryPrimitives.ReadInt32BigEndian(arr);


        // 5 bytes:              00000000 00000001 00000010 00000011 00000100
        // read 4 bytes (int):   00000000 00000001 00000010 00000011
        //                       = 66051

        Console.WriteLine(head);

BinaryPrimitives distinguishes between big-endian and little-endian byte order. Most platforms that .NET runs on, including x86 and x64, are little-endian, but the byte order ultimately depends on the processor architecture.
You can use BitConverter.IsLittleEndian to check whether the current processor stores data in little-endian or big-endian order.
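The difference between the two byte orders is easiest to see by reading the same bytes both ways. This is only a sketch; EndianDemo and ReadBoth are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;

public static class EndianDemo
{
    // Reads the first four bytes of `data` in both byte orders.
    public static (int Big, int Little) ReadBoth(ReadOnlySpan<byte> data)
    {
        return (BinaryPrimitives.ReadInt32BigEndian(data),
                BinaryPrimitives.ReadInt32LittleEndian(data));
    }

    public static void Main()
    {
        byte[] data = { 0x00, 0x00, 0x01, 0x02 };

        var (big, little) = ReadBoth(data);
        Console.WriteLine(big);    // 258       (0x00000102)
        Console.WriteLine(little); // 33619968  (0x02010000)

        // BitConverter, by contrast, always uses the byte order of the current machine.
        Console.WriteLine(BitConverter.IsLittleEndian);
        Console.WriteLine(BitConverter.ToInt32(data, 0));
    }
}
```

Because the BinaryPrimitives methods name the byte order explicitly, their results are the same on every machine, which is what you want when parsing a file format or a network protocol.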

Methods beginning with .Read...() allow for byte-level access to data within a byte array.

Methods beginning with .Write...() enable data writing to a specific location.

Here’s another example:

        // source data:  00 01 02 03 04
        // binary data:  00000000 00000001 00000010 00000011 00000100
        byte[] arr = new byte[] { 0, 1, 2, 3, 4 };

        // read one int (4 bytes)
        // 5 bytes:              00000000 00000001 00000010 00000011 00000100
        // read 4 bytes (int):   00000000 00000001 00000010 00000011
        //                       = 66051

        int head = BinaryPrimitives.ReadInt32BigEndian(arr);
        Console.WriteLine(head);
        // 66051

        // BinaryPrimitives.WriteInt32LittleEndian(arr, 1);
        BinaryPrimitives.WriteInt32BigEndian(arr.AsSpan().Slice(0, 4), 0b00000000_00000000_00000000_00000001);
        // to : 00000000 00000000 00000000 00000001 |  00000100
        // read 4 bytes again

        head = BinaryPrimitives.ReadInt32BigEndian(arr);
        Console.WriteLine(head);
        // 1

It is recommended to copy the code for personal testing, use breakpoints for debugging, and observe the process.
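A typical use of the Read and Write methods is a binary message header. The sketch below assumes a made-up wire format (a 2-byte message type followed by a 4-byte payload length, both big-endian); FrameDemo, Encode, and TryDecode are hypothetical names and do not come from any real protocol.

```csharp
using System;
using System.Buffers.Binary;

public static class FrameDemo
{
    // Hypothetical wire format: [2-byte type][4-byte payload length][payload],
    // with both header fields in big-endian (network) byte order.
    public const int HeaderSize = 6;

    public static byte[] Encode(ushort type, ReadOnlySpan<byte> payload)
    {
        byte[] frame = new byte[HeaderSize + payload.Length];
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(0, 2), type);
        BinaryPrimitives.WriteInt32BigEndian(frame.AsSpan(2, 4), payload.Length);
        payload.CopyTo(frame.AsSpan(HeaderSize));
        return frame;
    }

    public static bool TryDecode(ReadOnlySpan<byte> frame, out ushort type, out byte[] payload)
    {
        type = 0;
        payload = Array.Empty<byte>();
        if (frame.Length < HeaderSize) return false;

        type = BinaryPrimitives.ReadUInt16BigEndian(frame);
        int length = BinaryPrimitives.ReadInt32BigEndian(frame.Slice(2));
        // Never trust a length field: reject values that point past the buffer.
        if (length < 0 || length > frame.Length - HeaderSize) return false;

        payload = frame.Slice(HeaderSize, length).ToArray();
        return true;
    }

    public static void Main()
    {
        byte[] frame = Encode(7, new byte[] { 10, 20, 30 });
        Console.WriteLine(BitConverter.ToString(frame)); // 00-07-00-00-00-03-0A-14-1E

        if (TryDecode(frame, out ushort type, out byte[] payload))
        {
            Console.WriteLine(type);           // 7
            Console.WriteLine(payload.Length); // 3
        }
    }
}
```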

Improving Code Safety

C# and .NET Core provide many performance-oriented APIs. One major advantage of C# and .NET is the ability to write fast, high-performance libraries without sacrificing memory safety. By avoiding unsafe code and utilizing binary processing classes, we can create high-performance and safe code.

In C#, we have the following types that can efficiently manipulate bytes/memory:

  • Span<T> and ReadOnlySpan<T> provide fast and safe access to memory, representing a contiguous region of arbitrary memory. Using Span<T> allows us to serialize to managed .NET arrays, stack-allocated arrays, or unmanaged memory without using pointers, and .NET bounds-checks every access to prevent buffer overflows.
  • ref struct: Span<T> is declared as a ref struct, so instances can live only on the stack and never on the managed heap.
  • stackalloc is used to create stack-based arrays. stackalloc is helpful to avoid allocations when smaller buffers are needed.
  • Low-level methods for direct conversion between primitive types and bytes, such as MemoryMarshal.GetReference(), Unsafe.ReadUnaligned(), and Unsafe.WriteUnaligned().
  • BinaryPrimitives provides optimized helper methods for efficient conversion between .NET primitive types and bytes, for example reading little-endian bytes and returning an unsigned 64-bit number with BinaryPrimitives.ReadUInt64LittleEndian.
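The items in the list above can be combined in a few lines. The following sketch shows one method serving heap arrays, slices, and stack memory alike; SpanDemo, Sum, and ToBigEndianHex are hypothetical names used only for illustration.

```csharp
using System;
using System.Buffers.Binary;

public static class SpanDemo
{
    // Sums a region of memory; the caller decides where that memory lives.
    public static int Sum(ReadOnlySpan<byte> data)
    {
        int total = 0;
        foreach (byte b in data) total += b;
        return total;
    }

    // Uses a stack buffer as scratch space for the big-endian bytes
    // (the ToArray call is only there so the bytes can be printed).
    public static string ToBigEndianHex(uint value)
    {
        Span<byte> buffer = stackalloc byte[4];
        BinaryPrimitives.WriteUInt32BigEndian(buffer, value);
        return BitConverter.ToString(buffer.ToArray());
    }

    public static void Main()
    {
        byte[] array = { 1, 2, 3, 4, 5 };
        Console.WriteLine(Sum(array));              // 15
        Console.WriteLine(Sum(array.AsSpan(1, 3))); // 9 (2 + 3 + 4), no copy made

        Span<byte> onStack = stackalloc byte[] { 10, 20, 30 };
        Console.WriteLine(Sum(onStack));            // 60

        Console.WriteLine(ToBigEndianHex(0x01020304)); // 01-02-03-04

        try
        {
            _ = array.AsSpan(3, 10); // out of range: throws instead of overrunning
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("bounds checked");
        }
    }
}
```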

Methods beginning with .Reverse...() can swap the endianness of primitive types.

        short value = 0b00000000_00000001;

        // The int overload reverses all four bytes: 0x00000001 becomes 0x01000000.
        Console.WriteLine(BinaryPrimitives.ReverseEndianness(0b00000000_00000000_00000000_00000001));
        // 16777216

        // The short overload reverses two bytes: 0b00000001_00000000 == 256.
        Console.WriteLine(BinaryPrimitives.ReverseEndianness(value));
        // 256

        value = 0b00000001_00000000;
        Console.WriteLine(BinaryPrimitives.ReverseEndianness(value));
        // 1

3. BitConverter and MemoryMarshal

BitConverter allows conversion between primitive types and bytes, such as converting between int and byte or retrieving and writing any byte of a primitive type.
Here's an example:

        // 0b...1_00000100
        int value = 260;
		
        // byte max value:255
        // a = 0b00000100; loses bits before 00000100 of int.
        byte a = (byte)value;

        // a = 4
        Console.WriteLine(a);

        // LittleEndian
        // 0b 00000100 00000001 00000000 00000000
        byte[] b = BitConverter.GetBytes(260);
        Console.WriteLine(Buffer.GetByte(b, 0)); // 4 (the byte at index 1 is 1)

        if (BitConverter.IsLittleEndian)
            Console.WriteLine(BinaryPrimitives.ReadInt32LittleEndian(b));
        else
            Console.WriteLine(BinaryPrimitives.ReadInt32BigEndian(b));
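For reference, here is a small sketch of a few more BitConverter calls: a round trip through bytes, a hex dump, and a bit-for-bit reinterpretation of a double. BitConverterDemo and RoundTrip are hypothetical names.

```csharp
using System;

public static class BitConverterDemo
{
    // Converts an int to bytes and back using the machine's byte order.
    public static int RoundTrip(int value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        return BitConverter.ToInt32(bytes, 0);
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip(260)); // 260

        // ToString gives a hex dump in memory order.
        Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(260)));
        // 04-01-00-00 on a little-endian machine

        // Reinterpret the bits of a double as a long and back, without changing them.
        long bits = BitConverter.DoubleToInt64Bits(1.0);
        Console.WriteLine(bits.ToString("X16"));                 // 3FF0000000000000
        Console.WriteLine(BitConverter.Int64BitsToDouble(bits)); // 1
    }
}
```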

MemoryMarshal provides methods for interacting with Memory<T>, ReadOnlyMemory<T>, Span<T>, and ReadOnlySpan<T>.

MemoryMarshal is in the System.Runtime.InteropServices namespace.

First, let's look at MemoryMarshal.Cast(), which reinterprets a span of one primitive type as a span of another.

        // 1 int  = 4 byte
        // int [] {1,2}
        // 0001     0002
        var byteArray = new byte[] { 1, 0, 0, 0, 2, 0, 0, 0 };
        Span<byte> byteSpan = byteArray.AsSpan();
        // byte to int
        Span<int> intSpan = MemoryMarshal.Cast<byte, int>(byteSpan);
        foreach (var item in intSpan)
        {
            Console.Write(item + ",");
        }

In simple terms, MemoryMarshal can convert one structure to another.
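It is worth stressing that these conversions reinterpret the same memory rather than copy it. A minimal sketch, with CastDemo and CountInts as hypothetical names:

```csharp
using System;
using System.Runtime.InteropServices;

public static class CastDemo
{
    // Number of whole ints that fit in the byte array; leftover bytes are dropped.
    public static int CountInts(byte[] data)
    {
        return MemoryMarshal.Cast<byte, int>(data.AsSpan()).Length;
    }

    public static void Main()
    {
        int[] numbers = { 1, 2 };

        // View the same memory as bytes; no data is copied.
        Span<byte> bytes = MemoryMarshal.AsBytes(numbers.AsSpan());
        Console.WriteLine(bytes.Length); // 8

        // Writes through the byte view are visible in the int array.
        bytes.Slice(4, 4).Fill(0xFF);
        Console.WriteLine(numbers[1]); // -1 (all 32 bits set)

        // Cast truncates when the byte count is not a multiple of the target size.
        Console.WriteLine(CountInts(new byte[] { 1, 0, 0, 0, 2 })); // 1
    }
}
```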

We can convert a structure into bytes:

public struct Test
{
    public int A;
    public int B;
    public int C;
}

... ...

        Test test = new Test()
        {
            A = 1,
            B = 2,
            C = 3
        };
        var testArray = new Test[] { test };
        ReadOnlySpan<byte> tmp = MemoryMarshal.AsBytes(testArray.AsSpan());

        // socket.Send(tmp); ...

The conversion can also be reversed:

        // tmp = bytes received from the socket ...
        ReadOnlySpan<Test> testSpan = MemoryMarshal.Cast<byte,Test>(tmp);

        // or read a single struct (a different variable name avoids a clash with testSpan)
        Test restored = MemoryMarshal.Read<Test>(tmp);
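
MemoryMarshal.Cast can also be used to compare two arrays 8 bytes at a time, by viewing both as spans of long:

        static void Main(string[] args)
        {
            int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
            int[] b = new int[] { 1, 2, 3, 4, 5, 6, 7, 0, 9 };

            Console.WriteLine(Compare64(a, b)); // False: the arrays differ at index 7
        }

        // Compares two arrays 8 bytes at a time by viewing them as spans of long.
        private static bool Compare64<T>(T[] t1, T[] t2)
            where T : struct
        {
            if (t1.Length != t2.Length) return false;

            var l1 = MemoryMarshal.Cast<T, long>(t1.AsSpan());
            var l2 = MemoryMarshal.Cast<T, long>(t2.AsSpan());

            for (int i = 0; i < l1.Length; i++)
            {
                if (l1[i] != l2[i]) return false;
            }

            // Cast drops trailing bytes that do not fill a whole long; compare them separately.
            var b1 = MemoryMarshal.AsBytes(t1.AsSpan());
            var b2 = MemoryMarshal.AsBytes(t2.AsSpan());
            for (int i = l1.Length * sizeof(long); i < b1.Length; i++)
            {
                if (b1[i] != b2[i]) return false;
            }
            return true;
        }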

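Most programmers have learned C and should be familiar with struct byte alignment in C. The same applies in C#. Besides converting one C# struct to another C# struct, you can also convert a C struct to a C# struct, but you must take byte alignment into account. If the two structs occupy different amounts of memory, data may be lost or errors may occur during the conversion.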
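The effect of alignment is easy to measure. The sketch below declares the same two fields with default packing and with Pack = 1; PaddedHeader, PackedHeader, and LayoutDemo are hypothetical names for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct PaddedHeader
{
    public byte Flag;  // 1 byte, followed by 3 bytes of padding
    public int Value;  // aligned to a 4-byte boundary
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedHeader
{
    public byte Flag;  // 1 byte, no padding
    public int Value;  // starts at offset 1
}

public static class LayoutDemo
{
    public static void Main()
    {
        Console.WriteLine(Marshal.SizeOf<PaddedHeader>()); // 8
        Console.WriteLine(Marshal.SizeOf<PackedHeader>()); // 5

        Console.WriteLine(Marshal.OffsetOf<PaddedHeader>("Value")); // 4
        Console.WriteLine(Marshal.OffsetOf<PackedHeader>("Value")); // 1
    }
}
```

If the bytes were produced by a C program that used `#pragma pack(1)`, reading them into PaddedHeader would pick up Value from the wrong offset, which is exactly the kind of mismatch the paragraph above warns about.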

4. Marshal

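Marshal provides a collection of methods for allocating unmanaged memory, copying blocks of unmanaged memory, and converting managed types to unmanaged types, as well as other methods used when interacting with unmanaged code or when determining the size of an object.

For example, to determine the size of some types in C#: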

            Console.WriteLine("SystemDefaultCharSize={0}, SystemMaxDBCSCharSize={1}",
         Marshal.SystemDefaultCharSize, Marshal.SystemMaxDBCSCharSize);

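This prints the number of bytes a character occupies: the system's default character size and the maximum size of a double-byte character set (DBCS) character.

As another example, when calling unmanaged code you sometimes need to pass a function pointer. C# normally passes a delegate, but in many cases, to avoid memory problems and exceptions, the delegate needs to be converted to a pointer first.

IntPtr p = Marshal.GetFunctionPointerForDelegate(_overrideCompileMethod);

Marshal can also conveniently obtain the size of a struct in bytes: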

public struct Point
{
    public Int32 x, y;
}

Marshal.SizeOf(typeof(Point));

Marshal can allocate and free a block of unmanaged memory, which lets us avoid unsafe code. For example:

        IntPtr hglobal = Marshal.AllocHGlobal(100);
        Marshal.FreeHGlobal(hglobal);
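In real code the free call should sit in a finally block so the memory is released even if an exception is thrown. The sketch below also shows reading and writing the block without pointers; UnmanagedDemo and RoundTrip are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class UnmanagedDemo
{
    // Round-trips an int array through a block of unmanaged memory,
    // overwriting the first element on the way.
    public static int[] RoundTrip(int[] values)
    {
        int byteCount = values.Length * sizeof(int);
        IntPtr block = Marshal.AllocHGlobal(byteCount);
        try
        {
            Marshal.Copy(values, 0, block, values.Length); // managed -> unmanaged
            Marshal.WriteInt32(block, 0, 42);              // overwrite the first int

            int[] result = new int[values.Length];
            Marshal.Copy(block, result, 0, result.Length); // unmanaged -> managed
            return result;
        }
        finally
        {
            // Unmanaged memory is not garbage collected; always free it.
            Marshal.FreeHGlobal(block);
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", RoundTrip(new[] { 1, 2, 3 }))); // 42,2,3
    }
}
```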

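Practice

Making good use of the binary-processing classes described above can improve code performance in many ways. So far we have only had an overview of these types. What are they actually used for? Can they really improve performance? Is there any code to practice with?

Here is an example: how do we check whether two byte[] arrays are equal? The simplest code looks like this: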

        public bool ForBytes(byte[] a, byte[] b)
        {
            if (a.Length != b.Length)
                return false;
				
            for (int i = 0; i < a.Length; i++)
            {
                if (a[i] != b[i]) return false;
            }
            return true;
        }

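This code is very simple: it loops over the byte arrays and compares them one byte at a time.

With the binary-processing classes introduced earlier, the code can be written like this: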

        private static bool EqualsBytes(byte[] b1, byte[] b2)
        {
            var a = b1.AsSpan();
            var b = b2.AsSpan();
            Span<byte> copy1 = default;
            Span<byte> copy2 = default;

            if (a.Length != b.Length)
                return false;

            for (int i = 0; i < a.Length;)
            {
                if (a.Length - 8 > i)
                {
                    copy1 = a.Slice(i, 8);
                    copy2 = b.Slice(i, 8);
                    if (BinaryPrimitives.ReadUInt64BigEndian(copy1) != BinaryPrimitives.ReadUInt64BigEndian(copy2))
                        return false;
                    i += 8;
                    continue;
                }

                if (a[i] != b[i])
                    return false;
                i++;
            }
            return true;
        }

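You may be wondering: the second method has so much more code, so many checks, all those method calls, and even a few extra objects. Can it really be faster? Won't it use more memory? Don't worry; you can test it with the complete code below: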

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;
using System.Text;

namespace BenTest
{
    [SimpleJob(RuntimeMoniker.NetCoreApp31)]
    [SimpleJob(RuntimeMoniker.CoreRt31)]
    [RPlotExporter]
    public class Test
    {
        private byte[] _a = Encoding.UTF8.GetBytes("5456456456444444444444156456454564444444444444444444444444444444444444444777777777777777777777711111111111116666666666666");
        private byte[] _b = Encoding.UTF8.GetBytes("5456456456444444444444156456454564444444444444444444444444444444444444444777777777777777777777711111111111116666666666666");

        private int[] A1 = new int[] { 41544444, 4487, 841, 8787, 4415, 7, 458, 4897, 87897, 815, 485, 4848, 787, 41, 5489, 74878, 84, 89787, 8456, 4857489, 784, 85489, 47 };
        private int[] B2 = new int[] { 41544444, 4487, 841, 8787, 4415, 7, 458, 4897, 87897, 815, 485, 4848, 787, 41, 5489, 74878, 84, 89787, 8456, 4857489, 784, 85489, 47 };

        [Benchmark]
        public bool ForBytes()
        {
            for (int i = 0; i < _a.Length; i++)
            {
                if (_a[i] != _b[i]) return false;
            }
            return true;
        }

        [Benchmark]
        public bool ForArray()
        {
            return ForArray(A1, B2);
        }

        private bool ForArray<T>(T[] b1, T[] b2) where T : struct
        {
            for (int i = 0; i < b1.Length; i++)
            {
                if (!b1[i].Equals(b2[i])) return false;
            }
            return true;
        }

        [Benchmark]
        public bool EqualsArray()
        {
            return EqualArray(A1, B2);
        }

        [Benchmark]
        public bool EqualsBytes()
        {
            var a = _a.AsSpan();
            var b = _b.AsSpan();
            Span<byte> copy1 = default;
            Span<byte> copy2 = default;

            if (a.Length != b.Length)
                return false;

            for (int i = 0; i < a.Length;)
            {
                if (a.Length - 8 > i)
                {
                    copy1 = a.Slice(i, 8);
                    copy2 = b.Slice(i, 8);
                    if (BinaryPrimitives.ReadUInt64BigEndian(copy1) != BinaryPrimitives.ReadUInt64BigEndian(copy2))
                        return false;
                    i += 8;
                    continue;
                }

                if (a[i] != b[i])
                    return false;
                i++;
            }
            return true;
        }

        private bool EqualArray<T>(T[] t1, T[] t2) where T : struct
        {
            Span<byte> b1 = MemoryMarshal.AsBytes<T>(t1.AsSpan());
            Span<byte> b2 = MemoryMarshal.AsBytes<T>(t2.AsSpan());

            Span<byte> copy1 = default;
            Span<byte> copy2 = default;

            if (b1.Length != b2.Length)
                return false;

            for (int i = 0; i < b1.Length;)
            {
                if (b1.Length - 8 > i)
                {
                    copy1 = b1.Slice(i, 8);
                    copy2 = b2.Slice(i, 8);
                    if (BinaryPrimitives.ReadUInt64BigEndian(copy1) != BinaryPrimitives.ReadUInt64BigEndian(copy2))
                        return false;
                    i += 8;
                    continue;
                }

                if (b1[i] != b2[i])
                    return false;
                i++;
            }
            return true;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<Test>();
            Console.ReadKey();
        }
    }
}

The results measured with BenchmarkDotNet are as follows:

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19043.1052 (21H1/May2021Update)
Intel Core i7-10700 CPU 2.90GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.301
  [Host]        : .NET Core 3.1.16 (CoreCLR 4.700.21.26205, CoreFX 4.700.21.26205), X64 RyuJIT
  .NET Core 3.1 : .NET Core 3.1.16 (CoreCLR 4.700.21.26205, CoreFX 4.700.21.26205), X64 RyuJIT


|      Method |           Job |       Runtime |     Mean |    Error |   StdDev |
|------------ |-------------- |-------------- |---------:|---------:|---------:|
|    ForBytes | .NET Core 3.1 | .NET Core 3.1 | 76.95 ns | 0.064 ns | 0.053 ns |
|    ForArray | .NET Core 3.1 | .NET Core 3.1 | 66.37 ns | 1.258 ns | 1.177 ns |
| EqualsArray | .NET Core 3.1 | .NET Core 3.1 | 17.91 ns | 0.027 ns | 0.024 ns |
| EqualsBytes | .NET Core 3.1 | .NET Core 3.1 | 26.26 ns | 0.432 ns | 0.383 ns |

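As the results show, in the byte[] comparison the binary-processing approach cut the time by about 50 ns (from 76.95 ns to 26.26 ns), and in the struct (int[]) comparison the time dropped by about 48 ns (from 66.37 ns to 17.91 ns).

In the second version of the code we used Span, slicing, MemoryMarshal, and BinaryPrimitives, all of which can bring a substantial performance improvement to our programs.

Although this example uses Span and related types, the main gain comes from the 64-bit CPU: a 64-bit CPU can read 8 bytes (64 bits) at once, so we use ReadUInt64BigEndian to read 8 bytes from the byte array at a time and compare them. If the byte array is 1024 bytes long, the second method needs only about 128 comparisons.

Of course, this does not mean that this code is the fastest possible, because the CLR has many low-level methods with even better performance. Still, as we have seen, using these types sensibly can greatly improve code performance. The array comparison above is only a simple example; in real projects we can discover many more use cases.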
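The 8-bytes-at-a-time idea described above can also be sketched with MemoryMarshal.Cast instead of slicing and BinaryPrimitives. This is an untested-for-speed variant, not the benchmarked code: ChunkedCompareDemo and EqualsChunked are hypothetical names, and the sketch assumes a platform such as x64 where unaligned 8-byte reads are acceptable.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ChunkedCompareDemo
{
    public static bool EqualsChunked(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        if (a.Length != b.Length) return false;

        // Reinterpret as many whole 8-byte words as fit; Cast drops any remainder.
        ReadOnlySpan<ulong> wordsA = MemoryMarshal.Cast<byte, ulong>(a);
        ReadOnlySpan<ulong> wordsB = MemoryMarshal.Cast<byte, ulong>(b);
        for (int i = 0; i < wordsA.Length; i++)
        {
            if (wordsA[i] != wordsB[i]) return false;
        }

        // Compare the 0 to 7 trailing bytes one at a time.
        for (int i = wordsA.Length * sizeof(ulong); i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        byte[] x = new byte[1027]; // 128 whole words plus 3 trailing bytes
        byte[] y = new byte[1027];
        Console.WriteLine(EqualsChunked(x, y)); // True

        y[1026] = 1; // difference in the tail, past the last whole word
        Console.WriteLine(EqualsChunked(x, y)); // False

        y[1026] = 0;
        y[512] = 1;  // difference inside a whole word
        Console.WriteLine(EqualsChunked(x, y)); // False
    }
}
```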

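Even Higher Performance

Although the second method is several times faster, its performance is still not strong enough. We can use the API that Span provides to make the comparison even faster.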

        [Benchmark]
        public bool SpanEqual()
        {
            return SpanEqual(_a,_b);
        }
        private bool SpanEqual(byte[] a, byte[] b)
        {
            return a.AsSpan().SequenceEqual(b);
        }

You can also try:

StructuralComparisons.StructuralEqualityComparer.Equals(a, b);

Performance test results:

|      Method |           Job |       Runtime |      Mean |     Error |    StdDev |
|------------ |-------------- |-------------- |----------:|----------:|----------:|
|    ForBytes | .NET Core 3.1 | .NET Core 3.1 | 77.025 ns | 0.0502 ns | 0.0419 ns |
|    ForArray | .NET Core 3.1 | .NET Core 3.1 | 66.192 ns | 0.6127 ns | 0.5117 ns |
| EqualsArray | .NET Core 3.1 | .NET Core 3.1 | 17.897 ns | 0.0122 ns | 0.0108 ns |
| EqualsBytes | .NET Core 3.1 | .NET Core 3.1 | 25.722 ns | 0.4584 ns | 0.4287 ns |
|   SpanEqual | .NET Core 3.1 | .NET Core 3.1 |  4.736 ns | 0.0099 ns | 0.0093 ns |

As we can see, the speed of Span.SequenceEqual() is simply overwhelming. This concludes the introduction to binary processing techniques in C#. By reading the CLR source code, we can learn many advanced operations. Readers are encouraged to read the CLR source code, as it can greatly enhance their technical skills.
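SequenceEqual can be applied to arrays of other value types as well by first viewing them as bytes. The following is only a sketch: SequenceEqualDemo and BytewiseEquals are hypothetical names, and a byte-wise comparison is an assumption that suits integer data but can differ from value equality for floating-point values (NaN, negative zero) or structs with padding.

```csharp
using System;
using System.Runtime.InteropServices;

public static class SequenceEqualDemo
{
    // Compares two arrays of an unmanaged value type by their raw bytes.
    public static bool BytewiseEquals<T>(T[] a, T[] b) where T : unmanaged
    {
        ReadOnlySpan<byte> bytesA = MemoryMarshal.AsBytes(a.AsSpan());
        ReadOnlySpan<byte> bytesB = MemoryMarshal.AsBytes(b.AsSpan());
        return bytesA.SequenceEqual(bytesB); // also returns false on a length mismatch
    }

    public static void Main()
    {
        int[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        int[] b = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        int[] c = { 1, 2, 3, 4, 5, 6, 7, 0, 9 };

        Console.WriteLine(BytewiseEquals(a, b));              // True
        Console.WriteLine(BytewiseEquals(a, c));              // False
        Console.WriteLine(BytewiseEquals(a, new[] { 1, 2 })); // False
    }
}
```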

痴者工良
