An Introduction to Memory and GC Knowledge in .NET Core

January 10, 2021

Table of Contents

Managed Code
Automatic Memory Management
Garbage Collection
- GC
Memory
- Physical Memory
- Virtual Memory
.NET Memory Composition
Memory in CLR
Conditions for Garbage Collection
Managed Heap

References:

[1] https://docs.microsoft.com/en-us/dotnet/standard/managed-code

[2] https://docs.microsoft.com/en-us/dotnet/standard/clr

Managed Code

In .NET, the CLR (Common Language Runtime) is responsible for taking managed code, compiling it into machine code, and executing it. Along the way, the CLR provides services such as automatic memory management, security boundaries, and type safety.

Managed code is code whose execution is managed by the CLR. It is written in a high-level language that can run on .NET (such as C# or F#) and is compiled into Intermediate Language (IL).

The CLR has several implementations, such as .NET Core/.NET 5+, Mono, and .NET Framework. The files produced by compiling managed code (IL code) cannot be executed directly by the operating system; they need a CLR implementation (such as .NET 5) to run them, and at run time the IL is compiled into native machine code by just-in-time (JIT) compilation.

Intermediate Language (IL) is sometimes referred to as Common Intermediate Language (CIL) or Microsoft Intermediate Language (MSIL).

Automatic Memory Management

Automatic memory management is one of the services provided by the CLR: it manages memory allocation and release for applications. While managed code runs, memory management is handled by the CLR, which ensures memory safety.

Garbage Collection

GC

GC stands for garbage collector. In .NET, the GC is the automatic memory manager in the CLR, responsible for managing memory allocation and release for .NET applications.

The benefits of GC are as follows:

Automatic memory management, eliminating the need for manual allocation and release.
Efficient management of objects on the managed heap.
Intelligent object reclamation, freeing memory.
Memory safety: preventing severe errors caused by dangling pointers and wild pointers.

Memory

Physical Memory

Physical memory is the memory provided by the physical memory chips; its size is the actual memory capacity of the machine.

Virtual Memory

Virtual memory is a technique operating systems use to manage memory. It can combine multiple hardware devices and non-contiguous address fragments into what a process sees as one contiguous memory space.

Virtual memory is provided by the operating system, for example as virtual memory on Windows and swap space on Linux. The operating system must map virtual memory to real memory addresses before it can be used. Virtual memory can be managed in three ways: paging, segmentation, and segmented paging; interested readers can refer to related materials for more information.

Modern operating systems adopt virtual memory management techniques, allowing the operating system to use external storage as memory through abstraction of physical storage devices, providing a memory range larger than physical memory.

The memory composed of these storage devices is referred to as virtual address space, while the addresses developers interact with are virtual addresses, not the actual physical addresses. Virtual space greatly expands memory capabilities, enabling the system to run multiple applications concurrently without strain.

The virtual address space is divided into two parts, user space and kernel space, and every program uses both while it runs. On 32-bit systems, the default user-to-kernel split of the 4 GB address space is 3:1 on Linux and 2:2 on Windows.

.NET Memory Composition

In .NET, memory is divided into unmanaged memory and managed memory.

.NET Core/.NET 5+ ships with a driver called dotnet, which is used to execute commands and run .NET programs. When we run a .dll file with the dotnet command, the operating system starts the dotnet driver, which requests memory from the operating system for itself and for the components it loads. This memory is unmanaged memory; it includes the memory used by the CLR and other runtime components. So even if you never use unmanaged code such as C/C++ or unmanaged resources, the process still uses unmanaged memory.

Next, the CLR initializes the new process and reserves managed memory (the managed heap) for it. The managed heap is a contiguous region of address space; safe .NET code can use only managed memory and cannot access physical memory directly. The garbage collector allocates and releases virtual memory on the managed heap on behalf of safe code.

Clearly, the workings of dotnet are quite complex, and the author does not have the capability to explain it thoroughly. Interested readers can refer to related materials for further understanding.

Memory in CLR

The Microsoft .NET CLR documentation states: By default, on 32-bit computers, each process has a 2-GB user-mode virtual address space.

This means that on a 32-bit system, a .NET process has 2 GB of user-mode virtual address space, spanning 0x00000000 to 0x7FFFFFFF. On a 64-bit system, the range 0x000000000000 to 0x7FFFFFFFFFFF covers approximately 128 TB.
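
As a quick sanity check on address ranges like these, the size of an inclusive range starting at zero is simply the last address plus one. The sketch below assumes the ranges 0x00000000 to 0x7FFFFFFF (32-bit) and 0x000000000000 to 0x7FFFFFFFFFFF (64-bit); the class and method names are made up for illustration.

```csharp
using System;

static class AddressSpaceDemo
{
    // Number of bytes in the inclusive address range [0, lastAddress].
    public static ulong SizeInBytes(ulong lastAddress) => lastAddress + 1;

    static void Main()
    {
        ulong user32 = SizeInBytes(0x7FFFFFFF);      // assumed 32-bit user-mode range
        ulong user64 = SizeInBytes(0x7FFFFFFFFFFF);  // assumed 64-bit user-mode range

        Console.WriteLine(user32 / (1UL << 30)); // size in GiB: prints 2
        Console.WriteLine(user64 / (1UL << 40)); // size in TiB: prints 128

        // Reports which kind of process the code is actually running in.
        Console.WriteLine(Environment.Is64BitProcess ? "64-bit process" : "32-bit process");
    }
}
```

These figures describe address space, not memory actually in use; a process only consumes physical memory for the pages it commits and touches.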

From the above information, we can see that .NET programs consume a significant amount of virtual memory. If a .NET program is running on a 64-bit operating system, its user-mode virtual address space could be far greater than 2GB.

Let's create a program "c1" with the following code:

using System;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello World!");
        Console.Read(); // block so the process stays alive and can be inspected
    }
}

Run the program on Linux with the dotnet xx.dll command and check its resource usage:

VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3.1g   0.0g   0.0g S   0.3   0.3   0:00.83 dotnet

Use dotnet-counters to view the dotnet process:

GC Heap Size (MB)                                              0
Gen 0 GC Count (Count / 1 sec)                                 0
Gen 0 Size (B)                                                 0
Gen 1 GC Count (Count / 1 sec)                                 0
Gen 1 Size (B)                                                 0
Gen 2 GC Count (Count / 1 sec)                                 0
Gen 2 Size (B)                                                 0
LOH Size (B)                                                   0

Note: When you run a .NET project with dotnet run, you will see two processes being created: dotnet and c1. dotnet is the driver program; after it starts, the CLR compiles the .dll assembly and initializes a new process.

Virtual address space can become fragmented. When a virtual memory allocation is requested, the virtual memory manager must find a single free block that is large enough to satisfy it. Even if more than 2 GB of virtual address space is free, an allocation fails if that space is not contiguous. A process can run out of memory if there is not enough virtual address space to reserve or not enough physical space to commit.

CLR Virtual Memory States

The virtual memory in CLR can have three states:

| State | Description |
| :-------------- | :----------------------------------------------------------- |
| Free | The block of memory has no references to it and is available for allocation. |
| Reserved | The block of memory is available for your use and cannot be used for any other allocation request. However, you cannot store data in this memory block until it is committed. |
| Committed | The block of memory is assigned to physical storage. |

Memory Allocation

When the CLR initializes a new process, it reserves a contiguous region of address space for the process, known as the managed heap. The managed heap maintains a pointer that initially points to the base address of the managed heap and advances as objects are allocated. When memory is needed, the CLR allocates the area immediately following this pointer and then moves the pointer to the position right after the newly allocated object.

Since CLR allocates memory for objects by adding values to the pointer, its allocation speed is almost as fast as that from the stack; additionally, newly allocated objects are stored contiguously in the managed heap, allowing the program to access these objects quickly.
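
The pointer-bumping scheme described above can be modeled in a few lines. This is only a toy sketch under simplifying assumptions: `BumpHeap` and its members are hypothetical names, addresses are plain numbers, and the real CLR allocator is far more involved.

```csharp
using System;

// Hypothetical model of a bump-pointer heap; not the CLR's actual implementation.
sealed class BumpHeap
{
    private readonly long _limit; // end of the reserved region
    private long _next;           // the "pointer": where the next object will start

    public BumpHeap(long baseAddress, long size)
    {
        _next = baseAddress;
        _limit = baseAddress + size;
    }

    // Returns the start address of the new object, or -1 if the region is full
    // (the point at which a real runtime would consider a garbage collection).
    public long Allocate(long bytes)
    {
        if (_next + bytes > _limit) return -1;
        long address = _next;
        _next += bytes; // advance the pointer past the new object
        return address;
    }
}

static class BumpHeapDemo
{
    static void Main()
    {
        var heap = new BumpHeap(baseAddress: 0x1000, size: 64);
        Console.WriteLine(heap.Allocate(24)); // 4096
        Console.WriteLine(heap.Allocate(24)); // 4120, directly after the first object
        Console.WriteLine(heap.Allocate(24)); // -1, only 16 bytes remain
    }
}
```

Each allocation is a bounds check plus an addition, which is why this style of allocation is so cheap, and consecutive objects end up next to each other in memory.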

When the GC reclaims memory, the released objects leave gaps behind, which fragments the managed heap. The GC then compacts the surviving objects into a contiguous block again and moves the pointer to the end of the last surviving object.

Of course, the Large Object Heap (LOH) is not compacted when it is collected; this will be discussed later.

Memory Release

Conditions for Garbage Collection

Based on Microsoft's official documentation, the conditions for garbage collection are organized as follows:

System physical memory is insufficient.
Memory allocated on the managed heap exceeds acceptable thresholds (this threshold will be dynamically adjusted).
Manual invocation of the GC class's API (such as GC.Collect).
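
The third condition is the easiest to observe from code. The following is a minimal sketch using `GC.Collect` and `GC.CollectionCount`; the class and method names are made up for illustration.

```csharp
using System;

static class GcTriggerDemo
{
    // Forces a collection and reports whether the generation 0 collection count increased.
    public static bool ForceCollection()
    {
        int before = GC.CollectionCount(0);
        GC.Collect(); // requests an immediate collection of all generations
        return GC.CollectionCount(0) > before;
    }

    static void Main()
    {
        Console.WriteLine(ForceCollection());        // True
        Console.WriteLine(GC.GetTotalMemory(false)); // approximate bytes currently allocated on the managed heap
    }
}
```

In normal application code it is usually better to let the runtime decide when to collect; calling `GC.Collect` is mostly useful for experiments like this one.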

Managed Heap

Native Heap

As mentioned earlier, .NET memory consists of unmanaged memory and managed memory, so a process running under the CLR has both a native heap and a managed heap. On Windows, native memory is allocated through the VirtualAlloc function of the Windows API and supplies the memory needed by unmanaged code, including operating system components and the CLR itself.

Managed Heap

Managed heap has already been discussed earlier, so it will not be repeated here.

Generations of Managed Heap

Memory in the managed heap is divided into three generations, numbered 0, 1, and 2. Newly allocated objects first reside in generation 0. When a garbage collection occurs, objects that survive (are not released) are promoted to generation 1. When generation 1 is collected, surviving objects are in turn promoted to generation 2, and the generation 1 space is then compacted.
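
Promotion can be watched with `GC.GetGeneration`. The sketch below forces collections with `GC.Collect` purely for demonstration; the helper name `ObserveGenerations` is hypothetical, and the exact generations reported after each collection are typical behavior rather than a guarantee.

```csharp
using System;

static class PromotionDemo
{
    // Records the generation of one surviving object before and after two forced collections.
    public static int[] ObserveGenerations()
    {
        var survivor = new object();
        var seen = new int[3];

        seen[0] = GC.GetGeneration(survivor); // a new small object starts in generation 0
        GC.Collect();
        seen[1] = GC.GetGeneration(survivor); // typically 1 after surviving one collection
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor); // typically 2 after surviving another

        GC.KeepAlive(survivor); // keep the reference live until this point
        return seen;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" -> ", ObserveGenerations())); // typically 0 -> 1 -> 2
    }
}
```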

The management of the managed heap is handled by GC, which uses garbage collection algorithms for memory allocation and release.

The garbage collection algorithm is based on the following observations:

① Compacting part of the managed heap is faster than compacting the entire managed heap.
② Newer objects have shorter lifetimes, while older objects have longer lifetimes.
③ Newer objects tend to be related to each other and accessed by the application around the same time.

Keeping these observations in mind makes the design of the managed heap easier to understand.

The basic description of the 0 to 2 generation heaps is as follows:

Generation 0: Objects in generation 0 have short lifetimes; garbage collection most frequently occurs in this generation.
Generation 1: Serves as a buffer between short-lived and long-lived objects.
Generation 2: Stores long-lived objects; objects that survive collections of generations 0 and 1 are promoted to generation 2. Long-lived data, such as static data that lives for the duration of the process, typically ends up here.

Before .NET 5, the managed heap consisted of the SOH (Small Object Heap) and the LOH (Large Object Heap). .NET 5 added the POH (Pinned Object Heap).

The memory segments of the small object heap have generations 0, 1, and 2.
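
The different heaps can be told apart from code to a limited extent. The sketch below relies on two details that are not covered in this article and are stated here as assumptions: by default, objects of 85,000 bytes or more go to the LOH, and `GC.GetGeneration` reports LOH objects as generation 2. `GC.AllocateArray` with `pinned: true` requires .NET 5 or later; the class and method names are made up for illustration.

```csharp
using System;

static class HeapKindsDemo
{
    // Generation reported for a freshly allocated byte array of the given size.
    public static int GenerationOf(int byteCount) => GC.GetGeneration(new byte[byteCount]);

    static void Main()
    {
        Console.WriteLine(GenerationOf(1_000));   // 0: small objects start in generation 0 of the SOH
        Console.WriteLine(GenerationOf(100_000)); // 2: large objects are reported as generation 2

        // .NET 5+: allocate an array directly on the pinned object heap.
        byte[] pinned = GC.AllocateArray<byte>(1_024, pinned: true);
        Console.WriteLine(pinned.Length);         // 1024
    }
}
```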
