Exploring char and string in C#
1. System.Char (Characters)
char is an alias for System.Char.
System.Char occupies two bytes (16 bits) and stores a single UTF-16 code unit, which is enough to represent any Unicode character in the range U+0000 to U+FFFF.
The default value of char is \0, i.e., U+0000.
Unicode code points are conventionally written in the form U+____: a U+ prefix followed by a group of hexadecimal digits.
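As a small sketch of the points above (the variable name c is only illustrative), the following prints a character in U+ notation and checks the size, default value, and upper bound of char:

```csharp
using System;

char c = 'j';

// Format the UTF-16 code unit as a four-digit hexadecimal code point.
string codePoint = $"U+{(int)c:X4}";
Console.WriteLine(codePoint);              // U+006A

Console.WriteLine(sizeof(char));           // 2 (bytes)
Console.WriteLine(default(char) == '\0');  // True
Console.WriteLine((int)char.MaxValue);     // 65535, i.e. U+FFFF
```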
The char type can be assigned in four ways:
char a = 'j';
char b = '\u006A';
char c = '\x006A';
char d = (char) 106;
Console.WriteLine($"{a} | {b} | {c} | {d}");
Output:
j | j | j | j
The \u prefix introduces a Unicode escape sequence, which must be followed by exactly four hexadecimal digits.
\u006A Valid
\u06A Invalid
\u6A Invalid
The \x prefix introduces a variable-length hexadecimal escape sequence of one to four hexadecimal digits, so leading zeros can be omitted. The following examples all represent the same character.
\x006A
\x06A
\x6A
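One consequence of the variable length is worth a quick sketch: inside a string, \x absorbs any hexadecimal digits that follow it (up to four), while \u always stops after four. The variable names below are illustrative only.

```csharp
using System;

// "\x" greedily consumes up to four hex digits, so the 'b' is absorbed.
string greedy = "\x6Ab";        // one character: U+06AB
// "\u" always takes exactly four digits, so the 'b' stays a separate character.
string fixedWidth = "\u006Ab";  // two characters: 'j' and 'b'

Console.WriteLine(greedy.Length);      // 1
Console.WriteLine(fixedWidth.Length);  // 2
Console.WriteLine(fixedWidth);         // jb
```

For this reason \u is usually the safer choice when the escape is followed by more text.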
The char type can be implicitly converted to the integral types ushort, int, uint, long, and ulong, and to the floating-point types float, double, and decimal.
Converting char to sbyte, byte, or short requires an explicit cast.
No type converts to char implicitly, but any integral or floating-point type can be explicitly cast to char.
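The conversion rules above can be sketched as follows; the variable names are illustrative, and the double value is chosen only to show that the cast drops the fractional part.

```csharp
using System;

char c = 'j';

int asInt = c;          // implicit: char -> int
double asDouble = c;    // implicit: char -> double
byte asByte = (byte)c;  // explicit cast required: char -> byte

char fromInt = (char)106;        // explicit: int -> char
double x = 106.9;
char fromDouble = (char)x;       // explicit: fractional part is discarded

Console.WriteLine($"{asInt} | {asDouble} | {asByte} | {fromInt} | {fromDouble}");
// 106 | 106 | 106 | j | j
```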
2. Character Processing
System.Char has many static methods that help identify and process characters.
A very important enumeration is UnicodeCategory (in the System.Globalization namespace).
public enum UnicodeCategory
{
UppercaseLetter,
LowercaseLetter,
TitlecaseLetter,
ModifierLetter,
OtherLetter,
NonSpacingMark,
SpacingCombiningMark,
EnclosingMark,
DecimalDigitNumber,
LetterNumber,
OtherNumber,
SpaceSeparator,
LineSeparator,
ParagraphSeparator,
Control,
Format,
Surrogate,
PrivateUse,
ConnectorPunctuation,
DashPunctuation,
OpenPunctuation,
ClosePunctuation,
InitialQuotePunctuation,
FinalQuotePunctuation,
OtherPunctuation,
MathSymbol,
CurrencySymbol,
ModifierSymbol,
OtherSymbol,
OtherNotAssigned,
}
System.Char has a static method, GetUnicodeCategory(), that returns the category of a character, i.e., one of the values of the enumeration above.
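A minimal sketch of GetUnicodeCategory() applied to a handful of sample characters (the sample set is arbitrary):

```csharp
using System;
using System.Globalization;

char[] samples = { 'A', 'a', '5', ' ', '+', '_', '\t' };

foreach (char c in samples)
{
    // Look up the Unicode general category of each character.
    UnicodeCategory category = char.GetUnicodeCategory(c);
    Console.WriteLine($"U+{(int)c:X4} -> {category}");
}
// U+0041 -> UppercaseLetter
// U+0061 -> LowercaseLetter
// U+0035 -> DecimalDigitNumber
// U+0020 -> SpaceSeparator
// U+002B -> MathSymbol
// U+005F -> ConnectorPunctuation
// U+0009 -> Control
```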
In addition to GetUnicodeCategory(), there are dedicated static methods for testing whether a character belongs to a particular category.
The following table lists these methods, what they match, and the corresponding enumeration values.
| Static Method | Description | UnicodeCategory Values |
|---------------------|-------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| IsControl | Non-printable control characters: U+0000 to U+001F (e.g., \r, \n, \t, \0), U+007F, and U+0080 to U+009F | Control |
| IsDigit | Digits 0-9 and decimal digits in other scripts | DecimalDigitNumber |
| IsLetter | Alphabetic characters A-Z, a-z, and letters in other alphabets | UppercaseLetter, LowercaseLetter, TitlecaseLetter, ModifierLetter, OtherLetter |
| IsLetterOrDigit | Letters and decimal digits | Union of IsLetter and IsDigit |
| IsLower | Lowercase letters | LowercaseLetter |
| IsNumber | Digits plus other numeric characters such as Unicode fractions and Roman numerals | DecimalDigitNumber, LetterNumber, OtherNumber |
| IsPunctuation | Punctuation in Western and other writing systems | ConnectorPunctuation, DashPunctuation, OpenPunctuation, ClosePunctuation, InitialQuotePunctuation, FinalQuotePunctuation, OtherPunctuation |
| IsSeparator | Spaces and all Unicode separators | SpaceSeparator, LineSeparator, ParagraphSeparator |
| IsSurrogate | UTF-16 surrogate code units, U+D800 to U+DFFF | Surrogate |
| IsSymbol | Mathematical, currency, modifier, and other symbols | MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol |
| IsUpper | Uppercase letters | UppercaseLetter |
| IsWhiteSpace | All separators plus \t, \n, \r, \v, \f, and U+0085 | SpaceSeparator, LineSeparator, ParagraphSeparator |
Example:
char chA = 'A';
char ch1 = '1';
string str = "test string";
Console.WriteLine(chA.CompareTo('B')); //----------- Output: "-1"
// (meaning 'A' is 1 less than 'B')
Console.WriteLine(chA.Equals('A')); //----------- Output: "True"
Console.WriteLine(Char.GetNumericValue(ch1)); //----------- Output: "1"
Console.WriteLine(Char.IsControl('\t')); //----------- Output: "True"
Console.WriteLine(Char.IsDigit(ch1)); //----------- Output: "True"
Console.WriteLine(Char.IsLetter(',')); //----------- Output: "False"
Console.WriteLine(Char.IsLower('u')); //----------- Output: "True"
Console.WriteLine(Char.IsNumber(ch1)); //----------- Output: "True"
Console.WriteLine(Char.IsPunctuation('.')); //----------- Output: "True"
Console.WriteLine(Char.IsSeparator(str, 4)); //----------- Output: "True"
Console.WriteLine(Char.IsSymbol('+')); //----------- Output: "True"
Console.WriteLine(Char.IsWhiteSpace(str, 4)); //----------- Output: "True"
Console.WriteLine(Char.Parse("S")); //----------- Output: "S"
Console.WriteLine(Char.ToLower('M')); //----------- Output: "m"
Console.WriteLine('x'.ToString()); //----------- Output: "x"
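// U+10F00 does not fit in a char literal, so test the first code unit of a string instead.
Console.WriteLine(Char.IsSurrogate("\U00010F00", 0)); // Output: "True" (a high surrogate)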
char test = '\xDFFF';
Console.WriteLine(test); //----------- Output:'?'
Console.WriteLine(Char.GetUnicodeCategory(test));//----------- Output: "Surrogate"
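Because a char holds only 16 bits, a code point above U+FFFF is stored as a pair of surrogate code units. The following sketch illustrates this with U+1F600, a code point picked arbitrarily for the demonstration:

```csharp
using System;

// One code point above U+FFFF, stored as two UTF-16 code units.
string s = "\U0001F600";

Console.WriteLine(s.Length);                    // 2
Console.WriteLine(char.IsHighSurrogate(s[0]));  // True
Console.WriteLine(char.IsLowSurrogate(s[1]));   // True

// Combine the surrogate pair back into a single code point.
int codePoint = char.ConvertToUtf32(s, 0);
Console.WriteLine($"U+{codePoint:X}");          // U+1F600

// And go the other way: code point -> surrogate pair.
Console.WriteLine(char.ConvertFromUtf32(0x1F600) == s);  // True
```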
3. Globalization
In C#, System.Char provides a rich set of methods for character processing, such as the commonly used ToUpper and ToLower.
However, the result of character processing can depend on the user's culture.
For culture-independent processing, call the System.Char methods with the Invariant suffix, or pass CultureInfo.InvariantCulture explicitly.
Example:
Console.WriteLine(Char.ToUpper('i',CultureInfo.InvariantCulture));
Console.WriteLine(Char.ToUpperInvariant('i'));
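To see why the culture matters, here is a hedged sketch comparing the invariant result with Turkish casing rules. It assumes that culture data for "tr-TR" is installed; on systems running in invariant globalization mode the lookup may fail or fall back to invariant behaviour, so no fixed output is claimed for the Turkish case.

```csharp
using System;
using System.Globalization;

// Culture-independent: 'i' always maps to 'I'.
char invariantUpper = char.ToUpperInvariant('i');
Console.WriteLine(invariantUpper);  // I

try
{
    // Assumption: Turkish culture data is available on this machine.
    CultureInfo turkish = CultureInfo.GetCultureInfo("tr-TR");
    char turkishUpper = char.ToUpper('i', turkish);

    // With full culture data this is typically U+0130 (capital I with dot above).
    Console.WriteLine($"U+{(int)turkishUpper:X4}");
}
catch (CultureNotFoundException)
{
    Console.WriteLine("tr-TR culture data is not available");
}
```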
Many character and string methods have overloads that accept the following StringComparison values or CultureInfo members.
StringComparison
| Enumeration | Value | Description |
|----------------------------------|--------|------------------------------------------------------------------|
| CurrentCulture | 0 | Compare strings using culture-sensitive ordering based on current culture |
| CurrentCultureIgnoreCase | 1 | Compare strings using culture-sensitive ordering based on current culture, ignoring case |
| InvariantCulture | 2 | Compare strings using culture-sensitive ordering based on invariant culture |
| InvariantCultureIgnoreCase | 3 | Compare strings using culture-sensitive ordering based on invariant culture, ignoring case |
| Ordinal | 4 | Compare strings using ordinal (binary) ordering |
| OrdinalIgnoreCase | 5 | Compare strings using ordinal (binary) ordering, ignoring case |
CultureInfo
| Property | Description |
|-------------------------|---------------------------------------------------------------|
| CurrentCulture | Gets the CultureInfo object representing the culture used by the current thread |
| CurrentUICulture | Gets or sets the CultureInfo object used by the resource manager to look up culture-specific resources at runtime |
| InstalledUICulture | Gets the CultureInfo object representing the installed culture of the operating system |
| InvariantCulture | Gets the CultureInfo object that is culture-independent (fixed) |
| IsNeutralCulture | Gets a value indicating whether the current CultureInfo represents a neutral culture |
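The difference between ordinal and culture-sensitive comparison can be sketched like this. The ordinal results are fixed; the culture-sensitive result depends on the culture data available at run time, so it is printed without a claimed value.

```csharp
using System;

// Ordinal compares UTF-16 code unit values: 'a' (97) sorts after 'B' (66).
int ordinal = string.Compare("a", "B", StringComparison.Ordinal);
Console.WriteLine(ordinal > 0);       // True

// OrdinalIgnoreCase treats upper and lower case as equal.
bool sameIgnoringCase = string.Equals("README", "readme", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(sameIgnoringCase);  // True

// Culture-sensitive ordering uses linguistic rules; with full culture data
// "a" typically sorts before "B", giving a negative result.
int linguistic = string.Compare("a", "B", StringComparison.InvariantCulture);
Console.WriteLine(linguistic);
```

Ordinal comparisons are usually the right choice for identifiers, file names, and protocol tokens, while culture-sensitive comparisons suit text shown to users.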
4. System.String (Strings)
4.1 String Search
Strings offer several search methods: StartsWith(), EndsWith(), Contains(), and IndexOf().
StartsWith() and EndsWith() accept a StringComparison value to select the comparison method, or a CultureInfo to control culture-specific rules.
StartsWith(): Checks whether the string starts with the specified string.
EndsWith(): Checks whether the string ends with the specified string.
Contains(): Checks whether the specified string occurs anywhere in the string.
IndexOf(): Returns the index of the first occurrence of the specified string or character, or -1 if there is no match.
Usage Example:
string a = "痴者工良(高级程序员劝退师)";
Console.WriteLine(a.StartsWith("高级"));
Console.WriteLine(a.StartsWith("高级",StringComparison.CurrentCulture));
Console.WriteLine(a.StartsWith("高级",true, CultureInfo.CurrentCulture));
Console.WriteLine(a.StartsWith("痴者",StringComparison.CurrentCulture));
Console.WriteLine(a.EndsWith("劝退师)",true, CultureInfo.CurrentCulture));
Console.WriteLine(a.IndexOf("高级",StringComparison.CurrentCulture));
Output:
False
False
False
True
True
5
Except for Contains() (which gained a StringComparison overload only in newer .NET versions), the other three methods have multiple overloads, such as:
| Overload | Description |
|---------------------------------|--------------------------------------|
| (String) | Checks if it matches the specified string |
| (String, StringComparison) | Specifies how to compare the string |
| (String, Boolean, CultureInfo) | Controls case sensitivity and culture rules for matching (StartsWith() and EndsWith() only) |
These globalization and case matching rules will be discussed in later chapters.
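As a compact sketch of the search methods with explicit ordinal comparisons (the sample string is arbitrary):

```csharp
using System;

string s = "Hello, World";

// Case-insensitive prefix check.
bool starts = s.StartsWith("hello", StringComparison.OrdinalIgnoreCase);
// Case-sensitive suffix check: "world" does not match "World".
bool ends = s.EndsWith("world", StringComparison.Ordinal);
// Index of the first occurrence, or -1 when nothing matches.
int found = s.IndexOf("World", StringComparison.Ordinal);
int missing = s.IndexOf("xyz", StringComparison.Ordinal);

Console.WriteLine(starts);   // True
Console.WriteLine(ends);     // False
Console.WriteLine(found);    // 7
Console.WriteLine(missing);  // -1
```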
4.2 String Extraction, Insertion, Deletion, Replacement
4.2.1 Extraction
The Substring() method extracts either N characters starting at a given index or all remaining characters from that index.
string a = "痴者工良(高级程序员劝退师)";
Console.WriteLine(a.Substring(startIndex: 1, length: 3));
// 者工良
Console.WriteLine(a.Substring(startIndex: 5));
// 高级程序员劝退师)
4.2.2 Insertion, Deletion, Replacement
Use the following methods. Because strings are immutable, each method returns a new string and leaves the original unchanged.
Insert(): Inserts a string at the specified index.
Remove(): Removes characters starting at the specified index, either a given count or everything to the end of the string.
PadLeft(): Pads the string on the left with a specified character (a space by default) until it is N characters long.
PadRight(): Pads the string on the right with a specified character (a space by default) until it is N characters long.
TrimStart(): Removes the specified characters (white space by default) from the start of the string, stopping at the first character that does not match.
TrimEnd(): Removes the specified characters (white space by default) from the end of the string, stopping at the first character that does not match.
Replace(): Replaces every occurrence of a character or substring with another character or string.
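The methods listed above can be sketched on a few short sample strings (chosen only for illustration):

```csharp
using System;

string inserted = "abc".Insert(1, "XY");         // insert at index 1
string removedTail = "abcdef".Remove(2);         // drop everything from index 2
string removedRange = "abcdef".Remove(2, 3);     // drop 3 characters from index 2
string paddedLeft = "42".PadLeft(5, '0');        // pad to width 5 with '0'
string paddedRight = "42".PadRight(5, '*');      // pad to width 5 with '*'
string trimmedStart = "xxabcxx".TrimStart('x');  // strip leading 'x'
string trimmedEnd = "xxabcxx".TrimEnd('x');      // strip trailing 'x'
string replaced = "a-b-c".Replace("-", "+");     // replace every "-"

Console.WriteLine(inserted);      // aXYbc
Console.WriteLine(removedTail);   // ab
Console.WriteLine(removedRange);  // abf
Console.WriteLine(paddedLeft);    // 00042
Console.WriteLine(paddedRight);   // 42***
Console.WriteLine(trimmedStart);  // abcxx
Console.WriteLine(trimmedEnd);    // xxabc
Console.WriteLine(replaced);      // a+b+c
```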
## 5. String Intern Pool
The following is my own summary. My understanding is limited, so if you find any mistakes, corrections are welcome.

The string intern pool is maintained at the application-domain level and is shared by all assemblies in the domain.
The CLR maintains a table called the intern pool.
This table records references to all string instances declared as literals in the code.
When literals are concatenated at compile time (for example, "a" + "b"), the resulting string also enters the intern pool.
Only string instances created from **literal declarations** automatically refer to strings in the intern pool; strings built at run time are not interned unless you call String.Intern().
It does not matter where the literal appears: a field, a property, a local variable in a method, or even the default value of a method parameter will all enter the intern pool.
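The difference between a literal and a string built at run time can be sketched as follows. The literal "intern-demo" and the variable names are illustrative only.

```csharp
using System;

string literal = "intern-demo";
// Built at run time, so it is a separate object with the same content.
string built = new string("intern-demo".ToCharArray());

bool sameContent = literal == built;
bool sameObject = object.ReferenceEquals(literal, built);
// String.Intern returns the pooled instance with this content.
bool sameAfterIntern = object.ReferenceEquals(literal, string.Intern(built));
// Literal concatenation is folded at compile time into a single literal.
bool foldedLiteral = object.ReferenceEquals(literal, "intern-" + "demo");

Console.WriteLine(sameContent);      // True
Console.WriteLine(sameObject);       // False
Console.WriteLine(sameAfterIntern);  // True
Console.WriteLine(foldedLiteral);    // True
```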
For example:
```c#
using System;

class Program
{
    static string test = "一个测试";

    static void Main(string[] args)
    {
        string a = "a";
        Console.WriteLine("test:" + test.GetHashCode());
        TestOne(test);
        TestTwo();              // no argument, so the default parameter value is used
        TestThree("一个测试");  // a separate literal with the same content
    }

    // Receives the static field itself.
    public static void TestOne(string a)
    {
        Console.WriteLine("----TestOne-----");
        Console.WriteLine("a:" + a.GetHashCode());
        string b = a;
        Console.WriteLine("b:" + b.GetHashCode());
        Console.WriteLine("test - a :" + Object.ReferenceEquals(test, a));
    }

    // The default parameter value is a literal, so it is interned as well.
    public static void TestTwo(string a = "一个测试")
    {
        Console.WriteLine("----TestTwo-----");
        Console.WriteLine("a:" + a.GetHashCode());
        string b = a;
        Console.WriteLine("b:" + b.GetHashCode());
        Console.WriteLine("test - a :" + Object.ReferenceEquals(test, a));
    }

    // Receives a literal written at the call site.
    public static void TestThree(string a)
    {
        Console.WriteLine("----TestThree-----");
        Console.WriteLine("a:" + a.GetHashCode());
        string b = a;
        Console.WriteLine("b:" + b.GetHashCode());
        Console.WriteLine("test - a :" + Object.ReferenceEquals(test, a));
    }
}
```
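Output (the hash values shown are from one run; on .NET Core and later, string hash codes are randomized per process, so they differ between runs):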
test:-407145577
----TestOne-----
a:-407145577
b:-407145577
test - a :True
----TestTwo-----
a:-407145577
b:-407145577
test - a :True
----TestThree-----
a:-407145577
b:-407145577
test - a :True
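You can check whether two strings are the same reference with the static method Object.ReferenceEquals(s1, s2).
The instance method GetHashCode() is not a reliable test, because a string's hash code is computed from its content: two distinct string objects with equal content return the same hash code.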
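You can use unsafe code to modify a string directly in memory; see https://blog.benoitblanchon.fr/modify-intern-pool/
// Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file.
string a = "Test";
unsafe
{
fixed (char* p = a)
{
p[1] = '3'; // overwrites the second character of the interned instance in place
}
}
Console.WriteLine(a);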
*Microsoft.Diagnostics.Runtime* can be used to retrieve information from the CLR.
After much research, I found that .NET does not provide an API for viewing the hash table behind the string intern pool.
For information on the usage of C# strings and the principles behind the intern pool, please refer to:
http://community.bartdesmet.net/blogs/bart/archive/2006/09/27/4472.aspx
Obtaining a list of string literals from an assembly:
https://stackoverflow.com/questions/22172175/read-the-content-of-the-string-intern-pool
Documentation on the .NET Profiling API:
https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/profiling-overview?redirectedfrom=MSDN
.NET string intern pooling and improving string comparison performance:
http://benhall.io/net-string-interning-to-improve-performance/
Learning articles about C# string intern pool:
https://www.cnblogs.com/mingxuantongxue/p/3782391.html
https://www.xuebuyuan.com/189297.html
If there are any errors in the summary or knowledge, please feel free to correct me.