Exploring char and string in C#
[TOC]
## 1. System.Char Character
`char` is an alias for `System.Char`.
`System.Char` occupies two bytes, that is, 16 bits.
`System.Char` represents and stores a single Unicode character (more precisely, one UTF-16 code unit).
The range of `System.Char` is `U+0000` to `U+FFFF`, and the default value of `char` is `\0`, which is `U+0000`.
Unicode code points are usually written in the form `U+____`, that is, `U+` followed by hexadecimal digits; the 16-bit range that `char` covers needs four of them.
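As a minimal sketch of the notation above, the snippet below formats a few `char` values as `U+XXXX`. The variable names are made up for illustration, and the code assumes C# 9 or later so that it can be written as top-level statements.

```c#
using System;

// Top-level statements (C# 9 or later); variable names are illustrative only.
char zero = default(char);   // the default value of char, '\0'
char j = 'j';

// Cast to int, then format as four uppercase hexadecimal digits.
string zeroNotation = $"U+{(int)zero:X4}";
string jNotation = $"U+{(int)j:X4}";
string maxNotation = $"U+{(int)char.MaxValue:X4}";

Console.WriteLine(sizeof(char));   // 2 (bytes)
Console.WriteLine(zeroNotation);   // U+0000
Console.WriteLine(jNotation);      // U+006A
Console.WriteLine(maxNotation);    // U+FFFF
```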
`char` can be assigned in four different ways:
```c#
char a = 'j';
char b = '\u006A';
char c = '\x006A';
char d = (char) 106;
Console.WriteLine($"{a} | {b} | {c} | {d}");
```
Output:
```
j | j | j | j
```
A Unicode escape sequence starts with `\u` and must be followed by exactly four hexadecimal digits.
```
\u006A Valid
\u06A  Invalid
\u6A   Invalid
```
A hexadecimal escape sequence starts with `\x` and is followed by one to four hexadecimal digits, so leading zeros can be omitted. The examples below all represent the same character.
```
\x006A
\x06A
\x6A
```
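Because `\x` accepts a variable number of digits, it can swallow a following character that happens to be a hexadecimal digit. The sketch below (illustrative variable names, top-level statements) contrasts it with the fixed-width `\u` form.

```c#
using System;

// "\x" consumes up to four hex digits, so the 'b' becomes part of the escape:
// the result is the single char U+06AB.
string greedy = "\x6Ab";

// "\u" always takes exactly four digits, so this is 'j' followed by 'b'.
string fixedWidth = "\u006Ab";

Console.WriteLine(greedy.Length);       // 1
Console.WriteLine(fixedWidth.Length);   // 2
Console.WriteLine(fixedWidth);          // jb
```

For this reason `\u` is usually the safer choice inside string literals.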
`char` can be implicitly converted to other numeric types: `ushort`, `int`, `uint`, `long`, and `ulong`, as well as `float`, `double`, and `decimal`.
Converting `char` to `sbyte`, `byte`, or `short` requires an explicit cast.
No type can be implicitly converted to `char`, but any integer or floating-point type can be explicitly converted to `char`.
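The conversion rules above can be sketched as follows. The variable names are illustrative, and the snippet is written as top-level statements.

```c#
using System;

char j = 'j';

int asInt = j;              // implicit: char -> int
double asDouble = j;        // implicit: char -> double
byte asByte = (byte)j;      // explicit cast required: char -> byte
short asShort = (short)j;   // explicit cast required: char -> short

char fromInt = (char)106;   // explicit: int -> char

double d = 106.9;
char fromDouble = (char)d;  // explicit: double -> char, the fraction is truncated

Console.WriteLine($"{asInt} {asByte} {asShort} {fromInt} {fromDouble}");
// 106 106 106 j j
```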
## 2. Character Processing
The `System.Char` struct has many static methods that help identify and process characters.
A very important enumeration is `UnicodeCategory`:
```c#
public enum UnicodeCategory
{
    UppercaseLetter,
    LowercaseLetter,
    TitlecaseLetter,
    ModifierLetter,
    OtherLetter,
    NonSpacingMark,
    SpacingCombiningMark,
    EnclosingMark,
    DecimalDigitNumber,
    LetterNumber,
    OtherNumber,
    SpaceSeparator,
    LineSeparator,
    ParagraphSeparator,
    Control,
    Format,
    Surrogate,
    PrivateUse,
    ConnectorPunctuation,
    DashPunctuation,
    OpenPunctuation,
    ClosePunctuation,
    InitialQuotePunctuation,
    FinalQuotePunctuation,
    OtherPunctuation,
    MathSymbol,
    CurrencySymbol,
    ModifierSymbol,
    OtherSymbol,
    OtherNotAssigned,
}
```
`System.Char` includes a static method `GetUnicodeCategory()` that returns the category of a character, i.e., one of the enumeration values above.
In addition to `GetUnicodeCategory()`, we can also use specific static methods to determine the category of a character.
Below is a table listing the static methods and their enumeration categories.
| Static Method | Description | UnicodeCategory Values |
| --- | --- | --- |
| IsControl | Non-printable control characters: values below `0x20` (for example `\r`, `\n`, `\t`, `\0`) and `0x7F` through `0x9F` | Control |
| IsDigit | Numbers 0-9 and digits from other alphabets | DecimalDigitNumber |
| IsLetter | Letter characters A-Z, a-z, and letters from other alphabets | UppercaseLetter, LowercaseLetter, TitlecaseLetter, ModifierLetter, OtherLetter |
| IsLetterOrDigit | Letters and digits | See IsLetter and IsDigit |
| IsLower | Lowercase letters | LowercaseLetter |
| IsNumber | Numbers, Unicode fractions, Roman numerals | DecimalDigitNumber, LetterNumber, OtherNumber |
| IsPunctuation | Punctuation marks in Western and other alphabets | ConnectorPunctuation, DashPunctuation, OpenPunctuation, ClosePunctuation, InitialQuotePunctuation, FinalQuotePunctuation, OtherPunctuation |
| IsSeparator | Spaces and all Unicode separators | SpaceSeparator, LineSeparator, ParagraphSeparator |
| IsSurrogate | UTF-16 surrogate code units, `0xD800` through `0xDFFF` | Surrogate |
| IsSymbol | Symbols such as math, currency, and modifier symbols | MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol |
| IsUpper | Uppercase letters | UppercaseLetter |
| IsWhiteSpace | All separators plus characters such as `\t`, `\n`, `\r`, `\v`, `\f` | SpaceSeparator, LineSeparator, ParagraphSeparator |
Example:
```c#
char chA = 'A';
char ch1 = '1';
string str = "test string";
Console.WriteLine(chA.CompareTo('B'));            // Output: "-1" (meaning 'A' is 1 less than 'B')
Console.WriteLine(chA.Equals('A'));               // Output: "True"
Console.WriteLine(Char.GetNumericValue(ch1));     // Output: "1"
Console.WriteLine(Char.IsControl('\t'));          // Output: "True"
Console.WriteLine(Char.IsDigit(ch1));             // Output: "True"
Console.WriteLine(Char.IsLetter(','));            // Output: "False"
Console.WriteLine(Char.IsLower('u'));             // Output: "True"
Console.WriteLine(Char.IsNumber(ch1));            // Output: "True"
Console.WriteLine(Char.IsPunctuation('.'));       // Output: "True"
Console.WriteLine(Char.IsSeparator(str, 4));      // Output: "True"
Console.WriteLine(Char.IsSymbol('+'));            // Output: "True"
Console.WriteLine(Char.IsWhiteSpace(str, 4));     // Output: "True"
Console.WriteLine(Char.Parse("S"));               // Output: "S"
Console.WriteLine(Char.ToLower('M'));             // Output: "m"
Console.WriteLine('x'.ToString());                // Output: "x"
// A code point above U+FFFF does not fit in a char literal, so it is written
// as a string (a surrogate pair) and its first code unit is tested.
Console.WriteLine(Char.IsSurrogate("\U00010F00", 0)); // Output: "True"
char test = '\xDFFF';
Console.WriteLine(test);                          // Output: '?' (a lone surrogate cannot be displayed)
Console.WriteLine(Char.GetUnicodeCategory(test)); // Output: "Surrogate"
```
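To connect `GetUnicodeCategory()` with the `IsXxx` helpers from the table, here is a small sketch that prints the category of a few sample characters and then checks some of the helpers. The sample characters and variable names are chosen only for illustration.

```c#
using System;
using System.Globalization;

char[] samples = { 'A', 'z', '7', ' ', '+', '$', '_', '\t' };

// Print each sample's code point and its UnicodeCategory.
foreach (char c in samples)
{
    Console.WriteLine($"U+{(int)c:X4} -> {char.GetUnicodeCategory(c)}");
}

UnicodeCategory dollarCategory = char.GetUnicodeCategory('$');      // CurrencySymbol
UnicodeCategory underscoreCategory = char.GetUnicodeCategory('_');  // ConnectorPunctuation
UnicodeCategory tabCategory = char.GetUnicodeCategory('\t');        // Control

bool dollarIsSymbol = char.IsSymbol('$');                // True
bool underscoreIsPunctuation = char.IsPunctuation('_');  // True
bool tabIsWhiteSpace = char.IsWhiteSpace('\t');          // True
bool tabIsSeparator = char.IsSeparator('\t');            // False: tab is a control character
```

The last two lines show why `IsWhiteSpace` and `IsSeparator` are not interchangeable.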
## 3. Globalization
C#'s `System.Char` has a rich set of methods for character processing, such as the commonly used `ToUpper` and `ToLower`.
However, character processing can be influenced by the user's language and culture settings.
For culture-independent processing, call the `System.Char` methods with the `Invariant` suffix, or pass `CultureInfo.InvariantCulture`.
Example:
```c#
Console.WriteLine(Char.ToUpper('i', CultureInfo.InvariantCulture));
Console.WriteLine(Char.ToUpperInvariant('i'));
```
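A well-known case where culture matters is the Turkish letter "i". The sketch below compares invariant casing with casing under the `tr-TR` culture. It assumes Turkish culture data is available on the machine, so the culture-specific result is printed rather than asserted.

```c#
using System;
using System.Globalization;

char invariantUpper = char.ToUpperInvariant('i');
Console.WriteLine($"invariant: U+{(int)invariantUpper:X4}");   // invariant: U+0049

try
{
    // Assumption: Turkish culture data is installed on this machine.
    var turkish = new CultureInfo("tr-TR");
    char turkishUpper = char.ToUpper('i', turkish);

    // With Turkish casing rules this is normally U+0130
    // (capital I with a dot above), not U+0049.
    Console.WriteLine($"tr-TR: U+{(int)turkishUpper:X4}");
}
catch (CultureNotFoundException)
{
    Console.WriteLine("tr-TR culture data is not available here.");
}
```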
The tables below describe the `StringComparison` values and `CultureInfo` properties that character and string methods commonly accept.
StringComparison
| Member | Value | Description |
| -------------------------- | ----------- | ----------------------------------------------------------------|
| CurrentCulture | 0 | Compare strings using culture-sensitive sorting rules based on the current culture |
| CurrentCultureIgnoreCase | 1 | Compare strings with culture-sensitive sorting rules based on current culture, ignoring case |
| InvariantCulture | 2 | Compare strings using culture-sensitive sorting rules with invariant culture |
| InvariantCultureIgnoreCase | 3 | Compare strings with culture-sensitive sorting rules with invariant culture, ignoring case |
| Ordinal | 4 | Compare strings using ordinal (binary) sorting rules |
| OrdinalIgnoreCase | 5 | Compare strings using ordinal (binary) sorting rules, ignoring case |
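As a brief illustration of the table, the following sketch compares two strings that differ only in case. The sample strings are hypothetical.

```c#
using System;

string left = "README.md";
string right = "readme.MD";

// Ordinal compares the raw UTF-16 values, so case differences matter.
bool ordinal = string.Equals(left, right, StringComparison.Ordinal);

// OrdinalIgnoreCase compares the same values after case folding.
bool ordinalIgnoreCase = string.Equals(left, right, StringComparison.OrdinalIgnoreCase);

// The enum's underlying value matches the table.
int enumValue = (int)StringComparison.OrdinalIgnoreCase;

Console.WriteLine(ordinal);             // False
Console.WriteLine(ordinalIgnoreCase);   // True
Console.WriteLine(enumValue);           // 5
```

Ordinal comparisons are a common choice for identifiers, file names, and protocol tokens, where linguistic rules should not apply.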
CultureInfo
| Property | Description |
| ------------------ | --------------------------------------------------------------- |
| CurrentCulture | Gets the CultureInfo object representing the culture used by the current thread |
| CurrentUICulture | Gets or sets the CultureInfo object representing the current user interface culture used by the resource manager at runtime |
| InstalledUICulture | Gets the CultureInfo representing the culture installed with the operating system |
| InvariantCulture | Gets the CultureInfo object that is culture-independent (invariant) |
| IsNeutralCulture | Gets a value indicating whether the current CultureInfo represents a neutral culture |
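The properties in the table can be inspected directly. In this sketch most of the printed values depend on the machine's settings, so only the invariant culture's name is annotated.

```c#
using System;
using System.Globalization;

CultureInfo invariant = CultureInfo.InvariantCulture;
string invariantName = invariant.Name;   // the invariant culture's name is the empty string

Console.WriteLine($"Invariant name: \"{invariantName}\"");   // Invariant name: ""

// The values below vary with the operating system and user settings.
Console.WriteLine($"CurrentCulture: {CultureInfo.CurrentCulture.Name}");
Console.WriteLine($"CurrentUICulture: {CultureInfo.CurrentUICulture.Name}");
Console.WriteLine($"InstalledUICulture: {CultureInfo.InstalledUICulture.Name}");
Console.WriteLine($"Current is neutral: {CultureInfo.CurrentCulture.IsNeutralCulture}");
```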
## 4. System.String String
### 4.1 String Search
Strings have several search methods: `StartsWith()`, `EndsWith()`, `Contains()`, and `IndexOf()`.
`StartsWith()` and `EndsWith()` accept a `StringComparison` value to control culture-related rules.
`StartsWith()`: checks whether a string starts with the given string.
`EndsWith()`: checks whether a string ends with the given string.
`Contains()`: checks whether a string contains the given string at any position.
`IndexOf()`: returns the index at which a string or character first appears; a return value of `-1` means there is no match.
Usage example:
```c#
string a = "痴者工良(高级程序员劝退师)";
Console.WriteLine(a.StartsWith("高级"));
Console.WriteLine(a.StartsWith("高级", StringComparison.CurrentCulture));
Console.WriteLine(a.StartsWith("高级", true, CultureInfo.CurrentCulture));
Console.WriteLine(a.StartsWith("痴者", StringComparison.CurrentCulture));
Console.WriteLine(a.EndsWith("劝退师)", true, CultureInfo.CurrentCulture));
Console.WriteLine(a.IndexOf("高级", StringComparison.CurrentCulture));
```
Output:
```
False
False
False
True
True
5
```
In .NET Framework, `Contains()` has a single overload, while the other three methods have several (newer .NET versions also add `StringComparison` overloads to `Contains()`). For example, `StartsWith()` and `EndsWith()` offer:
| Overload | Description |
| ------------------------------ | ------------------------------------- |
| (String) | Checks for a match with the specified string |
| (String, StringComparison) | Checks for a match using the specified comparison rule |
| (String, Boolean, CultureInfo) | Controls case sensitivity and the culture rules used for the match |
These globalization and case matching rules will be discussed further in later chapters.
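Until then, here is a short sketch of these overloads in use. The sample string is hypothetical and uses only ASCII so that the results do not depend on the current culture.

```c#
using System;
using System.Globalization;

string title = "Senior Developer (Team Lead)";

bool caseSensitive = title.StartsWith("senior", StringComparison.Ordinal);
bool ignoreCase = title.StartsWith("senior", StringComparison.OrdinalIgnoreCase);
bool withCulture = title.EndsWith("LEAD)", true, CultureInfo.InvariantCulture);

int found = title.IndexOf("Developer", StringComparison.Ordinal);
int missing = title.IndexOf("Manager", StringComparison.Ordinal);

Console.WriteLine(caseSensitive);   // False
Console.WriteLine(ignoreCase);      // True
Console.WriteLine(withCulture);     // True
Console.WriteLine(found);           // 7
Console.WriteLine(missing);         // -1
```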
### 4.2 String Extraction, Insertion, Deletion, Replacement
#### 4.2.1 Extraction
The `Substring()` method extracts a specified number of characters starting from a given index, or all remaining characters when no length is given.
```c#
string a = "痴者工良(高级程序员劝退师)";
Console.WriteLine(a.Substring(startIndex: 1, length: 3));
// 者工良
Console.WriteLine(a.Substring(startIndex: 5));
// 高级程序员劝退师)
```
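`Substring()` is often combined with `IndexOf()` to cut out a delimited part of a string. A minimal sketch with a hypothetical file name follows; the guard avoids the `ArgumentOutOfRangeException` that `Substring()` throws for invalid indexes.

```c#
using System;

string text = "report(final).txt";

int open = text.IndexOf('(');
int close = text.IndexOf(')');

string inside = "";
string tail = "";

// Only slice when both delimiters were found in the expected order.
if (open >= 0 && close > open)
{
    inside = text.Substring(open + 1, close - open - 1);
    tail = text.Substring(close + 1);
}

Console.WriteLine(inside);   // final
Console.WriteLine(tail);     // .txt
```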
#### 4.2.2 Insertion, Deletion, Replacement
Strings are immutable, so each of the methods below returns a new string.
`Insert()`: inserts a string at a specified index.
`Remove()`: removes characters starting at a specified index, either a given count or everything up to the end.
`PadLeft()`: pads the string on the left to a specified total length with a given character (a space by default).
`PadRight()`: pads the string on the right to a specified total length with a given character (a space by default).
`TrimStart()`: removes the specified characters from the start of the string, stopping at the first character that is not in the set.
`TrimEnd()`: removes the specified characters from the end of the string, stopping at the first character that is not in the set.
`Replace()`: replaces every occurrence of a character or substring with another character or substring.
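The following sketch runs each of these methods once on small sample strings (the values are illustrative only).

```c#
using System;

string s = "hello";

string inserted = s.Insert(5, " world");        // "hello world"
string removedTail = inserted.Remove(5);        // "hello"  (everything from index 5 on)
string removedHead = inserted.Remove(0, 6);     // "world"  (6 characters from index 0)
string paddedLeft = "42".PadLeft(5, '0');       // "00042"
string paddedRight = "42".PadRight(5, '*');     // "42***"
string trimmedStart = "xxhixx".TrimStart('x');  // "hixx"
string trimmedEnd = "xxhixx".TrimEnd('x');      // "xxhi"
string replaced = "a-b-c".Replace("-", "+");    // "a+b+c"

// The original string is untouched because strings are immutable.
Console.WriteLine(s);          // hello
Console.WriteLine(inserted);   // hello world
Console.WriteLine(replaced);   // a+b+c
```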
## 5. String Intern Pool
The following is a personal summary by the author. Please feel free to criticize or correct any errors.

The string intern pool is maintained at the domain level and can be shared across all assemblies within the domain.
The CLR maintains a table called the intern pool.
This table records references to all string instances declared using literal strings in the code.
When literal strings are concatenated in a constant expression, the compiler folds them into a single literal, and that new string enters the string intern pool as well.
Only string instances created from **literal strings** (or explicitly passed through `String.Intern()`) reference strings from the string intern pool.
String literals used in fields, properties, method bodies, or even default values for method parameters all enter the string intern pool.
For example:
```c#
static string test = "一个测试";

static void Main(string[] args)
{
    string a = "a";
    Console.WriteLine("test:" + test.GetHashCode());
    TestOne(test);
    TestTwo();              // uses the default parameter value
    TestThree("一个测试");  // passes an identical literal
}

public static void TestOne(string a)
{
    Console.WriteLine("----TestOne-----");
    Console.WriteLine("a:" + a.GetHashCode());
    string b = a;
    Console.WriteLine("b:" + b.GetHashCode());
    Console.WriteLine("test - a :" + Object.ReferenceEquals(test, a));
}

public static void TestTwo(string a = "一个测试")
{
    Console.WriteLine("----TestTwo-----");
    Console.WriteLine("a:" + a.GetHashCode());
    string b = a;
    Console.WriteLine("b:" + b.GetHashCode());
    Console.WriteLine("test - a :" + Object.ReferenceEquals(test, a));
}

public static void TestThree(string a)
{
    Console.WriteLine("----TestThree-----");
    Console.WriteLine("a:" + a.GetHashCode());
    string b = a;
    Console.WriteLine("b:" + b.GetHashCode());
    Console.WriteLine("test - a :" + Object.ReferenceEquals(test, a));
}
```
Output:
```
test:-407145577
----TestOne-----
a:-407145577
b:-407145577
test - a :True
----TestTwo-----
a:-407145577
b:-407145577
test - a :True
----TestThree-----
a:-407145577
b:-407145577
test - a :True
```
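To check whether two strings are the same reference, use the static method `Object.ReferenceEquals(s1, s2)`.
The instance method `.GetHashCode()` is computed from the string's content, so equal hash codes show only that the contents match, not that the references are the same; the hash values themselves can also differ between runs and runtimes.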
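The difference between content equality and reference equality can be sketched with a string built at run time. The string values are illustrative, and the interning result described in the comments is what a typical JIT-compiled run produces.

```c#
using System;

string literal = "intern-demo";

// Built at run time from a char array, so it is a separate object
// that merely has the same content as the literal.
string built = new string("intern-demo".ToCharArray());

bool sameContent = literal == built;                            // True
bool sameHash = literal.GetHashCode() == built.GetHashCode();   // True: the hash depends on content
bool sameReference = object.ReferenceEquals(literal, built);    // False: two distinct objects

// String.Intern returns the pooled instance with this content,
// which on a typical run is the literal's instance.
string interned = string.Intern(built);
bool internedIsLiteral = object.ReferenceEquals(literal, interned);

Console.WriteLine($"{sameContent} {sameHash} {sameReference} {internedIsLiteral}");
```

Equal hash codes alongside `sameReference == False` show why `GetHashCode()` cannot stand in for `Object.ReferenceEquals()`.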
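Unsafe code can modify a string directly in memory, including a string held in the intern pool.
Refer to https://blog.benoitblanchon.fr/modify-intern-pool/
```c#
// Requires compiling with unsafe code enabled (AllowUnsafeBlocks).
string a = "Test";
unsafe
{
    fixed (char* p = a)
    {
        p[1] = '3';   // overwrites the second character in place
    }
}
Console.WriteLine(a);
```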
Using *Microsoft.Diagnostics.Runtime*, you can obtain CLR information.
After extensive searching, the author found that .NET does not provide an API for viewing the hash table inside the string intern pool.
For more about C# strings and the principles of the intern pool, please refer to
http://community.bartdesmet.net/blogs/bart/archive/2006/09/27/4472.aspx
To find ways to get a list of string literals in assemblies
https://stackoverflow.com/questions/22172175/read-the-content-of-the-string-intern-pool
Description of the .NET Profiling API
https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/profiling-overview?redirectedfrom=MSDN
Improving string comparison performance with .NET's string intern pool
http://benhall.io/net-string-interning-to-improve-performance/
Learning articles about C# string intern pool
https://www.cnblogs.com/mingxuantongxue/p/3782391.html
https://www.xuebuyuan.com/189297.html
If there are any errors in the summary or knowledge, please kindly point them out.