Exploring char and string in C# (Part One)

December 15, 2019

1. System.Char Character

char is an alias for System.Char.

System.Char occupies two bytes, which is 16 binary bits.

System.Char stores a single UTF-16 code unit, which is enough for any Unicode character in the Basic Multilingual Plane.

The range of System.Char is U+0000 to U+FFFF, and the default value of char is \0, i.e., U+0000. A character above U+FFFF needs two char values (a surrogate pair).

A Unicode code point is conventionally written as U+ followed by a group of hexadecimal digits.

The char type can be assigned in four ways:

            char a = 'j';
            char b = '\u006A';
            char c = '\x006A';
            char d = (char) 106;
            Console.WriteLine($"{a} | {b} | {c} | {d}");

Output:

j | j | j | j

The \u prefix introduces a Unicode escape sequence, which must be followed by exactly four hexadecimal digits.

\u006A    Valid
\u06A     Invalid
\u6A      Invalid

The \x prefix introduces a hexadecimal escape sequence, which is followed by one to four hexadecimal digits, so leading zeros can be omitted. The following examples all represent the same character.

\x006A
\x06A
\x6A
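Because \x takes a variable number of digits, it can swallow hexadecimal-looking characters that follow it inside a string. The sketch below illustrates the difference from \u; the class and field names are made up for this illustration.

```csharp
using System;

public static class EscapeDemo
{
    // \u always consumes exactly four hex digits, so 'b' and 'c' stay separate characters.
    public const string FixedWidth = "\u006Abc";

    // \x consumes up to four hex digits, so "6Abc" is read as one code unit, U+6ABC.
    public const string VariableWidth = "\x6Abc";

    public static void Main()
    {
        Console.WriteLine(FixedWidth.Length);                 // 3
        Console.WriteLine(VariableWidth.Length);              // 1
        Console.WriteLine((int)VariableWidth[0] == 0x6ABC);   // True
    }
}
```

For this reason \u is usually the safer choice inside string literals.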

The char type can be implicitly converted to the integer types ushort, int, uint, long, and ulong. It can also be implicitly converted to float, double, and decimal.

The char type can be explicitly converted to sbyte, byte, and short.

Other types cannot be implicitly converted to char, but any integer or floating-point type can be explicitly converted to char.
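The conversion rules above can be sketched as follows. The helper names (Widen, Narrow, FromDouble) are hypothetical and exist only to make each conversion visible.

```csharp
using System;

public static class CharConversionDemo
{
    // Implicit conversion: no cast is needed from char to int.
    public static int Widen(char c) => c;

    // Explicit conversion: a cast is required from char to byte.
    public static byte Narrow(char c) => (byte)c;

    // Explicit conversion: a cast is required from double to char;
    // the fractional part is discarded.
    public static char FromDouble(double d) => (char)d;

    public static void Main()
    {
        double asDouble = 'j';                    // implicit char -> double

        Console.WriteLine(Widen('j'));            // 106
        Console.WriteLine(asDouble);              // 106
        Console.WriteLine(Narrow('j'));           // 106
        Console.WriteLine(FromDouble(106.9));     // j
    }
}
```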

2. Character Processing

In System.Char, there are many static methods that help identify and process characters.

A very important enumeration is UnicodeCategory.

  public enum UnicodeCategory
  {
    UppercaseLetter,
    LowercaseLetter,
    TitlecaseLetter,
    ModifierLetter,
    OtherLetter,
    NonSpacingMark,
    SpacingCombiningMark,
    EnclosingMark,
    DecimalDigitNumber,
    LetterNumber,
    OtherNumber,
    SpaceSeparator,
    LineSeparator,
    ParagraphSeparator,
    Control,
    Format,
    Surrogate,
    PrivateUse,
    ConnectorPunctuation,
    DashPunctuation,
    OpenPunctuation,
    ClosePunctuation,
    InitialQuotePunctuation,
    FinalQuotePunctuation,
    OtherPunctuation,
    MathSymbol,
    CurrencySymbol,
    ModifierSymbol,
    OtherSymbol,
    OtherNotAssigned,
  }

System.Char has a static method, GetUnicodeCategory(), that returns the Unicode category of a character as one of the enumeration values above.

In addition to GetUnicodeCategory(), there are dedicated static methods for testing whether a character belongs to a particular category.

The following table lists these static methods, what they match, and the corresponding enumeration values.

| Static Method | Description | UnicodeCategory Values |
|---------------|-------------|------------------------|
| IsControl | Non-printable control characters: U+0000 to U+001F (e.g., \r, \n, \t, \0) and U+007F to U+009F | Control |
| IsDigit | Digits 0-9 and decimal digits of other scripts | DecimalDigitNumber |
| IsLetter | A-Z, a-z, and letters of other alphabets | UppercaseLetter, LowercaseLetter, TitlecaseLetter, ModifierLetter, OtherLetter |
| IsLetterOrDigit | Letters and digits | Union of the IsLetter and IsDigit categories |
| IsLower | Lowercase letters | LowercaseLetter |
| IsNumber | Digits, Unicode fractions, Roman numerals | DecimalDigitNumber, LetterNumber, OtherNumber |
| IsPunctuation | Punctuation in Western and other scripts | ConnectorPunctuation, DashPunctuation, OpenPunctuation, ClosePunctuation, InitialQuotePunctuation, FinalQuotePunctuation, OtherPunctuation |
| IsSeparator | Spaces and all Unicode separators | SpaceSeparator, LineSeparator, ParagraphSeparator |
| IsSurrogate | UTF-16 surrogate code units, U+D800 to U+DFFF | Surrogate |
| IsSymbol | Most other printable symbols | MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol |
| IsUpper | Uppercase letters | UppercaseLetter |
| IsWhiteSpace | All separators plus \t, \n, \r, \v, \f | SpaceSeparator, LineSeparator, ParagraphSeparator |

Example:

        char chA = 'A';
        char ch1 = '1';
        string str = "test string";

        Console.WriteLine(chA.CompareTo('B'));          // Output: "-1" ('A' is 1 less than 'B')
        Console.WriteLine(chA.Equals('A'));             // Output: "True"
        Console.WriteLine(Char.GetNumericValue(ch1));   // Output: "1"
        Console.WriteLine(Char.IsControl('\t'));        // Output: "True"
        Console.WriteLine(Char.IsDigit(ch1));           // Output: "True"
        Console.WriteLine(Char.IsLetter(','));          // Output: "False"
        Console.WriteLine(Char.IsLower('u'));           // Output: "True"
        Console.WriteLine(Char.IsNumber(ch1));          // Output: "True"
        Console.WriteLine(Char.IsPunctuation('.'));     // Output: "True"
        Console.WriteLine(Char.IsSeparator(str, 4));    // Output: "True"
        Console.WriteLine(Char.IsSymbol('+'));          // Output: "True"
        Console.WriteLine(Char.IsWhiteSpace(str, 4));   // Output: "True"
        Console.WriteLine(Char.Parse("S"));             // Output: "S"
        Console.WriteLine(Char.ToLower('M'));           // Output: "m"
        Console.WriteLine('x'.ToString());              // Output: "x"

        // A code point above U+FFFF does not fit in a char literal, so use a string;
        // index 0 holds the high surrogate of the pair.
        Console.WriteLine(Char.IsSurrogate("\U00010F00", 0)); // Output: "True"
        char test = '\xDFFF';                           // a lone low surrogate
        Console.WriteLine(test);                        // Output: "?" (the glyph depends on the console encoding)
        Console.WriteLine(Char.GetUnicodeCategory(test)); // Output: "Surrogate"
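When a string may contain characters above U+FFFF, classifying each char in isolation only ever reports Surrogate for them. The following is a minimal sketch of one way to walk a string by code point instead. It relies on CharUnicodeInfo.GetUnicodeCategory(String, Int32), which reports the category of a whole surrogate pair; the CategoryAt helper and the sample text are assumptions made for this illustration.

```csharp
using System;
using System.Globalization;

public static class CategoryDemo
{
    // Category of the code point that starts at index i. For a valid surrogate
    // pair this is the category of the pair, not UnicodeCategory.Surrogate.
    public static UnicodeCategory CategoryAt(string s, int i) =>
        CharUnicodeInfo.GetUnicodeCategory(s, i);

    public static void Main()
    {
        // A letter, a digit, a space, and one emoji that occupies two chars.
        string text = "a1 \U0001F600";

        for (int i = 0; i < text.Length; i++)
        {
            Console.WriteLine($"index {i}: {CategoryAt(text, i)}");
            if (char.IsSurrogatePair(text, i))
            {
                i++; // skip the low surrogate; it belongs to the same code point
            }
        }
    }
}
```

The category printed for the emoji depends on the Unicode data shipped with the runtime, so it is not listed here.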


3. Globalization

In C#, System.Char provides many methods for character processing, such as the commonly used ToUpper() and ToLower().

However, character processing is influenced by the user's culture. For example, under the Turkish culture (tr-TR) the uppercase form of i is İ (U+0130), not I.

For culture-independent processing, call the System.Char methods with the Invariant suffix, or pass CultureInfo.InvariantCulture to the overloads that accept a culture.

Example:

            Console.WriteLine(Char.ToUpper('i',CultureInfo.InvariantCulture));
            Console.WriteLine(Char.ToUpperInvariant('i'));

The character and string processing overloads take the following types as parameters to control comparison and culture behavior.

StringComparison

| Enumeration | Value | Description |
|----------------------------------|--------|------------------------------------------------------------------|
| CurrentCulture | 0 | Compare strings using culture-sensitive ordering based on current culture |
| CurrentCultureIgnoreCase | 1 | Compare strings using culture-sensitive ordering based on current culture, ignoring case |
| InvariantCulture | 2 | Compare strings using culture-sensitive ordering based on invariant culture |
| InvariantCultureIgnoreCase | 3 | Compare strings using culture-sensitive ordering based on invariant culture, ignoring case |
| Ordinal | 4 | Compare strings using ordinal (binary) ordering |
| OrdinalIgnoreCase | 5 | Compare strings using ordinal (binary) ordering, ignoring case |
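The ordinal members are the easiest to reason about because they do not depend on the machine's culture settings. A small sketch, with a hypothetical SameIgnoringCase helper:

```csharp
using System;

public static class ComparisonDemo
{
    // Culture-independent, case-insensitive equality check.
    public static bool SameIgnoringCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        Console.WriteLine(string.Equals("abc", "ABC", StringComparison.Ordinal)); // False
        Console.WriteLine(SameIgnoringCase("abc", "ABC"));                        // True

        // Ordinal compares UTF-16 code units: 'a' (97) sorts after 'B' (66).
        Console.WriteLine(string.Compare("a", "B", StringComparison.Ordinal) > 0);           // True

        // Ignoring case, 'a' is compared as 'A' (65), which sorts before 'B' (66).
        Console.WriteLine(string.Compare("a", "B", StringComparison.OrdinalIgnoreCase) < 0); // True
    }
}
```

The culture-sensitive members can give different answers on different machines, which is why their results are not shown here.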

CultureInfo

| Property | Description |
|-------------------------|---------------------------------------------------------------|
| CurrentCulture | Gets the CultureInfo object representing the culture used by the current thread |
| CurrentUICulture | Gets or sets the CultureInfo object used by the resource manager to look up culture-specific resources at runtime |
| InstalledUICulture | Gets the CultureInfo object representing the installed culture of the operating system |
| InvariantCulture | Gets the CultureInfo object that is culture-independent (fixed) |
| IsNeutralCulture | Gets a value indicating whether the current CultureInfo represents a neutral culture |

4. System.String String

4.1 String Search

Strings have several search methods: StartsWith(), EndsWith(), Contains(), and IndexOf().

StartsWith() and EndsWith() accept a StringComparison value to select the comparison rule, or a CultureInfo to apply culture-specific rules.

StartsWith(): Checks whether the string starts with a specified string.

EndsWith(): Checks whether the string ends with a specified string.

Contains(): Checks whether the specified string occurs anywhere in the string.

IndexOf(): Returns the index of the first occurrence of a string or character. A return value of -1 means there is no match.

Usage Example:

            string a = "痴者工良(高级程序员劝退师)";
            Console.WriteLine(a.StartsWith("高级"));
            Console.WriteLine(a.StartsWith("高级",StringComparison.CurrentCulture));
            Console.WriteLine(a.StartsWith("高级",true, CultureInfo.CurrentCulture));
            Console.WriteLine(a.StartsWith("痴者",StringComparison.CurrentCulture));
            Console.WriteLine(a.EndsWith("劝退师)",true, CultureInfo.CurrentCulture));
            Console.WriteLine(a.IndexOf("高级",StringComparison.CurrentCulture));

Output:

False
False
False
True
True
5

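StartsWith() and EndsWith() share the overloads below. IndexOf() also accepts a StringComparison (along with start index and count overloads) but has no (String, Boolean, CultureInfo) form, and Contains() only gained a StringComparison overload in .NET Core 2.1 and later.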

| Overload | Description |
|---------------------------------|--------------------------------------|
| (String) | Checks if it matches the specified string |
| (String, StringComparison) | Specifies how to compare the string |
| (String, Boolean, CultureInfo) | Controls case and culture rules for string matching |

These globalization and case matching rules will be discussed in later chapters.
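IndexOf() only reports the first match, but its start-index overload makes it easy to find the rest. The following is a minimal sketch; AllIndexesOf is a hypothetical helper, not a framework method, and it uses ordinal comparison so the result does not depend on culture.

```csharp
using System;
using System.Collections.Generic;

public static class SearchDemo
{
    // Returns the start index of every (possibly overlapping) occurrence of value in text.
    public static List<int> AllIndexesOf(string text, string value)
    {
        var result = new List<int>();
        if (string.IsNullOrEmpty(value))
        {
            return result; // an empty pattern would match at every position
        }

        int i = text.IndexOf(value, StringComparison.Ordinal);
        while (i >= 0)
        {
            result.Add(i);
            // Resume the search one position after the previous match.
            i = text.IndexOf(value, i + 1, StringComparison.Ordinal);
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", AllIndexesOf("banana", "an"))); // 1, 3
        Console.WriteLine(AllIndexesOf("banana", "xyz").Count);             // 0
    }
}
```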

4.2 String Extraction, Insertion, Deletion, Replacement

4.2.1 Extraction

The Substring() method extracts either a given number of characters starting at a specified index, or all remaining characters from that index to the end.

            string a = "痴者工良(高级程序员劝退师)";
            Console.WriteLine(a.Substring(startIndex: 1, length: 3));
            // 者工良
            Console.WriteLine(a.Substring(startIndex: 5));
            // 高级程序员劝退师)

4.2.2 Insertion, Deletion, Replacement

Use the following methods:

Insert(): Inserts a string at the specified index.

Remove(): Removes all characters from the specified index to the end, or a given number of characters starting at that index.

PadLeft(): Pads the string on the left with a specified character (a space by default) until it is N characters long.

PadRight(): Pads the string on the right with a specified character (a space by default) until it is N characters long.

TrimStart(): Removes the specified characters (whitespace by default) from the left side of the string, stopping at the first character that does not match.

TrimEnd(): Removes the specified characters (whitespace by default) from the right side of the string, stopping at the first character that does not match.

Replace(): Replaces every occurrence of a character or string with another character or string.

Strings are immutable, so each of these methods returns a new string and leaves the original unchanged.
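A short sketch of these methods in use. The sample strings and the ZeroPad helper are made up for illustration.

```csharp
using System;

public static class EditDemo
{
    // Hypothetical helper: left-pads a digit string with zeros.
    public static string ZeroPad(string digits, int width) => digits.PadLeft(width, '0');

    public static void Main()
    {
        string s = "Hello";

        Console.WriteLine(s.Insert(5, ", World"));        // Hello, World
        Console.WriteLine("Hello, World".Remove(5));      // Hello
        Console.WriteLine("Hello, World".Remove(5, 2));   // HelloWorld
        Console.WriteLine(ZeroPad("42", 5));              // 00042
        Console.WriteLine("42".PadRight(5, '.'));         // 42...
        Console.WriteLine("xxhixx".TrimStart('x'));       // hixx
        Console.WriteLine("xxhixx".TrimEnd('x'));         // xxhi
        Console.WriteLine("a-b-c".Replace("-", "+"));     // a+b+c
        Console.WriteLine(s);                             // Hello (the original is unchanged)
    }
}
```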


5. String Intern Pool

The following is the author's own summary. My knowledge is limited, so corrections are welcome if you spot any mistakes.

![images](https://img2018.cnblogs.com/blog/1315495/201811/1315495-20181129151124078-244644004.png)

The string intern pool operates at the application domain level and is shared by all assemblies loaded in that domain.

The CLR maintains a table called the intern pool.

This table holds a reference to every string instance that is declared as a literal in the code.

When literals are concatenated, as in "a" + "b", the compiler folds them into a single literal, and the resulting string also enters the intern pool.

Only strings declared as **literals** refer to instances in the intern pool; strings built at run time are not added unless String.Intern() is called.

It does not matter where the literal appears: a field or property initializer, a local variable in a method, and even the default value of a method parameter all enter the intern pool.

For example:

```c#
        static string test = "一个测试";

        static void Main(string[] args)
        {
            string a = "a";

            Console.WriteLine("test:" + test.GetHashCode());
            
            TestOne(test);
            TestTwo(test);
            TestThree("一个测试");
        }

        public static void TestOne(string a)
        {
            Console.WriteLine("----TestOne-----");
            Console.WriteLine("a:" + a.GetHashCode());
            string b = a;
            Console.WriteLine("b:" + b.GetHashCode());
            Console.WriteLine("test - a :" + Object.ReferenceEquals(test, a));
        }

        public static void TestTwo(string a = "一个测试")
        {
            Console.WriteLine("----TestTwo-----");
            Console.WriteLine("a:" + a.GetHashCode());
            string b = a;
            Console.WriteLine("b:" + b.GetHashCode());
            Console.WriteLine("test - a :" + Object.ReferenceEquals(test, a));
        }

        public static void TestThree(string a)
        {
            Console.WriteLine("----TestThree-----");
            Console.WriteLine("a:" + a.GetHashCode());
            string b = a;
            Console.WriteLine("b:" + b.GetHashCode());
            Console.WriteLine("test - a :" + Object.ReferenceEquals(test, a));
        }
```

Output:

test:-407145577
----TestOne-----
a:-407145577
b:-407145577
test - a :True
----TestTwo-----
a:-407145577
b:-407145577
test - a :True
----TestThree-----
a:-407145577
b:-407145577
test - a :True

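To check whether two strings are the same reference, use the static method Object.ReferenceEquals(s1, s2). Matching GetHashCode() values are only a hint: String.GetHashCode() is computed from the contents, so two equal strings return the same hash code even when they are separate objects.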
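The difference between equal contents and equal references shows up with strings built at run time. A minimal sketch, using a hypothetical Build helper that creates a fresh string object on every call:

```csharp
using System;

public static class InternDemo
{
    // Builds a new string object at run time, so the result is not a literal.
    public static string Build() => new string(new[] { 'h', 'e', 'l', 'l', 'o' });

    public static void Main()
    {
        string first = Build();
        string second = Build();

        Console.WriteLine(first == second);                             // True: same contents
        Console.WriteLine(first.GetHashCode() == second.GetHashCode()); // True: the hash comes from the contents
        Console.WriteLine(object.ReferenceEquals(first, second));       // False: two separate objects

        // String.Intern returns the pooled instance for the given contents.
        Console.WriteLine(object.ReferenceEquals(string.Intern(first), string.Intern(second))); // True
    }
}
```

Interning trades a lookup on every call for fewer duplicate objects, so it tends to pay off only for strings that repeat often and live long.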

You can use unsafe code to directly modify strings in memory.

Refer to https://blog.benoitblanchon.fr/modify-intern-pool/

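string a = "Test";

// Requires an unsafe context and the AllowUnsafeBlocks compiler option.
unsafe
{
    fixed (char* p = a)
    {
        p[1] = '3'; // overwrites 'e' in the string's own memory
    }
}

Console.WriteLine(a);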

The *Microsoft.Diagnostics.Runtime* library (ClrMD) can be used to retrieve information from the CLR.

After much research, the author found that .NET does not provide an API for viewing the hash table behind the string intern pool.

For information on the usage of C# strings and the principles behind the intern pool, please refer to:

http://community.bartdesmet.net/blogs/bart/archive/2006/09/27/4472.aspx

Obtaining a list of string literals from an assembly:

https://stackoverflow.com/questions/22172175/read-the-content-of-the-string-intern-pool

Documentation on the .NET Profiling API:

https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/profiling-overview?redirectedfrom=MSDN

.NET string intern pooling and improving string comparison performance:

http://benhall.io/net-string-interning-to-improve-performance/

Learning articles about C# string intern pool:

https://www.cnblogs.com/mingxuantongxue/p/3782391.html

https://www.xuebuyuan.com/189297.html

If there are any errors in the summary or knowledge, please feel free to correct me.

痴者工良

高级程序员劝退师
