Term | What it really is | Why it matters to the compiler |
Unicode scalar value | Any code point except the surrogate range 0xD800–0xDFFF | A compiler can treat one scalar value as one code point, with no UTF‑16 artefacts to handle |
Surrogate pair | Two UTF‑16 code units that encode one scalar value ≥ 0x10000 | Only relevant if the data is stored in UTF‑16 |
UTF‑32 / UCS‑4 char | One 32‑bit integer that equals the scalar value | No surrogate logic needed; every code point fits |
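
To make the "two code units for one scalar value" row concrete, here is a minimal Go sketch of the surrogate arithmetic. The function names [tt]encodeSurrogates[/tt] and [tt]decodeSurrogates[/tt] are made up for this post and are not part of any library; the sketch also assumes the input is already known to be ≥ 0x10000 (encoding) or a valid high/low pair (decoding).

```go
package main

import "fmt"

// encodeSurrogates splits a scalar value >= 0x10000 into a UTF-16
// high/low surrogate pair. Illustrative helper, not a library API;
// it does not validate its input.
func encodeSurrogates(r rune) (hi, lo uint16) {
	v := uint32(r) - 0x10000          // 20 significant bits remain
	hi = uint16(0xD800 + (v >> 10))   // top 10 bits go into the high surrogate
	lo = uint16(0xDC00 + (v & 0x3FF)) // bottom 10 bits go into the low surrogate
	return hi, lo
}

// decodeSurrogates reverses encodeSurrogates for a valid pair.
func decodeSurrogates(hi, lo uint16) rune {
	return rune(uint32(hi-0xD800)<<10|uint32(lo-0xDC00)) + 0x10000
}

func main() {
	hi, lo := encodeSurrogates(0x1F600)
	fmt.Printf("U+1F600 -> %04X %04X\n", hi, lo)                         // U+1F600 -> D83D DE00
	fmt.Printf("%04X %04X -> U+%X\n", hi, lo, decodeSurrogates(hi, lo)) // D83D DE00 -> U+1F600
}
```

In UTF‑32 none of this is needed, because the 32‑bit cell simply holds 0x1F600.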
Aspect | How Go handles it |
Source encoding | Compiler expects the file to be valid UTF‑8 |
String / rune literals | [tt]\uXXXX[/tt] escapes may NOT name a surrogate; use [tt]\U0001F600[/tt] instead of a pair |
Data representation | [tt]string[/tt] = immutable byte sequence, UTF‑8 by convention; [tt]rune[/tt] = alias for [tt]int32[/tt] holding one code point (in effect a UTF‑32 cell) |
Library helpers | [tt]encoding/utf16[/tt] offers [tt]Encode[/tt]/[tt]Decode[/tt]/[tt]IsSurrogate[/tt] |
JSON & friends | [tt]encoding/json[/tt] merges an escaped [tt]\uXXXX\uXXXX[/tt] surrogate pair into a single rune; an unpaired surrogate becomes U+FFFD |
Every‑day impact | You never see surrogates unless you purposely handle UTF‑16 |
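
A small Go sketch of the rows above, using only the standard library. The helper [tt]decodeJSONString[/tt] is a name invented for this example; everything else ([tt]json.Unmarshal[/tt], [tt]utf16.Encode[/tt], [tt]utf16.IsSurrogate[/tt]) is stock Go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf16"
)

// decodeJSONString unmarshals a JSON string literal and returns its runes.
// Illustrative helper, not part of encoding/json.
func decodeJSONString(lit string) ([]rune, error) {
	var s string
	if err := json.Unmarshal([]byte(lit), &s); err != nil {
		return nil, err
	}
	return []rune(s), nil
}

func main() {
	// A non-BMP character must be written with \U; "\uD83D\uDE00" is rejected by the compiler.
	s := "\U0001F600"
	fmt.Println(len(s), len([]rune(s))) // 4 bytes of UTF-8, 1 rune

	// An escaped surrogate pair in JSON comes back as one rune.
	paired, _ := decodeJSONString(`"\uD83D\uDE00"`)
	fmt.Println(len(paired), paired[0] == 0x1F600)

	// An unpaired surrogate is replaced with U+FFFD rather than reported as an error.
	lone, _ := decodeJSONString(`"\uD83D"`)
	fmt.Println(len(lone), lone[0] == 0xFFFD)

	// Surrogates only appear once you explicitly ask for UTF-16.
	units := utf16.Encode([]rune(s))
	fmt.Println(len(units), utf16.IsSurrogate(rune(units[0])))
}
```

The only place a surrogate shows up is the last step, where the string is deliberately converted to UTF‑16.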
Concept | Free Pascal type / facility |
UTF‑16 storage | [tt]WideChar[/tt] (16‑bit) & [tt]UnicodeString[/tt]/[tt]WideString[/tt]; non‑BMP stored as two cells |
Surrogate helpers | Class [tt]TCharacter[/tt] in unit [tt]Character[/tt] (Delphi‑compatible): [tt]IsSurrogate[/tt], [tt]ConvertToUtf32[/tt], [tt]ConvertFromUtf32[/tt] |
UTF‑32 storage | [tt]UCS4Char[/tt] (32‑bit) & dynamic [tt]UCS4String[/tt] |
Source code | Declare UTF‑8 source with [tt]{$codepage utf8}[/tt] (or a UTF‑8 BOM); a pair can also be written out as [tt]#$D83D#$DE00[/tt] |
Conversions | [tt]SysUtils[/tt] plus [tt]LazUTF8[/tt] (from the Lazarus LazUtils package) for UTF‑8 ⇄ UTF‑16 / UTF‑32 |
Every‑day impact | If you stick to [tt]UTF8String[/tt] or [tt]UCS4String[/tt], you rarely care about surrogates |
program Smile32;
{$mode objfpc}{$H+}
uses
  SysUtils, Character;
var
  s16: UnicodeString;
  cp: UCS4Char;
begin
  s16 := #$D83D#$DE00;                      // 😀 as a UTF‑16 surrogate pair
  cp := TCharacter.ConvertToUtf32(s16, 1);  // decode starting at index 1 -> $1F600
  Writeln('U+' + IntToHex(cp, 6));          // prints U+01F600
end.
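
For comparison, a rough Go counterpart of the program above. [tt]firstCodePoint[/tt] is a hypothetical helper written for this post to mirror the [tt]ConvertToUtf32(s16, 1)[/tt] call; it is not a standard library function, and returning U+FFFD for bad input is my own choice here, not something the Pascal routine does.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// firstCodePoint returns the code point at the start of a UTF-16 buffer
// and the number of 16-bit cells it occupies. Hypothetical helper.
// Empty input or a lone surrogate yields U+FFFD.
func firstCodePoint(u []uint16) (rune, int) {
	if len(u) == 0 {
		return 0xFFFD, 0
	}
	if len(u) >= 2 {
		// utf16.DecodeRune returns U+FFFD unless given a valid high+low pair.
		if r := utf16.DecodeRune(rune(u[0]), rune(u[1])); r != 0xFFFD {
			return r, 2
		}
	}
	if utf16.IsSurrogate(rune(u[0])) {
		return 0xFFFD, 1 // lone surrogate
	}
	return rune(u[0]), 1 // ordinary BMP code point
}

func main() {
	s16 := []uint16{0xD83D, 0xDE00} // same pair as in the Pascal program
	cp, n := firstCodePoint(s16)
	fmt.Printf("U+%06X (%d cells)\n", cp, n) // U+01F600 (2 cells)
}
```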
Aspect | Go | Free Pascal |
Default string encoding | UTF‑8 | Platform‑dependent (UTF‑8 on *nix, UTF‑16 on Windows) |
"One code‑point" primitive | [tt]rune[/tt] | [tt]UCS4Char[/tt] |
Literal rejects half‑surrogate? | Yes (compile‑time error) | No (validated at run time if you ask) |
Std‑lib helpers | [tt]encoding/utf16[/tt] | [tt]TCharacter.*[/tt] (unit [tt]Character[/tt]) |
Need to think about pairs daily? | Rarely | Only when you keep data in UTF‑16 |