TextEncoding

From Xojo Documentation

Revision as of 18:56, 19 November 2009 by WikiSysop (talk) (1 revision)


Description

Used to specify the text encoding of a String.


Super Class

Object

Properties

Name | Type | Description
Base | Integer | The type of encoding. The entry for TextConverter contains the possible values of Base.
Code | Integer | Introduced 5.0. The Mac OS TextEncoding value, useful for declares. You can also use it to compare two encodings: if the Code properties of two TextEncoding objects are equal, then they represent the same text encoding (including base, variant, and format).
Format | Integer | A format for the Base text encoding. Used by Unicode to define which format of Unicode you wish to use.
InternetName | String | Internet text encoding name.
Variant | Integer | A variant of the Base text encoding. The entry for TextConverter contains the possible values of Variant.


Methods

Name | Parameters | Return Type | Description
Chr | CodePoint as Integer | String | Introduced 5.0. Returns the character in the given encoding specified by CodePoint. The "code point" of the character is the same as the value that Asc returns. In general, it is safest to use this rather than the Chr function unless you are dealing with ASCII codes only (0-127).
Equals | otherEncoding as TextEncoding | Boolean | Introduced 5.0. Compares this encoding to the passed encoding and returns True if they are the same.
Operator_Compare | otherEncoding as TextEncoding | Boolean | Introduced 2008r2. Compares this encoding to the passed encoding. Currently, only the equals comparison is implemented; relational testing is undefined. As a result, it currently calls the Equals function.
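
As a rough sketch of how the comparison members described above might be used together, the following compares two encodings taken from the Encodings module. The variable names a and b are illustrative only, and the comments describe what the documentation above implies rather than verified output.

```xojo
Dim a As TextEncoding = Encodings.UTF8
Dim b As TextEncoding = Encodings.MacRoman

// Equals compares this encoding to the one passed in.
If a.Equals(b) Then
  MsgBox "a and b are the same encoding"
Else
  MsgBox "a and b are different encodings"
End If

// The = operator goes through Operator_Compare (2008r2 and later),
// which currently calls Equals. Relational tests such as < or > are undefined.
If a = Encodings.UTF8 Then
  MsgBox "a is UTF-8"
End If

// Comparing the Code properties is described above as an equivalent test.
If a.Code = b.Code Then
  MsgBox "Same Code value"
End If
```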


Notes

When a computer stores text, it encodes each character as a numeric value and stores the byte (or bytes) associated with that number. When it needs to display or print that character, it consults the encoding scheme to determine which character the number represents.

Early computers used an encoding scheme called "ASCII", which stands for American Standard Code for Information Interchange. It specifies 128 values and includes codes for upper and lower case letters, numbers, the common symbols on a keyboard, and some "invisible" control codes that were heavily used in early computers.

As computers became more sophisticated and were introduced in non-English speaking countries, the limitations of the ASCII encoding scheme became apparent. It didn't include codes for accented characters and had no chance of handling ideographic languages, such as Japanese or Chinese, which require thousands of characters.

As a result, extensions to the ASCII encoding scheme were developed. Outside the range of 0-127, these schemes, in general, do not agree. For example, in the US, Mac OS and Windows computers use different encodings for codes 128-255. Many other encoding schemes for handling languages that use non-ASCII characters have been developed.

The most general solution to the problem is an encoding called Unicode. It is designed to handle every character in every language. It also enables you to represent a mixture of languages within one text stream. However, not all strings that you may encounter use Unicode.

When you encounter a string, you need to know its encoding in order to interpret the sequence of bytes (or double-bytes) that make up the string's content. In REALbasic, every string contains both the bytes (content) and the encoding (if it is known; it is Nil if not known). REALbasic supports two different formats of Unicode, UTF-8 and UTF-16. All strings in a REALbasic project are compiled as UTF-8. This is a Unicode encoding that uses one byte for ASCII characters and up to four bytes for non-ASCII characters.
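
To make the one-byte versus multi-byte point concrete, here is a small sketch that compares the character count (Len) with the byte count (LenB) of two UTF-8 strings. The variable names are illustrative, and the byte counts in the comments follow from the UTF-8 definition rather than from a test run.

```xojo
Dim plain As String = "A" // an ASCII character
Dim accented As String = Encodings.UTF8.Chr(233) // U+00E9, e with acute accent

// Len counts characters; LenB counts bytes.
// In UTF-8, "A" should occupy 1 byte and U+00E9 should occupy 2 bytes.
MsgBox "plain: " + Str(Len(plain)) + " character, " + Str(LenB(plain)) + " byte(s)"
MsgBox "accented: " + Str(Len(accented)) + " character, " + Str(LenB(accented)) + " byte(s)"
```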

If you work only with strings that are created and managed within your REALbasic application, you probably don't need to deal with encodings directly, as REALbasic takes care of the issues via UTF-8. However, if you receive strings from an outside source, such as the Internet, an external database (that is, not the REAL SQL Database engine), or a text file, you should let REALbasic know what encoding is used. If the string is read from a MemoryBlock, its encoding will be Nil.
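
The following is a minimal sketch of the MemoryBlock case, assuming the bytes are already known to be ASCII. It builds a three-byte MemoryBlock by hand (a stand-in for data received from an outside source), checks that the resulting string has no encoding, and then assigns one with DefineEncoding.

```xojo
Dim mb As New MemoryBlock(3)
mb.Byte(0) = 97 // "a"
mb.Byte(1) = 98 // "b"
mb.Byte(2) = 99 // "c"

// A string taken from a MemoryBlock carries bytes but no encoding.
Dim s As String = mb.StringValue(0, 3)

If Encoding(s) Is Nil Then
  // Assumption: we know these bytes are ASCII, so we label them as such.
  // DefineEncoding only tags the string; it does not change the bytes.
  s = DefineEncoding(s, Encodings.ASCII)
End If
```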

You can assign an encoding to a string in several ways. For example, if you are reading the string using the TextInputStream class, you use the Encoding property. The Encodings module gives you access to all known encodings. Here is an example that reads a text file that uses the MacRoman encoding:

Dim f As FolderItem
Dim t As TextInputStream
f=GetOpenFolderItem("text") //file type defined as a File Type
If f <> Nil then
 t=TextInputStream.Open(f)
 t.Encoding=Encodings.MacRoman //specify encoding of input stream
 TextArea1.text=t.ReadAll
 t.Close
End if

Also, the Read, ReadLine, and ReadAll methods take an optional parameter that lets you specify the encoding.
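
As a sketch of that alternative, the MacRoman example can pass the encoding directly to ReadAll instead of setting the Encoding property first. The file type name "text" and the TextArea1 control are carried over from the example above.

```xojo
Dim f As FolderItem
Dim t As TextInputStream
f = GetOpenFolderItem("text")
If f <> Nil Then
  t = TextInputStream.Open(f)
  // The optional parameter tells ReadAll which encoding the file uses.
  TextArea1.Text = t.ReadAll(Encodings.MacRoman)
  t.Close
End If
```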

If you need to output a string in a specific encoding, you can use the ConvertEncoding function to do so. For example, this code converts the text in a TextField to the WindowsANSI encoding:

Dim s as String
s=ConvertEncoding(TextField1.Text, Encodings.WindowsANSI)

You will find text encoding helpful if you develop:

* Internet applications, such as web browsers or e-mail applications
* Applications that transfer text across different platforms
* Applications based in Unicode

The Encoding function makes it easy to obtain the TextEncoding of any string. Use the Encodings module to obtain a specified text encoding. Some of the most useful are UTF8, UTF16, UCS4, ASCII, MacRoman, MacJapanese, and WindowsLatin1. Use the Autocomplete feature of the Code Editor to view the complete list.
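
A short sketch of the two pieces working together: a string is converted to an encoding obtained from the Encodings module, and the Encoding function is then used to ask the string what encoding it carries. The sample text and variable names are illustrative.

```xojo
// Obtain an encoding from the Encodings module and convert a string to it.
Dim s As String = ConvertEncoding("Hello", Encodings.UTF16)

// Ask the string which encoding it now carries.
Dim enc As TextEncoding = Encoding(s)
If enc <> Nil Then
  MsgBox "Encoding: " + enc.InternetName
End If
```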

ASCII Codes

The following table lists the decimal, hex, and octal values for the ASCII character codes (0 to 127).


Decimal Hex Octal Result Decimal Hex Octal Result
0 0 0 NUL 32 20 40 SP
1 1 1 SOH 33 21 41 !
2 2 2 STX 34 22 42 "
3 3 3 ETX 35 23 43 #
4 4 4 EOT 36 24 44 $
5 5 5 ENQ 37 25 45 %
6 6 6 ACK 38 26 46 &
7 7 7 BEL 39 27 47 '
8 8 10 BS 40 28 50 (
9 9 11 HT 41 29 51 )
10 A 12 LF 42 2A 52 *
11 B 13 VT 43 2B 53 +
12 C 14 FF 44 2C 54 ,
13 D 15 CR 45 2D 55 -
14 E 16 SO 46 2E 56 .
15 F 17 SI 47 2F 57 /
16 10 20 DLE 48 30 60 0
17 11 21 DC1 49 31 61 1
18 12 22 DC2 50 32 62 2
19 13 23 DC3 51 33 63 3
20 14 24 DC4 52 34 64 4
21 15 25 NAK 53 35 65 5
22 16 26 SYN 54 36 66 6
23 17 27 ETB 55 37 67 7
24 18 30 CAN 56 38 70 8
25 19 31 EM 57 39 71 9
26 1A 32 SUB 58 3A 72 :
27 1B 33 ESC 59 3B 73 ;
28 1C 34 FS 60 3C 74 <
29 1D 35 GS 61 3D 75 =
30 1E 36 RS 62 3E 76 >
31 1F 37 US 63 3F 77 ?
64 40 100 @ 96 60 140 `
65 41 101 A 97 61 141 a
66 42 102 B 98 62 142 b
67 43 103 C 99 63 143 c
68 44 104 D 100 64 144 d
69 45 105 E 101 65 145 e
70 46 106 F 102 66 146 f
71 47 107 G 103 67 147 g
72 48 110 H 104 68 150 h
73 49 111 I 105 69 151 i
74 4A 112 J 106 6A 152 j
75 4B 113 K 107 6B 153 k
76 4C 114 L 108 6C 154 l
77 4D 115 M 109 6D 155 m
78 4E 116 N 110 6E 156 n
79 4F 117 O 111 6F 157 o
80 50 120 P 112 70 160 p
81 51 121 Q 113 71 161 q
82 52 122 R 114 72 162 r
83 53 123 S 115 73 163 s
84 54 124 T 116 74 164 t
85 55 125 U 117 75 165 u
86 56 126 V 118 76 166 v
87 57 127 W 119 77 167 w
88 58 130 X 120 78 170 x
89 59 131 Y 121 79 171 y
90 5A 132 Z 122 7A 172 z
91 5B 133 [ 123 7B 173 {
92 5C 134 \ 124 7C 174 |
93 5D 135 ] 125 7D 175 }
94 5E 136 ^ 126 7E 176 ~
95 5F 137 _ 127 7F 177 DEL
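
The columns of the table can be reproduced in code. This sketch, which assumes the built-in Asc, Chr, Hex, and Oct functions, looks up the code for "A" and formats it in each base; the values in the comment are taken from the table row for code 65.

```xojo
Dim code As Integer = Asc("A")

// Per the table, this should show: 65 41 101 A
MsgBox Str(code) + " " + Hex(code) + " " + Oct(code) + " " + Chr(code)
```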


Examples

The following example obtains the TextEncoding of the string passed to the Encoding function.

Dim t as TextEncoding
t=Encoding(TextField1.text) //obtain the encoding of the string
If t <> Nil then
 StaticText1.text="Base="+Str(t.base)
 StaticText2.text="Format="+Str(t.format)
 StaticText3.text="Variant="+Str(t.variant)
end if

The following statement uses the Encodings module to assign the UTF8 text encoding to the text in a TextField.

TextField2.text=DefineEncoding(TextField1.text,Encodings.UTF8)

The following example uses the Chr method to obtain the character corresponding to the code point of 165 for the MacRoman encoding, the bullet character (•):

Dim s as String
s=Encodings.MacRoman.Chr(165)


See Also

Chr, ConvertEncoding, DefineEncoding, Encoding, GetInternetTextEncoding, GetTextConverter, GetTextEncoding functions; Encodings module.