Tuesday, April 2, 2013

Windows uses Unicode, and specifically UTF-16 encoding

Windows natively supports Unicode strings for UI elements, file names, and so forth. Unicode is the preferred character encoding, because it supports all character sets and languages. Windows represents Unicode characters using UTF-16 encoding, in which each character is encoded as one or two 16-bit values (code units). Most characters in common use fit in a single 16-bit value; characters outside the Basic Multilingual Plane take a pair of 16-bit values called a surrogate pair. UTF-16 characters are called wide characters, to distinguish them from 8-bit ANSI characters.
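To make the encoding concrete, here is a minimal sketch in portable C of how a single Unicode code point maps to UTF-16 code units. The function name utf16_encode is hypothetical (it is not a Windows API); the arithmetic follows the standard UTF-16 rules, where code points at or above U+10000 are split into a high and a low surrogate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a Windows API.
   Encodes one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written,
   or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* U+D800..U+DFFF are reserved for surrogates and are not characters. */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    /* Unicode ends at U+10FFFF. */
    if (cp > 0x10FFFF)
        return 0;

    /* Basic Multilingual Plane: one 16-bit code unit. */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: split the 20-bit offset into a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, utf16_encode(0x41, out) returns 1 and stores 0x0041, while utf16_encode(0x1F600, out) returns 2 and stores the surrogate pair 0xD83D, 0xDE00.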


Unicode and ANSI Functions

When Microsoft introduced Unicode support to Windows, it eased the transition by providing two parallel sets of APIs, one for ANSI strings and the other for Unicode strings. For example, there are two functions to set the text of a window's title bar:
  • SetWindowTextA takes an ANSI string.
  • SetWindowTextW takes a Unicode string.
Internally, the ANSI version translates the string to Unicode. 
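The A/W pattern can be modeled in portable C. This is only a sketch of the idea, not how Windows implements SetWindowTextA: the names set_title_a, set_title_w, get_title, and TITLE_MAX are hypothetical, the "window title" is just a static buffer, and the conversion uses the standard mbstowcs function with the current C locale rather than the Windows code page machinery. Note also that wchar_t is 16 bits on Windows but typically 32 bits on other platforms.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

#define TITLE_MAX 256  /* hypothetical capacity, including the terminator */

/* Hypothetical stand-in for the storage behind a window's title. */
static wchar_t g_title[TITLE_MAX];

/* "W" entry point: works directly on wide characters.
   Returns 1 on success, 0 on failure. */
int set_title_w(const wchar_t *text)
{
    if (text == NULL || wcslen(text) >= TITLE_MAX)
        return 0;
    wcscpy(g_title, text);
    return 1;
}

/* "A" entry point: converts the narrow string to wide characters,
   then forwards to the "W" version. The conversion is the extra
   run-time cost that the wide version avoids. */
int set_title_a(const char *text)
{
    wchar_t wide[TITLE_MAX];
    size_t n;

    if (text == NULL)
        return 0;
    n = mbstowcs(wide, text, TITLE_MAX);
    /* (size_t)-1 means an invalid byte sequence; n == TITLE_MAX means
       the result was truncated and is not null-terminated. */
    if (n == (size_t)-1 || n >= TITLE_MAX)
        return 0;
    return set_title_w(wide);
}

/* Read back the stored title. */
const wchar_t *get_title(void)
{
    return g_title;
}
```

Calling set_title_a("Hello") stores the same text as set_title_w(L"Hello"); the narrow version simply does an extra conversion first.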

New applications should always call the Unicode versions. Many world languages require Unicode. ANSI strings can only represent characters in the current code page, so an application that uses them will be difficult or impossible to localize for languages outside that code page. The ANSI versions are also less efficient, because the operating system must convert the ANSI strings to Unicode at run time.
