Use an ATL string conversion macro.
WideCharToMultiByte()
You can convert a Unicode string to an ANSI string with the WideCharToMultiByte() API. This API's prototype is:
int WideCharToMultiByte (
    UINT    CodePage,
    DWORD   dwFlags,
    LPCWSTR lpWideCharStr,
    int     cchWideChar,
    LPSTR   lpMultiByteStr,
    int     cbMultiByte,
    LPCSTR  lpDefaultChar,
    LPBOOL  lpUsedDefaultChar );
The parameters are:
CodePage
The code page to convert the Unicode characters into. You can pass CP_ACP to use the current ANSI code page. Code pages are sets of 256 characters. Characters 0-127 are always identical to the ASCII encoding. Characters 128-255 differ, and can contain graphics or letters with diacritics. Each language or region has its own code page, so it's important to use the right code page to get proper display of accented characters.
dwFlags
dwFlags determines how Windows deals with "composite" Unicode characters, which are a letter followed by a diacritic. An example of a composite character is è. If this character is in the code page specified in CodePage, then nothing special happens. However, if it is not in the code page, Windows has to convert it to something else.
Passing WC_COMPOSITECHECK makes the API check for non-mapping composite characters. Passing WC_SEPCHARS makes Windows break the character into two, the letter followed by the diacritic, for example e`. Passing WC_DISCARDNS makes Windows discard the diacritics. Passing WC_DEFAULTCHAR makes Windows replace the composite characters with a "default" character, specified in the lpDefaultChar parameter. The default behavior is WC_SEPCHARS.
lpWideCharStr
The Unicode string to convert.
cchWideChar
The length of lpWideCharStr in Unicode characters. You will usually pass -1, which indicates that the string is zero-terminated.
lpMultiByteStr
A char buffer that will hold the converted string.
cbMultiByte
The size of lpMultiByteStr, in bytes.
lpDefaultChar
Optional - a one-character ANSI string that contains the "default" character to be inserted when dwFlags contains WC_COMPOSITECHECK | WC_DEFAULTCHAR and a Unicode character cannot be mapped to an equivalent ANSI character. You can pass NULL to have the API use a system default character (which as of this writing is a question mark).
lpUsedDefaultChar
Optional - a pointer to a BOOL that will be set to indicate if the default char was ever inserted into the ANSI string. You can pass NULL if you don't care about this information.
Whew, a lot of boring details! As always, the docs make it seem much more complicated than it really is. Here's an example showing how to use the API:
// Assuming we already have a Unicode string wszSomeString...
char szANSIString [MAX_PATH];

WideCharToMultiByte ( CP_ACP,                // ANSI code page
                      WC_COMPOSITECHECK,     // Check for accented characters
                      wszSomeString,         // Source Unicode string
                      -1,                    // -1 means string is zero-terminated
                      szANSIString,          // Destination char string
                      sizeof(szANSIString),  // Size of buffer
                      NULL,                  // No default character
                      NULL );                // Don't care about this flag
After this call, szANSIString will contain the ANSI version of the Unicode string.
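The fixed MAX_PATH buffer above is fine for short strings. When the length isn't known in advance, one common pattern is to call the API twice: first with cbMultiByte set to 0, which makes it return the required buffer size, then again with a buffer of that size. Here's a minimal sketch of that pattern. WideToAnsi is a hypothetical helper name, not part of any API, and the non-Windows branch is only a stand-in built on wcstombs() so the sketch compiles on other platforms.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts a zero-terminated Unicode string to a
// narrow string, sizing the buffer from the API's own answer.
// Returns an empty string if the conversion fails.
std::string WideToAnsi ( const wchar_t* wszSource )
{
#ifdef _WIN32
    // First call: cbMultiByte == 0 asks for the required size in bytes.
    // Because cchWideChar is -1, the size includes the terminating zero.
    int cbNeeded = WideCharToMultiByte ( CP_ACP, WC_COMPOSITECHECK,
                                         wszSource, -1, NULL, 0, NULL, NULL );
    if ( cbNeeded <= 0 )
        return std::string();

    std::string result ( cbNeeded, '\0' );

    // Second call: do the real conversion into the sized buffer.
    int cbWritten = WideCharToMultiByte ( CP_ACP, WC_COMPOSITECHECK,
                                          wszSource, -1, &result[0], cbNeeded,
                                          NULL, NULL );
    if ( cbWritten <= 0 )
        return std::string();

    result.resize ( cbWritten - 1 );    // Drop the terminating zero.
    return result;
#else
    // Stand-in for other platforms (an assumption of this sketch):
    // wcstombs() with a NULL destination reports the length needed.
    std::size_t cbNeeded = std::wcstombs ( NULL, wszSource, 0 );
    if ( cbNeeded == static_cast<std::size_t>(-1) )
        return std::string();

    std::string result ( cbNeeded + 1, '\0' );
    std::wcstombs ( &result[0], wszSource, cbNeeded + 1 );
    result.resize ( cbNeeded );         // Drop the terminating zero.
    return result;
#endif
}
```

With this helper, a call like WideToAnsi(wszSomeString) returns the converted string without you having to guess a buffer size up front.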
wcstombs()
The CRT function wcstombs() is a bit simpler, but it just ends up calling WideCharToMultiByte(), so in the end the results are the same. The prototype for wcstombs() is:
size_t wcstombs (
    char*          mbstr,
    const wchar_t* wcstr,
    size_t         count );
The parameters are:
mbstr
A char buffer to hold the resulting ANSI string.
wcstr
The Unicode string to convert.
count
The size of the mbstr buffer, in bytes.
wcstombs() uses the WC_COMPOSITECHECK | WC_SEPCHARS flags in its call to WideCharToMultiByte(). To reuse the earlier example, you can convert a Unicode string with code like this:
wcstombs ( szANSIString, wszSomeString, sizeof(szANSIString) );
CString
The MFC CString class contains constructors and assignment operators that accept Unicode strings, so you can let CString do the conversion work for you. For example:
// Assuming we already have wszSomeString...

CString str1 ( wszSomeString ); // Convert with a constructor.
CString str2;

str2 = wszSomeString; // Convert with an assignment operator.
ATL macros
ATL has a handy set of macros for converting strings. To convert a Unicode string to ANSI, use the W2A() macro (a mnemonic for "wide to ANSI"). Actually, to be more accurate, you should use OLE2A(), where the "OLE" indicates the string came from a COM or OLE source. Anyway, here's an example of how to use these macros.
#include <atlconv.h>

// Again assuming we have wszSomeString...

{
    char szANSIString [MAX_PATH];
    USES_CONVERSION; // Declare local variable used by the macros.

    lstrcpy ( szANSIString, OLE2A(wszSomeString) );
}
The OLE2A() macro "returns" a pointer to the converted string, but the converted string is stored in a temporary stack variable, so we need to make our own copy of it with lstrcpy(). Other macros you should look into are W2T() (Unicode to TCHAR), and W2CT() (Unicode string to const TCHAR string).
There is an OLE2CA() macro (Unicode string to a const char string) which we could've used in the code snippet above. OLE2CA() is actually the correct macro for that situation, since the second parameter to lstrcpy() is a const char*, but I didn't want to throw too much at you at once.
Sticking with Unicode
On the other hand, you can just keep the string in Unicode if you won't be doing anything complicated with the string. If you're writing a console app, you can print Unicode strings with the std::wcout global variable, for example:
wcout << wszSomeString;
But keep in mind that wcout expects all strings to be in Unicode, so if you have any "normal" strings, you'll still need to output them with std::cout. If you have string literals, prefix them with L to make them Unicode, for example:
wcout << L"The Oracle says..." << endl << wszOracleResponse;
If you keep a string in Unicode, there are a couple of restrictions: