A Good Article on String Conversion

Pay Close Attention - String Handling

I need to make a detour for a few moments, and discuss how to handle strings in COM code. If you are familiar with how Unicode and ANSI strings work, and know how to convert between the two, then you can skip this section. Otherwise, read on.

Whenever a COM method returns a string, that string will be in Unicode. (Well, all methods that are written to the COM spec, that is!) Unicode is a character encoding scheme, like ASCII, only its characters are wider. COM strings use UTF-16, in which most characters take 2 bytes and the rest (those outside the Basic Multilingual Plane) take a pair of 2-byte units. If you want to get the string into a more manageable state, you should convert it to a TCHAR string.
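The following C++ sketch (not from the original article) shows the 2-byte-unit point by counting characters in a UTF-16 string. It uses the standard char16_t type rather than wchar_t, because wchar_t is 2 bytes on Windows but typically 4 bytes elsewhere. The function name CountCodePoints is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Count the characters (code points) in a UTF-16 string. Most characters use
// one 16-bit unit; a high surrogate (0xD800-0xDBFF) followed by a low
// surrogate (0xDC00-0xDFFF) encodes a single character in two units.
std::size_t CountCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i;  // skip the low surrogate: it belongs to the same character
        }
        ++count;
    }
    return count;
}
```

Like size() here, wcslen() on a Windows wchar_t string counts 2-byte units, not characters.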

TCHAR and the _t functions (for example, _tcscpy()) are designed to let you handle Unicode and ANSI strings with the same source code. In most cases, you'll be writing code that uses ANSI strings and the ANSI Windows APIs, so for the rest of this article, I will refer to chars instead of TCHARs, just for simplicity. You should definitely read up on the TCHAR types, though, to be aware of them in case you ever come across them in code written by others.
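TCHAR works by plain preprocessor switching. The sketch below imitates that mechanism with hypothetical names so it compiles without <tchar.h>: MY_UNICODE stands in for _UNICODE, my_tchar for TCHAR, MY_T() for _T(), and my_tcscpy/my_tcslen for _tcscpy/_tcslen. The real header also has an _MBCS variant, which this sketch leaves out.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical imitation of <tchar.h>: MY_UNICODE plays the role of
// _UNICODE, my_tchar of TCHAR, MY_T() of _T(), and my_tcscpy/my_tcslen
// of _tcscpy/_tcslen. The real header also handles _MBCS, omitted here.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x
#define my_tcscpy std::wcscpy
#define my_tcslen std::wcslen
#else
typedef char my_tchar;
#define MY_T(x) x
#define my_tcscpy std::strcpy
#define my_tcslen std::strlen
#endif

// One piece of source that compiles to either the char or wchar_t version.
std::size_t CopyGreeting(my_tchar* dst) {
    my_tcscpy(dst, MY_T("Hello"));
    return my_tcslen(dst);
}
```

If you define MY_UNICODE before the block, every name switches to its wchar_t version without any change to CopyGreeting().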

When you get a Unicode string back from a COM method, you can convert it to a char string in one of several ways:

  1. Call the WideCharToMultiByte() API.
  2. Call the CRT function wcstombs().
  3. Use the CString constructor or assignment operator (MFC only).
  4. Use an ATL string conversion macro.

    WideCharToMultiByte()

    You can convert a Unicode string to an ANSI string with the WideCharToMultiByte() API. This API's prototype is:

        int WideCharToMultiByte (
            UINT     CodePage,
            DWORD    dwFlags,
            LPCWSTR  lpWideCharStr,
            int      cchWideChar,
            LPSTR    lpMultiByteStr,
            int      cbMultiByte,
            LPCSTR   lpDefaultChar,
            LPBOOL   lpUsedDefaultChar );

    The parameters are:

    CodePage

    The code page to convert the Unicode characters into. You can pass CP_ACP to use the current ANSI code page. Single-byte code pages are sets of 256 characters. Characters 0-127 are always identical to the ASCII encoding, while characters 128-255 differ and can contain graphics or letters with diacritics. (East Asian code pages such as 932 or 936 are multibyte and use up to two bytes per character.) Each language or region has its own code page, so it's important to use the right code page to get proper display of accented characters.

    dwFlags

    dwFlags determines how Windows deals with "composite" Unicode characters, which are a letter followed by a diacritic. An example of a composite character is e followed by a combining grave accent, which displays as è. If this character is in the code page specified in CodePage, then nothing special happens. However, if it is not in the code page, Windows has to convert it to something else.
    Passing WC_COMPOSITECHECK makes the API check for non-mapping composite characters. Passing WC_SEPCHARS makes Windows break the character into two, the letter followed by the diacritic, for example e`. Passing WC_DISCARDNS makes Windows discard the diacritics. Passing WC_DEFAULTCHAR makes Windows replace the composite characters with a "default" character, specified in the lpDefaultChar parameter. These three flags are meant to be combined with WC_COMPOSITECHECK, one at a time. If none of them is given, the default behavior is WC_SEPCHARS.

    lpWideCharStr

    The Unicode string to convert.

    cchWideChar

    The length of lpWideCharStr in Unicode characters. You will usually pass -1, which indicates that the string is zero-terminated. In that case, the terminating zero is converted as well.

    lpMultiByteStr

    A char buffer that will hold the converted string.

    cbMultiByte

    The size of lpMultiByteStr, in bytes. If you pass 0, the API converts nothing and instead returns the buffer size it needs.

    lpDefaultChar

    Optional - a one-character ANSI string that contains the "default" character to be inserted when a Unicode character cannot be mapped to an equivalent ANSI character. For composite characters, this applies when dwFlags contains WC_COMPOSITECHECK | WC_DEFAULTCHAR. You can pass NULL to have the API use a system default character (which as of this writing is a question mark).

    lpUsedDefaultChar

    Optional - a pointer to a BOOL that will be set to indicate if the default char was ever inserted into the ANSI string. You can pass NULL if you don't care about this information.

    Whew, a lot of boring details! Like always, the docs make it seem much more complicated than it really is. Here's an example showing how to use the API:


        // Assuming we already have a Unicode string wszSomeString...
        char szANSIString [MAX_PATH];

        WideCharToMultiByte ( CP_ACP,                // ANSI code page
                              WC_COMPOSITECHECK,     // Check for accented characters
                              wszSomeString,         // Source Unicode string
                              -1,                    // -1 means string is zero-terminated
                              szANSIString,          // Destination char string
                              sizeof(szANSIString),  // Size of buffer
                              NULL,                  // No default character
                              NULL );                // Don't care about this flag

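    After this call, szANSIString will contain the ANSI version of the Unicode string, as long as the result fits in the buffer. If it does not fit, the API returns 0 and GetLastError() reports ERROR_INSUFFICIENT_BUFFER, so real code should check the return value.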
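Instead of guessing at a MAX_PATH buffer, you can have the API report the buffer size it needs by passing 0 for cbMultiByte first. The C++ sketch below wraps that two-call pattern in a hypothetical WideToAnsi() helper that returns std::string. Only the _WIN32 branch shows the technique. The #else branch is a stand-in built on the standard wcstombs() so the sketch also compiles on other platforms, and it converts using the C library's current locale rather than the ANSI code page.

```cpp
#include <cstdlib>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: convert a zero-terminated wide string to a narrow
// std::string sized exactly, instead of a fixed MAX_PATH array.
std::string WideToAnsi(const wchar_t* wsz) {
#ifdef _WIN32
    // Call 1: cbMultiByte == 0 asks only for the required size in bytes.
    // Because cchWideChar is -1, that size includes the terminating zero.
    int needed = WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, wsz, -1,
                                     NULL, 0, NULL, NULL);
    if (needed <= 0) return std::string();  // conversion failed
    std::string result(static_cast<std::size_t>(needed), '\0');
    // Call 2: convert into the buffer that was just allocated.
    if (WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, wsz, -1,
                            &result[0], needed, NULL, NULL) == 0)
        return std::string();
    result.resize(static_cast<std::size_t>(needed) - 1);  // drop the '\0'
    return result;
#else
    // Stand-in so the sketch builds elsewhere: standard wcstombs(), which
    // uses the C library's current locale instead of an ANSI code page.
    std::size_t needed = std::wcstombs(NULL, wsz, 0);
    if (needed == static_cast<std::size_t>(-1)) return std::string();
    std::string result(needed + 1, '\0');
    std::wcstombs(&result[0], wsz, needed + 1);
    result.resize(needed);  // drop the '\0'
    return result;
#endif
}
```

Usage: std::string s = WideToAnsi(wszSomeString);. Error handling is deliberately minimal. On failure, the sketch just returns an empty string.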

    wcstombs()

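    The CRT function wcstombs() is a bit simpler. In Microsoft's CRT it ends up calling WideCharToMultiByte(), but it converts to the code page of the CRT's current locale (set with setlocale()) instead of a code page you pass in. The results therefore match only when that locale uses the same code page. The prototype for wcstombs() is: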


        size_t wcstombs (
            char*          mbstr,
            const wchar_t* wcstr,
            size_t         count );

    The parameters are:

    mbstr

    A char buffer to hold the resulting ANSI string.

    wcstr

    The Unicode string to convert.

    count

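    The maximum number of bytes to store in mbstr. The terminating zero is written only if it fits within count bytes, so a result that fills the whole buffer is left unterminated. The function returns the number of bytes converted (not counting the terminator), or (size_t)-1 if it meets a character it cannot convert.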

    wcstombs() uses the WC_COMPOSITECHECK | WC_SEPCHARS flags in its call to WideCharToMultiByte(). To reuse the earlier example, you can convert a Unicode string with code like this:


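        // Leave room for the terminator, which wcstombs() skips when the buffer fills up.
        wcstombs ( szANSIString, wszSomeString, sizeof(szANSIString) - 1 );
        szANSIString[sizeof(szANSIString) - 1] = '\0';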
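To detect truncation instead of silently shortening the string, you can call wcstombs() with a NULL destination first. It then ignores count and returns the number of bytes needed. The C++ sketch below uses this in a hypothetical ConvertToBuffer() helper that refuses conversions that would not fit. It relies on the CRT's current locale, so you may need to call setlocale() before converting non-ASCII text.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: convert wsz into a caller-supplied buffer of cb bytes.
// Returns false if a character cannot be converted in the current locale or
// the result would not fit with its terminating zero (when cb > 0, buf is
// then set to an empty string). On success buf is always zero-terminated.
bool ConvertToBuffer(char* buf, std::size_t cb, const wchar_t* wsz) {
    if (cb == 0) return false;  // no room even for the terminator
    // With a NULL destination, wcstombs() ignores count and returns the
    // length in bytes, not counting the terminator.
    std::size_t needed = std::wcstombs(NULL, wsz, 0);
    if (needed == static_cast<std::size_t>(-1) || needed >= cb) {
        buf[0] = '\0';
        return false;
    }
    std::wcstombs(buf, wsz, cb);  // needed < cb, so the '\0' fits too
    return true;
}
```

The helper fails loudly by design. A false return tells the caller to allocate more space instead of handing back a silently shortened string.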

    CString

    The MFC CString class contains constructors and assignment operators that accept Unicode strings, so you can let CString do the conversion work for you. For example:


        // Assuming we already have wszSomeString...

        CString str1 ( wszSomeString ); // Convert with a constructor.
        CString str2;

        str2 = wszSomeString;           // Convert with an assignment operator.

    ATL macros

    ATL has a handy set of macros for converting strings. To convert a Unicode string to ANSI, use the W2A() macro (a mnemonic for "wide to ANSI"). Actually, to be more accurate, you should use OLE2A(), where the "OLE" indicates the string came from a COM or OLE source. Anyway, here's an example of how to use these macros.


        #include <atlconv.h>

        // Again assuming we have wszSomeString...

        {
            char szANSIString [MAX_PATH];
            USES_CONVERSION;  // Declare local variable used by the macros.

            lstrcpy ( szANSIString, OLE2A(wszSomeString) );
        }

    The OLE2A() macro "returns" a pointer to the converted string, but the converted string is stored in a temporary stack variable, so we need to make our own copy of it with lstrcpy(). Other macros you should look into are W2T() (Unicode to TCHAR), and W2CT() (Unicode string to const TCHAR string).

    There is an OLE2CA() macro (Unicode string to a const char string) which we could've used in the code snippet above. OLE2CA() is actually the correct macro for that situation, since the second parameter to lstrcpy() is a const char*, but I didn't want to throw too much at you at once.

    Sticking with Unicode

    On the other hand, you can just keep the string in Unicode if you won't be doing anything complicated with the string. If you're writing a console app, you can print Unicode strings with the std::wcout global stream object, for example:


        wcout << wszSomeString;

    But keep in mind that wcout is designed for Unicode strings, so if you have any "normal" strings, you'll still want to output them with std::cout. If you have string literals, prefix them with L to make them Unicode, for example:


        wcout << L"The Oracle says..." << endl << wszOracleResponse;
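The C++ sketch below is a complete, compilable version of this idea. The helper names FormatOracle() and PrintOracle() are hypothetical. The text is built in a std::wostringstream first, which makes it easy to check, and every literal carries the L prefix so the whole message stays wide.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: build the message in a wide string stream. Every
// literal has the L prefix, so all pieces are wide (wchar_t) strings.
std::wstring FormatOracle(const wchar_t* wszOracleResponse) {
    std::wostringstream out;
    out << L"The Oracle says..." << L'\n' << wszOracleResponse;
    return out.str();
}

// Send the message to the console through the wide standard stream.
void PrintOracle(const wchar_t* wszOracleResponse) {
    std::wcout << FormatOracle(wszOracleResponse) << std::endl;
}
```

Printing non-ASCII characters through wcout also depends on the program's locale and on the console, so test that case on your target platform.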

    If you keep a string in Unicode, there are a couple of restrictions:

  • You must use the wcsXXX() string functions, such as wcslen(), on Unicode strings.
  • With very few exceptions, you cannot pass a Unicode string to a Windows API on Windows 9x. To write code that will run on 9x and NT unchanged, you'll need to use the TCHAR types, as described in MSDN.
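The following C++ sketch illustrates the first restriction with a hypothetical CopyWide() helper (not part of any API) that copies a wide string using wcslen() and wcscpy(). The char-based strlen() and strcpy() would be wrong here. They treat the data as bytes and stop at the first zero byte, which in a Unicode string of ASCII text usually comes right after the first character.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: copy src into dst (dstCount wchar_t slots) with the
// wcsXXX() family. Returns the copied length in wchar_t units, or
// (size_t)-1 if dst is too small to hold the string and its terminator.
std::size_t CopyWide(wchar_t* dst, std::size_t dstCount, const wchar_t* src) {
    std::size_t len = std::wcslen(src);  // length in wchar_t units
    if (len + 1 > dstCount) return static_cast<std::size_t>(-1);
    std::wcscpy(dst, src);               // copies the L'\0' too
    return len;
}
```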

Posted: 2024-10-01 11:40:59
