UTF-8 to Unicode

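JSON UTF-8 to Unicode conversion (contributed by a member of an STM32 enthusiasts' group). Kept for reference only; its accuracy and usability are not guaranteed.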

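// Convert a NUL-terminated UTF-8 string (e.g. a JSON string value) to "Unicode",
// which here means UTF-16 code units stored low byte first (UTF-16LE).
// Returns the number of bytes written to dst, not counting the terminator.
// dst is not bounds-checked: the caller must supply a large enough buffer.
// Overlong encodings and code points above 0x10FFFF are not rejected.
u32 UTF8_to_Unicode(char *dst, char *src)
{
    const u8 *s = (const u8 *)src;  // read bytes as unsigned to avoid sign extension
    u32 i = 0, unicode = 0, high;
    int codeLen = 0;

    while (*s)
    {
        // 1. UTF-8 ---> Unicode code point
        if (0 == (s[0] & 0x80))
        {
            // one byte (ASCII)
            codeLen = 1;
            unicode = s[0];
        }
        else if (0xC0 == (s[0] & 0xE0) && 0x80 == (s[1] & 0xC0))
        {// two bytes
            codeLen = 2;
            unicode = (((u32)s[0] & 0x1F) << 6) | ((u32)s[1] & 0x3F);
        }
        else if (0xE0 == (s[0] & 0xF0) && 0x80 == (s[1] & 0xC0) && 0x80 == (s[2] & 0xC0))
        {// three bytes
            codeLen = 3;
            unicode = (((u32)s[0] & 0x0F) << 12) | (((u32)s[1] & 0x3F) << 6) | ((u32)s[2] & 0x3F);
        }
        else if (0xF0 == (s[0] & 0xF8) && 0x80 == (s[1] & 0xC0) && 0x80 == (s[2] & 0xC0) && 0x80 == (s[3] & 0xC0))
        {// four bytes (the lead byte is 11110xxx, so the mask is 0xF8, not 0xF0)
            codeLen = 4;
            unicode = (((u32)s[0] & 0x07) << 18) | (((u32)s[1] & 0x3F) << 12) | (((u32)s[2] & 0x3F) << 6) | ((u32)s[3] & 0x3F);
        }
        else
        {
            // invalid or truncated sequence: stop converting
            break;
        }
        s += codeLen;

        // skip leading spaces (nothing has been written yet)
        if (i == 0 && unicode == 0x20)
        {
            continue;
        }

        // 2. Unicode code point ---> UTF-16LE
        if (unicode >= 0x10000)
        {
            // does not fit in 16 bits: emit the high surrogate first
            unicode -= 0x10000;
            high = 0xD800 | (unicode >> 10);
            *dst++ = (u8)(high & 0xff);
            *dst++ = (u8)((high >> 8) & 0xff);
            i += 2;
            unicode = 0xDC00 | (unicode & 0x3FF);  // low surrogate
        }
        *dst++ = (u8)(unicode & 0xff);
        *dst++ = (u8)((unicode >> 8) & 0xff);
        i += 2;
    } // end while

    // two-byte terminator, matching the 16-bit code units
    *dst++ = 0;
    *dst = 0;

    return i;
}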
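
Because the function above writes to dst without checking its capacity, the caller has to size the buffer beforehand. The following is a minimal sketch of one way to do that, not part of the original post: the helper name `utf16le_bytes_needed` is hypothetical, and it assumes well-formed UTF-8 input. It returns an upper bound, reserving four bytes for each four-byte sequence (enough for a surrogate pair) and two bytes for a terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an assumption, not from the original post):
 * upper bound, in bytes, on the UTF-16LE output produced from a
 * NUL-terminated UTF-8 string, including a two-byte terminator.
 * Assumes the input is well-formed UTF-8. */
size_t utf16le_bytes_needed(const char *src)
{
    const uint8_t *s = (const uint8_t *)src;
    size_t bytes = 2;                      /* room for the terminator */

    for (; *s; ++s) {
        if ((*s & 0xC0) == 0x80)
            continue;                      /* continuation byte, already counted via its lead byte */
        /* a four-byte lead (11110xxx) may need a surrogate pair: 4 bytes */
        bytes += ((*s & 0xF8) == 0xF0) ? 4 : 2;
    }
    return bytes;
}
```

A caller could allocate `utf16le_bytes_needed(src)` bytes for dst before invoking `UTF8_to_Unicode`. Since the converter skips leading spaces, the real output can be shorter than this bound, but never longer.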

Original article: https://www.cnblogs.com/LittleTiger/p/12187600.html

Time: 2024-08-13 07:07:49
