MSVCRT.DLL Console I/O Bug (setlocale(LC_CTYPE, "Chinese_China.936"))

I have been quite annoyed by a Windows bug that causes a huge number of open-source command-line tools to choke on multi-byte characters at the Windows Command Prompt. The MSVCRT.DLL shipped with Windows Vista or later has serious trouble with such characters. While Microsoft tools and compilers after Visual Studio 6.0 no longer use this DLL, the GNU tools on Windows, usually built by MinGW or Mingw-w64, depend on it and suffer from this problem. One cannot even use ls to display a Chinese file name when the system locale is set to Chinese.

The following simple code snippet demonstrates the problem:


#include <locale.h>
#include <stdio.h>
#include <wchar.h>  /* putwchar */

/* "字符 Char" encoded in code page 936 */
char msg[] = "\xd7\xd6\xb7\xfb Char";
wchar_t wmsg[] = L"字符 char";

/* Output the multi-byte string one byte at a time with putchar */
void Test1()
{
    char* ptr = msg;
    printf("Test 1: ");
    while (*ptr) {
        putchar(*ptr++);
    }
    putchar('\n');
}

/* Output the multi-byte string in one call with puts */
void Test2()
{
    printf("Test 2: ");
    puts(msg);
}

/* Output the wide string one character at a time with putwchar */
void Test3()
{
    wchar_t* ptr = wmsg;
    printf("Test 3: ");
    while (*ptr) {
        putwchar(*ptr++);
    }
    putwchar(L'\n');
}

int main()
{
    puts("Default C locale");
    Test1();
    Test2();
    Test3();
    putchar('\n');

    puts("Chinese locale");
    setlocale(LC_CTYPE, "Chinese_China.936");
    Test1();
    Test2();
    Test3();
    putchar('\n');

    puts("English locale");
    setlocale(LC_CTYPE, "English_United States.1252");
    Test1();
    Test2();
    Test3();
}

When built with a modern version of Visual Studio, it gives the expected output (console code page is 936):

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: 字符 char

English locale
Test 1: ×?·? Char
Test 2: ×?·? Char
Test 3:  char

In other words, when the locale is the default ‘C’, the ‘ANSI’ character output routines successfully output both single-byte and multi-byte characters, while putwchar, the ‘Unicode’ version of putchar, fails at the multi-byte characters (reasonably, as the C locale does not know how to translate Chinese characters). When the locale is set correctly to code page 936 (Simplified Chinese), everything is correct. When the locale is set to code page 1252 (Latin), the ‘ANSI’ routines show the characters that occupy the same code points in that code page (‘×Ö·û’ instead of ‘字符’), though ‘Ö’ (\xd6) and ‘û’ (\xfb) appear as ‘?’ because they do not exist in code page 936, the console code page. The Chinese characters, of course, cannot be shown with putwchar in this locale, just as in the C locale.
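To make the byte-level picture concrete, here is a minimal sketch of how a double-byte code page groups the bytes of msg into characters. It assumes that code page 936 lead bytes fall in the range 0x81..0xFE; the names is_cp936_lead and count_cp936_chars are hypothetical and exist only for this illustration. Real Windows code would ask the runtime instead, for example with _ismbblead.

```c
#include <stddef.h>

/* Assumption: code page 936 lead bytes lie in 0x81..0xFE. */
static int is_cp936_lead(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

/* Count the characters in a NUL-terminated code page 936 byte string.
 * A lead byte followed by any non-NUL byte counts as one character;
 * trail bytes are not validated. */
size_t count_cp936_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p) {
        if (is_cp936_lead(*p) && p[1] != '\0')
            p += 2;   /* lead byte plus trail byte: one character */
        else
            p += 1;   /* single-byte character (or a dangling lead byte) */
        n++;
    }
    return n;
}
```

Under this reading, the nine bytes of "\xd7\xd6\xb7\xfb Char" form seven characters: two Chinese characters of two bytes each, a space, and four Latin letters. A single-byte code page such as 1252 treats the same four leading bytes as four separate characters instead.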

When built with GCC, the result is woeful:

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1:  Char
Test 2: 字符 Char
Test 3:  char

English locale
Test 1: ×?·? Char
Test 2: ×?·? Char
Test 3:  char

Two things are worth noticing:

  • putchar stops working for Chinese when the locale is correctly set.
  • putwchar never works for Chinese.

Horrible and thoroughly broken! (Keep in mind that Microsoft is to blame here. You can compile the program with MSVC 6.0 using the /MD option, and the result will be the same—an executable that works in Windows XP but not in Windows Vista or later.)
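Since Test 2 still prints correctly in every locale, one application-level sidestep is to collect the bytes first and hand them to the runtime in a single call, rather than feeding it one byte at a time. The following is only a sketch of that idea: line_buf, line_buf_add and line_buf_flush are hypothetical names, and it assumes that fputs behaves like puts on the console, which the results above do not confirm.

```c
#include <stdio.h>

/* Hypothetical helper: gather bytes, then emit them with one fputs call. */
typedef struct {
    char   data[256];
    size_t len;
} line_buf;

/* Append one byte; returns 0 on success, -1 if the buffer is full.
 * NUL bytes cannot be stored, because the buffer is written with fputs. */
int line_buf_add(line_buf *b, int ch)
{
    if (b->len + 1 >= sizeof b->data)
        return -1;
    b->data[b->len++] = (char)ch;
    b->data[b->len] = '\0';
    return 0;
}

/* Write the collected bytes in a single call and reset the buffer.
 * Returns 0 on success, EOF on a write error. */
int line_buf_flush(line_buf *b, FILE *fp)
{
    int ret = fputs(b->data, fp);

    b->len = 0;
    b->data[0] = '\0';
    return ret < 0 ? EOF : 0;
}
```

A caller would start from a zero-initialized line_buf, call line_buf_add wherever it used to call putchar, and call line_buf_flush(&buf, stdout) at the end of each line. This changes application code, so it does nothing for existing tools such as ls.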

I attacked this problem a few years ago, and tried some workarounds. The solution I came up with looked so fragile that I did not push it up to the MinGW library. It was a personal failure, as well as an indication that working around a buggy implementation without affecting the application code can be very difficult or just impossible.



The problem occurs only with the console, where the Microsoft runtime does some translation (broken in MSVCRT.DLL, but OK in newer MSVC runtimes). It vanishes when users redirect the output from the console. So one solution is not to use the Command Prompt at all. The Cygwin Terminal may be a good choice, especially for people familiar with Linux/Unix. I have Cygwin installed, but sometimes I still want to do things in the more Windows-y way. I figured I could make a small tool (like cat) to get the input from stdin, and forward everything to stdout. As long as this tool is compiled by a Microsoft compiler, things should be OK. Then I thought a script could be faster. Finally, I came up with putting the following line into an mbf.bat:

@perl -p -e ""

(Perl is still wonderful for text processing, even in this ‘empty’ program!)
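For those without Perl at hand, the small cat-like tool mentioned above could look roughly like the sketch below. The function name copy_stream is hypothetical, and I have not verified this exact code against the console. The point is only that it forwards bytes unchanged and, per the reasoning above, would need to be built with a Microsoft compiler so that it links against a newer runtime.

```c
#include <stdio.h>

/* Copy everything from `in` to `out` unchanged.
 * Returns 0 on success, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    if (ferror(in) || fflush(out) != 0)
        return -1;
    return 0;
}

/* Intended use as a filter:
 *   int main(void) { return copy_stream(stdin, stdout) == 0 ? 0 : 1; }
 */
```

On Windows the standard streams default to text mode, so line endings may be translated on the way through. That is harmless for the text output discussed here, but a general-purpose filter would probably want to switch both streams to binary mode first.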

Now the executables built by GCC and MSVC give the same result, if we append ‘|mbf’ on the command line:

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: 字符 char

English locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

If you know how to make Microsoft fix the DLL problem, do it. Otherwise you know at least a workaround now. 



The following code is my original partial solution to the problem, and it may be helpful to your GCC-based project. I don’t claim any copyright on it, nor do I take any responsibility for its use.


/* mingw_mbcs_safe_io.c */

#include <mbctype.h>
#include <stdio.h>

/* Output functions that work with the Windows 7+ MSVCRT.DLL
 * for multi-byte characters on the console.  Please notice
 * that buffering must not be enabled for the console (e.g.
 * by calling setvbuf); otherwise weird things may occur. */

int __cdecl _mgw_flsbuf(int ch, FILE* fp)
{
  /* Lead byte held back until its trail byte arrives */
  static char lead = '\0';
  int ret = 1;

  if (lead != '\0')
    {
      /* Emit the lead and trail bytes together in one call */
      ret = fprintf(fp, "%c%c", lead, ch);
      lead = '\0';
      if (ret < 0)
        return EOF;
    }
  else if (_ismbblead(ch))
    lead = ch;
  else
    return _flsbuf(ch, fp);
  return ch;
}

int __cdecl putc(int ch, FILE* fp)
{
  /* Per-thread lead byte held back until its trail byte arrives */
  static __thread char lead = '\0';
  int ret = 1;

  if (lead != '\0')
    {
      /* Emit the lead and trail bytes together in one call */
      ret = fprintf(fp, "%c%c", lead, ch);
      lead = '\0';
    }
  else if (_ismbblead(ch))
    lead = ch;
  else
    ret = fprintf(fp, "%c", ch);
  if (ret < 0)
    return EOF;
  else
    return ch;
}

int __cdecl putchar(int ch)
{
  /* The original fell off the end without returning a value */
  return putc(ch, stdout);
}

int __cdecl _mgwrt_putchar(int ch)
{
  return putc(ch, stdout);
}

 

https://yongweiwu.wordpress.com/tag/mingw-w64/

Original article: https://www.cnblogs.com/findumars/p/10247277.html

Time: 2024-11-02 07:06:26
