This article collects some ideas I took from another blog post, which is pasted below.

That post discusses which algorithm correctly shuffles an array of integers (or a deck of cards) at random, and how to test an algorithm to make sure it is the right one.

The core idea is this: when it comes to a probability problem, there’s only one way to actually test it, and that is statistics.

For example, if a function randomly returns a number from 1 to 10, the probability of occurrence of each individual number is 1/10. Therefore, if we run this function a huge number of times (let’s say 1 million times), the distribution of returned values should be close to uniform.
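As a rough sketch of that experiment in C: the helper names `random_1_to_10` and `count_values` are made up for illustration, and `rand() % 10 + 1` merely stands in for whatever function is under test.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_VALUES 10

/* Hypothetical function under test: returns a number from 1 to 10. */
int random_1_to_10(void) {
    return rand() % NUM_VALUES + 1;
}

/* Call the function `runs` times and record how often each value appears.
   counts[0] holds the count for 1, counts[9] the count for 10. */
void count_values(long runs, long counts[NUM_VALUES]) {
    assert(runs > 0);
    for (int v = 0; v < NUM_VALUES; v++) {
        counts[v] = 0;
    }
    for (long r = 0; r < runs; r++) {
        int value = random_1_to_10();
        counts[value - 1]++;
    }
}
```

With 1 million runs, each of the ten counts should land near 100,000 if the function is close to uniform.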

At least, the error rate should be reasonable, say 5% but not 20%. We can define ‘reasonable’ as:

  • Sample: 1 million times
  • Max error rate: 10%
  • The 95% percentile: over 90% of the error rates are less than 5%
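One possible way to turn these thresholds into code is sketched below. It assumes that "error rate" means the relative deviation of an observed count from the expected count, and that the last criterion means "at least 90% of the error rates are below 5%"; both readings, and all function names, are my own assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Relative error of one observed count against the expected count. */
double error_rate(long observed, double expected) {
    assert(expected > 0.0);
    double diff = (double)observed - expected;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff / expected;
}

/* Largest relative error across all counts. */
double max_error_rate(const long *counts, size_t n, double expected) {
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double e = error_rate(counts[i], expected);
        if (e > worst) {
            worst = e;
        }
    }
    return worst;
}

/* Fraction of counts whose relative error is strictly below `limit`. */
double fraction_below(const long *counts, size_t n, double expected, double limit) {
    assert(n > 0);
    size_t ok = 0;
    for (size_t i = 0; i < n; i++) {
        if (error_rate(counts[i], expected) < limit) {
            ok++;
        }
    }
    return (double)ok / (double)n;
}

/* Returns 1 if the max error is at most 10% and at least 90% of the
   errors are below 5% (assumed reading of the criteria), 0 otherwise. */
int looks_reasonable(const long *counts, size_t n, double expected) {
    return max_error_rate(counts, n, expected) <= 0.10
        && fraction_below(counts, n, expected, 0.05) >= 0.90;
}
```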

Don’t use the average error rate here, as the post suggested! I personally really dislike using the average of anything. What’s the problem with ‘average’, you might ask? My answer: the average of your personal assets and Mark Zuckerberg’s is pretty huge, so on average you are a billionaire.
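To make the difference concrete, here is a small sketch that computes both the mean and a percentile of a list of error rates. The helper names and the nearest-rank percentile definition are my own choices, not something the original post specifies.

```c
#include <assert.h>
#include <stdlib.h>

/* Ascending comparator for doubles, for use with qsort. */
static int compare_doubles(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of `n` values. */
double mean_of(const double *values, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum / (double)n;
}

/* Nearest-rank percentile: the smallest value such that at least
   `percent` percent of the values are less than or equal to it.
   Sorts `values` in place. */
double percentile_of(double *values, size_t n, unsigned percent) {
    assert(n > 0 && percent >= 1 && percent <= 100);
    qsort(values, n, sizeof(double), compare_doubles);
    size_t rank = (n * percent + 99) / 100;   /* ceil(n * percent / 100) */
    return values[rank - 1];
}
```

For ten error rates where nine are at most 3% and one is 90%, `mean_of` gives about 10.6% while `percentile_of(..., 90)` gives 3%: a single outlier dominates the average, just like the billionaire in the example.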

And the corresponding test plan is:

  • a test unit should run this function 100 thousand times (100 samples)
  • run this test unit 1 million times for each sample
  • if more than 95% of the 1m sample tests have an error rate below 5%, we say this function is truly random
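The plan above could be sketched roughly as follows. The function names are hypothetical, `rand() % 10 + 1` stands in for the function under test, and the number of test units and runs per unit are left as parameters so the sketch can be tried with smaller values than the ones in the plan.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_VALUES 10

/* Hypothetical function under test: returns a number from 1 to 10. */
static int random_1_to_10(void) {
    return rand() % NUM_VALUES + 1;
}

/* One test unit: call the function `runs` times and return the largest
   relative error of any value's count against the expected runs / 10. */
double unit_max_error(long runs) {
    long counts[NUM_VALUES] = {0};
    assert(runs > 0);
    for (long r = 0; r < runs; r++) {
        counts[random_1_to_10() - 1]++;
    }
    double expected = (double)runs / NUM_VALUES;
    double worst = 0.0;
    for (int v = 0; v < NUM_VALUES; v++) {
        double diff = (double)counts[v] - expected;
        if (diff < 0.0) {
            diff = -diff;
        }
        if (diff / expected > worst) {
            worst = diff / expected;
        }
    }
    return worst;
}

/* Repeat the test unit `units` times and return the fraction of units
   whose worst error stays below `limit`. */
double pass_fraction(int units, long runs, double limit) {
    assert(units > 0);
    int passed = 0;
    for (int u = 0; u < units; u++) {
        if (unit_max_error(runs) < limit) {
            passed++;
        }
    }
    return (double)passed / units;
}
```

A call such as `pass_fraction(100, 100000, 0.05) > 0.95` would then be the final verdict. Note how much the result depends on the number of runs per unit: with only 100 runs per unit, almost no unit stays under a 5% error, even for a good generator.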

(From my point of view, there’s actually another way if the function given to you is not a black box: you can work out its probability distribution mathematically.)

1. Recursive binary random draw


#include <stdlib.h>
#include <string.h>

#define MAXLEN 10
const char TestArr[MAXLEN] = {'A','B','C','D','E','F','G','H','I','J'};

static char RecurArr[MAXLEN] = {0};
static int cnt = 0;

/* Draw one random card, then recurse into the cards left and right of it. */
void ShuffleArray_Recursive_Tmp(char* arr, int len)
{
    if (cnt >= MAXLEN || len <= 0) {
        return;
    }

    int pos = rand() % len;
    RecurArr[cnt++] = arr[pos];
    if (len == 1) return;
    ShuffleArray_Recursive_Tmp(arr, pos);
    ShuffleArray_Recursive_Tmp(arr + pos + 1, len - pos - 1);
}

void ShuffleArray_Recursive(char* arr, int len)
{
    memset(RecurArr, 0, sizeof(RecurArr));
    cnt = 0;
    ShuffleArray_Recursive_Tmp(arr, len);
    memcpy(arr, RecurArr, len);
}

int main(void)
{
    char temp[MAXLEN] = {0};
    for (int i = 0; i < 5; i++) {
        /* TestArr has no terminating NUL, so copy raw bytes */
        memcpy(temp, TestArr, MAXLEN);
        ShuffleArray_Recursive(temp, MAXLEN);
    }
    return 0;
}


Run 1: D C A B H E G F I J
Run 2: A G D B C E F J H I
Run 3: A B H F C E D G I J
Run 4: J I F B A D C E H G
Run 5: F B A D C E H G I J

2. The quicksort hack


/* Comparator that ignores its arguments and returns -1, 0, or 1 at random. */
int compare(const void *a, const void *b)
{
    return rand() % 3 - 1;
}

void ShuffleArray_Sort(char* arr, int len)
{
    qsort((void *)arr, (size_t)len, sizeof(char), compare);
}


Run 1: H C D J F E A G B I
Run 2: B F J D C E I H G A
Run 3: C G D E J F B I A H
Run 4: H C B J D F G E I A
Run 5: D B C F E A I H G J


3. The implementation most people write


void ShuffleArray_General(char* arr, int len)
{
    const int suff_time = len;
    for (int idx = 0; idx < suff_time; idx++) {
        int i = rand() % len;
        int j = rand() % len;
        char temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }
}


Run 1: G F C D A J B I H E
Run 2: D G J F E I A H C B
Run 3: C J E F A D G B H I
Run 4: H D C F A E B J I G
Run 5: E A J F B I H G D C


How to test




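Imagine we have a random function rand() that returns a number from 1 to 10. If it is random enough, every number should be returned with the same probability; in other words, each number should have a 1/10 chance of being returned.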
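Applied to shuffling, the same idea means counting how often each card lands at each position. The harness below is a minimal sketch of that; `ShuffleFn`, `count_positions` and `print_positions` are names I made up, and reading the result tables in this post as card (row) by position (column) is my assumption. `ShuffleArray_General` is copied from the listing above so the sketch is self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DECK 10

typedef void (*ShuffleFn)(char *arr, int len);

/* The "most people" shuffle from the listing above, used as the subject under test. */
void ShuffleArray_General(char *arr, int len) {
    for (int idx = 0; idx < len; idx++) {
        int i = rand() % len;
        int j = rand() % len;
        char temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }
}

/* Shuffle a fresh A..J deck `trials` times and record, for each card (row),
   how many times it ended up at each position (column). */
void count_positions(ShuffleFn shuffle, int trials, int counts[DECK][DECK]) {
    static const char cards[DECK] = {'A','B','C','D','E','F','G','H','I','J'};
    assert(trials > 0);
    memset(counts, 0, sizeof(int) * DECK * DECK);
    for (int t = 0; t < trials; t++) {
        char deck[DECK];
        memcpy(deck, cards, DECK);
        shuffle(deck, DECK);
        for (int pos = 0; pos < DECK; pos++) {
            counts[deck[pos] - 'A'][pos]++;
        }
    }
}

/* Print the matrix, one row per card and one column per position. */
void print_positions(int counts[DECK][DECK]) {
    for (int card = 0; card < DECK; card++) {
        printf("%c |", 'A' + card);
        for (int pos = 0; pos < DECK; pos++) {
            printf(" %6d", counts[card][pos]);
        }
        printf("\n");
    }
}
```

With 1000 trials, a fair shuffle should put roughly 100 in every cell; any shuffle, fair or not, must still make every row and every column add up to the number of trials.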




Test Result


1. The recursive random draw method


      1    2    3    4    5    6    7    8    9   10
----------------------------------------------------
A | 101  283  317  208   65   23    3    0    0    0
B | 101  191  273  239  127   54   12    2    1    0
C | 103  167  141  204  229  115   32    7    2    0
D | 103  103   87  128  242  195  112   26    3    1
E | 104   83   62   67  116  222  228   93   22    3
F |  91   58   34   60   69  141  234  241   65    7
G |  93   43   35   19   44  102  174  274  185   31
H |  94   28   27   27   46   68   94  173  310  133
I | 119   27   11   30   28   49   64   96  262  314
J |  91   17   13   18   34   31   47   88  150  511

2. The quicksort hack


       1    2    3    4    5    6    7    8    9   10
-----------------------------------------------------
A |   74  108  123  102   93  198   40   37   52  173
B |  261  170  114   70   49   28   37   76  116   79
C |  112  164  168  117   71   37   62   96  116   57
D |   93   91  119  221  103   66   91   98   78   40
E |   62   60   82   90  290  112   95   98   71   40
F |   46   60   63   76   81  318   56   42   70  188
G |   72   57   68   77   83   39  400  105   55   44
H |   99   79   70   73   87   34  124  317   78   39
I |  127  112  102   90   81   24   57   83  248   76
J |   54   99   91   84   62  144   38   48  116  264

3. The algorithm most people write


       1    2    3    4    5    6    7    8    9   10
-----------------------------------------------------
A |  178   98   92   82  101   85   79  105   87   93
B |   88  205   90   94   77   84   93   86  106   77
C |   93   99  185   96   83   87   98   88   82   89
D |  105   85   89  190   92   94  105   73   80   87
E |   97   74   85   88  204   91   80   90  100   91
F |   85   84   90   91   96  178   90   91  105   90
G |   81   84   84  104  102  105  197   75   79   89
H |   84   99  107   86   82   78   92  205   79   88
I |  102   72   88   94   87  103   94   92  187   81
J |   87  100   90   75   76   95   72   95   95  215

The correct algorithm

Next, let's look at an algorithm that is both fast and correct: the Fisher-Yates algorithm.

void ShuffleArray_Fisher_Yates(char* arr, int len)
{
    int i = len, j;
    char temp;

    if (i == 0) return;
    while (i--) {
        j = rand() % (i + 1);
        temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }
}


       1    2    3    4    5    6    7    8    9   10
-----------------------------------------------------
A |  107   98   83  115   89  103  105   99   94  107
B |   91  106   90  102   88  100  102   97  112  112
C |  100  107   99  108  101   99   86   99  101  100
D |   96   85  108  101  117  103  102   96  108   84
E |  106   89  102   86   88  107  114  109  100   99
F |  109   96   87   94   98  102  109  101   92  102
G |   94   95  119  110   97  112   89  101   89   94
H |   93  102  102  103  100   89  107  105  101   98
I |   99  110  111  101  102   79  103   89  104  102
J |  105  112   99   99  108  106   95   95   99   82



         1      2      3      4      5      6      7      8      9     10
-------------------------------------------------------------------------
A | 100095  99939 100451  99647  99321 100189 100284  99565 100525  99984
B |  99659 100394  99699 100436  99989 100401  99502 100125 100082  99713
C |  99938  99978 100384 100413 100045  99866  99945 100025  99388 100018
D |  99972  99954  99751 100112 100503  99461  99932  99881 100223 100211
E | 100041 100086  99966  99441 100401  99958  99997 100159  99884 100067
F | 100491 100294 100164 100321  99902  99819  99449 100130  99623  99807
G |  99822  99636  99924 100172  99738 100567 100427  99871 100125  99718
H |  99445 100328  99720  99922 100075  99804 100127  99851 100526 100202
I | 100269 100001  99542  99835 100070  99894 100229 100181  99718 100261
J | 100268  99390 100399  99701  99956 100041 100108 100212  99906 100019
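Because the Fisher-Yates code is not a black box, the near-uniform counts can also be checked on paper. The derivation below is my own sketch, not part of the original post. It assumes a deck of n cards with positions numbered 0 to n-1, and it idealizes `rand() % (i+1)` as exactly uniform over 0..i. At loop step i, each card that has not been placed yet is picked with probability 1/(i+1) and fixed at position i, so for any card and any position k:

```latex
\[
P(\text{card ends at position } k)
  = \underbrace{\prod_{i=k+1}^{n-1} \frac{i}{i+1}}_{\text{not picked at steps } n-1,\,\dots,\,k+1}
    \cdot \underbrace{\frac{1}{k+1}}_{\text{picked at step } k}
  = \frac{k+1}{n} \cdot \frac{1}{k+1}
  = \frac{1}{n}
\]
```

For n = 10 this gives 1/10 for every cell, which matches counts of about 100 per cell in 1000 shuffles and about 100,000 per cell in 1,000,000 shuffles.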



Average error: within 5% (or: more than 90% of the errors should be below 5%)






void ShuffleArray_Manual(char* arr, int len)
{
    int mid = len / 2;

    for (int n = 0; n < 5; n++) {

        // Riffle shuffle: interleave the two halves
        for (int i = 1; i < mid; i += 2) {
            char tmp = arr[i];
            arr[i] = arr[mid+i];
            arr[mid+i] = tmp;
        }

        // Cut the deck at random
        char *buf = (char*)malloc(sizeof(char)*len);

        for (int j = 0; j < 5; j++) {
            int start = rand() % (len-1) + 1;
            int numCards = rand() % (len/2) + 1;

            if (start + numCards > len) {
                numCards = len - start;
            }

            memset(buf, 0, len);
            memcpy(buf, arr, start);            // cards are raw bytes, not a NUL-terminated string
            memmove(arr, arr+start, numCards);  // source and destination may overlap
            memcpy(arr+numCards, buf, start);
        }
        free(buf);
    }
}



         1      2      3      4      5      6      7      8      9     10
-------------------------------------------------------------------------
A |  10002   9998   9924  10006  10048  10200   9939   9812  10080   9991
B |   9939   9962  10118  10007   9974  10037  10149  10052   9761  10001
C |  10054  10100  10050   9961   9856   9996   9853  10016   9928  10186
D |   9851   9939   9852  10076  10208  10003   9974  10052   9992  10053
E |  10009   9915  10050  10037   9923  10094  10078  10059   9880   9955
F |  10151  10115  10113   9919   9844   9896   9891   9904  10225   9942
G |  10001  10116  10097  10030  10061   9993   9891   9922   9889  10000
H |  10075  10033   9866   9857  10170   9854  10062  10078  10056   9949
I |  10045   9864   9879  10066   9930   9919  10085  10104  10095  10013
J |   9873   9958  10051  10041   9986  10008  10078  10001  10094   9910


