为什么有序数组比无序数组快呢?

来自stackoverflow的题目Why is processing a sorted array faster than an unsorted array?

Here is a piece of C++ code that seems very peculiar. For some strange reason, sorting the data miraculously makes the code almost six times faster:

#include <algorithm>
#include <ctime>
#include <iostream>

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // !!! With this, the next loop runs faster
    std::sort(data, data + arraySize);

    // Test
    clock_t start = clock();
    long long sum = 0;

    for (unsigned i = 0; i < 100000; ++i)
    {
        // Primary loop
        for (unsigned c = 0; c < arraySize; ++c)
        {
            if (data[c] >= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << std::endl;
    std::cout << "sum = " << sum << std::endl;
}
  • Without std::sort(data, data + arraySize);, the code runs in
    11.54
    seconds.
  • With the sorted data, the code runs in 1.93 seconds.
import java.util.Arrays;
import java.util.Random;

public class Main
{
    public static void main(String[] args)
    {
        // Generate data
        int arraySize = 32768;
        int data[] = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c < arraySize; ++c)
            data[c] = rnd.nextInt() % 256;

        // !!! With this, the next loop runs faster
        Arrays.sort(data);

        // Test
        long start = System.nanoTime();
        long sum = 0;

        for (int i = 0; i < 100000; ++i)
        {
            // Primary loop
            for (int c = 0; c < arraySize; ++c)
            {
                if (data[c] >= 128)
                    sum += data[c];
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}

答案是:You are the victim of branch prediction fail.


What is Branch Prediction?

Now for the sake of argument, suppose this is back in the 1800s - before long distance or radio communication.

You are the operator of a junction and you hear a train coming. You have no idea which way it will go. You stop the train to ask the captain which direction he wants. And then you set the switch appropriately.

Trains are heavy and have a lot of inertia. So they take forever to start up and slow down.

Is there a better way? You guess which direction the train will go!

  • If you guessed right, it continues on.
  • If you guessed wrong, the captain will stop, back up, and yell at you to flip the switch. Then it can restart down the other path.

If you guess right every time, the train will never have to stop.

If you guess wrong too often, the train will spend a lot of time stopping, backing up, and restarting.



Consider an if-statement: At the processor level, it is a branch instruction:

You are a processor and you see a branch. You have no idea which way it will go. What do you do? You halt execution and wait until the previous instructions are complete. Then you continue down the correct path.

Modern processors are complicated and have long pipelines. So they take forever to "warm up" and "slow down".

Is there a better way? You guess which direction the branch will go!

  • If you guessed right, you continue executing.
  • If you guessed wrong, you need to flush the pipeline and roll back to the branch. Then you can restart down the other path.

If you guess right every time, the execution will never have to stop.

If you guess wrong too often, you spend a lot of time stalling, rolling back, and restarting.



This is branch prediction. I admit it‘s not the best analogy since the train could just signal the direction with a flag. But in computers, the processor doesn‘t know which direction a branch will go until the last moment.

So how would you strategically guess to minimize the number of times that the train must back up and go down the other path? You look at the past history! If the train goes left 99% of the time, then you guess left. If it alternates, then you alternate your
guesses. If it goes one way every 3 times, you guess the same...

In other words, you try to identify a pattern and follow it. This is more or less how branch predictors work.

Most applications have well-behaved branches. So modern branch predictors will typically achieve >90% hit rates. But when faced with unpredictable branches with no recognizable patterns, branch predictors are virtually useless.

Further reading: "Branch predictor" article on Wikipedia.

As hinted from above, the culprit is this if-statement:

if (data[c] >= 128)
    sum += data[c];

Notice that the data is evenly distributed between 0 and 255. When the data is sorted, roughly the first half of the iterations will not enter the if-statement. After that, they will all enter the if-statement.

This is very friendly to the branch predictor since the branch consecutively goes the same direction many times.Even a simple saturating counter will correctly predict the branch except for the few iterations after it switches direction.

Quick visualization:

T = branch taken
N = branch not taken

data[] = 0, 1, 2, 3, 4, ... 126, 127, 128, 129, 130, ... 250, 251, 252, ...
branch = N  N  N  N  N  ...   N    N    T    T    T  ...   T    T    T  ...

       = NNNNNNNNNNNN ... NNNNNNNTTTTTTTTT ... TTTTTTTTTT  (easy to predict)

However, when the data is completely random, the branch predictor is rendered useless because it can‘t predict random data.Thus there will probably be around 50% misprediction. (no better than random guessing)

data[] = 226, 185, 125, 158, 198, 144, 217, 79, 202, 118,  14, 150, 177, 182, 133, ...
branch =   T,   T,   N,   T,   T,   T,   T,  N,   T,   N,   N,   T,   T,   T,   N  ...

       = TTNTTTTNTNNTTTN ...   (completely random - hard to predict)


So what can be done?

If the compiler isn‘t able to optimize the branch into a conditional move, you can try some hacks if you are willing to sacrifice readability for performance.

Replace:

if (data[c] >= 128)
    sum += data[c];

with:

int t = (data[c] - 128) >> 31;
sum += ~t & data[c];

This eliminates the branch and replaces it with some bitwise operations.

(Note that this hack is not strictly equivalent to the original if-statement. But in this case, it‘s valid for all the input values ofdata[].)

Benchmarks: Core i7 920 @ 3.5 GHz

C++ - Visual Studio 2010 - x64 Release

//  Branch - Random
seconds = 11.777

//  Branch - Sorted
seconds = 2.352

//  Branchless - Random
seconds = 2.564

//  Branchless - Sorted
seconds = 2.587

Java - Netbeans 7.1.1 JDK 7 - x64

//  Branch - Random
seconds = 10.93293813

//  Branch - Sorted
seconds = 5.643797077

//  Branchless - Random
seconds = 3.113581453

//  Branchless - Sorted
seconds = 3.186068823

Observations:

  • With the Branch: There is a huge difference between the sorted and unsorted data.
  • With the Hack: There is no difference between sorted and unsorted data.
  • In the C++ case, the hack is actually a tad slower than with the branch when the data is sorted.

A general rule of thumb is to avoid data-dependent branching in critical loops. (such as in this example)

时间: 2024-11-05 15:12:12

为什么有序数组比无序数组快呢?的相关文章

java面向对象的有序数组和无序数组的比较

package aa; class Array{ //定义一个有序数组 private long[] a; //定义数组长度 private int nElems; //构造函数初始化 public Array(int max){ a = new long[max]; nElems = 0; } //size函数 public int size(){ return nElems; } //定义添加函数 public void insert(long value){ //将value赋值给数组成员

为什么处理有序数组比无序数组快?

有兴趣学习教流c/c++的小伙伴可以加群:941636044 问题 由于某些怪异的原因,下面这段C++代码表现的异乎寻常--当这段代码作用于有序数据时其速度可以提高将近6倍,这真是令人惊奇. #include <algorithm> #include <ctime> #include <iostream> int _tmain (int argc , _TCHAR * argv []) { //Generate data const unsigned arraySize

数组结构之数组

数据结构之数组的运用,无非是增删查操作,就有序数组和无序数组进行这三种操作: 一.查找 (1)无序数组查找特定元素,线性查找: 1 public static void unSortSearchKey(int arr[], int key) { 2 for (int i = 0; i < arr.length; i++) { 3 if(arr[i]==key){ 4 System.out.println(key+"在无序数组中索引为的"+i+"个位置"); 5

有序和无序数组的二分搜索算法

题目意思 1.给定有序数组A和关键字key,判断A中是否存在key,如果存在则返回下标值,不存在则返回-1. 2.给定无序数组A和关键字key,判断A中是否存在key,如果存在则返回1,不存在则返回0. 对于1.2问题,我们都可以简单的写出O(n)的从头到尾为的扫描算法,这里就不在累赘,这里我们讨论的是基于二分查找的算法,使其时间在渐进意义上达到O(logn). 对于有序的数组,很"容易"写出基于二分的函数. 那么问题2,对于无序数组,怎么查找呢?这里我们用到了快速排序的划分原则.算法

快速查找无序数组中的第K大数?

1.题目分析: 查找无序数组中的第K大数,直观感觉便是先排好序再找到下标为K-1的元素,时间复杂度O(NlgN).在此,我们想探索是否存在时间复杂度 < O(NlgN),而且近似等于O(N)的高效算法. 还记得我们快速排序的思想麽?通过“partition”递归划分前后部分.在本问题求解策略中,基于快排的划分函数可以利用“夹击法”,不断从原来的区间[0,n-1]向中间搜索第k大的数,大概搜索方向见下图: 2.参考代码: 1 #include <cstdio> 2 3 #define sw

无序数组中第Kth大的数

题目:找出无序数组中第Kth大的数,如{63,45,33,21},第2大的数45. 输入: 第一行输入无序数组,第二行输入K值. 该是内推滴滴打车时(2017.8.26)的第二题,也是<剑指offer>上最小k个数的变形.当时一看到题,这个不是用快排吗?然后就写了,结果始终没有通过,遗憾的超时提交了.错误点主要在于,这里求的是第K大的数,而若是我们使用K去判断快排得到的下标,得到的是第K个数(等同于排序以后从左往右下标为K-1),而题中隐藏的意思等同于排序以后从 右往左数第K个数.所写在写代码

求无序数组的中位数

中位数即是排过序后的处于数组最中间的元素. 不考虑数组长度为偶数的情况.设集合元素个数为n. 简单的想了下:思路1) 把无序数组排好序,取出中间的元素            时间复杂度 采用普通的比较排序法 O(N*logN)            如果采用非比较的计数排序等方法, 时间复杂度 O(N), 空间复杂度也是O(N). 思路2)           2.1)将前(n+1)/2个元素调整为一个小顶堆,          2.2)对后续的每一个元素,和堆顶比较,如果小于等于堆顶,丢弃之,

有1,2,3一直到n的无序数组,排序

题目:有1,2,3,..n 的无序整数数组,求排序算法.要求时间复杂度 O(n), 空间复杂度O(1). 分析:对于一般数组的排序显然 O(n) 是无法完成的. 既然题目这样要求,肯定原先的数组有一定的规律,让人们去寻找一种机会. 例如:原始数组: a = [ 10, 6,9, 5,2, 8,4,7,1,3 ] ,如果把它们从小到大排序且应该是 b = [1,2,3,4,5,6,7,8,9,10 ],也就是说: 如果我们观察 a --> b 的对映关系是: a[i] 在 b 中的位置是 a[i]

二分法查找(数组元素无序)

问题描述: 一数组,含有一堆无序数据,首先将数据按顺序排列,再用二分法实现某个元素的查找,若找到,返回该元素在数组中的下表,否则,返回不存在提示信息. #include<stdio.h> #include<stdlib.h> int *bubble_sort(int a[],int n)//冒泡排序(将数据升序排列) { int i; int j; int tmp; for(j=0;j<n-1;++j)//n个元素需要排序n-1趟 { for(i=0;i<n-j-1;+