Checking the Performance of FindArray

FindArray Example

Let’s create a program that shows how a sample C++ compiler generates code for a function

named FindArray. Later, we will write an assembly language version of the function, attempting

to write more efficient code than the C++ compiler. The following FindArray function (in C++)

searches for a single value in an array of long integers:

bool FindArray( long searchVal, long array[], long count )

{

for(int i = 0; i < count; i++)

{

if( array[i] == searchVal )

return true;

}

return false;

}

Linking MASM to Visual C++

Let’s create a hand-optimized assembly language version of FindArray, named AsmFindArray.

A few basic principles are applied to the code optimization:

• Move as much processing out of the loop as possible.

• Move stack parameters and local variables to registers.

• Take advantage of specialized string/array processing instructions (in this case, SCASD).

We will use Microsoft Visual C++ (Visual Studio) to compile the calling C++ program and

Microsoft MASM to assemble the called procedure. Visual C++ generates 32-bit applications that

run only in protected mode. We choose Win32 Console as the target application type for the examples

shown here, although there is no reason why the same procedures would not work in ordinary

MS-Windows applications. In Visual C++, functions return 8-bit values in AL, 16-bit values in AX,

32-bit values in EAX, and 64-bit values in EDX:EAX. Larger data structures (structure values,

arrays, etc.) are stored in a static data location, and a pointer to the data is returned in EAX.

Our assembly language code is slightly more readable than the code generated by the C++

compiler because we can use meaningful label names and define constants that simplify the use

of stack parameters. Here is the complete module listing:

TITLE AsmFindArray Procedure (AsmFindArray.asm)

.586

.model flat,C

AsmFindArray PROTO,

srchVal:DWORD, arrayPtr:PTR DWORD, count:DWORD

.code

;-----------------------------------------------

AsmFindArray PROC USES edi,

srchVal:DWORD, arrayPtr:PTR DWORD, count:DWORD

;

; Performs a linear search for a 32-bit integer

; in an array of integers. Returns a boolean

; value in AL indicating if the integer was found.

;-----------------------------------------------

true = 1

false = 0

mov eax,srchVal ; search value

mov ecx,count ; number of items

mov edi,arrayPtr ; pointer to array

jecxz returnFalse ; count = 0: REPNE would leave ZF unchanged

repne scasd ; do the search

jz returnTrue ; ZF = 1 if found

returnFalse:

mov al,false

jmp short exit

returnTrue:

mov al, true

exit:

ret

AsmFindArray ENDP

END
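To make the behavior of REPNE SCASD easier to follow, the search loop can be modeled in C++. This is only an illustrative sketch, not code from the listing above: the names ScanResult and ModelRepneScasd are hypothetical, the Direction flag is assumed to be clear (so EDI moves forward), and std::int32_t stands in for a 32-bit doubleword.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the register state after REPNE SCASD finishes.
struct ScanResult {
    bool zf;                  // Zero flag: true if the last comparison matched
    std::size_t ecx;          // remaining count
    const std::int32_t* edi;  // points just past the last element examined
};

// Models REPNE SCASD with the Direction flag clear (an assumption).
// zfIn is the Zero flag value on entry; it survives if ecx is 0.
ScanResult ModelRepneScasd(std::int32_t eax, const std::int32_t* edi,
                           std::size_t ecx, bool zfIn)
{
    bool zf = zfIn;
    while (ecx != 0) {
        zf = (*edi == eax);   // SCASD: compare EAX with the doubleword at [EDI]
        ++edi;                // EDI advances by 4 bytes (one doubleword)
        --ecx;                // REP prefix decrements ECX
        if (zf)               // REPNE stops as soon as ZF = 1
            break;
    }
    return {zf, ecx, edi};
}
```

Searching {10, 20, 30} for 20 ends with ZF set, ECX equal to 1, and EDI pointing at the third element. With a zero count the loop body never runs, so ZF keeps whatever value it had on entry.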

Checking the Performance of FindArray

Test Program. It is interesting to check the performance of any assembly language code

you write against similar code written in C++. To that end, the following C++ test program

inputs a search value and gets the system time before and after executing a loop that calls

FindArray one million times. The same test is performed on AsmFindArray. Here is a listing

of the findarr.h header file, with function prototypes for the assembly language procedure and

the C++ function:

// findarr.h

extern "C" {

bool AsmFindArray( long n, long array[], long count );

// Assembly language version

bool FindArray( long n, long array[], long count );

// C++ version

}

Main C++ Module. Here is a listing of main.cpp, the startup program that calls FindArray and

AsmFindArray:

// main.cpp - Testing FindArray and AsmFindArray.

#include <iostream>

#include <cstdlib> // rand

#include <time.h>

#include "findarr.h"

using namespace std;

int main()

{

// Fill an array with pseudorandom integers.

const unsigned ARRAY_SIZE = 10000;

const unsigned LOOP_SIZE = 1000000;

long array[ARRAY_SIZE];

for(unsigned i = 0; i < ARRAY_SIZE; i++)

array[i] = rand();

long searchVal;

time_t startTime, endTime;

cout << "Enter value to find: ";

cin >> searchVal;

cout << "Please wait. This will take between 10 and 30 seconds...\n";

// Test the C++ function:

time( &startTime );

bool found = false;

for( unsigned n = 0; n < LOOP_SIZE; n++)

found = FindArray( searchVal, array, ARRAY_SIZE );

time( &endTime );

cout << "Elapsed CPP time: " << long(endTime - startTime)

<< " seconds. Found = " << found << endl;

// Test the Assembly language procedure:

time( &startTime );

found = false;

for( unsigned n = 0; n < LOOP_SIZE; n++)

found = AsmFindArray( searchVal, array, ARRAY_SIZE );

time( &endTime );

cout << "Elapsed ASM time: " << long(endTime - startTime)

<< " seconds. Found = " << found << endl;

return 0;

}
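Because time() reports whole seconds, short runs can be hard to compare. One possible refinement, sketched here under the assumption that a C++11 compiler is available, measures the loop with std::chrono::steady_clock instead. The names FindArrayRef and TimeSearchMs are hypothetical and do not appear in the test program above.

```cpp
#include <chrono>

// Stand-in search function with the same parameter list as FindArray.
bool FindArrayRef(long searchVal, long array[], long count)
{
    for (long i = 0; i < count; i++)
        if (array[i] == searchVal)
            return true;
    return false;
}

// Hypothetical helper: calls searchFn `loops` times and returns the
// elapsed time in milliseconds. The last search result is stored in `found`.
template <typename SearchFn>
double TimeSearchMs(SearchFn searchFn, long searchVal, long array[],
                    long count, unsigned loops, bool& found)
{
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (unsigned n = 0; n < loops; n++)
        found = searchFn(searchVal, array, count);
    const auto stop = Clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as TimeSearchMs(AsmFindArray, searchVal, array, ARRAY_SIZE, LOOP_SIZE, found) would time the assembly language procedure in the same way. An optimizing compiler may still simplify a loop whose results are overwritten, so the measured figures should be treated as approximate.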

Assembly Code versus Nonoptimized C++ Code. We compiled the C++ program to a

Release (non-debug) target with code optimization turned off. Here is the output, showing the

worst case (value not found):

Assembly Code versus Compiler Optimization. Next, we set the compiler to optimize the

executable program for speed and ran the test program again. Here are the results, showing the

assembly code is noticeably faster than the compiler-optimized C++ code:

Pointers versus Subscripts

Programmers using older C compilers observed that processing arrays with pointers was more efficient

than using subscripts. For example, the following version of FindArray uses this approach:

bool FindArray( long searchVal, long array[], long count )

{

long * p = array;

for(int i = 0; i < count; i++, p++)

if( searchVal == *p )

return true;

return false;

}

Running this version of FindArray through the Visual C++ compiler produced virtually the

same assembly language code as the earlier version using subscripts. Because modern compilers

are good at code optimization, using a pointer variable is no more efficient than using a subscript.

Here is the loop from the FindArray target code that was produced by the C++ compiler:

$L176:
cmp esi, DWORD PTR [ecx]
je SHORT $L184
inc eax
add ecx, 4
cmp eax, edx
jl SHORT $L176
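The subscript and pointer formulations can also be checked for identical behavior in C++. The sketch below is illustrative only: the function names are hypothetical, and the std::find variant is an additional standard library formulation that does not come from the text.

```cpp
#include <algorithm>

// Subscript formulation, as in the first FindArray listing.
bool FindBySubscript(long searchVal, const long array[], long count)
{
    for (long i = 0; i < count; i++)
        if (array[i] == searchVal)
            return true;
    return false;
}

// Pointer formulation, as in the second FindArray listing.
bool FindByPointer(long searchVal, const long array[], long count)
{
    const long* p = array;
    for (long i = 0; i < count; i++, p++)
        if (searchVal == *p)
            return true;
    return false;
}

// Standard library formulation (not from the text).
bool FindByStdFind(long searchVal, const long array[], long count)
{
    const long* end = array + count;
    return std::find(array, end, searchVal) != end;
}
```

All three functions return the same result for any input. Whether a given compiler emits the same loop for each of them has to be confirmed by inspecting its assembly listing.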

Your time would be well spent studying the output produced by a C++ compiler to learn about

optimization techniques, parameter passing, and object code implementation. In fact, many computer

science students take a compiler-writing course that includes such topics. It is also important to

realize that compilers target the general case because they usually have no specific knowledge about

individual applications or installed hardware. Some compilers provide specialized optimization for a

particular processor such as the Pentium, which can significantly improve the speed of compiled

programs. Hand-coded assembly language can take advantage of string primitive instructions, as

well as specialized hardware features of video cards, sound cards, and data acquisition boards.

Time: 2024-08-24 06:15:28
