1. Duff‘s Device
Invited by Tom Duff in 1983, Duff‘s device is a good practical example of loop unrolling in software-based instruction-level parallelism, which approximately reduces the instructions that control the loop into 1/8.
void memcpy(char *to, char *from, size_t count) { size_t n =(count+7)/8; switch (count%8) { case 0: do { *to++=*from++; case 7: *to++ = *from++; case 6: *to++ = *from++; case 5: *to++ = *from++; case 4: *to++ = *from++; case 3: *to++ = *from++; case 2: *to++ = *from++; case 1: *to++ = *from++; } while(--n>0); } }
2. Fast Inverse Square Root
This algorithm can speed up the caculation of normalized vectors in IEEE floating point format, and it was devised by John Carmack in 1990s. Up to now, I still have no idea about why this program is so magic.
float Q_rsqrt( float number ) { long i; float x2, y; const float threehalfs = 1.5F; x2 = number * 0.5F; y = number; i = * ( long * ) &y; // evil floating point bit level hacking i = 0x5f3759df - ( i >> 1 ); // what the fuck? y = * ( float * ) & i; y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed return y; }
Duff's Device and Fast InvSqrt
时间: 2024-10-26 10:24:29