深入底层-数据的按位运算

深入底层——数据的按位运算

简介

由于程序中的所有数在计算机内存中都是以二进制的形式储存的,因此要对这些数进行直接操作的话,只能按位来处理。不同于其他复杂的运算符,位运算消耗的时钟周期是最少的,因此如果要追求极致的效率,那么位运算是理想的选择。

基本运算符及性质

与运算&

”遇0清0,遇1不变“。只有两个二进制位都是1时运算后才为1,其他情况都为0。

与运算最常见地被用来清除数据的某些位数,或者用来判断某一位是否为1。

1
2
3
4
5
6
7
//将0011的末位清0
0011 & 0010 = 0010;
//判断x的第3位是否为1(index start from 0)
if x == 1100:
x & 1000 = 1000(not zero)
if x == 0100:
x & 1000 = 0000(zero)

或运算|

“遇1置1,遇0不变”。只有两个二进制位都为0时运算后才为0,其他情况都为1。

或运算可以用来把某些位置为1,但更常用来将两个错位的数据合并起来。

1
2
//合并1100 和 0011
1100 | 0011 = 11111

非运算~

将数据的每一位取反。常用来计算补码

1
2
3
4
//-1转化为0
~1111 = 0000
//计算-1的补码
~1111 + 1 = 0001

异或运算^

如果两个二进制位不同,则结果为1,否则结果为0。

异或运算可以看成是不带进位的加法。

异或运算有一些很好的性质:

x ^ x = 0

x ^ 0 = x

交换律:x ^ a ^ x = x ^ x ^ a = a

结合律:a ^ b ^ c = a ^ (b ^ c)

1
2
3
4
5
6
7
8
9
10
11
//交换两个数(不能储存在同一个地址下)
x ^= y;
y ^= x;
x ^= y;

//找出藏在成对数据中的唯一数据
//如:1,1,2,3,3,4,4 -> 2
a = 0;
for i 0 -> length - 1:
a ^= array[i];
return a;

左移<<

将数据的二进制位依次左移,相当于乘2。也经常和与运算结合起来取出特定位置的数据。

1
2
3
x * 2 == x << 1;
//取出x = 0101 1100的高4位
x &= -1 << 4;

右移>>

将数据的二进制位依次右移,相当于除以2。右移又分为逻辑右移和算术右移,算术右移会根据最高位来补位。

1
x / 2 == x >> 1;

题海拾贝

byteSwap

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
/* 
* byteSwap - swaps the nth byte and the mth byte
* Examples: byteSwap(0x12345678, 1, 3) = 0x56341278
* byteSwap(0xDEADBEEF, 0, 2) = 0xDEEFBEAD
* You may assume that 0 <= n <= 3, 0 <= m <= 3
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 25
* Rating: 2
*/
int byteSwap(int x, int n, int m)
{
/**
* first get the bytes needed to swap: n_byte and m_byte, which
* seperately present nth byte of x and mth byte of y.
* then set y = n_byte ^ m_byte.
* because y ^ n_byte = n_byte ^ m_byte ^ n_byte = m_byte,
* same as y ^ m_byte, so x ^= y << n will set nth byte to m_byte,
* and x ^= y << m will set mth byte to n_byte.
*/
int y = 0;
n = n << 3;
m = m << 3;
y = 0xff & ((x >> n) ^ (x >> m));
x = x ^ (y << n);
x = x ^ (y << m);
return x;
}

可见,异或运算有时也能用来存储好几个数据,比如y存储了x的第n个字节和第m个字节,这样在重新和原始的x进行异或运算时,重复的数据被抵消掉了,留下的就是交换后的数据。

rotateRight

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
/* 
* rotateRight - Rotate x to the right by n
* Can assume that 0 <= n <= 31
* Examples: rotateRight(0x87654321,4) = 0x18765432
* Legal ops: ~ & ^ | + << >> !
* Max ops: 25
* Rating: 3
*/
int rotateRight(int x, int n)
{
/**
* x >> n presents the lower n bits, and (x ^ (x >> 31)) <<
* (31 ^ n) << 1 presents the higher 32 - n bits. In order to
* clean the 1 generated by moving right, you can XOR the sign of
* x.It's x ^ (x >> 31)
*/
return (x >> n) ^ ((x ^ (x >> 31)) << (31 ^ n) << 1);
}

这里要注意的是,<<的运算符优先级是高于^的,这里有一个很巧妙的处理方法:(31 ^ n) << 1。因为如果要计算32 - n的话,如果使用补码来模拟减法运算,一则增加了运算符的个数,二则是,当n为0时,x << 32会变成 x<<1。而31是$2^5 - 1$即11111, 和n异或运算后刚好把n中的1全部消除了,就相当于31-n。最后补上<<1 代表实际右移了32 - n步。

另一个处理巧妙的地方在于使用符号位处理掉了x>>n产生的补充位,如果x是负数,则x ^ x>>31会让x每一位存储1,在于x>>n的高位异或之后刚好抵消。如果x是正数则没有影响。

logicalNeg

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
/* 
* logicalNeg - implement the ! operator, using all of
* the legal operators except !
* Examples: logicalNeg(3) = 0, logicalNeg(0) = 1
* Legal ops: ~ & ^ | + << >>
* Max ops: 12
* Rating: 4
*/
int logicalNeg(int x)
{
/**
* If x == 0, then negX | x >> 31 = 0,
* else it must be -1, so add 1 to the result is
* the final answer.
*/
return (((~x + 1) | x) >> 31) + 1;
}

0 和 非0的区别刚好可以通过补码来体现:只有0和0x8000 0000的补码才等于它们本身,而补码和自身或运算之后,又只有非0的数符号位为1,因此根据这个特点就能轻松区别0和非0数了。

isGreater

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
/* 
* isGreater - if x > y then return 1, else return 0
* Example: isGreater(4,5) = 0, isGreater(5,4) = 1
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 24
* Rating: 3
*/
int isGreater(int x, int y)
{
/**
* first, we can get the average of x and y with no overflow happening
* by using the expression: (x & y) + (x ^ y) >> 1.
* Ideas as follows:
* for these bits which are both 1 in x and y, then bits_x & bits_y = average()
* eg: bits_x = 0100, bits_y = 0100, then average = bits_x & bits_y = 0100
* for other bits, to get their average, we can use (bits_x ^ bits_y)>>1.
* eg: bits_x = 0100 = 4, bits_y = 0010 = 2, then average = (0110)>>1 = 0011 = 3
* So if we want to get the average of x and y, we can separate x and y to
* the two conditions above and deal with them with certain operator, then
* add them to get the correct result. Because this operation won't generate carrying,
* so the result won't overflow.
*
* In addition, because the sign of x - y and (x - y)/2 are the same, so we can
* get the sign of average(x , ~y), when x > y, the sign is 0, else sign is 1,
* so the result is !sign.
*
*/
y = ~y;
return !(((x & y) + ((x ^ y) >> 1)) >> 31);
}

这题的处理方法甚为巧妙,如果对两个数直接相减,则可能有溢出的风险,但如果换种方式求解其平均数,则不会发生溢出,这样两个数的大小就可以直接用符号位来判断了。

妙哉!(但还有个4步解题的,已经不是正常脑回路想得到的了… …)

satAdd

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
/*
* satAdd - adds two numbers but when positive overflow occurs, returns
* maximum possible value, and when negative overflow occurs,
* it returns minimum positive value.
* Examples: satAdd(0x40000000,0x40000000) = 0x7fffffff
* satAdd(0x80000000,0xffffffff) = 0x80000000
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 30
* Rating: 4
*/
int satAdd(int x, int y)
{
/**
* if x + y overflow, then sum has different sign with x and y.
* so if overflow, the value overFlow will be -1, overwise 0.
*
* In this case, if pos + pos = neg, then overFlow << 31 will be
* 0x80000000, and sum >> overFlow will be 0xfffffffff, and both XOR
* will be 0x7fffffff. Same if neg + neg = pos.
*
* If not overflow, overFlow << 31 is always 0, and sum >> overFlow is still
* sum. So the result is not changed.
*/
int sum = x + y;
int overFlow = ((sum ^ x) & (sum ^ y)) >> 31;
return (sum >> overFlow) ^ (overFlow << 31);
}

也是运用异或运算来产生两个特殊的最大值和最小值。

howManyBits

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
/* howManyBits - return the minimum number of bits required to represent x in
* two's complement
* Examples: howManyBits(12) = 5
* howManyBits(298) = 10
* howManyBits(-5) = 4
* howManyBits(0) = 1
* howManyBits(-1) = 1
* howManyBits(0x80000000) = 32
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 90
* Rating: 4
*/
int howManyBits(int x)
{
/**
* To get the minimum bits needed by x, just get the first '1'
* from high bit to low bit of x if x > 0 or ~x if x < 0.
*
* The main idea is binary search, first check the high 16 bits.
* If it's not 0, then x must >= 2^16, else x must < 2^16. Record
* 16 in value.
* then check the high 8 bits in the none zero zone of x. Loop
* until the high 1 bit. Finally add all of the recorded values and the
* sign bit to get the result.
*
*/
int b16, b8, b4, b2, b1, b0;
int sign = x >> 31;
x ^= sign;

b16 = !!(x >> 16) << 4;
x = x >> b16;
b8 = !!(x >> 8) << 3;
x = x >> b8;
b4 = !!(x >> 4) << 2;
x = x >> b4;
b2 = !!(x >> 2) << 1;
x = x >> b2;
b1 = (x >> 1);
x = x >> b1;
b0 = x;
return b16 + b8 + b4 + b2 + b1 + b0 + 1;
}

二分查找在位运算中的运用。

要判断x的最高位1在什么位置,先在较高的半位查找,如果确定在上半区域,则在这个区域继续查找,直到区域只剩下1个二进制位。

具体方法见代码注释。

ilog2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/*
* ilog2 - return floor(log base 2 of x), where x > 0
* Example: ilog2(16) = 4
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 90
* Rating: 4
*/
int ilog2(int x)
{
/**
* same idea as above problem
*/
int sum1, sum2, sum3, sum4, sum5;
sum1 = (!!(x >> 16)) << 4;
x = x >> sum1;
sum2 = (!!(x >> 8)) << 3;
x = x >> sum2;
sum3 = (!!(x >> 4)) << 2;
x = x >> sum3;
sum4 = (!!(x >> 2)) << 1;
x = x >> sum4;
sum5 = (x >> 1);
return sum1 + sum2 + sum3 + sum4 + sum5;
}

上个题的举一反三。

浮点数处理

按照规则取出符号,阶码和尾码就好,在做除法时要考虑舍入的问题。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
/* 
* float_half - Return bit-level equivalent of expression 0.5*f for
* floating point argument f.
* Both the argument and result are passed as unsigned int's, but
* they are to be interpreted as the bit-level representation of
* single-precision floating point values.
* When argument is NaN, return argument
* Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
* Max ops: 30
* Rating: 4
*/
unsigned float_half(unsigned uf)
{
/**
* first get sign, exp digit and check if uf need to round off.
* if uf = inf or nan, return uf
* if exp = 0 or exp = 1, as well as uf is 0 or not format digit or
* exp = 1, move left the addition of tail digit and the round bit.
* and finally add the sign bit.
* In other cases, DEC exp digit.
*/
unsigned exp = uf & 0xff800000;
unsigned frac;
unsigned mask = 0x7fffff;
unsigned shift = 1;

switch (exp)
{
case 0x7f800000:
case 0xff800000:
return uf;
case 0:
case 0x80000000:
break;
default:
switch (sign_exp -= 0x800000)
{
case 0:
case 0x80000000:
mask = 0xfffff;
break;
default:
shift = 0;
}
}

frac = uf & mask;
frac = frac >> shift;

switch (uf & 3)
{
case 3:
frac += shift;
}

return sign_exp | frac;
}
/*
* float_f2i - Return bit-level equivalent of expression (int) f
* for floating point argument f.
* Argument is passed as unsigned int, but
* it is to be interpreted as the bit-level representation of a
* single-precision floating point value.
* Anything out of range (including NaN and infinity) should return
* 0x80000000u.
* Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
* Max ops: 30
* Rating: 4
*/
int float_f2i(unsigned uf)
{
/**
* first get sign, exp and frac digit, then check if uf is over the range of
* unsigned int and return the certain value.
* If not, add the hidden bit and move left frac by 7 bit so that every bit
* won't lose even if the right steps of movement is maxinum -- 30.
* Finally, return the result according the correct sign.
*/
int sign = uf >> 31;
int exp = (uf >> 23) & 0xff;
int frac = uf & 0x007fffff;
int right = 157 - exp;
int abs;
if (exp < 127)
return 0;
if (exp > 157)
return 0x80000000;
abs = (0x40000000 + (frac << 7)) >> right;
if (sign)
return -abs;
else
return abs;
}
/*
* float_twice - Return bit-level equivalent of expression 2*f for
* floating point argument f.
* Both the argument and result are passed as unsigned int's, but
* they are to be interpreted as the bit-level representation of
* single-precision floating point values.
* When argument is NaN, return argument
* Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
* Max ops: 30
* Rating: 4
*/
unsigned float_twice(unsigned uf)
{
/**
* first get exp and sign of uf, if exp is not 0, INC when exp is also
* not 255.
* else, it indicates that uf is 0 or not format digit, just double its
* frac and return the result according the correct sign.
*/
unsigned exp = uf & 0x7f800000;
unsigned sign = uf & 0x80000000;
if (exp)
{
if (exp != 0x7f800000)
uf = uf + 0x00800000;
}
else
uf = (uf << 1) | sign;
return uf;
}

深入底层-数据的按位运算
http://example.com/2020/10/23/深入底层-数据的按位运算/
作者
Chen Shuwen
发布于
2020年10月23日
许可协议