深入底层——数据的按位运算

简介

由于程序中的所有数在计算机内存中都是以二进制的形式储存的，因此要对这些数进行直接操作的话，只能按位来处理。不同于其他复杂的运算符，位运算消耗的时钟周期是最少的，因此如果要追求极致的效率，那么位运算是理想的选择。

基本运算符及性质

与运算&

”遇0清0，遇1不变“。只有两个二进制位都是1时运算后才为1，其他情况都为0。

与运算最常见地被用来清除数据的某些位数，或者用来判断某一位是否为1。

//将0011的末位清0
0011 & 0010 = 0010;
//判断x的第3位是否为1(index start from 0)
if x == 1100:
	x & 1000 = 1000(not zero)
if x == 0100:
	x & 1000 = 0000(zero)

或运算|

“遇1置1，遇0不变”。只有两个二进制位都为0时运算后才为0，其他情况都为1。

或运算可以用来把某些位置为1，但更常用来将两个错位的数据合并起来。

1 2	`//合并1100 和 0011 1100 \| 0011 = 11111`

非运算~

将数据的每一位取反。常用来计算补码

//-1转化为0
~1111 = 0000
//计算-1的补码
~1111 + 1 = 0001

异或运算^

如果两个二进制位不同，则结果为1，否则结果为0。

异或运算可以看成是不带进位的加法。

异或运算有一些很好的性质：

x ^ x = 0

x ^ 0 = x

交换律：x ^ a ^ x = x ^ x ^ a = a

结合律：a ^ b ^ c = a ^ (b ^ c)

//交换两个数（不能储存在同一个地址下）
x ^= y;
y ^= x;
x ^= y;

//找出藏在成对数据中的唯一数据
//如:1,1,2,3,3,4,4 -> 2
a = 0;
for i 0 -> length - 1:
	a ^= array[i];
return a;

左移<<

将数据的二进制位依次左移，相当于乘2。也经常和与运算结合起来取出特定位置的数据。

1
2
3

x * 2 == x << 1;
//取出x = 0101 1100的高4位
x &= -1 << 4;

右移>>

将数据的二进制位依次右移，相当于除以2。右移又分为逻辑右移和算术右移，算术右移会根据最高位来补位。

1	`x / 2 == x >> 1;`

题海拾贝

byteSwap

/* 
 * byteSwap - swaps the nth byte and the mth byte
 *  Examples: byteSwap(0x12345678, 1, 3) = 0x56341278
 *            byteSwap(0xDEADBEEF, 0, 2) = 0xDEEFBEAD
 *  You may assume that 0 <= n <= 3, 0 <= m <= 3
 *  Legal ops: ! ~ & ^ | + << >>
 *  Max ops: 25
 *  Rating: 2
 */
int byteSwap(int x, int n, int m)
{
  /**
   * first get the bytes needed to swap: n_byte and m_byte, which 
   * seperately present nth byte of x and mth byte of y.
   * then set y = n_byte ^ m_byte.
   * because y ^ n_byte = n_byte ^ m_byte ^ n_byte = m_byte,
   * same as y ^ m_byte, so x ^= y << n will set nth byte to m_byte,
   * and x ^= y << m will set mth byte to n_byte.
  */
  int y = 0;
  n = n << 3;
  m = m << 3;
  y = 0xff & ((x >> n) ^ (x >> m));
  x = x ^ (y << n);
  x = x ^ (y << m);
  return x;
}

可见，异或运算有时也能用来存储好几个数据，比如y存储了x的第n个字节和第m个字节，这样在重新和原始的x进行异或运算时，重复的数据被抵消掉了，留下的就是交换后的数据。

rotateRight

/* 
 * rotateRight - Rotate x to the right by n
 *   Can assume that 0 <= n <= 31
 *   Examples: rotateRight(0x87654321,4) = 0x18765432
 *   Legal ops: ~ & ^ | + << >> !
 *   Max ops: 25
 *   Rating: 3 
 */
int rotateRight(int x, int n)
{
  /**
   * x >> n presents the lower n bits, and (x ^ (x >> 31)) << 
   * (31 ^ n) << 1 presents the higher 32 - n bits. In order to
   * clean the 1 generated by moving right, you can XOR the sign of 
   * x.It's x ^ (x >> 31)
  */
  return (x >> n) ^ ((x ^ (x >> 31)) << (31 ^ n) << 1);
}

这里要注意的是，<<的运算符优先级是高于^的，这里有一个很巧妙的处理方法：(31 ^ n) << 1。因为如果要计算32 - n的话，如果使用补码来模拟减法运算，一则增加了运算符的个数，二则是，当n为0时，x << 32会变成 x<<1。而31是$2^5 - 1$即11111, 和n异或运算后刚好把n中的1全部消除了，就相当于31-n。最后补上<<1 代表实际右移了32 - n步。

另一个处理巧妙的地方在于使用符号位处理掉了x>>n产生的补充位，如果x是负数，则x ^ x>>31会让x每一位存储1，在于x>>n的高位异或之后刚好抵消。如果x是正数则没有影响。

logicalNeg

/* 
 * logicalNeg - implement the ! operator, using all of 
 *              the legal operators except !
 *   Examples: logicalNeg(3) = 0, logicalNeg(0) = 1
 *   Legal ops: ~ & ^ | + << >>
 *   Max ops: 12
 *   Rating: 4 
 */
int logicalNeg(int x)
{
  /**
   * If x == 0, then negX | x >> 31 = 0,
   * else it must be -1, so add 1 to the result is 
   * the final answer.
  */
  return (((~x + 1) | x) >> 31) + 1;
}

0 和非0的区别刚好可以通过补码来体现：只有0和0x8000 0000的补码才等于它们本身，而补码和自身或运算之后，又只有非0的数符号位为1，因此根据这个特点就能轻松区别0和非0数了。

isGreater

/* 
 * isGreater - if x > y  then return 1, else return 0 
 *   Example: isGreater(4,5) = 0, isGreater(5,4) = 1
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 24
 *   Rating: 3
 */
int isGreater(int x, int y)
{
  /**
   * first, we can get the average of x and y with no overflow happening
   * by using the expression: (x & y) + (x ^ y) >> 1.
   * Ideas as follows:
   * for these bits which are both 1 in x and y, then bits_x & bits_y = average()
   * eg: bits_x = 0100, bits_y = 0100, then average = bits_x & bits_y = 0100
   * for other bits, to get their average, we can use (bits_x ^ bits_y)>>1.
   * eg: bits_x = 0100 = 4, bits_y = 0010 = 2, then average = (0110)>>1 = 0011 = 3
   * So if we want to get the average of x and y, we can separate x and y to 
   * the two conditions above and deal with them with certain operator, then 
   * add them to get the correct result. Because this operation won't generate carrying,
   * so the result won't overflow.
   * 
   * In addition, because the sign of x - y and (x - y)/2 are the same, so we can 
   * get the sign of average(x , ~y), when x > y, the sign is 0, else sign is 1,
   * so the result is !sign.
   * 
  */
  y = ~y;
  return !(((x & y) + ((x ^ y) >> 1)) >> 31);
}

这题的处理方法甚为巧妙，如果对两个数直接相减，则可能有溢出的风险，但如果换种方式求解其平均数，则不会发生溢出，这样两个数的大小就可以直接用符号位来判断了。

妙哉！（但还有个4步解题的，已经不是正常脑回路想得到的了… …）

satAdd

/*
 * satAdd - adds two numbers but when positive overflow occurs, returns
 *          maximum possible value, and when negative overflow occurs,
 *          it returns minimum positive value.
 *   Examples: satAdd(0x40000000,0x40000000) = 0x7fffffff
 *             satAdd(0x80000000,0xffffffff) = 0x80000000
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 30
 *   Rating: 4
 */
int satAdd(int x, int y)
{
  /**
   * if x + y overflow, then sum has different sign with x and y.
   * so if overflow, the value overFlow will be -1, overwise 0.
   * 
   * In this case, if pos + pos = neg, then overFlow << 31 will be 
   * 0x80000000, and sum >> overFlow will be 0xfffffffff, and both XOR 
   * will be 0x7fffffff. Same if neg + neg = pos.
   * 
   * If not overflow, overFlow << 31 is always 0, and sum >> overFlow is still
   * sum. So the result is not changed.
  */
  int sum = x + y;
  int overFlow = ((sum ^ x) & (sum ^ y)) >> 31;
  return (sum >> overFlow) ^ (overFlow << 31);
}

也是运用异或运算来产生两个特殊的最大值和最小值。

howManyBits

/* howManyBits - return the minimum number of bits required to represent x in
 *             two's complement
 *  Examples: howManyBits(12) = 5
 *            howManyBits(298) = 10
 *            howManyBits(-5) = 4
 *            howManyBits(0)  = 1
 *            howManyBits(-1) = 1
 *            howManyBits(0x80000000) = 32
 *  Legal ops: ! ~ & ^ | + << >>
 *  Max ops: 90
 *  Rating: 4
 */
int howManyBits(int x)
{
  /**
   * To get the minimum bits needed by x, just get the first '1'
   * from high bit to low bit of x if x > 0 or ~x if x < 0.
   * 
   * The main idea is binary search, first check the high 16 bits.
   * If it's not 0, then x must >= 2^16, else x must < 2^16. Record 
   * 16 in value.
   * then check the high 8 bits in the none zero zone of x. Loop 
   * until the high 1 bit. Finally add all of the recorded values and the
   * sign bit to get the result.
   * 
  */
  int b16, b8, b4, b2, b1, b0;
  int sign = x >> 31;
  x ^= sign;

  b16 = !!(x >> 16) << 4;
  x = x >> b16;
  b8 = !!(x >> 8) << 3;
  x = x >> b8;
  b4 = !!(x >> 4) << 2;
  x = x >> b4;
  b2 = !!(x >> 2) << 1;
  x = x >> b2;
  b1 = (x >> 1);
  x = x >> b1;
  b0 = x;
  return b16 + b8 + b4 + b2 + b1 + b0 + 1;
}

二分查找在位运算中的运用。

要判断x的最高位1在什么位置，先在较高的半位查找，如果确定在上半区域，则在这个区域继续查找，直到区域只剩下1个二进制位。

具体方法见代码注释。

ilog2

/*
 * ilog2 - return floor(log base 2 of x), where x > 0
 *   Example: ilog2(16) = 4
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 90
 *   Rating: 4
 */
int ilog2(int x)
{
  /**
   * same idea as above problem
  */
  int sum1, sum2, sum3, sum4, sum5;
  sum1 = (!!(x >> 16)) << 4;
  x = x >> sum1;
  sum2 = (!!(x >> 8)) << 3;
  x = x >> sum2;
  sum3 = (!!(x >> 4)) << 2;
  x = x >> sum3;
  sum4 = (!!(x >> 2)) << 1;
  x = x >> sum4;
  sum5 = (x >> 1);
  return sum1 + sum2 + sum3 + sum4 + sum5;
}

上个题的举一反三。

浮点数处理

按照规则取出符号，阶码和尾码就好，在做除法时要考虑舍入的问题。

/* 
 * float_half - Return bit-level equivalent of expression 0.5*f for
 *   floating point argument f.
 *   Both the argument and result are passed as unsigned int's, but
 *   they are to be interpreted as the bit-level representation of
 *   single-precision floating point values.
 *   When argument is NaN, return argument
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
unsigned float_half(unsigned uf)
{
  /**
   * first get sign, exp digit and check if uf need to round off.
   * if uf = inf or nan, return uf
   * if exp = 0 or exp = 1, as well as uf is 0 or not format digit or 
   * exp = 1, move left the addition of tail digit and the round bit.
   * and finally add the sign bit.
   * In other cases, DEC exp digit.
  */
  unsigned exp = uf & 0xff800000;
  unsigned frac;
  unsigned mask = 0x7fffff;
  unsigned shift = 1;

  switch (exp)
  {
  case 0x7f800000:
  case 0xff800000:
    return uf;
  case 0:
  case 0x80000000:
    break;
  default:
    switch (sign_exp -= 0x800000)
    {
    case 0:
    case 0x80000000:
      mask = 0xfffff;
      break;
    default:
      shift = 0;
    }
  }

  frac = uf & mask;
  frac = frac >> shift;

  switch (uf & 3)
  {
  case 3:
    frac += shift;
  }

  return sign_exp | frac;
}
/* 
 * float_f2i - Return bit-level equivalent of expression (int) f
 *   for floating point argument f.
 *   Argument is passed as unsigned int, but
 *   it is to be interpreted as the bit-level representation of a
 *   single-precision floating point value.
 *   Anything out of range (including NaN and infinity) should return
 *   0x80000000u.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
int float_f2i(unsigned uf)
{
  /**
   * first get sign, exp and frac digit, then check if uf is over the range of
   * unsigned int and return the certain value.
   * If not, add the hidden bit and move left frac by 7 bit so that every bit 
   * won't lose even if the right steps of movement is maxinum -- 30.
   * Finally, return the result according the correct sign.
  */
  int sign = uf >> 31;
  int exp = (uf >> 23) & 0xff;
  int frac = uf & 0x007fffff;
  int right = 157 - exp;
  int abs;
  if (exp < 127)
    return 0;
  if (exp > 157)
    return 0x80000000;
  abs = (0x40000000 + (frac << 7)) >> right;
  if (sign)
    return -abs;
  else
    return abs;
}
/* 
 * float_twice - Return bit-level equivalent of expression 2*f for
 *   floating point argument f.
 *   Both the argument and result are passed as unsigned int's, but
 *   they are to be interpreted as the bit-level representation of
 *   single-precision floating point values.
 *   When argument is NaN, return argument
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
unsigned float_twice(unsigned uf)
{
  /**
   * first get exp and sign of uf, if exp is not 0, INC when exp is also 
   * not 255.
   * else, it indicates that uf is 0 or not format digit, just double its 
   * frac and return the result according the correct sign.
  */
  unsigned exp = uf & 0x7f800000;
  unsigned sign = uf & 0x80000000;
  if (exp)
  {
    if (exp != 0x7f800000)
      uf = uf + 0x00800000;
  }
  else
    uf = (uf << 1) | sign;
  return uf;
}

位运算

#位运算

深入底层-数据的按位运算

http://example.com/2020/10/23/深入底层-数据的按位运算/

作者

Chen Shuwen

发布于

2020年10月23日

许可协议

大整数乘法上一篇

自底向上归并排序下一篇