ARM inline assembly for 32-bit word rotate

Keywords: assembly, ARM, 32-bit, rotate      Updated: 2023-10-16

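I'm trying to write some inline assembly to test the performance of rotates on ARM. The code is part of a C++ code base, so the rotates are template specializations. The code is below, but it produces messages that don't make much sense to me.

According to the ARM assembly language documentation, the instructions look roughly like this: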

# rotate - rotate instruction
# dst - output operand
# lhs - value to be rotated
# rhs - rotate amount (immediate or register)
<rotate> <dst>, <lhs>, <rhs>

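The messages don't make much sense because (to me), for example, I use g to constrain the output register, and that is just a general-purpose register according to the simple constraints. ARM is supposed to have plenty of them, and the machine-specific constraints don't appear to change the behavior of the constraint.

I'm not sure what the best approach is, so I'm going to ask three questions:

  1. How do I encode the rotate when the amount is a constant or immediate value?
  2. How do I encode the rotate when the amount is passed in a register?
  3. How does Thumb mode change the inline assembly?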

arm-linux-androideabi-g++ -DNDEBUG -g2 -Os -pipe -fPIC -mfloat-abi=softfp
-mfpu=vfpv3-d16 -mthumb --sysroot=/opt/android-ndk-r10e/platforms/android-21/arch-arm
-I/opt/android-ndk-r10e/sources/cxx-stl/stlport/stlport/ -c camellia.cpp
In file included from seckey.h:9:0,
             from camellia.h:9,
             from camellia.cpp:14:
misc.h: In function 'T CryptoPP::rotlFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1121:71: error: matching constraint not valid in output operand
  __asm__ ("rol %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1121:71: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotrFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1129:71: error: matching constraint not valid in output operand
  __asm__ ("ror %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1129:71: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotlVariable(T, unsigned int) [with T = unsigned int]':
misc.h:1137:72: error: matching constraint not valid in output operand
  __asm__ ("rol %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
                                                                        ^
misc.h:1137:72: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotrVariable(T, unsigned int) [with T = unsigned int]':
misc.h:1145:72: error: matching constraint not valid in output operand
  __asm__ ("ror %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
                                                                        ^
misc.h:1145:72: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotrFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1129:71: error: matching constraint not valid in output operand
  __asm__ ("ror %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1129:71: error: invalid lvalue in asm output 0
misc.h:1129:71: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotlFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1121:71: error: matching constraint not valid in output operand
  __asm__ ("rol %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1121:71: error: invalid lvalue in asm output 0
misc.h:1121:71: error: matching constraint references invalid operand number

// ROL #n Rotate left immediate
template<> inline word32 rotlFixed<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("rol %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
    return static_cast<word32>(z);
}
// ROR #n Rotate right immediate
template<> inline word32 rotrFixed<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("ror %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
    return static_cast<word32>(z);
}
// ROR rn Rotate left by a register
template<> inline word32 rotlVariable<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("rol %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
    return static_cast<word32>(z);
}
// ROR rn Rotate right by a register
template<> inline word32 rotrVariable<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("ror %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
    return static_cast<word32>(z);
}
template<> inline word32 rotlMod<word32>(word32 x, unsigned int y)
{
    return rotlVariable<word32>(x, y);
}
template<> inline word32 rotrMod<word32>(word32 x, unsigned int y)
{
    return rotrVariable<word32>(x, y);
}

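First, ARM does not have a rotate left (ROL); you need to emulate it with ROR.

Second, the M constraint for some reason accepts 0 to 32, but ROR only accepts 0 to 31 when given an immediate.

Third, the g constraint is too generic, because it also allows memory operands, which ROR does not accept. Better to use r.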
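The first point rests on the identity that a left rotate by n equals a right rotate by (32 - n) mod 32. Below is a minimal, portable sketch of that identity using plain shifts instead of inline assembly. The names `rotr32`, `rotl32_via_ror`, and `check_rotate_identity` are hypothetical and do not come from the code in the question.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference rotate written with shifts only (no inline assembly).
// The "& 31" masks keep every shift count in 0..31, so n == 0 is safe.
inline std::uint32_t rotr32(std::uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

// Left rotate expressed purely through the right rotate, which is the
// only rotate direction ARM offers.
inline std::uint32_t rotl32_via_ror(std::uint32_t x, unsigned n)
{
    return rotr32(x, (32 - (n & 31)) & 31);
}

// Small self-check of the identity on one known value.
inline void check_rotate_identity()
{
    assert(rotr32(0x12345678u, 8) == 0x78123456u);
    assert(rotl32_via_ror(0x12345678u, 8) == 0x34567812u);
    assert(rotl32_via_ror(0x12345678u, 0) == 0x12345678u);
}
```

Masking the amount first means a rotate by 0 or by 32 simply returns the input, so no special case is needed.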

Here is what I came up with:

// Rotate right
inline word32 rotr(word32 x, unsigned int y)
{
    word32 z = x; // result when the constant amount is 0 and no ROR is emitted
    if (__builtin_constant_p(y))
    {
        y &= 31;
        if (y != 0) // this should be optimized away by the compiler
        {
            __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "M" (y));
        }
    } else {
        __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "r" (y));
    }
    return z;
}
// Rotate left (emulated with ROR by 32 - y)
inline word32 rotl(word32 x, unsigned int y)
{
    word32 z = x; // result when the constant amount is 0 and no ROR is emitted
    if (__builtin_constant_p(y))
    {
        y &= 31;
        if (y != 0) // this should be optimized away by the compiler
        {
            __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "M" (32 - y));
        }
    } else {
        __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "r" (32 - y));
    }
    return z;
}
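For the register case, a guarded variant can keep the code building on targets where the three-operand ROR is not available. The following is only a sketch under assumptions: it assumes the three-operand form assembles in ARM mode and in Thumb-2, uses the predefined macros `__arm__`, `__thumb__`, and `__thumb2__` to detect that, and falls back to plain shifts everywhere else. The helper names `rotr32_reg`, `rotl32_reg`, and `check_rotr32_reg` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rotate right by a run-time amount.
inline std::uint32_t rotr32_reg(std::uint32_t x, unsigned n)
{
#if defined(__GNUC__) && defined(__arm__) && \
    (!defined(__thumb__) || defined(__thumb2__))
    std::uint32_t z;
    // "r" keeps all operands in core registers: %0 is the output,
    // %1 the value to rotate, %2 the rotate amount.
    __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "r" (n));
    return z;
#else
    // Portable fallback for other targets (an assumption of this sketch).
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
#endif
}

// Left rotate built on the right rotate, since there is no ROL.
inline std::uint32_t rotl32_reg(std::uint32_t x, unsigned n)
{
    return rotr32_reg(x, (32 - (n & 31)) & 31);
}

// Small self-check on known values.
inline void check_rotr32_reg()
{
    assert(rotr32_reg(0x12345678u, 8) == 0x78123456u);
    assert(rotl32_reg(0x12345678u, 8) == 0x34567812u);
}
```

The fallback also gives a baseline to benchmark the inline assembly against.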

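I can tell you that Thumb mode handles bit rotation quite differently. ARM mode has a "barrel shifter" that lets you shift or rotate an operand without actually changing it. So consider the following: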

ADD r0, r0, r1, ROR #1

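This roughly translates to "rotate r1 right by one bit, add it to r0, and store the result in r0." You decide whether to shift or rotate one of the operands, and by how much. There is no ROL, but ROR #31 does what ROL #1 would do if ARM had it, so make use of that.

The value actually stored in r1 does not change; the shift or rotate is only in effect for the duration of this instruction. This only applies to ARM mode. In Thumb mode (at least with the original 16-bit Thumb encodings), you have to use the traditional standalone shift/rotate instructions typical of other processors (x86, 68000, and so on).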
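To make the semantics of the shifted operand concrete, here is a small C++ model of the ADD example above. It is only an illustration of what the instruction computes, not code that emits it, and the names `ror32`, `add_with_ror1`, and `check_shifter_model` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rotate right by n, written with plain shifts.
inline std::uint32_t ror32(std::uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

// Models "ADD r0, r0, r1, ROR #1": a rotated copy of r1 feeds the adder,
// while r1 itself (passed by value here) is left untouched.
inline std::uint32_t add_with_ror1(std::uint32_t r0, std::uint32_t r1)
{
    return r0 + ror32(r1, 1);
}

inline void check_shifter_model()
{
    const std::uint32_t r1 = 0x00000003u;
    // ror(3, 1) is 0x80000001, so 5 plus that is 0x80000006.
    assert(add_with_ror1(5u, r1) == 0x80000006u);
    // ROR #31 acts as a left rotate by 1: 0x80000001 becomes 0x00000003.
    assert(ror32(0x80000001u, 31) == 0x00000003u);
}
```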