Fast padded strcpy for a single word

Keywords: single word padding. Updated: 2023-10-16

I'm trying to write a very cheap C++ snippet that does the following to a short null-terminated string.

The input is a string such as "ABC". It is null-terminated and has a maximum length of 4 (5 including the null terminator).

The output goes to a char[4] that is not null-terminated and should be padded with spaces on the right. So in this case it would be {'A','B','C',' '}.

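You can assume the input string is correctly null-terminated, so there is no need to read a second word of input to check; 4 bytes is the longest it can be.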

So the code around it looks like this:

const char* input = "AB";
char output[4];
// code snippet goes here
// afterward output will be populated with {'A','B',' ',' '}
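To compare candidate snippets against the spec, a plain reference version is handy. This is a minimal sketch, not part of the question; padReference is a hypothetical name. It copies characters up to the first NUL (at most 4), pads the rest with spaces, and never reads past the terminator.

```cpp
// Hypothetical reference version of the spec, for checking faster variants:
// copy characters up to the first NUL (at most 4), then pad the rest of
// the 4-byte, non-NUL-terminated output with spaces.
void padReference(const char* input, char output[4]) {
    bool ended = false;
    for (int i = 0; i < 4; ++i) {
        // once ended is true, input[i] is no longer read (short-circuit)
        if (!ended && input[i] == '\0') {
            ended = true;  // everything from here on becomes a space
        }
        output[i] = ended ? ' ' : input[i];
    }
}
```

For example, padReference("AB", out) leaves out as {'A','B',' ',' '}.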

How cheaply can this be done? In case it matters, I'm working with:

Linux 2.6.32-358.11.1.el6.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux

Finally, the input is word-aligned.

How about something like this:

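#include <cstdint>

typedef std::uint32_t word;

// The first character is the most significant byte, so "AB" is
// 0x41420000 (a big-endian view of the memory).
word spacePad(word input) {
    static const word spaces = 0x20202020;
    // mask selects the bytes from the first NUL onward; they become spaces
    word mask =
       !(input & 0xff000000) ? 0xffffffff :
       !(input & 0x00ff0000) ? 0x00ffffff :
       !(input & 0x0000ff00) ? 0x0000ffff :
       !(input & 0x000000ff) ? 0x000000ff :
                               0;
    return (spaces & mask) | (input & ~mask);
}

// or without branches: count the characters before the first NUL
word spacePadBranchless(word input) {
    static const word spaces = 0x20202020;
    unsigned c0 = (input & 0xff000000) != 0;
    unsigned c1 = c0 & ((input & 0x00ff0000) != 0);
    unsigned c2 = c1 & ((input & 0x0000ff00) != 0);
    unsigned c3 = c2 & ((input & 0x000000ff) != 0);
    // 64-bit shift, so a full word (count 4) shifts by 32 without UB
    word mask = static_cast<word>(
       0xffffffffull >> (8 * (c0 + c1 + c2 + c3)));
    return (spaces & mask) | (input & ~mask);
}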

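If I haven't messed up, spacePad(0xaabb0000) returns 0xaabb2020. Note that these masks treat the first character as the most significant byte (a big-endian load). On little-endian x86 the word loaded from "AB" is 0x00004241, so either byte-swap it first (for example with GCC's __builtin_bswap32) or mirror the masks.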
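A sketch of the mirrored-mask option, assuming a little-endian target such as x86, where the first character is the least significant byte. spacePadLE is a hypothetical name; the logic is the same first-NUL search as spacePad, just scanning from the low byte upward.

```cpp
#include <cstdint>

// Little-endian counterpart of spacePad (sketch, assumes e.g. x86):
// the word loaded from "AB" is 0x00004241.
std::uint32_t spacePadLE(std::uint32_t input) {
    const std::uint32_t spaces = 0x20202020u;
    // mask marks the bytes from the first NUL onward; they become spaces.
    // Checking from the low byte up means garbage after the NUL is ignored.
    std::uint32_t mask =
        !(input & 0x000000ffu) ? 0xffffffffu :
        !(input & 0x0000ff00u) ? 0xffffff00u :
        !(input & 0x00ff0000u) ? 0xffff0000u :
        !(input & 0xff000000u) ? 0xff000000u :
                                 0u;
    return (spaces & mask) | (input & ~mask);
}
```

For instance, spacePadLE(0x00004241) gives 0x20204241, which stored back to memory on x86 reads as "AB  ".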

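Instead of computing the mask by hand, you could use SSE intrinsics. That might be faster, because you get the mask in a few instructions and a masked move then does the rest of the work. However, the compiler may move your variables from SSE registers to general-purpose registers, which could outweigh the slight gain. It all depends on how much data you need to process, how it is packed in memory, and so on.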
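A rough sketch of the SSE idea, assuming x86-64 (where SSE2 is always available) and GCC or Clang for __builtin_ctz. It finds the first NUL with a vector compare and _mm_movemask_epi8, then blends in the spaces with ordinary integer masks rather than a masked move. padSse2 is a hypothetical name, and the function assumes 4 bytes are readable at input, which the question's word alignment is meant to guarantee.

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)

// Sketch: x86-64 only; __builtin_ctz is a GCC/Clang builtin.
void padSse2(const char* input, char output[4]) {
    std::int32_t word;
    std::memcpy(&word, input, 4);              // one 4-byte load
    __m128i v = _mm_cvtsi32_si128(word);       // bytes 4..15 are zero
    __m128i isNul = _mm_cmpeq_epi8(v, _mm_setzero_si128());
    int bits = _mm_movemask_epi8(isNul);       // bit i set if byte i == 0
    // bits 4..15 are always set, so bits != 0 and first <= 4
    int first = __builtin_ctz(static_cast<unsigned>(bits));
    // keep the bytes before the first NUL (64-bit shift: first == 4 is fine)
    std::uint32_t keep =
        static_cast<std::uint32_t>((std::uint64_t{1} << (8 * first)) - 1);
    std::uint32_t out = (static_cast<std::uint32_t>(word) & keep) |
                        (0x20202020u & ~keep);
    std::memcpy(output, &out, 4);              // little-endian store
}
```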

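If the input is a char* rather than an int, you would normally need extra code, because casting the pointer could read unallocated memory. However, since you said all strings are word-aligned, the cast is enough in practice: even if some of the bytes read are unallocated, they lie in the same word as at least one allocated byte. Because you are only reading, there is no risk of memory corruption, and on every architecture I know of, hardware memory protection has a granularity larger than a word. On x86, for example, memory pages are typically 4K-aligned.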
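One caveat on the cast: reading a char buffer through a uint32_t* violates C++'s strict-aliasing rules even when the alignment is right. A minimal sketch of the usual workaround copies the bytes with std::memcpy (loadWord is a hypothetical helper name); with optimization enabled, GCC and Clang typically turn this into a single 32-bit load. This only addresses aliasing: reading bytes past the end of the string object is still undefined behaviour as far as the language is concerned, whatever the hardware allows.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads the first 4 bytes of a word-aligned string as one 32-bit value,
// without the aliasing problem of *reinterpret_cast<const uint32_t*>(p).
std::uint32_t loadWord(const char* p) {
    // the caller promises word alignment
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0);
    std::uint32_t w;
    std::memcpy(&w, p, sizeof w);
    return w;
}
```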

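Now, this is all well and good, but benchmark before choosing a solution: it is the only way to know which one works best for you (apart from, of course, the warm fuzzy feeling of writing code like this ^^).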

If speed is your concern, use brute force.

This never reads input past its bounds, and it does not modify it.

const char* input = TBD();
// = {' '} would set only output[0]; the other elements would be '\0'
char output[4] = {' ', ' ', ' ', ' '};
if (input[0]) {
  output[0] = input[0];
  if (input[1]) {
    output[1] = input[1];
    if (input[2]) {
      output[2] = input[2];
      if (input[3]) {
        output[3] = input[3];
      }
    }
  }
}

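const char* input = "AB";
char output[4];
// Advance while the current character is not NUL. Testing the written
// value against ' ' would stall on spaces inside the input ("A B").
output[0] = *input ? *input : ' '; input += *input != '\0';
output[1] = *input ? *input : ' '; input += *input != '\0';
output[2] = *input ? *input : ' '; input += *input != '\0';
output[3] = *input ? *input : ' ';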

Note that this advances the original input pointer, so make a copy if you need to keep it.
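One way to get that copy for free is to wrap the snippet in a function: the parameter is a local copy of the pointer, so the caller's input is left untouched. A minimal sketch (padAdvance is a hypothetical name); it advances while the current character is non-NUL, testing against '\0' rather than ' ' so that spaces inside the input are copied correctly.

```cpp
// The pointer parameter is a copy; advancing it does not affect the caller.
void padAdvance(const char* input, char output[4]) {
    output[0] = *input ? *input : ' '; input += *input != '\0';
    output[1] = *input ? *input : ' '; input += *input != '\0';
    output[2] = *input ? *input : ' '; input += *input != '\0';
    output[3] = *input ? *input : ' ';
}
```

For example, padAdvance("A B", out) fills out with {'A',' ','B',' '}.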

For strings this short, I don't think you can do better than the trivial implementation:

char buffer[4];
const char * input = "AB";
const char * in = input;
char * out = buffer;
char * end = buffer + sizeof buffer;
while (out < end)
{
    *out = *in != 0 ? *in++ : ' ';
    out++;
}

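If your input is null-terminated, a simple strcpy is enough to copy it, although it does not pad with spaces and it also writes the terminator, so the destination needs room for 5 bytes. memcpy is faster, but copies whatever garbage it finds after the null character.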

You are looking for memcpy:

const char* input = "ABCD";  // memcpy below reads exactly 4 bytes
char output[4];
memcpy(output, input, 4);    // from <cstring>; copies but does not pad

If your input varies in length, you need to compute its size first:

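#include <algorithm>
#include <cstring>

const char* input = "AB";
std::size_t len = std::strlen(input);
char output[4] = {' ', ' ', ' ', ' '};
// std::min needs both arguments of the same type, hence <std::size_t>
std::memcpy(output, input, std::min<std::size_t>(4, len));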