C++ Reverse a null terminated string using the position of the terminator as swap space

Keywords: string, terminator, swap space, position. Updated: 2023-10-16

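I am working on the classic "reverse a string" problem.

Is it a good idea to use the position of the null terminator as swap space? The point is to save the declaration of one variable.

Specifically, starting from the Kernighan and Ritchie algorithm: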

void reverse(char s[])
{
    int length = strlen(s);
    int c, i, j;
    for (i = 0, j = length - 1; i < j; i++, j--) 
    {
        c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}

...could we do this instead?

void reverseUsingNullPosition(char s[]) {
    int length = strlen(s);
    int i, j;
    for (i = 0, j = length - 1; i < j; i++, j--) {
        s[length] = s[i]; // Use last position instead of a new var
        s[i] = s[j];
        s[j] = s[length];
    }
    s[length] = 0; // Replace null character
}

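Note how the "c" variable is no longer needed. We just use the last position in the array, where the null terminator lives, as swap space. When finished, simply put the 0 back.

Here is the main routine (Xcode):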

#include <stdio.h>
#include <string.h>   // strlen, used by both reverse functions
int main(int argc, const char * argv[]) {
    char cheese[] = { 'c' , 'h' , 'e' , 'd' , 'd' , 'a' , 'r' , 0 };
    printf("Cheese is: %s\n", cheese); //-> Cheese is: cheddar
    reverse(cheese);
    printf("Cheese is: %s\n", cheese); //-> Cheese is: raddehc
    reverseUsingNullPosition(cheese);
    printf("Cheese is: %s\n", cheese); //-> Cheese is: cheddar
}

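Yes, this can be done. No, it is not a good idea, because it makes your program harder to optimize.

When you declare c in local scope, the optimizer can work out that its value is not used after the assignment s[j] = c; and can keep the temporary in a register. Besides effectively eliminating the variable for you, the optimizer may even recognize that you are performing a swap and emit a hardware-specific instruction. All of this saves a memory access per character.

When you use s[length] as your temporary, the optimizer has far less freedom. It is forced to emit the write to memory. Thanks to caching this may be just as fast, but on embedded platforms it could have a significant impact.

First of all, micro-optimizations like this are completely irrelevant until proven relevant. We are talking about C++: you have std::string and std::reverse, so you should not even be thinking about this.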
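For reference, the standard-library route mentioned here might look like the sketch below. The names reverse_c_string and reversed_copy are hypothetical, chosen only for illustration; std::reverse and std::strlen are the real library calls.

```cpp
#include <algorithm>  // std::reverse
#include <cstring>    // std::strlen
#include <string>

// Hypothetical helper: reverse a null-terminated buffer in place.
// Only the characters before the terminator are passed to std::reverse,
// so the terminator itself is never written.
void reverse_c_string(char* s) {
    std::reverse(s, s + std::strlen(s));
}

// Hypothetical helper: return a reversed copy of a std::string.
std::string reversed_copy(std::string text) {
    std::reverse(text.begin(), text.end());
    return text;
}
```

Either helper leaves the choice of temporary entirely to the library and the optimizer, which is the point being made above.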

In any case, if you compile both versions with optimization enabled (an -O flag) in Xcode, you get the following for reverse:

.cfi_startproc
Lfunc_begin0:
    pushq   %rbp
Ltmp3:
    .cfi_def_cfa_offset 16
Ltmp4:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp5:
    .cfi_def_cfa_register %rbp
    pushq   %r14
    pushq   %rbx
Ltmp6:
    .cfi_offset %rbx, -32
Ltmp7:
    .cfi_offset %r14, -24
    movq    %rdi, %r14
Ltmp8:
    callq   _strlen
Ltmp9:
    leal    -1(%rax), %ecx
    testl   %ecx, %ecx
    jle LBB0_3
Ltmp10:
    movslq  %ecx, %rcx
    addl    $-2, %eax
Ltmp11:
    xorl    %edx, %edx
LBB0_2:
Ltmp12:
    movb    (%r14,%rdx), %sil
    movb    (%r14,%rcx), %bl
    movb    %bl, (%r14,%rdx)
    movb    %sil, (%r14,%rcx)
Ltmp13:
    incq    %rdx
    decq    %rcx
    cmpl    %eax, %edx
    leal    -1(%rax), %eax
    jl  LBB0_2
Ltmp14:
LBB0_3:
    popq    %rbx
    popq    %r14
    popq    %rbp
    ret
Ltmp15:
Lfunc_end0:
    .cfi_endproc

And for reverseUsingNullPosition:

    .cfi_startproc
Lfunc_begin1:
    pushq   %rbp
Ltmp19:
    .cfi_def_cfa_offset 16
Ltmp20:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp21:
    .cfi_def_cfa_register %rbp
    pushq   %rbx
    pushq   %rax
Ltmp22:
    .cfi_offset %rbx, -24
    movq    %rdi, %rbx
Ltmp23:
    callq   _strlen
Ltmp24:
    leal    -1(%rax), %edx
    testl   %edx, %edx
Ltmp25:
    movslq  %eax, %rdi
    jle LBB1_3
Ltmp26:
    movslq  %edx, %rdx
    addl    $-2, %eax
Ltmp27:
    xorl    %esi, %esi
LBB1_2:
Ltmp28:
    movb    (%rbx,%rsi), %cl
    movb    %cl, (%rbx,%rdi)
    movb    (%rbx,%rdx), %cl
    movb    %cl, (%rbx,%rsi)
    movb    (%rbx,%rdi), %cl
    movb    %cl, (%rbx,%rdx)
Ltmp29:
    incq    %rsi
    decq    %rdx
    cmpl    %eax, %esi
    leal    -1(%rax), %eax
    jl  LBB1_2
Ltmp30:
LBB1_3:                                 ## %._crit_edge
    movb    $0, (%rbx,%rdi)
    addq    $8, %rsp
    popq    %rbx
Ltmp31:
    popq    %rbp
    ret
Ltmp32:
Lfunc_end1:
    .cfi_endproc

If you look at the inner loops, you have

movb    (%r14,%rdx), %sil
movb    (%r14,%rcx), %bl
movb    %bl, (%r14,%rdx)
movb    %sil, (%r14,%rcx)

and

movb    (%rbx,%rsi), %cl
movb    %cl, (%rbx,%rdi)
movb    (%rbx,%rdx), %cl
movb    %cl, (%rbx,%rsi)
movb    (%rbx,%rdi), %cl
movb    %cl, (%rbx,%rdx)

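So I would not say you are saving much overhead (you access the array more often: six memory operations per iteration instead of four). Maybe you are, maybe not. This teaches you something else: believing that one piece of code performs better than another is irrelevant. The only thing that matters is a well-made benchmark and a profile of the code.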
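A minimal timing sketch along those lines is shown below. The two functions are repeated from the question so the block compiles on its own; time_reversals is a hypothetical helper, and the buffer size and round count passed to it are arbitrary assumptions, not tuned values.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Repeated from the question: swap through a local temporary.
void reverse(char s[]) {
    int length = static_cast<int>(std::strlen(s));
    int c, i, j;
    for (i = 0, j = length - 1; i < j; i++, j--) {
        c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}

// Repeated from the question: swap through the terminator's slot.
void reverseUsingNullPosition(char s[]) {
    int length = static_cast<int>(std::strlen(s));
    int i, j;
    for (i = 0, j = length - 1; i < j; i++, j--) {
        s[length] = s[i];
        s[i] = s[j];
        s[j] = s[length];
    }
    s[length] = 0;  // restore the terminator
}

// Hypothetical helper: call fn `rounds` times on a buffer holding `size`
// characters plus a terminator, and return the elapsed nanoseconds.
long long time_reversals(void (*fn)(char*), std::size_t size, int rounds) {
    std::vector<char> buffer(size + 1, '\0');  // +1 keeps the terminator writable
    for (std::size_t k = 0; k < size; ++k)
        buffer[k] = static_cast<char>('a' + k % 26);

    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r)
        fn(buffer.data());
    auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

One would call, say, time_reversals(reverse, 1 << 20, 100) and time_reversals(reverseUsingNullPosition, 1 << 20, 100) in an optimized build, repeat the runs several times, and compare the distributions rather than a single number. The strlen call inside each function is included in the measurement, which is the same for both.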

Legal: Yes

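Good idea: No

The cost of the "extra" variable is zero, so there is absolutely no reason to avoid it. The stack pointer has to be adjusted anyway, so it makes no difference whether it also has to make room for one more int.

Furthermore:

With compiler optimization turned on, the variable c in the original code most likely does not exist at all. It is just a register in the CPU.

As for your code: it is harder to optimize, so it is hard to say how well the compiler will do. Maybe you get the same result, maybe you get something worse. But you will not get anything better.

So forget the idea.

If we are using printf and the STL anyway, we can also unroll things by hand and use pointers.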

#include <stdio.h>
#include <string.h>   // strlen
#include <utility>    // std::swap
void reverse(char s[])
{
  char * b=s;
  char * e=s+::strlen(s);   // one past the last character; never points before s
  // Unrolled loop: swap four pairs per pass while at least 8 characters remain.
  while (e - b >= 8)
  {
    std::swap(b[0], e[-1]);
    std::swap(b[1], e[-2]);
    std::swap(b[2], e[-3]);
    std::swap(b[3], e[-4]);
    b+=4;
    e-=4;
  }
  // Swap whatever is left in the middle, one pair at a time.
  while (e - b >= 2)
  {
    std::swap(*(b++), *(--e));
  }
}
int main(int argc, const char * argv[]) {
    char cheese[] = { 'c' , 'h' , 'e' , 'd' , 'd' , 'a' , 'r' , 0 };
    printf("Cheese is: %s\n", cheese); //-> Cheese is: cheddar
    reverse(cheese);
    printf("Cheese is: %s\n", cheese); //-> Cheese is: raddehc
}

With "cheddar" as the only test case, it is hard to say whether this is any faster.