C++: transform a string so that sequences of consecutive underscores become a single underscore
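In C++, write a function that transforms a string so that each sequence of consecutive underscores becomes a single underscore. For example, '_hello___world__' => '_hello_world_'.
Related to the following question: merging multiple characters into one character in C++ (ABCBC -> ABC).
Use erase/unique with a C++11 lambda.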
#include <algorithm>
#include <iostream>
#include <string>
int main()
{
    std::string text("_hello___world__");
    // std::unique moves the kept characters to the front and returns the new
    // logical end; erase then drops the leftover tail.
    text.erase(
        std::unique(
            text.begin(),
            text.end(),
            [](char a, char b){ return (a == b) && (a == '_'); }
        ),
        text.end()
    );
    std::cout << text << '\n';
    return 0;
}
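As a minimal sketch, the same erase/unique idiom can be wrapped in a reusable function. The name collapse_underscores is a placeholder chosen for illustration and does not come from the answers on this page; the assert only restates the intended postcondition.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper (illustrative name): collapse each run of
// consecutive '_' characters into a single '_'.
std::string collapse_underscores(std::string text)
{
    // std::unique keeps the first element of each group of adjacent elements
    // for which the predicate is true, and returns the new logical end.
    text.erase(
        std::unique(text.begin(), text.end(),
                    [](char a, char b) { return a == '_' && b == '_'; }),
        text.end());
    // Postcondition: no two underscores are adjacent any more.
    assert(text.find("__") == std::string::npos);
    return text;
}
```

Called as collapse_underscores("_hello___world__"), this yields "_hello_world_"; other repeated characters, such as "aa", are left alone because the predicate only matches pairs of underscores.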
If you don't want to use a lambda, you can define a functor, for example:
class both_equal_to
{
    char value;
public:
    both_equal_to(char ch) : value(ch) {}
    bool operator()(char first, char second) const
    {
        return (first == second) && (first == value);
    }
};
Then replace the lambda with both_equal_to('_').
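Because the functor takes the character as a constructor argument, the same code can collapse runs of any character. The following is a sketch under that assumption; collapse_runs is a hypothetical wrapper name, and std::adjacent_find is used only to check the result.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Functor from the answer: true only when both characters equal `value`.
class both_equal_to
{
    char value;
public:
    explicit both_equal_to(char ch) : value(ch) {}
    bool operator()(char first, char second) const
    {
        return (first == second) && (first == value);
    }
};

// Hypothetical wrapper (illustrative name): collapse runs of `ch` in `text`.
std::string collapse_runs(std::string text, char ch)
{
    text.erase(std::unique(text.begin(), text.end(), both_equal_to(ch)),
               text.end());
    // Postcondition: no adjacent pair of `ch` remains.
    assert(std::adjacent_find(text.begin(), text.end(), both_equal_to(ch))
           == text.end());
    return text;
}
```

With this wrapper, collapse_runs("a   b", ' ') would give "a b", and collapse_runs("_hello___world__", '_') gives the result from the question.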
If you are only working with char* and don't want to pay the cost of constructing a std::string, the following code is closer to RolandXu's code.
#include <algorithm>
#include <cstring>
char *convert_underscores(char *str)
{
    if (str)
    {
        size_t length = strlen(str);
        char *end = std::unique(str, str + length, both_equal_to('_'));
        *end = '\0'; // terminate the string at its new, shorter end
    }
    return str;
}
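The char* version modifies its argument in place, so it needs a writable buffer rather than a string literal. The sketch below shows one way to call it; a lambda stands in for the functor so the block is self-contained, and collapse_into is a hypothetical convenience wrapper, not part of the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// In-place variant from the answer, with a lambda in place of the functor.
char *convert_underscores(char *str)
{
    if (str)
    {
        std::size_t length = std::strlen(str);
        char *end = std::unique(str, str + length,
                                [](char a, char b) { return a == '_' && b == '_'; });
        *end = '\0'; // re-terminate at the new logical end
    }
    return str;
}

// Hypothetical wrapper: copy `source` into a caller-supplied writable
// buffer first, because string literals must not be modified.
char *collapse_into(char *buffer, std::size_t capacity, const char *source)
{
    assert(buffer && source && std::strlen(source) < capacity);
    std::strcpy(buffer, source);
    return convert_underscores(buffer);
}
```

Given char buf[32], collapse_into(buf, sizeof buf, "_hello___world__") leaves "_hello_world_" in buf and returns a pointer to it.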
Without any library algorithms:
#include <stdio.h>
char *RemoveUnderScore(char *src)
{
    char* readIndex;
    char* writeIndex;
    if (src==NULL) return NULL;
    readIndex = writeIndex = src;
    while (*readIndex != '\0')
    {
        /* copy everything up to the next underscore */
        while(*readIndex !='_' && *readIndex != '\0')
            *writeIndex++ = *readIndex++;
        /* copy the first underscore of a run */
        if (*readIndex != '\0')
            *writeIndex++ = *readIndex++;
        /* skip the remaining underscores of the run */
        while (*readIndex == '_')
            readIndex++;
    }
    *writeIndex = '\0';
    return src;
}
int main(int argc,char** argv){
    char str[] = "_hello___worl__d___";
    printf("%s\n", RemoveUnderScore(str));
    return 0;
}
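This post compares the speed of the approaches submitted on this page. I ran each function one million times on a 40-character string and timed how long each algorithm took.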
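Time taken | Algorithm used
0.2 s | RolandXu's version using char*, juggling pointers to char.
0.4 s | Blastfurnace's version part two, functor, without string.
2.7 s | Blastfurnace's version part one, with the functor and string.
8.7 s | Eric L's version looping over the string, doing find "__" and replace with "_".
11.0 s | Eric L's version looping over each char and assembling a string.
11.8 s | Jerry Coffin's version with remove_copy_if.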
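A timing loop of this kind could be sketched as follows. This is an assumption-laden illustration, not the harness used for the numbers above: it swaps in std::chrono::steady_clock, and the names time_runs and collapse_underscores are hypothetical. Each call receives the original, uncollapsed input.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Function under test: the erase/unique approach (illustrative name).
std::string collapse_underscores(std::string text)
{
    text.erase(
        std::unique(text.begin(), text.end(),
                    [](char a, char b) { return a == '_' && b == '_'; }),
        text.end());
    return text;
}

// Hypothetical harness: call fn(input) `iterations` times and return the
// elapsed wall-clock time in seconds.
template <typename Fn>
double time_runs(Fn fn, const std::string &input, int iterations)
{
    assert(iterations > 0);
    // Storing each result size in a volatile makes it harder for the
    // compiler to discard the calls as unused work.
    volatile std::size_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = fn(input).size();
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

For instance, time_runs(collapse_underscores, test_string, 1000000) would mirror the one million runs described above; the resulting figures depend on the compiler, optimization flags, and machine.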
C++ code demonstrating the benchmark and timings above:
#include <iostream>
#include <cstdio>
#include <ctime>
#include <cstring>
#include <algorithm>
#include <iterator> //std::back_inserter
#include <string>
using namespace std;
string convert_underscores_by_EricL_using_string_replace(string str){
//Cons:
//Among the slowest algorithms tested: the repeated find/replace makes this
//operation O(n^2), possibly higher with the overhead of string copies.
//Pros:
//This function is the most concise, needing only 3 lines of code.
while(str.find("__") != string::npos){
str = str.replace(str.find("__"), 2, "_");
}
return str;
}
string convert_underscores_EricL_loop_over_a_string_and_remove_repeats(string str){
//CONS:
//Using a string really slows things down. Algorithm is too slow.
//Not the most concise solution, 8 lines.
//Has lots of ugly conditionals, x, and x-1, confusing to look at.
//PROS:
//Single pass over the input; no pointer handling needed.
int len = str.length();
string result = "";
if (len < 2) return str;
result += str[0];
for(int x = 1; x < len; x++){
if (str[x] != str[x-1] || str[x] != '_')
result += str[x];
}
return result;
}
class repeated_by_jerry_coffin {
char prev;
char val;
public:
repeated_by_jerry_coffin(char ch) : prev(0), val(ch) {}
bool operator()(char ch) {
bool ret = prev == val && ch == val;
prev = ch;
return ret;
}
};
string convert_underscores_jerry_coffins_with_remove_copy_if(string str){
//CONS:
//One of the slowest algorithms tested.
//PROS:
//Concise, intuitive, needing only 4 lines.
//Offloads the heavy lifting to std builtin methods: std::remove_copy_if and std::back_inserter
string result = "";
std::remove_copy_if(str.begin(), str.end(),
std::back_inserter(result),
repeated_by_jerry_coffin('_'));
return result;
}
char* convert_underscores_by_RolandXu_using_char_stars_and_pointers(char *src){
//CONS:
//You have to get your hands dirty with C++ pointers.
//PROS:
//Fastest function of those tested.
char* readIndex;
char* writeIndex;
if (src==NULL) return NULL;
readIndex = writeIndex = src;
while (*readIndex != '\0')
{
while(*readIndex !='_' && *readIndex != '\0')
*writeIndex++ = *readIndex++;
if (*readIndex != '\0')
*writeIndex++ = *readIndex++;
while (*readIndex == '_')
readIndex++;
}
*writeIndex = '\0';
return src;
}
class both_equal_to__blastfurnace_version1{
char value;
public:
both_equal_to__blastfurnace_version1(char ch) : value(ch) {}
bool operator()(char first, char second) const{
return (first == second) && (first == value);
}
};
string convert_underscores_blastfurnace_version1_with_functor(string str){
//CONS:
//You have to create an entirely new class that overloads an operator.
//The speed is harmed by the usage of string.
//PROS:
//Don't need to roll your own loops with pointers.
str.erase(
std::unique(
str.begin(),
str.end(),
both_equal_to__blastfurnace_version1('_')
),
str.end()
);
return str;
}
class both_equal_to_blastfurnace_version2{
char value;
public:
both_equal_to_blastfurnace_version2(char ch) : value(ch) {}
bool operator()(char first, char second) const{
return (first == second) && (first == value);
}
};
char *convert_underscores_blastfurnace_version2_std_unique_and_no_string(char *str){
//CONS:
//No problems.
//PROS:
//More concise/intuitive than the fastest version and nearly as fast. Winner!
if (str){
size_t length = strlen(str);
char *end = std::unique(str, str + length, both_equal_to_blastfurnace_version2('_'));
*end = '\0';
}
return str;
}
void assertCharStarEquals(char* a, char* b, string msg){
if (strcmp(a, b) == 0) cout<<"passed" << endl;
else cout << "Failed" << msg << " should be: '" << a << "' it returned: '" << b << "'" << endl;
}
void assertStringEquals(string a, string b, string msg){
if (a == b) cout<<"passed" << endl;
else cout << "Failed" << msg << " should be: '" << a << "' it returned: '" << b << "'" << endl;
}
void test01_convert_underscores_by_RolandXu_using_char_stars_and_pointers(int numtests, string str){
string buffer = str; //writable, null-terminated copy of the input
char* mystr = &buffer[0];
clock_t start = clock();
int x = 0;
while(x < numtests){
char* s = convert_underscores_by_RolandXu_using_char_stars_and_pointers(mystr);
x++;
}
double diff = (std::clock() - start) / (double)CLOCKS_PER_SEC;
cout << diff << " RolandXu's version using char*. " << '\n';
}
void test02_convert_underscores_blastfurnace_version2_std_unique_and_no_string(int numtests, string str){
string buffer = str; //writable, null-terminated copy of the input
char* mystr = &buffer[0];
clock_t start = clock();
int x = 0;
while(x < numtests){
char* val = convert_underscores_blastfurnace_version2_std_unique_and_no_string(mystr);
x++;
}
double diff = (std::clock() - start) / (double)CLOCKS_PER_SEC;
cout << diff << " Blastfurnace's version part two, functor, without string. " <<endl;
}
void test03_convert_underscores_blastfurnace_version1_with_functor(int numtests, string str){
clock_t start = clock();
int x = 0;
while(x < numtests){
string s = convert_underscores_blastfurnace_version1_with_functor(str);
x++;
}
double diff = (std::clock() - start) / (double)CLOCKS_PER_SEC;
cout << diff << " Blastfurnace's version part one with the functor and string. " <<endl;
}
void test04_convert_underscores_by_EricL_using_string_replace(int numtests, string str){
string buffer = str; //writable, null-terminated copy of the input
char* mystr = &buffer[0];
clock_t start = clock();
int x = 0;
while(x < numtests){
string s = convert_underscores_by_EricL_using_string_replace(mystr);
x++;
}
double diff = (std::clock() - start) / (double)CLOCKS_PER_SEC;
cout<< diff << " Eric L's version looping over the string doing find double underscore replace with single underscore. " <<endl;
}
void test05_convert_underscores_EricL_loop_over_a_string_and_remove_repeats(int numtests, string str){
string buffer = str; //writable, null-terminated copy of the input
char* mystr = &buffer[0];
clock_t start = clock();
int x = 0;
while(x < numtests){
string s = convert_underscores_EricL_loop_over_a_string_and_remove_repeats(mystr);
x++;
}
double diff = (std::clock() - start) / (double)CLOCKS_PER_SEC;
cout << diff << " Eric L's version looping over each char and assembling a string. "<< endl;
}
void test06_convert_underscores_jerry_coffins_with_remove_copy_if(int numtests, string str){
clock_t start = clock();
int x = 0;
while(x < numtests){
string s = convert_underscores_jerry_coffins_with_remove_copy_if(str);
x++;
}
double diff = (std::clock() - start) / (double)CLOCKS_PER_SEC;
cout<< diff << " Jerry Coffin's version with remove_copy_if. " <<endl;
}
int main(){
cout << "GO!\n";
int numtests = 1000000;
string test_string = "__aa_a_aaa_--__&___aa_234______3)_!___<_";
cout << "Time | Algorithm Used.\n";
test01_convert_underscores_by_RolandXu_using_char_stars_and_pointers(numtests, test_string);
test02_convert_underscores_blastfurnace_version2_std_unique_and_no_string(numtests, test_string);
test03_convert_underscores_blastfurnace_version1_with_functor(numtests, test_string);
test04_convert_underscores_by_EricL_using_string_replace(numtests, test_string);
test05_convert_underscores_EricL_loop_over_a_string_and_remove_repeats(numtests, test_string);
test06_convert_underscores_jerry_coffins_with_remove_copy_if(numtests, test_string);
//Produces the following output:
//Extra assertion testing to make sure everyone's algorithm is correct:
char in[30];
char out[30];
strcpy(in, "a__");
strcpy(out, "a_");
assertCharStarEquals(out, convert_underscores_by_RolandXu_using_char_stars_and_pointers(in), "01");
strcpy(in, "a_");
strcpy(out, "a_");
assertCharStarEquals(out, convert_underscores_by_RolandXu_using_char_stars_and_pointers(in), "02");
strcpy(in, "_______");
strcpy(out, "_");
assertCharStarEquals(out, convert_underscores_by_RolandXu_using_char_stars_and_pointers(in), "03");
strcpy(in, "__a");
strcpy(out, "_a");
assertCharStarEquals(out, convert_underscores_by_RolandXu_using_char_stars_and_pointers(in), "04");
strcpy(in, "_hello___world__");
strcpy(out, "_hello_world_");
assertCharStarEquals(out, convert_underscores_by_RolandXu_using_char_stars_and_pointers(in), "05");
strcpy(in, "");
strcpy(out, "");
assertCharStarEquals(out, convert_underscores_by_RolandXu_using_char_stars_and_pointers(in), "06");
strcpy(in, " __ ");
strcpy(out, " _ ");
assertCharStarEquals(out, convert_underscores_by_RolandXu_using_char_stars_and_pointers(in), "07");
strcpy(in, "U+221E");
strcpy(out, "U+221E");
assertCharStarEquals(out, convert_underscores_by_RolandXu_using_char_stars_and_pointers(in), "08");
strcpy(in, "__u00b2__");
strcpy(out, "_u00b2_");
assertCharStarEquals(out, convert_underscores_by_RolandXu_using_char_stars_and_pointers(in), "09");
cout<< "OK\n";
strcpy(in, "a__");
strcpy(out, "a_");
assertCharStarEquals(out, convert_underscores_blastfurnace_version2_std_unique_and_no_string(in), "01");
strcpy(in, "a_");
strcpy(out, "a_");
assertCharStarEquals(out, convert_underscores_blastfurnace_version2_std_unique_and_no_string(in), "02");
strcpy(in, "_______");
strcpy(out, "_");
assertCharStarEquals(out, convert_underscores_blastfurnace_version2_std_unique_and_no_string(in), "03");
strcpy(in, "__a");
strcpy(out, "_a");
assertCharStarEquals(out, convert_underscores_blastfurnace_version2_std_unique_and_no_string(in), "04");
strcpy(in, "_hello___world__");
strcpy(out, "_hello_world_");
assertCharStarEquals(out, convert_underscores_blastfurnace_version2_std_unique_and_no_string(in), "05");
strcpy(in, "");
strcpy(out, "");
assertCharStarEquals(out, convert_underscores_blastfurnace_version2_std_unique_and_no_string(in), "06");
strcpy(in, " __ ");
strcpy(out, " _ ");
assertCharStarEquals(out, convert_underscores_blastfurnace_version2_std_unique_and_no_string(in), "07");
strcpy(in, "U+221E");
strcpy(out, "U+221E");
assertCharStarEquals(out, convert_underscores_blastfurnace_version2_std_unique_and_no_string(in), "08");
strcpy(in, "__u00b2__");
strcpy(out, "_u00b2_");
assertCharStarEquals(out, convert_underscores_blastfurnace_version2_std_unique_and_no_string(in), "09");
cout<< "OK\n";
string in_s = "a__";
string out_s = "a_";
assertStringEquals(out_s, convert_underscores_blastfurnace_version1_with_functor(in_s), "01");
in_s = "a_";
out_s = "a_";
assertStringEquals(out_s, convert_underscores_blastfurnace_version1_with_functor(in_s), "02");
in_s = "_______";
out_s = "_";
assertStringEquals(out_s, convert_underscores_blastfurnace_version1_with_functor(in_s), "03");
in_s = "__a";
out_s = "_a";
assertStringEquals(out_s, convert_underscores_blastfurnace_version1_with_functor(in_s), "04");
in_s = "_hello___world__";
out_s = "_hello_world_";
assertStringEquals(out_s, convert_underscores_blastfurnace_version1_with_functor(in_s), "05");
in_s = "";
out_s = "";
assertStringEquals(out_s, convert_underscores_blastfurnace_version1_with_functor(in_s), "06");
in_s = " __ ";
out_s = " _ ";
assertStringEquals(out_s, convert_underscores_blastfurnace_version1_with_functor(in_s), "07");
in_s = "U+221E";
out_s = "U+221E";
assertStringEquals(out_s, convert_underscores_blastfurnace_version1_with_functor(in_s), "08");
in_s = "__u00b2__";
out_s = "_u00b2_";
assertStringEquals(out_s, convert_underscores_blastfurnace_version1_with_functor(in_s), "09");
cout<< "OK\n";
in_s = "a__";
out_s = "a_";
assertStringEquals(out_s, convert_underscores_by_EricL_using_string_replace(in_s), "01");
in_s = "a_";
out_s = "a_";
assertStringEquals(out_s, convert_underscores_by_EricL_using_string_replace(in_s), "02");
in_s = "_______";
out_s = "_";
assertStringEquals(out_s, convert_underscores_by_EricL_using_string_replace(in_s), "03");
in_s = "__a";
out_s = "_a";
assertStringEquals(out_s, convert_underscores_by_EricL_using_string_replace(in_s), "04");
in_s = "_hello___world__";
out_s = "_hello_world_";
assertStringEquals(out_s, convert_underscores_by_EricL_using_string_replace(in_s), "05");
in_s = "";
out_s = "";
assertStringEquals(out_s, convert_underscores_by_EricL_using_string_replace(in_s), "06");
in_s = " __ ";
out_s = " _ ";
assertStringEquals(out_s, convert_underscores_by_EricL_using_string_replace(in_s), "07");
in_s = "U+221E";
out_s = "U+221E";
assertStringEquals(out_s, convert_underscores_by_EricL_using_string_replace(in_s), "08");
in_s = "__u00b2__";
out_s = "_u00b2_";
assertStringEquals(out_s, convert_underscores_by_EricL_using_string_replace(in_s), "09");
cout<< "OK\n";
in_s = "a__";
out_s = "a_";
assertStringEquals(out_s, convert_underscores_EricL_loop_over_a_string_and_remove_repeats(in_s), "01");
in_s = "a_";
out_s = "a_";
assertStringEquals(out_s, convert_underscores_EricL_loop_over_a_string_and_remove_repeats(in_s), "02");
in_s = "_______";
out_s = "_";
assertStringEquals(out_s, convert_underscores_EricL_loop_over_a_string_and_remove_repeats(in_s), "03");
in_s = "__a";
out_s = "_a";
assertStringEquals(out_s, convert_underscores_EricL_loop_over_a_string_and_remove_repeats(in_s), "04");
in_s = "_hello___world__";
out_s = "_hello_world_";
assertStringEquals(out_s, convert_underscores_EricL_loop_over_a_string_and_remove_repeats(in_s), "05");
in_s = "";
out_s = "";
assertStringEquals(out_s, convert_underscores_EricL_loop_over_a_string_and_remove_repeats(in_s), "06");
in_s = " __ ";
out_s = " _ ";
assertStringEquals(out_s, convert_underscores_EricL_loop_over_a_string_and_remove_repeats(in_s), "07");
in_s = "U+221E";
out_s = "U+221E";
assertStringEquals(out_s, convert_underscores_EricL_loop_over_a_string_and_remove_repeats(in_s), "08");
in_s = "__u00b2__";
out_s = "_u00b2_";
assertStringEquals(out_s, convert_underscores_EricL_loop_over_a_string_and_remove_repeats(in_s), "09");
cout<< "OK\n";
in_s = "a__";
out_s = "a_";
assertStringEquals(out_s, convert_underscores_jerry_coffins_with_remove_copy_if(in_s), "01");
in_s = "a_";
out_s = "a_";
assertStringEquals(out_s, convert_underscores_jerry_coffins_with_remove_copy_if(in_s), "02");
in_s = "_______";
out_s = "_";
assertStringEquals(out_s, convert_underscores_jerry_coffins_with_remove_copy_if(in_s), "03");
in_s = "__a";
out_s = "_a";
assertStringEquals(out_s, convert_underscores_jerry_coffins_with_remove_copy_if(in_s), "04");
in_s = "_hello___world__";
out_s = "_hello_world_";
assertStringEquals(out_s, convert_underscores_jerry_coffins_with_remove_copy_if(in_s), "05");
in_s = "";
out_s = "";
assertStringEquals(out_s, convert_underscores_jerry_coffins_with_remove_copy_if(in_s), "06");
in_s = " __ ";
out_s = " _ ";
assertStringEquals(out_s, convert_underscores_jerry_coffins_with_remove_copy_if(in_s), "07");
in_s = "U+221E";
out_s = "U+221E";
assertStringEquals(out_s, convert_underscores_jerry_coffins_with_remove_copy_if(in_s), "08");
in_s = "__u00b2__";
out_s = "_u00b2_";
assertStringEquals(out_s, convert_underscores_jerry_coffins_with_remove_copy_if(in_s), "09");
return 0;
}
What did we learn?
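1. For C++ strings, str.length() is not the expensive part: std::string stores its size, so the call is constant time, unlike strlen(), which walks the memory to find the end of the string. The string versions are slower here largely because every call copies the string, which typically allocates memory.
2. str.end() is likewise constant time and is not a performance hit in itself.
3. Using std::unique on a character array is optimized and lightning fast: an order of magnitude faster than a for loop with string concatenation.
4. In C++, mystring[x] is a constant-time access, but in an unoptimized build it is a function call on every iteration and can be slower than incrementing a pointer.
5. If you must use a string, be aware that mystring += anotherstring[x] may reallocate and copy the whole string as it grows; calling reserve() beforehand avoids this.
6. For the fastest result, do not concatenate strings at all: take a block of memory, define a pointer, write characters through the pointer, and increment it.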
Much was learned this day, and great songs will be sung of it.
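If the cost of growing a string inside a loop is the concern, one option is to reserve the output capacity once and take the input by const reference. The following is only a sketch of that idea; the function name is hypothetical and no timings are claimed for it.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of the loop-and-append approach.
std::string collapse_underscores_reserved(const std::string &input)
{
    std::string result;
    // One allocation up front: the output is never longer than the input.
    result.reserve(input.size());
    char previous = '\0';
    for (char ch : input)
    {
        // Skip an underscore that directly follows another underscore.
        if (!(ch == '_' && previous == '_'))
            result += ch;
        previous = ch;
    }
    assert(result.size() <= input.size());
    return result;
}
```

Taking the input by const reference avoids copying it on every call, and the range-based loop avoids indexing by position.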
I would use std::remove_copy_if, like this:
char prev = 0; // initialized so the first comparison does not read an indeterminate value
std::remove_copy_if(input.begin(), input.end(),
    std::back_inserter(result),
    [&prev] (char ch) ->bool {
        bool ret=prev=='_' && ch == '_';
        prev=ch;
        return ret;
    });
Or, if you're stuck with C++03, you can do the same with an explicit functor instead of a lambda:
class repeated {
    char prev;
    char val;
public:
    repeated(char ch) : prev(0), val(ch) {}
    bool operator()(char ch) {
        bool ret = prev == val && ch == val;
        prev = ch;
        return ret;
    }
};
std::remove_copy_if(input.begin(), input.end(),
    std::back_inserter(result),
    repeated('_'));
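Either way, this copies each character from the input to the output (at most) once, whereas using std::string::replace copies each character as many times as there are repeated underscores to its left.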
Edit: though looking at @blastfurnace's answer, I'm not sure I'd really use this; std::unique is probably better suited to the job.
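For a version that writes to a separate result, as remove_copy_if does, but with a stateless predicate, std::unique_copy is one possibility. It is not used in the answers on this page, so treat the following as a sketch; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical helper: copy `input` to a new string, keeping only the
// first underscore of each run.
std::string collapse_underscores_copy(const std::string &input)
{
    std::string result;
    result.reserve(input.size());
    // std::unique_copy copies the first element of each group of adjacent
    // elements for which the predicate is true.
    std::unique_copy(input.begin(), input.end(), std::back_inserter(result),
                     [](char a, char b) { return a == '_' && b == '_'; });
    assert(result.size() <= input.size());
    return result;
}
```

The predicate compares adjacent characters directly, so no previous-character state has to be carried between calls.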
In C++, what I came up with is very inefficient:
string convert_underscores(string str){
    int len = str.length();
    //str.length() itself is cheap: constant time for std::string.
    string result = "";
    //bad: result starts empty and has to grow as characters are appended.
    if (len < 2) return str;
    result += str[0];
    for(int x = 1; x < len; x++){
        //keep the character unless it is an underscore following an underscore
        if (str[x] != str[x-1] || str[x] != '_'){
            result += str[x];
            //bad: appending may reallocate and copy result as it grows;
            //result.reserve(len) before the loop would avoid that.
        }
    }
    return result;
    //bad: str is passed by value, so the whole input is copied on every call.
}
Even less efficient, using str.replace:
string convert_underscores(string str){
    while(str.find("__") != string::npos){
        str = str.replace(str.find("__"), 2, "_");
    }
    return str;
}
//find incurs unnecessary O(n) complexity. string copy incurs
//a full memory movement on contents of string.
//n passes are required over the string.
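The repeated scans from the start of the string can be reduced by resuming each search where the previous one stopped and erasing a whole run at once. This is a sketch with a hypothetical function name; each erase still shifts the tail of the string, so it is not a true single pass.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of the find-based approach.
std::string collapse_underscores_find(std::string str)
{
    std::string::size_type pos = 0;
    // Resume each search at `pos` instead of rescanning from the start.
    while ((pos = str.find("__", pos)) != std::string::npos)
    {
        // Locate the end of this run of underscores.
        std::string::size_type run_end = str.find_first_not_of('_', pos);
        if (run_end == std::string::npos)
            run_end = str.size();
        // Erase all but the first underscore of the run.
        str.erase(pos + 1, run_end - pos - 1);
        ++pos; // continue after the single underscore that remains
    }
    assert(str.find("__") == std::string::npos);
    return str;
}
```

Unlike the loop above, this calls find once per run of underscores rather than twice per removed character.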