How to count recurring characters at the beginning of a QString?

Keywords: QString, leading characters, counting. Updated: 2023-10-16

I am processing a series of lines, and I need to count the hash characters (#) that occur at the beginning of each one.

#  item 1
## item 1, 1
## item 1, 2
#  item 2

and so on.

If each line is a QString, how can I return the number of hashes that occur at the beginning of the string?

QString s("### foo # bar ");
int numberOfHashes = s.count("#"); // Answer should be 3, not 4

Simple:

int number_of_hashes(const QString &s) {
    int i, l = s.size();
    // Advance while still inside the string and looking at a hash.
    for(i = 0; i < l && s[i] == '#'; ++i);
    return i;
}

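In other languages (mostly interpreted ones) you have to worry about iterating over characters, because it is slow, and delegate everything to library functions (generally written in C). In C++, iterating is perfectly fine performance-wise, so a down-to-earth for loop will do.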
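To connect the loop back to the lines in the question, here is a minimal sketch that splits a line into its hash count and the remaining text. It uses std::string as a stand-in for QString so that it builds without Qt, and the names Heading and parse_heading are hypothetical, not part of the original answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: nesting level plus the text after the hashes.
struct Heading {
    std::size_t level;
    std::string text;
};

// Same plain loop as above, written against std::string (an assumption made
// so the sketch compiles without Qt).
std::size_t number_of_hashes(const std::string &s) {
    std::size_t i = 0;
    while (i < s.size() && s[i] == '#') ++i;
    return i;
}

// Count the leading hashes, then skip the spaces that separate them from the text.
Heading parse_heading(const std::string &line) {
    const std::size_t level = number_of_hashes(line);
    std::size_t start = level;
    while (start < line.size() && line[start] == ' ') ++start;
    return Heading{level, line.substr(start)};
}
```

For the sample input, parse_heading("## item 1, 2") yields level 2 and the text "item 1, 2". Hashes that appear later in the line are never reached, because the loop stops at the first character that is not a hash.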

Just for fun, I made a small benchmark comparing this trivial method with the QRegularExpression one from the OP, both rebuilding the RE object on every call and caching it.

#include <QCoreApplication>
#include <QString>
#include <vector>
#include <QElapsedTimer>
#include <stdlib.h>
#include <iostream>
#include <QRegularExpression>

int number_of_hashes(const QString &s) {
    int i, l = s.size();
    for(i = 0; i < l && s[i] == '#'; ++i);
    return i;
}

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    const int count = 100000;
    // 100 test strings, each starting with 0 to 9 hashes.
    std::vector<QString> ss;
    for(int i = 0; i < 100; ++i) ss.push_back(QString(rand() % 10, '#') + " foo ## bar ###");
    QElapsedTimer t;
    t.start();
    // tot is returned at the end so that the computed counts are actually used.
    unsigned tot = 0;
    for(int i = 0; i < count; ++i) {
        for(const QString &s: ss) tot += number_of_hashes(s);
    }
    // Note: elapsed() returns milliseconds, so the value printed here is
    // microseconds per pass over all 100 strings, although the label says ns.
    std::cerr<<"plain loop: "<<t.elapsed()*1000./count<<" ns\n";
    t.restart();
    for(int i = 0; i < count; ++i) {
        for(const QString &s: ss) tot += QRegularExpression("^[#]*").match(s).capturedLength();
    }
    std::cerr<<"QRegularExpression, rebuilt every time: "<<t.elapsed()*1000./count<<" ns\n";
    QRegularExpression re("^[#]*");
    t.restart();
    for(int i = 0; i < count; ++i) {
        for(const QString &s: ss) tot += re.match(s).capturedLength();
    }
    std::cerr<<"QRegularExpression, cached: "<<t.elapsed()*1000./count<<" ns\n";
    return tot;
}

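As expected, the QRegularExpression-based versions are far slower: roughly 35 times slower with a cached object, and about two orders of magnitude slower when the object is rebuilt every time: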

plain loop: 0.7 ns
QRegularExpression, rebuilt every time: 75.66 ns
QRegularExpression, cached: 24.5 ns

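Here I use the standard algorithm find_if_not to get an iterator to the first character that is not a hash. I then return the distance from the beginning of the string to that iterator.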

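#include <algorithm>
#include <iterator>

int number_of_hashes(QString const& s)
{
    // Iterator to the first character that is not a hash (or end if all are).
    auto it = std::find_if_not(std::begin(s), std::end(s), [](QChar c){return c == '#';});
    return static_cast<int>(std::distance(std::begin(s), it));
}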

Edit: find_if_not only accepts a unary predicate, not a value, so you have to pass a lambda as the predicate.

int numberOfHashes = 0;
int size = s.size();
QChar ch('#');
for(int i = 0; (i < size) && (s[i] == ch); ++i) {
    ++numberOfHashes;
}

A solution without a for loop:

QString s("### foo # bar ");
int numberOfHashes = QRegularExpression("^[#]*").match(s).capturedLength();

Another way:

int beginsWithCount(const QString &s, const QChar c) {
    int n = 0;
    for (auto ch : s)
        if (c == ch) n++; else break;
    return n;
}

A Qt approach, making use of QString::indexOf(..):

QString s("### foo # bar ");
int numHashes = 0;
// Keep going while the next hash found is exactly at the expected position.
while (s.indexOf("#", numHashes) == numHashes) {
    ++numHashes;
} // numHashes == 3

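int QString::indexOf(const QString &str, int from = 0,
                     Qt::CaseSensitivity cs = Qt::CaseSensitive) const
Returns the index position of the first occurrence of the string str in this string, searching forward from index position from. Returns -1 if str is not found.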

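Starting from index 0, the string s is searched for the first occurrence of #, and the loop condition tests whether that occurrence is located at index 0. If it is, the search continues from index 1, and so on, until the condition fails.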

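However, this does not short-circuit what may be a full string search at the end. When a hash is not found at the expected position, the string is searched once in full (or up to the first hash in a wrong position) before the final, failing condition check.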
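The extra scan can be illustrated with a small sketch. It uses std::string::find as a stand-in for QString::indexOf (an assumption, so that it builds without Qt), and both function names are hypothetical. The second function looks only at the character in the expected position, as the plain loops in the other answers do, so it stops at the first character that is not a hash without searching any further.

```cpp
#include <cstddef>
#include <string>

// indexOf-style loop: every call searches forward from position n, so the
// last, failing call may scan all the way to the end of the string.
std::size_t leading_hashes_find(const std::string &s) {
    std::size_t n = 0;
    while (s.find('#', n) == n) ++n;
    return n;
}

// Short-circuiting loop: inspects only position n on each step.
std::size_t leading_hashes_at(const std::string &s) {
    std::size_t n = 0;
    while (n < s.size() && s[n] == '#') ++n;
    return n;
}
```

Both functions return the same count; they differ only in how much of the string the final, failing step has to read. For a long line with no hashes at all, the first one reads the whole line and the second one reads a single character.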