Regex expression not valid

Keywords: invalid, expression, Regex. Updated: 2023-10-16

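I'm having trouble getting a regular expression to work. I'm trying to extract only the URLs from a string. Here is some of the text in the string: pastebin.com/wA9N1Gbi. The regex I'm trying to use is: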

(?<protocol>https?://)(?:(?<urlroot>[^/?#\n\s]+))?(?<urlResource>[^?#\n\s]+)?(?<queryString>\?(?:[^#\n\s]*))?(?:#(?<fragment>[^\n\s]))?

Here is a link to it: regex101.com/r/bH1eS9/3

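Unfortunately it does not work, and I get the following error when compiling: "Unhandled exception at 0x7638DAE8 in Historik.exe: Microsoft C++ exception: std::regex_error at memory location 0x0018ED9C." Does anyone know what I should do? Is there another regex function that might be better suited to this task?

This is the code as it stands right now. Thanks in advance.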

std::string str;
std::ifstream in("c:/Users/Petrus/Documents/History", std::ios::binary);
std::stringstream buffer;
buffer << in.rdbuf();
std::string contents(buffer.str());
unsigned counter = 0;
// This is the pattern from the question; constructing it is what throws std::regex_error.
std::regex word_regex(
    R"((?<protocol>https?://)(?:(?<urlroot>[^/?#\n\s]+))?(?<urlResource>[^?#\n\s]+)?(?<queryString>\?(?:[^#\n\s]*))?(?:#(?<fragment>[^\n\s]))?)",
    std::regex::extended
    );
auto words_begin = std::sregex_iterator(contents.begin(), contents.end(), word_regex);
auto words_end = std::sregex_iterator();
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
    std::smatch match = *i;
    std::string match_str = match.str();
    // Print the whole match followed by each sub-match, numbered consecutively.
    for (const auto& res : match) {
        std::cout << ++counter << ": " << res << std::endl;
    }
}

Do you need such a complicated regular expression? Could you get away with something less strict?

#include <fstream>
#include <iostream>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>

// Read the whole file into a string; throws if the file cannot be opened.
std::string load_file(const std::string& filename)
{
    std::ostringstream oss;
    if(auto ifs = std::ifstream(filename, std::ios::binary))
        oss << ifs.rdbuf();
    else
        throw std::runtime_error("Failed to open file: " + filename);
    return oss.str();
}
int main(int, const char* const*)
{
    std::string s = load_file("test.txt");
    // crude... but effective?
    std::regex e(R"(https?://[^/]+[[:print:][:punct:]]*)");
    auto itr = std::sregex_iterator(s.begin(), s.end(), e);
    auto end = std::sregex_iterator();
    unsigned counter = 0;
    for(; itr != end; ++itr)
        std::cout << ++counter << ": " << itr->str(0) << '\n';
}

Output:

1: http://boplats.vaxjo.se/
2: http://192.168.0.7/
3: http://old.honeynet.org/
4: http://old.honeynet.org/scans/scan15/som/som11.txt
5: http://en.hackdig.com/
6: http://parallelrecovery.com/pdf-password.html
7: http://digitalcorpora.org/corp
8: http://tv4play.se/program/nyhetsmorgon
9: http://bredbandskollen.se/
10: http://194.47.149.19/dv1482/Lab5/
...