How do I extract a string between braces on the same level?

Updated: 2023-10-16

Imagine I have an unknown string that follows this format:

Blablabla
{
    "Some Text"
    2
    {
        "Sub Text"
         99
    }
    2
    {
        "Sub Text"
         99
    }
}
Blablabla2
{
    "Some Text"
    2
    {
        "Sub Text"
         99
    }
}

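I need to be able to extract every substring between the first-level delimiters ({}) from this string. So, in this example, running the following function: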

ExtractStringBetweenDelimitersOnSameLevel(string, "{", "}")

should extract the following substring from the original string and then return it:

    "Some Text"
    2
    {
        "Sub Text"
         99
    }

The problem is that, because of the second-level delimiters, it returns a shorter string.

Here is my code:
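
The symptom can be reproduced with a minimal sketch. NaiveExtract is a hypothetical helper written only for illustration (it is not part of the code in this question); it assumes the closer is located with a plain std::string::find, so the first '{' gets paired with the first '}', which belongs to the nested block:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only: pairs the first '{' with the
// first '}' and ignores nesting entirely.
std::string NaiveExtract(const std::string &text)
{
    const std::size_t open = text.find('{');
    const std::size_t close = text.find('}');
    if (open == std::string::npos || close == std::string::npos || close < open)
    {
        return "";
    }
    // Everything strictly between the two positions found above.
    return text.substr(open + 1, close - open - 1);
}
```

For the input a{b{c}d}e this returns b{c rather than b{c}d, because the search stops at the closer of the inner block.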

const int Count(
   const std::string& haystack,
   const std::string& needle,
   const int starting_index,
   const int maximum_index)
{
   int total = 0;
   int offset = starting_index;
   size_t current_index = std::string::npos;
   while ((current_index = haystack.find(needle, offset)) != std::string::npos)
   {
      if (current_index >= maximum_index)
      {
         break;
      }
      total++;
      offset = static_cast<int>(current_index + needle.size());
   }
   return total;
}
const size_t FindNthDelimiter(
   const std::string& haystack,
   const std::string& needle,
   const int nth)
{
   int total_found = 0;
   int offset = 0;
   size_t current_index = std::string::npos;
   while ((current_index = haystack.find(needle, offset)) != std::string::npos)
   {
      total_found++;
      offset = static_cast<int>(current_index) + 1;
      if (total_found == nth)
      {
         return offset;
      }
   }
   std::cout << "String does not have nth element." << std::endl;
   return offset;
}
std::string ExtractStringBetweenDelimitersOnSameLevel(
   std::string& original_string,
   const std::string& opening_delimiter,
   const std::string& closing_delimiter)
{
   // Find the first delimiter...
   const size_t first_delimiter = original_string.find(opening_delimiter);
   if (first_delimiter != std::string::npos)
   {
      const size_t second_delimiter = original_string.find(closing_delimiter);
      if (second_delimiter != std::string::npos)
      {
         // Total first delimiters found until first closed delimiter...
         int total_first_delimiters = Count(original_string, opening_delimiter, static_cast<int>(first_delimiter), static_cast<int>(second_delimiter));
         const size_t index_of_nth_closer = FindNthDelimiter(original_string, closing_delimiter, total_first_delimiters);
         std::string needle = original_string.substr(first_delimiter + opening_delimiter.size(), index_of_nth_closer - opening_delimiter.size() - 1);
         original_string.erase(first_delimiter, index_of_nth_closer + closing_delimiter.size());
         return needle;
      }
   }
   return "";
}

"The more you overthink the plumbing, the easier it is to stop up the drain." -- Scotty, Star Trek III

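The code shown looks overly complicated for such a simple task.

Furthermore, it does not even appear to fully perform the stated task. The task is described as extracting every top-level string:

every substring between the first-level delimiters

But the code shown appears to extract only the first one. It is not worth trying to figure out where the complicated algorithm went wrong. It is easier to rewrite it to do the entire task, in about half the original size. The root algorithm, at least, should not take more than a dozen or two lines of code. The code that extracts just the first string is already many times longer than that.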

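The following example extracts every top-level string between matching { } delimiters and passes it to a lambda callback. main() supplies a sample lambda that prints each string to std::cout.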

#include <string>
#include <algorithm>
#include <iostream>
template<typename functor_type> void ExtractStringBetweenDelimitersOnSameLevel(
    const std::string &original_string,
    char opening_delimiter, // Should be '{'
    char closing_delimiter, // Should be '}'
    functor_type &&functor) // Lambda that receives each string.
{
    auto b=original_string.begin(), e=original_string.end(), p=b;
    int nesting_level=0;
    while (b != e)
    {
        if (*b == closing_delimiter)
        {
            if (nesting_level > 0 && --nesting_level == 0)
            {
                functor(std::string(p, b));
            }
        }
        if (*b++ == opening_delimiter)
        {
            if (nesting_level++ == 0)
                p=b;
        }
    }
}

int main()
{
    std::string search_string="\n"
        "Blablabla\n"
        "{\n"
        "    \"Some Text\"\n"
        "    2\n"
        "    {\n"
        "        \"Sub Text\"\n"
        "         99\n"
        "    }\n"
        "    2\n"
        "    {\n"
        "        \"Sub Text\"\n"
        "         99\n"
        "    }\n"
        "}\n"
        "Blablabla2\n"
        "{\n"
        "    \"Some Text\"\n"
        "    2\n"
        "    {\n"
        "        \"Sub Text\"\n"
        "         99\n"
        "    }\n"
        "}";
    ExtractStringBetweenDelimitersOnSameLevel
        (search_string,
         '{',
         '}',
         [](const std::string &string)
         {
             std::cout << "Extracted: " << string << std::endl;
         });
}

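Your homework assignment is to modify this to handle multi-character delimiters. That should not be much more complicated, either.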
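
One possible way to approach that exercise is sketched below. It is not a definitive solution. ExtractTopLevel is a hypothetical name, and the function returns a std::vector instead of invoking a callback, which is a choice made only to keep the example short. It assumes both delimiters are non-empty and differ from each other. The nesting counter works as in the single-character version; the only change is that each position is compared against a whole delimiter with std::string::compare, and the scan skips past a delimiter once it matches.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical multi-character variant (a sketch, not the original answer's code).
// Assumption: opener and closer are non-empty and different from each other.
std::vector<std::string> ExtractTopLevel(
    const std::string &text,
    const std::string &opener,
    const std::string &closer)
{
    std::vector<std::string> result;
    if (opener.empty() || closer.empty())
        return result;

    std::size_t pos = 0;    // current scan position
    std::size_t start = 0;  // first character after the top-level opener
    int nesting_level = 0;

    while (pos < text.size())
    {
        if (text.compare(pos, closer.size(), closer) == 0)
        {
            // A closer that returns to level 0 ends a top-level substring;
            // stray closers at level 0 are skipped.
            if (nesting_level > 0 && --nesting_level == 0)
                result.push_back(text.substr(start, pos - start));
            pos += closer.size();
        }
        else if (text.compare(pos, opener.size(), opener) == 0)
        {
            pos += opener.size();
            // Remember where the content starts only for a top-level opener.
            if (nesting_level++ == 0)
                start = pos;
        }
        else
        {
            ++pos;
        }
    }
    return result;
}
```

Called with "<<" and ">>" on the input a<<b<<c>>d>>e<<f>>, this sketch yields two strings: b<<c>>d and f.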