Which of these methods are possible/more efficient

Keywords: efficient methods. Last updated: 2023-10-16

I have a text file in the following format:

ignore contents for about 8 lines
... 
       x        y         z
 - [7.6515, -10.8271, -28.5806, 123.8]
 - [7.6515, -10.8271, -28.5806, 125.0]
 - [7.6515, -10.8271, -28.5806, 125.9]
 - [7.6515, -10.8271, -28.5806, 126.8]
 - [7.6515, -10.8271, -28.5806, 127.9]
 - [7.6515, -10.8271, -28.5806, 128.9]
 - [7.6515, -10.8271, -28.5806, 130.0]
 - [7.6515, -10.8271, -28.5806, 130.9]
 - [7.6515, -10.8271, -28.5806, 131.8]

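Is there a way to read the x, y points from all of the possibly 35,000+ lines like the ones above in one go, rather than line by line? If so, would that be the best approach?

Or,

would it be better to call getline for each line and then parse the line with boost::regex?

I need to extract the x, y points and store them in an array of floats.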

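I have been using boost::regex for this so far, but that means processing one line at a time. I don't know how efficient that is, so I'm wondering whether there is a better solution. If not, I can carry on with what I have been doing.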

The solution has to be in C++.

Here is a take on it using Boost Spirit X3 and a memory-mapped file.

struct Point { double x, y, z; };
template <typename Container>
bool parse(std::string const& fname, Container& into) {
    boost::iostreams::mapped_file mm(fname);
    using namespace boost::spirit::x3;
    return phrase_parse(mm.begin(), mm.end(),
            seek[ eps >> 'x' >> 'y' >> 'z' >> eol ] >> // skip contents for about 8 lines
            ('-' >> ('[' >> double_ >> ',' >> double_ >> ',' >> double_ >> omit[',' >> double_] >> ']')) % eol, // parse points
            blank, into);
}

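Spirit is a parser generator, so it generates the parsing code for you from the expressions (e.g. 'x' >> 'y' >> 'z' >> eol, which matches the header line).

Unlike regular expressions, Spirit knows how to process and convert the values, so you can parse directly into, for example, a vector<Point>: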

int main()
{
    std::vector<Point> v;
    if (parse("input.txt", v)) {
        std::cout << "Parsed " << v.size() << " elements\n";
        for (Point& p : v) {
            std::cout << "{" << p.x << ';' << p.y << ';' << p.z << "}\n";
        }
    } else {
        std::cout << "Parse failed\n";
    } 
}

Full demo

Here the program parses its own source file, which has the sample data from the question embedded in it:

Live on Coliru

#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/iostreams/device/mapped_file.hpp>
struct Point { double x, y, z; };
BOOST_FUSION_ADAPT_STRUCT(Point,x,y,z)
template <typename Container>
bool parse(std::string const& fname, Container& into) {
    boost::iostreams::mapped_file mm(fname);
    using namespace boost::spirit::x3;
    return phrase_parse(mm.begin(), mm.end(),
            seek[ eps >> 'x' >> 'y' >> 'z' >> eol ] >> // skip contents for about 8 lines
            ('-' >> ('[' >> double_ >> ',' >> double_ >> ',' >> double_ >> omit[',' >> double_] >> ']')) % eol, // parse points
            blank, into);
}
int main()
{
    std::vector<Point> v;
    if (parse("main.cpp", v)) {
        std::cout << "Parsed " << v.size() << " elements\n";
        for (Point& p : v) {
            std::cout << "{" << p.x << ';' << p.y << ';' << p.z << "}\n";
        }
    } else {
        std::cout << "Parse failed\n";
    } 
}
#if DATA
ignore contents for about 8 lines
... 
       x        y         z
 - [7.6515, -10.8271, -28.5806, 123.8]
 - [7.6515, -10.8271, -28.5806, 125.0]
 - [7.6515, -10.8271, -28.5806, 125.9]
 - [7.6515, -10.8271, -28.5806, 126.8]
 - [7.6515, -10.8271, -28.5806, 127.9]
 - [7.6515, -10.8271, -28.5806, 128.9]
 - [7.6515, -10.8271, -28.5806, 130.0]
 - [7.6515, -10.8271, -28.5806, 130.9]
 - [7.6515, -10.8271, -28.5806, 131.8]
#endif

Prints

Parsed 9 elements
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}

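Nobody has answered yet, so I'll give it a try. You didn't post your regex-based solution, so I can't compare performance, but I suspect my code may be a bit faster.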

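// Headers needed by the snippets in this answer
#include <algorithm>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
struct Point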
{
    float x;
    float y;
};
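void transform_string( std::string& str )
{
    // Drop the list marker: every '-' that precedes the opening '['.
    // std::remove only shifts characters, so erase() is needed to shrink the string.
    auto i = std::find( std::begin( str ), std::end( str ), '[' );
    str.erase( std::remove( std::begin( str ), i, '-' ), i );
    // Drop commas and brackets so only whitespace-separated numbers remain.
    str.erase(
        std::remove_if(
            std::begin( str ),
            std::end( str ),
            [] ( char c )
            {
                return c == ',' || c == '[' || c == ']';
            } ),
        std::end( str ) );
}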
std::istream& get_point( std::istream& in, Point& p )
{
    std::string str;
    std::getline( in, str );
    if ( !str.empty() )
    {
        transform_string( str );
        std::istringstream iss { str };
        iss >> p.x >> p.y;
    }
    return in;
}

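The code is self-explanatory (I hope). It reads a line into a string, removes the characters that get in the way, and parses the floating-point values with std::istringstream. It depends only on the standard library, is easy to read, and is fast enough for a one-off operation (processing a file with 50k lines takes about 300 ms on my laptop). It makes some assumptions about the input and does not validate them. get_point is used in much the same way as operator >>. Hope this helps.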
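To make the cleanup step concrete, here is a small stand-alone sketch of what happens to one sample line before it reaches std::istringstream. The function name `strip_row_markup` is hypothetical. It works on a copy and uses the erase-remove idiom so that the string actually shrinks.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-alone version of the cleanup step; returns a cleaned copy.
std::string strip_row_markup(std::string str) {
    // Erase every '-' before the opening '[' (the list marker),
    // which keeps the minus signs that belong to the numbers.
    auto bracket = std::find(str.begin(), str.end(), '[');
    str.erase(std::remove(str.begin(), bracket, '-'), bracket);

    // Erase brackets and commas; the numbers stay separated by spaces.
    str.erase(std::remove_if(str.begin(), str.end(),
                             [](char c) { return c == ',' || c == '[' || c == ']'; }),
              str.end());
    return str;
}
```

For the input `" - [7.6515, -10.8271, -28.5806, 123.8]"` the result is `"  7.6515 -10.8271 -28.5806 123.8"`, which `operator >>` can read as plain whitespace-separated floats.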

UPD: test program:

int main()
{
    std::ifstream in_file { "data.txt" };
    std::vector< Point > points;
    // Some code to prepare stream, e.g. skip first 8 lines with
    // std::string tmp; for ( int i = 0; i < 8; ++i ) std::getline( in_file, tmp );
    Point p;
    while ( get_point( in_file, p ) )
        points.emplace_back( p );
    for ( auto& point : points )
        std::cout << point.x << ' ' << point.y << std::endl;
}

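The assumption I made is that the input stream contains only data with the structure shown in the question. If there are other characters, blank lines, or anything else, it will not work. If that assumption does not hold, please say so explicitly in the question.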