Parse a C-string of floating numbers

Keywords: string, floating-point numbers. Last updated: 2023-10-16

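>I have a C-string that contains a list of floating-point numbers separated by commas and spaces. Each pair of numbers is separated by one (or more) spaces and represents a point whose x and y fields are separated by a comma (and, optionally, spaces).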

" 10,9 2.5, 3   4 ,150.32 "

I need to parse this string to fill a list of Point(x, y).
Here is my current implementation:

const char* strPoints = getString();
std::istringstream sstream(strPoints);
float x, y;
char comma;
while (sstream >> x >> comma >> y)
{
   myList.push(Point(x, y));
}

Since I need to parse a large number (up to 500,000) of these strings, I am wondering whether there is a faster solution.

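Take a look at Boost Spirit:

How to parse space-separated floats in C++ quickly?

It supports NaN and positive and negative infinity just fine. It also lets you express the constraining grammar succinctly.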

Simple adaptation of the code

Here is the sample adapted to your grammar:
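```
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <vector>
struct Point { float x,y; };
BOOST_FUSION_ADAPT_STRUCT(Point, (float, x)(float, y)) // lets Spirit fill a Point from two parsed numbers
typedef std::vector<Point> data_t;
// And later (f, l: iterators over the input; data: a data_t; names from boost::spirit::qi):
bool ok = phrase_parse(f,l,*(double_ > ',' > double_), space, data);
```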
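The iterators can be any iterators, so you can hook it up to your C string just fine.
This is a straight adaptation of the linked benchmark case, which shows how to parse from any std::istream or directly from a memory-mapped file.
Live On Coliru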

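Further optimizations (strictly for C strings)

Here is a version that doesn't need to know the length of the string up front (this is neat because it avoids the strlen call in case you don't have the length available):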

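#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <limits>

// Note: std::numeric_limits is not specialized for pointer types, so the default
// `last` is a value-initialized (null) pointer that never equals `it`; parsing
// therefore stops at the NUL terminator rather than at `last`.
template <typename OI>
static inline void parse_points(OI out, char const* it, char const* last = std::numeric_limits<char const*>::max()) {
    namespace qi  = boost::spirit::qi;
    namespace phx = boost::phoenix;
    bool ok = qi::phrase_parse(it, last,
            *(qi::double_ >> ',' >> qi::double_) [ *phx::ref(out) = phx::construct<Point>(qi::_1, qi::_2) ],
            qi::space);
    // Succeed only if the parser consumed everything up to the end of the string.
    if (!ok || !(it == last || *it == '\0')) {
        throw it; // TODO proper error reporting?
    }
}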

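Note how I made it take an output iterator, so that you get to decide how to accumulate the results. The obvious wrapper to /just/ parse into a vector would be: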

static inline data_t parse_points(char const* szInput) {
    data_t pts;
    parse_points(back_inserter(pts), szInput);
    return pts;
}

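But you can also do different things (such as appending to an existing container, which could have reserved a known capacity up front, and so on). Things like this often end up allowing truly optimized integration.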
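The output-iterator design described above does not depend on Spirit. As a rough, self-contained sketch (this is not the Spirit parser from this answer: it swaps in std::strtof so that it compiles without Boost, it treats the comma as optional, and the names parse_points_strtof and parse_to_vector are made up for illustration), the same idea might look like this:

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <iterator>
#include <vector>

struct Point { float x, y; };

// Hypothetical helper, not part of the answer above: parses "x , y" pairs from a
// NUL-terminated string with std::strtof and writes each Point to `out`.
// Returns true only if the whole string was consumed.
template <typename OutIt>
bool parse_points_strtof(const char* s, OutIt out) {
    auto skip_ws = [](const char* p) {
        while (std::isspace(static_cast<unsigned char>(*p))) ++p;
        return p;
    };
    for (;;) {
        char* end = nullptr;
        const float x = std::strtof(s, &end);  // strtof skips leading whitespace itself
        if (end == s) break;                    // no number here: end of input or junk
        s = skip_ws(end);
        if (*s == ',') ++s;                     // assumption: the comma is optional
        const float y = std::strtof(s, &end);
        if (end == s) return false;             // an x without a matching y
        s = end;
        *out++ = Point{x, y};
    }
    return *skip_ws(s) == '\0';                 // only trailing whitespace may remain
}

// Usage sketch: append to a vector whose capacity was reserved up front.
// On malformed input this wrapper simply returns an empty vector.
std::vector<Point> parse_to_vector(const char* s, std::size_t expected = 0) {
    std::vector<Point> pts;
    pts.reserve(expected);
    if (!parse_points_strtof(s, std::back_inserter(pts))) pts.clear();
    return pts;
}
```

Because the caller supplies the iterator, the same function can fill a fresh vector, append to an existing container, or feed any other sink that models an output iterator.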

Here is the code fully demoed in ~30 lines of essential code:

Live On Coliru

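Extra Awesome Bonus

To showcase the flexibility of this parser: if you just want to check the input and get a count of the points, you can replace the output iterator with a simple lambda function that increments a counter instead of adding a newly constructed point.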
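```
#include <boost/iterator/function_output_iterator.hpp>
#include <iostream>
int main() {
    int count = 0;
    // The output iterator comes first, matching the parse_points(out, it, last) signature.
    parse_points(boost::make_function_output_iterator([&](Point const&){count++;}), " 10,9 2.5, 3   4 ,150.32    ");
    std::cout << "elements in sample: " << count << "\n";
}
```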
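Live On Coliru
Since everything is inlined, the compiler will notice that the whole Point does not need to be constructed here and will eliminate that code: http://paste.ubuntu.com/9781055/
The main function can be seen invoking the parser primitives directly. Hand-coding the parser won't get you better tuning here, at least not without a lot of effort.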

Using a combination of std::find and std::strtof to parse the points, I got much better performance, and the code wasn't much more complicated. Here's the test I ran:

#include <algorithm>
#include <cctype>
#include <chrono>
#include <cstdlib>
#include <forward_list>
#include <iostream>
#include <iterator>
#include <random>
#include <sstream>
#include <string>
#include <tuple>
struct Point { float x; float y; };
using PointList = std::forward_list<Point>;
using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;
// Builds n random "x ,y" pairs, each followed by tab, space and newline.
std::string generate_points(int n) {
  static auto random_generator = std::mt19937{std::random_device{}()};
  std::ostringstream oss;
  std::uniform_real_distribution<float> distribution(-1, 1);
  for (int i=0; i<n; ++i) {
    oss << distribution(random_generator) << " ," << distribution(random_generator) << "\t \n";
  }
  return oss.str();
}
// Baseline: the istringstream approach from the question.
PointList parse_points1(const char* s) {
  std::istringstream iss(s);
  PointList points;
  float x, y;
  char comma;
  while (iss >> x >> comma >> y)
       points.push_front(Point{x, y});
  return points;
}
// Parses one point starting at x_first; returns it and the start of the next point.
inline
std::tuple<Point, const char*> parse_point2(const char* x_first, const char* last) {
  // Cast to unsigned char: std::isspace is undefined for negative char values.
  auto is_whitespace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
  auto x_last  = std::find(x_first, last, ',');
  auto y_first = std::find_if_not(std::next(x_last), last, is_whitespace);
  auto y_last  = std::find_if(y_first, last, is_whitespace);
  auto x = std::strtof(x_first, (char**)&x_last);
  auto y = std::strtof(y_first, (char**)&y_last);
  auto next_x_first = std::find_if_not(y_last, last, is_whitespace);
  return std::make_tuple(Point{x, y}, next_x_first);
}
PointList parse_points2(const char* i, const char* last) {
  PointList points;
  Point point;
  while (i != last) {
    std::tie(point, i) = parse_point2(i, last);
    points.push_front(point);
  }
  return points;
}
int main() {
  auto s = generate_points(500000);
  auto time0 = Clock::now();
  auto points1 = parse_points1(s.c_str());
  auto time1 = Clock::now();
  auto points2 = parse_points2(s.data(), s.data() + s.size());
  auto time2 = Clock::now();
  std::cout << "using stringstream: "
            << std::chrono::duration_cast<milliseconds>(time1 - time0).count() << '\n';
  std::cout << "using strtof: "
            << std::chrono::duration_cast<milliseconds>(time2 - time1).count() << '\n';
  return 0;
}

Output (times in milliseconds):

using stringstream: 1262
using strtof: 120

You can first try disabling synchronization with C I/O:

std::ios::sync_with_stdio(false);

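Source: Using scanf() in C++ programs is faster than using cin?

You can also try alternatives to iostream:

boost::lexical_cast, with BOOST_LEXICAL_CAST_ASSUME_C_LOCALE defined
scanf

I think you should give sync_with_stdio(false) a try. The other alternatives require more coding, and I'm not sure you would gain much (if anything).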
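For the scanf alternative listed above, a minimal sketch could use std::sscanf on the C string directly. The helper name parse_points_sscanf is hypothetical, and this variant requires the comma between x and y. Be aware that some C library implementations measure the remaining input on every sscanf call, so this loop may be slow on one very long string; it is better suited to many short strings.

```cpp
#include <cstdio>
#include <vector>

struct Point { float x, y; };

// Hypothetical helper: parses "x , y" pairs from a NUL-terminated string.
std::vector<Point> parse_points_sscanf(const char* s) {
    std::vector<Point> pts;
    float x = 0.0f, y = 0.0f;
    int consumed = 0;
    // "%f" skips leading whitespace, the space before ',' matches any run of
    // whitespace (including none), and "%n" stores how many chars were read.
    // sscanf returns 2 when both numbers were converted ("%n" is not counted).
    while (std::sscanf(s, "%f ,%f%n", &x, &y, &consumed) == 2) {
        pts.push_back(Point{x, y});
        s += consumed;  // continue right after the pair just parsed
    }
    return pts;
}
```

The loop stops at the first position where a complete pair cannot be read, so a caller that needs strict validation would still have to check what remains of the string.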
