Should we avoid repetitive code in C++ in order to be "Pythonic", and how?

Keywords: Pythonic, C++ · Last updated: 2023-10-16

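I'm in the larval stage with Python and the pre-egg stage with C++, but I'm trying to do my best, especially with the "Don't Repeat Yourself" principle.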

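I have a multichannel raw file format to open. It has a main ASCII header whose fields are representable as strings and integers (always encoded as characters padded with spaces). The second part is N headers, N being a field of the main header, and each of those headers has more text and number fields of its own (also ASCII-encoded) that give the lengths and sizes of the actual 16-bit multichannel streams making up the rest of the file.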

So far, I have the following working code in C++:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <map>
using namespace std;
struct Header {
    string version;
    string patinfo;
    string recinfo;
    string start_date;
    string start_time;
    int header_bytes;
    string reserved;
    int nrecs;
    double rec_duration;
    int nchannels;
};
struct Channel {
    string label;
    string transducertype;
    string phys_dim;
    int pmin;
    int pmax;
    int dmin;
    int dmax;
    string prefiltering;
    int n_samples;
    string reserved;
};

int main()
{
    ifstream edf("/home/helton/Dropbox/01MIOTEC/06APNÉIA/Samples/Osas2002plusQRS.rec", ios::binary);
    // prepare to read file header
    Header header;
    char buffer[80];
    // reads header fields into the struct 'header'
    edf.read(buffer, 8);
    header.version = string(buffer, 8);
    edf.read(buffer, 80);
    header.patinfo = string(buffer, 80);
    edf.read(buffer, 80);
    header.recinfo = string(buffer, 80);
    edf.read(buffer, 8);
    header.start_date = string(buffer, 8);
    edf.read(buffer, 8);
    header.start_time = string(buffer, 8);
    edf.read(buffer, 8);
    // parse only the bytes just read; buffer is not null-terminated
    stringstream(string(buffer, 8)) >> header.header_bytes;
    edf.read(buffer, 44);
    header.reserved = string(buffer, 44);
    edf.read(buffer, 8);
    stringstream(string(buffer, 8)) >> header.nrecs;
    edf.read(buffer, 8);
    stringstream(string(buffer, 8)) >> header.rec_duration;
    edf.read(buffer, 4);
    stringstream(string(buffer, 4)) >> header.nchannels;
    /*
    cout << "'" << header.version << "'" << endl;
    cout << "'" << header.patinfo << "'" << endl;
    cout << "'" << header.recinfo << "'" << endl;
    cout << "'" << header.start_date << "'" << endl;
    cout << "'" << header.start_time << "'" << endl;
    cout << "'" << header.header_bytes << "'" << endl;
    cout << "'" << header.reserved << "'" << endl;
    cout << "'" << header.nrecs << "'" << endl;
    cout << "'" << header.rec_duration << "'" << endl;
    cout << "'" << header.nchannels << "'" << endl;
    */
    // prepare to read channel headers
    int ns = header.nchannels; // ns tells how much channels I have
    char title[16]; // 16 is the specified length of the "label" field of each channel
    for (int n = 0; n < ns; n++)
    {
        edf >> title;
        cout << title << endl; // and this successfully echoes the label of each channel
    }

    return 0;
};

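A few things I should say:

I chose a struct because the format specification is very much hard-coded;
I didn't iterate over the main header fields because the number of bytes and the types to be read seemed rather arbitrary to me;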

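My question is:

"Should I worry about cutting corners to make this kind of code more 'Pythonic' (more abstract, less repetitive), or is that not how C++ works?"

Many Python evangelists (myself included, because I love it) stress how easy it is to use and so on. So for a while I will keep wondering whether I am doing dumb things or the right things, which just don't come out as "automatically" because of the nature of C++.

Thanks for reading

Helton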

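I'd say there is no such thing as Pythonic C++ code. The DRY principle applies to both languages, but much of what is considered "Pythonic" is simply the shortest, sweetest way of expressing logic in Python using Python-specific constructs. Idiomatic C++ is quite different.

For example,

lambda is sometimes considered not very Pythonic and is reserved for cases where no other solution exists, whereas lambdas were only recently added to the C++ standard (in C++11). C++ has no keyword arguments, which are very Pythonic. C++ programmers dislike building a map when it isn't necessary, whereas a Python programmer might throw a dict at many problems because it happens to express the intent more clearly than the efficient alternative.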

If you want to save typing, use the function I posted earlier, and then:

header.version = read_field(edf, 8);
header.patinfo = read_field(edf, 80);

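That saves quite a few lines. But more important than those lines, you have gained a small amount of modularity: how a field is read and which fields are read are now separate parts of the program.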

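You are right: as written, the code is repetitive (and has no error checking). Each field you read takes three to five steps, depending on the type of data being read:

Read the field from the stream
Make sure the read succeeded
Parse the data (if necessary)
Make sure the parse succeeded (if necessary)
Copy the data to its destination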
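Written out by hand for a single integer field, the five steps look roughly like this. It is only a sketch: `read_nrecs` is a hypothetical name, and the 8-byte width is the one the question uses for `nrecs`. Every other numeric field would need the same block again:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical example: the five steps spelled out for one 8-byte integer field.
int read_nrecs(std::istream& edf)
{
    char buffer[8];
    // 1. read the field from the stream
    edf.read(buffer, sizeof buffer);
    // 2. make sure the read succeeded
    if (!edf)
        throw std::runtime_error("could not read nrecs");
    // 3. parse the data (only the 8 bytes read; buffer is not null-terminated)
    const std::string text(buffer, sizeof buffer);
    std::istringstream parser(text);
    int value = 0;
    parser >> value;
    // 4. make sure the parse succeeded
    if (!parser)
        throw std::runtime_error("could not parse nrecs");
    // 5. copy the data to its destination (here: return it to the caller)
    return value;
}
```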

You can wrap these steps in a function so the code becomes less repetitive. For example, consider the following function templates:

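#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

template <typename TStream, typename TResult>
void ReadFixedWidthFieldFromStream(TStream& str, TResult& result, unsigned sz) 
{
    std::vector<char> data(sz);
    // read exactly sz bytes and check that the read succeeded
    if (!str.read(&data[0], sz))
        throw std::runtime_error("Failed to read from stream");
    // parse only the sz bytes read (data is not null-terminated)
    std::stringstream ss(std::string(data.begin(), data.end()));
    // check the parse succeeded; >> writes straight into result
    if (!(ss >> result))
        throw std::runtime_error("Failed to parse data from stream");
}
// Overload for std::string: no parsing needed, just copy the bytes
template <typename TStream>
void ReadFixedWidthFieldFromStream(TStream& str, std::string& result, unsigned sz) 
{
    std::vector<char> data(sz);
    if (!str.read(&data[0], sz))
        throw std::runtime_error("Failed to read from stream");
    result = std::string(&data[0], sz);
}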

Now your code can be more concise:

ReadFixedWidthFieldFromStream(edf, header.version, 8);
ReadFixedWidthFieldFromStream(edf, header.patinfo, 80);
ReadFixedWidthFieldFromStream(edf, header.recinfo, 80);
// etc.
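A fuller, self-contained sketch of the "etc.": it repeats the two templates (with the numeric version parsing only the `sz` bytes read), copies the question's `Header` struct, and fills every field using the widths from the question's code. `ReadHeader` is a hypothetical name:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Generic version: read sz bytes, then parse them with operator>>.
template <typename TStream, typename TResult>
void ReadFixedWidthFieldFromStream(TStream& str, TResult& result, unsigned sz)
{
    std::vector<char> data(sz);
    if (!str.read(&data[0], sz))
        throw std::runtime_error("Failed to read from stream");
    // Parse only the bytes read; they are not null-terminated.
    std::istringstream ss(std::string(data.begin(), data.end()));
    if (!(ss >> result))
        throw std::runtime_error("Failed to parse data from stream");
}

// std::string version: keep the bytes as they are (padding included).
template <typename TStream>
void ReadFixedWidthFieldFromStream(TStream& str, std::string& result, unsigned sz)
{
    std::vector<char> data(sz);
    if (!str.read(&data[0], sz))
        throw std::runtime_error("Failed to read from stream");
    result = std::string(data.begin(), data.end());
}

// Main header, as declared in the question.
struct Header {
    std::string version;
    std::string patinfo;
    std::string recinfo;
    std::string start_date;
    std::string start_time;
    int header_bytes;
    std::string reserved;
    int nrecs;
    double rec_duration;
    int nchannels;
};

// Hypothetical helper: one line per field, widths from the question's code.
Header ReadHeader(std::istream& edf)
{
    Header h;
    ReadFixedWidthFieldFromStream(edf, h.version, 8);
    ReadFixedWidthFieldFromStream(edf, h.patinfo, 80);
    ReadFixedWidthFieldFromStream(edf, h.recinfo, 80);
    ReadFixedWidthFieldFromStream(edf, h.start_date, 8);
    ReadFixedWidthFieldFromStream(edf, h.start_time, 8);
    ReadFixedWidthFieldFromStream(edf, h.header_bytes, 8);
    ReadFixedWidthFieldFromStream(edf, h.reserved, 44);
    ReadFixedWidthFieldFromStream(edf, h.nrecs, 8);
    ReadFixedWidthFieldFromStream(edf, h.rec_duration, 8);
    ReadFixedWidthFieldFromStream(edf, h.nchannels, 4);
    return h;
}
```

With the header logic in one function, the top of `main` could shrink to `Header header = ReadHeader(edf);`, and any read or parse error surfaces as an exception instead of passing silently.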

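This code is straightforward, simple and easy to understand. If it works, don't waste time changing it. I'm sure there is plenty of badly written, complicated, hard-to-understand (and possibly incorrect) code that should be fixed first :)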

The Zen of Python doesn't explicitly mention DRY.

>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

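For reading strings directly from a file, see this question. The rest of this answer is wrong. ~~But personally I think there is a better/cleaner way to do this.~~

If you know the size of the struct, don't use string; use basic C types (and make sure the struct is packed). See these links: http://msdn.microsoft.com/en-us/library/2e70t5y1(v=vs.80).aspx and http://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/Type-Attributes.html

I would do it like this, for example (I'm not sure of the size of each string, but you get the idea):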

struct Header {
    char version[8];
    char patinfo[80];
    char recinfo[80];
    char start_date[8];
    char start_time[8];
    int header_bytes;
    char reserved[44];
    int nrecs;
    double rec_duration;
    int nchannels;
};

Once you have a packed struct, you can read it directly from the file:

struct Header h;
edf.read(reinterpret_cast<char*>(&h), sizeof(struct Header)); // read() takes a char*

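To me this is the cleanest way, but remember that the struct must be packed so that its size in memory is guaranteed to match the size of the structure saved in the file. That is not hard to check while testing.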
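As the note at the top of this answer says, the approach above is wrong as written. One likely reason: the question states that every header field is ASCII text, so `header_bytes`, `nrecs`, `rec_duration` and `nchannels` are stored as space-padded digits, not as binary `int`/`double`. A possible repair under that assumption (not the answerer's code) is to make every member a `char` array. The widths in the question's code add up to 8+80+80+8+8+8+44+8+8+4 = 256 bytes, and since `char` has alignment 1 no padding is expected, so no compiler-specific packing is needed; a `static_assert` checks this. `RawHeader`, `read_raw_header` and `field_to_int` are hypothetical names:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Every field is ASCII text, so every member is a char array.
struct RawHeader {
    char version[8];
    char patinfo[80];
    char recinfo[80];
    char start_date[8];
    char start_time[8];
    char header_bytes[8];
    char reserved[44];
    char nrecs[8];
    char rec_duration[8];
    char nchannels[4];
};
// Assumption check: no padding, so the struct matches the on-disk layout.
static_assert(sizeof(RawHeader) == 256, "RawHeader must match the 256-byte header");

// Read the whole header in one call, as the answer intends.
RawHeader read_raw_header(std::istream& edf)
{
    RawHeader h;
    if (!edf.read(reinterpret_cast<char*>(&h), static_cast<std::streamsize>(sizeof h)))
        throw std::runtime_error("could not read header");
    return h;
}

// Numbers still need parsing; N is deduced from the array, so only the
// field's own bytes are used (they are not null-terminated).
template <std::size_t N>
int field_to_int(const char (&field)[N])
{
    const std::string text(field, N);
    std::istringstream ss(text);
    int value = 0;
    if (!(ss >> value))
        throw std::runtime_error("could not parse numeric field");
    return value;
}
```

`rec_duration` may hold a fraction, so it would need a `double` counterpart of `field_to_int`.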