Taking in a file and removing the duplicates within the file

Keywords: file, delete. Last updated: 2023-10-16

I have a comma-delimited file with a .csv extension whose entries look like this:

30,Movies,53808-0776,278,65.75
31,Beauty,48951-8130,725,83.42
32,Baby,59779-224,848,82
33,Industrial,55711-070,753,46.48
34,Industrial,76446-002,272,89.03
35,Sports,68151-2870,185,2.86
36,Toys,0245-0709,783,45.65
37,Games,49999-963,523,93.65 
38,Beauty,52125-508,500,2.38
39,Toys,54092-381,783,45.65
40,Beauty,55154-6649,666,79.52
41,Jewelry,57664-327,46,10.28
42,Grocery,49738-453,317,29

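What I need my program to do is take in any generic file of this format, remove the duplicates, and then create a new file with the same extension that contains no duplicates. The user types in the exact location of the file. I am using a class to hold a single data record. This is the header I have so far.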

#ifndef RECORD_H_
#define RECORD_H_
#include <iostream>
#include <string>
class Record {
public:
// Constructor
    Record(std::string department, std::string item_code, int quantity, double cost);
// Destructor
    virtual ~Record();
// Overloaded '==' and '<' comparison operators
    friend bool operator ==(const Record &a, const Record &b);
    friend bool operator <(const Record &a, const Record &b);
// Overloaded '<<' operator
    friend std::ostream& operator <<(std::ostream&, const Record&);
// Private data members
private:
    std::string department;
    std::string item_code;
    int quantity;
    double cost;
};
#endif /* RECORD_H_ */

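The files are all small, so a simple sorting method is fine. Here is what I am confused about. Do I just store the objects in a vector in the source file, or do I need to create a definition for that? Also, if I take in a file, how do I get the program to create the new file with the same extension (.csv)?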

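Assuming you have the Record class described in the question with operator== defined, and a vector of records read into memory from the csv file, you can remove the duplicates with the algorithm library like this: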

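#include <algorithm>
#include <vector>

std::vector<Record> records; // filled with the Records read from the file
// std::unique only removes consecutive duplicates, so sort first
// (std::sort uses Record's operator<)
std::sort(records.begin(), records.end());
// moves the unique elements to the front; the elements left at the
// back are in an unspecified state
// last is an iterator to the start of that leftover range
auto last = std::unique(records.begin(), records.end());
// erase the leftover elements
records.erase(last, records.end());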

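Note that std::unique will use the operator== you defined in Record. It only removes duplicates that are adjacent, so the vector has to be sorted for every duplicate to be removed.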
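For the remaining parts of the question (storing the records in a vector and writing a new file with the same extension), here is a minimal sketch, not a definitive implementation. It assumes C++17 for std::filesystem. It uses a simplified stand-in struct instead of the Record class from the question. It also assumes the leading row number is an id that does not count when deciding whether two rows are duplicates. The names parse_record, remove_duplicates, output_path_for, deduplicate_file and the "_unique" suffix are hypothetical.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// Simplified stand-in for the question's Record class (assumption: the
// first column is a row id that is kept but ignored by the comparisons).
struct Record {
    int id = 0;
    std::string department;
    std::string item_code;
    int quantity = 0;
    double cost = 0.0;
};

// Both operators compare the same four fields, so records that are
// equal under == end up next to each other after sorting with <.
bool operator==(const Record& a, const Record& b) {
    return std::tie(a.department, a.item_code, a.quantity, a.cost) ==
           std::tie(b.department, b.item_code, b.quantity, b.cost);
}
bool operator<(const Record& a, const Record& b) {
    return std::tie(a.department, a.item_code, a.quantity, a.cost) <
           std::tie(b.department, b.item_code, b.quantity, b.cost);
}

// Parse one line of the form "id,department,item_code,quantity,cost".
// std::stoi / std::stod throw if a numeric field is malformed.
Record parse_record(const std::string& line) {
    std::istringstream in(line);
    std::string id, quantity, cost;
    Record r;
    std::getline(in, id, ',');
    std::getline(in, r.department, ',');
    std::getline(in, r.item_code, ',');
    std::getline(in, quantity, ',');
    std::getline(in, cost);
    r.id = std::stoi(id);
    r.quantity = std::stoi(quantity);
    r.cost = std::stod(cost);
    return r;
}

// Sort, then drop adjacent duplicates. std::stable_sort keeps equal
// records in input order, so the first occurrence is the one that survives.
void remove_duplicates(std::vector<Record>& records) {
    std::stable_sort(records.begin(), records.end());
    records.erase(std::unique(records.begin(), records.end()), records.end());
}

// "data/items.csv" -> "data/items_unique.csv": same folder, same extension.
std::filesystem::path output_path_for(const std::filesystem::path& input) {
    std::filesystem::path out = input;
    out.replace_filename(input.stem().string() + "_unique" +
                         input.extension().string());
    return out;
}

// Read the file, remove duplicates, and write the result next to the input.
// Returns false if the input cannot be opened or the output cannot be written.
bool deduplicate_file(const std::filesystem::path& input) {
    std::ifstream in(input);
    if (!in) return false;
    std::vector<Record> records;
    for (std::string line; std::getline(in, line);) {
        if (!line.empty()) records.push_back(parse_record(line));
    }
    remove_duplicates(records);
    std::ofstream out(output_path_for(input));
    for (const Record& r : records) {
        out << r.id << ',' << r.department << ',' << r.item_code << ','
            << r.quantity << ',' << r.cost << '\n';
    }
    return static_cast<bool>(out);
}
```

To use it, read the path the user types into a std::string with std::getline(std::cin, path) and pass it to deduplicate_file. Because the records are sorted, the output file is ordered by department rather than in the original row order. Costs are written with the default stream formatting, so set the precision explicitly if a fixed number of decimals matters.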