How to loop through vectors for specific strings

Keywords: string, vector, loop. Last updated: 2023-10-16

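I am struggling to write a loop that takes one field of a vector, checks whether that value appears for the first time, and otherwise jumps to the next vector until the field contains a new string.

My input file (.csvx) looks like this: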

No.; ID; A; B; C;...;Z;
1;1_380; Value; Value; Value;...; Value;
2;1_380; Value; Value; Value;...; Value;
3;1_380; Value; Value; Value;...; Value;
...
41;2_380; Value; Value; Value;...; Value;
42;2_380; Value; Value; Value;...; Value;
...
400000; 6_392; Value; Value; Value;...; Value;

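Note: the file is fairly large.

I managed to parse my file into a vector<vector<string> >, splitting each line at the semicolons so I can access any field. Now I want to access the first "ID", i.e. 1_380, and store the parameters of that same line, then go to the next ID, 2_380, and store those parameters as well, and so on...

Here is my code so far: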

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
#include <boost/algorithm/string.hpp>
using namespace std;
/*
 * CSVX reader that fetches data from a
 * CSVX file into vectors
 */
class CSVXReader
{
    string fileName, delimiter;
public:
    CSVXReader(string filename, string delm = ";") :
        fileName(filename), delimiter(delm)
    {}
    vector<vector<string> > getData();   // Fetch data from the CSVX file
};

/*
 * Parse the CSVX file line by line and return
 * the data as a vector of vectors of strings
 */
vector<vector<string> > CSVXReader::getData()
{
    ifstream file(fileName);
    vector<vector<string> > dataList;    // Holds all rows
    string line = "";
    // Iterate through each line and split the content using the delimiter
    while (getline(file, line))
    {
        vector<string> vec;              // One row from the input file
        boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
        dataList.push_back(vec);
    }
    file.close();
    return dataList;
}

int main(int argc, char** argv)
{
    CSVXReader reader("file.csvx");                       // Create a CSVXReader object
    vector<vector<string> > dataList = reader.getData();  // Get the data from the CSVX file
    // Loop over each line of dataList (vec1, vec2, vec3, ...)
    for (vector<string> vec : dataList)
        // Pseudocode: this is the part I cannot write
        if (vec[1] contains "_" && "appears for the first time")
            {store parameters...};
        else {go to next line};
    return 0;
}

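As you can see, I do not know how to write my loop properly... To be clear, I want to check the second field of every vector "vec": is it new? If so, store the data of that line; if not, jump to the next line (i.e. the next vector) until a new ID appears.

Looking forward to any suggestions!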

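Since you wrote pseudocode, it is hard to write real code in reply.

In general, though, if you want to detect whether an item has already occurred, you can use a std::unordered_set to implement "appears for the first time".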

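In sketch form (the parts marked //... are yours to fill in):

#include <unordered_set>
//...
std::unordered_set<std::string> stringSet;
//...
for (vector<string>& vec : dataList)
{
    // The ID contains "_" and has not been seen before
    if (vec[1].find('_') != string::npos && !stringSet.count(vec[1]))
    {
        //...
        stringSet.insert(vec[1]);
    }
}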

The condition checks whether the item is already in the unordered_set. If it is, we skip it; if not, we process the item and add it to the unordered_set.
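To make the idea concrete, here is a minimal self-contained sketch. The function name firstRowPerId and the Row alias are made up for illustration, and I am assuming the ID sits in column 1 as in the question. It relies on insert() reporting whether the key was new, which saves the separate count() call.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

using Row = std::vector<std::string>;   // hypothetical alias for one parsed line

// Returns the first row seen for each distinct ID (column 1), in input order.
// Rows with fewer than two fields, or whose ID contains no '_', are skipped.
std::vector<Row> firstRowPerId(const std::vector<Row>& dataList)
{
    std::unordered_set<std::string> seen;
    std::vector<Row> result;
    for (const Row& vec : dataList) {
        if (vec.size() < 2) continue;                      // malformed line
        const std::string& id = vec[1];
        if (id.find('_') == std::string::npos) continue;   // e.g. the header row
        // insert() returns {iterator, bool}; the bool is true only for a new key
        if (seen.insert(id).second) {
            result.push_back(vec);
        }
    }
    return result;
}
```

Because the set remembers every ID, this also works when the same ID shows up again much later in the file.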

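Basically, you do not need all the code the other answer provides. A single statement is enough to copy the data to where you want it.

Let us assume you have already read the data into dataList. You define a new std::vector<std::vector<std::string>> parameter{}; in which you want to store the unique results.

The algorithm library has a function called std::copy_if. It copies data only if a predicate (condition) is true. Your condition is that a row differs from the previous row. In that case it is a new row with new data, and you copy it. If a row has the same ID as the previous row, you do not copy it.

So we remember the important data of the last row, and in the next row we compare the data against the stored value. If it differs, store the parameters; if not, don't. After each check we assign the current value to the last value. As the initial "last value" we use an empty string, so the first row always differs. The statement then looks like this: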

std::copy_if(dataList.begin(), dataList.end(), std::back_inserter(parameter),
    [lastID = std::string{}](const std::vector<std::string> & sv) mutable {
        bool result = (lastID != sv[1]);
        lastID = sv[1];
        return result;
    }
);

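So we copy data from dataList into the parameter vector if, and only if, the second string of the source vector (index=1) differs from the value we remembered from the previous row.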
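As a side note, the same "first row of each run of equal IDs" filter can be written without a stateful lambda. The sketch below swaps in std::unique_copy with a binary predicate; it is an alternative to the copy_if statement, not the statement itself. The names firstRowOfEachRun and Row are made up here, and I am assuming every row has at least two fields. The standard allows algorithms to copy their predicate, so a stateless predicate avoids having to think about that.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

using Row = std::vector<std::string>;   // hypothetical alias for one parsed line

// Copies the first row of every run of consecutive rows that share an ID.
// Assumes each row has at least two fields (the ID is column 1).
std::vector<Row> firstRowOfEachRun(const std::vector<Row>& dataList)
{
    std::vector<Row> parameter;
    std::unique_copy(dataList.begin(), dataList.end(),
                     std::back_inserter(parameter),
                     [](const Row& a, const Row& b) { return a[1] == b[1]; });
    return parameter;
}
```

Like the copy_if version, this only compares neighbouring rows: an ID that comes back after a different ID has appeared is copied again.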

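Pretty straightforward.

A further optimization is to sort out the right parameters immediately: instead of first storing complete vectors with all the data, store only the data you need. This greatly reduces the memory required.

Change the while loop to: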

string line = "";
string oldValue{};
// Iterate through each line and split the content using the delimiter
while (getline(file, line))
{
    vector<string> vec;                  // One row from the input file
    boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
    if (vec.size() < 2) continue;        // Skip lines without an ID field
    if (oldValue != vec[1]) {
        dataList.push_back(vec);         // ID changed: keep this row
    }
    oldValue = vec[1];
}

With this, you get it right from the start.
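If you want to try the modified loop without Boost or a real file, here is a rough stand-alone sketch that reads from any std::istream. The helper names splitLine and readFirstOfEachRun are invented for this example, and splitting is done with std::getline instead of boost::algorithm::split, so a trailing ";" does not produce an empty last field as it would with Boost.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Row = std::vector<std::string>;   // hypothetical alias for one parsed line

// Splits one line at every occurrence of delim (fields are not trimmed).
Row splitLine(const std::string& line, char delim = ';')
{
    Row fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// Reads rows from `in` and keeps a row only when its ID (column 1)
// differs from the ID of the previous usable line.
std::vector<Row> readFirstOfEachRun(std::istream& in)
{
    std::vector<Row> dataList;
    std::string line;
    std::string oldValue;
    while (std::getline(in, line)) {
        Row vec = splitLine(line);
        if (vec.size() < 2) continue;    // blank or malformed line
        if (oldValue != vec[1]) {
            dataList.push_back(vec);
        }
        oldValue = vec[1];
    }
    return dataList;
}
```

Passing a std::ifstream opened on your .csvx file instead of a string stream should work the same way, since both are std::istream objects.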

An additional solution looks like this:

#include <vector>
#include <iostream>
#include <string>
#include <iterator>
#include <regex>
#include <fstream>
#include <sstream>
#include <algorithm>
std::istringstream testFile{R"(1;1_380; Value1; Value2; Value3; Value4
2;1_380; Value5; Value6; Value7; Value8
3;1_380; Value9 Value10 
41;2_380; Value11; Value12; Value13
42;2_380; Value15
42;2_380; Value16
500;3_380; Value99
400000; 6_392; Value17; Value18; Value19; Value20
400001; 6_392; Value21; Value22; Value23; Value24)"
};

class LineAsVector {    // Proxy for the input iterator
public:
    // Overload extractor. Read a complete line
    friend std::istream& operator>>(std::istream& is, LineAsVector& lv) {
        // Read a line
        std::string line; lv.completeLine.clear();
        std::getline(is, line);
        // The delimiter
        const std::regex re(";");
        // Split values and copy into resulting vector
        std::copy(std::sregex_token_iterator(line.begin(), line.end(), re, -1),
                  std::sregex_token_iterator(),
                  std::back_inserter(lv.completeLine));
        return is;
    }
    // Convert a 'LineAsVector' to std::vector<std::string>
    operator std::vector<std::string>() const { return completeLine; }
protected:
    // Temporary to hold the read vector
    std::vector<std::string> completeLine{};
};
int main()
{
    // This is the resulting vector which will contain the result
    std::vector<std::vector<std::string>> parameter{};

    // One copy statement to copy all necessary data from the file to the parameter list
    std::copy_if(
        std::istream_iterator<LineAsVector>(testFile),
        std::istream_iterator<LineAsVector>(),
        std::back_inserter(parameter),
        [lastID = std::string{}](const std::vector<std::string> & sv) mutable {
            bool result = (lastID != sv[1]);
            lastID = sv[1];
            return result;
        }
    );

    // For debug purposes: Show result on screen
    std::for_each(parameter.begin(), parameter.end(), [](std::vector<std::string> & sv) {
        std::copy(sv.begin(), sv.end(), std::ostream_iterator<std::string>(std::cout, " "));
        std::cout << '\n';
    });
    return 0;
}

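Please note: in function main we do everything in a single statement, std::copy_if. The source here is a std::istream, so it can be the std::ifstream (a file) you want, or anything else. Here on SO I use a std::istringstream because I cannot use files here, but it works the same way. Just replace the variable passed to std::istream_iterator. We use the std::istream_iterator to iterate over the file.

What a pity that nobody will read this.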

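OK guys, I have been playing around with my code and realized that @Armin's second solution (the modified while loop) does not handle unordered lists: if an element reappears much later, it is compared only with the previous element (oldValue) and gets inserted even though it already exists in my container...

After some reading (and clearly there is more to do), I am leaning toward @Paul's unordered_set. This is where my first question comes up: why didn't you suggest set? From what I found, unordered_set is apparently faster for lookups. With my very limited understanding this is hard to grasp... but I don't want to dig too deep here. Is that your reason? Or am I missing other advantages?

Despite your suggestion, I tried using set, which seems better in my case because it keeps things ordered. Once again my code refuses to compile: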

set<vector<string> > CSVReader::getData() {
    ifstream file(fileName);
    set<vector<string> > container;
    string line = "";
    string uniqueValue{};
    // Iterate through each line and split the content using delimiter
    while (getline(file, line))
    {
        // Vector contains a row from RAO file
        vector<string> vec;
        boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
        uniqueValue = vec[2];
        // Line (or vector) is added to container if the uniqueValue, e.g. 1_380, appears for the first time
        if(!container.count(uniqueValue))
        {
            container.insert(vec);
        }
    }
    file.close();
    return container;
}

The error says:

error: no matching function for call to 'std::set<std::vector<std::__cxx11::basic_string<char> > >::count(std::__cxx11::string&)'
if(!localDetails.count(localDetail))

Since I followed your example, what am I doing wrong?

PS: I was just reading up on SO policy... I hope this additional question is acceptable.
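One possible reading of that error: the keys of a set<vector<string> > are whole rows, so count() expects a vector<string> and there is no overload that takes a plain string. A way around it, sketched below, is to keep the IDs in their own std::set<std::string> and store the rows separately. The name readUniqueIds is made up, splitting uses std::getline rather than Boost, and I am assuming the ID is field index 1 as in the sample file (the posted code reads vec[2]). Either std::set or std::unordered_set should do for the ID set; as far as I know the practical difference is ordered keys with logarithmic lookup versus hashed keys with average constant-time lookup.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using Row = std::vector<std::string>;   // hypothetical alias for one parsed line

// Keeps the first row for every ID, even if the ID reappears much later.
std::vector<Row> readUniqueIds(std::istream& in, char delim = ';')
{
    std::set<std::string> seenIds;   // keys are IDs, so insert() takes a string
    std::vector<Row> container;      // kept rows, in file order
    std::string line;
    while (std::getline(in, line)) {
        Row vec;
        std::istringstream ss(line);
        std::string field;
        while (std::getline(ss, field, delim)) {
            vec.push_back(field);
        }
        if (vec.size() < 2) continue;          // no ID field on this line
        // insert() reports through .second whether the ID was new
        if (seenIds.insert(vec[1]).second) {
            container.push_back(vec);
        }
    }
    return container;
}
```

Storing the rows in a vector rather than a set keeps them in file order; if you need them sorted by ID, a std::map from ID to row would be another option.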