Reading large data from a TXT file and sorting it

Keywords: sorting, data, TXT. Last updated: 2023-10-16

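I am trying to create a C++ project that reads file names from a TXT file, counts how often each one appears, and lists the top 10. A small part of the input looks like this: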

local - - [24/Oct/1994:13:41:41 -0600] "GET index.html HTTP/1.0" 200 150   
local - - [24/Oct/1994:13:41:41 -0600] "GET 1.gif HTTP/1.0" 200 1210  
local - - [24/Oct/1994:13:43:13 -0600] "GET index.html HTTP/1.0" 200 3185
local - - [24/Oct/1994:13:43:14 -0600] "GET 2.gif HTTP/1.0" 200 2555         
local - - [24/Oct/1994:13:43:15 -0600] "GET 3.gif HTTP/1.0" 200 36403   
local - - [24/Oct/1994:13:43:17 -0600] "GET 4.gif HTTP/1.0" 200 441    
local - - [24/Oct/1994:13:46:45 -0600] "GET index.html HTTP/1.0" 200 31853

The code I have so far is below:

#include <iostream>
#include <fstream>
#include <sstream>
#include <unordered_map>
#include <vector>
#include <iterator>
#include <algorithm>
#include <functional>
#include <string>
#include <cstdio>

std::string get_file_name(const std::string& s) {
    // The request is the quoted part of the line, e.g. "GET index.html HTTP/1.0".
    std::size_t first = s.find_first_of('"');
    std::size_t last = s.find_last_of('"');
    std::string request = s.substr(first, last - first);
    // The file name is the token between the first and second spaces.
    std::size_t file_begin = request.find_first_of(' ');
    std::string truncated_request = request.substr(++file_begin);
    std::size_t file_end = truncated_request.find(' ');
    std::string file_name = truncated_request.substr(0, file_end);
    return file_name;
}

int main() {
    std::ifstream f_s("text.txt");
    std::string content;
    std::unordered_map<std::string,long int> file_access_counts;
    while (std::getline(f_s, content)) {
        auto file_name = get_file_name(content);
        auto item = file_access_counts.find(file_name);
        if (item != file_access_counts.end()) {
            ++file_access_counts.at(file_name);
        }
        else {
            file_access_counts.insert(std::make_pair(file_name, 1));
        }
    }
    f_s.close();
    std::ofstream ofs;
    ofs.open("all.txt", std::ofstream::out | std::ofstream::app);
    for (auto& n : file_access_counts)
        ofs << n.first << ", " << n.second << std::endl;
    std::ifstream file("all.txt");
    std::vector<std::string> rows;
    std::string line;
    // Loop on getline itself so that no empty row is pushed at end of file.
    while (std::getline(file, line))
        rows.push_back(line);
    std::sort(rows.begin(), rows.end());
    std::vector<std::string>::iterator iterator = rows.begin();
    for (; iterator != rows.end(); ++iterator)
        std::cout << *iterator << std::endl;
    getchar();

    return 0;
}

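When I run it, it shows me the file names and how many times each one occurs, but not ordered from the highest count to the lowest, and I don't think it will work with large data (for example, 50,000 records). Can you help me? Thanks.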

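The contents of all.txt are sorted after being read back. The problem is that the count is at the end of each line, so it only affects the ordering after the name has been compared.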

all.txt:

3.gif, 1
index.html, 3
1.gif, 1
2.gif, 1
4.gif, 1

rows vector after sorting:

1.gif, 1
2.gif, 1
3.gif, 1
4.gif, 1
index.html, 3

Either change the way the values are written to all.txt, or parse the count before sorting.
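As a rough sketch of the second option, the counts can stay numeric and be sorted straight from the map, without a round trip through all.txt. The helper name top_n and the tie-breaking by name are assumptions made for illustration, not part of the original code. std::partial_sort is used here because only the first n entries need to be ordered, which should also hold up for inputs far larger than 50,000 lines.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, long>;

// Hypothetical helper: returns up to `n` entries with the highest counts,
// ordered from highest to lowest. Ties are broken by name (an assumption)
// so that the output is deterministic.
std::vector<Entry> top_n(const std::unordered_map<std::string, long>& counts,
                         std::size_t n) {
    // Copy the map into a vector, because an unordered_map cannot be sorted.
    std::vector<Entry> entries(counts.begin(), counts.end());
    n = std::min(n, entries.size());
    // Only the first n positions are put in order; the rest stay unsorted.
    std::partial_sort(entries.begin(), entries.begin() + n, entries.end(),
                      [](const Entry& a, const Entry& b) {
                          if (a.second != b.second) return a.second > b.second;
                          return a.first < b.first;
                      });
    entries.resize(n);
    return entries;
}
```

Calling top_n(file_access_counts, 10) after the counting loop and printing each pair would then give the top 10 list, assuming the map's value type is changed to long or the helper is adapted to long int.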

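If you put the count at the start of the line, make sure to pad it with leading zeros so that the lines sort in numeric order. Otherwise, compared as strings, "3" sorts after "10".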
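The count-first variant might look like the sketch below. The helper name padded_rows and the default width of 8 digits are assumptions for illustration; the approach only works if every count is non-negative and fits within the chosen width.

```cpp
#include <algorithm>
#include <functional>
#include <iomanip>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: formats each entry as "<zero-padded count>, <name>"
// and sorts the lines in descending string order.
// Assumes every count is non-negative and has at most `width` digits.
std::vector<std::string> padded_rows(
        const std::unordered_map<std::string, long>& counts, int width = 8) {
    std::vector<std::string> rows;
    rows.reserve(counts.size());
    for (const auto& entry : counts) {
        std::ostringstream line;
        // setw applies only to the next value written, i.e. the count.
        line << std::setw(width) << std::setfill('0') << entry.second
             << ", " << entry.first;
        rows.push_back(line.str());
    }
    // With equal-width counts, string order matches numeric order.
    std::sort(rows.begin(), rows.end(), std::greater<std::string>());
    return rows;
}
```

With a width of 4, a count of 10 becomes "0010" and a count of 3 becomes "0003", so the line for 10 sorts ahead of the line for 3 in descending order.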