Writing a multimap to a file quickly in C++

Keywords: map, file, c++      Updated: 2023-10-16

I am using the following code to count word frequencies:

// Program written by Briana Morrison for Owen

//#pragma warning (disable : 4786)
#include <cctype>   // tolower
#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <algorithm>
#include <vector>
using namespace std;
// program assumes that the filename is the only thing passed into program
// if you are using standard argc and argv, then arguments to main should change, and uncomment 
//   first line.
int main(int argc, char * argv[])
{
    if (argc < 2)
    {
        cerr << "Usage: " << argv[0] << " <filename>" << endl;
        return 1;
    }
    string filename(argv[1]);
  //  string filename;
    //cout << "Enter filename" << endl;
    //cin >> filename;
    ifstream infile(filename.c_str());
    //ifstream infile("poe.txt");
    string word;
    bool debug = false; // for debugging purposes
    int count = 0;      // count of words for debugging
    // create a map of words to frequencies
    map<string, int, less<string> > words;
    // create a multimap of frequencies to words
    multimap<int, string, greater<int> > freq;
    // loop while there is input in the file  
    infile >> word; //priming read
    while (infile)
    {
       count++;
       // convert word to lowercase
       for (string::size_type i = 0; i < word.length(); i++)
           if ('A' <= word[i] && word[i] <= 'Z')
              word[i] = tolower(word[i]);
        if (debug) cout << word << endl;
        // if word not found, add to map, otherwise increment count
        if (words.find(word) != words.end())
        {
            words[word]++;
            if (debug) cout << word << " found and count incremented to " << words[word] << endl;
        }
        else
        {
            words[word] = 1;
            if (debug) cout << word << " not found and count incremented to " << words[word] << endl;
        }
        infile >> word;
    }
    if (debug) cout << "count is " << count << " and map has " << words.size() << endl;
    // now go through map and add everything to multimap...words still in alphabetical order
    map<string, int, less<string> >::iterator it = words.begin();
    for (it = words.begin(); it != words.end(); it++)
    {
        pair<int, string> p(it->second, it->first);
        freq.insert(p);
    }
    if (debug) cout << "map has " << words.size() << " and multimap has " << freq.size() << endl;
    ofstream outfile("myout.txt");
    multimap<int, string, greater<int> >::iterator myit=freq.begin();
    for (myit = freq.begin(); myit != freq.end(); myit++)
    {
        outfile << myit->first << "\t" << myit->second << endl;
    }
    outfile.close();
    return 0;
}

I don't think the problem is there.

When I write the words to the file, each iteration gets slower. Why?

        ofstream outfile("myout.txt");
        multimap<int, string, greater<int> >::iterator myit = freq.begin();
        for (myit = freq.begin(); myit != freq.end(); myit++)
        {
            outfile << myit->first << "\t" << myit->second << endl;
        }
        outfile.close();

How can I write a multimap to a file quickly?

You can use '\n' instead of std::endl to avoid flushing the stream after every line.

outfile << myit->first << '\t' << myit->second << '\n';

for (myit = freq.begin(); myit != freq.end(); ++myit)
{
    outfile << myit->first << "\t" << myit->second << "\n";
}

This should be faster.

Alternatively, you can buffer the data and write it all at once instead of writing line by line.
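
The buffering idea could look roughly like the sketch below. It assumes the same multimap type as `freq` in the question; the names `FreqMap`, `format_entries` and `write_buffered` are made up for illustration and do not come from the original code. All lines are formatted into a `std::ostringstream` first, and the file is then written with a single `write` call.

```cpp
#include <fstream>
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Same container type as `freq` in the question (the typedef name is hypothetical).
typedef std::multimap<int, std::string, std::greater<int> > FreqMap;

// Format every entry as "count<TAB>word<NEWLINE>" into an in-memory buffer.
// No file I/O happens here.
std::string format_entries(const FreqMap& freq)
{
    std::ostringstream buffer;
    for (FreqMap::const_iterator it = freq.begin(); it != freq.end(); ++it)
        buffer << it->first << '\t' << it->second << '\n';
    return buffer.str();
}

// Hypothetical helper: hand the whole buffer to the file stream in one call.
// Returns false if the file could not be opened or the write failed.
bool write_buffered(const FreqMap& freq, const std::string& path)
{
    std::ofstream outfile(path.c_str());
    if (!outfile)
        return false;
    const std::string text = format_entries(freq);
    outfile.write(text.data(), static_cast<std::streamsize>(text.size()));
    return outfile.good();
}
```

With the question's variables, the output loop would then shrink to something like `write_buffered(freq, "myout.txt");`. The trade-off is that the whole output is held in memory at once, which is usually acceptable for a word list.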

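I don't see why the loop would get slower on each iteration, but note that you are using formatted output (that is what operator<< does), which is notoriously slow. If your strings contain no null bytes, you can make the code more efficient by writing the std::string through ostream::write, i.e.: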

outfile << myit->first;
outfile.write( "\t", 1 );
outfile.write( myit->second.c_str(), myit->second.size() );
outfile.write( "\n", 1 );