Fast way to write a multimap to a file in C++
I use the following code to count word frequencies:
// Program written by Briana Morrison for Owen
//#pragma warning (disable : 4786)
#include <cctype>     // tolower
#include <functional> // less, greater
#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <algorithm>
#include <vector>
using namespace std;
// program assumes that the filename is the only thing passed into program
// if you are using standard argc and argv, then arguments to main should change, and uncomment
// first line.
int main(int argc, char * argv[])
{
string filename(argv[1]);
// string filename;
//cout << "Enter filename" << endl;
//cin >> filename;
ifstream infile(filename.c_str());
//ifstream infile("poe.txt");
string word;
bool debug = false; // for debugging purposes
int count = 0; // count of words for debugging
// create a map of words to frequencies
map<string, int, less<string> > words;
// create a multimap of frequencies to words
multimap<int, string, greater<int> > freq;
// loop while there is input in the file
infile >> word; //priming read
while (infile)
{
count++;
// convert word to lowercase
for (string::size_type i = 0; i < word.length(); i++)
if ('A' <= word[i] && word[i] <= 'Z')
word[i] = tolower(word[i]);
if (debug) cout << word << endl;
// if word not found, add to map, otherwise increment count
if (words.find(word) != words.end())
{
words[word]++;
if (debug) cout << word << " found and count incremented to " << words[word] << endl;
}
else
{
words[word] = 1;
if (debug) cout << word << " not found and count incremented to " << words[word] << endl;
}
infile >> word;
}
if (debug) cout << "count is " << count << " and map has " << words.size() << endl;
// now go through map and add everything to multimap...words still in alphabetical order
map<string, int, less<string> >::iterator it = words.begin();
for (it = words.begin(); it != words.end(); it++)
{
pair<int, string> p(it->second, it->first);
freq.insert(p);
}
if (debug) cout << "map has " << words.size() << " and multimap has " << freq.size() << endl;
ofstream outfile("myout.txt");
multimap<int, string, greater<int> >::iterator myit=freq.begin();
for (myit = freq.begin(); myit != freq.end(); myit++)
{
outfile << myit->first << "\t" << myit->second << endl;
}
outfile.close();
return 0;
}
I don't think the problem is there.
When I write the words to the file, each iteration gets slower. Why?
ofstream outfile("myout.txt");
multimap<int, string, greater<int> >::iterator myit=freq.begin();
for (myit = freq.begin(); myit != freq.end(); myit++)
{
outfile << myit->first << "\t" << myit->second << endl;
}
outfile.close();
How can I write the multimap to a file quickly?
You can use '\n' instead of std::endl to avoid flushing the stream on every line:
outfile << myit->first << '\t' << myit->second << '\n';
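The reason this helps is that std::endl writes a newline and then flushes the stream, so every line forces a flush. The following is a minimal sketch, not taken from the original post, that counts those flushes. CountingBuf, flushesWithEndl and flushesWithNewline are hypothetical names made up for illustration, and an in-memory string buffer stands in for the file.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical helper: a string buffer that counts how often it is asked to flush.
class CountingBuf : public std::stringbuf
{
public:
    CountingBuf() : flushes(0) {}
    int flushes;
protected:
    // Called by std::ostream::flush(), which std::endl invokes after the newline.
    virtual int sync() { ++flushes; return std::stringbuf::sync(); }
};

// Writes `lines` lines terminated by std::endl and returns the number of flushes.
int flushesWithEndl(int lines)
{
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i)
        out << i << std::endl;
    return buf.flushes;
}

// Writes `lines` lines terminated by '\n' and returns the number of flushes.
int flushesWithNewline(int lines)
{
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i)
        out << i << '\n';
    return buf.flushes;
}
```

With a real std::ofstream each of those flushes typically pushes the buffered data to the operating system, which is where the per-line cost comes from.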
for (myit = freq.begin(); myit != freq.end(); ++myit)
{
outfile << myit->first << "\t" << myit->second << "\n";
}
This should be faster.
Alternatively, you can buffer the data and write it all at once instead of writing line by line.
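One way the buffering idea could look is sketched below, assuming the same multimap type as in the question. FreqMap, formatFrequencies and writeFrequencies are hypothetical names introduced here, not part of the original code. The entries are formatted into a std::ostringstream and the result is written to the file with a single call.

```cpp
#include <fstream>
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Same container type as `freq` in the question (typedef name is an assumption).
typedef std::multimap<int, std::string, std::greater<int> > FreqMap;

// Formats every entry as "count<TAB>word<NEWLINE>" into one in-memory string.
std::string formatFrequencies(const FreqMap& freq)
{
    std::ostringstream buffer;
    for (FreqMap::const_iterator it = freq.begin(); it != freq.end(); ++it)
        buffer << it->first << '\t' << it->second << '\n';
    return buffer.str();
}

// Writes the formatted text to `path` in one write call; returns false on failure.
bool writeFrequencies(const FreqMap& freq, const std::string& path)
{
    std::ofstream outfile(path.c_str());
    if (!outfile)
        return false;
    const std::string text = formatFrequencies(freq);
    outfile.write(text.data(), static_cast<std::streamsize>(text.size()));
    return outfile.good();
}
```

Calling writeFrequencies(freq, "myout.txt") would stand in for the output loop in the question. The whole output is held in memory first, which is usually fine for a word list but is a trade-off for very large maps.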
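I don't see why the loop would get slower with each iteration, but note that you are using formatted output (which is what operator<< does), and that is notoriously slow. If your strings don't contain null bytes, you can make the code more efficient by writing the std::string with ostream::write, i.e.: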
outfile << myit->first;
outfile.write( "\t", 1 );
outfile.write( myit->second.c_str(), myit->second.size() );
outfile.write( "\n", 1 );