C++: How to count the number of collisions while using a hash function?
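I was assigned a lab in which I need to create a hash function and count the number of collisions that occur when hashing a file of up to 30000 elements. Here is my code so far: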
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
long hashcode(string s){
    long seed = 31;
    long hash = 0;
    for(int i = 0; i < s.length(); i++){
        hash = (hash * seed) + s[i];
    }
    return hash % 10007;
};
int main(int argc, char* argv[]){
    int count = 0;
    int collisions = 0;
    fstream input(argv[1]);
    string x;
    int array[30000];
    //File stream
    while(!input.eof()){
        input>>x;
        array[count] = hashcode(x);
        count++;
        for(int i = 0; i<count; i++){
            if(array[i]==hashcode(x)){
                collisions++;
            }
        }
    }
    cout<<"Total Input is " <<count-1<<endl;
    cout<<"Collision # is "<<collisions<<endl;
}
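I'm just not sure how to count the collisions. I tried storing every hash value in an array and then searching that array, but that produced about 12000 collisions when there were only 10000 elements. Any advice on how to count the collisions, or even on whether my hash function could be improved, would be appreciated. Thanks.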
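The problem is that you are recounting collisions. Suppose your list contains 4 identical elements and nothing else, then step through your algorithm and see how many collisions you would count: each new hash is compared against every stored entry (and, because count is incremented before the inner loop, against itself as well), so a single repeated value is counted many times over.
Instead, create a set of hash codes. Each time you compute a hash code, check whether it is already in the set. If it is, increment the collision total. If it is not, add it to the set.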
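Edit:
As a quick patch to your algorithm, I did the following: increment count after the loop, and break out of the for loop as soon as a collision is found. This is still not very efficient, because we loop through all of the earlier results (a set data structure would be faster), but it should at least be correct.
I also adjusted it so that we don't compute hashcode(x) over and over: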
int main(int argc, char* argv[]){
    int count = 0;
    int collisions = 0;
    fstream input(argv[1]);
    string x;
    int array[30000];
    //File stream
    // Test the read itself: looping on !input.eof() can run one extra time
    // after the last word and hash that word twice. Also stay inside the array.
    while(count < 30000 && input>>x){
        array[count] = hashcode(x);
        for(int i = 0; i<count; i++){
            if(array[i]==array[count]){
                collisions++;
                // Once we've found one collision, we don't want to count all of them.
                break;
            }
        }
        // We don't want to check our hashcode against the value we just added
        // so we should only increment count here.
        count++;
    }
    cout<<"Total Input is " <<count<<endl;
    cout<<"Collision # is "<<collisions<<endl;
}
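An answer added in the interest of education. This is probably what your professor's next lesson will be about.
Almost certainly the most efficient way to detect hash collisions is to use a hash set (a.k.a. unordered_set):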
#include <iostream>
#include <unordered_set>
#include <fstream>
#include <string>
// your hash algorithm
long hashcode(std::string const &s) {
    long seed = 31;
    long hash = 0;
    for (std::size_t i = 0; i < s.length(); i++) {
        hash = (hash * seed) + s[i];
    }
    return hash % 10007;
}
int main(int argc, char **argv) {
    std::ifstream is{argv[1]};
    std::unordered_set<long> seen_before;
    seen_before.reserve(10007);
    std::string buffer;
    int collisions = 0, count = 0;
    while (is >> buffer) {
        ++count;
        auto hash = hashcode(buffer);
        auto i = seen_before.find(hash);
        if (i == seen_before.end()) {
            seen_before.emplace_hint(i, hash);
        }
        else {
            ++collisions;
        }
    }
    std::cout << "Total Input is " << count << std::endl;
    std::cout << "Collision # is " << collisions << std::endl;
}
For an explanation of hash tables, see "How does a hash table work?"
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
// Generate a hash code that is in the range of our hash table.
// The range we are using is zero to 10,006 (10,007 slots) so that our table is
// large enough and the prime number size reduces the probability
// of collisions from different strings hashing to the same value.
unsigned long hashcode(string s){
    unsigned long seed = 31;
    unsigned long hash = 0;
    for (size_t i = 0; i < s.length(); i++){
        hash = (hash * seed) + s[i];
    }
    // we want to generate a hash code that is the size of our table.
    // so we mod the calculated hash to ensure that it is in the proper range
    // of our hash table entries. 10007 is a prime number which provides
    // better characteristics than a non-prime number table size.
    return hash % 10007;
}
int main(int argc, char * argv[]){
    int count = 0;
    int collisions = 0;
    fstream input(argv[1]);
    string x;
    int array[30000] = { 0 };
    //File stream
    // Read in the loop condition so that a failed read at end of file
    // is not hashed and counted as an extra string.
    while (input >> x){ // get the next string to hash
        count++; // count the number of strings hashed.
        // hash the string and use the hash as an index into our hash table.
        // the hash table is only used to keep a count of how many times a particular
        // hash has been generated. So the table entries are ints that start with zero.
        // If the value is greater than zero then we have a collision.
        // So we use postfix increment to check the existing value while incrementing
        // the hash table entry.
        if ((array[hashcode(x)]++) > 0)
            collisions++;
    }
    cout << "Total Input is " << count << endl;
    cout << "Collision # is " << collisions << endl;
    return 0;
}