What's the best we can do for a SparseMatrix implementation?

Updated: 2023-10-16

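I was using the Eigen matrix framework and the sparse vector library. I ran into performance problems, and all I need are sparse vector dot products. So I rolled my own SparseMatrix implementation, hoping it would be a bit faster: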

A bit of example code:

#include <map>
using namespace std ;
struct SparseMatrix
{
  map<int, Vector> vals ;
  Vector dot( SparseMatrix& o )
  {
    SparseMatrix *LS, *RS ;
    // iterate over the smaller of the 2 
    if( vals.size() < o.vals.size() )
    {
      // walk vals
      LS = this ;
      RS = &o ;
    }
    else
    {
      LS = &o ;
      RS = this ;
    }
    // walk LS
    Vector sum = 0 ;
    for( map<int,Vector>::iterator LSIter = LS->vals.begin() ; LSIter != LS->vals.end() ; ++LSIter )
    {
      const int& key = LSIter->first ;
      // use the key, see if RS has a similar entry.
      map<int,Vector>::iterator RSIter = RS->vals.find( key );
      if( RSIter != RS->vals.end() )
        sum += RSIter->second * LSIter->second ;
    }
    return sum ;
  }
} ;

So, take the dot product of 2 vectors with entries like these:

+---------------+
| vec 1         |
| index   value |
| 2       18    |
| 7       4     |
| 18      33    |
+---------------+

+---------------+
| vec 2         |
| index   value |
| 2       1     |
| 15      87    |
| 21      92    |
+---------------+

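The dot product is then 18: index 2 is the only index present in both vectors, so the only contributing term is 18 * 1.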

So, as you can see, I'm using std::map to look up elements and check whether an element of one vector is also present in the other.

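Since I'm only using integer indices and 1D arrays, is there a way to make the lookup faster? My sparse matrix multiplication is still a bottleneck (my code is only marginally faster than Eigen).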

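Keep a vector of pairs (index -> value), sorted by index. Then iterate over both vectors simultaneously. If the current pairs in both vectors have the same index, multiply their values, add the product to the result, and advance to the next pair in both vectors. Otherwise, advance the iteration index of the vector whose current pair has the smaller first element (the index).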

This assumes, of course, that you don't modify the data often.

That is, in pseudocode:

for i = 0, j = 0; i < vec1.size && j < vec2.size:
    if vec1[i].first == vec2[j].first:
        result += vec1[i++].second * vec2[j++].second
    elif vec1[i].first < vec2[j].first:
        i++
    else:
        j++
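
The pseudocode above could be sketched in C++ roughly as follows. This is only an illustration under some assumptions: the values are taken to be plain `double`s (the question's `Vector` type is not shown), and the names `SparseVec`, `toSparseVec`, and `sparseDot` are made up for this example. Because `std::map` iterates in ascending key order, copying an existing map into a vector already yields pairs sorted by index.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical alias: a sparse vector stored as (index, value) pairs,
// kept sorted by index. The value type double is an assumption.
using SparseVec = std::vector<std::pair<int, double>>;

// Copy an existing map into the flat representation. std::map iterates
// in ascending key order, so the result is already sorted by index.
SparseVec toSparseVec(const std::map<int, double>& m)
{
    return SparseVec(m.begin(), m.end());
}

// Merge-style dot product: walk both sorted vectors in a single pass.
double sparseDot(const SparseVec& a, const SparseVec& b)
{
    double result = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (a[i].first == b[j].first)
        {
            // Same index in both vectors: accumulate and advance both.
            result += a[i].second * b[j].second;
            ++i;
            ++j;
        }
        else if (a[i].first < b[j].first)
            ++i;  // a is behind, advance it
        else
            ++j;  // b is behind, advance it
    }
    return result;
}
```

For the two example vectors from the question, `sparseDot` visits each entry at most once. The pass is linear in the combined number of entries, whereas the map-based version does a logarithmic lookup for every entry of the smaller map, and the contiguous storage tends to be friendlier to the cache than a node-based `std::map`. The trade-off is the one already mentioned: inserting into the middle of a sorted vector is expensive, so this layout fits data that is built once and read many times.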