Count the number of distinct absolute values among the elements of the array

Keywords: absolute value, array, elements, count. Updated: 2023-10-16

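In an interview I was asked to find the number of distinct absolute values among the elements of an array. I came up with the following solution (in C++), but the interviewer was not happy with the code's run-time efficiency.

I would appreciate pointers on two things:
How can I improve the run-time efficiency of this code?
How do I work out its complexity? The for loop executes A.size() times, but I am not sure about the cost of the STL std::find. In the worst case it could be O(n), which would make this code O(n²).

The code is: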

#include <algorithm>
#include <cstdlib>
#include <list>
#include <vector>

int countAbsoluteDistinct ( const std::vector<int> &A ) {
  using namespace std;
  list<int> x;
  vector<int>::const_iterator it;
  for(it = A.begin();it < A.end();it++)
    if(find(x.begin(),x.end(),abs(*it)) == x.end())
      x.push_back(abs(*it));
  return x.size();
}

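To propose an alternative to the std::set-based code:

Note that we don't want to alter the caller's vector, so we take it by value. It is better to let the compiler make the copy for us than to make our own. If destroying the caller's values is acceptable, we can take a non-const reference instead.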

#include <vector>
#include <algorithm>
#include <iterator>
#include <cstdlib>
using namespace std;
int count_distinct_abs(vector<int> v)
{
    // A lambda is used because passing the overloaded name abs directly is ambiguous.
    // Note: std::abs(INT_MIN) is undefined behaviour; this assumes INT_MIN does not occur.
    transform(v.begin(), v.end(), v.begin(),
              [](int x) { return std::abs(x); }); // O(n) where n = distance(v.begin(), v.end())
    sort(v.begin(), v.end()); // O(n log n); guaranteed in the worst case since C++11
    // Before C++11 only the average case was guaranteed (worst case could be O(n^2)).
    // To guarantee worst case O(n log n) there, replace with make_heap, then sort_heap.
    // unique takes a sorted range, shifts the first element of each run of duplicates
    // to the front, and returns an iterator to the end of the unique section of the range
    auto unique_end = unique(v.begin(), v.end()); // Again about n comparisons
    return static_cast<int>(distance(v.begin(), unique_end)); // Constant time for random access iterators (like vector's)
}

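The advantage here is that, if we decide to take the vector by value, we allocate and copy only once and do everything else in place, while still getting O(n log n) complexity in the size of v.

std::find() is linear (O(n)). I would use a sorted associative container to handle this, specifically std::set.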

#include <vector>
#include <set>
#include <cstdlib>
using namespace std;
int distict_abs(const vector<int>& v)
{
   std::set<int> distinct_container;
   for(auto curr_int = v.begin(), end = v.end(); // no need to call v.end() multiple times
       curr_int != end;
       ++curr_int)
   {
       // std::set only allows single entries
       // since that is what we want, we don't care that this fails 
       // if the second (or more) of the same value is attempted to 
       // be inserted.
       distinct_container.insert(abs(*curr_int));
   }
   return distinct_container.size();
}

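There is still some run-time penalty to this approach. Using a separate container incurs the cost of dynamic allocations as the container grows. You could do this in place and avoid that penalty, but with code at this level it is sometimes better to be clear and explicit and let the optimizer (in the compiler) do its work.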

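Yes, this will be O(N²): you end up doing a linear search for each element.

Two fairly obvious alternatives are std::set and std::unordered_set. If you don't have C++0x, you can replace std::unordered_set with tr1::unordered_set or boost::unordered_set.

Each insertion into a std::set is O(log N), so your overall complexity is O(N log N).

With unordered_set, each insertion has constant (expected) complexity, giving linear complexity overall.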
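As a rough illustration of the unordered_set option, here is a minimal sketch. The function name count_distinct_abs_hashed is made up for this example, and widening each value to long long is an added assumption so that the absolute value of INT_MIN stays representable.

```cpp
#include <cstdlib>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts distinct absolute values in expected O(n) time.
// Values are widened to long long so that taking the absolute value of INT_MIN is safe.
int count_distinct_abs_hashed(const std::vector<int>& values)
{
    std::unordered_set<long long> seen;
    seen.reserve(values.size()); // avoids rehashing while inserting
    for (int value : values) {
        // Duplicate insertions are simply ignored by the set.
        seen.insert(std::abs(static_cast<long long>(value)));
    }
    return static_cast<int>(seen.size());
}
```

The price for the linear expected time is the extra memory held by the hash set, which in the worst case stores one entry per input element.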

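Basically, replace your std::list with a std::set. If you do things properly, this gives you an O(log(set.size())) search plus an O(1) insertion. Also, for efficiency, it makes sense to cache the result of abs(*it), although this will have only a minimal (negligible) effect. This method is about as efficient as you can get without using a really good hash (std::set uses binary trees) or more information about the values in the vector.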
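One possible reading of the search-plus-insert pattern described above is sketched here. It assumes that "O(1) insertion" refers to the hinted overload of std::set::insert, which is amortized constant when the new element goes immediately before the hint. The function name is hypothetical, and the widening to long long is an added safeguard for INT_MIN.

```cpp
#include <cstdlib>
#include <set>
#include <vector>

// Hypothetical helper: one O(log n) search per element, then a hinted insert.
int count_distinct_abs_set(const std::vector<int>& values)
{
    std::set<long long> seen;
    for (int value : values) {
        // Cache the absolute value so it is computed only once per element.
        const long long magnitude = std::abs(static_cast<long long>(value));
        // lower_bound returns the first element >= magnitude (or end()).
        auto pos = seen.lower_bound(magnitude);
        if (pos == seen.end() || *pos != magnitude) {
            // The new element belongs immediately before pos, so the hint is exact.
            seen.insert(pos, magnitude);
        }
    }
    return static_cast<int>(seen.size());
}
```

A plain seen.insert(magnitude) gives the same result with one tree descent per element, so the hinted form is mostly a way to make the search and the insertion steps explicit.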

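Since I was not happy with the previous answers, here is mine for today. Your initial question doesn't mention how big your vector is. Suppose your std::vector<> is extremely large and has very few duplicates (why not?). Then using another container (e.g. std::set<>) will basically double your memory consumption. Why would you do that, when your goal is simply to count the non-duplicates?

I like @Flame's answer, but I wasn't really happy with the call to std::unique. You spend a lot of time carefully sorting your vector, and then simply throw the sorted array away when you could be reusing it.

I couldn't find anything really elegant in the standard library, so here is my proposal (a mixture of std::transform + std::abs + std::sort, but without touching the sorted array afterwards).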

#include <iterator>

// Count the number of distinct values in a sorted range
// (run it on the sorted absolute values to answer the question)
template<class ForwardIt>
typename std::iterator_traits<ForwardIt>::difference_type 
count_unique(ForwardIt first, ForwardIt last)
{
  if (first == last)
    return 0;
  typename std::iterator_traits<ForwardIt>::difference_type 
    count = 1;
  ForwardIt previous = first;
  while (++first != last) {
    if (!(*previous == *first) ) ++count;
    ++previous;
  }
  return count;
}

It works with forward iterators:

#include <iostream>
#include <list>
int main()
{
  std::list<int> nums {1, 3, 3, 3, 5, 5, 7,8};
  std::cout << count_unique( std::begin(nums), std::end(nums) ) << std::endl;
  const int array[] = { 0,0,0,1,2,3,3,3,4,4,4,4};
  const int n = sizeof array / sizeof * array;
  std::cout << count_unique( array, array + n ) << std::endl;
  return 0;
}

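Two points:

std::list is very bad for searching.
Use std::set. Insertion is logarithmic, and it removes duplicates and keeps the values sorted. Inserting all n values costs O(n log n) in total; then use set::size to find how many distinct values there are.

Edit:

To answer part 2 of your question: the C++ standard mandates the worst-case complexity for operations on containers and algorithms.

find: since you are using the free-function version of find, which takes iterators, it cannot assume anything about the sequence it is given. It cannot assume the range is sorted, so it has to traverse every item until it finds a match, which is O(n).

If you use set::find, on the other hand, this member find can exploit the structure of the set, and its performance is required to be O(log N), where N is the size of the set.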
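To make the contrast concrete, the following sketch runs both lookups on the same std::set. The two function names are invented for this illustration; both return the same answer and differ only in how much work they do.

```cpp
#include <algorithm>
#include <set>

// Linear: the free function only sees a pair of iterators,
// so it walks the elements one by one.
bool contains_linear(const std::set<int>& values, int target)
{
    return std::find(values.begin(), values.end(), target) != values.end();
}

// Logarithmic: the member function uses the ordering of the set.
bool contains_logarithmic(const std::set<int>& values, int target)
{
    return values.find(target) != values.end();
}
```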

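To answer your second question first: yes, the code is O(n^2), because the complexity of find is O(n).

You have options to improve it. If the range of the numbers is small, you can set up a large enough array and increment counts while iterating over the source data. If the range is larger but sparse, you can use a hash table of some sort to do the counting. Both of these options have linear complexity.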
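The small-range option could look roughly like the sketch below. It assumes the caller knows an upper bound on the absolute values; the parameter max_abs and the function name are hypothetical and are not part of the original question.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper. Assumes every element satisfies |value| <= max_abs;
// violating that assumption would index past the end of the counts array.
int count_distinct_abs_bounded(const std::vector<int>& values, int max_abs)
{
    std::vector<int> counts(static_cast<std::size_t>(max_abs) + 1, 0);
    int distinct = 0;
    for (int value : values) {
        const std::size_t magnitude = static_cast<std::size_t>(std::abs(value));
        // The first time a magnitude is seen, its count goes from 0 to 1.
        if (counts[magnitude]++ == 0) {
            ++distinct;
        }
    }
    return distinct;
}
```

The running time is O(n + max_abs), so this only pays off when the bound is comparable to or smaller than the number of elements.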

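Otherwise, I would make one pass to take the absolute value of each item, then sort them, and then do the aggregation in a single additional pass. The sort costs O(n log n); the other passes don't matter for the overall complexity.

I think std::map could also be interesting: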

#include <cstdlib>
#include <map>
#include <vector>
using namespace std;

int absoluteDistinct(const vector<int> &A) 
{
    map<int, char> my_map;
    for (vector<int>::const_iterator it = A.begin(); it != A.end(); it++)
    {
        my_map[abs(*it)] = 0;
    }
    return my_map.size();
}

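As @Jerry said, to improve a little on the theme of most of the other answers: instead of std::map or std::set you could use std::unordered_map or std::unordered_set (or the Boost equivalent).

This would bring the run time down from O(n lg n) to an expected O(n).

Another possibility, depending on the range of the data given, is a variant of radix sort, although nothing in the question immediately suggests it.

Sort the list with a radix-style sort, which is O(n), and then compare adjacent values.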
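A minimal sketch of the radix idea follows, under the assumption that int is 32 bits wide. It uses a least-significant-digit radix sort with four 8-bit passes, which is one common variant and not necessarily the one the answer had in mind. The function name is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper; assumes a 32-bit int.
int count_distinct_abs_radix(const std::vector<int>& values)
{
    // Take magnitudes in unsigned arithmetic so that INT_MIN is handled too.
    std::vector<std::uint32_t> keys;
    keys.reserve(values.size());
    for (int value : values) {
        const std::uint32_t bits = static_cast<std::uint32_t>(value);
        keys.push_back(value < 0 ? 0u - bits : bits);
    }

    // LSD radix sort: four stable counting-sort passes over 8-bit digits.
    std::vector<std::uint32_t> buffer(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 257> offsets{}; // offsets[d + 1] counts digit d
        for (std::uint32_t key : keys) {
            ++offsets[((key >> shift) & 0xFFu) + 1];
        }
        for (std::size_t d = 0; d < 256; ++d) {
            offsets[d + 1] += offsets[d]; // prefix sums give each digit's start
        }
        for (std::uint32_t key : keys) {
            buffer[offsets[(key >> shift) & 0xFFu]++] = key;
        }
        keys.swap(buffer);
    }

    // Equal values are now adjacent, so count the places where the value changes.
    int distinct = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || keys[i] != keys[i - 1]) {
            ++distinct;
        }
    }
    return distinct;
}
```

This needs a second buffer of the same size as the input, so it trades memory for the linear running time.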

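The best approach is to customize the quicksort algorithm so that, when partitioning, whenever we find two equal elements we overwrite the second duplicate with the last element in the range and then shrink the range. This ensures that duplicate elements are not processed twice. Also, once the quicksort is done, the size of the remaining range is the answer. The complexity is still O(n log n), but this should save at least two passes over the array.

Also, the savings are proportional to the percentage of duplicates. Imagine if they twisted the original question with "say 90% of the elements are duplicates"...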

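Approaches:

Space-efficient: use a map with O(log N) insertion (a tree-based one such as std::map), for O(n log N) overall, and just keep a count of the elements that were inserted successfully.

Time-efficient: use a hash table, with O(n) for the insertions overall, and just keep a count of the elements that were inserted successfully.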
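Counting successful insertions can be sketched with the bool that insert returns on the standard set containers. The template below is a hypothetical helper that works with either std::set (logarithmic insertion) or std::unordered_set (expected constant insertion). Using sets of long long is an added assumption to keep the absolute value of INT_MIN representable.

```cpp
#include <cstdlib>
#include <set>
#include <unordered_set>
#include <vector>

// Hypothetical helper: Set is expected to be std::set<long long>
// or std::unordered_set<long long>.
template <class Set>
int count_successful_insertions(const std::vector<int>& values)
{
    Set seen;
    int inserted = 0;
    for (int value : values) {
        // insert returns a pair; .second is true only if the value was new.
        if (seen.insert(std::abs(static_cast<long long>(value))).second) {
            ++inserted;
        }
    }
    return inserted;
}
```

Calling count_successful_insertions<std::set<long long>>(v) gives the tree-based variant, and count_successful_insertions<std::unordered_set<long long>>(v) gives the hashed one.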

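You have nested loops in your code. If you scan the whole array for each element, you get O(n^2) time complexity, which is not acceptable in most scenarios. That is why the merge sort and quicksort algorithms came about: to save processing cycles and machine effort. I suggest you go through the suggested links and redesign your program.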