A more efficient way for my program to write every billionth combination?

Keywords: combination, way, efficient, billion, my program. Updated: 2023-10-16

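The program below generates combinations of the characters in the main string that you will see in the code. It first generates all 48 choose 12 combinations, and then works its way up to 48 choose 19.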

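The problem is that the total number of combinations is 65 trillion, which cannot be computed in a reasonable amount of time. I thought, "OK, fine, I'll just write every billionth one to the file." Well, that also takes a huge amount of time, because the program still has to count up to 65 trillion even though it only writes every billionth combination.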

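Is there anything I can modify in my program so that it avoids counting to such a large number, but still writes every billionth combination to a file?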

#include <algorithm>  // std::iter_swap, std::rotate
#include <cstdlib>    // system
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
// Rearranges [first, last) so that [first, k) holds the next combination;
// returns false (and restores the initial order) after the last one.
template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
   if ((first == last) || (first == k) || (last == k))
      return false;
   Iterator i1 = first;
   Iterator i2 = last;
   ++i1;
   if (last == i1)
      return false;
   i1 = last;
   --i1;
   i1 = k;
   --i2;
   while (first != i1)
   {
      if (*--i1 < *i2)
      {
         Iterator j = k;
         while (!(*i1 < *j)) ++j;
         std::iter_swap(i1,j);
         ++i1;
         ++j;
         i2 = k;
         std::rotate(i1,j,last);
         while (last != j)
         {
            ++j;
            ++i2;
         }
         std::rotate(k,i2,last);
         return true;
      }
   }
   std::rotate(first,k,last);
   return false;
}
unsigned long long count = 0;
int main()
{
  ofstream myfile;
  myfile.open ("m = 8.txt");
  // 48 sorted characters; the backslash has to be escaped inside the literal
  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";
  for (int i = 12; i <= 19; i++)
  {
    std::size_t comb_size = i;
    do
    {
      // ::count avoids a clash with std::count under "using namespace std"
      if (::count == 0)
        myfile << std::string(s.begin(),s.begin() + comb_size) << std::endl;
      if (++::count % 1000000000 == 0)
        myfile << std::string(s.begin(),s.begin() + comb_size) << std::endl;
    }while(next_combination(s.begin(),s.begin()+ comb_size,s.end()));
  }
  myfile.close();
  cout << "Done!" << endl;
  system("PAUSE");  // Windows-only: keeps the console window open
  return 0;
}

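I have a simple conversion to a different library that is 36X faster than yours. It is still brute force. But on my machine I estimate that your code would take 418 days to complete, whereas mine takes only 3.65 days. That is still outrageously long, but it gets it down to a long weekend.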

Here is my code:

#include <iostream>
#include <string>
#include <fstream>
#include "../combinations/combinations"
using namespace std;
unsigned long long count = 0;
int main()
{
  ofstream myfile("m = 8.txt");
  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";
  for (int i = 12; i <= 19; i++)
     for_each_combination(s.begin(), s.begin() + i, s.end(),
        [&](std::string::const_iterator f, std::string::const_iterator l) -> bool
        {
          if (::count++ % 1000000000 == 0)
            myfile << std::string(f, l) << std::endl;
          return false;
        });
  myfile.close();
  cout << "Done!" << endl;
  return 0;
}

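Reducing the number of times count is tested in the inner loop improves performance by a further 15%.

"../combinations/combinations" refers to this library: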

http://howardhinnant.github.io/combinations.html

The link includes a description and the complete source code.

This test program can also easily be modified to count the total number of combinations:

#include <iostream>
#include <string>
#include "../combinations/combinations"
using namespace std;

int main()
{
  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";
  unsigned long long count = 0;
  for (int i = 12; i <= 19; i++)
     count += count_each_combination(s.begin(), s.begin() + i, s.end());
  cout << "Done! " << count << endl;
  return 0;
}

Output:

Done! 27189132782091

The code is open source under the Boost license (it is not part of the Boost libraries). Feel free to use it.
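The total can also be cross-checked without enumerating anything, because the number of k-element combinations of n items is the binomial coefficient C(n, k). The following is a minimal sketch and not part of the library above; the helper names binomial and total_combinations are made up for illustration, and it assumes all intermediate values fit in 64 bits, which holds for n = 48.

```cpp
#include <cassert>
#include <cstdint>

// C(n, k) via the multiplicative formula. After step i the running value
// equals C(n - k + i, i), so every division is exact.
std::uint64_t binomial(unsigned n, unsigned k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;          // symmetry keeps intermediates small
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;
    return result;
}

// Sum of C(n, k) for k in [lo, hi].
std::uint64_t total_combinations(unsigned n, unsigned lo, unsigned hi)
{
    assert(lo <= hi);
    std::uint64_t total = 0;
    for (unsigned k = lo; k <= hi; ++k)
        total += binomial(n, k);
    return total;
}
```

Here total_combinations(48, 12, 19) evaluates to 27189132782091, which agrees with the output shown above.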

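Here is some code I wrote earlier to find the k-th permutation of a given string. I think my idea is similar to @Tarik's: we do not need to list all the permutations that come before the k-th one.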

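#include <cmath>   // std::ceil
#include <string>
using std::string;

// Returns the k-th (1-based) permutation of s in lexicographic order,
// assuming s is sorted and 1 <= k <= s.size()!.
string getPermutation(string s, int k) {
    string res;
    int n = s.size();
    // total = (n - 1)!: how many permutations share the same first character
    int total = 1, digits = n - 1;
    for (int i = 1; i < n; ++i)
        total *= i;
    while ((int) res.size() < n)
    {
        // index of the next character: ceil(k / total) - 1
        int i = (int) std::ceil(k * 1.0 / total) - 1;
        res += s[i];
        s.erase(s.begin() + i); // erase from string is not a good idea:)
        k = (k % total == 0) ? total : k % total;
        total = (total == 1) ? 1 : total / digits--;
    }
    return res;
}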

It works for short strings. For example, getPermutation("12345", 37) returns 24135.

But for a string s of length 48, the variable total overflows even if its type is long long, so extra work is needed to handle that case.

My code is a bit hard to follow :)

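Feel free to improve my code.
Update: I realized that what you need is combinations, not permutations. I was completely wrong! Forget my code :)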

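According to http://en.wikipedia.org/wiki/Combinadic, there is an algorithm that computes the k-th combination directly. You need to store Pascal's triangle first. If you need a code example, you can look at (in Python) https://github.com/sagemath/sagelib/blob/master/sage/combinat/choose_nk.py.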
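The idea on that Wikipedia page can be sketched in C++ roughly as follows. This is a minimal illustration, not code from any of the answers or from Sage: the names pascal, nth_combination, and every_nth are hypothetical, ranks are 0-based, and combinations are assumed to be ordered lexicographically over a sorted string, which may not match the visiting order of every generator in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using U64 = std::uint64_t;
using Table = std::vector<std::vector<U64>>;

// Pascal's triangle: t[n][k] == C(n, k) for 0 <= k <= n <= max_n.
// Assumes the entries fit in 64 bits, which holds for the n = 48 used here.
Table pascal(unsigned max_n)
{
    Table t(max_n + 1, std::vector<U64>(max_n + 1, 0));
    for (unsigned n = 0; n <= max_n; ++n) {
        t[n][0] = 1;
        for (unsigned k = 1; k <= n; ++k)
            t[n][k] = t[n - 1][k - 1] + t[n - 1][k];
    }
    return t;
}

// Returns the size-k combination of `items` with the given 0-based
// lexicographic rank, without visiting the combinations before it.
std::string nth_combination(const std::string& items, unsigned k, U64 rank,
                            const Table& c)
{
    const unsigned n = static_cast<unsigned>(items.size());
    assert(k <= n && rank < c[n][k]);
    std::string out;
    unsigned pos = 0;
    for (unsigned left = k; left > 0; --left) {
        // C(n - pos - 1, left - 1) combinations start with items[pos];
        // skip whole blocks until the rank falls inside one.
        for (;; ++pos) {
            const U64 block = c[n - pos - 1][left - 1];
            if (rank < block) break;
            rank -= block;
        }
        out += items[pos++];
    }
    return out;
}

// Calls fun on every combination whose global index (sizes lo..hi taken in
// turn, starting at index 0) is a multiple of step.
template <class Function>
void every_nth(const std::string& items, unsigned lo, unsigned hi, U64 step,
               Function fun)
{
    assert(step > 0 && lo <= hi && hi <= items.size());
    const unsigned n = static_cast<unsigned>(items.size());
    const Table c = pascal(n);
    U64 base = 0;  // global index of the first combination of size k
    U64 next = 0;  // global index of the next combination to report
    for (unsigned k = lo; k <= hi; ++k) {
        const U64 size = c[n][k];
        for (; next - base < size; next += step)
            fun(nth_combination(items, k, next - base, c));
        base += size;
    }
}
```

With the 48-character string from the question, a call such as every_nth(s, 12, 19, 1000000000, fun) would make roughly 27,000 calls to nth_combination instead of stepping through all 2.7e13 combinations.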

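You can use bit vectors to speed up some of the computation. The code below is adapted from the bit-twiddling page of the Chess Programming Wiki.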

#include <iostream>
#include <iomanip>
#include <cstdint>
using U64 = uint64_t;
// generate the next integer with the same number of bits as c
U64 next_combination(U64 c) 
{
    auto const smallest = c & -c;
    auto const ripple = c + smallest;
    auto ones = c ^ ripple;
    ones = (ones >> 2) / smallest;
    return ripple | ones;
}
// generate all integers with k of the first n bits set
template<class Function>
void for_each_combination(std::size_t n, std::size_t k, Function fun)
{
    U64 y;
    auto const n_mask = (1ULL << n) - 1; // mask with all n bits set to 1
    auto const k_mask = (1ULL << k) - 1; // mask with first k bits set to 1
    auto x = k_mask; fun(x);
    for (; (y = next_combination(x) & n_mask) > x; x = y) fun(y);
}
int main() 
{
    auto const million = 1000000ULL;
    auto count = U64 { 0 };
    for (auto i = 12; i < 20; ++i) {
        for_each_combination(48, i, [&](U64 c) {
        /*if (count++ % million == 0) std::cout << std::dec << std::setfill(' ') << std::setw(8) << (count - 1) / million << ": " << std::hex << std::showbase << std::setfill('0') << std::setw(16) << c << "\n";*/
            ++count;
        });
    }
    std::cout << count << "\n";
}

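On a single core inside a virtual machine on my Xeon E5-1650 @ 3.2 GHz, my best estimate is that it takes 3.52 days to increment the counter 2.7e13 times (without generating any output). It only works for n < 64, unless you use some 128-bit integer class.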

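Given a bit vector with k of its n bits set to 1, it is a simple matter to map it onto the original character sequence, or onto any other type, and print whichever combinations you need. For sequences without random-access iterators, it is of course more expensive than Howard Hinnant's approach.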
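That mapping could look something like the sketch below. The function name to_combination and the convention that bit i selects items[i] are assumptions made for illustration, not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Builds the combination selected by mask: bit i set means items[i] is taken.
std::string to_combination(std::uint64_t mask, const std::string& items)
{
    assert(items.size() <= 64);  // one bit per item
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i)
        if (mask & (std::uint64_t{1} << i))
            out += items[i];
    return out;
}
```

For example, to_combination(0xB, "ABCDE") selects bits 0, 1 and 3 and yields "ABD". Note that masks are visited in increasing numeric order, so under this bit convention the resulting strings do not come out in lexicographic order.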

If you do not care what the actual count is, you can use a 32-bit int, which is still enough to tell you when you have reached a billion.
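One way to read that suggestion is a counter that starts over each time it reaches the period, so it never needs more than 32 bits. This is only a sketch; the PeriodicCounter type is hypothetical and not taken from any of the answers.

```cpp
#include <cassert>
#include <cstdint>

// Counts calls in a 32-bit integer and signals every period-th call.
// A period of 1000000000 fits, since it is below 2^32.
class PeriodicCounter {
public:
    explicit PeriodicCounter(std::uint32_t period) : period_(period), current_(0)
    {
        assert(period > 0);
    }

    // Returns true on every period-th call, then starts counting again.
    bool tick()
    {
        if (++current_ < period_) return false;
        current_ = 0;
        return true;
    }

private:
    std::uint32_t period_;
    std::uint32_t current_;
};
```

Inside the enumeration loop one would write something like if (counter.tick()) followed by the output statement, which gives up the running total in exchange for a smaller counter.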