Linux is really allocating memory when it shouldn't in C++ code

Keywords: C++, code, memory allocation, Linux. Last updated: 2023-10-16

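On Linux, the kernel does not allocate any physical memory pages until we actually use the memory, but I am having a hard time figuring out why it does in fact allocate memory here: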

   for(int t = 0; t < T; t++){
      for(int b = 0; b < B; b++){
         Matrix[t][b].length = 0;
         Matrix[t][b].size = 60;
         Matrix[t][b].pointers = (Node**)malloc(60*sizeof(Node*));
      }
   }

I then access this data structure to add an element to it, like this:

   Node* elem = NULL;
   Matrix[a][b].length++;
   Matrix[a][b].pointers[ Matrix[a][b].length ] = elem;

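Essentially, I run my program with htop open alongside it, and Linux does allocate more memory if I increase the number "60" in the code above. Why? Shouldn't it allocate only one page when the first element is added to the array?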

That depends on how your Linux system is configured.

Here is a simple C program that tries to allocate 1 TB of memory and touches some of it.

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main()
{
  char *array[1000];
  int i;
  for (i = 0; i < 1000; ++i)
  {
    if (NULL == (array[i] = malloc((int) 1e9)))
    {
      perror("malloc failed!");
      return -1;
    }
    array[i][0] = 'H';
  }
  for (i = 0; i < 1000; ++i)
    printf("%c", array[i][0]);
  printf("\n");
  sleep(10);
  return 0;
}

When I run it with top running alongside, top reports a VIRT memory usage of 931g (where g means GiB), while RES only reaches 4380 KiB.

Now, when I switch the system to a different overcommit policy with /sbin/sysctl -w vm.overcommit_memory=2 and rerun the program, I get:

malloc failed!: Cannot allocate memory

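So your system may be using a different overcommit policy than you expect. For more information, see the documentation for the vm.overcommit_memory setting.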

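Your assumption that malloc/new does not cause any memory to be written, and therefore does not cause the OS to assign physical memory, is incorrect (for the memory allocator implementation you have).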

I have reproduced the behavior you describe with the following simple program:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv)
{
  char **array[128][128];
  int    size;
  int    i, j;
  if (1 == argc || 0 >= (size = atoi(argv[1])))
    fprintf(stderr, "usage: %s <num>; where num > 0\n", argv[0]), exit(-1);
  for (i = 0; i < 128; ++i)
    for (j = 0; j < 128; ++j)
      if (NULL == (array[i][j] = malloc(size * sizeof(char*))))
      {
        fprintf(stderr, "malloc failed when i = %d, j = %d\n", i, j);
        perror(NULL);
        return -1;
      }
  sleep(10);
  return 0;
}

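When I run it with various small size arguments, the VIRT and RES memory footprints (as reported by top) grow in step with each other, even though I never explicitly touch the inner arrays I am allocating.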

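This basically holds until size exceeds ~512. After that, RES stays constant at 64 MiB while VIRT can become extremely large (e.g. - 1220 GiB when size is 10M). This is because 512 * 8 = 4096, which is a common virtual page size on Linux systems, and 128 * 128 * 4096 B = 64 MiB.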

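So it looks like the first page of each allocation is being mapped to physical memory, probably because malloc/new itself writes to part of the allocation for its own internal bookkeeping. Of course, many small allocations may fit on the same page, in which case only one page is mapped to physical memory for all of them.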

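In your code example, changing the size of the arrays matters because fewer of those arrays then fit on one page, so malloc/new itself has to touch more memory pages (which the OS therefore maps to physical memory) over the run of the program.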

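When you use 60, each array takes about 480 bytes, so ~8 of those allocations fit on one page. When you use 100, each takes about 800 bytes, so only ~5 allocations fit on one page. So I would expect the "100 program" to use about 8/5 as much memory as the "60 program", which seems like a big enough difference to make your machine start swapping to stable storage.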

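If each of your smaller "60" allocations were already larger than 1 page, then changing to the bigger "100" would not affect your program's initial physical memory usage, just as you originally expected.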

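PS - I think that whether or not you explicitly touch the initial page of an allocation will be irrelevant, because malloc/new will already have done so (for the memory allocator implementation you have).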

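Below is a sketch of a solution for the case where you expect your b arrays to usually be small, usually fewer than 2^X pointers (X = 5 in the code below), but which also handles the exceptional cases where they grow larger.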

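You can adjust X down if it does not match your expected usage. You can also adjust the minimum size of your arrays up from 0 (and not allocate the smaller 2^i levels) if you expect most of your arrays to use at least 2^Y pointers (e.g. - Y = 3).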

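If you think that X == Y (e.g. - 4) for your usage pattern, then you can just do a single allocation of B * (0x1 << X) * sizeof(Node*) per T and divvy that array up among your b's. Then, if a b array needs more than 2^X pointers, resort to malloc for it, followed by realloc if it needs to grow even further.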

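The main point here is that the initial allocation will map to very little physical memory, which addresses the problem that motivated your original question.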

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define T           1278
#define B           131072
#define CAP_MAX_LG2 5        
#define CAP_MAX     (0x1 << CAP_MAX_LG2)  // pre-alloc T's to handle all B arrays of length up to 2^CAP_MAX_LG2
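typedef struct Node Node;
struct Node { int placeholder; };         // placeholder definition (an assumption) so sizeof(Node) compiles; substitute your real Node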
typedef struct
{
  int    t;                               // so a matrix element can know to which T_Allocation it belongs
  int    length;
  int    cap_lg2;                         // log base 2 of capacity; -1 if capacity is zero
  Node **pointers;
} MatrixElem;
typedef struct
{
  Node **base;                            // pre-allocs B * 2^(CAP_MAX_LG2 + 1) Node pointers; every b array can be any of { 0, 1, 2, 4, 8, ..., CAP_MAX } capacity
  Node **frees_pow2[CAP_MAX_LG2 + 1];     // frees_pow2[i] will point at the next free array of 2^i pointers to Node to allocate to a growing b array
} T_Allocation;
MatrixElem   Matrix[T][B];
T_Allocation T_Allocs[T];
int  Node_init(Node *n) { return 0; } // just a dummy
void Node_fini(Node *n) { }           // just a dummy 
int  Node_eq(const Node *n1, const Node *n2)  { return 0; } // just a dummy
void Init(void)
{
  for(int t = 0; t < T; t++) 
  {
    T_Allocs[t].base = malloc(B * (0x1 << (CAP_MAX_LG2 + 1)) * sizeof(Node*));
    if (NULL == T_Allocs[t].base)
      abort();
    T_Allocs[t].frees_pow2[0] = T_Allocs[t].base;
    for (int x = 1; x <= CAP_MAX_LG2; ++x)      // level x starts after levels 0..x-1, which hold B * (2^x - 1) pointers in total
      T_Allocs[t].frees_pow2[x] = &T_Allocs[t].base[(size_t)B * ((0x1 << x) - 1)];
    for(int b = 0; b < B; b++)
    {
      Matrix[t][b].t        = t;
      Matrix[t][b].length   = 0;
      Matrix[t][b].cap_lg2  = -1;
      Matrix[t][b].pointers = NULL;
    }
  }
}
Node *addElement(MatrixElem *elem)
{
  if (-1 == elem->cap_lg2 || elem->length == (0x1 << elem->cap_lg2))  // elem needs a bigger pointers array to add an element
  {
    int new_cap_lg2 = elem->cap_lg2 + 1;
    int new_cap     = (0x1 << new_cap_lg2);
    if (new_cap_lg2 <= CAP_MAX_LG2)            // new b array can still fit in pre-allocated space in T
    {
      Node **new_pointers = T_Allocs[elem->t].frees_pow2[new_cap_lg2];
      if (elem->length > 0)                    // on the first growth pointers is still NULL and there is nothing to copy
        memcpy(new_pointers, elem->pointers, elem->length * sizeof(Node*));
      elem->pointers = new_pointers;
      T_Allocs[elem->t].frees_pow2[new_cap_lg2] += new_cap;
    }
    else if (elem->cap_lg2 == CAP_MAX_LG2)     // exceeding pre-alloc'ed arrays in T; use malloc
    {
      Node **new_pointers = malloc(new_cap * sizeof(Node*));
      if (NULL == new_pointers)
        return NULL;
      memcpy(new_pointers, elem->pointers, elem->length * sizeof(Node*));
      elem->pointers = new_pointers;
    } 
    else                                       // already exceeded pre-alloc'ed arrays in T; use realloc
    {
      Node **new_pointers = realloc(elem->pointers, new_cap * sizeof(Node*));
      if (NULL == new_pointers)
        return NULL;
      elem->pointers = new_pointers;
    }
    ++elem->cap_lg2;
  }
  Node *ret = malloc(sizeof(Node));
  if (ret)
  {
    Node_init(ret);
    elem->pointers[elem->length] = ret;
    ++elem->length;
  }
  return ret;
}
int removeElement(const Node *a, MatrixElem *elem)
{
  int i;
  for (i = 0; i < elem->length && !Node_eq(a, elem->pointers[i]); ++i);
  if (i == elem->length)
    return -1;
  Node_fini(elem->pointers[i]);
  free(elem->pointers[i]);
  --elem->length;
  memmove(&elem->pointers[i], &elem->pointers[i+1], sizeof(Node*) * (elem->length - i));
  return 0;
}
int main()
{
  return 0;
}