malloc_trim(0) Releases Fastbins of Thread Arenas?
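For the last week or so I have been investigating a problem in an application where memory usage accumulates over time. I narrowed it down to a line that copies a
std::vector< std::vector< std::vector< std::map< uint, map< uint, std::bitset< N> > > > > >
in a worker thread (I realize this is a ridiculous way to organize memory). On a regular basis, the worker thread is destroyed and recreated, and it copies that memory structure when it starts. The original data is passed to the worker thread by reference from the main thread.
Using malloc_stats and malloc_info, I can see that when the worker thread is destroyed, the arena/heap it was using retains the memory used for that structure in its free list of fastbins. This makes sense, because there are many separate allocations of less than 64 bytes.
The problem is that when the worker thread is recreated, it creates a new arena/heap instead of reusing the previous one, so the fastbins of the previous arenas/heaps are never reused. Eventually the system runs out of memory before a previous heap/arena is picked up again and the fastbins it holds are reused.
Somewhat by accident, I discovered that calling malloc_trim(0) in my main thread, after joining the worker thread, causes the fastbins in the thread arenas/heaps to be released. As far as I can see, this behavior is undocumented. Does anyone have an explanation?
Here is some test code I am using to see this behavior: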
// includes
#include <stdio.h>
#include <algorithm>
#include <vector>
#include <iostream>
#include <stdexcept>
#include <string>
#include <mcheck.h>
#include <malloc.h>
#include <map>
#include <bitset>
#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>

// Number of bits per bitset.
const int sizeOfBitsets = 40;

// Executes a system command. Used to get output of "free -m".
std::string ExecuteSystemCommand(const char* cmd) {
    char buffer[128];
    std::string result = "";
    FILE* pipe = popen(cmd, "r");
    if (!pipe) throw std::runtime_error("popen() failed!");
    try {
        // fgets() returns NULL at end of file and on a read error, so it is
        // the only loop condition needed (feof() alone never ends on error).
        while (fgets(buffer, sizeof(buffer), pipe) != NULL) {
            result += buffer;
        }
    } catch (...) {
        pclose(pipe);
        throw;
    }
    pclose(pipe);
    return result;
}

// Prints output of "free -m", malloc_stats() and malloc_info().
void PrintMemoryStats()
{
    try
    {
        char *buf;
        size_t size;
        FILE *fp;
        std::string myCommand("free -m");
        std::string result = ExecuteSystemCommand(myCommand.c_str());
        printf("Free memory is \n%s\n", result.c_str());
        malloc_stats();
        // Capture the XML report of malloc_info() in an in-memory stream.
        fp = open_memstream(&buf, &size);
        if (!fp) throw std::runtime_error("open_memstream() failed!");
        malloc_info(0, fp);
        fclose(fp);
        printf("# Memory Allocation Stats\n%s\n#> ", buf);
        free(buf);
    }
    catch(...)
    {
        printf("Unable to print memory stats.\n");
        throw;
    }
}

// Thread function: copies the data three times and prints memory statistics.
void MakeCopies(std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > >& data)
{
    try
    {
        // Create copies.
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyA(data);
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyB(data);
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyC(data);
        // Print memory info.
        printf("Memory after creating data copies:\n");
        PrintMemoryStats();
    }
    catch(...)
    {
        printf("Unable to make copies.");
        throw;
    }
}

int main(int argc, char** argv)
{
    try
    {
        // When uncommented, disables the use of fastbins.
        // mallopt(M_MXFAST, 0);

        // Print memory info.
        printf("Memory to start is:\n");
        PrintMemoryStats();

        // Sizes of original data.
        int sizeOfDataA = 2048;
        int sizeOfDataB = 4;
        int sizeOfDataC = 128;
        int sizeOfDataD = 20;
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > testData;

        // Populate data.
        testData.resize(sizeOfDataA);
        for(int a = 0; a < sizeOfDataA; ++a)
        {
            testData.at(a).resize(sizeOfDataB);
            for(int b = 0; b < sizeOfDataB; ++b)
            {
                for(int c = 0; c < sizeOfDataC; ++c)
                {
                    std::map<uint, std::bitset<sizeOfBitsets> > dataMap;
                    testData.at(a).at(b).insert(std::pair<uint, std::map<uint, std::bitset<sizeOfBitsets> > >(c, dataMap));
                    for(int d = 0; d < sizeOfDataD; ++d)
                    {
                        std::bitset<sizeOfBitsets> testBitset;
                        testData.at(a).at(b).at(c).insert(std::pair<uint, std::bitset<sizeOfBitsets> >(d, testBitset));
                    }
                }
            }
        }

        // Print memory info.
        printf("Memory to after creating original data is:\n");
        PrintMemoryStats();

        // Start thread to make copies and wait to join.
        {
            boost::shared_ptr<boost::thread> makeCopiesThread = boost::shared_ptr<boost::thread>(new boost::thread(&MakeCopies, boost::ref(testData)));
            makeCopiesThread->join();
        }

        // Print memory info.
        printf("Memory to after joining thread is:\n");
        PrintMemoryStats();

        malloc_trim(0);

        // Print memory info.
        printf("Memory to after malloc_trim(0) is:\n");
        PrintMemoryStats();

        return 0;
    }
    catch(...)
    {
        // Log warning.
        printf("Unable to run application.");
        // Return failure.
        return 1;
    }
    // Return success.
    return 0;
}
The interesting output before and after the malloc_trim call is (look for "LOOK HERE!"):
#> Memory to after joining thread is:
Free memory is
              total        used        free      shared  buff/cache   available
Mem:         257676        7361      246396          25        3918      249757
Swap:          1023           0        1023
Arena 0:
system bytes = 1443450880
in use bytes = 1443316976
Arena 1:
system bytes = 35000320
in use bytes = 6608
Total (incl. mmap):
system bytes = 1478451200
in use bytes = 1443323584
max mmap regions = 0
max mmap bytes = 0
# Memory Allocation Stats
<malloc version="1">
<heap nr="0">
<sizes>
<size from="241" to="241" total="241" count="1"/>
<size from="529" to="529" total="529" count="1"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="2" size="770"/>
<system type="current" size="1443450880"/>
<system type="max" size="1443459072"/>
<aspace type="total" size="1443450880"/>
<aspace type="mprotect" size="1443450880"/>
</heap>
<heap nr="1">
<sizes>
<size from="33" to="48" total="48" count="1"/>
<size from="49" to="64" total="4026531712" count="62914558"/> <-- LOOK HERE!
<size from="65" to="80" total="160" count="2"/>
<size from="81" to="96" total="301989888" count="3145728"/> <-- LOOK HERE!
<size from="33" to="33" total="231" count="7"/>
<size from="49" to="49" total="1274" count="26"/>
<unsorted from="0" to="49377" total="1431600" count="6144"/>
</sizes>
<total type="fast" count="66060289" size="4328521808"/>
<total type="rest" count="6177" size="1433105"/>
<system type="current" size="4329967616"/>
<system type="max" size="4329967616"/>
<aspace type="total" size="35000320"/>
<aspace type="mprotect" size="35000320"/>
</heap>
<total type="fast" count="66060289" size="4328521808"/>
<total type="rest" count="6179" size="1433875"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="5773418496"/>
<system type="max" size="5773426688"/>
<aspace type="total" size="1478451200"/>
<aspace type="mprotect" size="1478451200"/>
</malloc>
#> Memory to after malloc_trim(0) is:
Free memory is
              total        used        free      shared  buff/cache   available
Mem:         257676        3269      250488          25        3918      253850
Swap:          1023           0        1023
Arena 0:
system bytes = 1443319808
in use bytes = 1443316976
Arena 1:
system bytes = 35000320
in use bytes = 6608
Total (incl. mmap):
system bytes = 1478320128
in use bytes = 1443323584
max mmap regions = 0
max mmap bytes = 0
# Memory Allocation Stats
<malloc version="1">
<heap nr="0">
<sizes>
<size from="209" to="209" total="209" count="1"/>
<size from="529" to="529" total="529" count="1"/>
<unsorted from="0" to="49377" total="1431600" count="6144"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="6146" size="1432338"/>
<system type="current" size="1443459072"/>
<system type="max" size="1443459072"/>
<aspace type="total" size="1443459072"/>
<aspace type="mprotect" size="1443459072"/>
</heap>
<heap nr="1"> <---------------------------------------- LOOK HERE!
<sizes> <-- HERE!
<unsorted from="0" to="67108801" total="4296392384" count="6208"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="6208" size="4296392384"/>
<system type="current" size="4329967616"/>
<system type="max" size="4329967616"/>
<aspace type="total" size="35000320"/>
<aspace type="mprotect" size="35000320"/>
</heap>
<total type="fast" count="0" size="0"/>
<total type="rest" count="12354" size="4297824722"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="5773426688"/>
<system type="max" size="5773426688"/>
<aspace type="total" size="1478459392"/>
<aspace type="mprotect" size="1478459392"/>
</malloc>
#>
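There is almost no documentation about the output of malloc_info, so I was not sure whether the entries I pointed out really were fastbins. To verify that they are indeed fastbins, I uncommented the line
mallopt(M_MXFAST, 0);
to disable the use of fastbins. With fastbins disabled, the memory usage of heap 1 after joining the thread and before calling malloc_trim(0) looks the same as it does with fastbins enabled after calling malloc_trim(0). Most importantly, disabling fastbins returns the memory to the system immediately after the thread is joined. Calling malloc_trim(0) after joining the thread, with fastbins enabled, also returns the memory to the system.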
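As a minimal sketch of that check, assuming glibc (mallopt and M_MXFAST are declared in <malloc.h>), the call can be wrapped so that its result is not silently ignored. The helper name DisableFastbins is hypothetical and not part of the test program above. On glibc 2.26 and later, small frees may go to the per-thread cache (tcache) first, so the effect can look different from what is shown here for glibc 2.17.

```cpp
#include <malloc.h>   // mallopt, M_MXFAST (glibc-specific)

// Hypothetical helper: disables fastbins for the whole process by setting
// the largest fastbin-eligible request size to zero.
// Call it early, before the allocations you want to observe.
bool DisableFastbins()
{
    // mallopt() returns 1 on success and 0 on error.
    return mallopt(M_MXFAST, 0) == 1;
}
```

Calling DisableFastbins() as the first statement of main() corresponds to uncommenting the mallopt line in the test code.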
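The documentation for malloc_trim(0) states that it can only free memory from the top of the main arena heap, so what is going on here? I am running on CentOS 7 with glibc version 2.17.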
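"The documentation for malloc_trim(0) states that it can only free memory from the top of the main arena heap, so what is going on here?"
The documentation can be called "outdated" or "incorrect". Glibc has no documentation for the malloc_trim function; Linux uses the man pages from the man-pages project. The malloc_trim man page
http://man7.org/linux/man-pages/man3/malloc_trim.3.html was written as a new page in 2012 by the man-pages maintainer. He probably used some comments from the glibc malloc/malloc.c source code http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#675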
  malloc_trim(size_t pad);

  If possible, gives memory back to the system (via negative
  arguments to sbrk) if there is unused memory at the `high' end of
  the malloc pool. You can call this after freeing large blocks of
  memory to potentially reduce the system-level memory requirements
  of a program. However, it cannot guarantee to reduce memory. Under
  some allocation patterns, some large free blocks of memory will be
  locked between two used chunks, so they cannot be given back to
  the system.

  The `pad' argument to malloc_trim represents the amount of free
  trailing space to leave untrimmed. If this argument is zero,
  only the minimum amount of memory to maintain internal data
  structures will be left (one page or less). Non-zero arguments
  can be supplied to maintain enough trailing space to service
  future expected allocations without having to re-obtain memory
  from the system.

  Malloc_trim returns 1 if it actually released any memory, else 0.
  On systems that do not support "negative sbrks", it will always
  return 0.
The actual implementation in glibc is __malloc_trim, and it contains code that iterates over the arenas:
http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#4552
int
__malloc_trim (size_t s)
  ...
  mstate ar_ptr = &main_arena;
  do
    {
      (void) mutex_lock (&ar_ptr->mutex);
      result |= mtrim (ar_ptr, s);
      (void) mutex_unlock (&ar_ptr->mutex);

      ar_ptr = ar_ptr->next;
    }
  while (ar_ptr != &main_arena);
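Every arena is trimmed with the mtrim() (mTRIm()) function, which calls malloc_consolidate() to convert all free chunks in the fastbins (they are not coalesced when freed, which is what makes them fast) into normal free chunks (which are coalesced with adjacent chunks):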
  /* Ensure initialization/consolidation */
  malloc_consolidate (av);

  malloc_consolidate is a specialized version of free() that tears
  down chunks held in fastbins.

  Fastbins

  Chunks in fastbins keep their inuse bit set, so they cannot
  be consolidated with other free chunks. malloc_consolidate
  releases all chunks in fastbins and consolidates them with
  other free chunks.
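The sequence from the question (a worker thread frees many small chunks, the main thread joins it and then trims) can be sketched in miniature as follows. This is only an illustration under assumptions: it targets glibc, uses C++11 std::thread instead of boost::thread, and the function names and the 48-byte block size are made up for the example. Whether any memory is actually returned depends on the glibc version and allocator state, so the code only reports what malloc_trim returns.

```cpp
#include <malloc.h>   // malloc_trim (glibc-specific)
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Worker body: allocate many small blocks, then free them all.
// On glibc these frees typically end up in the thread arena's free lists.
void AllocateAndFreeSmallBlocks(std::size_t count)
{
    std::vector<void*> blocks;
    blocks.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        blocks.push_back(std::malloc(48));
    for (void* p : blocks)
        std::free(p);  // free(NULL) is a no-op, so failed allocations are harmless
}

// Runs the worker, joins it, then trims from the calling thread.
// Returns malloc_trim's result: 1 if memory was released, otherwise 0.
int RunWorkerThenTrim(std::size_t count)
{
    std::thread worker(AllocateAndFreeSmallBlocks, count);
    worker.join();
    return malloc_trim(0);
}
```

Comparing malloc_info output before and after RunWorkerThenTrim, as the test program in the question does, is the more reliable way to see what was consolidated.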
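"The problem is that when the worker thread is recreated, it creates a new arena/heap instead of reusing the previous one, so the fastbins of the previous arenas/heaps are never reused."
This is strange. By design, the maximum number of arenas in glibc malloc is limited to cpu_core_count * 8 on 64-bit platforms and cpu_core_count * 2 on 32-bit platforms, or to the value set by the environment variable MALLOC_ARENA_MAX or the mallopt parameter M_ARENA_MAX.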
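A minimal sketch of capping the arena count from inside the program, assuming a glibc that defines M_ARENA_MAX in <malloc.h>. The helper name LimitArenas and the value 2 in the usage note are arbitrary choices for illustration, not a recommendation.

```cpp
#include <malloc.h>   // mallopt, M_ARENA_MAX (glibc-specific)

// Hypothetical helper: asks glibc malloc to use at most maxArenas arenas.
// It should run before worker threads are started, since arenas that
// already exist are not removed.
bool LimitArenas(int maxArenas)
{
    // mallopt() returns 1 on success and 0 on error.
    return mallopt(M_ARENA_MAX, maxArenas) == 1;
}
```

Calling LimitArenas(2) at the top of main() should have roughly the same effect as starting the program with MALLOC_ARENA_MAX=2 in the environment.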
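You can limit the number of arenas for your application; call malloc_trim() periodically; or call malloc() with a "large" size (which will call malloc_consolidate) followed by free() from your threads just before they are destroyed: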
_int_malloc (mstate av, size_t bytes)
  ...
  if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
    // fastbin allocation path
  if (in_smallbin_range (nb))
    // smallbin path; malloc_consolidate may be called
  /*
     If this is a large request, consolidate fastbins before continuing.
     While it might look excessive to kill all fastbins before
     even seeing if there is space available, this avoids
     fragmentation problems normally associated with fastbins.
     Also, in practice, programs tend to have runs of either small or
     large requests, but less often mixtures, so consolidation is not
     invoked all that often in most programs. And the programs that
     it is called frequently in otherwise tend to fragment.
   */

  else
    {
      idx = largebin_index (nb);
      if (have_fastchunks (av))
        malloc_consolidate (av);
    }
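The "large malloc, then free" trick could look like the sketch below, to be called by a worker thread just before it exits. The helper name and the 64 KiB size are assumptions: the size is meant to be well above the smallbin range and below the default 128 KiB mmap threshold, so that the request takes the large-request path quoted above. Whether consolidation is triggered this way on a given glibc version is not guaranteed.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: issues one large request in the calling thread so
// that glibc's _int_malloc consolidates that thread's arena fastbins.
// Returns false if the allocation failed.
bool ConsolidateFastbinsOfCurrentArena()
{
    const std::size_t kLargeRequest = 64 * 1024;  // assumed "large" size
    void* p = std::malloc(kLargeRequest);
    if (p == nullptr)
        return false;
    // Touch the block through a volatile pointer so the compiler cannot
    // optimize the malloc/free pair away.
    static_cast<volatile char*>(p)[0] = 0;
    std::free(p);
    return true;
}
```

The call has to happen in the worker thread itself, because consolidation applies to the arena that serves the calling thread.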
PS: the malloc_trim man page has a note, added in https://github.com/mkerrisk/man-pages/commit/a15b0e60b297e29c825b7417582a33e6ca26bf65:
+.SH NOTES
+This function only releases memory in the main arena.
+.\" malloc/malloc.c::mTRIm():
+.\" return result | (av == &main_arena ? sYSTRIm (pad, av) : 0);
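Yes, there is a check for main_arena, but it sits at the very end of mTRIm(), the implementation of malloc_trim, and it only guards the call to sbrk() with a negative offset. Since 2007 (glibc 2.9 and newer) there is another way to return memory to the OS: madvise(MADV_DONTNEED), which is used in all arenas (and was documented neither by the author of the glibc patch nor by the man page author). Consolidation is performed for every arena. There is also code that trims (munmaps) the top chunk of mmap-ed heaps (heap_trim/shrink_heap, called from the slow path of free()), but it is not called from malloc_trim.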
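To illustrate what madvise(MADV_DONTNEED) does at the system-call level, here is a small Linux-only sketch. It does not show glibc internals; the function name is made up, and it relies on the Linux behavior that a private anonymous page reads back as zero after being discarded.

```cpp
#include <sys/mman.h>  // mmap, madvise, munmap
#include <unistd.h>    // sysconf
#include <cstddef>

// Maps one anonymous page, dirties it, discards it with MADV_DONTNEED and
// returns the first byte read afterwards, or -1 if a system call failed.
int DemoMadviseDontNeed()
{
    const std::size_t len = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    unsigned char* bytes = static_cast<unsigned char*>(mem);
    bytes[0] = 0xAB;  // the page is now backed by physical memory

    int result = -1;
    // The mapping stays valid, but the kernel may drop the page contents.
    if (madvise(mem, len, MADV_DONTNEED) == 0)
        result = bytes[0];  // the next access gets a fresh zero-filled page

    munmap(mem, len);
    return result;
}
```

The address range stays mapped the whole time, which is why this mechanism can release memory in the middle of any arena and not only at the top of the main heap.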