
Copyright notice: original article from this site, published by 萌叔
When reposting, please credit 萌叔 | https://vearne.cc

Background: a Redis machine in our production environment reported

Can't save in background: fork: Cannot allocate memory

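This brought the Redis service to a halt, even though the machine had 64 GB of memory and Redis was using only a little over 40 GB.
As we all know, when persistence is enabled, both BGSAVE in RDB mode and the rewrite of appendonly.aof in AOF mode make Redis fork a child process. But isn't a process fork supposed to be copy-on-write?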

This incident made me take another look at the log Redis prints at startup.
Here is what it reports:

1610:M 12 Sep 07:46:20.524 # Server started, Redis version 3.0.1
1610:M 12 Sep 07:46:20.524 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1610:M 12 Sep 07:46:20.524 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1610:M 12 Sep 07:46:20.525 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1610:M 12 Sep 07:46:20.525 * The server is now ready to accept connections on port 6379
1610:M 12 Sep 07:57:21.819 * Background saving started by pid 1615
1615:C 12 Sep 07:57:21.827 * DB saved on disk
1615:C 12 Sep 07:57:21.827 * RDB: 4 MB of memory used by copy-on-write
1610:M 12 Sep 07:57:21.925 * Background saving terminated with success

There are three warnings:
1. overcommit_memory
2. THP
3. TCP backlog

overcommit_memory

Defines the conditions that determine whether a large memory request is accepted or denied. There are three possible values for this parameter:
0 — The default setting. The kernel performs heuristic memory overcommit handling by estimating the amount of memory available and failing requests that are blatantly invalid. Unfortunately, since memory is allocated using a heuristic rather than a precise algorithm, this setting can sometimes allow available memory on the system to be overloaded.
1 — The kernel performs no memory overcommit handling. Under this setting, the potential for memory overload is increased, but so is performance for memory-intensive tasks.
2 — The kernel denies requests for memory equal to or larger than the sum of total available swap and the percentage of physical RAM specified in overcommit_ratio. This setting is best if you want a lesser risk of memory overcommitment.

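In other words, when the operating system receives a memory allocation request, it accepts or rejects it according to overcommit_memory:
0: the default. The kernel uses a heuristic to estimate how much memory is available and rejects requests that are obviously unreasonable.
1: the kernel grants the request without checking whether enough memory exists. The risk of overloading memory goes up, but memory-intensive tasks benefit.
2: the kernel rejects a request once the total committed memory would reach swap + physical RAM * overcommit_ratio / 100.
overcommit_ratio defaults to 50, so the default limit is all of swap plus 50% of physical RAM.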
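To make the mode 2 arithmetic concrete, here is a minimal sketch that computes the limit from RAM, swap, and the ratio, and prints it next to the kernel's own CommitLimit. The function name commit_limit_kb is mine, and the sketch assumes no huge-page reservations and that vm.overcommit_kbytes is unset, since either one changes the kernel's figure.

```shell
#!/bin/sh
# commit_limit_kb RAM_KB SWAP_KB RATIO   (hypothetical helper name)
# Prints the mode 2 commit limit in kB: swap + RAM * ratio / 100.
# Assumption: no huge-page reservations and vm.overcommit_kbytes unset.
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# Compare the estimate with what the kernel reports, when /proc is readable.
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/overcommit_ratio ]; then
    ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
    ratio=$(cat /proc/sys/vm/overcommit_ratio)
    echo "estimated limit: $(commit_limit_kb "$ram_kb" "$swap_kb" "$ratio") kB"
    # CommitLimit is the kernel's figure, Committed_AS what is already promised.
    grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo || true
fi
```

For a 64 GB machine without swap, `commit_limit_kb 67108864 0 70` gives roughly 44.8 GB. That figure is worth comparing against the size of the Redis process, because a fork is accounted at the full size of the parent's writable memory.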

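With the default of 0, the heuristic refuses a request that clearly exceeds the memory the kernel thinks it can make available, which is how a fork of a 40+ GB process can fail on a 64 GB machine even though copy-on-write would keep the real cost small. With 1, the request is granted regardless of actual physical memory usage. I find 1 a rather crude way to manage memory requests and consider 2, with overcommit_ratio set to 60 ~ 80, the more elegant option. Keep in mind that in mode 2 a fork is charged for the child's whole writable address space, so the resulting limit still has to cover the Redis process twice during a background save; the Redis warning itself recommends 1.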

echo "vm.overcommit_memory=2" >> /etc/sysctl.conf
echo "vm.overcommit_ratio=70" >> /etc/sysctl.conf
sysctl -p

Transparent Huge Pages
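The operating system's default memory page size is 4 kB. With larger pages, for example 2 MB, the same number of page table entries can manage a much larger memory space. For an in-memory database like Redis, however, this slows memory allocation down and lowers the effective memory utilization, so Redis recommends turning it off.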

echo never > /sys/kernel/mm/transparent_hugepage/enabled
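To confirm the setting took effect, the same sysfs file can be read back; the kernel marks the active mode with square brackets, for example "always madvise [never]". A minimal sketch, with thp_active_mode being a helper name of my own:

```shell
#!/bin/sh
# thp_active_mode "CONTENTS"   (hypothetical helper name)
# Prints the bracketed word from a string such as "always madvise [never]".
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    mode=$(thp_active_mode "$(cat "$thp_file")")
    echo "THP mode: $mode"
    if [ "$mode" != "never" ]; then
        echo "THP is still enabled; Redis will keep printing the warning"
    fi
fi
```

As the warning says, the echo has to be repeated at boot (for example from /etc/rc.local), and Redis must be restarted after THP is disabled.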

TCP backlog

The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies are enabled there is no logical maximum length and this setting is ignored. See tcp(7) for more information.
If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value; the default value in this file is 128. In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with the value 128.

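On Linux, the backlog counts connections that have completed the three-way handshake and are waiting to be accepted.
Until a connection is accepted it stays in this queue, whose maximum length is the backlog.
As you can imagine, this parameter matters a great deal when there are many clients that open and close connections frequently.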

echo "net.core.somaxconn = 511" >> /etc/sysctl.conf
sysctl -p
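The truncation rule quoted from listen(2) amounts to taking the smaller of the two values. The sketch below applies it to the 511 from the Redis warning and the current somaxconn; effective_backlog is a helper name of my own.

```shell
#!/bin/sh
# effective_backlog REQUESTED SOMAXCONN   (hypothetical helper name)
# Prints the backlog the kernel will actually use: the smaller of the two.
effective_backlog() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

if [ -r /proc/sys/net/core/somaxconn ]; then
    somaxconn=$(cat /proc/sys/net/core/somaxconn)
    # 511 is the backlog value Redis asked for in the startup warning.
    echo "requested 511, effective $(effective_backlog 511 "$somaxconn")"
fi
```

A listening socket keeps the backlog it was given at listen() time, so Redis most likely needs a restart before a raised somaxconn has any effect on it.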

References:
overcommit_memory
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-captun.html
Transparent Huge Pages
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html
TCP backlog
http://veithen.github.io/2014/01/01/how-tcp-backlog-works-in-linux.html
http://linux.die.net/man/2/listen

PS:
Keep an eye on these lines in the log:

1942:C 14 Sep 23:17:42.568 * RDB: 2 MB of memory used by copy-on-write
1660:M 14 Sep 23:17:42.650 * Background saving terminated with success

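During a BGSAVE, even though a fork is performed, copy-on-write means that only about 2 MB of extra physical memory was actually allocated. Watching this value is a useful reference when tuning overcommit_ratio.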
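To track this figure over time instead of reading it by eye, the number can be pulled out of the log. A minimal sketch: cow_mb is a helper name of my own, and the log path is an assumption that depends on the logfile setting in redis.conf.

```shell
#!/bin/sh
# cow_mb   (hypothetical helper name)
# Reads Redis log lines on stdin and prints the MB figure from every
# "... N MB of memory used by copy-on-write" line, one number per line.
cow_mb() {
    sed -n 's/.*: \([0-9][0-9]*\) MB of memory used by copy-on-write.*/\1/p'
}

# Assumed log location; override with REDIS_LOG to match your redis.conf.
log=${REDIS_LOG:-/var/log/redis/redis-server.log}
if [ -r "$log" ]; then
    # Largest copy-on-write figure seen in the log so far.
    cow_mb < "$log" | sort -n | tail -n 1
fi
```

The largest value seen under a write-heavy load says more than a single quiet save, since copy-on-write cost grows with the number of pages modified while the child is running.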
