I have not posted anything on my blog for some months; I have to admit that I was really busy. Recently, a friend asked me about the Out of Memory (OOM) messages in Linux. How are they generated? What are the consequences? How can they be avoided on Linux machines? There is no single answer, since each case needs an investigation to establish the root cause. Before getting into the details of OOM, let's be clear that whenever the kernel is starved of memory, it will start killing processes. Many Linux administrators run into this, and one of the fastest ways to get rid of it is to add extra swap. However, this is not a definitive fix. A preliminary assessment needs to be carried out, followed by an action plan and a rollback plan.
If the kernel cannot free up enough memory after killing some processes, the result might be a kernel panic, deadlocks in applications, kernel hangs, or several defunct processes on the machine. I know of cases where the machine changed runlevel. There are also cases of kernel panics in virtual machines where the cluster is not healthy. In brief, the OOM killer is a kernel subsystem that kills one or more processes in order to free memory. In the article Linux kernel crash simulation using kdump, I explained how to activate kdump to generate a vmcore for analysis. However, to get the debug messages from an out-of-memory error into the vmcore, the sysctl configuration needs to be adjusted. I will be using a CentOS 7 machine to illustrate the OOM parameters and configurations.
1. To activate OOM debug in the vmcore file, set the parameter vm.panic_on_oom to 1 using the following command. With this value the kernel panics on an out-of-memory condition instead of killing a process, which gives kdump the chance to capture a vmcore:
sysctl -w vm.panic_on_oom=1
To verify that the setting has been applied, run sysctl -a | grep -i oom. It is not recommended to test such parameters in a production environment.
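A value set with sysctl -w is lost at the next reboot. If you want it to persist, one option is to write it to a drop-in file under /etc/sysctl.d. The sketch below is only an illustration: the helper name persist_sysctl and the file name 90-oom.conf used afterwards are my own choices, not something the system requires.

```shell
#!/bin/sh
# persist_sysctl KEY VALUE FILE
# Rewrites FILE so that it ends with exactly one "KEY = VALUE" line.
# (Hypothetical helper, for illustration only.)
persist_sysctl() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Copy every line whose left-hand side is not KEY.
        awk -F= -v k="$key" '{ n = $1; gsub(/[ \t]/, "", n); if (n != k) print }' "$file" > "$tmp"
    fi
    # Append the new setting.
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write through the existing path so the file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

As root, something like persist_sysctl vm.panic_on_oom 1 /etc/sysctl.d/90-oom.conf followed by sysctl --system should load the value. The same warning applies: try it on a test machine first.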
2. To decide which process to kill, the kernel relies on a function in the kernel code called badness() (named oom_badness() in more recent kernels). badness() calculates a numeric value for how bad a task has been. To be precise, it accumulates "points" for each process running on the machine, and those points are used by a function called select_bad_process() in the Linux kernel to pick a victim. The OOM mechanism then kills that process. The points are exposed in /proc/<pid>/oom_score. For example, here I have a server running Java.
As you can see, the process number is 2153 and its oom_score is 21.
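To see the scores of all processes at once instead of one PID at a time, a small loop over /proc is enough. This is a minimal sketch; the function name list_oom_scores is hypothetical, and it assumes /proc/<pid>/comm is available for the process name (it is on CentOS 7).

```shell
#!/bin/sh
# list_oom_scores [PROC_ROOT]
# Prints "score pid name" for each process, highest score first.
# PROC_ROOT defaults to /proc.
list_oom_scores() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        # A process may exit while we iterate, so skip unreadable entries.
        [ -r "$dir/oom_score" ] || continue
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null)
        printf '%s %s %s\n' "$score" "${dir##*/}" "$name"
    done | sort -rn
}
```

Running list_oom_scores | head -n 5 shows the five processes the OOM killer currently considers the worst.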
3. Many factors are taken into account when calculating the badness score. Some of them are the virtual memory size (VM size), the priority of the process (nice value), the total runtime, the running user and the value in /proc/<pid>/oom_adj. You can also set oom_score_adj for any PID to a value from -1000 to 1000. The lowest possible value, -1000, is equivalent to disabling OOM killing entirely for that task, since it will always report a badness score of 0.
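The two files use different ranges: oom_adj goes from -17 to 15, while oom_score_adj goes from -1000 to 1000. As far as I understand the kernel source, a value written to oom_adj is scaled linearly into the oom_score_adj range, with the maximum mapped straight to 1000. The sketch below reproduces that scaling as an illustration; the function name is hypothetical and the result should be checked against your own kernel.

```shell
#!/bin/sh
# oom_adj_to_score_adj OOM_ADJ
# Scales a legacy oom_adj value (-17..15) to the oom_score_adj
# range (-1000..1000). Illustrative only.
oom_adj_to_score_adj() {
    adj=$1
    if [ "$adj" -lt -17 ] || [ "$adj" -gt 15 ]; then
        echo "oom_adj must be between -17 and 15" >&2
        return 1
    fi
    if [ "$adj" -eq 15 ]; then
        # The maximum legacy value maps straight to the new maximum.
        echo 1000
    else
        # Integer scaling: -17 (OOM killing disabled) becomes -1000.
        echo $(( adj * 1000 / 17 ))
    fi
}
```

For example, oom_adj_to_score_adj -17 gives -1000, which is the "never kill" value described above.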
4. Let's assume that you want to prevent a specific process from being killed. Writing -17 to its oom_adj file disables OOM killing for that process:
echo -17 > /proc/$PID/oom_adj
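Since oom_adj is the older interface, a script that has to run on both old and new machines can try oom_score_adj first and fall back to oom_adj. This is a sketch under that assumption; protect_pid is a hypothetical helper name.

```shell
#!/bin/sh
# protect_pid PID [PROC_ROOT]
# Makes PID immune to the OOM killer, preferring oom_score_adj and
# falling back to the legacy oom_adj file. PROC_ROOT defaults to /proc.
protect_pid() {
    pid=$1
    root=${2:-/proc}
    if [ -w "$root/$pid/oom_score_adj" ]; then
        # -1000 is the lowest value and disables OOM killing for the task.
        printf '%s\n' -1000 > "$root/$pid/oom_score_adj"
    elif [ -w "$root/$pid/oom_adj" ]; then
        # -17 is the equivalent value on the legacy interface.
        printf '%s\n' -17 > "$root/$pid/oom_adj"
    else
        echo "cannot adjust OOM setting for PID $pid" >&2
        return 1
    fi
}
```

It needs to run as root, for example protect_pid 2153 for the Java process shown earlier.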
5. If you know the process name, for example the SSH daemon, and do not want it to be killed, use the following command:
pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done
6. To keep sshd protected automatically, add a cron entry that runs every minute. Because the entry includes a user field (root), it belongs in /etc/crontab or in a file under /etc/cron.d:
* * * * * root pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done
7. Let's now simulate the OOM killer messages. Use the following command to trigger an out-of-memory event on the machine:
echo f > /proc/sysrq-trigger
You will notice an OOM error message in /var/log/messages. As you can see here, PID 843 was scored by the OOM killer before being killed. The score is 4 in our case. Before the 'Out of memory' error, there will be a call trace sent by the kernel.
8. To monitor how the OOM killer is scoring processes, you can use the dstat command. To install the dstat package on an RPM-based machine use:
yum install dstat
or for a Debian-based distribution use:
apt-get install dstat
Dstat is used to generate resource statistics. To use dstat to monitor the score from the OOM killer use:
dstat --top-oom
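Coming back to the kill messages from step 7, they can be pulled out of the log with a short sed expression. This sketch assumes the message layout used by the CentOS 7 era kernel, "Out of memory: Kill process <pid> (<name>) score <n> or sacrifice child"; newer kernels word the message differently, so the pattern would need adjusting there. The function name oom_kills is hypothetical.

```shell
#!/bin/sh
# oom_kills [LOGFILE]
# Prints "pid name score" for every OOM kill line found in LOGFILE.
# LOGFILE defaults to /var/log/messages.
oom_kills() {
    log=${1:-/var/log/messages}
    # Capture the PID, the process name in parentheses and the score.
    sed -n 's/.*Out of memory: Kill process \([0-9][0-9]*\) (\(.*\)) score \([0-9][0-9]*\).*/\1 \2 \3/p' "$log"
}
```

Lines that do not match the pattern, such as the call trace, are simply ignored.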
- oom_score_adj is the interface used in newer Linux kernels. oom_adj is the deprecated interface found on older Linux machines.
- Disabling the OOM killer under heavy memory pressure may cause the system to kernel panic.
- Making a process immune is not a definitive way of solving the problem, for example with a Java application. Use a thread/heap dump to analyse the situation before making a process immune.
- Dstat is becoming an alternative to vmstat, netstat, iostat, ifstat and mpstat. For example, to monitor CPU, disk and network activity together with the top CPU and memory consumers, use dstat -c --top-cpu -dn --top-mem
- Testing in production environment should be avoided!