Tag: kernel

Out of Memory (OOM) in Linux machines

I have not posted anything on my blog for some months; I must admit that I was really busy. Recently, a friend asked me about the Out of Memory messages in Linux. How are they generated? What are the consequences? How can they be avoided on Linux machines? There is no single answer, as an investigation has to be carried out to establish the root cause. Before getting into the details of OOM, let’s be clear that whenever the kernel is starved of memory, it will start killing processes. Linux administrators experience this often, and one of the fastest ways to get rid of it is to add extra swap. However, this is not a definitive way of solving the issue. A preliminary assessment needs to be carried out, followed by an action plan and a rollback methodology.

If the kernel cannot free up enough memory after killing some processes, the result might be a kernel panic, deadlocks in applications, kernel hangs, or several defunct processes on the machine. I know of cases where the machine changed runlevel. There are also cases of kernel panics in virtual machines where the cluster is not healthy. In brief, the OOM killer is a subsystem that kills one or more processes with the aim of freeing memory. In the article Linux kernel crash simulation using kdump, I explained how to activate Kdump to generate a vmcore for analysis. However, to get the debug messages from an out of memory error into the vmcore, a sysctl parameter needs to be configured. I will be using a CentOS 7 machine to illustrate the OOM parameters and configurations.

1. To have OOM events captured in the vmcore file, set the parameter vm.panic_on_oom to 1, so that the kernel panics on an out of memory event and kdump can capture it, using the following command:

sysctl -w vm.panic_on_oom=1

To verify that the setting has been applied, you can run sysctl -a | grep -i oom. It is not recommended to test such parameters in a production environment.
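If you only want one value rather than the whole sysctl -a listing, it can be read straight from /proc/sys, since every sysctl key maps to a file path there. Here is a minimal sketch; the function name read_sysctl and the PROC_SYS_ROOT override are my own, added only so the function can be tried against a scratch directory.

```shell
# Read a sysctl key by translating it into its /proc/sys path,
# e.g. vm.panic_on_oom -> /proc/sys/vm/panic_on_oom.
# PROC_SYS_ROOT is a hypothetical override for dry runs; it defaults
# to the real /proc/sys.
read_sysctl() {
    key_path=$(printf '%s' "$1" | tr '.' '/')
    cat "${PROC_SYS_ROOT:-/proc/sys}/$key_path"
}

# Example: read_sysctl vm.panic_on_oom
```

A result of 1 means the kernel will panic on an out of memory event instead of only killing a process.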

2. To find out which process to kill, the kernel calls a function in the kernel code named badness(). The badness() function calculates a numeric value describing how bad each task has been. To be precise, it accumulates “points” for each process running on the machine and returns them to a function called select_bad_process() in the Linux kernel. This eventually starts the OOM mechanism, which kills the processes. The “points” are exposed in /proc/<pid>/oom_score. For example, here I have a server running Java.

In this example, the Java process has PID 2153 and its oom_score is 21.
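The score can be read for any PID with a few lines of shell. This is only a sketch: the function name show_oom_score and the PROC_ROOT override are hypothetical, the latter added so the function can be tried on a scratch directory.

```shell
# Print the OOM score and the user-set adjustment for one PID.
# PROC_ROOT is a hypothetical override for dry runs; it defaults to /proc.
show_oom_score() {
    proc_dir="${PROC_ROOT:-/proc}/$1"
    if [ ! -r "$proc_dir/oom_score" ]; then
        echo "no such process: $1" >&2
        return 1
    fi
    # oom_score_adj may be missing on old kernels, so fall back to "n/a".
    printf 'pid=%s oom_score=%s oom_score_adj=%s\n' \
        "$1" \
        "$(cat "$proc_dir/oom_score")" \
        "$(cat "$proc_dir/oom_score_adj" 2>/dev/null || echo n/a)"
}

# Example: show_oom_score 2153
```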

3. Many factors are taken into account when calculating the badness score. Some of them are the virtual memory size (VM size), the priority of the process (nice value), the total runtime, the running user and the value in /proc/<pid>/oom_adj. You can also set the oom_score_adj value of any PID to a number between -1000 and 1000. The lowest possible value, -1000, is equivalent to disabling OOM killing entirely for that task, since it will always report a badness score of 0.

4. Let’s assume that you want to prevent a specific process from being killed. Writing -17 to its oom_adj file exempts it from the OOM killer:

echo -17 > /proc/$PID/oom_adj

5. If you know the process name of the SSH daemon and do not want it to be killed, use the following command:

pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

6. To keep sshd protected automatically, use a cron entry that runs every minute. The line below is in the /etc/crontab format, which includes the user field:

* * * * * root pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

7. Let's now simulate the OOM killer messages. Use the following command to trigger an out of memory event on the machine:

echo f > /proc/sysrq-trigger

You will notice an OOM error message in /var/log/messages. In this example, the OOM killer scored PID 843 before killing it, and the score was 4.


Before the 'Out of memory' error, the kernel logs a call trace.
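To pull these messages out of the log quickly, a simple grep is enough. A minimal sketch, assuming the log lives in /var/log/messages as on my CentOS machine; the function name find_oom_events is hypothetical.

```shell
# Print every OOM killer line from a log file.
# The default path is the one used in this article; pass another file
# as the first argument if your distribution logs elsewhere.
find_oom_events() {
    grep -i 'out of memory' "${1:-/var/log/messages}"
}

# Example: find_oom_events /var/log/messages
```

grep exits with a non-zero status when nothing matches, which makes the function easy to use in an if statement.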



8. To monitor how the OOM killer scores processes, you can use the dstat command. To install the dstat package on an RPM-based machine, use:

yum install dstat

or, for a Debian-based distribution, use:

apt-get install dstat

Dstat generates resource statistics. To use dstat to monitor the process with the highest OOM score, use:

dstat --top-oom


TIPS:

  • oom_score_adj is used in newer Linux kernels. The deprecated interface, found on older Linux machines, is oom_adj.
  • Disabling the OOM killer may cause a kernel panic when the system is under heavy memory pressure.
  • Making a process immune is not a definitive way of solving the problem, for example with a Java application. Use a thread or heap dump to analyse the situation before making a process immune.
  • Dstat is becoming an alternative to vmstat, netstat, iostat, ifstat and mpstat. For example, to monitor CPU usage together with the top CPU and memory consumers, use dstat -c --top-cpu -dn --top-mem
  • Testing in a production environment should be avoided!
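Since the interface differs between old and new kernels, a small helper can pick whichever file exists. This is a sketch only: the function name protect_from_oom and the PROC_ROOT override are hypothetical, and writing to the real /proc requires root.

```shell
# Exempt one PID from the OOM killer, preferring the newer interface.
# PROC_ROOT is a hypothetical override for dry runs; it defaults to /proc.
protect_from_oom() {
    proc_dir="${PROC_ROOT:-/proc}/$1"
    if [ -e "$proc_dir/oom_score_adj" ]; then
        echo -1000 > "$proc_dir/oom_score_adj"   # newer kernels
    elif [ -e "$proc_dir/oom_adj" ]; then
        echo -17 > "$proc_dir/oom_adj"           # deprecated interface
    else
        echo "no OOM control file for PID $1" >&2
        return 1
    fi
}

# Example (as root): protect_from_oom "$(pgrep -o -f /usr/sbin/sshd)"
```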

Analyzing vmcore with crash

In the article Linux kernel crash simulation using kdump, I gave a brief idea of how to generate a vmcore file during a crash or hang. In this article, I will focus on the analysis of a generated vmcore and on the ‘crash’ tool, which can be used for advanced analysis. In a future article, I will elaborate on how to decode the detailed information given by the crash tool. Let’s see how to use the crash utility first.


1. Install the packages kernel-debuginfo and kernel-debuginfo-common. After the installation, you will notice that a vmlinux file has been created at /usr/lib/debug/lib/modules/2.6.32-573.7.1.el6.centos.plus.i686/vmlinux


yum install kernel-debuginfo kernel-debuginfo-common -y

2. Now, we will launch the crash utility. When no vmcore is given on the command line, as here, it attaches to the running kernel and can be used for live debugging.

crash /usr/lib/debug/lib/modules/2.6.32-573.7.1.el6.centos.plus.i686/vmlinux /boot/System.map-2.6.32-573.7.1.el6.i686

3. To analyse a specific vmcore file, add its location to the command:

crash /usr/lib/debug/lib/modules/2.6.32-573.7.1.el6.centos.plus.i686/vmlinux /boot/System.map-2.6.32-573.7.1.el6.i686 /var/crash/127.0.0.1-2015-10-30-00\:12\:34/vmcore


4. You will get a good deal of information about the kernel. The most interesting part is the message showing what caused the panic. In this case it is a “SysRq”. If you remember, in the last article we fired an echo c > /proc/sysrq-trigger. The state field also indicates that the task was running, with SYSRQ shown.

5. We can also inspect that process from within the crash utility using the PID given.

6. Another interesting command is bt, which shows the backtrace (the call stack) of the process.


7. The sys command gives you an overview of the system. ps | grep ">" shows the processes that were running at the time of the crash. The mount command shows the mounted partitions, and the h command shows the command history.

Tips:

  • A good crash utility manual can be found at people.redhat.com/anderson. Almost all the information is available there.
  • To be able to download the kernel-debuginfo package, you will need to enable the debuginfo repository in /etc/yum.repos.d
  • The kernel version of the machine must match that of the kernel-debuginfo package, otherwise it will not work.
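The version match from the last tip can be checked before launching crash. A minimal sketch, assuming the vmcore comes from the kernel that is currently running (pass another version otherwise); the function names debuginfo_vmlinux and has_matching_debuginfo are hypothetical.

```shell
# Build the vmlinux path that kernel-debuginfo installs for a given
# kernel version. Defaults to the running kernel (uname -r).
debuginfo_vmlinux() {
    version="${1:-$(uname -r)}"
    echo "/usr/lib/debug/lib/modules/$version/vmlinux"
}

# Succeed only when debuginfo for exactly that version is installed.
has_matching_debuginfo() {
    [ -f "$(debuginfo_vmlinux "$1")" ]
}

# Example:
#   has_matching_debuginfo || echo "install kernel-debuginfo for $(uname -r)"
```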

Linux Kernel-4.3 Compilation from source

The Linux Kernel 4.3 has been released today, Monday the 2nd of November 2015. I have compiled it from source on a VirtualBox CentOS 7 minimal install virtual machine for some further testing, reusing my same old configuration file. You can also view the detailed changes and commits on the git repo. Here is a brief idea of how to compile it from source.

Linux Kernel Map – Photo credits Wikipedia

1. You will need to install all the prerequisites if you are on a minimal install.

yum groupinstall "Development Tools"
yum install ncurses-devel bc hmaccalc zlib-devel elfutils-libelf-devel binutils-devel qt-devel

2. Install wget to download the kernel itself.

yum install wget

3. Download and untar the kernel source

wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.3.tar.gz
tar -xvzf linux-4.3.tar.gz

4. Make sure that the decompressed directory is in the /usr/src/kernels directory. If you have extracted it somewhere else, move the linux-4.3 directory into /usr/src/kernels

5. Choose your default kernel configuration options

make menuconfig

6. To use the old config file

make oldconfig

7. Compiling the kernel

make

8. Installing the kernel

make modules_install install

Tips:

  • Be sure to get rid of old kernel files in the /boot directory when there are too many, to avoid confusion.
  • You can also use the command make olddefconfig to set new options to their default values without being prompted.
  • To set a different boot entry, use the command sudo grub2-set-default 0, where 0 is the first entry in the boot menu.
  • The make command usually takes a lot of time. If you have 4 vCPUs, you can run make -j 4, where j stands for jobs and 4 uses all 4 CPUs.
  • uname -r allows you to find your kernel version. For example, uname -r gives me 3.19.0-25-generic, i.e. the number 3 is the major version, 19 is the minor version and 0 is the revision number.
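The version breakdown from the last tip can be done with plain shell parameter expansion. This is a sketch under the assumption that the release string looks like major.minor.revision followed by an optional "-suffix"; the function name split_kernel_version is hypothetical.

```shell
# Split a kernel release string into major, minor and revision.
# Defaults to the running kernel (uname -r).
split_kernel_version() {
    release="${1:-$(uname -r)}"
    base=${release%%-*}        # drop everything after the first "-"
    major=${base%%.*}
    rest=${base#*.}
    minor=${rest%%.*}
    case "$rest" in
        *.*) revision=${rest#*.}; revision=${revision%%.*} ;;
        *)   revision=0 ;;     # e.g. "4.3" has no revision part
    esac
    printf 'major=%s minor=%s revision=%s\n' "$major" "$minor" "$revision"
}

# Example: split_kernel_version 3.19.0-25-generic
```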

Linux Kernel crash simulation using Kdump

There are several reasons for a Linux kernel crash, including hangs and hardware or software errors. We usually refer to both a “kernel hang” and a “kernel crash” as just a ‘crash’. In fact, these are two totally different issues: a “hang” occurs due to a time-consuming operation, whilst a “crash” occurs instantaneously and leads to a reboot. However, during the crash process and prior to the reboot, the kernel will log “oops” messages.

In this article, I will focus on the installation of the tools for analyzing a Linux kernel crash. I will elaborate more on Linux kernel errors in a future article. Right now, we will look at the installation of Kdump (kernel dump), a Linux kernel dumping mechanism which uses the ‘kexec’ mechanism to collect a ‘dump’ of the Linux kernel called a “vmcore” (virtual memory core). Whatever happened at the time of the crash is recorded in the “vmcore” for future analysis.


“Kdump uses kexec to quickly boot to a dump-capture kernel whenever a dump of the system kernel’s memory needs to be taken (for example, when the system panics). The system kernel’s memory image is preserved across the reboot and is accessible to the dump-capture kernel.” (Kernel.org)

Follow the steps below:

1. On both CentOS 6 and 7, you will need to install the kexec package using the command yum install kexec-tools

2. Open /boot/grub/grub.conf with vim and, for the kernel you are actually running, replace the parameter crashkernel=auto with crashkernel=128M (I tested it on a virtual machine with 1024 MB of RAM)

3. Start the Kdump service using the command service kdump start

4. Save this parameter and verify it using the command cat /proc/cmdline. After a reboot, the output should contain crashkernel=128M


5. You will notice that Kdump has the following configuration files, listed by the command rpm -qc kexec-tools

  • /etc/kdump.conf
  • /etc/rc.d/init.d/kdump
  • /etc/sysconfig/kdump
  • /etc/udev/rules.d/98-kexec.rules

6. You can also choose the location where the vmcore is saved. By default, it will be saved in /var/crash/. However, if your /var directory is on a different partition with low disk space, you can choose exactly where the vmcore is written by modifying the parameter path /var/crash in the /etc/kdump.conf file.

7. After the modification, you will need to restart the kdump service using the command service kdump restart.

8. Now the last step is to crash the machine, thus creating a vmcore. Use the command echo c > /proc/sysrq-trigger. You will notice that this takes some time and that the server reboots by itself. A crash simulation has been done.

9. After the reboot, you will notice that a vmcore file has been created in the /var/crash directory.


10. The size of the vmcore depends on the consequences of the crash. In this simulation it is just 19M. It also depends on the kernel activity at the time of the crash.
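To locate the generated dumps after the reboot, the crash directory can be searched for files named vmcore. A minimal sketch, assuming the default /var/crash location from step 6; the function name list_vmcores is hypothetical.

```shell
# List every vmcore file below the crash directory, one path per line.
# Pass another directory if the path in /etc/kdump.conf was changed.
list_vmcores() {
    find "${1:-/var/crash}" -type f -name vmcore | sort
}

# Example: list_vmcores | xargs -r du -h
```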

Tips:

  • You can also specify crashkernel=auto on a 64-bit machine. However, you can calculate the value as follows:
  • If your RAM is greater than 0 GB and less than 2 GB, use 128 MB
  • If your RAM is greater than 2 GB and less than 6 GB, use 256 MB
  • If your RAM is greater than 6 GB and less than 8 GB, use 512 MB, and so on
  • You can also test with less than 128 MB. It may work, but reliability and consistency are not guaranteed
  • If the kdump service does not start after a fresh installation, you might need to reboot your machine.
  • Since you have allocated a portion of the memory to kdump, you might need to reboot your machine again and check it with free -m
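The sizing rule from the tips can be written as a small function. This is a sketch only: the function name crashkernel_for_ram is hypothetical, treating each upper bound as inclusive is my own assumption (the tips do not say what to do at exactly 2, 6 or 8 GB), and nothing is returned above 8 GB because no value is listed there.

```shell
# Suggest a crashkernel= value from the RAM size in MB, following the
# ranges in the tips. Upper bounds are treated as inclusive (assumption).
crashkernel_for_ram() {
    ram_mb=$1
    if [ "$ram_mb" -le 0 ]; then
        echo "RAM size must be positive" >&2
        return 1
    elif [ "$ram_mb" -le 2048 ]; then
        echo "crashkernel=128M"
    elif [ "$ram_mb" -le 6144 ]; then
        echo "crashkernel=256M"
    elif [ "$ram_mb" -le 8192 ]; then
        echo "crashkernel=512M"
    else
        echo "no value listed above 8 GB; consider crashkernel=auto" >&2
        return 1
    fi
}

# Example: crashkernel_for_ram 1024
```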

Repair your Kernel Panic with Dracut

If you have encountered a kernel panic, which usually happens after a major change in the Linux system, you can follow this procedure to rebuild the kernel files with the Dracut tools.

  1. Boot the server in rescue mode, or simply from a live CD or ISO.
  2. To boot the server in rescue mode, log in to the vSphere interface and look for a live CD. In case of a kernel panic on your own machine, you can boot it with a live CD.
  3. Once booted, create a directory under /mnt
    mkdir /mnt/sysimage
  4. Use fdisk -l to find where /boot is. You can also create other directories in /mnt to mount different partitions. [sysimage is just a name given]
  5. Mount the root disk on sysimage, then mount the boot partition inside it. In my case sda1 is the boot partition
    mount /dev/sda2 /mnt/sysimage
    
    mount /dev/sda1 /mnt/sysimage/boot
  6. Once the disks are mounted, mount the proc, dev and sys folders. Here are the commands:
    mount --bind /proc /mnt/sysimage/proc
    
    mount --bind /dev /mnt/sysimage/dev
    
    mount --bind /sys /mnt/sysimage/sys
  7. After the mount operations have been carried out, you need to enter the directory by chrooting into it.
    chroot /mnt/sysimage
  8. You are now working inside the sysimage directory.
  9. You can back up /boot to another location and use the dracut command to regenerate the initramfs file. An example is as follows:
    dracut -f /boot/initramfs-2.6.32-358.el6.x86_64.img 2.6.32-358.el6.x86_64
  10. You can unmount all partitions and/or simply reboot the machine.
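Because the dracut command takes the image path and the bare kernel version as two separate arguments, it is easy to mix them up. The sketch below only prints the command for a given version so it can be reviewed before running it inside the chroot; the function name dracut_rebuild_cmd is hypothetical, and the /boot/initramfs-<version>.img naming follows the example in step 9.

```shell
# Print the dracut command for one kernel version.
# Accepts either the bare version or an initramfs file name/path and
# strips the directory, the "initramfs-" prefix and the ".img" suffix.
dracut_rebuild_cmd() {
    version=${1##*/}
    version=${version%.img}
    version=${version#initramfs-}
    echo "dracut -f /boot/initramfs-$version.img $version"
}

# Example: dracut_rebuild_cmd 2.6.32-358.el6.x86_64
```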
 

 

Tips:

  • On vCenter, you may need to force the BIOS screen to appear and go through the BIOS interface first before being able to boot from the ISO.
  • You may also use the Finnix ISO, which is usually compatible with all UNIX systems.
  • When firing the dracut command, make sure the last argument is only the kernel version with the architecture. Do not add the .img file extension to it, otherwise it won’t work (Step 9).
  • The last part, ‘2.6.32-358.el6.x86_64’, is the kernel version for which the initramfs needs to be regenerated (Step 9).
  • To know which kernel version your machine is actually using, go into the grub folder and look at grub.conf. The first entry is usually the kernel used by default.
  • Sometimes you need to try with an ISO of the same OS version. It may happen that, after you have booted your machine with a live CD, the ISO you used does not detect your disk or the datastore. You may then wrongly think, for example, that the disk is bad or that there is a problem in the SAN.
  • However, without doing a root cause analysis, you cannot be certain that repairing the initrd is the right fix for the kernel panic. There are circumstances where a mounted NFS share whose version differs from that of the machine can result in a kernel panic. The Dracut solution is not a definitive solution.
  • Always investigate the dmesg log if possible, or the crash dump if one has been set up.