Repair your Kernel Panic with Dracut

If you have encountered a kernel panic, which usually happens after a major change to a Linux system, you can follow this procedure to rebuild the kernel's initramfs file with the dracut tool.

  1. Boot the server in rescue mode, or simply from a live CD or ISO.
  2. To boot a virtual server in rescue mode, log in to the vSphere interface and attach a live CD. If the kernel panic is on your own machine, boot it from a live CD.
  3. Once booted, create a directory in the folder /mnt
    mkdir /mnt/sysimage
  4. Use fdisk -l to find the partition that holds /boot. You can also create other directories in /mnt to mount different partitions. [sysimage is just a name I chose]
  5. Mount the root partition on sysimage, then mount the boot partition inside it. In my case sda2 is the root partition and sda1 is the boot partition.
    mount /dev/sda2 /mnt/sysimage
    
    mount /dev/sda1 /mnt/sysimage/boot
  6. Once the disks are mounted, bind-mount the /proc, /dev and /sys directories. Here are the commands:
    mount --bind /proc /mnt/sysimage/proc
    
    mount --bind /dev /mnt/sysimage/dev
    
    mount --bind /sys /mnt/sysimage/sys
  7. After the mount operations have been carried out, access the installed system by chrooting into it.
    chroot /mnt/sysimage
  8. You are now working inside sysimage as if it were the root of the machine.
  9. You can back up /boot to another location, then use the dracut command to regenerate the initramfs file. An example is as follows:
    dracut -f /boot/initramfs-2.6.32-358.el6.x86_64.img 2.6.32-358.el6.x86_64
  10. Exit the chroot; you can then umount all partitions and/or simply reboot the machine.
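
The steps above can be collected into one sequence. This is only a sketch: the function name print_rescue_commands is made up for illustration, and the device names and kernel version are the ones from my example, so replace them with what fdisk -l and grub.conf show on your machine. It only prints the commands, so you can review them before running anything as root from the live CD.

```shell
#!/bin/sh
# Print (do not run) the rescue sequence for a given root partition,
# boot partition and kernel version. All three arguments are assumptions
# taken from the example in this article.
print_rescue_commands() {
    root_part=$1
    boot_part=$2
    kver=$3
    target=/mnt/sysimage

    echo "mkdir -p $target"
    echo "mount $root_part $target"
    echo "mount $boot_part $target/boot"
    for fs in proc dev sys; do
        echo "mount --bind /$fs $target/$fs"
    done
    # dracut takes the image path first, then the bare kernel version
    echo "chroot $target dracut -f /boot/initramfs-$kver.img $kver"
    # unmount in reverse order of mounting
    for fs in sys dev proc; do
        echo "umount $target/$fs"
    done
    echo "umount $target/boot"
    echo "umount $target"
}

rescue_plan=$(print_rescue_commands /dev/sda2 /dev/sda1 2.6.32-358.el6.x86_64)
echo "$rescue_plan"
```

Printing the plan first is a deliberate choice: on a rescue system a wrong device name is easy to type and hard to undo.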
 

 

Tips:

  • On vCenter, you may need to force the BIOS screen to appear and go through the BIOS interface before you can boot from the ISO.
  • You may also use the Finnix ISO, which is usually compatible with all UNIX systems.
  • When running the dracut command, make sure the last argument is only the kernel version with the architecture. Do not add the .img file extension there, otherwise it won’t work (step 9).
  • The last part, ‘2.6.32-358.el6.x86_64’, is simply the version of the kernel whose initramfs needs to be regenerated (step 9).
  • To know which kernel version your machine actually uses, look in the grub folder for grub.conf. The first entry is usually the kernel used by default.
  • Sometimes the ISO you booted from does not detect your disk or the datastore, and you may wrongly think that the disk is bad or that there is a problem on the SAN. In that case, try again with an ISO of the same version as the installed OS.
  • Without a root cause analysis, you cannot be certain that repairing the initrd is the right fix for the kernel panic. There are other circumstances, such as a mounted NFS share whose version does not match the machine's, that can result in a kernel panic. The dracut approach is not a definitive solution.
  • Always investigate the dmesg log if possible, or the crash dump if one has been set up.
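
Since the most common mistake in step 9 is passing the .img name where the kernel version is expected, here is a small sketch that derives one from the other. The helper name kver_from_image is hypothetical, and it assumes the usual initramfs-VERSION.img naming.

```shell
#!/bin/sh
# Derive the bare kernel version from an initramfs file name.
# kver_from_image is a hypothetical helper name used for illustration.
kver_from_image() {
    name=${1##*/}            # drop the directory part
    name=${name#initramfs-}  # drop the "initramfs-" prefix
    echo "${name%.img}"      # drop the ".img" extension
}

image=/boot/initramfs-2.6.32-358.el6.x86_64.img
kver=$(kver_from_image "$image")

# Show the resulting dracut command without running it
echo "dracut -f $image $kver"
```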

Managing LVM with PVMOVE – Part 2

After a short introduction to LVM in the article Managing LVM with PVMOVE – Part 1, it's time to get into the details of the pvmove command. I will elaborate on the pvmove operation here, based on the scenario and constraints described in part 1.

Before any operation, do your pre-check tasks: verify the state of the server, the URLs, the services and applications running, the ports they are listening on, and so on. This lets you handle complicated tasks at both the system and the application level. For the pvmove operation specifically, I would recommend running the vgs, pvs and lvs commands as well as fdisk -l to check the state of the physical disks. Do a df -h and, if possible, an lsblk to list all block devices for future reference. On CentOS / RedHat you need the util-linux package installed to be able to use lsblk, which is very useful.
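
One way to keep those pre-check outputs for later comparison is to dump each command into its own file. This is a minimal sketch, not a complete pre-check: the temporary directory and the file naming are my own assumptions, and without root the LVM commands will only record an error message.

```shell
#!/bin/sh
# Save the output of each pre-check command to its own file so it can be
# compared after the pvmove. The snapshot directory is a temporary one.
snapshot_dir=$(mktemp -d)

for cmd in "pvs" "vgs" "lvs" "fdisk -l" "df -h" "lsblk"; do
    tool=${cmd%% *}                     # first word is the program name
    out="$snapshot_dir/$tool.txt"
    if command -v "$tool" >/dev/null 2>&1; then
        # word splitting of $cmd is intended here (program + options)
        $cmd > "$out" 2>&1 < /dev/null || echo "$cmd failed (are you root?)" >> "$out"
    else
        echo "$tool is not installed" > "$out"
    fi
done

echo "Pre-check snapshot saved in $snapshot_dir"
```

After the operation, running the same loop into a second directory and comparing the two with diff -r shows what changed.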

 

Screenshot from 2015-09-30 21:07:28

Let's say we have a disk of 100G [let's call it sdb] and we have to replace it with another disk of 20G [sdc]. We assume that 10G of data is in use out of the 100G, which looks reasonable for the 20G hard disk that we plan to move to. On sdb we have 2 LVs: lvsql, which is used by MySQL and holds the database, and lvhome, which handles the rest of the server applications. See the illustration in the diagram on the right. Both LVs are in the same VG, called VGNAME.

So you might ask yourself why a df -h on your machine shows a total size of 15G for lvsql and 80G for lvhome, making a total of 95G, when the hard disk is 100G [as described in the diagram above]. Are 5G missing? The answer is no. When you run pvdisplay, you will see that the disk is divided into physical extents: “PE Size” is the size of one extent (4MB in the screenshot on the left), and “Free PE” is the number of extents that are not allocated to any LV. The 5G are sitting in those free extents.

That space is still in the VG and simply has not been allocated to an LV. It can be kept for a specific purpose, for example in our situation a backup of the MySQL database or some other process. It's important to keep some additional space there. So before you start, run pvdisplay, lvdisplay and vgdisplay as well, which are important. We now have our sde hard disk as described in the picture below.

Screenshot from 2015-09-30 21:05:13
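
To make the arithmetic concrete, here is a small sketch using the sizes from this example. It assumes the 4MB PE Size from the screenshot and ignores the small area LVM keeps for its own metadata, so the Free PE figure that pvdisplay reports on a real disk will be slightly lower.

```shell
#!/bin/sh
# Sizes in GiB, taken from the example in this article.
disk_gib=100
lvsql_gib=15
lvhome_gib=80
pe_size_mib=4     # "PE Size" as reported by pvdisplay

# Space left in the VG once both LVs are accounted for
unallocated_gib=$((disk_gib - lvsql_gib - lvhome_gib))
# The same space expressed as a number of extents
free_pe=$((unallocated_gib * 1024 / pe_size_mib))

echo "Unallocated space: ${unallocated_gib}G"        # Unallocated space: 5G
echo "Free PE at ${pe_size_mib}M per extent: $free_pe"   # Free PE at 4M per extent: 1280
```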

How to start? It's up to you to decide which LV gets more space, as you have control over the VG. You can also add another physical disk, extend vgname and resize lvsql, since a database usually grows in size.

 

 

 

Do your pre-check tasks as described.

  1. Stop all running applications, such as MySQL and Atop in our case. You can use the lsof command to check whether processes and applications are still writing to the disk.
  2. Once all applications have been stopped, run the mount command to check which partitions are mounted, and confirm that no application is writing to the disk – use lsof or fuser.
  3. Comment out the two lines related to the 2 LVs in the /etc/fstab file. Usually these are the lines for /dev/vgname/lvsql and /dev/vgname/lvhome. You can now umount the partitions. You will notice that a partition cannot be unmounted while an application is writing to it. Another issue is that if you have ssh running on the machine, you need to stop it! Then how do you get on the machine? Use the console from vSphere or, if it's a physical machine, boot it with a live CD.
  4. Next, do a lvreduce of the LV where its used size allows it. In our case 5GB is used out of 80, so run lvreduce -L 7G --resizefs /dev/vgname/lvhome. This makes the pvmove faster: the bigger the LV, the more time it takes.
  5. Once every LV has been reduced as much as possible, add the disk sdc. Make sure it gets detected if you are on VMware. Use the fdisk command and compare with the output from your pre-check.
  6. Now create a PV from sdc. The command is pvcreate /dev/sdc.
  7. After the PV has been created, extend the VG called vgname, which so far sits on the old disk (sdb), with the disk sdc you just added. The command is vgextend vgname /dev/sdc
  8. Now the magic starts: run pvmove /dev/sdb /dev/sdc – this moves the allocated PEs of vgname from the hard disk sdb to sdc.
  9. When the pvmove is completed, you just need to do a vgreduce vgname /dev/sdb. The vgreduce takes the old disk out of the VG, which is possible because no extents are left on it. You can now remove the old disk.
  10. Since you reduced lvhome in step 4 to accelerate the pvmove, you will notice that lvhome is at 7GB instead of 10G. The next step is lvresize -l +100%FREE /dev/vgname/lvhome. You will notice that the PE Size is intact, as we did not resize lvsql.
  11. You can now do a /sbin/resize2fs /dev/vgname/lvhome. Uncomment the lines in fstab, run mount -av and restart your applications.
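
For reference, the LVM part of the procedure above boils down to a handful of commands. The sketch below only prints them: print_pvmove_plan is a hypothetical helper name, and the VG, LV and device names are the ones used in this article, so adapt them before running anything for real.

```shell
#!/bin/sh
# Print (do not run) the LVM commands for moving a VG from one disk to
# another. print_pvmove_plan is a hypothetical name; the VG, LV and
# device names are the ones used in this article.
print_pvmove_plan() {
    vg=$1
    lv=$2
    old_disk=$3
    new_disk=$4
    reduced_size=$5

    echo "lvreduce -L $reduced_size --resizefs /dev/$vg/$lv"  # step 4
    echo "pvcreate $new_disk"                                 # step 6
    echo "vgextend $vg $new_disk"                             # step 7
    echo "pvmove $old_disk $new_disk"                         # step 8
    echo "vgreduce $vg $old_disk"                             # step 9
    echo "lvresize -l +100%FREE /dev/$vg/$lv"                 # step 10
    echo "resize2fs /dev/$vg/$lv"                             # step 11
}

pvmove_plan=$(print_pvmove_plan vgname lvhome /dev/sdb /dev/sdc 7G)
echo "$pvmove_plan"
```

The stop, umount and fstab steps are left out on purpose, since they depend on which applications run on your server.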

Congrats, you just did a pvmove. Leave a comment below so that we can clear up any ambiguities.


Managing LVM with PVMOVE – Part 1

One of the challenging issues that I have encountered is the manipulation of LVM – Logical Volume Management – on virtual servers. Whilst writing this article, I noticed that I had to split it into parts as it was getting bulky. Once you have understood the logic of LVM, things get pretty easy to deal with. I will elaborate on some details of LVM, and will get into some brief real-life experience that is important to grasp. Let's take the example of a disk where some applications are running, like MySQL, Apache and some monitoring tools like Atop and Htop, all writing to the disk, and we have to shrink that very disk or replace it with another disk. Let's also assume that the server is running on an ESX host and the operation will be carried out through VMware vCenter. How do you shrink a disk whose applications are generating I/O? How do you replace a disk with a smaller one when its data is on an LVM?

In brief, here is what I have understood about LVM – Logical Volume Management.

We have Physical Volumes (PV), Volume Groups (VG) and Logical Volumes (LV). These terms are a must to understand before proceeding with Logical Volume Management operations.

PV – These are hard disks or even hard disk partitions. These PVs have logical unit numbers. Each PV is composed of chunks called PEs (Physical Extents).

VG – The Volume Group is the highest level of abstraction used within LVM. Does this term look complicated? I would say no. In the field of software engineering, two terms are commonly used: modelling and meta-modelling. If you are not familiar with software engineering, just understand it like this: modelling means one step or one level removed from reality, whilst meta-modelling means modelling at a higher level of logic. So basically, some sort of higher-level modelling happens at the level of the VG.

LV – The Logical Volume is a product of the VG. LVs are disks which are not restricted to the physical disk size, and they can be resized or even moved easily without having to stop your application or unmount your file system. Of course, I need to do more research to give a deeper explanation of the PV, PE, VG and LV. Let's now see an explanation through my horrible drawing skills.

Screenshot from 2015-09-29 21:04:46

From the diagram we conclude that:

  • PVs look like hard disks divided into chunks called PEs.
  • The VGs are just a high level of abstraction that should be looked at from above.
  • VGs are created by combining PVs.

If you have access to a Linux machine with LVM configured and some VGs already created, you can start running these commands to get an overview of the PVs, VGs and LVs:

  • pvs – Information about physical volumes
  • vgs – Information about volume groups
  • lvs – Information about logical volumes

Those physical disks can be picked from the datastores, where RAID is configured. This acts as another layer of security against disk failures or loss of data.

Screenshot from 2015-09-29 21:37:28

On Linux, if you type vg, lv or pv and press tab twice, you will get an idea of all the commands that exist and the manipulations that are possible. In part 2 of this article I will take the example of the pvmove command and the actions that can be taken to minimise impact before carrying out a pvmove operation.

Part2 of the article is on this link

 


Managing and Analysing disk usage on Linux

Disk usage management and analysis on servers can be extremely annoying, even disastrous, if they are not carried out regularly. However, you can also brew some scripts to sort out the mess. In large companies, monitoring systems are usually set up to handle the situation, especially where there are several servers with sizeable partitions allocated for, let's say, the /var/log directory. An inventory tool such as OCS Inventory can also be used to monitor a large number of servers.

diskusage

This blog post will serve as my own notebook to remember the commands used for managing disk usage. You can send me some tips and tricks and I will update the article here 🙂

 

 

 

 

 

Managing disk space with ‘find’ command

1. Find the total size of the files under the current directory that were last modified more than 1000 days ago.

find . -type f -mtime +1000 -exec du -csh {} + | grep 'total$'

2. Find files of more than 50M in the / partition and list them in long, human-readable format.

find / -xdev -type f -size +50M -exec ls -lh '{}' ';' 

3. Find in /tmp every file or directory whose name starts with develop and whose mtime is more than 1 day old, and delete it.

find /tmp -name "develop*" -mtime +1 -exec rm -rf {} \; 

4. Save the list of directories directly under /home to the file /tmp/uniqDirectory, then print each directory followed by the number of entries it contains.

find /home -mindepth 1 -maxdepth 1 -type d > /tmp/uniqDirectory && while IFS= read -r i; do echo "$i"; ls -A "$i" | wc -l; done < /tmp/uniqDirectory

5. Find all files under /tmp with the extension .sh or .jar and calculate their total size.

find /tmp -type f \( -iname "*.sh" -o -iname "*.jar" \) -exec du -csh {} + | grep 'total$'

6. Find all files in /tmp, check which ones are not being used by any process, and delete those.

find /tmp -type f | while IFS= read -r file ; do fuser -s "$file" || rm -f "$file" ; done
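
Before letting find delete anything, it helps to print the candidates first and delete only after reviewing the list. The sketch below shows that two-pass habit on a scratch directory created with mktemp, so nothing real is touched. The file names are made up for the demonstration, and touch -d assumes GNU coreutils.

```shell
#!/bin/sh
# Demonstrate "list first, delete second" on a throwaway directory.
workdir=$(mktemp -d)
touch -d '3 days ago' "$workdir/develop-old.log"   # old and matching
touch "$workdir/develop-new.log"                   # matching but recent
touch -d '3 days ago' "$workdir/other-old.log"     # old but not matching

# Pass 1: only print what would be deleted
candidates=$(find "$workdir" -type f -name 'develop*' -mtime +1 -print)
echo "Would delete: $candidates"

# Pass 2: same expression, with -delete instead of -print
find "$workdir" -type f -name 'develop*' -mtime +1 -delete

remaining=$(find "$workdir" -type f | wc -l)
echo "Files left: $remaining"
```

Keeping the two find expressions identical apart from the final action is the point: what you reviewed is exactly what gets removed.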

Another interesting issue that you might encounter is a sudden increase in log size caused by an application failure, for example a sudden increase in the binary logs generated by MySQL, or a core dump being generated!

Let's say we have the crash package installed on a server. A core dump is generated for analysis of why the application crashed. This is sometimes annoying, as you cannot predict when an application is going to fail, especially if many developers and system administrators are working on the same platform. I would suggest a script which sends a mail to a particular user whenever a core dump has been generated. I placed a script here on GitHub to handle such a situation.
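
The detection part of such a script can be sketched as follows. This is not the script from GitHub, only an illustration of the idea: recent_core_files is a hypothetical helper, the core* naming pattern is an assumption (check kernel.core_pattern on your system), and the mail address in the comment is a placeholder. It runs against a scratch directory rather than a real dump location.

```shell
#!/bin/sh
# List core files modified within the last N minutes in a directory.
# recent_core_files is a hypothetical helper, and the "core*" naming
# pattern is an assumption: check kernel.core_pattern on your system.
recent_core_files() {
    find "$1" -type f -name 'core*' -mmin "-$2"
}

# Demonstration on a scratch directory instead of a real dump location.
dump_dir=$(mktemp -d)
touch "$dump_dir/core.1234"                     # fresh dump
touch -d '2 hours ago' "$dump_dir/core.999"     # old dump, ignored

new_cores=$(recent_core_files "$dump_dir" 10)
if [ -n "$new_cores" ]; then
    # With a configured mail command this could be sent instead, e.g.
    #   echo "$new_cores" | mail -s "core dump on $(hostname)" admin@example.com
    echo "New core dump(s): $new_cores"
fi
```

Run from cron every 10 minutes with the same window, a check like this reports each new dump once.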

Naturally, log rotation is of great help, as are crons to purge certain temporary logs. The “du” command is helpful, but when you need to pick files according to particular criteria, you will need the find command.

Tips:

  • You should be extremely careful when deleting files with the find command. Imagine deleting a replication log file that is being used by an Oracle Database Server. This would be disastrous.
  • Also make sure that you look at the content of a log before deleting it, as any file can be named *log or *_log.

Deploying WordPress labs on Virtual Box

Building miniature virtual labs on VirtualBox is fascinating most of the time, especially when you have to troubleshoot between the virtual servers within a network environment. However, there are the usual bugs that I have to deal with. From what I have noticed, NATNETWORK behaves differently from NAT on VirtualBox; this can be seen in the official documentation.

However, I have noticed that in both cases you are provided with a virtual router within VirtualBox. With NAT, you are NOT able to ping between two VMs unless you have established a tunnel. With NATNETWORK, you can choose a range of IPs handed out dynamically through the DHCP functionality of VirtualBox, and you can ping the outside world as well as other VMs on the same NATNETWORK. I have noticed that this works only on the newer versions; on older ones NAT and NATNETWORK work almost the same way. There is still some inconsistency as to whether ‘NatNetwork’ is the real name that should have been used!!

Screenshot from 2015-09-27 00:48:18

I have installed CentOS [minimal install] on my first lab. Here are the steps for building the web server.

  1. yum install httpd wget mysql-server php php-mysql php-gd nmap traceroute w3m vim
  2. wget https://wordpress.org/latest.tar.gz
  3. tar -xzf latest.tar.gz && cp -r wordpress /var/www
  4. chown -R apache:apache /var/www/wordpress
  5. vi /etc/httpd/conf.d/myweb.conf 

Create the vhost with the following values:

    <VirtualHost *:80>
    DocumentRoot /var/www/wordpress
    ServerName www.myweb.com
    ServerAlias myweb.com
    <Directory /var/www/wordpress>
    Options FollowSymlinks
    Allow from all
    </Directory>
    ErrorLog /var/log/httpd/wordpress-error-log
    CustomLog /var/log/httpd/wordpress-access-log common
    </VirtualHost>

Time to create the Database

  1. mysql -u root -p  [mysqld service should be started first]
  2. CREATE DATABASE mydb;
  3. CREATE USER [email protected];
  4. SET PASSWORD FOR [email protected] = PASSWORD('mypassword');
  5. GRANT ALL PRIVILEGES ON mydb.* TO [email protected] IDENTIFIED BY 'mypassword';
  6. FLUSH PRIVILEGES;

Exit MySQL and proceed with the following instructions.

  1. mv /var/www/wordpress/wp-config-sample.php /var/www/wordpress/wp-config.php
  2. vi /var/www/wordpress/wp-config.php and modify the database name, username, password and hostname
  3. vi /etc/hosts and add myweb.com and www.myweb.com so that they resolve to localhost
  4. service mysqld start, then service httpd start (or service httpd graceful if httpd is already running)
  5. w3m www.myweb.com and register on WordPress. The website is up.
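
If you prefer not to edit wp-config.php by hand, the database values can be filled in with sed. This is a minimal sketch on a scratch copy: the placeholder strings are the ones shipped in the WordPress sample config, while myuser is a made-up user name for illustration, so use the user you created in MySQL. sed -i assumes GNU sed.

```shell
#!/bin/sh
# Fill in the database settings of a wp-config file with sed.
# The placeholder strings are the ones shipped in WordPress's sample
# config; "myuser" is a made-up user name for illustration.
demo_dir=$(mktemp -d)
cat > "$demo_dir/wp-config.php" <<'EOF'
define( 'DB_NAME', 'database_name_here' );
define( 'DB_USER', 'username_here' );
define( 'DB_PASSWORD', 'password_here' );
define( 'DB_HOST', 'localhost' );
EOF

# Replace each placeholder in place
sed -i \
    -e 's/database_name_here/mydb/' \
    -e 's/username_here/myuser/' \
    -e 's/password_here/mypassword/' \
    "$demo_dir/wp-config.php"

cat "$demo_dir/wp-config.php"
```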

Setting up the SSL

  1. For SSL activation [https], run yum install openssl mod_ssl
  2. openssl genrsa -out ca.key 2048 [to generate the private key]
  3. openssl req -new -key ca.key -out ca.csr [to generate the .csr]
  4. openssl x509 -req -days 365 -in ca.csr -signkey ca.key -out ca.crt [to generate a self-signed certificate]
  5. cp ca.crt /etc/pki/tls/certs
  6. cp ca.key /etc/pki/tls/private/ca.key
  7. cp ca.csr /etc/pki/tls/private/ca.csr
  8. vi /etc/httpd/conf.d/myweb.conf and add another vhost with the following values
    <VirtualHost *:443>
    SSLEngine on
    SSLCertificateFile /etc/pki/tls/certs/ca.crt
    SSLCertificateKeyFile /etc/pki/tls/private/ca.key
    DocumentRoot /var/www/wordpress
    ServerName www.myweb.com
    ServerAlias myweb.com
    <Directory /var/www/wordpress>
    Options FollowSymlinks
    Allow from all
    </Directory>
    ErrorLog /var/log/httpd/wordpress-error-log
    CustomLog /var/log/httpd/wordpress-access-log common
    </VirtualHost>
  9. service httpd graceful and the website is up on https
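
The openssl req step normally asks for the certificate fields interactively. For a lab, the three openssl commands can be run without prompts by passing the subject on the command line. This is a sketch in a scratch directory: the -subj value is my assumption based on the ServerName used above, and the files would still need to be copied to /etc/pki/tls as in steps 5 to 7.

```shell
#!/bin/sh
# Generate a key, a CSR and a self-signed certificate without prompts.
# The -subj value is an assumption based on the ServerName used above.
cert_dir=$(mktemp -d)

openssl genrsa -out "$cert_dir/ca.key" 2048 2>/dev/null
openssl req -new -key "$cert_dir/ca.key" -out "$cert_dir/ca.csr" \
    -subj "/CN=www.myweb.com"
openssl x509 -req -days 365 -in "$cert_dir/ca.csr" \
    -signkey "$cert_dir/ca.key" -out "$cert_dir/ca.crt" 2>/dev/null

# Show who the certificate was issued to and when it expires
cert_info=$(openssl x509 -in "$cert_dir/ca.crt" -noout -subject -enddate)
echo "$cert_info"
```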

To make the website accessible from any host on the same NATNETWORK, edit /etc/hosts on that host and add the line 10.0.2.4 myweb.com


Now that two servers are configured the same way, you can add another server as a load balancer in front of them. What is most interesting is that end users (hosts) will only know about the load balancing server. I have achieved this by installing Pound on the server used as load balancer. This means that end users [hosts] will access the load balancing server, which will in turn decide upon master/slave priorities. Pound turns server3 into a reverse proxy load balancing server. The aim is to take http/s requests from the hosts and forward them to server 1 or 2 according to the configuration.

Based on this article, a new Bash project is being brewed on GitHub to automate the installation of WordPress, Apache, MySQL and all the applications specified. This project should enable anyone to deploy a website through the script.