Tag: Linux Performance

An agentless server inventory with Ansible & Ansible-CMDB

Building an agentless inventory system for Linux servers from scratch is a very time-consuming task. To get precise information about your server inventory, Ansible comes in very handy, especially if you are restricted from installing an agent on the servers. However, there are some pieces of information that Ansible’s default inventory mechanism cannot retrieve. In that case, a Playbook needs to be created to fetch them; examples are the VMware Tools version and other application versions which you might want to include in your inventory system. Since Ansible makes it easy to create JSON files, the output can easily be manipulated for other interesting tasks, say an HTML static page. I would recommend Ansible-CMDB, which is very handy for such conversion. Ansible-CMDB allows you to create a pure HTML file based on the JSON files generated by Ansible. It is another amazing tool created by Ferry Boender.


Photo credits: Ansible.com

Let’s have a look at how the agentless server inventory with Ansible and Ansible-CMDB works. It’s important to understand the prerequisites before installing Ansible. Here are other articles I have published on Ansible:

Ansible Basics and Pre-requisites

1. In this article, you will get an overview of what the Ansible inventory is capable of. Start by gathering the information that you will need for your inventory system; the goal is to make a plan first.

2. As explained in the article Getting started with Ansible deployment, you have to define a group and record the names of your servers (which can be resolved through the hosts file or a DNS server) or their IPs. Let’s assume that the name of the group is “test”.

3. Launch the following command to see a JSON output describing the inventory of the machines. As you will notice, Ansible fetches all the data.


ansible -m setup test

4. You can also write the output to a specific directory for future use with Ansible-cmdb. I would advise creating a dedicated directory (I created /home/Ansible-Workdesk) so there is no confusion about where the files are written.

ansible -m setup --tree out/ test

5. At this point, you will have several files created in a tree format, i.e., one file per server, named after the server and containing JSON information about that server’s inventory.
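As a quick sketch (assuming two hosts named host1 and host2 in the “test” group; your file names will match your host names), each host gets its own JSON file under out/:

ls out/
# host1  host2
python -m json.tool out/host1 | less   # pretty-print one host's facts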

Getting Hands-on with Ansible-cmdb

6. Now, you will have to install Ansible-cmdb, which is pretty fast and easy. Do make sure that you meet all the requirements before installation:

git clone https://github.com/fboender/ansible-cmdb
cd ansible-cmdb && make install

7. To convert the JSON files into HTML, use the following command:

ansible-cmdb -t html_fancy_split out/

8. You should notice a directory called “cmdb” which contains some HTML files. Open index.html to view your server inventory system.

Tweaking the default template

9. As mentioned previously, some information is not available by default in the index.html template. You can tweak the /usr/local/lib/ansible-cmdb/ansiblecmdb/data/tpl/html_fancy_defs.html page and add more content, for example the ‘uptime’ of the servers. To make the “Uptime” column visible, add the following line in the “Column definitions” section:


{"title": "Uptime",        "id": "uptime",        "func": col_uptime,         "sType": "string", "visible": True},

Also, add the following lines in the “Column functions” section:

<%def name="col_uptime(host, **kwargs)">
${jsonxs(host, 'ansible_facts.uptime', default='')}
</%def>

Whatever comes after the dot in ansible_facts.<xxx> is the key looked up under that parent in the JSON file. Repeat step 7. Here is how the end result looks.
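To illustrate, jsonxs simply walks the dotted path through each host’s JSON data, so ansible_facts.uptime expects a structure like the sketch below (uptime is shown here as an assumed custom fact for illustration; it is not one of the default Ansible facts):

{
  "ansible_facts": {
    "uptime": "64 days, 3:42"
  }
}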

Photo credits: Ferry Boender

Getting beyond Ansible-cmdb

Now, imagine that you want to include a specific application version (for example, the VMware Tools version) in the HTML inventory file. As mentioned in part 4, I created the directory /home/Ansible-Workdesk. This is where the “out” and “cmdb” directories have been created.

10. Create another directory called /home/Ansible-Workdesk/out/other_info/vmwaretool. I use this directory to deposit another JSON file containing the VMware Tools version after launching a playbook. Here is an extract from my InventoryUsingAnsibleCMDB.yml Playbook:

- setup:
  register: setup_res

- command: vmware-toolbox-cmd -v
  register: vmwareversion

- set_fact:
    vmwareversion: '{ "vmwareversion": {{ vmwareversion.stdout_lines }} }'

You can view the whole Ansible Playbook here on my Github.
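For completeness, here is a hedged sketch of how the registered fact could be written out per host into the vmwaretool directory; the copy/delegate_to approach and the destination path are my assumptions, not necessarily what the full Playbook on Github does:

- copy:
    content: "{{ vmwareversion }}"
    dest: "/home/Ansible-Workdesk/out/other_info/vmwaretool/{{ inventory_hostname }}"
  delegate_to: localhost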

11. Once the playbook has been executed, you will have identically named files in /home/Ansible-Workdesk/out and /home/Ansible-Workdesk/out/other_info/vmwaretool.

12. However, the content will be different. The files in the “out” directory contain JSON data about the default Ansible inventory, whilst the one in the “vmwaretool” directory contains a JSON file about the VMware Tools version, with “vmwareversion” as its parent key. I changed the parent from “stdout_lines” to “vmwareversion” using the set_fact module in Ansible.

13. By now, you are ready to tweak html_fancy_defs.html again as described in part 9. Both the Column definitions and the Column functions sections need to be appended. Here is the line to be added in the Column definitions section:

{"title": "VMware Tool",        "id": "vmwareversion",        "func": col_vmwareversion,         "sType": "string", "visible": True},

And that of the Column functions section:

<%def name="col_vmwareversion(host, **kwargs)">
${jsonxs(host, 'vmwareversion', default='')}
</%def>

14. Repeat step 7, this time including the “vmwaretool” directory:


ansible-cmdb -t html_fancy_split out/ out/other_info/vmwaretool/

In case you are able to create an Ansible Playbook that produces valid JSON files by merging those in the vmwaretool directory into those of the out directory, please comment below. I would like to hear more about it.

Tips:

  • More Playbooks can be found on my Ansible-Playbooks Github repository.
  • With regards to part 3, if direct root access has been disabled on the destination servers, you can use -u <username>, which will permit you to connect to the servers.
  • The ansible-cmdb command also allows you to generate a CSV file; see the sketch after this list.
  • Part 10 lays emphasis on a separate JSON file. If you have been able to merge both outputs into the same JSON file created by the default Ansible inventory, please comment below.
  • The groups in the Ansible hosts file can also be added to the server inventory HTML file. Please see the ansible-cmdb docs for more information.
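As a quick sketch of the CSV generation mentioned above (the csv template ships with ansible-cmdb; the output file name here is just an example):

ansible-cmdb -t csv out/ > inventory.csv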

Debugging disk issues with blktrace, blkparse, btrace and btt in Linux environment

blktrace is a block layer I/O tracing mechanism which provides detailed information about request queue operations up to user space. There are three major components: a kernel component, a utility to record the I/O trace information from the kernel to user space, and utilities to analyse and view the trace information. blktrace records the I/O event trace information for a specific block device to a file.

 

Photo credits: www.brendangregg.com

Limitations and Solutions with blktrace

There are several limitations and solutions when using blktrace. We will focus mainly on its goal and how the blktrace tool can help us in our day-to-day tasks. Assuming that you want to debug an I/O related issue, the first command will probably be iostat. These utilities can be installed with yum install sysstat blktrace. For example:

iostat -tkx -p 1 2

The limitation with iostat is that it does not tell us which process is using which I/O. To get around this, we can use another utility such as iotop. Here is an example of the iotop output.


Here iotop shows us exactly which process is consuming which I/O and how much. But another problem remains: iotop does not give detailed, layer-wise information, and this is where blktrace comes in handy. How does it do that? It watches what is going on inside the block I/O layer. When used correctly, it is possible to generate events for every I/O request and monitor where it is evolving. Though it extracts data from the kernel, it is not an analysis tool, and the interpretation of the data needs to be carried out by you. However, you can feed the data to btt or blkparse to get the analysis done.

Before looking at blktrace, let’s check out the I/O architecture. Basically, at user space the user will write and read data; this is the User Process layer. Users do not write directly to the disk: they first write to the VFS page cache, from which the I/O scheduler and the device driver interact with the disk to write the data.

Photo credits: msreekan.com

blktrace captures events as the I/O passes through these layers. Here is a cheat sheet to understand the blktrace event capture.

photo credits: programering.com
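Since that cheat sheet boils down to per-event action characters, here are the ones you will meet most often, as documented in the blkparse man page:

  • Q – queued: the I/O request has entered the block layer
  • G – get request: a free request structure was allocated
  • P – plug: the queue was plugged to allow merging
  • I – inserted: the request was inserted into the request queue
  • D – issued: the request was sent down to the device driver
  • C – complete: the request was completed by the device
  • M – back merge: the I/O was merged with an existing request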

blkparse will parse and format events acquired from blktrace. If you do not want to run blkparse separately, btrace is a shortcut that combines blktrace and blkparse. Finally, we have btt, which analyzes data from blktrace and generates time deltas for each layer.

Another tool to grasp before moving on with blktrace is debugfs, which is a simple-to-use RAM-based file system specially designed for debugging purposes. “It exists as a simple way for kernel developers to make information available to user space. Unlike /proc, which is only meant for information about a process, or sysfs, which has strict one-value-per-file rules, debugfs has no rules at all. Developers can put any information they want there.” (lwn.net)

So the first thing to do is to mount the debugfs file system using the following command:

mount -t debugfs debugfs /sys/kernel/debug

The aim is to allow a kernel developer to make information available in user space. You can run the mount command again to verify that the operation was successful. Now that you have the debugfs file system mounted, you can capture the events.
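If you want this mount to survive reboots (see also the tips at the end of this section), a standard debugfs entry in /etc/fstab looks like this:

debugfs  /sys/kernel/debug  debugfs  defaults  0 0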

Diving into the commands

1. So you can use blktrace to trace the I/O on the machine:

blktrace -d /dev/sda -o -|blkparse -i -

2. At the same time, on another console, launch the following command to generate some I/O for testing purposes:

dd if=/dev/zero of=/mnt/test1 bs=1M count=1

From the blktrace console, you will get an output which will end as follows:

CPU0 (8,0):
 Reads Queued:           2,       60KiB Writes Queued:       5,132,   20,524KiB
 Read Dispatches:        2,       60KiB Write Dispatches:       61,   20,524KiB
 Reads Requeued:         0 Writes Requeued:         0
 Reads Completed:        2,       60KiB Writes Completed:       63,   20,524KiB
 Read Merges:            0,        0KiB Write Merges:        5,070,   20,280KiB
 Read depth:             1         Write depth:             7
 IO unplugs:            14         Timer unplugs:           9
Throughput (R/W): 8KiB/s / 2,754KiB/s
Events (8,0): 21,234 entries
Skips: 166 forward (1,940,721 -  98.9%)

3. The same result can also be achieved using the btrace command. Apply the same principle as in part 2 once the command has been launched:

btrace /dev/sda

4. In parts 1 and 3, the blktrace commands were launched in such a way that they run forever, without exiting. In this particular example, I will instead write the trace to a named output file for analysis. Assuming that you want to run blktrace for 30 seconds, the command will be as follows:

blktrace -w 30 -d /dev/sda -o io-debugging

5. On another console, launch the following command:

dd if=/dev/sda of=/dev/null bs=1M count=10 iflag=direct

6. Wait for 30 seconds to allow step 4 to complete. I got the following results just after:

[root@host mnt]# blktrace -w 30 -d /dev/sda -o io-debugging
=== sda ===
  CPU  0:                  510 events,       24 KiB data
  Total:                   510 events (dropped 0),       24 KiB data

7. You will notice that the file has been created in the /mnt directory (the working directory from which blktrace was launched). To read it, use the command blkparse:

blkparse io-debugging.blktrace.0 | less

8. Now let’s see a simple extract from the blkparse command:

8,0    0        1     0.000000000  5686  Q   R 0 + 1024 [dd]
8,0    0        0     0.000028926     0  m   N cfq5686S / alloced
8,0    0        2     0.000029869  5686  G   R 0 + 1024 [dd]
8,0    0        3     0.000034500  5686  P   N [dd]
8,0    0        4     0.000036509  5686  I   R 0 + 1024 [dd]
8,0    0        0     0.000038209     0  m   N cfq5686S / insert_request
8,0    0        0     0.000039472     0  m   N cfq5686S / add_to_rr

The first column shows the device (major,minor) tuple; the second gives the CPU number; then follow the sequence number, the timestamp and the PID of the process issuing the I/O. The sixth column shows the event type, e.g. ‘Q’ means I/O handled by request queue code; please refer to the event list above for more. The seventh column is R for Read, W for Write, D for Discard and B for Barrier operation, followed by the starting block number and “+ number”, the number of blocks requested. The final field, between the [ ] brackets, is the name of the process issuing the request; in our case, I am running the command dd.

9. The output can also be analyzed using the btt command. You will get almost the same information:

btt -i io-debugging.blktrace.0

Some interesting information here: D2C is the amount of time the I/O spends in the device, whilst Q2C is the total time from queueing to completion, which can be larger since several I/Os may be in flight concurrently.

A graphical user interface to make life easier

The Seekwatcher program generates graphs from blktrace runs to help visualize IO patterns and performance. It can plot multiple blktrace runs together, making it easy to compare the differences between different benchmark runs. You should install the seekwatcher package if you need to visualize detailed information about IO patterns.

The command used to generate a picture for analysis is as follows, where seek.png is the name given to the output PNG:

seekwatcher -t io-debugging.blktrace.0 -o seek.png

What is also interesting is that you can create a movie-like animation with seekwatcher to view the graph over time:

seekwatcher -t io-debugging.blktrace.0 -o seekmoving.mpg --movie

Tips:

  • For debugfs, you can also edit the /etc/fstab file to make the mount permanent, as shown earlier.

  • By default, blktrace will capture all events. This can be limited with the argument -a.
  • In case you want to capture for a certain amount of time, use the argument -w.
  • blktrace also stores the extracted data in the local directory in the format <device>.blktrace.<cpu>, for example sda1.blktrace.0.
  • In steps 1 and 3, you will need to press CTRL+C to stop blktrace.
  • As seen in part 2, you created a test1 file; do delete it, as it may consume disk space on your machine.

  • In part 5, the amount of data read will not exceed 10M (bs=1M, count=10), since it was specified in the dd command.
  • At part 9, you will notice several other pieces of information which can be helpful, such as D2C and Q2C:
    • Q2D latency – time from request submission to the device.
    • D2C latency – device latency for processing the request.
    • Q2C latency – total latency; Q2D + D2C = Q2C.
  • To be able to generate the movie, you will have to install mencoder with all its dependencies.

Auditing Linux Operating System with Lynis

Auditing a Linux system is one of the most important aspects when it comes to security. After deploying a simple CentOS 7 Linux machine on VirtualBox, I made an audit using Lynis. It is amazing how many tiny flaws can be seen right from the beginning of a fresh installation. “Lynis Enterprise performs security scanning for Linux, macOS, and Unix systems. It helps you discover and solve issues quickly, so you can focus on your business and projects again.” (Cisofy)

Credits: cisofy.com

Introduction

The Lynis tool performs both security and compliance auditing. It has a free and a paid version; the latter comes in very handy, especially if you are in a business environment. The installation of the Lynis tool is pretty simple: you can install it through the Linux repositories, download the tar file, or clone it directly from Github.

 

Scanning Performed by Lynis

1. I downloaded the tar file with the following command:

wget https://cisofy.com/files/lynis-2.6.0.tar.gz

2. Then, just untar the file and move into the directory:

tar -xzf lynis-2.6.0.tar.gz && cd lynis

3. Once in the untarred directory, launch the following command:

./lynis audit system --quick

 
As you can see from the output above, there are several suggestions at the end of the scan. If the paid version of the application were used, more information and commands on how to remediate each finding would be given, including support from Lynis. With the free version, you can still work through several security aspects yourself from the suggestions.
 
Suggestions, Compliance and Improvement

 
1. The first two suggestions were about minimum and maximum password age:

Configure minimum password age in /etc/login.defs [AUTH-9286]

Configure maximum password age in /etc/login.defs [AUTH-9286]

To check the minimum and maximum password age of a user, use the chage command:

chage -l <username>
 

2. Use chage -m <days> root to set the minimum password age and chage -M <days> root to set the maximum password age.

Also, you will have to set the corresponding parameters in the /etc/login.defs file, as sketched below.
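As a minimal sketch, the relevant /etc/login.defs parameters are PASS_MIN_DAYS and PASS_MAX_DAYS; the values below are example policies, not Lynis recommendations:

# /etc/login.defs (example values)
PASS_MIN_DAYS   7
PASS_MAX_DAYS   90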

3. Delete accounts which are no longer used [AUTH-9288]

It is also suggested to delete accounts which are no longer in use. This suggestion was prompted as I created a user account “nitin” during installation and had not used it yet. For the purpose of this blog, I deleted it using userdel -r nitin

4. Default umask in /etc/profile or /etc/profile.d/custom.sh could be more strict (e.g. 027) [AUTH-9328]

Default umask values are taken from the information provided in the /etc/login.defs file on RHEL (Red Hat) based distros; Debian and Ubuntu based systems use /etc/deluser.conf. To change the default umask value from 022 to the stricter 027, you will need to modify the /etc/profile script or add a drop-in under /etc/profile.d, as sketched below.
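A minimal sketch, assuming you go with a drop-in file (the custom.sh name comes from the Lynis suggestion itself):

# /etc/profile.d/custom.sh
# Stricter default umask: new files 640, new directories 750
umask 027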

5. To decrease the impact of a full /home file system, place /home on a separated partition [FILE-6310]

  To decrease the impact of a full /tmp file system, place /tmp on a separated partition [FILE-6310]

  To decrease the impact of a full /var file system, place /var on a separated partition [FILE-6310]

In the article Move your /home to another partition, you will have detailed explanations to sort out this issue.

6. Disable drivers like USB storage when not used, to prevent unauthorized storage or data theft [STRG-1840]

   Disable drivers like firewire storage when not used, to prevent unauthorized storage or data theft [STRG-1846]

To disable the USB and firewire storage drivers, add the following lines in /etc/modprobe.d/blacklist.conf, then unload the modules with modprobe -r usb-storage && modprobe -r firewire-core:

blacklist firewire-core
blacklist usb-storage

7. Split resolving between localhost and the hostname of the system [NAME-4406]

This issue is only about the hostname and localhost entries in /etc/hosts, which could confuse some applications installed on the machine. According to Cisofy: “For proper resolving, the entries of localhost and the locally defined hostname could be split. Using some middleware and some applications, resolving of the hostname to localhost might confuse the software.”

8. Install package ‘yum-utils’ for better consistency checking of the package database [PKGS-7384]

      Consider running ARP monitoring software (arpwatch,arpon) [NETW-3032]

yum-utils and arpwatch are nice tools to perform more debugging and verification. Install them using the following command:

yum install yum-utils arpwatch -y

9. You are advised to hide the mail_name (option: smtpd_banner) from your postfix configuration. Use postconf -e or change your main.cf file (/etc/postfix/main.cf) [MAIL-8818]

You just have to change the smtpd_banner parameter, either with postconf -e or by editing /etc/postfix/main.cf directly. However, since this is a fresh install and I’m not using Postfix, it is better to stop the service.
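A short sketch of both options; the banner value here is the stock Postfix default minus $mail_name, which is exactly what the suggestion asks for:

# Hide mail_name from the SMTP banner
postconf -e 'smtpd_banner = $myhostname ESMTP'
systemctl restart postfix

# Or, if Postfix is not needed at all:
systemctl stop postfix && systemctl disable postfix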

 10.  Check iptables rules to see which rules are currently not used [FIRE-4513]

Since I’m not in a production environment, it is very difficult to identify unused iptables rules right now; in a production environment, the situation is different. According to Cisofy, the best way is to “use iptables --list --numeric --verbose to display all rules. Check for rules which didn’t get a hit and repeat this process several times (e.g. in a few weeks). Finally remove any unneeded rules.”

 11. Consider hardening SSH configuration [SSH-7408]

  •     – Details  : AllowTcpForwarding (YES -> NO)
  •     – Details  : ClientAliveCountMax (3 -> 2)
  •     – Details  : Compression (YES -> (DELAYED|NO))
  •     – Details  : LogLevel (INFO -> VERBOSE)
  •     – Details  : MaxAuthTries (6 -> 2)
  •     – Details  : MaxSessions (10 -> 2)
  •     – Details  : PermitRootLogin (YES -> NO)
  •     – Details  : Port (22 -> )
  •     – Details  : TCPKeepAlive (YES -> NO)
  •     – Details  : UseDNS (YES -> NO)
  •     – Details  : X11Forwarding (YES -> NO)
  •     – Details  : AllowAgentForwarding (YES -> NO)

Again, hardening SSH is one of the most important steps to evade attacks, especially from SSH bots. It all depends on how your network infrastructure is configured and whether it is accessible from the internet or not; still, the details reported are very informative.
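As a hedged sketch, the following /etc/ssh/sshd_config directives correspond to the suggestions above; review each one against your own access needs before applying, as some (PermitRootLogin, AllowTcpForwarding) can lock you out of workflows you rely on:

# /etc/ssh/sshd_config (excerpt matching the Lynis suggestions)
AllowTcpForwarding no
AllowAgentForwarding no
ClientAliveCountMax 2
Compression no
LogLevel VERBOSE
MaxAuthTries 2
MaxSessions 2
PermitRootLogin no
TCPKeepAlive no
UseDNS no
X11Forwarding no

Then reload the daemon with systemctl restart sshd.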

12. Periodic system scans and malware/ransomware scanners are now a must. According to statistics, servers are being hacked constantly, and malware has become a lucrative business for attackers.

The Lynis Command

Lynis documentation is pretty straightforward, with a cheat sheet. The arguments are self-explanatory. Here are some hints.

1. Performs a system audit, which is the most common audit.

lynis audit system

2. Provides command to do a remote scan.

lynis audit system remote <host>

3. Views the settings of default profile.

lynis show settings

4. Checks if you are using most recent version of Lynis

lynis update info

5. More information about a specific test-id

lynis show details <test-id>

6. To scan the whole system

lynis --check-all -Q

7. To see all available parameters of Lynis

lynis show options

At the end of any Lynis command, it will also prompt you where the logs have been stored for your future reference, usually /var/log/lynis.log. The systutorials page on lynis is also a good start to grasp the command. All common systems based on Unix/Linux are supported; examples include Linux, AIX, *BSD, HP-UX, macOS and Solaris. For package management, the following tools are supported: dpkg/apt, pacman, pkg_info, RPM, YUM and zypper.

Out of Memory (OOM) in Linux machines

For some months I have not been posting anything on my blog; I should admit that I was really busy. Recently, a friend asked me about the Out of Memory messages in Linux. How are they generated? What are the consequences? How can they be avoided on Linux machines? There is no single answer, as an investigation has to be carried out for a proper root cause analysis. Before getting into the details of OOM, let’s be clear that whenever the kernel is starved of memory, it will start killing processes. Many Linux administrators will experience this, and one of the fastest ways to get rid of it is by adding extra swap. However, that is not a definitive way of solving the issue: a preliminary assessment needs to be carried out, followed by an action plan alongside a rollback methodology.

If, after killing some processes, the kernel still cannot free up memory, it might lead to a kernel panic, deadlocks in applications, kernel hangs, or several defunct processes on the machine. I know of cases where the machine changed run level. There are cases of kernel panic in virtual machines where the cluster is not healthy. In brief, OOM is a subsystem that kills one or more processes with the aim of freeing memory. In the article Linux kernel crash simulation using kdump, I explained how to activate kdump to generate a vmcore for analysis. However, to send the debug messages to the vmcore during an out of memory error, a sysctl parameter needs to be configured. I will be using a CentOS 7 machine to illustrate the OOM parameters and configurations.

1. To activate OOM debugging in the vmcore file, set the parameter vm.panic_on_oom to 1 using the following command:

sysctl -w vm.panic_on_oom=1

To verify that the configuration has been taken into account, you can do a sysctl -a | grep -i oom. It is not recommended to test such parameters in a production environment.

2. To decide which process to kill, the kernel uses a function in the kernel code called badness(). badness() calculates a numeric value describing how “bad” it would be to keep each task. To be precise, it works by accumulating “points” for each process running on the machine and returning them to a function called select_bad_process() in the Linux kernel, which eventually triggers the OOM mechanism and kills the chosen process. The “points” are exposed in /proc/<pid>/oom_score. For example, here I have a server running Java.

As you can see, the process number is 2153 and its oom_score is 21.
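A sketch of the commands behind that observation (the PID 2153 and the score 21 are from my machine; yours will differ):

pgrep -f java
cat /proc/2153/oom_score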

3. There are lots of considerations taken into account when calculating the badness score. Some of the factors are the virtual memory size (VM size), the priority of the process (nice value), its total runtime, the running user and the value of /proc/<pid>/oom_adj. You can also set the oom_score_adj value for any PID, between -1000 and 1000. The lowest possible value, -1000, is equivalent to disabling OOM killing entirely for that task, since it will always report a badness score of 0.

4. Let’s assume that you want to prevent a specific process from being killed. Writing -17 (the legacy OOM_DISABLE value) to its oom_adj file does just that:

echo -17 > /proc/$PID/oom_adj
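On recent kernels, oom_adj is deprecated in favour of oom_score_adj (see the tips below); the equivalent, using the -1000 disable value, would be:

echo -1000 > /proc/$PID/oom_score_adj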

5. If you know the process name of the SSH daemon and do not want it to be killed, use the following command:

pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

6. To automatically protect sshd from being killed, use a cron entry which runs every minute:

* * * * * root pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done
7. Let's now simulate the OOM killer messages. Use the following command to trigger an out of memory event on the machine:

echo f > /proc/sysrq-trigger

You will notice an OOM error message in /var/log/messages. As you can see here, the badness score of PID 843 was calculated by the OOM killer before killing it; the score is 4 in our case.


Before the 'Out of memory' error, there will be a call trace emitted by the kernel.



8. To monitor how the OOM killer is generating scores, you can use the dstat command. To install the dstat package on an RPM based machine, use:

yum install dstat

or, for Debian based distributions:

apt-get install dstat

dstat is used to generate resource statistics. To monitor the process with the highest badness score, use:

dstat --top-oom


TIPS:

  • oom_score_adj is used in newer Linux kernels; oom_adj is the deprecated interface found on older machines.
  • Disabling the OOM killer under heavy memory pressure may cause the system to kernel panic.
  • Making a process immune is not a definitive way of solving the problem. For example, with a Java application, use a thread/heap dump to analyse the situation before making the process immune.
  • dstat is now becoming an alternative to vmstat, netstat, iostat, ifstat and mpstat. For example, to monitor CPU and memory consumers in one shot, use dstat -c --top-cpu -dn --top-mem
  • Testing in a production environment should be avoided!

SAR command daily tips and tricks

As promised on Twitter some days back, here are some interesting tips and tricks using the SAR (System Activity Report) Linux command. “The sar command writes to standard output the contents of selected cumulative activity counters in the operating system. The accounting system, based on the values in the count and interval parameters, writes information the specified number of times spaced at the specified intervals in seconds. If the interval parameter is set to zero, the sar command displays the average statistics for the time since the system was started.” (die.net)

Understanding SAR and its main configuration files

The SAR command is part of the sysstat package, a multi-purpose analysis tool which is useful to pinpoint specific issues related to CPU, memory, I/O and network. The command is really useful, especially for plotting the output on a graph for visual analysis and reporting. One example of such a tool is gnuplot. To install gnuplot and SAR, use the command yum install sysstat gnuplot. The configuration files of SAR are located at /etc/cron.d/sysstat and /etc/sysconfig/sysstat. If you perform rpm -ql sysstat | less, you will notice that there are other binaries, such as iostat, mpstat, pidstat etc., that come along with the sysstat package.

Difference between SA and SAR logs

In the file /etc/cron.d/sysstat you will notice that a cron job has been set up by default to run every ten minutes. Its purpose is to write logs to the directory /var/log/sa. In this directory there are two types of files, starting with sa and sar. The sar file is a text file, while the sa file is a binary. The sa (binary) file is updated every 10 minutes, whereas the sar file is written at the end of the day; these parameters are configured in the cron file itself. By using the file command you can tell which one is a binary and which is a text file.

To open an sa binary file, you need to use the command sar -f. Here is an example:
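A sketch, assuming you want to read the binary for the 25th of the month:

sar -f /var/log/sa/sa25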

The /etc/sysconfig/sysstat file allows you to configure how long you want to keep the logs, which compression to use, etc.
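As a minimal sketch of that file, HISTORY, COMPRESSAFTER and ZIP are the stock sysstat parameters; the values below are only examples:

# /etc/sysconfig/sysstat (example values)
HISTORY=28          # days of data to keep
COMPRESSAFTER=10    # compress sa/sar files older than 10 days
ZIP="bzip2"         # compression program to use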

Some ways to use the SAR command

You can also visualize sar statistics live using the sar command with an interval (in seconds) and a count. Let’s say you have run a command in the background or simply want to track the resource status for a few seconds. Here is an example which samples every second, three times:

sar 1 3

To check the load average and run queue, we apply the same principle but with the following command:

sar -q 1 3

To check the memory being consumed per second, use the following command:

sar -r 1 3

To check the number of pages swapped in and out of the swap space per second, use the following command:

sar -W 1 3

For the disk I/O reads and writes per second, use the following command. Disk read/write performance also depends on the hardware:

sar -b 1 3

For information about CPU utilisation, use the following:

sar -u 1 3

To monitor the network activity per interface, in terms of packets received and transmitted (including compressed packets), use the following command:

sar -n DEV 1 3

If the sar text file has not been generated yet from the binary, you can use the following command to convert it to text, say in the /tmp directory:

sar -A -f sa25 > /tmp/sar25

KSAR Graph with SAR logs

Now, in a production environment, you may need to analyze, for example, a specific time when memory or CPU usage was high. This can be done by means of a graph. I use the Ksar program. Ksar is a BSD-licensed Java based application which creates graphs based on sar logs. You will need to install a Java Runtime and launch the run.sh script to start the Ksar program. Once it is running, just click on ‘Data’ and ‘Load from text file’. This is an example of swap usage.

TIPS:

  • The SAR output is in 12-hour clock format by default. To switch to a 24-hour clock format, edit the .bashrc file and insert the alias alias sar='LANG=C sar'
  • gnuplot is another application to plot your information on a graph for better analysis.