Tag: Linux Performance

Debugging disk issues with blktrace, blkparse, btrace and btt in Linux environment

blktrace is a block layer IO tracing mechanism which provides detailed information about request queue operations up to user space. There are three major components: a kernel component, a utility to record the i/o trace information for the kernel to user space, and utilities to analyse and view the trace information. This man page describes blktrace, which records the i/o event trace information for a specific block device to a file.

 

Photo credits : www.brendangregg.com
Photo credits : www.brendangregg.com

Limitations and Solutions with blktrace

There are several limitations and solutions when using blktrace. We will focus mainly on its goal and how the blktrace tool can help us in our day to day task. Assuming that you want to debug any IO related issue, the first command will be probably an iostat. These utilities can be installed through a yum install sysstat blktrace. For example:

iostat -tkx -p 1 2

The limitation here with iostat is that it does not gave us which process is utilising which IO.  To resolve this issue, we can use another utility such as iotop. Here is an example of the iotop output.

blktrace, iotop, blkparse, btt and btrace

Here iotop shows us exactly which process is consuming ‘which’ and ‘how’ much IO. But another problem with that solution is that it does not give detailed information. So, blktrace comes handy as it gives layer-wise information. How it does that? It sees what is going on exactly inside block I/O layer. When used correctly, it’s possible to generate events for all I/O request and monitor it from where it is evolving. Though it extracts data from the kernel, it is not an analysis tool and the interpretation of the data need to be carried out by you. However, you can feed the data to btt or blkparse to get the analysis done.

Before looking at blktrace, let’s check out the I/O architecture. Basically, at the userspace the user will write and read data. This is what the User Process at the user space. The user do not write directly to the disk. They first write to the VFS Page Cache and from which there are various I/O Scheduler and the Device Driver will interact with the disk to write the data.

Photo credits: msreekan.com
Photo credits: msreekan.com

The blktrace will normally capture events during the process. Here is a cheat sheet to understand blktrace event capture.

photo credits: programering.com
photo credits: programering.com

blkparse will parse and format events acquired from blktrace. If you do not want to run blkparse, btrace is a shortcut to generate data out of blktrace and blkparse. Finally we have btt which will analyze data from blktrace and generate time deltas for each layer.

Another tool to grasp before moving on with blktrace is debugfs which is a simple-to-use RAM-based file system specially designed for debugging purposes. It exists as a simple way for kernel developers to make information available to user space. Unlike /proc, which is only meant for information about a process, or sysfs, which has strict one-value-per-file rules, debugfs has no rules at all. Developers can put any information they want there.lwn.

So the first thing to do is to mount the debugfs file system using the following command:

mount -t debugfs debugfs /sys/kernel/debug

The aim is to allow a kernel developer to make information available in user space. The output of the command below describe how to mount and verify same. You can use the command mount to test if same has been successful. Now that you have the debug file system, you can capture the events.

Diving into the commands

1.So you can use blktrace to trace out the I/O on the machine.

blktrace -d /dev/sda -o -|blkparse -i -

2. At the same time, on another console launch the following command to generate some I/O for testing purpose.

dd if=/dev/zero of=/mnt/test1 bs=1M count=1

From the blktrace console you will get an output which will end up as follows :

CPU0 (8,0):
 Reads Queued:           2,       60KiB Writes Queued:       5,132,   20,524KiB
 Read Dispatches:        2,       60KiB Write Dispatches:       61,   20,524KiB
 Reads Requeued:         0 Writes Requeued:         0
 Reads Completed:        2,       60KiB Writes Completed:       63,   20,524KiB
 Read Merges:            0,        0KiB Write Merges:        5,070,   20,280KiB
 Read depth:             1         Write depth:             7
 IO unplugs:            14         Timer unplugs:           9
Throughput (R/W): 8KiB/s / 2,754KiB/s
Events (8,0): 21,234 entries
Skips: 166 forward (1,940,721 -  98.9%)

3. Same result can also be achieved using the btrace command. Apply the same principle as in part 2 once the command has been launched.

btrace /dev/sda

4. In part 1, 2 and 4 the blktrace commands were launched in such a way that it will run forever – without exiting. In this particular example,  I will output the file name for analysis. Assume that you want to run the blktrace for 30 seconds, the command will be as follows:

blktrace -w 30 -d /dev/sda -o io-debugging

5. On another console, launch the following command:

dd if=/dev/sda of=/dev/null bs=1M count=10 iflag=direct

6. Wait for 30 seconds to allow step 4 to be completed. I got the following results just after:

[[email protected] mnt]# blktrace -w 30 -d /dev/sda -o io-debugging
=== sda ===
  CPU  0:                  510 events,       24 KiB data
  Total:                   510 events (dropped 0),       24 KiB data

7. You will notice at the directory /mnt  the file will be created. To read it use the command blkparse.

blkparse io-debugging.blktrace.0 | less

8. Now let’s see a simple extract from the blkparse command:

8,0    0        1     0.000000000  5686  Q   R 0 + 1024 [dd]
8,0    0        0     0.000028926     0  m   N cfq5686S / alloced
8,0    0        2     0.000029869  5686  G   R 0 + 1024 [dd]
8,0    0        3     0.000034500  5686  P   N [dd]
8,0    0        4     0.000036509  5686  I   R 0 + 1024 [dd]
8,0    0        0     0.000038209     0  m   N cfq5686S / insert_request
8,0    0        0     0.000039472     0  m   N cfq5686S / add_to_rr

The first column shows the device major,minor tuple, the second column gives information about the CPU and it goes on with the sequence, the timestamps, PID of the process issuing the IO process. The 6th column shows the event type, e.g. ‘Q’ means IO handled by request queue code. Please refer to above diagram for more info. The 7th column is R for Read, W for Write, D for block, B for Barrier operation and finally the last one is the block number and a following + number is the number of blocks requested. The final field between the [ ] brackets is the process name of the process issuing the request. In our case, I am running the command dd.

9.The output can be also analyzed using btt command. You will get almost the same information.

btt -i io-debugging.blktrace.0

Some interesting information here is D2C means the amount of time the IO has been spending into the device whilst Q2C means the total time take as there might be different IO concurrent.

A graphical user interface to makes life easier

The Seekwatcher program generates graphs from blktrace runs to help visualize IO patterns and performance. It can plot multiple blktrace runs together, making it easy to compare the differences between different benchmark runs. You should install the seekwatcher package if you need to visualize detailed information about IO patterns.

The command to be used to generate a picture to for analysis is as follows where seek.png is the output of the png name given.

seekwatcher -t io-debugging.blktrace.0 -o seek.png

What is also interesting is that you can create a movie-like with seekwatch to view the graph.

seekwatcher -t io-debugging.blktrace.0 -o seekmoving.mpg --movie

Tips:

  • For the debugfs, you can also edit the /etc/fstab file to make the mount point permanently. 

  • By default, blktrace will capture all events. This can be limited with the argument -a.
  • In case you want to capture persistently for a long time or for a certain amount of time, then use the argument -w.
  • blktrace will also stored the extracted data in local directory with a format device.blktrace.cpu, for example sda1.blktrace.cpu.
  • At step 1 and 2, you will need to fire a CTRL +C to stop the blktrace.
  • As seen in part 2, you have created test test1 file, do delete it same may consume disk space on your machine.

  • On part 5, the size of the file created in the  /mnt will not exceeds more that 1M since same has been specified in the command.
  • At part 9, you will noticed several other information which can be helpful such as D2C and  Q2C.
    • Q2D latency – time from request submission to Device.
    • D2C latency – Device latency for processing request.
    • Q2C latency – total latency , Q2D + D2C = Q2C.
  • To be able to generate the movie, you will have to install mencoder with all its dependencies.

Auditing Linux Operating System with Lynis

Auditing a Linux System is one of the most important aspect when it comes to security. After deploying a simple Centos 7 Linux machine on virtual box, I made an audit using Lynis. It is amazing how many tiny flaws can be seen right from the beginning of a fresh installation. Lynis Enterprise performs security scanning for Linux, macOS, and Unix systems. It helps you discover and solve issues quickly, so you can focus on your business and projects again.Cisofy.

Credits: cisofy.com
Credits: cisofy.com

Introduction

The Lynis tool performs both security and compliance auditing. It has a free and paid version which comes very handy especially if you are on a business environment. The installation of the Lynis tool is pretty simple. You can install it through the Linux repository itself, download the tar file or clone it directly from Github.

 

Scanning Performed by Lynis

1. I downloaded the tar file with the following command:

wget https://cisofy.com/files/lynis-2.6.0.tar.gz

2. Then, just untar the file and get into it

tar -xzf lynis-2.6.0.tar.gz && cd lynis

3. Once into the untar directory, launch the following command:

./lynis audit system --quick

 
As you can see from the output above, there are several suggestions at the end of the scan. In case the paid version of the application was used, more information and commands as how to remediate the situation would be given including support from Lynis. As regards to the free version, you can also debug by yourself several security aspects from the suggestions.
 
Suggestions, Compliance and Improvement.

 
1.The first two suggestions were about minimum and maximum password age.

Configure minimum password age in /etc/login.defs [AUTH-9286]

Configure maximum password age in /etc/login.defs [AUTH-9286]

To check the minimum and maximum password age, use the chage command :
chage -l
 

2. Use chage -m root to set the minimum password age and chage -M root to set maximum password age:

Also, you will have to set the parameter in the /etc/login.defs file

3. Delete accounts which are no longer used [AUTH-9288]

It is also suggested to delete accounts which are no longer in use. This suggestion was prompted as I created a user  “nitin” account during installation and did not use it yet. For the purpose of this blog, I deleted it using userdel -r nitin

4. Default umask in /etc/profile or /etc/profile.d/custom.sh could be more strict (e.g. 027) [AUTH-9328]

Default umask values are taken from the information provided in the /etc/login.defs file for RHEL (Red Hat) based distros. Debian and Ubuntu Linux based system use /etc/deluser.conf. To change default umask value to 027 which is actually 022 by default, you will need to modify the /etc/profile script as follows:

5. To decrease the impact of a full /home file system, place /home on a separated partition [FILE-6310]

  To decrease the impact of a full /tmp file system, place /tmp on a separated partition [FILE-6310]

  To decrease the impact of a full /var file system, place /var on a separated partition [FILE-6310]

In the article Move your /home to another partition, you will have detailed explanations to sort out this issue.

6. Disable drivers like USB storage when not used, to prevent unauthorized storage or data theft [STRG-1840]

   Disable drivers like firewire storage when not used, to prevent unauthorized storage or data theft [STRG-1846]

To disable USB and firewire storage drivers, add the following lines in /etc/modprobe.d/blacklist.conf then do a modprobe usb-storage && modprobe firewire-core

blacklist firewire-core
blacklist usb-storage

7. Split resolving between localhost and the hostname of the system [NAME-4406]

This issue is only about hostname and localhost in /etc/hosts which could confuse some applications installed on the machine. According to cisofy, for proper resolving, the entries of localhost and the local defined hostname, could be split. Using some middleware and some applications, resolving of the hostname to localhost, might confuse the software.

8. Install package ‘yum-utils’ for better consistency checking of the package database [PKGS-7384]

      Consider running ARP monitoring software (arpwatch,arpon) [NETW-3032]

The yum-utils and arpwatch are nice tools to perform more debugging and verification. Install it using the following commands:

yum install yum-utils arpwatch -y

9. You are advised to hide the mail_name (option: smtpd_banner) from your postfix configuration. Use postconf -e or change your main.cf file (/etc/postfix/main.cf) [MAIL-8818]

You just have to uncomment the following line and lauch a postconf -e. However, since this is a fresh install, and I’m not using postfix, it is better to stop the service.

 10.  Check iptables rules to see which rules are currently not used [FIRE-4513]

Since, I’m not on a production environment, it is very difficult to identify unused iptables rules right now. Once on the production environment, this situation is different. According to Cisofy, the best way is to “use iptables –list –numeric –verbose to display all rules. Check for rules which didn’t get a hit and repeat this process several times (e.g. in a few weeks). Finally remove any unneeded rules.”

 11. Consider hardening SSH configuration [SSH-7408]

  •     – Details  : AllowTcpForwarding (YES –> NO)
  •     – Details  : ClientAliveCountMax (3 –> 2)
  •     – Details  : Compression (YES –> (DELAYED|NO))
  •     – Details  : LogLevel (INFO –> VERBOSE)
  •     – Details  : MaxAuthTries (6 –> 2)
  •     – Details  : MaxSessions (10 –> 2)
  •     – Details  : PermitRootLogin (YES –> NO)
  •     – Details  : Port (22 –> )
  •     – Details  : TCPKeepAlive (YES –> NO)
  •     – Details  : UseDNS (YES –> NO)
  •     – Details  : X11Forwarding (YES –> NO)
  •     – Details  : AllowAgentForwarding (YES –> NO)

Again, hardening SSH is one of the most important to evade attacks especially from SSH bots. It all depends how your network infrastructure is configured and whether it is accessible from the internet or not. However, these details viewed are very informative.

12. Periodic system scan, malware and ransomware scanners are now a must. According to statistics, servers are being hacked constantly. Pervasive Monitoring is becoming a heavy cash deal for malicious softwares. 

The Lynis Command

Lynis documentation is pretty straight forward with a cheat sheet. The arguments are self explicit. Here are some hints.

1.Performs a system audit which is the most common audit.

lynis audit system

2. Provides command to do a remote scan.

lynis audit system remote <host>

3. Views the settings of default profile.

lynis show settings

4. Checks if you are using most recent version of Lynis

lynis update info

5. More information about a specific test-id

lynis show details <test-id>

6. To scan whole system

lynix --check-all Q

7. To see all available parameters of Lynis

lynis show options

At the end of any Lynis command, it will also prompt you where the logs have been stored for your future references. It is usually in /var/log/lynis.log. The systutorial on lynis is also a good start to grasp the command. All common systems based on Unix/Linux are supported. Examples include Linux, AIX, *BSD, HP-UX, macOS and Solaris. For package management, the following tools are supported:- dpkg/apt, pacman, pkg_info, RPM, YUM, zypper.

Out of Memory (OOM) in Linux machines

Since some months I have not been posting anything on my blog. I should admit that I was really busy. Recently, a friend asked me about the Out of Memory messages in Linux. How is it generated? What are the consequences? How can it be avoided in Linux machines? There is no specific answer to this as an investigation had to be carried out to have the Root Cause Analysis. Before getting into details about OOM, let’s be clear that whenever the Kernel is starved of memory, it will start killing processes. Many times, Linux administrators will experience this and one of the fastest way to get rid of it is by adding extra swap. However, this is not the definite way of solving the issue. A preliminary assessment needs to be carried out followed by an action plan, alongside, a rollback methodology.

If after killing some processes, the kernel cannot free up some memory, it might lead to a kernel panic, deadlocks in applications, kernel hungs, or several defunct processes in the machine. I know cases where the machine change run level mode. There are cases of kernel panic in virtual machines where the cluster is not healthy. In brief, OOM is a subsystem to kill one or more processes with the aim to free memory. In the article Linux kernel crash simulation using kdump, I gave an explanation how to activate Kdump to generate a vmcore for analysis. However, to send the debug messages during an out of memory error to the vmcore, the SYSCTL file need to be configured. I will be using a CentOS 7 machine to illustrate the OOM parameters and configurations.

1.To activate OOM debug in the vmcore file, set the parameter vm.panic_on_oom to 1 using the following command:

systctl -w vm.panic_on_oom=1

To verify if the configuration has been taken into consideration, you can do a sysctl -a | grep -i oom. It is not recommended to test such parameters in the production environment.

2. To find out which process the kernel is going to kill, the kernel will read a function in the kernel code called badness() . The badness() calculate a numeric value about how bad this task has been. To be precise, it works by accumulating some “points” for each process running on the machine and will return those processes to a function called select_bad_process() in the linux kernel. This will eventually start the OOM mechanism which will kill the processes. The “points” are stored in the /proc/<pid>/oom_score. For example, here, i have a server running JAVA.

As you can see, the process number is 2153. The oom_score is 21

3. There are lots of considerations that are taken into account when calculating the badness score. Some of the factors are the Virtual Memory size (VM size), the Priority of the Process (NICE value), the Total Runtime, the Running user and the /proc/<pid>/oom_adj. You can also set up the oom_score_adj value for any PID between -1000 to 1000. The lowest possible value, -1000, is equivalent to disabling OOM killing entirely for that task since it will always report a badness score of 0.

4. Let’s assume that you want to prevent a specific process from being killed.

echo -17 > /proc/$PID/oom_adj

5. If you know the process name of SSH Daemon and do not it from being killed, use the following command:

pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

6. To automate the sshd from being killed through a cron which will run each minute use the following:

* * * * * root pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done
7. Let's now simulate the OOM killer messages. Use the following command to start an out of memory event 
on the machine.
echo f > /proc/sysrq-trigger 

You will notice an OOM error message in the /var/log/messages.
As you can notice here, the PID 843 was calculated by the OOM killer before killing it. 
There is also the score number which is 4 in our case.


Before the 'Out of memory' error, there will be a call trace which will be sent by the kernel.



8. To monitor how the OOM killer is generating scores, you can use the dstat command. To install the dstat 
package on RPM based machine use: 
yum install dstat 

or for debian based distribution use:
apt-get install dstat

Dstat is used to generate resource statistics. To use dstat to monitor the score from OOM killer use:
dstat -top-oom


TIPS:

  • oom_score_adj is used in new linux kernel. The deprecated function is oom_adj in old Linux machine.
  • When disabling OOM killer under heavy memory pressure, it may cause the system to kernel panic.
  • Making a process immune is not a definite way of solving problem, for example, when using JAVA Application. Use a thread/heap dump to analyse the situation before making a process immune.
  • Dstat is now becoming an alternative for vmstat, netstat, iostat, ifstat and mpstat. For example, to monitor CPU in a program, use dstat -c –top-cpu -dn –top-mem
  • Testing in production environment should be avoided!

SAR command daily tips and tricks

As promised on Twitter days back, I would post some interesting tips and tricks using the SAR (System Activity Report) linux command. The sar command writes to standard output the contents of selected cumulative activity counters in the operating system. The accounting system, based on the values in the count and interval parameters, writes information the specified number of times spaced at the specified intervals in seconds. If the interval parameter is set to zero, the sar command displays the average statistics for the time since the system was started.die.net 

Understanding SAR and its main configuration files

The SAR command is part of the sysstat package which is a multi-purpose analysis tool and it is useful to pin point specific issue related to CPU, Memory, I/O and Network. The command is really useful especially to plot the output on a graph for visual analysis and reporting. One example of such tool is GNUplot. To install GNUplot and SAR use the command yum install sysstat gnuplot. The configuration file of SAR is located at /etc/cron.d/sysstat and /etc/sysconfig/sysstat . If you would perform a rpm -ql sysstat | less , you would noticed that there are other binaries such as iostat,mpstat,pidstat etc.. that comes along with the package sysstat.

Difference between SA and SAR logs

In the directory /etc/cron.d/sysstat you would noticed that there is a cron which have been set up by default to run every ten minutes. The purpose is to write a log in the directory /var/log/sa . In this directory there are two type of files starting with sa and sar. SAR is the text file while SA is a binary. The sa file – Binary is updated every 10 minutes whereas the sar file is written at the end of the day. This parameter have been configured in the cron itself. By using the file command you would know which one is a binary or a text file.

To open the sa file, you need to use the command sar -f . Here is an example:

The /etc/sysconfig/sysstat file allows you to configure how long you want to keep the log, compression etc..

Some ways to use the SAR command

You can also visualize sar logs live using the sar command with the start and ending second. Let’s say you have run a command on the background or simply want to track the resource status for some seconds, you can use the command sar 1 3 Here is an example with the command

sar 1 3

Checking the load average, we apply the same principle but with the following command. The load average will also include load on each processor.

sar -q 1 3

To check for memory being consumed per seconds, use the following command

sar -r 1 3

To check number of memory coming in and out of the swap space, use the following command

sar -W 1 3

For the Disk I/O read write per seconds use the following command. Read/Write on disk also depend on the hardware

sar -b 1 3

For info about the CPU use the following

sar -u 1 3

To monitor the network activity in terms of packets in and out per interface received and compressed, use the following command

sar -n DEV 1 3

If the sar file has not been generated yet from the binary, you can use the following command, let’s say to convert it to a text in the /tmp directory.

sar -A -f  sa25 > /tmp/sar25

KSAR Graph with SAR logs

Now, in production environment, you need to to analyze for example at a specific time where memory or CPU was high. This can be done by means of a graph. I use the Ksar program. Ksar is a BSD-Licensed JAVA based application which create graph based on sar logs. You will need to install JAVA Runtime and launch the run.sh script to install the Ksar program. Once downloaded, just click on ‘Data’ and ‘Load from text file’. This is an example of swap usage

TIPS:

  • The SAR output by default is in 12 hour clock format. To make it become a 24 hour clock format edit the .bashrc file and insert the parameter alias sar=”LANG=C sar
  • GNUPlot is one another application to plot your information on a graph for better analysis.

What goes on behind the Network Time Protocol ?

Several times, I had discussions with friends on how NTP works! What is the logic behind NTP and its configurations? We noticed that there are several terms and calculations to grasp especially when it comes to debugging. Well, I decided to make a research on it and shed some ideas on NTP – Network Time Protocol. These recent days, we have noticed several vulnerabilities and attacks going on the NTP servers. NTP is a protocol designed to synchronize the clocks of computers over a network.

Photo credits: networktimefoundation.org
Photo credits: networktimefoundation.org

NTP is a utility where timestamps are used. Examples are logs, database replications, the time packets  exchanged in a network. NTP uses its own binary format and runs on port 123 UDP. RFC 1305 and RFC 2030 give detailed explanations of NTP.

The logic behind NTP

In brief, packets are exchanged between the NTP server and the client in the following order. It is to be noted that latency is an important issue when it comes to NTP:

  1. The NTP client will send a request with a timestamp.
  2. The NTP server will return the packet with 3 timestamps.
    • echo of the client timestamp
    • The timestamp of the received timestamp by the server
    • The timestamp response sent by the server.
  3. The client will then estimate the offset (the difference in timestamps between the client and the server)

A client may have several NTP servers configured, but will synchronize with only one NTP server. A server may also take some time to respond to a client depending when it is not busy. NTP packets are exchanged between 64 – 1024 seconds for each server. – This configurations are called “minpoll” and “maxpoll”

Some basic configuration on a CentOS 7 machine

The command timedatectl can be used to check if NTP is enabled or not

[[email protected] ~]# timedatectl | egrep -i ntp
     NTP enabled: yes
NTP synchronized: yes

You can also check if the service is running using the following command;

systemctl status chronyd.service

The configuration file is located by default at /etc/chrony.conf . You will notice that the NTP servers are configured by default in that file.

[[email protected] ~]# head -n 7 /etc/chrony.conf 

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst

You can check if your machine are synchronised from the sources with the following command. The column Name/IP address are the location from where the time is being synchronized.

[[email protected] ~]# chronyc sources


210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^+ cpt-ntp.mweb.co.za            2   6   367    40    +29ms[  +29ms] +/-  151ms
^* cpto-afr-01.time.jpbe.de      2   7   377    43    +69ms[  +66ms] +/-  178ms
^+ ntp2.inx.net.za               2   6   377    43    +64ms[  +64ms] +/-  181ms
^+ ns.bitco.co.za                3   7   377   107    +79ms[  +77ms] +/-  199ms

A drift means a deviation. A drift happens when  the hardware clock is either fast or slow compared to the NTP server clock. The drift file contains 2 values. If it’s a positive number, it means the clock is fast from the NTP server whereas if it is a negative number, it means the clock is slow compared to the NTP server. Here i have a slow clock.

[[email protected] ~]# cat /var/lib/chrony/drift 

         -870.203668          1484.980507
The Maths - How the NTP client adjust its response from the NTP server

Now, we will get into the math behind the logic as discussed previously what happens when the NTP client request the time from the NTP server.

  1. NTP Client A send request to server X – Let’s assume A=100 where 100 is the time of the client
  2. NTP Server X received the request after some secs. – Let’s assume X= 150 where 150 is the time of the server.
  3. Being given that, the request from NTP client is not necessarily served immediately, there is lapse of time at this point. let’s assume that X is now 160
  4. We now have 3 values i.e; The time the client sent the request, the time (real time) the server received the request and the time the server want to respond back.
  5. Now the NTP client gets the request back at 120. This is because the NTP client has its own time.
  6. Client will now determine the time using the formulae B-A – (Y-X) which means 120-100-(160-150) = 10 seconds
  7. Client assumed that the time it took to get the response from server to client is 10/2 = 5 seconds. Assuming 5 seconds is the latency.
  8. Now the client adds 5 seconds to  the server time at the time it received the response which makes 160 +5 = 165 seconds.
  9. The client knows it needs to add 45 seconds to its clock. This is done by subtracting 165 – 120 = 45 seconds where 45 seconds is the difference between the client and the server clock to which the client will set forward its clock by 45 seconds. This indication will be given in the drift file in PPM – Parts Per Million.

TIPS:

  • If the iburst parameters are removed, communication between the server will 8 times faster.
  • You can also increase the verbosity of the command chronyc sources by adding the parameter -v to it and detailed explanation of the values will be given i.e; chronyc source -v
  • You can pick up different NTP servers from the NIST website and restart the NTP service (chronyd).
  • NTP was invented by David L. Mills in 1981 and it is based on Marzullo’s algorithm to get accurate time from several sources.
  • Timestamps of NTP are stored in seconds and it is 64 bit in size – 32 bit for number of seconds and 32 bit for fraction of seconds.
  • Number in drift file is measured in PPM – Parts per million
  • Offset – The difference in timestamps between the client and the server
  • Burst – The speed of communication between NTP client and NTP server will be 8 times more if “burst” is used.
  • The drift value is in PPM – Parts Per Million.
  • To convert into PPM is easy. Since we have 86,400 seconds in a day, therefore, 86,400 / 1,000,000 = 0.0864 PPM
  • If my drift file shows a value of Z  where Z = 30.3 simply do (30.3 x 0.0864) to get the drift file into milliseconds.