Analyzing Nginx Logs with Python: Reverse DNS Lookup, IP Enumeration, and Counting


While experimenting with parsing my Nginx access logs in Python a while ago, it occurred to me that this could serve as the basis for another blog post. My goal was to identify the most frequently occurring IP address and export the data into a CSV format for easier filtering. Although achieving the same outcome with a bash script is quicker, I viewed this exercise as a warm-up in using Python for the task. Truth be told, I don’t currently have a practical use case for this specific approach.

Dissection of the script

1. Importing the module re – regular expression:

import re

2. Declaring a variable for the log. In my case it’s in the same folder where the script is:

NginxLog="access.log"

3. Declaring a second variable, regexp, which holds the pattern used to pull IP addresses out of the log.

The ‘r’ before the string makes it a raw string, so backslashes are kept as literal characters instead of being read as Python escape sequences. That matters for regular expressions, which use backslashes heavily.

[0-9] is a character class that matches any single digit from 0 to 9.

{1,3} is a quantifier: the preceding digit must appear at least once and at most three times.

\. matches a literal dot. Without the backslash, a dot would match any character.

regexp = r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
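To see what the pattern actually picks up, here is a small sketch run against a made-up log line. The sample line and the names `sample_line` and `matches` are assumptions for illustration, not taken from my real log. It also shows a limitation worth knowing: the pattern checks only the shape of an address, not whether each number is 255 or below.

```python
import re

regexp = r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"

# Hypothetical access-log line, invented for this example.
sample_line = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612'

matches = re.findall(regexp, sample_line)
print(matches)  # ['203.0.113.7']

# The pattern validates the shape only, so an impossible address also matches.
print(re.findall(regexp, "999.999.999.999"))  # ['999.999.999.999']
```

For a quick count of visitors this is usually good enough; stricter validation would need extra checks on top of the regular expression.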

4. ‘with’ is a Python keyword and ‘open’ is a built-in function. Together they open the file and make sure it is closed again when the block ends. The open file is bound to a variable called “file”, and its contents are read into a variable called “logs”:

with open(NginxLog) as file:
    logs = file.read()

5. Another variable, IPList, holds the result of re.findall, a function of the “re” module that returns every match of the pattern defined in ‘regexp’ found in “logs”, as a list of strings.

IPList = re.findall(regexp, logs)

6. Finally, a visual output as follows:

print(IPList)
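One thing to keep in mind: running the pattern over the whole log also catches anything shaped like an IP elsewhere in a line, for example inside a requested URL. As a possible alternative, the sketch below keeps only an address found at the very start of each line, which is where the client address sits in Nginx's default combined log format. The function name `client_ips` and the sample lines are hypothetical, and the sketch assumes your log format has not been customised.

```python
import re

regexp = r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"

def client_ips(lines):
    """Return the address at the start of each line, skipping lines without one."""
    found = []
    for line in lines:
        match = re.match(regexp, line)  # re.match only looks at the start of the string
        if match:
            found.append(match.group())
    return found

# Hypothetical log lines, invented for this example.
sample_lines = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /?from=198.51.100.9 HTTP/1.1" 200 612',
    '203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "GET /about HTTP/1.1" 200 128',
    'a line without an address',
]

print(client_ips(sample_lines))  # ['203.0.113.7', '203.0.113.7']
```

Because an open file can be looped over line by line, the same function should also accept the “file” variable from the ‘with open’ block directly.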

Putting Everything Together

7. So, let’s enhance it further. The collections module in Python provides the Counter class, which counts the occurrences of each element in a collection.

from collections import Counter

8. And this time we want to print the same result using a counter

print(Counter(IPList))
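Since my original goal was to find the most frequently occurring IP, Counter's most_common method is worth a quick look. This is a minimal sketch; the contents of `IPList` here are made up to stand in for the real list, and `counts` and `top_two` are just illustrative names.

```python
from collections import Counter

# Hypothetical list standing in for the real IPList.
IPList = [
    "203.0.113.7", "198.51.100.9", "203.0.113.7",
    "192.0.2.1", "203.0.113.7", "198.51.100.9",
]

counts = Counter(IPList)

# most_common(n) returns the n most frequent elements as (element, count)
# pairs, highest count first.
top_two = counts.most_common(2)
print(top_two)  # [('203.0.113.7', 3), ('198.51.100.9', 2)]
```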

9. Counter(IPList) is not a plain list: it is a dictionary-like object that maps each IP to the number of times it occurs. To get cleaner output with IPs and counts, loop over it with a for loop. ‘items’ is a dictionary method, inherited by Counter, that returns a view of the key-value pairs, here (IP, count) tuples.

for IPs, Counts in Counter(IPList).items():
    print(IPs, Counts)

The second version of the result

 

Where is the CSV?

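10. Now, let’s import the csv module.

import csv

11. Again a ‘with open’ statement, this time opening the file for writing (“w”). The open file is bound to the variable named ‘file’. Passing newline='' is what the csv module documentation recommends for files handed to csv.writer, so that no extra blank lines appear between rows on some platforms.

with open("finalresult.csv", "w", newline='') as file:

12. csv.writer wraps the ‘file’ variable declared above, so that whatever we write ends up in it as CSV rows.

WriteIntoTheCSV = csv.writer(file)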

13. Now we write a single row into a CSV file with two columns (IPs List and Number of Counts)

WriteIntoTheCSV.writerow(["IPs List","Number of Counts"])

14. Iterate over the counts built from ‘IPList’ again, and write one row per IP to the file

for IPs, Counts in Counter(IPList).items():
    WriteIntoTheCSV.writerow([IPs,Counts])
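Steps 10 to 14 can be tried out without touching the disk. The sketch below is an assumption-laden variation rather than my actual script: it writes to an in-memory buffer (io.StringIO) instead of finalresult.csv, uses a made-up `IPList`, and swaps the for loop for writerows combined with most_common(), so the busiest IPs land at the top of the CSV.

```python
import csv
import io
from collections import Counter

# Hypothetical list standing in for the real IPList.
IPList = ["203.0.113.7", "198.51.100.9", "203.0.113.7"]

# io.StringIO behaves like a text file held in memory,
# so nothing is written to disk in this example.
buffer = io.StringIO(newline="")
WriteIntoTheCSV = csv.writer(buffer)
WriteIntoTheCSV.writerow(["IPs List", "Number of Counts"])

# most_common() with no argument yields every (IP, count) pair,
# highest count first; writerows writes one row per pair.
WriteIntoTheCSV.writerows(Counter(IPList).most_common())

print(buffer.getvalue())
```

Replacing the buffer with the ‘with open’ statement from step 11 should give the same rows in a real file.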

Adding everything together


Assuming we don’t want a CSV, but instead a DNS lookup on all the IPs

15. To be able to do a DNS lookup on all the IPs, we need the ‘socket’ module.

import socket

16. After importing ‘socket’, we define a function called ‘reverse_dns_lookup’, which takes a single argument: the IP address.

def reverse_dns_lookup(ip):

17. A reverse DNS lookup can raise an exception for some IPs, so we wrap it in ‘try’

try:

18. This is where the reverse DNS lookup is done using ‘socket.gethostbyaddr(ip)’.

The result of the lookup is a tuple containing the primary domain name associated with the IP address (host_name), an alias list, and an IP address list. The underscore _ is used to ignore the parts of the tuple that are not needed in this case.

host_name, _, _ = socket.gethostbyaddr(ip)

19. When the DNS lookup succeeds, the function returns the hostname it obtained.

return host_name

20. If the lookup fails, the function returns “N/A” instead.

except socket.herror:
    return "N/A"
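Put back together with its indentation, the function from steps 16 to 20 looks roughly like the sketch below. Two things in it are my own additions for illustration and not part of the steps above: it catches OSError rather than only socket.herror, because in Python 3 both socket.herror and socket.gaierror are subclasses of OSError, and it is wrapped in functools.lru_cache so that an IP appearing many times is only looked up once.

```python
import socket
from functools import lru_cache

@lru_cache(maxsize=None)  # remember earlier answers so each IP is resolved once
def reverse_dns_lookup(ip):
    """Return the hostname for ip, or "N/A" when the lookup fails."""
    try:
        host_name, _, _ = socket.gethostbyaddr(ip)
        return host_name
    except OSError:
        # Assumption: catching OSError is broader than the socket.herror
        # used in the steps above; it also covers socket.gaierror.
        return "N/A"

# Both resolver exceptions derive from OSError in Python 3.
print(issubclass(socket.herror, OSError), issubclass(socket.gaierror, OSError))  # True True
```

Calling reverse_dns_lookup with an address from the log then returns either a hostname or "N/A", depending on what your resolver answers.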

21. The output file is created as follows:

with open("finalresult.csv", "w", newline='') as file:
    WriteIntoTheCSV = csv.writer(file)
    WriteIntoTheCSV.writerow(["IPs List", "Number of Counts", "Reverse DNS"])
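Step 21 only writes the header row, so here is one way the data rows with the third column could be filled in. Treat it as a sketch under assumptions: the function name `write_report` and its `lookup` parameter are hypothetical, introduced so the DNS step can be swapped out when there is no network, and the except clause catches OSError, which is broader than socket.herror.

```python
import csv
import socket
from collections import Counter

def reverse_dns_lookup(ip):
    """Return the hostname for ip, or "N/A" when the lookup fails."""
    try:
        host_name, _, _ = socket.gethostbyaddr(ip)
        return host_name
    except OSError:  # assumption: also covers socket.herror and socket.gaierror
        return "N/A"

def write_report(IPList, path, lookup=reverse_dns_lookup):
    """Write one CSV row per distinct IP: address, count, reverse DNS name."""
    with open(path, "w", newline="") as file:
        WriteIntoTheCSV = csv.writer(file)
        WriteIntoTheCSV.writerow(["IPs List", "Number of Counts", "Reverse DNS"])
        for IPs, Counts in Counter(IPList).items():
            WriteIntoTheCSV.writerow([IPs, Counts, lookup(IPs)])
```

With the real list, write_report(IPList, "finalresult.csv") would perform one DNS lookup per distinct IP, which can be slow on a large log.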

Final script result

Nitin J Mutkawoa https://tunnelix.com

Blogger at tunnelix.com | Founding member of cyberstorm.mu | An Aficionado Journey in Opensource & Linux – And now It's a NASDAQ touch!
