
Project 03 — Log Analyzer

Parse Apache/Nginx access logs using grep, awk, and sed to produce a human-readable summary report of traffic patterns and errors.

What You Will Build

A log analyzer that reads a web server access log and produces:

  • Total request count
  • Top 10 most requested URLs
  • Top 10 IPs by request count
  • HTTP status code distribution (200, 404, 500, etc.)
  • Top 5 slowest requests (if response time is in the log)
  • Error rate percentage

Sample Access Log Format (Apache Combined)

192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
10.0.0.42 - - [15/Jan/2024:09:23:15 +0000] "POST /api/login HTTP/1.1" 401 187 "-" "curl/7.68.0"
172.16.0.1 - - [15/Jan/2024:09:23:16 +0000] "GET /admin HTTP/1.1" 404 512 "-" "python-requests/2.28.0"
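awk splits each line on runs of blanks, so in this format the fields the steps rely on sit at fixed positions: $1 is the client IP, $4 the start of the timestamp, $7 the URL path, $9 the status code and $10 the response size in bytes. The sketch below labels them for the sample lines. `show_fields` is a hypothetical helper, and it assumes well-formed request lines; a request logged as "-" or containing stray spaces shifts the later fields.

```shell
#!/usr/bin/env bash
# show_fields (hypothetical helper): label the whitespace-separated fields
# the analyzer uses, for each Combined-format line read from stdin.
show_fields() {
    awk '{
        # $4 looks like [15/Jan/2024:09:23:14 ; strip the leading bracket
        ts = substr($4, 2)
        # $6 is the method with a leading double quote; strip the quote
        method = substr($6, 2)
        printf "ip=%s time=%s method=%s url=%s status=%s bytes=%s\n",
               $1, ts, method, $7, $9, $10
    }'
}

show_fields <<'EOF'
192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
10.0.0.42 - - [15/Jan/2024:09:23:15 +0000] "POST /api/login HTTP/1.1" 401 187 "-" "curl/7.68.0"
EOF
```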

Getting Started

Step 1 — Extract Individual Fields

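#!/usr/bin/env bash
# Exit with a usage message if no log file is given
LOGFILE="${1:?Usage: $0 <access.log>}"

# Total requests (one line per request); tr strips the padding BSD wc adds
total=$(wc -l < "$LOGFILE" | tr -d ' ')
echo "Total requests: $total"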

# Top 10 IPs
echo "=== Top 10 IPs ==="
awk '{print $1}' "$LOGFILE" | sort | uniq -c | sort -rn | head -10

# HTTP status codes
echo "=== Status Codes ==="
awk '{print $9}' "$LOGFILE" | sort | uniq -c | sort -rn

# Top URLs
echo "=== Top 10 URLs ==="
awk '{print $7}' "$LOGFILE" | sort | uniq -c | sort -rn | head -10
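The three pipelines share one shape: pull out a field, count duplicates, sort by count, keep the top N. A minimal sketch that factors this into a function; `top_field` is a hypothetical name, and it reads the log on stdin rather than taking a path.

```shell
#!/usr/bin/env bash
# top_field (hypothetical helper): count the distinct values of one
# whitespace-separated field on stdin and print the N most frequent,
# highest count first. Usage: top_field FIELD N < access.log
top_field() {
    local field="$1" n="$2"
    awk -v f="$field" '{ print $f }' | sort | uniq -c | sort -rn | head -n "$n"
}

# Status-code distribution for the sample log lines
top_field 9 10 <<'EOF'
192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
10.0.0.42 - - [15/Jan/2024:09:23:15 +0000] "POST /api/login HTTP/1.1" 401 187 "-" "curl/7.68.0"
172.16.0.1 - - [15/Jan/2024:09:23:16 +0000] "GET /admin HTTP/1.1" 404 512 "-" "python-requests/2.28.0"
EOF
```

With this helper, `top_field 1 10 < "$LOGFILE"` gives the Top 10 IPs and `top_field 7 10 < "$LOGFILE"` the Top 10 URLs.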

Step 2 — Error Rate

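# Count 4xx (client) and 5xx (server) responses as errors
errors=$(awk '$9 ~ /^[45]/' "$LOGFILE" | wc -l)
# Pass the counts in with -v and avoid dividing by zero on an empty log
error_pct=$(awk -v e="$errors" -v t="$total" 'BEGIN { printf "%.1f", (t + 0 > 0 ? e / t * 100 : 0) }')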
echo "Error rate: ${error_pct}% ($errors / $total requests)"
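Step 2 lumps 4xx and 5xx together and reads the file twice (once for `wc -l`, once for the errors). A single awk pass can keep all the counters at once and report client errors (4xx) separately from server errors (5xx). A sketch; `error_breakdown` is a hypothetical name.

```shell
#!/usr/bin/env bash
# error_breakdown (hypothetical helper): one pass over stdin that counts
# all requests, client errors (4xx) and server errors (5xx) separately.
error_breakdown() {
    awk '
        { total++ }
        $9 ~ /^4/ { client++ }
        $9 ~ /^5/ { server++ }
        END {
            if (total == 0) { print "No requests"; exit }
            printf "4xx: %d (%.1f%%)  5xx: %d (%.1f%%)  total: %d\n",
                   client, client / total * 100, server, server / total * 100, total
        }'
}

error_breakdown <<'EOF'
192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
10.0.0.42 - - [15/Jan/2024:09:23:15 +0000] "POST /api/login HTTP/1.1" 401 187 "-" "curl/7.68.0"
172.16.0.1 - - [15/Jan/2024:09:23:16 +0000] "GET /admin HTTP/1.1" 404 512 "-" "python-requests/2.28.0"
EOF
```

For the three sample lines this prints `4xx: 2 (66.7%)  5xx: 0 (0.0%)  total: 3`.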

Step 3 — Time-Based Analysis

# Requests per hour
echo "=== Requests by Hour ==="
# grep -E (rather than the GNU-only -P) also works with BSD/macOS grep
awk '{print $4}' "$LOGFILE" \
    | grep -oE '[0-9]{2}/[A-Za-z]{3}/[0-9]{4}:[0-9]{2}' \
    | sort | uniq -c
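The "Top 5 slowest requests" item needs a timing field, and the Combined format shown above has none. The sketch below assumes a custom log format that appends the response time in seconds as the last field on each line (for example, an nginx `log_format` ending in `$request_time`); `slowest_requests` is a hypothetical helper, and the timings in the demo are invented for illustration.

```shell
#!/usr/bin/env bash
# slowest_requests (hypothetical helper): print the N slowest requests as
# "SECONDS URL", ASSUMING the response time is the last field on each line.
# Plain Combined-format logs carry no timing field at all.
slowest_requests() {
    local n="${1:-5}"
    # LC_ALL=C makes sort treat "." as the decimal point in every locale
    awk '{ print $NF, $7 }' | LC_ALL=C sort -k1,1 -rn | head -n "$n"
}

# Invented timings, for illustration only
slowest_requests 2 <<'EOF'
192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0" 0.004
10.0.0.42 - - [15/Jan/2024:09:23:15 +0000] "POST /api/login HTTP/1.1" 401 187 "-" "curl/7.68.0" 1.250
172.16.0.1 - - [15/Jan/2024:09:23:16 +0000] "GET /admin HTTP/1.1" 404 512 "-" "python-requests/2.28.0" 0.031
EOF
```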

Step 4 — Full Report Script

Wrap everything in a script that:

  • Accepts the log file as an argument
  • Generates output with clear section headers
  • Optionally saves the report to a file (--output report.txt)
  • Handles compressed logs (*.gz) by piping through zcat
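One way to assemble Step 4 is sketched below, assuming bash and a single positional log path. The script name `analyze_log.sh` and the helper names are hypothetical. It decompresses with `gzip -dc`, which does the same job as `zcat` but avoids BSD-derived `zcat` builds (older macOS, for example) that expect `.Z` files. Each section re-reads the file, which keeps every section a simple pipeline at the cost of extra I/O on very large logs; the total is printed without the thousands separator seen in the sample output.

```shell
#!/usr/bin/env bash
# analyze_log.sh (hypothetical name): a minimal sketch of the Step 4 script.
# Usage: analyze_log.sh [--output report.txt] access.log[.gz]

# Stream the log to stdout, decompressing *.gz files on the fly.
read_log() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac
}

# top_values LOG FIELD N: the N most frequent values of one field.
top_values() {
    read_log "$1" | awk -v f="$2" '{ print $f }' | sort | uniq -c | sort -rn | head -n "$3"
}

# Write the full report for one log file to stdout.
report() {
    local log="$1" total errors
    total=$(read_log "$log" | wc -l | tr -d ' ')
    errors=$(read_log "$log" | awk '$9 ~ /^[45]/' | wc -l | tr -d ' ')

    echo "========================================"
    echo " Log Analysis Report"
    echo " File: $log"
    echo " Generated: $(date '+%Y-%m-%d %H:%M:%S')"
    echo " Total requests: $total"
    echo "========================================"
    echo
    echo "TOP 10 IPs";  top_values "$log" 1 10; echo
    echo "TOP 10 URLs"; top_values "$log" 7 10; echo
    echo "STATUS CODE DISTRIBUTION"; top_values "$log" 9 100; echo
    awk -v e="$errors" -v t="$total" 'BEGIN {
        printf "ERROR RATE: %.1f%% (%d / %d)\n", (t > 0 ? e / t * 100 : 0), e, t
    }'
}

main() {
    local output="" log=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --output)
                [ $# -ge 2 ] || { echo "--output needs a file name" >&2; return 1; }
                output="$2"; shift 2 ;;
            *) log="$1"; shift ;;
        esac
    done
    if [ -z "$log" ] || [ ! -r "$log" ]; then
        echo "Usage: analyze_log.sh [--output FILE] LOGFILE" >&2
        return 1
    fi
    if [ -n "$output" ]; then
        report "$log" | tee "$output"   # show on screen and save a copy
    else
        report "$log"
    fi
}

main "$@"
```

Typical calls would be `./analyze_log.sh /var/log/nginx/access.log` or `./analyze_log.sh --output report.txt access.log.1.gz`.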

Sample Output

========================================
 Log Analysis Report
 File: /var/log/nginx/access.log
 Generated: 2024-01-15 10:00:00
 Total requests: 15,234
========================================

TOP 10 IPs
   1523  192.168.1.105
    987  10.0.0.42
    ...

STATUS CODE DISTRIBUTION
  12841  200 (OK)
    891  404 (Not Found)
    487  301 (Redirect)
     15  500 (Server Error)

ERROR RATE: 5.9% (906 / 15234)
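The sample report labels each status code with a reason phrase, which the bare Step 1 pipeline does not produce. A minimal sketch of one way to add them: `label_status` is a hypothetical helper that post-processes `uniq -c` output, and it uses the standard HTTP reason phrases (such as "Moved Permanently" for 301 and "Internal Server Error" for 500) rather than the shorter labels in the sample. Only a handful of common codes are mapped; others print without a label.

```shell
#!/usr/bin/env bash
# label_status (hypothetical helper): read "COUNT CODE" lines, as produced by
# awk '{print $9}' log | sort | uniq -c | sort -rn, and append reason phrases.
label_status() {
    awk 'BEGIN {
            reason["200"] = "OK";          reason["301"] = "Moved Permanently"
            reason["302"] = "Found";       reason["304"] = "Not Modified"
            reason["400"] = "Bad Request"; reason["401"] = "Unauthorized"
            reason["403"] = "Forbidden";   reason["404"] = "Not Found"
            reason["500"] = "Internal Server Error"
            reason["502"] = "Bad Gateway"; reason["503"] = "Service Unavailable"
        }
        {
            # Codes missing from the table are printed without a label
            label = ($2 in reason) ? " (" reason[$2] ")" : ""
            printf "%7d  %s%s\n", $1, $2, label
        }'
}

printf '%s\n' '12841 200' '891 404' '15 500' | label_status
```

In the full report it would sit at the end of the status pipeline: `awk '{print $9}' "$LOGFILE" | sort | uniq -c | sort -rn | label_status`.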

Stretch Goals

  • Add GeoIP lookup using a local database or free API
  • Generate an HTML report with basic CSS charts
  • Alert if error rate exceeds a configurable threshold
  • Compare two log files and report the diff
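The alerting stretch goal can start as a check whose exit status signals a breach, leaving cron, CI, or a wrapper script to decide how to notify. A sketch assuming the same 4xx/5xx error definition as Step 2; `check_error_rate` and its default threshold of 5 percent are hypothetical choices.

```shell
#!/usr/bin/env bash
# check_error_rate (hypothetical helper): read a log on stdin and exit 1
# when the share of 4xx/5xx responses exceeds THRESHOLD percent (default 5).
check_error_rate() {
    local threshold="${1:-5}"
    awk -v max="$threshold" '
        { total++ }
        $9 ~ /^[45]/ { errors++ }
        END {
            pct = (total > 0) ? errors / total * 100 : 0
            if (pct > max + 0) {
                printf "ALERT: error rate %.1f%% exceeds %s%%\n", pct, max
                exit 1
            }
            printf "OK: error rate %.1f%% (threshold %s%%)\n", pct, max
        }'
}

# 1 error in 2 requests is 50%, above a 10% threshold
printf '%s\n' \
    '10.0.0.1 - - [15/Jan/2024:09:23:14 +0000] "GET / HTTP/1.1" 200 10 "-" "x"' \
    '10.0.0.2 - - [15/Jan/2024:09:23:15 +0000] "GET /x HTTP/1.1" 404 10 "-" "x"' \
    | check_error_rate 10 || echo "would send an alert here"
```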

[[02-file-organizer]] | [[04-backup-script]]