Project 03 — Log Analyzer
Parse Apache/Nginx access logs using grep, awk, and sed to produce a human-readable summary report of traffic patterns and errors.
What You Will Build
A log analyzer that reads a web server access log and produces:
- Total request count
- Top 10 most requested URLs
- Top 10 IPs by request count
- HTTP status code distribution (200, 404, 500, etc.)
- Top 5 slowest requests (if response time is in the log)
- Error rate percentage
Sample Access Log Format (Apache Combined)
192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
10.0.0.42 - - [15/Jan/2024:09:23:15 +0000] "POST /api/login HTTP/1.1" 401 187 "-" "curl/7.68.0"
172.16.0.1 - - [15/Jan/2024:09:23:16 +0000] "GET /admin HTTP/1.1" 404 512 "-" "python-requests/2.28.0"
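Every later step relies on knowing which whitespace-separated field holds which value. As a minimal sketch (the variable names `line` and `fields` are only illustrative), this runs awk over the first sample line and prints the fields the analyzer uses:

```shell
#!/usr/bin/env bash
# One line in Apache Combined format, copied from the sample above.
line='192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"'

# With awk's default whitespace splitting:
#   $1  = client IP          $4  = "[" + timestamp
#   $6  = quote + method     $7  = requested URL
#   $9  = HTTP status code   $10 = response size in bytes
fields=$(printf '%s\n' "$line" | awk '{ print $1, $7, $9, $10 }')
echo "$fields"   # prints: 192.168.1.1 /index.html 200 2048
```

These positions hold as long as the user, identity, and URL fields contain no spaces, which is true for the sample lines.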
Getting Started
Step 1 — Extract Individual Fields
LOGFILE="$1"
# Total requests
total=$(wc -l < "$LOGFILE")
# Top 10 IPs
echo "=== Top 10 IPs ==="
awk '{print $1}' "$LOGFILE" | sort | uniq -c | sort -rn | head -10
# HTTP status codes
echo "=== Status Codes ==="
awk '{print $9}' "$LOGFILE" | sort | uniq -c | sort -rn
# Top URLs
echo "=== Top 10 URLs ==="
awk '{print $7}' "$LOGFILE" | sort | uniq -c | sort -rn | head -10
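The build list also mentions the top 5 slowest requests, which needs a response time that the Combined format does not record. The sketch below assumes the server has been configured to append the response time in seconds as the last field of each line (for example via Nginx's `$request_time`). The timing values in the sample data are made up for illustration.

```shell
#!/usr/bin/env bash
# ASSUMPTION: each line ends with an extra field holding the response time
# in seconds. The standard Combined format has no such field.
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0" 0.012
10.0.0.42 - - [15/Jan/2024:09:23:15 +0000] "POST /api/login HTTP/1.1" 401 187 "-" "curl/7.68.0" 1.250
172.16.0.1 - - [15/Jan/2024:09:23:16 +0000] "GET /admin HTTP/1.1" 404 512 "-" "python-requests/2.28.0" 0.300
EOF

echo "=== Top 5 Slowest Requests ==="
# Print "time URL", sort numerically in descending order, keep five lines.
# $NF is the last field, so spaces inside the user agent do not matter.
slowest=$(awk '{ print $NF, $7 }' "$sample" | LC_ALL=C sort -rn | head -5)
echo "$slowest"
rm -f "$sample"
```

`LC_ALL=C` keeps `sort -n` from misreading the decimal point in locales that use a comma as the decimal separator.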
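Step 2 — Error Rate
# Count 4xx and 5xx responses; n+0 prints 0 when nothing matches
errors=$(awk '$9 ~ /^[45]/ { n++ } END { print n+0 }' "$LOGFILE")
# Pass the values with -v and guard against an empty log (division by zero)
error_pct=$(awk -v e="$errors" -v t="$total" 'BEGIN { printf "%.1f", (t > 0 ? e / t * 100 : 0) }')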
echo "Error rate: ${error_pct}% ($errors / $total requests)"
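As a worked example of the arithmetic, the three sample lines contain two error responses (401 and 404) out of three requests, so the rate is 2 / 3 × 100 ≈ 66.7%. The sketch below computes the same figure in a single awk pass. The variable name `error_report` is only illustrative.

```shell
#!/usr/bin/env bash
# The three sample lines from the "Sample Access Log Format" section.
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
10.0.0.42 - - [15/Jan/2024:09:23:15 +0000] "POST /api/login HTTP/1.1" 401 187 "-" "curl/7.68.0"
172.16.0.1 - - [15/Jan/2024:09:23:16 +0000] "GET /admin HTTP/1.1" 404 512 "-" "python-requests/2.28.0"
EOF

# Count error lines while reading; NR in END is the total number of lines.
error_report=$(awk '
    $9 ~ /^[45]/ { errors++ }
    END {
        if (NR == 0) { print "Error rate: n/a (empty log)"; exit }
        printf "Error rate: %.1f%% (%d / %d requests)\n", errors / NR * 100, errors, NR
    }' "$sample")
echo "$error_report"
rm -f "$sample"
```

Doing the count and the division in one awk program reads the log only once, which matters for large files.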
Step 3 — Time-Based Analysis
# Requests per hour
echo "=== Requests by Hour ==="
awk '{print $4}' "$LOGFILE" \
| grep -oP '\d{2}/\w+/\d{4}:\d{2}' \
| sort | uniq -c
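The `-P` option used above is specific to GNU grep and is not available in, for example, the BSD grep shipped with macOS. A possible alternative, sketched here with awk's `substr`, extracts the same `DD/Mon/YYYY:HH` key without grep. The third sample line has its hour changed to 10 purely so that the output shows two buckets.

```shell
#!/usr/bin/env bash
# Sample data: two requests in hour 09, one (altered for illustration) in hour 10.
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
10.0.0.42 - - [15/Jan/2024:09:23:15 +0000] "POST /api/login HTTP/1.1" 401 187 "-" "curl/7.68.0"
172.16.0.1 - - [15/Jan/2024:10:01:02 +0000] "GET /admin HTTP/1.1" 404 512 "-" "python-requests/2.28.0"
EOF

echo "=== Requests by Hour ==="
# $4 looks like "[15/Jan/2024:09:23:14". Skipping the "[" and taking
# 14 characters leaves "15/Jan/2024:09".
by_hour=$(awk '{ hour = substr($4, 2, 14); count[hour]++ }
               END { for (h in count) print count[h], h }' "$sample" \
          | LC_ALL=C sort -k2)
echo "$by_hour"
rm -f "$sample"
```

Sorting on the key is lexical, so hours within a day are in order, but days from different months are not sorted chronologically.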
Step 4 — Full Report Script
Wrap everything in a script that:
- Accepts the log file as an argument
- Generates output with clear section headers
- Optionally saves the report to a file (--output report.txt)
- Handles compressed logs (*.gz) by piping through zcat
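One way the requirements above could fit together is sketched below. The function names (`read_log`, `generate_report`, `main`) and the overall layout are illustrative choices, not a prescribed design, and only two of the report sections are included to keep it short.

```shell
#!/usr/bin/env bash
# Sketch of a report wrapper; all function names are hypothetical.

# Write the log to stdout, decompressing *.gz files on the fly.
read_log() {
    case "$1" in
        *.gz) gzip -dc -- "$1" ;;   # equivalent to zcat
        *)    cat -- "$1" ;;
    esac
}

# Build the report on stdout from a (possibly compressed) log file.
generate_report() {
    local logfile="$1" tmp
    tmp=$(mktemp)
    read_log "$logfile" > "$tmp"    # decompress once, analyze many times

    echo "========================================"
    echo "Log Analysis Report"
    echo "File: $logfile"
    echo "Generated: $(date '+%Y-%m-%d %H:%M:%S')"
    echo "Total requests: $(wc -l < "$tmp" | tr -d ' ')"
    echo "========================================"
    echo "TOP 10 IPs"
    awk '{print $1}' "$tmp" | sort | uniq -c | sort -rn | head -10
    echo "STATUS CODE DISTRIBUTION"
    awk '{print $9}' "$tmp" | sort | uniq -c | sort -rn
    rm -f "$tmp"
}

# Usage: main LOGFILE [--output FILE]
main() {
    local logfile="" output=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --output)
                [ $# -ge 2 ] || { echo "--output needs a file name" >&2; return 1; }
                output="$2"; shift 2 ;;
            *)
                logfile="$1"; shift ;;
        esac
    done
    if [ -z "$logfile" ] || [ ! -r "$logfile" ]; then
        echo "Usage: $0 LOGFILE [--output FILE]" >&2
        return 1
    fi
    if [ -n "$output" ]; then
        generate_report "$logfile" > "$output"
    else
        generate_report "$logfile"
    fi
}

# Run only when arguments were given, so the file can also be sourced.
if [ $# -gt 0 ]; then main "$@"; fi
```

Saved as, say, `analyze.sh`, it could be run as `./analyze.sh access.log.gz --output report.txt`. Copying the decompressed log to a temporary file costs disk space but lets each section reread the data without decompressing again.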
Sample Output
========================================
Log Analysis Report
File: /var/log/nginx/access.log
Generated: 2024-01-15 10:00:00
Total requests: 15,234
========================================
TOP 10 IPs
1523 192.168.1.105
987 10.0.0.42
...
STATUS CODE DISTRIBUTION
12841 200 (OK)
891 404 (Not Found)
487 301 (Redirect)
15 500 (Server Error)
ERROR RATE: 5.9% (906 / 15234)
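The status lines in the sample output carry a short label after each code, which the Step 1 pipeline does not produce. A minimal sketch of one way to add them is a lookup table inside awk. The label text and the "Other" fallback are assumptions, and the sample data repeats the first request so that the counts differ.

```shell
#!/usr/bin/env bash
# Sample data: the 200 line appears twice, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
192.168.1.1 - - [15/Jan/2024:09:23:14 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
10.0.0.42 - - [15/Jan/2024:09:23:15 +0000] "POST /api/login HTTP/1.1" 401 187 "-" "curl/7.68.0"
172.16.0.1 - - [15/Jan/2024:09:23:16 +0000] "GET /admin HTTP/1.1" 404 512 "-" "python-requests/2.28.0"
EOF

echo "STATUS CODE DISTRIBUTION"
labelled=$(awk '
    { count[$9]++ }
    END {
        # Hypothetical label table; extend it as needed.
        label["200"] = "OK";           label["301"] = "Redirect"
        label["401"] = "Unauthorized"; label["404"] = "Not Found"
        label["500"] = "Server Error"
        for (code in count)
            printf "%d %s (%s)\n", count[code], code,
                   ((code in label) ? label[code] : "Other")
    }' "$sample" | LC_ALL=C sort -rn)
echo "$labelled"
rm -f "$sample"
```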
Stretch Goals
- Add GeoIP lookup using a local database or free API
- Generate an HTML report with basic CSS charts
- Alert if error rate exceeds a configurable threshold
- Compare two log files and report the diff
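As a starting point for the threshold alert, the sketch below compares an error rate against a limit and signals the result through its exit status. The function name `check_error_rate` and the default limit of 5.0 percent are hypothetical choices. The example numbers are the error and request counts from the sample output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 1 and prints an alert when the error rate
# (errors / total * 100) is above the threshold, otherwise returns 0.
check_error_rate() {
    local errors="$1" total="$2" threshold="${3:-5.0}"
    awk -v e="$errors" -v t="$total" -v max="$threshold" 'BEGIN {
        if (t == 0) exit 0                  # empty log: nothing to alert on
        rate = e / t * 100
        if (rate > max) {
            printf "ALERT: error rate %.1f%% exceeds %.1f%%\n", rate, max
            exit 1
        }
    }'
}

# 906 errors out of 15234 requests is about 5.9%, above a 5.0% limit.
if ! check_error_rate 906 15234 5.0; then
    echo "Threshold exceeded; a real script might send a notification here."
fi
```

Using the exit status keeps the check easy to chain in cron jobs or CI steps, where a non-zero status can trigger whatever notification mechanism is already in place.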