Stream Editing with sed and awk¶
sed and awk let you transform streams of text: editing files without opening them in an editor, extracting specific fields from structured data, and building one-line data pipelines that would take dozens of lines in a general-purpose language.
Learning Objectives¶
- Substitute text in a stream with sed
- Delete and print specific lines using sed
- Extract and compute on fields with awk
- Use tr to translate or delete characters
- Chain sed and awk into real-world data pipelines
sed — Stream Editor¶
sed reads input line by line, applies editing commands, and writes to stdout. The most common use is substitution.
Substitution¶
sed 's/old/new/' file.txt # replace first occurrence per line
sed 's/old/new/g' file.txt # replace all occurrences per line (global)
sed 's/old/new/i' file.txt # case-insensitive replace (GNU extension; not in POSIX sed)
sed 's/old/new/2' file.txt # replace second occurrence only
# Edit in place (modify the file directly)
sed -i 's/http:/https:/g' config.txt
# Edit in place with backup (safer)
sed -i.bak 's/http:/https:/g' config.txt
-i on macOS requires an argument
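On macOS (BSD sed), -i requires a backup-extension argument, which may be the empty string: sed -i '' 's/old/new/' file. On GNU sed (Linux), sed -i 's/old/new/' file works without an argument. For portable scripts, attach the extension directly, as in sed -i.bak 's/old/new/' file, which both implementations accept.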
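To make the substitution flags concrete, here is a small sketch run against a made-up sample string. The input text and the variable names are illustrative only.

```shell
# Sample input (hypothetical): one line with three occurrences of "cat"
line='cat cat cat'

first=$(printf '%s\n' "$line" | sed 's/cat/dog/')    # first occurrence only
all=$(printf '%s\n' "$line" | sed 's/cat/dog/g')     # every occurrence
second=$(printf '%s\n' "$line" | sed 's/cat/dog/2')  # second occurrence only

printf '%s\n' "$first" "$all" "$second"
# dog cat cat
# dog dog dog
# cat dog cat
```

Piping a sample line through sed like this is a cheap way to check an expression before pointing -i at a real file.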
Deleting Lines¶
sed '3d' file.txt # delete line 3
sed '3,7d' file.txt # delete lines 3 through 7
sed '/^#/d' file.txt # delete comment lines
sed '/^$/d' file.txt # delete blank lines
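The difference between line-number addresses and pattern addresses is easier to see on a small input. The sketch below uses an invented four-line config; the variable names are placeholders.

```shell
# Sample input (hypothetical config): a comment, a setting, a blank line, a setting
config=$(printf '%s\n' '# settings' 'port=8080' '' 'host=example.org')

# Pattern address: delete every line that starts with #
no_comments=$(printf '%s\n' "$config" | sed '/^#/d')

# Pattern address: delete every empty line
no_blank=$(printf '%s\n' "$config" | sed '/^$/d')

# Line-number range: delete input lines 2 through 3, whatever they contain
no_middle=$(printf '%s\n' "$config" | sed '2,3d')

printf '%s\n' "$no_middle"
# # settings
# host=example.org
```

None of these commands touch a file; they only change what reaches stdout.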
Printing Specific Lines¶
sed -n '5p' file.txt # print only line 5
sed -n '5,10p' file.txt # print lines 5-10
sed -n '/error/p' file.txt # print lines matching pattern
-n suppresses default output
By default, sed prints every line after processing it. -n suppresses this, so only explicit p commands produce output. With -n, sed -n '5p' prints line 5 once and nothing else; without it, sed '5p' prints every line and prints line 5 twice (once from p and once from the default output).
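The doubling effect can be seen directly on a three-line input. This is a minimal sketch with made-up data:

```shell
# Three-line sample input (invented for illustration)
without_n=$(printf '%s\n' one two three | sed '2p')
with_n=$(printf '%s\n' one two three | sed -n '2p')

printf '%s\n' "$without_n"
# one
# two
# two
# three

printf '%s\n' "$with_n"
# two
```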
awk — Pattern-Action Processing¶
awk reads input line by line and splits each line into fields. It is particularly useful for structured data like CSV, logs, and command output.
Basic Structure¶
- If pattern is omitted, action runs on every line.
- If action is omitted, the default action is to print the matching line.
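An awk program is a list of rules, each written as pattern { action } and usually passed in single quotes. The sketch below shows the full form and both defaults on invented sample data; the names and numbers are placeholders.

```shell
# General form: awk 'pattern { action }' file
data=$(printf '%s\n' 'alice 30' 'bob 25' 'carol 41')

# Pattern and action: print the name on lines whose second field exceeds 28
both=$(printf '%s\n' "$data" | awk '$2 > 28 { print $1 }')

# Action only: the action runs on every line
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern only: the default action prints the whole matching line
pattern_only=$(printf '%s\n' "$data" | awk '/bob/')

printf '%s\n' "$both"
# alice
# carol
printf '%s\n' "$pattern_only"
# bob 25
```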
Field Variables¶
awk splits each line into fields on whitespace (or a delimiter you specify). Fields are $1, $2, etc. $0 is the entire line. NF is the number of fields. NR is the current line number.
# Print the first whitespace-separated field of each line
awk '{ print $1 }' file.txt
# Split on colons with -F: and print fields 1 and 7 (login name and shell)
awk -F: '{ print $1, $7 }' /etc/passwd
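The built-in variables are easiest to understand by printing them. The sketch below uses a single record in the /etc/passwd layout; the account values are invented, not taken from a real system.

```shell
# One colon-delimited record in the /etc/passwd layout (sample values, not a real account)
record='deploy:x:1001:1001:Deploy User:/home/deploy:/bin/bash'

# NR is the line number, NF is the field count, and $NF is therefore the last field
summary=$(printf '%s\n' "$record" | awk -F: '{ print NR, NF, $1, $NF }')
echo "$summary"
# 1 7 deploy /bin/bash

# Without -F the line splits on whitespace, so the only break is inside "Deploy User"
ws_fields=$(printf '%s\n' "$record" | awk '{ print NF }')
echo "$ws_fields"
# 2
```

The second command shows why the delimiter matters: the same line has seven fields with -F: and two without it.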
Patterns and Conditionals¶
# Print login names whose UID (field 3) is 1000 or higher (regular users on most Linux distributions)
awk -F: '$3 >= 1000 { print $1 }' /etc/passwd
# Print lines that match a pattern
awk '/error/ { print NR, $0 }' /var/log/app.log
# BEGIN and END blocks run once, before and after all input
awk 'BEGIN { count=0 } /error/ { count++ } END { print "Errors:", count }' app.log
# Sum the size column from ls -l output
ls -l | awk 'NR > 1 { sum += $5 } END { print "Total bytes:", sum }'
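To see when each kind of block fires, the following sketch runs a three-rule program over a few invented log lines. The log content and the "Scanning..." message are made up for illustration.

```shell
# Sample log lines (invented for illustration)
log=$(printf '%s\n' 'INFO start' 'error disk full' 'INFO retry' 'error timeout')

report=$(printf '%s\n' "$log" | awk '
  BEGIN   { print "Scanning..." }          # runs once, before any input
  /error/ { count++; print NR ": " $0 }    # runs for each matching line
  END     { print "Errors:", count + 0 }   # runs once, after all input
')

printf '%s\n' "$report"
# Scanning...
# 2: error disk full
# 4: error timeout
# Errors: 2
```

Writing count + 0 in the END block forces a numeric value, so the report shows 0 rather than an empty string when nothing matches.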
Computing with awk¶
# Average of the numbers in a file (one per line); prints nothing if the file is empty
awk '{ sum += $1; count++ } END { if (count > 0) print sum / count }' numbers.txt
# Print lines longer than 80 characters
awk 'length($0) > 80' file.txt
awk vs cut
Use cut for simple fixed-column extraction (faster, simpler). Use awk when you need conditionals, arithmetic, or to reference fields by position dynamically.
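A side-by-side sketch may help when deciding between the two tools. The sample line and its field meanings (host, address, service, state) are assumptions made up for this example.

```shell
# Sample colon-delimited line (invented values)
line='web01:10.0.0.5:nginx:running'

# Fixed position: cut and awk give the same result
with_cut=$(printf '%s\n' "$line" | cut -d: -f2)
with_awk=$(printf '%s\n' "$line" | awk -F: '{ print $2 }')

# Position computed at run time: the last field, whatever the field count is
last=$(printf '%s\n' "$line" | awk -F: '{ print $NF }')

# A condition plus reordered output, which cut cannot express
status=$(printf '%s\n' "$line" | awk -F: '$4 == "running" { print $3 " on " $1 }')

printf '%s\n' "$with_cut" "$last" "$status"
# 10.0.0.5
# running
# nginx on web01
```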
tr — Translate Characters¶
tr reads from stdin and translates or deletes characters.
echo "hello world" | tr ' ' '_' # replace spaces with underscores
echo "hello world" | tr -s ' ' # squeeze repeated spaces into one
echo "hello123" | tr -d '0-9' # delete all digits
tr -d '\r' < file.txt # remove carriage returns (Windows line endings)
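A few more tr invocations, sketched on made-up input. Because tr takes no filename argument, each one is fed through a pipe.

```shell
# Uppercase using POSIX character classes
upper=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')

# Squeeze runs of spaces, then translate the remaining single spaces to commas
csv=$(printf 'a   b  c\n' | tr -s ' ' | tr ' ' ',')

# Strip carriage returns from CRLF text
unix=$(printf 'line1\r\nline2\r\n' | tr -d '\r')

printf '%s\n' "$upper" "$csv"
# HELLO WORLD
# a,b,c
```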
Real-World Pipeline¶
Combining awk and grep with sort, uniq, and head:
# Parse an Apache access log: count requests per IP, top 10
awk '{ print $1 }' /var/log/apache2/access.log \
| sort | uniq -c | sort -rn | head -10
# Extract all email addresses from a file
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt
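As a sketch of how tr, sed, and awk can share one pipeline, the example below cleans and totals a small export. The data is hypothetical: CRLF line endings, a comment line, a header row, then name,bytes rows.

```shell
# Hypothetical export with CRLF endings: comment, header, then two data rows
total=$(printf '%s\r\n' '# nightly export' 'name,bytes' 'a.log,100' 'b.log,250' \
  | tr -d '\r' \
  | sed '/^#/d' \
  | awk -F, 'NR > 1 { sum += $2 } END { print "Total bytes:", sum + 0 }')

echo "$total"
# Total bytes: 350
```

Each stage does one job: tr normalizes line endings, sed drops comment lines, and awk skips the header (the first line it receives) and sums the second column.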
Common Mistakes¶
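Forgetting to quote and escape sed expressions
Wrap sed expressions in single quotes so the shell leaves them alone. Inside the expression, the delimiter (/), & (which stands for the matched text in the replacement), and \ must be escaped with a backslash to be taken literally. If your pattern or replacement contains /, use a different delimiter instead: sed 's|/old/path|/new/path|g'.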
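The sketch below contrasts an alternate delimiter with manual escaping, and shows what & does in a replacement. The paths and strings are sample values chosen for illustration.

```shell
# Alternate delimiter: no need to escape the slashes in the paths
path=$(printf '%s\n' '/old/path/file.txt' | sed 's|/old/path|/new/path|')

# Same substitution with / as the delimiter: every slash must be escaped
escaped=$(printf '%s\n' '/old/path/file.txt' | sed 's/\/old\/path/\/new\/path/')

# & in the replacement stands for the matched text
wrapped=$(printf '%s\n' 'error 42' | sed 's/[0-9][0-9]*/[&]/')

# \& is a literal ampersand
literal=$(printf '%s\n' 'salt and pepper' | sed 's/ and / \& /')

printf '%s\n' "$path" "$wrapped" "$literal"
# /new/path/file.txt
# error [42]
# salt & pepper
```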
awk field numbering starts at 1
In awk, $1 is the first field and $0 is the entire line. Fields are numbered from 1, so $0 is not a "zeroth field", which often surprises people coming from zero-indexed languages.
Practice Exercises¶
Warm-Up (run and observe)¶
- Run echo "the quick brown fox" | sed 's/o/0/g'. What happens?
- Run awk -F: '{ print $1 }' /etc/passwd | head -10. What does this show?
- Run echo "Hello World" | tr 'A-Z' 'a-z'. What does tr do here?
Main (write a short script)¶
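Create ~/scripts/csv_summary.sh that takes a CSV filename and prints the number of data rows (excluding the header) and the unique values in the first column:
#!/usr/bin/env bash
set -euo pipefail
# Abort with a usage message if no filename is given
FILE="${1:?Usage: $0 <csvfile>}"
# Count data rows, skipping the header line
echo "Rows: $(tail -n +2 "$FILE" | wc -l)"
echo "First column values:"
# Unique first-column values without the header (assumes no quoted commas in fields)
cut -d, -f1 "$FILE" | tail -n +2 | sort -u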
Stretch¶
- Use sed to remove all comment lines (starting with #) and blank lines from /etc/ssh/sshd_config. Print the result without modifying the file.
- Write an awk one-liner that prints the filename and line number of every line longer than 100 characters in all .md files in docs/.
- Research the difference between sed -n '/start/,/end/p' and awk '/start/,/end/'. When would you prefer one over the other?
Interview Questions¶
- What does sed 's/foo/bar/g' do, and what does the g mean?
Show answer
It replaces every match of foo with bar on each line. g stands for "global"; without it, only the first match on each line is replaced.
- What is $0 in awk?
Show answer
$0 is the entire current line (all fields). $1, $2, etc. are individual fields. NF is the number of fields. NR is the current record (line) number.
- How would you use sed to delete blank lines from a file in place?
Show answer
sed -i '/^$/d' file.txt — the pattern ^$ matches lines with nothing between start and end (blank lines). -i edits in place. On macOS use sed -i '' '/^$/d' file.txt.