Stream Editing with sed and awk¶

sed and awk transform streams of text: they edit files without opening them in an editor, extract specific fields from structured data, and build one-line data pipelines that would take many more lines in most general-purpose languages.

Learning Objectives¶

Substitute text in a stream with sed
Delete and print specific lines using sed
Extract and compute on fields with awk
Use tr to translate or delete characters
Chain sed and awk into real-world data pipelines

`sed` — Stream Editor¶

sed reads input line by line, applies editing commands, and writes to stdout. The most common use is substitution.

Substitution¶

sed 's/old/new/' file.txt            # replace first occurrence per line
sed 's/old/new/g' file.txt           # replace all occurrences per line (global)
sed 's/old/new/i' file.txt           # case-insensitive replace (not in POSIX; supported by GNU sed)
sed 's/old/new/2' file.txt           # replace second occurrence only

echo "hello world world" | sed 's/world/earth/g'

hello earth earth
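To compare the flags side by side, here is a minimal sketch that runs three of the variants above against the same line. The sample text and the variable names are made up for illustration.

```shell
# Hypothetical sample line; each variable captures one variant's output.
line="cat cat cat"

first=$(printf '%s\n' "$line" | sed 's/cat/dog/')     # first occurrence only
all=$(printf '%s\n' "$line" | sed 's/cat/dog/g')      # every occurrence
second=$(printf '%s\n' "$line" | sed 's/cat/dog/2')   # second occurrence only

printf '%s\n' "$first" "$all" "$second"
```

This should print `dog cat cat`, `dog dog dog`, and `cat dog cat`, one per line.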

# Edit in place (modify the file directly)
sed -i 's/http:/https:/g' config.txt

# Edit in place with backup (safer)
sed -i.bak 's/http:/https:/g' config.txt

-i on macOS requires an argument

On macOS (BSD sed), -i requires an argument naming the backup extension; pass an empty string to skip the backup: sed -i '' 's/old/new/' file. On GNU sed (Linux), sed -i 's/old/new/' file works without an argument. For portable scripts, use sed -i.bak, which works on both.
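One way to try in-place editing safely is to work on a throwaway copy. The sketch below assumes `mktemp -d` is available and uses a made-up file name and URL; it shows that `-i.bak` rewrites the file and leaves the original content in a `.bak` copy.

```shell
# Work in a temporary directory so no real file is touched.
tmpdir=$(mktemp -d)
cfg="$tmpdir/config.txt"            # hypothetical file name
printf 'url=http://example.com\n' > "$cfg"

# -i.bak (no space before the suffix) is accepted by GNU sed and BSD sed.
sed -i.bak 's/http:/https:/g' "$cfg"

edited=$(cat "$cfg")                # the rewritten file
backup=$(cat "$cfg.bak")            # the original content, preserved
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

rm -rf "$tmpdir"
```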

Deleting Lines¶

sed '3d' file.txt                    # delete line 3
sed '3,7d' file.txt                  # delete lines 3 through 7
sed '/^#/d' file.txt                 # delete comment lines
sed '/^$/d' file.txt                 # delete blank lines
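The delete command can be tried on inline data rather than a real file. This sketch uses a hypothetical five-line input to contrast deleting by line range with deleting by pattern; in both cases sed writes the result to stdout and the input is unchanged.

```shell
# Hypothetical five-line input.
input=$(printf 'one\ntwo\nthree\nfour\nfive\n')

by_number=$(printf '%s\n' "$input" | sed '2,4d')     # drop lines 2 through 4
by_pattern=$(printf '%s\n' "$input" | sed '/^t/d')   # drop lines starting with t

printf '%s\n' "$by_number"
printf '%s\n' "$by_pattern"
```

The first command should leave `one` and `five`; the second should leave `one`, `four`, and `five`.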

Printing Specific Lines¶

sed -n '5p' file.txt                 # print only line 5
sed -n '5,10p' file.txt              # print lines 5-10
sed -n '/error/p' file.txt           # print lines matching pattern

-n suppresses default output

By default, sed prints every line. -n suppresses this, so only explicit p commands produce output. With -n, sed -n '5p' prints line 5 once and nothing else; without -n, sed '5p' prints every line and prints line 5 twice (once from p and once from the default output).
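The effect of -n is easiest to see on a tiny input. A minimal sketch, using a made-up three-line input:

```shell
# Without -n: every line is printed, and line 2 appears twice.
without_n=$(printf 'a\nb\nc\n' | sed '2p')

# With -n: only the line selected by p is printed.
with_n=$(printf 'a\nb\nc\n' | sed -n '2p')

printf '%s\n' "$without_n"
printf '%s\n' "$with_n"
```

The first command should print `a`, `b`, `b`, `c`; the second should print only `b`.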

`awk` — Pattern-Action Processing¶

awk processes text field by field. It is particularly useful for structured data like CSV, logs, and command output.

Basic Structure¶

awk 'pattern { action }' file

If pattern is omitted, action runs on every line.
If action is omitted, the default action is to print the matching line.
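Both defaults can be checked on inline data. In this sketch the fruit names and counts are invented; the first command supplies only a pattern, the second only an action.

```shell
# Hypothetical two-column data: name and count.
data=$(printf 'apple 3\nbanana 12\ncherry 7\n')

# Pattern only: the default action prints each matching line.
pattern_only=$(printf '%s\n' "$data" | awk '$2 > 5')

# Action only: the block runs on every line.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

printf '%s\n' "$pattern_only"
printf '%s\n' "$action_only"
```

The pattern-only command should print the banana and cherry lines; the action-only command should print the three names.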

Field Variables¶

awk splits each line into fields on whitespace (or a delimiter you specify). Fields are $1, $2, etc. $0 is the entire line. NF is the number of fields. NR is the current line number.
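As a quick illustration of these variables, the sketch below prints NR, NF, and $1 for each line of a made-up input, then uses -F to split a comma-separated line instead.

```shell
# Whitespace-separated input: print line number, field count, first field.
summary=$(printf 'alpha beta\ngamma delta epsilon\n' | awk '{ print NR, NF, $1 }')

# Comma-separated input: -F, sets the field delimiter to a comma.
second=$(printf 'x,y,z\n' | awk -F, '{ print $2 }')

printf '%s\n' "$summary" "$second"
```

This should print `1 2 alpha`, `2 3 gamma`, and `y`.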

# Print the first field of each line
awk '{ print $1 }' /etc/passwd

# Print fields 1 and 7, splitting on ":" instead of whitespace
awk -F: '{ print $1, $7 }' /etc/passwd

root /bin/bash
daemon /usr/sbin/nologin
bin /bin/sh

Patterns and Conditionals¶

# Print usernames whose UID (field 3) is 1000 or higher
# (regular users typically start at UID 1000 on most Linux distributions)
awk -F: '$3 >= 1000 { print $1 }' /etc/passwd

# Print lines that match a pattern
awk '/error/ { print NR, $0 }' /var/log/app.log

# BEGIN and END blocks run once, before and after all input
awk 'BEGIN { count=0 } /error/ { count++ } END { print "Errors:", count }' app.log

# Sum the size column from ls -l output
ls -l | awk 'NR > 1 { sum += $5 } END { print "Total bytes:", sum }'

Total bytes: 204800
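The pattern and END forms shown above can be tried without a real log file. This sketch feeds a few invented log lines through the same two commands.

```shell
# Hypothetical log lines standing in for app.log.
log=$(printf 'ok start\nerror disk full\nok retry\nerror timeout\n')

# Matching lines, prefixed with their line numbers.
numbered=$(printf '%s\n' "$log" | awk '/error/ { print NR, $0 }')

# A single summary line printed after all input has been read.
summary=$(printf '%s\n' "$log" |
  awk 'BEGIN { count = 0 } /error/ { count++ } END { print "Errors:", count }')

printf '%s\n' "$numbered" "$summary"
```

The output should be `2 error disk full`, `4 error timeout`, and `Errors: 2`.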

Computing with awk¶

# Average of numbers in a file
awk '{ sum += $1; count++ } END { print sum/count }' numbers.txt

# Print lines longer than 80 characters
awk 'length($0) > 80' file.txt
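The averaging one-liner divides by zero when the input is empty. One possible guard, sketched here with made-up numbers, is to check the count in the END block before dividing.

```shell
# The average of 10, 20 and 30.
avg=$(printf '10\n20\n30\n' |
  awk '{ sum += $1; count++ } END { if (count > 0) print sum / count }')

# Empty input: the guard skips the division, so nothing is printed.
empty=$(printf '' |
  awk '{ sum += $1; count++ } END { if (count > 0) print sum / count }')

printf 'avg=%s empty=[%s]\n' "$avg" "$empty"
```

This should print `avg=20 empty=[]`.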

awk vs cut

Use cut for simple fixed-column extraction (faster, simpler). Use awk when you need conditionals, arithmetic, or to reference fields by position dynamically.
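A small sketch of the difference, using invented data: cut splits on exactly one delimiter character, while awk collapses runs of whitespace and can apply a condition.

```shell
# cut: single-character delimiter, no logic.
name=$(printf 'alice,30,admin\n' | cut -d, -f1)

# awk: runs of spaces count as one separator, and conditions are allowed.
big=$(printf 'a   5\nb  50\n' | awk '$2 > 10 { print $1 }')

# cut treats every space as a separator, so repeated spaces give empty fields.
cut_field=$(printf 'a   5\n' | cut -d' ' -f2)

printf 'name=%s big=%s cut_field=[%s]\n' "$name" "$big" "$cut_field"
```

This should print `name=alice big=b cut_field=[]`: the second field that cut sees is the empty string between the first two spaces.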

`tr` — Translate Characters¶

tr reads from stdin and translates or deletes characters.

echo "hello" | tr 'a-z' 'A-Z'        # uppercase

HELLO

echo "hello world" | tr ' ' '_'      # replace spaces with underscores
echo "hello   world" | tr -s ' '     # squeeze repeated spaces into one
echo "hello123" | tr -d '0-9'        # delete all digits
tr -d '\r' < file.txt                # strip carriage returns from Windows (CRLF) line endings
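These tr invocations can be checked without creating a file. In this sketch, printf generates the input, including two CRLF-terminated lines like those a Windows editor would save.

```shell
# Two lines ending in \r\n; tr -d removes every carriage return.
clean=$(printf 'one\r\ntwo\r\n' | tr -d '\r')

squeezed=$(printf 'hello   world\n' | tr -s ' ')   # collapse repeated spaces
snake=$(printf 'hello world\n' | tr ' ' '_')       # space to underscore

printf '%s\n' "$clean" "$squeezed" "$snake"
```

The output should be `one`, `two`, `hello world`, and `hello_world`, with no carriage returns left.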

Real-World Pipeline¶

These tools combine with sort, uniq, and grep into short pipelines:

# Parse an Apache access log: count requests per IP, top 10
awk '{ print $1 }' /var/log/apache2/access.log \
  | sort | uniq -c | sort -rn | head -10

1523 192.168.1.105
987 10.0.0.42
654 172.16.0.1
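The same pipeline can be tried on a handful of invented log lines. In this sketch the log format is simplified and the addresses are placeholders; a final awk step swaps the columns, which also drops the leading padding that uniq -c adds.

```shell
# Hypothetical access-log lines; only the first field (client IP) matters.
log=$(printf '%s\n' \
  '10.0.0.42 - - "GET / HTTP/1.1" 200' \
  '192.168.1.105 - - "GET /a HTTP/1.1" 200' \
  '192.168.1.105 - - "GET /b HTTP/1.1" 404' \
  '10.0.0.42 - - "GET /c HTTP/1.1" 200' \
  '192.168.1.105 - - "GET /d HTTP/1.1" 200' \
  '172.16.0.1 - - "GET / HTTP/1.1" 200')

# Count requests per IP, keep the top 2, and print "ip count".
top=$(printf '%s\n' "$log" | awk '{ print $1 }' \
  | sort | uniq -c | sort -rn | head -2 | awk '{ print $2, $1 }')

printf '%s\n' "$top"
```

This should print `192.168.1.105 3` followed by `10.0.0.42 2`.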

# Extract all email addresses from a file
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt
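To see what the email pattern extracts, the sketch below runs it against one invented sentence. The -o option prints each match on its own line; it is not part of POSIX, but both GNU grep and BSD grep provide it.

```shell
# Hypothetical text containing two addresses.
text='Contact alice@example.com or bob.smith@mail.example.org today.'

emails=$(printf '%s\n' "$text" |
  grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

printf '%s\n' "$emails"
```

This should print the two addresses, one per line, without the surrounding words.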

Common Mistakes¶

Forgetting to quote sed patterns

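Wrap sed scripts in single quotes so the shell does not interpret characters such as $, *, or \ before sed sees them. Inside the script, the delimiter (/ by default) and \ need escaping, and & in the replacement stands for the whole match unless written as \&. If your pattern or replacement contains /, use a different delimiter: sed 's|/old/path|/new/path|g'.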
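A short sketch of both points, using invented strings: an alternative delimiter for paths, and the difference between & and \& in the replacement.

```shell
# A different delimiter avoids escaping every slash in a path.
path=$(printf '/old/path/bin\n' | sed 's|/old/path|/new/path|')

# In the replacement, & is the whole match; \& is a literal ampersand.
whole=$(printf 'fish\n' | sed 's/fish/[&]/')
literal=$(printf 'fish\n' | sed 's/fish/\&chips/')

printf '%s\n' "$path" "$whole" "$literal"
```

This should print `/new/path/bin`, `[fish]`, and `&chips`.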

awk field numbering starts at 1

In awk, fields are numbered from 1, not 0: $1 is the first field, and $0 is the entire line. Programmers used to zero-based indexing often write $0 expecting the first field and get the whole line instead.

Practice Exercises¶

Warm-Up (run and observe)¶

Run echo "the quick brown fox" | sed 's/o/0/g'. What happens?
Run awk -F: '{ print $1 }' /etc/passwd | head -10. What does this show?
Run echo "Hello World" | tr 'A-Z' 'a-z'. What does tr do here?

Main (write a short script)¶

Create ~/scripts/csv_summary.sh that takes a CSV filename and prints the number of lines in the file (including the header) and the sorted, unique values in the first column, skipping the header row:

#!/usr/bin/env bash
set -euo pipefail

FILE="${1:?Usage: $0 <csvfile>}"

echo "Rows: $(wc -l < "$FILE")"
echo "First column values:"
cut -d, -f1 "$FILE" | tail -n +2 | sort | uniq

Stretch¶

Use sed to remove all comment lines (starting with #) and blank lines from /etc/ssh/sshd_config. Print the result without modifying the file.
Write an awk one-liner that prints the filename and line number of every line longer than 100 characters in all .md files in docs/.
Research the difference between sed -n '/start/,/end/p' and awk '/start/,/end/'. When would you prefer one over the other?

Interview Questions¶

What does sed 's/foo/bar/g' do, and what does the g mean?

It replaces every occurrence of foo with bar on each line. The g flag stands for "global"; without it, only the first occurrence on each line is replaced.

What is $0 in awk?

$0 is the entire current line (all fields). $1, $2, etc. are individual fields. NF is the number of fields. NR is the current record (line) number.

How would you use sed to delete blank lines from a file in place?

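sed -i '/^$/d' file.txt: the pattern ^$ matches lines with nothing between start and end (empty lines), and -i edits in place. On macOS use sed -i '' '/^$/d' file.txt. Lines that contain only spaces or tabs are not matched; use /^[[:space:]]*$/d to delete those as well.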
