Stream Editing with sed and awk

sed and awk let you transform streams of text: editing files without opening them in an editor, extracting specific fields from structured data, and building one-line data pipelines for jobs that would otherwise need a much longer script in a general-purpose language.

Learning Objectives

  • Substitute text in a stream with sed
  • Delete and print specific lines using sed
  • Extract and compute on fields with awk
  • Use tr to translate or delete characters
  • Chain sed and awk into real-world data pipelines

sed — Stream Editor

sed reads input line by line, applies editing commands, and writes to stdout. The most common use is substitution.

Substitution

sed 's/old/new/' file.txt            # replace first occurrence per line
sed 's/old/new/g' file.txt           # replace all occurrences per line (global)
sed 's/old/new/i' file.txt           # case-insensitive replace (not POSIX; supported by GNU sed)
sed 's/old/new/2' file.txt           # replace only the second occurrence on each line

echo "hello world world" | sed 's/world/earth/g'
hello earth earth

# Edit in place (modify the file directly)
sed -i 's/http:/https:/g' config.txt

# Edit in place with backup (safer)
sed -i.bak 's/http:/https:/g' config.txt

-i on macOS requires an argument

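On macOS (BSD sed), -i requires an argument giving the backup suffix, which may be empty: sed -i '' 's/old/new/' file. On GNU sed (Linux), the suffix is optional and must be attached to the flag, so sed -i 's/old/new/' file works as is. For portable scripts, use the attached form sed -i.bak, which both versions accept.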
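A minimal sketch of the backup behaviour, run against a throwaway file so nothing real is modified. The directory comes from mktemp, and the file name and URL are made up for illustration.

```shell
# Work in a scratch directory so nothing real is modified.
workdir=$(mktemp -d)
printf 'url=http://example.com\n' > "$workdir/config.txt"

# Edit in place; the original content is kept as config.txt.bak.
sed -i.bak 's/http:/https:/g' "$workdir/config.txt"

edited=$(cat "$workdir/config.txt")
backup=$(cat "$workdir/config.txt.bak")
echo "edited: $edited"
echo "backup: $backup"

rm -r "$workdir"
```

If the edit turns out to be wrong, the .bak copy can simply be moved back over the original.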

Deleting Lines

sed '3d' file.txt                    # delete line 3
sed '3,7d' file.txt                  # delete lines 3 through 7
sed '/^#/d' file.txt                 # delete comment lines
sed '/^$/d' file.txt                 # delete blank lines
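To watch a range delete without needing a file, the sketch below feeds five sample lines (made up for illustration) through sed on stdin.

```shell
# Five sample lines, one word per line.
input=$(printf '%s\n' one two three four five)

# Delete lines 2 through 4; everything else passes through unchanged.
kept=$(printf '%s\n' "$input" | sed '2,4d')
echo "$kept"
```

Only the first and last lines survive, and the input itself is never modified because sed writes its result to stdout.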

Printing Specific Lines

sed -n '5p' file.txt                 # print only line 5
sed -n '5,10p' file.txt              # print lines 5-10
sed -n '/error/p' file.txt           # print lines matching pattern

-n suppresses default output

By default, sed prints every line. -n suppresses this, so only explicit p commands produce output. With -n, sed -n '5p' prints line 5 and nothing else; without it, sed '5p' would print every line of the file and line 5 twice (once from p and once from the default output).
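The difference is easiest to see side by side. This sketch runs the same p command with and without -n on three sample lines (made up for illustration).

```shell
input=$(printf '%s\n' a b c)

# Without -n: every line is auto-printed, and p prints line 2 again.
with_autoprint=$(printf '%s\n' "$input" | sed '2p')

# With -n: auto-print is off, so only the explicit p produces output.
without_autoprint=$(printf '%s\n' "$input" | sed -n '2p')

printf 'sed 2p:\n%s\n' "$with_autoprint"
printf 'sed -n 2p:\n%s\n' "$without_autoprint"
```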


awk — Pattern-Action Processing

awk reads input one line (record) at a time and splits each line into fields. It is particularly useful for structured data like CSV, logs, and command output.

Basic Structure

awk 'pattern { action }' file
  • If pattern is omitted, action runs on every line.
  • If action is omitted, the default action is to print the matching line.
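Both shortcuts can be tried on inline data. The sketch below uses three sample log lines (made up for illustration): one program has only a pattern, the other only an action.

```shell
input=$(printf '%s\n' 'ok start' 'error disk full' 'ok done')

# Pattern only: the default action prints each matching line.
matches=$(printf '%s\n' "$input" | awk '/error/')

# Action only: runs on every line; here it prints the first word.
first_words=$(printf '%s\n' "$input" | awk '{ print $1 }')

echo "$matches"
echo "$first_words"
```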

Field Variables

awk splits each line into fields on whitespace (or a delimiter you specify). Fields are $1, $2, etc. $0 is the entire line. NF is the number of fields. NR is the current line number.

# Print the first field of each line
awk '{ print $1 }' /etc/passwd

# Print fields 1 and 7, splitting on colons instead of whitespace
awk -F: '{ print $1, $7 }' /etc/passwd
root /bin/bash
daemon /usr/sbin/nologin
bin /bin/sh
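To see the built-in variables together, this sketch runs awk on a single record in /etc/passwd format. The account details are illustrative, and $NF (the last field) is a standard awk idiom not used elsewhere in this section.

```shell
# One sample record in /etc/passwd format (values are illustrative).
record='alice:x:1001:1001:Alice Example:/home/alice:/bin/bash'

# -F: splits on colons; NF is the field count and $NF the last field.
summary=$(printf '%s\n' "$record" | awk -F: '{ print NR, NF, $1, $NF }')
echo "$summary"
```

The comma in print joins the values with a single space, which is why the output is space-separated even though the input uses colons.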

Patterns and Conditionals

# Print usernames whose UID (field 3) is 1000 or higher (regular users on most Linux distributions)
awk -F: '$3 >= 1000 { print $1 }' /etc/passwd

# Print lines that match a pattern
awk '/error/ { print NR, $0 }' /var/log/app.log

# BEGIN and END blocks run once, before and after all input
awk 'BEGIN { count=0 } /error/ { count++ } END { print "Errors:", count }' app.log

# Sum the size column from ls -l output
ls -l | awk 'NR > 1 { sum += $5 } END { print "Total bytes:", sum }'
Total bytes: 204800
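The order in which BEGIN, the per-line rule, and END run can be checked on a few inline lines. The log content here is made up for illustration, and the last variable is an addition to show that state set in the per-line rule is still available in END.

```shell
# Sample log lines (made up for illustration).
log=$(printf '%s\n' 'INFO boot' 'error disk' 'INFO ready' 'error net')

# BEGIN runs before any input, the /error/ rule once per matching line,
# and END after the last line has been read.
report=$(printf '%s\n' "$log" |
  awk 'BEGIN { count = 0 }
       /error/ { count++; last = NR }
       END { print "Errors:", count, "last at line", last }')
echo "$report"
```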

Computing with awk

# Average of numbers in a file
awk '{ sum += $1; count++ } END { print sum/count }' numbers.txt

# Print lines longer than 80 characters
awk 'length($0) > 80' file.txt
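One caveat with the average one-liner is that it divides by count, which is zero when the input is empty. The sketch below is one possible guard, shown on inline sample numbers. The printf formatting to two decimals is an optional extra.

```shell
numbers=$(printf '%s\n' 10 20 45)

# Guard the division so empty input prints nothing instead of failing.
average=$(printf '%s\n' "$numbers" |
  awk '{ sum += $1; count++ } END { if (count > 0) printf "%.2f\n", sum / count }')
echo "$average"

# With no input lines at all, the guarded program prints nothing.
empty=$(printf '' |
  awk '{ sum += $1; count++ } END { if (count > 0) printf "%.2f\n", sum / count }')
echo "empty input gives: [$empty]"
```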

awk vs cut

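Use cut for simple extraction by a single-character delimiter or by fixed character positions (faster, simpler). Use awk when you need conditionals, arithmetic, fields separated by runs of whitespace, or a field chosen at run time (for example, $NF for the last field).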
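A common place where the two tools differ is column-aligned output padded with several spaces. This sketch, using a made-up sample line, asks both for the second field.

```shell
# Columns padded with a run of spaces, as in ps or ls -l output.
line='alice    42'

# cut treats every single space as a delimiter, so field 2 is empty.
with_cut=$(printf '%s\n' "$line" | cut -d' ' -f2)

# awk collapses runs of whitespace, so field 2 is the number.
with_awk=$(printf '%s\n' "$line" | awk '{ print $2 }')

echo "cut: [$with_cut]"
echo "awk: [$with_awk]"
```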


tr — Translate Characters

tr reads from stdin and translates or deletes characters.

echo "hello" | tr 'a-z' 'A-Z'        # uppercase
HELLO

echo "hello world" | tr ' ' '_'      # replace spaces with underscores
echo "hello   world" | tr -s ' '     # squeeze repeated spaces into one
echo "hello123" | tr -d '0-9'        # delete all digits
tr -d '\r' < file.txt                # remove carriage returns (Windows line endings)
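Because tr only reads stdin, several invocations chain naturally. The sketch below cleans one made-up line that has a Windows line ending, repeated spaces, and mixed case.

```shell
# Strip the carriage return, squeeze repeated spaces, then lowercase.
cleaned=$(printf 'Hello   World\r\n' | tr -d '\r' | tr -s ' ' | tr 'A-Z' 'a-z')
echo "$cleaned"
```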

Real-World Pipeline

Combining awk and grep with sort, uniq, and head:

# Parse an Apache access log: count requests per IP, top 10
awk '{ print $1 }' /var/log/apache2/access.log \
  | sort | uniq -c | sort -rn | head -10
   1523 192.168.1.105
    987 10.0.0.42
    654 172.16.0.1

# Extract all email addresses from a file
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt
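The request-counting pipeline can be tried without a real web server log. The sketch below uses four made-up lines as a stand-in, and adds a final awk step (an addition, not part of the pipeline above) to strip the padding that uniq -c puts before each count.

```shell
# Tiny stand-in for an access log; only the first field (client IP) matters.
log=$(printf '%s\n' \
  '10.0.0.1 GET /' \
  '10.0.0.2 GET /about' \
  '10.0.0.1 GET /contact' \
  '10.0.0.1 GET /')

# Count per IP, sort by count descending, then normalise uniq's padding.
top=$(printf '%s\n' "$log" | awk '{ print $1 }' | sort | uniq -c | sort -rn |
  head -10 | awk '{ print $1, $2 }')
echo "$top"
```

The first sort matters: uniq -c only merges adjacent duplicate lines, so unsorted input would produce several separate counts for the same IP.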

Common Mistakes

Forgetting to quote sed patterns

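Wrap sed scripts in single quotes so the shell passes $, *, and backslashes to sed untouched. Inside the script some characters are still special: the delimiter (/ by default) ends the pattern, & in the replacement stands for the matched text, and \ starts an escape, so each needs a preceding backslash to be taken literally. If the pattern or replacement contains /, it is easier to use a different delimiter: sed 's|/old/path|/new/path|g'.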
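A short sketch of both points, using made-up input strings: an alternative delimiter for paths, and the special meaning of & in the replacement.

```shell
# With | as the delimiter, the slashes in the paths need no escaping.
moved=$(printf '%s\n' 'PATH=/usr/local/bin' | sed 's|/usr/local|/opt|')

# In the replacement, & stands for the matched text; \& is a literal &.
wrapped=$(printf '%s\n' 'fish chips' | sed 's/fish/[&]/')
literal=$(printf '%s\n' 'fish chips' | sed 's/ / \& /')

echo "$moved"
echo "$wrapped"
echo "$literal"
```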

awk field numbering starts at 1

In awk, $1 is the first field and $0 is the entire line. Programmers used to zero-based indexing often expect $0 to be the first field; it is not.


Practice Exercises

Warm-Up (run and observe)

  1. Run echo "the quick brown fox" | sed 's/o/0/g'. What happens?
  2. Run awk -F: '{ print $1 }' /etc/passwd | head -10. What does this show?
  3. Run echo "Hello World" | tr 'A-Z' 'a-z'. What does tr do here?

Main (write a short script)

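Create ~/scripts/csv_summary.sh that takes a CSV filename and prints the number of data rows (excluding the header line) and the unique values in the first column:

#!/usr/bin/env bash
set -euo pipefail

# Abort with a usage message if no filename was given.
FILE="${1:?Usage: $0 <csvfile>}"

# tail -n +2 skips the header line so it is not counted as data.
echo "Rows: $(tail -n +2 "$FILE" | wc -l)"
echo "First column values:"
# Simple comma split; assumes no quoted fields containing commas.
cut -d, -f1 "$FILE" | tail -n +2 | sort -u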
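A quick way to check the script's pipelines is to run them against a small sample file. The CSV below is made up for illustration and assumes a header row and no quoted commas.

```shell
# Sample CSV: a header row plus three data rows.
csv=$(mktemp)
printf '%s\n' 'name,qty' 'pear,2' 'apple,5' 'pear,1' > "$csv"

# wc -l alone would report 4 because it counts the header line too.
# tr strips the padding some wc implementations add around the number.
data_rows=$(tail -n +2 "$csv" | wc -l | tr -d ' ')

# First column without the header, sorted and de-duplicated.
first_col=$(cut -d, -f1 "$csv" | tail -n +2 | sort -u)

echo "Data rows: $data_rows"
echo "$first_col"
rm -f "$csv"
```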

Stretch

  1. Use sed to remove all comment lines (starting with #) and blank lines from /etc/ssh/sshd_config. Print the result without modifying the file.
  2. Write an awk one-liner that prints the filename and line number of every line longer than 100 characters in all .md files in docs/.
  3. Research the difference between sed -n '/start/,/end/p' and awk '/start/,/end/'. When would you prefer one over the other?

Interview Questions

  1. What does sed 's/foo/bar/g' do, and what does the g mean?

Answer: It replaces every occurrence of foo with bar on each line. The g flag stands for "global"; without it, only the first occurrence on each line is replaced.

  2. What is $0 in awk?

Answer: $0 is the entire current line (record). $1, $2, and so on are the individual fields, NF is the number of fields, and NR is the current record (line) number.

  3. How would you use sed to delete blank lines from a file in place?

Answer: sed -i '/^$/d' file.txt. The pattern ^$ matches lines with nothing between start and end (empty lines), and -i edits the file in place. On macOS use sed -i '' '/^$/d' file.txt. To also remove lines that contain only spaces or tabs, use the pattern /^[[:space:]]*$/ instead.

