I/O, Pipes & Redirection¶
The pipe | is one of Unix's most powerful ideas: chain small tools together to build pipelines that would take far more code in most general-purpose languages, with no intermediate files and very little overhead.
Learning Objectives¶
- Understand the three standard streams: stdin, stdout, stderr
- Redirect output to files with >, >>, and 2>
- Connect commands with pipes |
- Use tee to split output to a file and stdout simultaneously
- Use process substitution <(command) for advanced pipelines
Standard Streams¶
Every process has three default I/O streams:
| Stream | Number | Default | Variable |
|---|---|---|---|
| stdin | 0 | keyboard | — |
| stdout | 1 | terminal | — |
| stderr | 2 | terminal | — |
Redirection¶
Redirecting stdout¶
ls > filelist.txt # overwrite (creates or truncates)
ls >> filelist.txt # append
echo "hello" > /dev/null # discard output
Redirecting stderr¶
ls /nonexistent 2> errors.txt # redirect stderr to file
ls /nonexistent 2>> errors.txt # append stderr
ls /nonexistent 2>/dev/null # discard stderr
Redirecting Both stdout and stderr¶
command > output.txt 2>&1 # both to same file
command &> output.txt # bash shorthand (same effect)
command > output.txt 2>/dev/null # stdout to file, discard stderr
Order matters with 2>&1
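Redirections are processed left to right. command > file 2>&1 sends both streams to the file: stdout is pointed at the file first, then stderr is pointed at wherever stdout currently goes (the file). command 2>&1 > file does not: stderr is pointed at the current stdout (the terminal) first, and only then is stdout moved to the file, so errors still appear on the terminal.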
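The difference is easy to observe with a small experiment. The sketch below assumes bash; emit is a hypothetical helper (not part of the examples above) that writes one line to each stream, and a command substitution stands in for the terminal as the "current stdout".

```shell
# Hypothetical helper: one line to stdout, one line to stderr.
emit() {
    echo "out"
    echo "err" >&2
}

tmp=$(mktemp -d)

# stdout is pointed at the file first, then stderr follows it:
# both lines land in the file.
emit > "$tmp/both.txt" 2>&1
both=$(cat "$tmp/both.txt")

# stderr is pointed at the current stdout first (here, the $(...) capture),
# then stdout alone moves to the file: "err" never reaches the file.
escaped=$(emit 2>&1 > "$tmp/only_out.txt")
only_out=$(cat "$tmp/only_out.txt")

printf 'file written with "> file 2>&1":\n%s\n' "$both"
printf 'file written with "2>&1 > file":\n%s\n' "$only_out"
printf 'escaped the file: %s\n' "$escaped"

rm -rf "$tmp"
```

With the first ordering the file holds both lines; with the second it holds only the stdout line, and the stderr line ends up wherever stdout pointed before the redirection.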
Redirecting stdin¶
sort < names.txt # sort reads from file instead of keyboard
while IFS= read -r line; do
    echo "$line"
done < data.txt
Pipes¶
A pipe | connects the stdout of one command to the stdin of the next:
ls -la | grep '\.sh$' # list files, show only names ending in .sh
cat /etc/passwd | wc -l # count lines in passwd
ps aux | grep '[n]ginx' # find nginx processes ([n] keeps grep from matching itself)
Building Pipelines¶
# Top 5 most-used commands in your history
history | awk '{print $2}' | sort | uniq -c | sort -rn | head -5
# Find all running processes owned by your user, sorted by memory
ps aux | grep "^$USER " | sort -k4,4 -rn | head -10
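The sort | uniq -c | sort -rn | head idiom from the first pipeline can be tried on fixed input, which helps because history usually prints nothing in a non-interactive script. A minimal sketch; the sample command names are made up for illustration:

```shell
# Made-up sample data standing in for a command history.
sample=$(printf '%s\n' ls cd ls git ls cd)

# sort groups identical lines, uniq -c counts each group,
# sort -rn orders by count (highest first), head keeps the top 2.
# awk only tidies uniq's padded output into "name: count".
top=$(printf '%s\n' "$sample" | sort | uniq -c | sort -rn | head -2 | awk '{print $2 ": " $1}')

echo "$top"
```

Each stage does one small job, and swapping head -2 for head -5 or changing the awk field is enough to adapt the pipeline to other inputs.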
Pipes create subshells
Each command in a pipeline runs in a subshell. Variables set inside a pipeline are not visible in the parent shell. If you need to use a value computed in a pipeline, capture it: result=$(command1 | command2).
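The effect can be seen with a counter. This is a sketch that assumes bash with default options (in particular, shopt -s lastpipe is not set); the variable names are illustrative.

```shell
# Assumes bash with default options (no `shopt -s lastpipe`).
count=0
printf 'a\nb\nc\n' | while IFS= read -r _; do
    count=$((count + 1))      # increments a copy inside the subshell
done
echo "set inside the pipeline: $count"

# Capturing the pipeline's output keeps the result in the parent shell.
total=$(printf 'a\nb\nc\n' | grep -c '')
echo "captured with \$(...): $total"
```

The loop runs three times, but the parent shell's count is untouched once the pipeline ends, while the captured total survives because the assignment happens in the parent shell.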
tee — Write to File and stdout Simultaneously¶
command | tee output.txt # write to file AND show on terminal
command | tee -a output.txt # append to file, also show on terminal
command | tee output.txt | wc -l # write to file, count lines in pipeline
# Log script output while still seeing it on the terminal
./deploy.sh | tee deploy_$(date +%Y%m%d).log
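One thing to watch when logging through tee: by default the pipeline's exit status is tee's, so a failing script can look successful. A hedged sketch, assuming bash (PIPESTATUS and pipefail are bash features); failing_step is a hypothetical stand-in for ./deploy.sh:

```shell
# Hypothetical stand-in for a script that prints output and then fails.
failing_step() {
    echo "deploying..."
    return 3
}

log=$(mktemp)

# By default $? reflects only the last command in the pipeline (tee).
failing_step | tee "$log" > /dev/null
statuses=("${PIPESTATUS[@]}")     # per-command statuses, saved immediately
echo "failing_step: ${statuses[0]}, tee: ${statuses[1]}"

# With pipefail, the pipeline fails if any command in it fails.
set -o pipefail
pipeline_status=0
failing_step | tee "$log" > /dev/null || pipeline_status=$?
set +o pipefail
echo "with pipefail: $pipeline_status"

rm -f "$log"
```

Either approach lets a script react to the failure of the logged command instead of silently continuing.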
Here Documents¶
Pass multi-line input to a command:
# Single quotes on the delimiter prevent expansion
cat << 'EOF'
This is $literal
No $(expansion) here
EOF
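For contrast, leaving the delimiter unquoted lets the shell expand variables and command substitutions in the body. A small sketch comparing the two forms; the variable name is illustrative:

```shell
name="world"

# Unquoted delimiter: $name and $(...) are expanded before cat sees the text.
expanded=$(cat << EOF
Hello, $name
EOF
)

# Quoted delimiter: the body is passed through literally.
literal=$(cat << 'EOF'
Hello, $name
EOF
)

echo "$expanded"
echo "$literal"
```

The quoted form is the safer default for embedded scripts or templates that contain their own $ signs.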
Process Substitution¶
<(command) runs a command and presents its output as a file-like input:
diff <(sort file1.txt) <(sort file2.txt) # compare sorted versions
comm <(sort file1.txt) <(sort file2.txt) # find common/unique lines
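A self-contained version of the comm example, as a sketch assuming bash (process substitution is not available in plain POSIX sh). The file contents are made up for illustration:

```shell
tmp=$(mktemp -d)
printf 'cherry\napple\nbanana\n' > "$tmp/file1.txt"
printf 'banana\ndate\napple\n' > "$tmp/file2.txt"

# comm needs sorted input; <(...) supplies it without temporary sorted files.
common=$(comm -12 <(sort "$tmp/file1.txt") <(sort "$tmp/file2.txt"))
only_first=$(comm -23 <(sort "$tmp/file1.txt") <(sort "$tmp/file2.txt"))

echo "in both files:"
echo "$common"
echo "only in file1.txt: $only_first"

# The shell replaces <(...) with a path the command can open.
echo "path seen by the command:" <(true)

rm -rf "$tmp"
```

The last line prints the path that stands in for the substitution, which is why commands that expect file name arguments can read from it.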
Common Mistakes¶
Useless use of cat
cat file | command is equivalent to command < file. The cat is unnecessary and slightly slower. Use command < file or command file when possible.
Swallowing stderr in pipelines
In command1 | command2, only command1's stdout goes to command2. Errors from command1 still go to the terminal. To capture both: command1 2>&1 | command2.
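Counting the lines that reach the second command makes this visible. In the sketch below, noisy is a hypothetical command that writes one line to each stream:

```shell
# Hypothetical command that writes to both streams.
noisy() {
    echo "data"
    echo "warning" >&2
}

# Only stdout enters the pipe; "warning" goes straight to the terminal.
piped=$(noisy | grep -c '')

# 2>&1 before the | merges stderr into the pipe as well.
merged=$(noisy 2>&1 | grep -c '')

echo "lines through the pipe: $piped (stdout only), $merged (with 2>&1)"
```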
Practice Exercises¶
Warm-Up (run and observe)¶
- Run echo "hello" > /tmp/test.txt then cat /tmp/test.txt. Run it again: does the file grow or stay the same size?
- Run ls /nonexistent 2>/dev/null. What do you see? Where did the error message go?
- Run cat /etc/passwd | head -5 then head -5 /etc/passwd. Are they equivalent? Which is more efficient?
Main (write a short script)¶
Create ~/scripts/monitor.sh that logs system metrics to a file and the terminal:
#!/usr/bin/env bash
set -euo pipefail
LOGFILE="/tmp/monitor_$(date +%Y%m%d_%H%M%S).log"
{
    echo "=== System Report: $(date) ==="
    echo "Hostname: $(hostname)"
    echo "Uptime: $(uptime -p)"
    echo ""
    echo "=== Disk Usage ==="
    df -h
    echo ""
    echo "=== Memory ==="
    free -h
    echo ""
    echo "=== Top 5 Processes by CPU ==="
    ps aux --sort=-%cpu | head -6
} | tee "$LOGFILE"
echo ""
echo "Report saved to: $LOGFILE"
Stretch¶
- Use process substitution to find files that are in dir1 but not in dir2: comm -23 <(ls dir1 | sort) <(ls dir2 | sort).
- Write a pipeline that finds the 3 largest files in /var/log, showing only filename and size.
- What does exec > logfile.txt 2>&1 do at the start of a script? How would you restore stdout after using it?
Interview Questions¶
- What is the difference between > and >>?
> overwrites — it truncates the file to zero length before writing. >> appends — it adds new content at the end of the existing file. If the file does not exist, both create it.
- Why does var=$(command | grep pattern) sometimes not capture what you expect?
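The assignment itself works: command substitution captures the stdout of the last command in the pipeline, and var is set in the current shell. Surprises come from what never reaches that stdout. Anything command writes to stderr bypasses the pipe (use command 2>&1 | grep pattern to include it), and if grep matches nothing, var is empty and the exit status is non-zero. The subshell problem is a different case: a variable assigned inside a pipeline, as in command | read var, is lost when the pipeline ends, because each pipeline command runs in a subshell.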
- What does 2>&1 mean?
"Redirect file descriptor 2 (stderr) to wherever file descriptor 1 (stdout) is currently pointing." The &1 means "the current destination of fd 1," not the literal string 1. It is used to merge stderr into stdout so both go to the same place.