File Operations at Scale
Batch renaming 1,000 files, syncing directories across machines, compressing archives — these are the tasks where shell scripting pays back its learning curve in minutes.
Learning Objectives
- Batch rename files safely with loops and parameter expansion
- Sync directories with rsync
- Create and extract archives with tar, gzip, and zip
- Verify file integrity with checksums (md5sum, sha256sum)
- Handle filenames with spaces, special characters, and Unicode
Batch Renaming Files
Safe Rename Pattern
#!/usr/bin/env bash
set -euo pipefail
# Rename all .jpeg files to .jpg
for file in *.jpeg; do
[[ -e "$file" ]] || continue
newname="${file%.jpeg}.jpg"
mv -v -- "$file" "$newname"
done
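The [[ -e "$file" ]] || continue guard skips the literal string *.jpeg that bash leaves in place when no file matches the glob (unless nullglob is set). The -- before $file prevents filenames starting with - from being treated as options.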
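As written, the loop silently overwrites an existing photo.jpg when photo.jpeg is renamed onto it. A minimal sketch of a collision check, run in a throwaway directory from mktemp -d so it is safe to try (the file names are made up for the demo):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with hypothetical files; b.jpg already exists
workdir=$(mktemp -d)
cd "$workdir"
touch a.jpeg b.jpeg b.jpg

for file in *.jpeg; do
  [[ -e "$file" ]] || continue
  newname="${file%.jpeg}.jpg"
  # Refuse to clobber a file that already has the target name
  if [[ -e "$newname" ]]; then
    echo "skip: $newname already exists" >&2
    continue
  fi
  mv -v -- "$file" "$newname"
done
```

Here a.jpeg becomes a.jpg, while b.jpeg is left alone because b.jpg is already taken.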
Add Timestamp Prefix
TIMESTAMP=$(date +%Y%m%d)
for file in *.log; do
[[ -f "$file" ]] || continue
mv -- "$file" "${TIMESTAMP}_${file}"
done
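Running the prefix loop twice on the same day produces names like 20240101_20240101_app.log. One way to make it safe to re-run, sketched under the assumption that every already-prefixed name starts with exactly eight digits and an underscore, is a regex test with [[ =~ ]]:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
touch app.log 20240101_old.log   # hypothetical sample files

TIMESTAMP=$(date +%Y%m%d)
for file in *.log; do
  [[ -f "$file" ]] || continue
  # Skip files that already carry a YYYYMMDD_ prefix
  [[ "$file" =~ ^[0-9]{8}_ ]] && continue
  mv -- "$file" "${TIMESTAMP}_${file}"
done
```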
Lowercase All Filenames
for file in *; do
[[ -f "$file" ]] || continue
lower="${file,,}" # bash 4+ lowercase expansion
[[ "$file" == "$lower" ]] || mv -- "$file" "$lower"
done
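The same loop works with other parameter expansions. For instance, ${file// /_} replaces every space in the name with an underscore. A sketch with a collision guard, in a scratch directory with made-up names:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
touch "my report.txt" "two  spaces.txt" plain.txt

for file in *; do
  [[ -f "$file" ]] || continue
  new="${file// /_}"                     # replace every space with an underscore
  [[ "$file" == "$new" ]] && continue    # nothing to change
  if [[ -e "$new" ]]; then               # do not overwrite an existing file
    echo "skip: $new already exists" >&2
    continue
  fi
  mv -- "$file" "$new"
done
```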
Test with echo first
Before running destructive mv or rm in a loop, replace mv with echo mv to preview what would happen. Only remove echo when you are confident the logic is correct.
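Editing echo in and out by hand is easy to forget. A common alternative, sketched here with a hypothetical run helper and DRY_RUN variable, routes every destructive command through one function that either prints or executes it:

```shell
#!/usr/bin/env bash
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # default to preview; set DRY_RUN=0 to act

# Print the command (shell-quoted) in dry-run mode, otherwise execute it
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf 'would run:'
    printf ' %q' "$@"
    printf '\n'
  else
    "$@"
  fi
}

workdir=$(mktemp -d)
touch "$workdir/old name.txt"
run mv -- "$workdir/old name.txt" "$workdir/new_name.txt"
```

The %q format shows exactly where each argument begins and ends, which makes quoting mistakes visible in the preview.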
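rsync: Synchronize Directories
rsync copies only files that have changed (by default it compares file size and modification time), which makes it ideal for backups and deployments. When copying between machines, it also sends only the changed parts of each file.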
# Basic sync (local to local)
rsync -av source/ destination/
# Common flags
rsync -avz source/ user@remote:~/dest/ # compress during transfer
rsync -av --delete source/ dest/ # mirror: delete files not in source
rsync -av --exclude='*.log' source/ dest/ # skip log files
rsync -av --dry-run source/ dest/ # preview without copying
Key flags:
- -a — archive mode: recursive, preserves permissions, timestamps, symlinks
- -v — verbose
- -z — compress during transfer (useful over slow links)
- --delete — remove files from dest that do not exist in source
- --dry-run — show what would happen without doing it
Trailing slash on source matters
rsync -av src/ dest/ copies the contents of src into dest. rsync -av src dest/ copies the src directory itself into dest, creating dest/src/. The trailing slash on the source is a common gotcha.
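A quick way to see the difference is to sync the same directory both ways into scratch destinations. This sketch assumes rsync is installed and simply exits if it is not:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Skip the demo on machines without rsync
command -v rsync >/dev/null || { echo "rsync not installed" >&2; exit 0; }

workdir=$(mktemp -d)
mkdir -p "$workdir/src"
echo hello > "$workdir/src/a.txt"

rsync -a "$workdir/src/" "$workdir/with_slash/"     # copies the contents of src
rsync -a "$workdir/src"  "$workdir/without_slash/"  # copies the src directory itself

find "$workdir/with_slash" "$workdir/without_slash" -type f
```

The first destination ends up holding a.txt directly; the second holds src/a.txt.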
Archives with tar
# Create an archive
tar -czf archive.tar.gz /path/to/directory/ # gzip compressed
tar -cjf archive.tar.bz2 /path/to/directory/ # bzip2 compressed
tar -cJf archive.tar.xz /path/to/directory/ # xz compressed (usually smallest, but slowest)
# Extract an archive
tar -xzf archive.tar.gz # extract here
tar -xzf archive.tar.gz -C /target/dir/ # extract to specific dir
# List contents without extracting
tar -tzf archive.tar.gz
Flag mnemonics: c=create, x=extract, t=list, z=gzip, j=bzip2, J=xz, f=filename, v=verbose.
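Archiving /path/to/directory/ directly stores every member under path/to/directory/ (tar strips the leading / and usually prints a warning about it). Passing -C changes into the parent directory first, so members are stored relative to it. A sketch with a scratch tree standing in for real data:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
mkdir -p "$workdir/project/docs"
echo "notes" > "$workdir/project/docs/readme.txt"

# -C switches into $workdir before adding "project", so paths are relative
tar -czf "$workdir/project.tar.gz" -C "$workdir" project

# List members: entries start with project/, not with the absolute path
tar -tzf "$workdir/project.tar.gz"

# Extract into a separate directory to check the round trip
mkdir -p "$workdir/restore"
tar -xzf "$workdir/project.tar.gz" -C "$workdir/restore"
```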
Timestamped Backup
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
tar -czf "backup_${TIMESTAMP}.tar.gz" ~/documents/
echo "Archive: backup_${TIMESTAMP}.tar.gz ($(du -sh "backup_${TIMESTAMP}.tar.gz" | cut -f1))"
Checksums
# Generate
md5sum file.txt > file.txt.md5
sha256sum archive.tar.gz > archive.tar.gz.sha256
# Verify
md5sum -c file.txt.md5 # prints "file.txt: OK" when the file matches
sha256sum -c archive.tar.gz.sha256
# Generate for multiple files
sha256sum *.tar.gz > checksums.sha256
sha256sum -c checksums.sha256
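To see verification catch a change, this sketch (GNU coreutils assumed, since it uses sha256sum --quiet) records a hash, alters the file, and checks again. The -c mode exits non-zero on any mismatch, so it can drive an if:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
echo "important data" > data.txt

sha256sum data.txt > data.txt.sha256   # record the expected hash
sha256sum -c data.txt.sha256           # prints "data.txt: OK"

echo "tampered" >> data.txt            # simulate corruption
# --quiet suppresses the OK lines; a mismatch still prints FAILED
if sha256sum --quiet -c data.txt.sha256; then
  status=ok
else
  status=mismatch
fi
echo "$status"
```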
Handling Problematic Filenames
On Linux and other Unix-like systems, a filename may contain any byte except / and the null byte, so spaces, newlines, special characters, and Unicode are all legal. These patterns handle all of them safely:
# Always use -- to stop flag processing
mv -- "$file" "$newname"
rm -- "$file"
# Use null-delimited lists with find
find . -name "*.tmp" -print0 | xargs -0 rm --
# Quote every variable reference
for f in "$dir"/*; do
process "$f" # not $f
done
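When each file needs more than one command, a while loop can consume find's null-delimited output directly. This sketch creates deliberately awkward names (a space, a leading dash, an embedded newline) in a scratch directory and removes them one by one:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
touch "$workdir/plain.tmp" "$workdir/with space.tmp" "$workdir/-dash.tmp"
touch "$workdir/$(printf 'new\nline.tmp')"

count=0
# read -d '' splits on NUL; IFS= and -r keep spaces and backslashes intact
while IFS= read -r -d '' f; do
  printf 'removing %q\n' "$f"
  rm -- "$f"
  count=$((count + 1))
done < <(find "$workdir" -name '*.tmp' -print0)
echo "removed $count files"
```

Feeding the loop with process substitution, rather than a pipe, keeps it in the current shell, so count is still set after the loop ends.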
Practice Exercises
Main (write a short script)
Build the File Organizer project prototype:
#!/usr/bin/env bash
set -euo pipefail
SOURCE="${1:-.}"
DRY_RUN=false
[[ "${2:-}" == "--dry-run" ]] && DRY_RUN=true
declare -A EXTENSIONS # associative array: requires bash 4+
EXTENSIONS=([jpg]="images" [jpeg]="images" [png]="images"
[mp3]="music" [wav]="music" [flac]="music"
[pdf]="documents" [docx]="documents" [txt]="documents"
[zip]="archives" [tar]="archives" [gz]="archives")
for file in "$SOURCE"/*; do
[[ -f "$file" ]] || continue
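name="${file##*/}" # basename, so dots in the directory path are ignored
ext=""
[[ "$name" == *.* ]] && ext="${name##*.}"
ext="${ext,,}" # lowercase (bash 4+)
dest_dir="other"
# Guard against an empty extension (e.g. "notes."): ${EXTENSIONS[]} is a bad subscript
if [[ -n "$ext" ]] && [[ -n "${EXTENSIONS[$ext]:-}" ]]; then
dest_dir="${EXTENSIONS[$ext]}"
fi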
if "$DRY_RUN"; then
echo "Would move: $file -> $SOURCE/$dest_dir/"
else
mkdir -p "$SOURCE/$dest_dir"
mv -- "$file" "$SOURCE/$dest_dir/"
fi
done
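Pulling the extension lookup into a function makes it easy to check awkward names before pointing the organizer at real files. In this sketch, classify is a hypothetical helper and the table is trimmed. Taking the basename first matters: a naive ${file##*.} on ./README returns /README, because the only dot is in the ./ prefix, and on ./notes. it returns an empty string, which is not a valid associative-array subscript.

```shell
#!/usr/bin/env bash
set -euo pipefail

declare -A EXTENSIONS=([jpg]="images" [pdf]="documents" [gz]="archives")

# Hypothetical helper: print the destination folder for one path
classify() {
  local name="${1##*/}" ext=""        # basename only, ignore dots in directory names
  [[ "$name" == *.* ]] && ext="${name##*.}"
  ext="${ext,,}"
  if [[ -n "$ext" ]] && [[ -n "${EXTENSIONS[$ext]:-}" ]]; then
    echo "${EXTENSIONS[$ext]}"
  else
    echo "other"
  fi
}

for path in ./photo.JPG ./README ./notes. ./backup.tar.gz ./my.dir/report.pdf; do
  printf '%-22s -> %s\n' "$path" "$(classify "$path")"
done
```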
Stretch
- Add incremental backup logic to your backup script: only archive files newer than the last backup run (use find -newer).
- Research rsync --link-dest. How does it enable space-efficient incremental backups?
Interview Questions
- What does the trailing slash mean in rsync -av src/ dest/?
A trailing slash on the source (src/) means "copy the contents of src." Without it (src), rsync copies the directory itself. So rsync src/ dest/ and rsync src dest/ produce different results: the first puts files directly in dest, the second creates dest/src.
- What does tar -czf stand for?
c = create, z = compress with gzip, f = the next argument is the filename. So tar -czf archive.tar.gz dir/ creates a gzip-compressed tar archive named archive.tar.gz from dir/. To extract: tar -xzf archive.tar.gz.
- Why use find -print0 | xargs -0 instead of a for loop or find -exec?
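find -print0 | xargs -0 handles filenames with spaces, newlines, and special characters correctly because it uses null bytes as delimiters, and a null byte cannot appear in a filename. It is also more efficient than -exec ... \; (which spawns one process per file), because xargs -0 passes many files to each invocation; find -exec ... {} + batches in the same way and is equally safe. A for loop over $(find ...) breaks on filenames containing spaces or newlines, because the shell word-splits the command output (and glob-expands any *, ?, or [ characters in the names).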