Introduction: The Importance of the Dockerfile

In Part 3 of the Docker series, we take an in-depth look at the Dockerfile, the core element for creating container images. A Dockerfile is a blueprint for containerizing an application, and writing it well is the first step toward efficient and secure container operations.

This article covers Dockerfile basics, key instructions, multi-stage builds, build cache optimization, and security-focused best practices.

1. What is a Dockerfile?

1.1 Definition of Dockerfile

A Dockerfile is a text file that Docker reads to build an image. It lists, in order, every instruction needed to create the image: selecting the base image, installing the application, configuring the environment, and specifying the command to run.

Key features of a Dockerfile:

  • Reproducibility: The same Dockerfile builds an equivalent image on any machine, as long as base image tags and dependency versions are pinned
  • Version Control: As a text file, change history can be tracked with version control systems like Git
  • Automation: Images can be automatically built and deployed in CI/CD pipelines
  • Documentation: The Dockerfile itself serves as documentation for the application's runtime environment

1.2 Basic Structure

Example Dockerfile for a simple Node.js application:

# Specify base image
FROM node:20-alpine

# Set working directory
WORKDIR /app

# Copy package files and install dependencies
COPY package*.json ./
RUN npm install

# Copy source code
COPY . .

# Set environment variable
ENV NODE_ENV=production

# Expose port
EXPOSE 3000

# Execution command
CMD ["node", "server.js"]

2. Key Dockerfile Instructions

2.1 FROM - Specify Base Image

FROM specifies the base image for the new image. It must be the first instruction in a Dockerfile; only comments, parser directives, and ARG may come before it.

# Basic format
FROM <image>:<tag>

# Examples
FROM ubuntu:22.04
FROM python:3.11-slim
FROM node:20-alpine
# Start from an empty image
FROM scratch

Considerations when selecting a base image:

  • Alpine images: Very small (about 5MB) with a reduced attack surface; because Alpine uses musl libc instead of glibc, some software has compatibility issues
  • Slim images: Lightweight variants with unnecessary packages removed
  • Official images: Prefer the Docker Official Images on Docker Hub, which are curated and regularly updated

2.2 RUN - Execute Commands

RUN executes commands during the image build process and saves the results as a new layer.

# Shell format
RUN apt-get update && apt-get install -y curl

# Exec format
RUN ["apt-get", "update"]
RUN ["apt-get", "install", "-y", "curl"]

# Combine multiple commands into one RUN (minimize layers)
RUN apt-get update \
    && apt-get install -y \
        curl \
        vim \
        git \
    && rm -rf /var/lib/apt/lists/*
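
Chaining with && does more than keep the layer count down. The shell form of RUN hands the whole line to /bin/sh -c, so && stops at the first failing command and the build fails at that step. The sketch below reproduces this in plain shell without Docker; run_shell_form is a hypothetical helper name, and false stands in for a failing package install.

```shell
# Hypothetical helper: run a command line the way the shell form of RUN
# does, as a single string handed to `sh -c`.
run_shell_form() {
    sh -c "$1"
}

# With &&, execution stops at the failing step and the exit status is
# non-zero, which is what makes `docker build` abort on that RUN line.
run_shell_form 'echo update && false && echo cleanup' \
    || echo "chained: failed with status $?"

# With ;, later steps still run and only the last status is reported,
# so the failure in the middle is masked.
run_shell_form 'echo update; false; echo cleanup' \
    && echo "sequenced: reported success"
```

The first call prints only "update" before reporting the failure; the second prints both "update" and "cleanup" and reports success even though the middle step failed.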

2.3 COPY vs ADD - Copy Files

Both COPY and ADD copy files to the image, but they have different purposes.

# COPY - Copy local files/directories to image
COPY src/ /app/src/
COPY package.json package-lock.json ./

# ADD - COPY features + additional capabilities
# 1. Download files from a URL (downloaded archives are not extracted)
ADD https://example.com/file.tar.gz /tmp/

# 2. Auto-extract local tar archives (gzip, bzip2, or xz compressed)
ADD archive.tar.gz /app/

Recommendation: Use COPY in most cases. Use ADD only when URL download or automatic extraction is needed.

2.4 WORKDIR - Set Working Directory

WORKDIR sets the working directory for subsequent RUN, CMD, ENTRYPOINT, COPY, and ADD instructions.

# Set working directory
WORKDIR /app

# Created automatically if it doesn't exist
WORKDIR /app/src

# Relative paths are also allowed (resolved against the previous WORKDIR)
# Result: /app/src/subdir
WORKDIR subdir
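
The resolution rule for successive WORKDIR instructions can be sketched in a few lines of shell. This is only an illustration of the rule described above, not Docker's own code; resolve_workdir is a hypothetical name, and the sketch does not normalize components such as "..".

```shell
# Hypothetical helper: resolve a WORKDIR argument against the current
# working directory.
resolve_workdir() {
    current=$1
    next=$2
    case $next in
        /*) printf '%s\n' "$next" ;;                # absolute path replaces the current one
        *)  printf '%s\n' "${current%/}/$next" ;;   # relative path is appended
    esac
}

# Replay the three WORKDIR instructions from the listing above.
dir=$(resolve_workdir / /app)
dir=$(resolve_workdir "$dir" /app/src)
dir=$(resolve_workdir "$dir" subdir)
echo "$dir"   # prints /app/src/subdir
```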

2.5 ENV - Set Environment Variables

ENV sets environment variables that persist in the image: they are available to subsequent build instructions and to the container at runtime.

# Single environment variable
ENV NODE_ENV=production

# Multiple environment variables (one line)
ENV NODE_ENV=production PORT=3000

# Multiple environment variables (multiple lines)
ENV NODE_ENV=production \
    PORT=3000 \
    DB_HOST=localhost

2.6 EXPOSE - Document Ports

EXPOSE documents the ports the container listens on; it does not publish them. Actual port publishing is done with docker run -p (or -P to publish all exposed ports).

# Single port
EXPOSE 3000

# Multiple ports
EXPOSE 80 443

# UDP port
EXPOSE 53/udp

2.7 CMD vs ENTRYPOINT - Execution Commands

CMD and ENTRYPOINT define the command to run when the container starts.

# CMD - Default execution command (can be overridden by docker run)
CMD ["node", "server.js"]
CMD ["npm", "start"]

# ENTRYPOINT - Fixed execution command
ENTRYPOINT ["python", "app.py"]

# CMD and ENTRYPOINT combination
ENTRYPOINT ["python"]
# Can be changed with: docker run image other.py
CMD ["app.py"]

Instruction   Role                        Override
CMD           Default command/arguments   Can be overridden by docker run arguments
ENTRYPOINT    Fixed executable            Can only be changed with the --entrypoint option
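
How the two instructions combine can be modeled with a small shell function. This is a simplified sketch under stated assumptions: effective_command is a hypothetical name, both instructions are assumed to be in exec form, arguments are treated as plain space-separated words, and the --entrypoint option is not modeled.

```shell
# Hypothetical model of how the container's start command is assembled
# from ENTRYPOINT, CMD, and any arguments given after the image name.
effective_command() {
    entrypoint=$1   # e.g. "python" (empty string if no ENTRYPOINT)
    cmd=$2          # e.g. "app.py"
    shift 2
    if [ "$#" -gt 0 ]; then
        args="$*"   # arguments to `docker run image ...` replace CMD
    else
        args=$cmd   # otherwise CMD supplies the defaults
    fi
    # Unquoted on purpose: word splitting drops an empty entrypoint.
    set -- $entrypoint $args
    echo "$*"
}

effective_command "python" "app.py"            # prints: python app.py
effective_command "python" "app.py" other.py   # prints: python other.py
effective_command "" "node server.js" npm start   # prints: npm start
```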

2.8 Other Useful Instructions

# ARG - Build-time variable (not set in the running container, but visible
# in docker history, so don't use it for secrets)
ARG VERSION=1.0
ARG BUILD_DATE

# LABEL - Add metadata
LABEL maintainer="dev@example.com"
LABEL version="1.0"
LABEL description="My application"

# USER - Change execution user
USER node
USER 1001:1001

# VOLUME - Declare volume mount point
VOLUME /data
VOLUME ["/var/log", "/var/db"]

# HEALTHCHECK - Container health check
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD curl -f http://localhost:3000/health || exit 1

3. Multi-Stage Builds

3.1 What is Multi-Stage Build?

A multi-stage build defines multiple stages in a single Dockerfile, separating the tools needed for building from the files needed at runtime. Because only the final stage ends up in the image, this significantly reduces the final image size.

3.2 Practical Example: Go Application

# Build stage
FROM golang:1.21 AS builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main .

# Run stage
FROM alpine:latest

RUN apk --no-cache add ca-certificates
WORKDIR /root/

# Copy only the binary from build stage
COPY --from=builder /app/main .

CMD ["./main"]

3.3 Practical Example: Node.js Application

# Dependencies stage
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Run stage
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production

# Copy only production dependencies
COPY --from=deps /app/node_modules ./node_modules
# Copy only build output
COPY --from=builder /app/dist ./dist
COPY package.json ./

USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]

3.4 Advantages of Multi-Stage Builds

  • Reduced Image Size: Only includes files needed for execution, without build tools
  • Enhanced Security: Source code and credentials used during the build stay in earlier stages and are not included in the final image unless explicitly copied with COPY --from
  • Build Cache Utilization: Each stage can be cached independently

4. .dockerignore File

4.1 What is .dockerignore?

The .dockerignore file specifies files and directories to exclude from the Docker build context. It uses syntax similar to .gitignore.

4.2 .dockerignore Example

# Git related
.git
.gitignore

# Node.js
node_modules
npm-debug.log

# Build artifacts (if not using multi-stage)
dist
build

# Environment configuration
.env
.env.local
*.env

# IDE
.vscode
.idea
*.swp

# Tests
coverage
__tests__
*.test.js

# Documentation
README.md
docs

# Docker related
Dockerfile*
docker-compose*
.docker

4.3 Importance of .dockerignore

  • Faster Build: Unnecessary files are not included in the build context, reducing transfer time
  • Smaller Image Size: Keeps unnecessary files out of the image when a broad instruction such as COPY . . is used
  • Security: Prevents sensitive files like .env and certificates from being included in the image
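
To get a feel for which paths an ignore file excludes, the check can be approximated in plain shell. This is a rough sketch, not Docker's matcher: is_ignored is a hypothetical name, ! exceptions and ** are not handled, and shell * also matches across /, which the real rules do not.

```shell
# Hypothetical helper: return 0 if the candidate path matches any pattern
# in the given ignore file, 1 otherwise.
is_ignored() {
    candidate=$1
    ignore_file=$2
    while IFS= read -r pattern || [ -n "$pattern" ]; do
        case $pattern in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        case $candidate in
            # the entry itself, or anything beneath it if it is a directory
            $pattern|$pattern/*) return 0 ;;
        esac
    done < "$ignore_file"
    return 1
}

# Example: is_ignored node_modules/express/index.js .dockerignore
```

With the example file from section 4.2, paths such as .env or anything under node_modules would be reported as ignored, while src/server.js would not.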

5. Image Building (docker build)

5.1 Basic Build Command

# Build with Dockerfile in current directory
docker build -t myapp:1.0 .

# Specify different Dockerfile
docker build -f Dockerfile.prod -t myapp:prod .

# Pass build arguments
docker build --build-arg VERSION=2.0 -t myapp:2.0 .

# Ignore build cache
docker build --no-cache -t myapp:latest .

# Build only up to specific stage
docker build --target builder -t myapp:builder .

5.2 Build Context

The build context is the set of files and directories that docker build sends to the builder; COPY and ADD can only reference files inside it. The last argument (. in the examples above) specifies the build context path.

# Use current directory as build context
docker build -t myapp .

# Use specific directory as build context
docker build -t myapp /path/to/context

# Build directly from Git repository
docker build -t myapp https://github.com/user/repo.git
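
Since the whole context is handed to the builder, it can be worth checking its size before building. A minimal sketch using only standard tools; context_size_kb is a hypothetical name, and the figure is an upper bound because it does not account for .dockerignore exclusions.

```shell
# Hypothetical helper: report the size of a build context directory in
# kilobytes. awk picks the first field regardless of the separator du uses.
context_size_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Example: context_size_kb .
```

An unexpectedly large number usually points at directories such as .git or node_modules that belong in .dockerignore.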

6. Utilizing Build Cache

6.1 How Cache Works

Docker caches the result of each instruction as a layer during build. Instructions identical to previous builds reuse cached layers.

Conditions that invalidate cache:

  • When the instruction is changed
  • When source files for COPY/ADD are changed
  • When cache for a parent layer is invalidated
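
The three conditions above can be pictured as a chained cache key. The sketch below is a toy model, not Docker's actual implementation: layer_key is a hypothetical name, cksum stands in for a real content digest, and the third argument is a stand-in for the contents of the files a COPY would read.

```shell
# Hypothetical model: a layer's key depends on its parent's key, the
# instruction text, and (for COPY/ADD) the content of the source files.
layer_key() {
    parent_key=$1
    instruction=$2
    content=$3      # empty for instructions that copy nothing, such as RUN
    printf '%s\n%s\n%s\n' "$parent_key" "$instruction" "$content" \
        | cksum | cut -d ' ' -f 1
}

base=$(layer_key "" "FROM node:20-alpine" "")

# Same instruction, but the copied file content differs between builds.
copy_old=$(layer_key "$base" "COPY package*.json ./" "deps v1")
copy_new=$(layer_key "$base" "COPY package*.json ./" "deps v2")

# RUN itself is unchanged, yet its key changes because its parent changed.
run_old=$(layer_key "$copy_old" "RUN npm install" "")
run_new=$(layer_key "$copy_new" "RUN npm install" "")

if [ "$run_old" = "$run_new" ]; then
    echo "cache hit: npm install is skipped"
else
    echo "cache miss: npm install runs again"
fi
```

Because each key folds in the parent's key, a change early in the Dockerfile changes every key after it, which is why instruction order matters for cache efficiency.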

6.2 Tips for Improving Cache Efficiency

# Bad example: npm install runs again when source code changes
COPY . .
RUN npm install

# Good example: Uses cache if package.json hasn't changed
COPY package*.json ./
RUN npm install
COPY . .

Place instructions whose inputs change frequently as late in the Dockerfile as possible: once one layer's cache is invalidated, every layer after it is rebuilt.

7. Image Tagging and Version Management

7.1 Tag Strategy

# Version tags
docker build -t myapp:1.0.0 .
docker build -t myapp:1.0 .
docker build -t myapp:1 .

# Environment-specific tags
docker build -t myapp:prod .
docker build -t myapp:staging .
docker build -t myapp:dev .

# Git commit hash tag
docker build -t myapp:$(git rev-parse --short HEAD) .

# Date tag
docker build -t myapp:$(date +%Y%m%d) .
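
The three version tags above do not need three separate builds, because docker build accepts several -t flags. The following sketch derives the shorter tags from a full version and prints one combined command; version_tags and build_command are hypothetical helper names, and the version is assumed to have the form MAJOR.MINOR.PATCH.

```shell
# Hypothetical helper: derive the tags 1.0.0, 1.0 and 1 from 1.0.0.
version_tags() {
    version=$1
    minor=${version%.*}     # strip the last component  -> 1.0
    major=${version%%.*}    # keep only the first one    -> 1
    printf '%s\n' "$version" "$minor" "$major"
}

# Hypothetical helper: print a single docker build command that applies
# every derived tag in one build.
build_command() {
    image=$1
    cmd="docker build"
    for tag in $(version_tags "$2"); do
        cmd="$cmd -t $image:$tag"
    done
    printf '%s .\n' "$cmd"
}

build_command myapp 1.0.0
# prints: docker build -t myapp:1.0.0 -t myapp:1.0 -t myapp:1 .
```

The function only prints the command, so it can be reviewed or logged in a CI job before being executed.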

8. Dockerfile Best Practices

8.1 Minimize Layers

# Bad example: Creates unnecessary layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y vim
RUN rm -rf /var/lib/apt/lists/*

# Good example: Combined into one RUN
RUN apt-get update \
    && apt-get install -y curl vim \
    && rm -rf /var/lib/apt/lists/*

8.2 Security Considerations

# 1. Run as non-root user
FROM node:20-alpine
RUN addgroup -g 1001 -S nodejs \
    && adduser -S nodejs -u 1001 -G nodejs
USER nodejs

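# 2. Use trusted base images, pinned by digest
FROM node:20-alpine@sha256:abc123...

# 3. Don't install unnecessary packages (Debian/Ubuntu-based images)
RUN apt-get update \
    && apt-get install --no-install-recommends -y curl \
    && rm -rf /var/lib/apt/lists/*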

# 4. Pass sensitive information at runtime instead of ENV
# Bad example
ENV DATABASE_PASSWORD=secret123

# Good example: Use docker run -e DATABASE_PASSWORD=xxx

# 5. Copy only specific files instead of COPY .
COPY src/ /app/src/
COPY package.json /app/
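
When secrets are supplied at runtime as in item 4, the container's start script can refuse to run if one is missing. A minimal sketch, assuming a shell-based entrypoint script; require_env is a hypothetical helper name, and DATABASE_PASSWORD is the variable from the example above.

```shell
# Hypothetical helper: fail when a required variable was not supplied
# at runtime (for example with docker run -e NAME=value).
require_env() {
    name=$1
    # Indirect lookup of the variable whose name is in $name; the name is
    # a literal supplied by the script author, never user input.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "error: $name is not set; pass it with docker run -e $name=..." >&2
        return 1
    fi
}

require_env DATABASE_PASSWORD || echo "refusing to start without credentials"
```

Failing early with a clear message is easier to diagnose than a connection error surfacing later inside the application.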

8.3 Image Size Optimization

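  • Use the smallest base image that works for the application (Alpine, distroless)
  • Utilize multi-stage builds
  • Delete unnecessary files (cache, temporary files) in the same RUN instruction that creates them, since files removed in a later layer still take up space in the earlier one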
  • Actively use .dockerignore

Conclusion

Writing Dockerfiles is a core skill for containerization. Summary of the topics covered:

  • Understanding Basic Instructions: Roles and usage of FROM, RUN, COPY, WORKDIR, ENV, EXPOSE, CMD, ENTRYPOINT
  • Multi-Stage Builds: Separate build and runtime environments to minimize image size
  • .dockerignore: Exclude unnecessary files from builds
  • Build Cache: Optimize layer order to improve build speed
  • Best Practices: Write Dockerfiles considering security, size, and maintainability

In Part 4, we will learn how to efficiently manage multiple containers using Docker Compose.