Introduction: Object Storage in the Self-Hosting Era

Since Amazon S3 (Simple Storage Service) launched in 2006, object storage has become a cornerstone of modern infrastructure. For storing and managing unstructured data such as images, videos, backups, and log files, the S3 API has become the de facto standard, with countless applications and tools supporting S3-compatible interfaces out of the box.

However, reliance on public cloud services brings fundamental challenges: skyrocketing costs, data sovereignty concerns, and vendor lock-in. In response, demand for self-hosting S3-compatible storage on on-premises or personal servers has been steadily growing. MinIO, long the leading self-hosted option, earned widespread adoption thanks to its strong performance and simple deployment. But its transition to an AGPLv3 + commercial dual-licensing model prompted many in the open-source community to look elsewhere, and the recent shift of its GitHub repository to maintenance mode has only intensified the search for alternatives.

Against this backdrop, one project has been drawing considerable attention: Garage. Developed by Deuxfleurs, a French nonprofit organization, Garage positions itself as "an S3 object store that can run reliably even outside the data center," offering a uniquely tailored approach for small-scale self-hosting environments. In this article, we will take a comprehensive look at Garage's core philosophy, internal architecture, installation and operations guide, performance tuning, and how it stacks up against competing solutions.

1. What Is Garage?

1.1 Garage's Philosophy and Design Goals

Garage is an S3-compatible distributed object storage system that has been actively developed since its first release in 2020 by Deuxfleurs, a small nonprofit hosting provider based in France. Deuxfleurs uses Garage in their own production environment, operating a multi-site cluster consisting of 9 nodes distributed across 3 physical locations.

Garage's design is built on four core principles:

  • Internet-enabled: Designed for multi-site environments connected via standard internet connections (e.g., FTTH) spanning data centers, offices, and homes. It works perfectly well over residential internet -- no dedicated high-speed links required.
  • Self-contained and lightweight: Has no external dependencies (no etcd, ZooKeeper, PostgreSQL, etc.) and ships as a single binary that can run anywhere. It integrates naturally into hyperconverged infrastructure.
  • Highly resilient: Provides strong resilience against network failures, network latency, disk failures, and even operator mistakes.
  • Simple: Aims to be a system that is easy to understand, easy to operate, and easy to debug.

One particularly noteworthy detail is that the project has secured stable development funding through 2025 via EU grants. In a landscape where the sustainability of open-source projects is frequently called into question, this bolsters confidence in Garage's long-term viability.

1.2 Key Features at a Glance

Here are the main features that Garage provides:

  • S3-compatible API: Supports core S3 operations including GetObject, PutObject, DeleteObject, ListObjects, and Multipart Upload. However, some advanced features such as object versioning and object lock are not yet supported.
  • Geo-distributed replication: Recognizes the physical location of each node and distributes data replicas across geographically separate sites. This is a major advantage for disaster recovery (DR) scenarios.
  • Static website hosting: Garage can serve the contents of a bucket as a static website without a separate web server. Domain names map directly to bucket names for a simple setup. This is a unique feature of Garage that neither MinIO nor Ceph offers.
  • Admin REST API: Provides a full REST API for programmatic cluster management.
  • Monitoring support: Supports internal metrics in Prometheus format and trace exporting in OpenTelemetry format.
  • K2V (experimental): An experimental API for storing and querying large numbers of small key-value pairs, complementing the S3 API for metadata management use cases.
  • Zstd compression and data deduplication: Optional Zstd compression can be applied to all stored data, and identical data blocks are automatically deduplicated.
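The deduplication mechanism is content-addressed: objects are split into blocks, each block is identified by a hash of its contents, and a block that already exists is never written twice. The Python sketch below illustrates the general idea only; Garage itself is written in Rust, and its on-disk format and hash function are internal details.

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # mirrors Garage's default block_size of 1 MiB


def store(data: bytes, block_store: dict) -> list:
    """Split data into fixed-size blocks and store each block under its
    content hash; an identical block is only ever stored once."""
    refs = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        # sha256 is used here purely for illustration.
        h = hashlib.sha256(block).hexdigest()
        block_store.setdefault(h, block)  # dedup: skip if already present
        refs.append(h)
    return refs


blocks = {}
refs = store(b"x" * (2 * BLOCK_SIZE), blocks)  # two identical 1 MiB chunks
assert len(refs) == 2 and len(blocks) == 1     # referenced twice, stored once
```

Because both chunks of the example object are identical, the object references two blocks but the store holds only one copy.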

At the same time, Garage explicitly declares several non-goals:

  • No pursuit of feature expansion beyond the S3 API
  • No erasure coding or other storage optimization techniques (simple replication only)
  • No POSIX filesystem compatibility

2. Deep Dive into Garage Architecture

2.1 Distributed Systems Foundations

Garage's architecture draws heavily from cutting-edge research in distributed systems. Its three foundational technologies are:

  • Amazon Dynamo pattern: Borrows the design principles of highly available key-value stores. Following Dynamo's "Always Writable" philosophy, Garage is designed so that reads and writes can continue even when some nodes are down.
  • CRDT (Conflict-Free Replicated Data Types): Uses data structures that converge to a consistent final state without coordination, even when multiple nodes modify data simultaneously. This allows each node to operate independently during network partitions.
  • Maglev-inspired data placement: Uses an algorithm derived from Google's Maglev hashing to decide which nodes hold which data partitions. As with consistent hashing, only minimal data redistribution occurs when nodes are added or removed.
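To make the CRDT idea concrete, here is a last-writer-wins register, one of the simplest CRDTs, in Python. This illustrates the general technique, not Garage's actual metadata structures: two replicas accept writes independently during a partition and still converge to the same state once they exchange data, with no coordinator involved.

```python
from dataclasses import dataclass


@dataclass
class LWWRegister:
    """Last-writer-wins register: a minimal CRDT. Concurrent updates
    converge by keeping the value with the highest (timestamp, node_id)
    pair -- no coordination between nodes is needed."""
    value: object = None
    stamp: tuple = (0, "")

    def set(self, value, timestamp, node_id):
        if (timestamp, node_id) > self.stamp:
            self.value, self.stamp = value, (timestamp, node_id)

    def merge(self, other: "LWWRegister"):
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp


# Two nodes update the same key independently during a network partition...
a, b = LWWRegister(), LWWRegister()
a.set("v1", 10, "node-a")
b.set("v2", 12, "node-b")

# ...then merge in either order: both replicas converge on the later write.
a.merge(b)
b.merge(a)
assert a.value == b.value == "v2"
```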

2.2 Data Replication and Consistency Model

Garage adopts an eventual consistency model while providing pragmatically strong guarantees:

  • Quorum-based reads and writes: Instead of a single source of truth, both reads and writes are processed through quorum consensus. For example, with replication_factor = 3, a write operation is sent to 3 nodes but returns success once 2 nodes acknowledge it; the third node's write completes asynchronously.
  • Repair mechanism: When inconsistencies are detected, an automatic repair mechanism restores consistency.
  • Tombstone markers: Delete operations in the distributed environment use tombstones (logical deletion markers). This prevents the "deleted data reappearing from another node" problem.

The replication factor is user-configurable, with replication_factor = 3 being recommended in most cases. This means the system can tolerate the complete failure of one node while maintaining data availability. For storage-constrained environments, replication_factor = 2 is an option, but caution is advised due to reduced resilience.
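The failure-tolerance arithmetic behind these recommendations is simple majority-quorum math, sketched below for the replication_factor = 3 case described above (the exact quorum rules Garage applies to other replication factors are not spelled out here).

```python
def write_quorum(replication_factor: int) -> int:
    """Majority quorum: acknowledgements needed before a write succeeds."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Nodes that can be down while writes still reach a quorum."""
    return replication_factor - write_quorum(replication_factor)


# replication_factor = 3: success after 2 acks, one node may be offline.
assert write_quorum(3) == 2
assert tolerated_failures(3) == 1
```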

2.3 Node Communication and Cluster Management

Garage's approach to cluster management differs significantly from traditional distributed storage systems:

  • Gossip protocol: Instead of relying on a central coordinator (etcd, ZooKeeper, Consul, etc.), nodes share cluster state through direct peer-to-peer communication. This dramatically reduces operational complexity and eliminates single points of failure (SPOF).
  • Rust single binary: The entire system is written in Rust and compiles to a single binary. It is highly memory-efficient and can run on hardware as modest as a Raspberry Pi. No JVM, Python runtime, or other external dependencies required.
  • Heterogeneous hardware support: While most distributed storage systems assume uniform high-performance hardware, Garage allows you to mix machines with different specs in the same cluster. It uses each node's storage capacity and physical location information to automatically compute an optimal data placement plan.
  • Automatic rebalancing: When nodes are added or removed, Garage automatically redistributes data to maintain the desired replication level.

3. Installation and Cluster Setup Guide

3.1 Prerequisites

The minimum requirements for building a Garage cluster are as follows:

  • Number of nodes: A minimum of 3 machines is required when using replication_factor = 3. Single-node test environments are possible, but multiple nodes are strongly recommended for production.
  • Network: All nodes must be able to communicate via IP. Nodes behind NAT often lack publicly reachable IPv4 addresses, so using IPv6 is preferred when available.
  • VPN solutions: If IPv6 is unavailable in NAT environments, mesh VPNs such as Nebula, Yggdrasil, or WireGuard can be used to establish direct communication between nodes.
  • Storage: Separating SSD for metadata storage and HDD for data storage is beneficial for performance.

3.2 Installation Methods

Garage can be installed in several ways:

Docker container deployment:

# docker-compose.yml example
version: '3'
services:
  garage:
    image: dxflrs/garage:v1.0
    ports:
      - "3900:3900"   # S3 API
      - "3901:3901"   # RPC
      - "3902:3902"   # Admin API
      - "3903:3903"   # Web (static site)
    volumes:
      - ./garage.toml:/etc/garage.toml
      - ./meta:/var/lib/garage/meta
      - ./data:/var/lib/garage/data
    restart: unless-stopped

Systemd service registration:

[Unit]
Description=Garage S3-compatible object store
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/garage server
Environment=GARAGE_CONFIG_FILE=/etc/garage.toml
Restart=always
RestartSec=5
StateDirectory=garage

[Install]
WantedBy=multi-user.target

Kubernetes deployment: Garage also supports Kubernetes deployment via Helm charts. It runs as a StatefulSet, with PersistentVolumes managing metadata and data storage.

3.3 Key Configuration Options (garage.toml)

Garage's configuration is managed through a single garage.toml file. The key settings are:

# Metadata storage path (SSD recommended)
metadata_dir = "/var/lib/garage/meta"

# Data block storage path (HDD acceptable)
data_dir = "/var/lib/garage/data"

# Database engine selection
db_engine = "lmdb"

# RPC communication settings
rpc_bind_addr = "[::]:3901"
rpc_public_addr = "192.168.1.10:3901"
rpc_secret = "enter_your_32_byte_hex_secret_here"

# Replication settings
replication_factor = 3

# Block size (default 1MB)
block_size = 1048576

# S3 API endpoint
[s3_api]
api_bind_addr = "[::]:3900"
s3_region = "garage"
root_domain = ".s3.garage.localhost"

# S3 Web (static site hosting)
[s3_web]
bind_addr = "[::]:3903"
root_domain = ".web.garage.localhost"

# Admin API
[admin]
api_bind_addr = "[::]:3902"
admin_token = "your_admin_token"
metrics_token = "your_metrics_token"
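The rpc_secret above must be a 32-byte value encoded as 64 hexadecimal characters and shared by every node in the cluster. The Garage documentation suggests generating it with openssl rand -hex 32; an equivalent using only the Python standard library is:

```python
import secrets

# 32 random bytes, hex-encoded -> a 64-character rpc_secret
rpc_secret = secrets.token_hex(32)
print(rpc_secret)
```

Generate the secret once and copy the same value into garage.toml on all nodes; nodes with mismatched secrets cannot join the cluster.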

After writing the configuration file, cluster setup proceeds through the garage CLI:

# Check cluster status
garage status

# Assign storage capacity to a node (e.g., 100GB, zone 'dc1', tag 'node1')
garage layout assign -z dc1 -c 100G -t node1 <node-id>

# Apply layout changes
garage layout apply --version 1

# Create a bucket
garage bucket create my-bucket

# Create an access key
garage key create my-app-key

# Grant bucket permissions to the key
garage bucket allow --read --write my-bucket --key my-app-key

4. Tuning and Operational Optimization

4.1 Choosing a Metadata Engine

Since v0.8.0, Garage supports multiple metadata storage engines. Here are the key considerations for choosing one:

| Engine | Characteristics | Recommended Environment |
|---|---|---|
| LMDB (default) | Best performance, most extensively tested. Architecture-dependent; no 32-bit support | Production clusters with replication_factor >= 2 |
| SQLite | Good portability, resilient against unclean shutdowns. Slower than LMDB | Single-node setups or heterogeneous-architecture environments |
| Fjall (experimental) | LSM-tree based, theoretically high write throughput | Testing environments only (not for production use) |

An important caveat when using LMDB: the database file can become corrupted after an unclean shutdown, so regular metadata snapshots are essential. Garage v0.9.4 introduced a built-in automatic snapshot feature, and leveraging BTRFS or ZFS filesystem snapshots is also a good approach. By default, snapshots are stored in the snapshots/ subdirectory under metadata_dir. Since a snapshot occupies roughly as much space as the live metadata database itself, it is recommended to redirect snapshots to a separate storage path with adequate capacity.

4.2 Performance Tuning

Here are the key tuning parameters for optimizing Garage's performance:

  • block_size adjustment: Garage splits stored objects into contiguous chunks of block_size (default 1MB). For workloads that primarily deal with large files, increasing this value reduces the number of chunks. For environments with many small files, the default is typically the best choice.
  • Write buffer: The memory region that buffers asynchronous writes to the third node in the quorum write process. The default is 256 MiB, and increasing this value can improve throughput in write-heavy environments.
  • SSD/HDD separation strategy: Placing metadata (metadata_dir) on SSD and data blocks (data_dir) on HDD provides the optimal cost-to-performance ratio. Metadata access speed has a significant impact on Garage's overall response time.
  • Filesystem selection: BTRFS or ZFS is recommended for metadata storage. Their checksum verification, snapshot capabilities, and self-healing features complement LMDB's stability. For data storage, ext4 is perfectly adequate.
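To see what block_size adjustment means in practice, the number of chunks an object produces is simple ceiling division. The helper below is illustrative arithmetic, not part of Garage:

```python
import math


def block_count(object_size: int, block_size: int = 1 << 20) -> int:
    """Number of chunks an object is split into at a given block_size."""
    return max(1, math.ceil(object_size / block_size))


# A 4 GiB object: 4096 chunks at the default 1 MiB, only 410 at 10 MiB.
assert block_count(4 << 30) == 4096
assert block_count(4 << 30, 10 << 20) == 410
```

Fewer chunks mean less per-block metadata and fewer round-trips for large objects, which is why raising block_size helps large-file workloads.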

4.3 Monitoring and Logging

Garage's monitoring and logging can be configured as follows:

  • Log level: Controlled via the RUST_LOG environment variable. Available levels are error, warn, info (default), debug, and trace. Setting RUST_LOG=debug when S3 API calls are not behaving as expected will provide detailed debugging information.
  • Prometheus metrics: Prometheus-formatted metrics can be scraped from the /metrics endpoint on the Admin API. Integration with Grafana dashboards enables visual monitoring of cluster health.
  • OpenTelemetry tracing: OpenTelemetry-compatible traces can be exported to track the path of distributed requests, enabling performance bottleneck analysis through tracing backends such as Jaeger or Zipkin.
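The metrics endpoint returns the standard Prometheus text exposition format. The sketch below parses a small sample payload; the metric name shown is illustrative and may vary between Garage versions, and in a real deployment you would fetch the text from the Admin API's /metrics endpoint (authenticated with metrics_token) rather than use a hard-coded string.

```python
# Sample Prometheus text exposition, as a /metrics scrape might return it.
sample = """\
# HELP api_s3_request_counter Number of S3 API calls per endpoint
api_s3_request_counter{api_endpoint="PutObject"} 42
api_s3_request_counter{api_endpoint="GetObject"} 128
"""


def parse_metrics(text: str) -> dict:
    """Map each metric line (name plus labels) to its numeric value,
    skipping comments and blank lines."""
    metrics = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


m = parse_metrics(sample)
assert m['api_s3_request_counter{api_endpoint="GetObject"}'] == 128.0
```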

5. Garage vs the Competition

5.1 Garage vs MinIO

MinIO has long been the go-to choice for self-hosted S3 storage, but recent licensing changes and open-source rollbacks have sparked controversy. Here are the key differences between the two:

  • Resource requirements: Garage is lightweight enough to run on a Raspberry Pi, whereas MinIO requires mid-to-high-end hardware for adequate performance.
  • Performance: In raw throughput benchmarks, MinIO has the edge. However, Garage offers superior resource efficiency -- better bang for your buck.
  • Geo-distribution: This is Garage's biggest differentiator. Setting up multi-site deployment in MinIO requires complex architectural design, while Garage supports geo-distribution as a core design principle out of the box.
  • Licensing: Both use AGPLv3, but MinIO has moved to a dual-licensing model that requires a commercial license, and has recently been scaling back features in the open-source version.

5.2 Garage vs Ceph RGW

Ceph is the most mature and feature-rich distributed storage system available, but that comes with proportional complexity:

  • Complexity: Ceph requires understanding numerous components and concepts -- OSD, MON, MDS, CRUSH algorithm, and more. It demands specialized expertise and has a steep learning curve. Garage can be configured with a single binary and one configuration file.
  • Storage efficiency: Ceph supports erasure coding, which can protect data with less overhead than replication. Garage currently supports simple replication only, so 3x replication requires three times the storage capacity.
  • Multi-tenancy: Ceph RGW provides enterprise-grade multi-tenancy with namespace isolation, resource quotas, and granular access controls. Garage uses a simpler per-bucket, per-key permission model.
  • Scope: Ceph is a unified solution offering object, block, and file storage, whereas Garage focuses exclusively on object storage.

5.3 Comparison Summary Table

| Criteria | Garage | MinIO | Ceph RGW |
|---|---|---|---|
| Language | Rust | Go | C++ |
| License | AGPLv3 | AGPLv3 + Commercial | LGPL (fully open source) |
| Deployment complexity | Very low (single binary) | Low to moderate | High |
| S3 compatibility | Core features (expanding) | Most complete | Extensive |
| Erasure coding | Not supported (3x replication) | Supported | Supported (flexible K+M) |
| Geo-distribution | Core feature | Requires complex setup | Supported (complex) |
| Resource requirements | Very low | Medium to high | High |
| Static site hosting | Built-in | Not supported | Not supported |
| Best suited for | Small-to-medium self-hosting, geo-distribution | High-performance single cluster | Large-scale unified storage |

6. Practical Use Cases and Integrations

Since Garage provides an S3-compatible API, it integrates naturally with most applications that support S3. Here are the most common real-world scenarios:

  • Nextcloud external storage: Through Nextcloud's "External Storage" app, Garage can be connected as an S3-compatible backend. This stores user files on the Garage cluster instead of the Nextcloud server's local disk, providing both storage scalability and data resilience.
  • Mastodon/Misskey media storage: Stores media files (images, videos, etc.) uploaded by users on ActivityPub-based social media servers in Garage. Mastodon provides S3-compatible storage configuration by default, making integration straightforward.
  • Static website hosting: A unique Garage feature that lets you serve bucket contents as a website directly, without a separate Nginx or Apache server. Ideal for deploying build output from static site generators like Hugo, Jekyll, or Astro.
  • PeerTube video storage: Can be used as the video file storage backend for PeerTube, the decentralized video hosting platform.
  • Backup storage: Integrates with backup tools such as rclone, restic, and Duplicity to serve as a remote backup destination. Thanks to geo-distributed replication, Garage is well-suited as a core component of any disaster recovery (DR) strategy.
  • PowerDNS backend: There are reported cases of using Garage as a data backend for the PowerDNS DNS server.
  • GitLab/Gitea artifact storage: Build artifacts, container image layers, and other outputs generated by CI/CD pipelines can be stored in Garage, reducing the local disk burden on Git hosting servers.
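For the static website hosting scenario, the mapping from domain to bucket follows a simple convention: a Host header ending in the configured root_domain selects the bucket named by the prefix, while any other domain is treated as a custom domain whose bucket is named after the full domain. The hypothetical Python helper below mirrors that convention (the root_domain value is taken from the sample configuration earlier; this is an illustration, not Garage's actual routing code):

```python
def bucket_for_host(host: str, root_domain: str = ".web.garage.localhost") -> str:
    """Derive the bucket name from an HTTP Host header, mirroring how
    Garage's web endpoint maps domains to buckets. A host outside
    root_domain is treated as a custom domain, i.e. the bucket carries
    the full domain name."""
    host = host.split(":")[0]  # drop any port suffix
    if host.endswith(root_domain):
        return host[: -len(root_domain)]
    return host


assert bucket_for_host("my-site.web.garage.localhost:3903") == "my-site"
assert bucket_for_host("blog.example.com") == "blog.example.com"
```

Under this scheme, serving a site at blog.example.com simply means creating a bucket named blog.example.com and pointing the domain's DNS at the cluster.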

Conclusion: When to Choose Garage

Garage is not a one-size-fits-all solution. However, under certain conditions, it can be a better choice than any other alternative. Garage shines in the following scenarios: when you want to minimize operational complexity in a small-to-medium self-hosting environment, when you need to unify machines distributed across multiple physical locations into a single storage cluster, when you want to make use of heterogeneous hardware (old PCs, Raspberry Pis, NAS devices, etc.), and when you need simple, robust object storage with zero external dependencies.

That said, Garage's current limitations are clear. It does not yet support object versioning, and the lack of erasure coding means that 3x replication is storage-inefficient. It does not provide a complete S3 API implementation, so compatibility issues may arise with workflows that rely on advanced S3 features. Moreover, its raw performance falls short of MinIO, making it unsuitable for high-throughput AI/ML pipelines or large-scale data analytics workloads.

Ultimately, choosing a storage solution is not a matter of comparing feature checklists alone. You must weigh workload characteristics, failure models, and operational constraints holistically. If you need large-scale enterprise unified storage, go with Ceph. If you need maximum performance from a single cluster, choose MinIO (or alternatives like RustFS, SeaweedFS). If you need simple, robust, geo-distributed storage for a small-scale self-hosting environment, choose Garage. As a "small but mighty" storage solution, Garage is steadily carving out its own distinct place in the self-hosting ecosystem.