ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites you care about offline. It helps you combat link rot and the degradation of online content by creating a personal, permanent archive where you retain full control over your data.
You can feed it URLs from a wide variety of sources.
For each link, it saves a comprehensive snapshot in multiple redundant formats to ensure long-term accessibility. It captures the page as HTML, a single-file archive, a PDF, and a screenshot. It also extracts key content like article text, clones git repositories, and downloads media from sites like YouTube. All archived data is stored in standard, open formats, making it easy to access and browse.
Manage your collection through a user-friendly web interface, a powerful command-line tool, or programmatically via its Python API. The goal is to ensure the parts of the internet important to you are preserved in durable formats, safe from future disappearance.
services:
  archivebox:
    image: archivebox/archivebox:master
    command: server --quick-init 0.0.0.0:8000
    ports:
      - 8000:8000
    environment:
      - ALLOWED_HOSTS=*
      - MEDIA_MAX_SIZE=750m
      - SEARCH_BACKEND_ENGINE=sonic
      - SEARCH_BACKEND_HOST_NAME=sonic
      - SEARCH_BACKEND_PASSWORD=${SONIC_PASSWORD}
      - ADMIN_USERNAME=${ADMIN_USERNAME}
      - ADMIN_PASSWORD=${ADMIN_PASSWORD}
      - SECRET_KEY=${SECRET_KEY}
      - PUID=1000
      - PGID=1000
    volumes:
      - ./data:/data
    depends_on:
      - sonic
    networks:
      - archivebox_network
  archivebox_scheduler:
    image: archivebox/archivebox:master
    command: schedule --foreground
    environment:
      - ALLOWED_HOSTS=*
      - MEDIA_MAX_SIZE=750m
      - SEARCH_BACKEND_ENGINE=sonic
      - SEARCH_BACKEND_HOST_NAME=sonic
      - SEARCH_BACKEND_PASSWORD=${SONIC_PASSWORD}
      - SECRET_KEY=${SECRET_KEY}
      - PUID=1000
      - PGID=1000
    volumes:
      - ./data:/data
    depends_on:
      - sonic
    networks:
      - archivebox_network
  sonic:
    image: valeriansaliou/sonic:v1.4.9
    expose:
      - 1491
    environment:
      - SEARCH_BACKEND_PASSWORD=${SONIC_PASSWORD}
    volumes:
      - ./etc/sonic.cfg:/etc/sonic.cfg
      - ./data/sonic:/var/lib/sonic/store
    networks:
      - archivebox_network
networks:
  archivebox_network:
    driver: bridge

SONIC_PASSWORD=YourSecretSonicPassword123
ADMIN_USERNAME=admin
ADMIN_PASSWORD=YourSecureAdminPassword
SECRET_KEY=YourRandomSecretKeyHere
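
The password and key values above are placeholders and should be replaced with random values before deployment. One possible way to do that is sketched below in Python. It is not part of ArchiveBox: the helper names (referenced_vars, build_env, render_env) are hypothetical, the "admin" username is simply carried over from the example, and the regular expression only recognises plain ${NAME} references, not forms such as ${NAME:-default}.

```python
import re
import secrets

# The ${...} references as they appear in the compose file above.
COMPOSE_SNIPPET = """
- SEARCH_BACKEND_PASSWORD=${SONIC_PASSWORD}
- ADMIN_USERNAME=${ADMIN_USERNAME}
- ADMIN_PASSWORD=${ADMIN_PASSWORD}
- SECRET_KEY=${SECRET_KEY}
"""


def referenced_vars(compose_text: str) -> list[str]:
    """Return the distinct ${NAME} references, in order of first appearance."""
    seen: list[str] = []
    for name in re.findall(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", compose_text):
        if name not in seen:
            seen.append(name)
    return seen


def build_env(names: list[str], admin_username: str = "admin",
              nbytes: int = 32) -> dict[str, str]:
    """Assign a random URL-safe token to every name except ADMIN_USERNAME."""
    env: dict[str, str] = {}
    for name in names:
        if name == "ADMIN_USERNAME":
            env[name] = admin_username  # assumption: keep the example username
        else:
            env[name] = secrets.token_urlsafe(nbytes)
    return env


def render_env(env: dict[str, str]) -> str:
    """Render KEY=VALUE lines; URL-safe tokens need no quoting."""
    return "".join(f"{key}={value}\n" for key, value in env.items())


if __name__ == "__main__":
    names = referenced_vars(COMPOSE_SNIPPET)
    print(render_env(build_env(names)), end="")
```

Redirecting the printed output into a file named .env next to the compose file would let Docker Compose substitute the values into the ${...} references. Keep that file out of version control, since it holds the admin password and secret key.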