ArchiveBox

A self-hosted solution to collect and save web pages offline. Import from browser history or bookmarks and save as HTML, PDF, media, and more.

ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites you care about offline. It helps you combat link rot and the degradation of online content by creating a personal, permanent archive where you retain full control over your data.

You can feed it URLs from a wide variety of sources, including:

  • Browser history and bookmarks
  • Link-saving services like Pocket or Pinboard
  • RSS feeds and social media
  • Manual URL lists

For each link, it saves a comprehensive snapshot in multiple redundant formats to ensure long-term accessibility. It captures the page as HTML, a single-file archive, a PDF, and a screenshot. It also extracts key content like article text, clones git repositories, and downloads media from sites like YouTube. All archived data is stored in standard, open formats, making it easy to access and browse.

Manage your collection through a user-friendly web interface, a powerful command-line tool, or programmatically via its Python API. The goal is to ensure the parts of the internet important to you are preserved in durable formats, safe from future disappearance.
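As a rough illustration of the command-line route, the sketch below wraps the `archivebox add` subcommand in a small Python helper. It assumes the Docker Compose setup shown further down, where the service is named `archivebox`, and that `docker compose` is available on the host. The helper names `build_add_command` and `archive_url` are hypothetical and are not part of ArchiveBox.

```python
import subprocess
import sys


def build_add_command(url, depth=0):
    """Build the docker compose command that archives one URL.

    Assumes a compose service named "archivebox" (as in docker-compose.yml).
    depth=0 archives only the page itself; depth=1 also follows its links.
    """
    if depth not in (0, 1):
        raise ValueError("depth must be 0 or 1")
    return [
        "docker", "compose", "run", "--rm",
        "archivebox", "add", f"--depth={depth}", url,
    ]


def archive_url(url, depth=0):
    """Run the command; raises CalledProcessError if the container exits non-zero."""
    return subprocess.run(build_add_command(url, depth), check=True)


if __name__ == "__main__":
    # Usage: python add_urls.py https://example.com https://example.org
    for target in sys.argv[1:]:
        archive_url(target)
```

Run it from the directory that contains docker-compose.yml so that Compose can find the project.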

Directory Structure

archivebox/
  data/
  etc/
    sonic.cfg
  .env
  docker-compose.yml

docker-compose.yml

services:
  archivebox:
    image: archivebox/archivebox:master
    command: server --quick-init 0.0.0.0:8000
    ports:
      - 8000:8000
    environment:
      - ALLOWED_HOSTS=*
      - MEDIA_MAX_SIZE=750m
      - SEARCH_BACKEND_ENGINE=sonic
      - SEARCH_BACKEND_HOST_NAME=sonic
      - SEARCH_BACKEND_PASSWORD=${SONIC_PASSWORD}
      - ADMIN_USERNAME=${ADMIN_USERNAME}
      - ADMIN_PASSWORD=${ADMIN_PASSWORD}
      - SECRET_KEY=${SECRET_KEY}
      - PUID=1000
      - PGID=1000
    volumes:
      - ./data:/data
    depends_on:
      - sonic
    networks:
      - archivebox_network

  archivebox_scheduler:
    image: archivebox/archivebox:master
    command: schedule --foreground
    environment:
      - ALLOWED_HOSTS=*
      - MEDIA_MAX_SIZE=750m
      - SEARCH_BACKEND_ENGINE=sonic
      - SEARCH_BACKEND_HOST_NAME=sonic
      - SEARCH_BACKEND_PASSWORD=${SONIC_PASSWORD}
      - SECRET_KEY=${SECRET_KEY}
      - PUID=1000
      - PGID=1000
    volumes:
      - ./data:/data
    depends_on:
      - sonic
    networks:
      - archivebox_network

  sonic:
    image: valeriansaliou/sonic:v1.4.9
    expose:
      - 1491
    environment:
      - SEARCH_BACKEND_PASSWORD=${SONIC_PASSWORD}
    volumes:
      - ./etc/sonic.cfg:/etc/sonic.cfg
      - ./data/sonic:/var/lib/sonic/store
    networks:
      - archivebox_network

networks:
  archivebox_network:
    driver: bridge

.env

SONIC_PASSWORD=YourSecretSonicPassword123
ADMIN_USERNAME=admin
ADMIN_PASSWORD=YourSecureAdminPassword
SECRET_KEY=YourRandomSecretKeyHere
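
The values above are placeholders and should be replaced before first start. One possible way to produce random replacements is sketched below with Python's standard `secrets` module. The function names `generate_env_values` and `render_env` are hypothetical, and the token lengths are arbitrary choices, not requirements of ArchiveBox or Sonic.

```python
import secrets


def generate_env_values(admin_username="admin"):
    """Return a dict with random values for the secrets used in .env.

    token_urlsafe() only emits letters, digits, "-" and "_", so the values
    contain no "$", quotes, or spaces that could confuse .env parsing.
    """
    return {
        "SONIC_PASSWORD": secrets.token_urlsafe(24),
        "ADMIN_USERNAME": admin_username,
        "ADMIN_PASSWORD": secrets.token_urlsafe(24),
        "SECRET_KEY": secrets.token_urlsafe(50),
    }


def render_env(values):
    """Format the dict as KEY=value lines, one per line."""
    return "\n".join(f"{key}={value}" for key, value in values.items()) + "\n"


if __name__ == "__main__":
    # Print to stdout; redirect into .env only after reviewing the output.
    print(render_env(generate_env_values()), end="")
```

Keep the generated .env out of version control, since it holds the admin password and the secret key.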