webarchive

v0.1.8
Published: Mar 13, 2025 License: BSD-3-Clause

README

Own Webarchive

A simple, fast, and easy-to-use web archive for personal or home-network use.

Supported store formats

  • headers: save all headers from the response
  • pdf: save the page as a PDF
  • single_file: save the HTML and all its resources (CSS, JS, images) into a single HTML file

Requirements

  • Go 1.19 or higher
  • wkhtmltopdf binary in $PATH (required to save pages as PDF)

Configuration

The service is configured via environment variables. The available variables are:

  • DB
    • DB_PATH: path for the database files (default ./db)
  • LOGGING
    • LOGGING_DEBUG: enable debug logs (default false)
  • API
    • API_ADDRESS: address the API server listens on (default 0.0.0.0:5001)
  • UI
    • UI_ENABLED: enable the built-in web UI (default true)
    • UI_PREFIX: prefix for the web UI (default /)
    • UI_THEME: UI theme name (default basic; no other values are available yet)
  • PDF
    • PDF_LANDSCAPE: use landscape page orientation instead of portrait (default false)
    • PDF_GRAYSCALE: apply a grayscale filter to the output PDF (default false)
    • PDF_MEDIA_PRINT: use the print media type for the request (default true)
    • PDF_ZOOM: page zoom factor (default 1.0, i.e. no zoom)
    • PDF_VIEWPORT: viewport size to use (default 1280x720)
    • PDF_DPI: DPI of the output PDF (default 150)
    • PDF_FILENAME: name of the output PDF file (default page.pdf)

Note: the prefix WEBARCHIVE_ can be added to any of the environment variable names above to avoid conflicts with other software.
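
To illustrate how a setting might be resolved when both the plain and the prefixed name are possible, here is a minimal Go sketch. It is not the project's actual configuration code: the helper name resolveSetting and the precedence order (prefixed name first, then plain name, then default) are assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
)

// envPrefix is the optional prefix mentioned in the note above.
const envPrefix = "WEBARCHIVE_"

// resolveSetting is a hypothetical helper. It checks the prefixed
// variable first, then the plain one, and falls back to def.
// The precedence order is an assumption, not documented behavior.
func resolveSetting(name, def string, lookup func(string) (string, bool)) string {
	if v, ok := lookup(envPrefix + name); ok {
		return v
	}
	if v, ok := lookup(name); ok {
		return v
	}
	return def
}

func main() {
	// os.LookupEnv reads the real process environment.
	fmt.Println("API_ADDRESS =", resolveSetting("API_ADDRESS", "0.0.0.0:5001", os.LookupEnv))
	fmt.Println("DB_PATH     =", resolveSetting("DB_PATH", "./db", os.LookupEnv))
}
```

Passing the lookup function as a parameter keeps the helper easy to exercise without touching the real environment.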

⚡ One-Click Deploy

Deploy buttons are provided for the following cloud providers:

  • AWS
  • DigitalOcean
  • Render

Generated by DeployStack.io

Usage

1. Start the server
Start without Docker
go run ./cmd/server/main.go
Change the API address
API_ADDRESS=127.0.0.1:3001 go run ./cmd/server/main.go
Start in Docker
docker compose up -d webarchive
2. Add a page
curl -X POST --location "http://localhost:5001/api/v1/pages" \
    -H "Content-Type: application/json" \
    -d '{
          "url": "https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1937",
          "formats": [
            "pdf",
            "headers"
          ]
        }' | jq .

or

curl -X POST --location \
  "http://localhost:5001/api/v1/pages?url=https%3A%2F%2Fgithub.com%2Fwkhtmltopdf%2Fwkhtmltopdf%2Fissues%2F1937&formats=pdf%2Cheaders&description=Foo+Bar"
3. Get the page's info
curl -X GET --location "http://localhost:5001/api/v1/pages/$page_id" | jq .

Here $page_id is the value of the id field from the previous command's response. If the status field in the response is success (or with_errors), the results field contains all processed formats along with the ids of the stored files.
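
Steps 2 and 3 can also be driven from Go. The sketch below builds both request variants shown above (JSON body and query parameters) and decodes the fields this README mentions from a page-info response. Several things are assumptions: the function and type names are hypothetical, the id field is assumed to be a JSON string, and the shape of results is left undecoded because it is not documented here. The sample response in main is made up for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// addPageBody mirrors the JSON document sent by the first curl command.
type addPageBody struct {
	URL     string   `json:"url"`
	Formats []string `json:"formats"`
}

// pageInfo holds only the fields named in this README.
// ID is assumed to be a string; Results is kept raw on purpose.
type pageInfo struct {
	ID      string          `json:"id"`
	Status  string          `json:"status"`
	Results json.RawMessage `json:"results"`
}

// newJSONRequest builds the POST request with a JSON body.
func newJSONRequest(base, pageURL string, formats []string) (*http.Request, error) {
	payload, err := json.Marshal(addPageBody{URL: pageURL, Formats: formats})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, base+"/api/v1/pages", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// newQueryRequest builds the equivalent request using query parameters.
func newQueryRequest(base, pageURL string, formats []string, description string) (*http.Request, error) {
	q := url.Values{}
	q.Set("url", pageURL)
	q.Set("formats", strings.Join(formats, ","))
	q.Set("description", description)
	return http.NewRequest(http.MethodPost, base+"/api/v1/pages?"+q.Encode(), nil)
}

// parsePageInfo decodes a page-info response body.
func parsePageInfo(data []byte) (pageInfo, error) {
	var p pageInfo
	err := json.Unmarshal(data, &p)
	return p, err
}

// finished reports whether the results field can be expected to be filled.
func (p pageInfo) finished() bool {
	return p.Status == "success" || p.Status == "with_errors"
}

func main() {
	req, err := newJSONRequest("http://localhost:5001",
		"https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1937", []string{"pdf", "headers"})
	if err != nil {
		panic(err)
	}
	// To actually send it: resp, err := http.DefaultClient.Do(req)
	fmt.Println(req.Method, req.URL)

	// Hypothetical response body, for illustration only.
	info, err := parsePageInfo([]byte(`{"id":"example-id","status":"success","results":[]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(info.ID, info.finished())
}
```

A client would typically repeat the GET from step 3 until finished returns true before reading results.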

4. Open file in browser
xdg-open "http://localhost:5001/api/v1/pages/$page_id/file/$file_id"

Here $page_id is the value of the id field from the previous command's response, and $file_id is the id of the file you want to open.
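
If the file URL is assembled in code rather than in the shell, the two ids should be escaped before they are placed in the path. A minimal Go sketch, assuming a hypothetical helper named fileURL and the endpoint layout shown in the command above:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// fileURL is a hypothetical helper that builds the file endpoint URL.
// Both ids are path-escaped so unusual characters cannot break the path.
func fileURL(base, pageID, fileID string) string {
	return fmt.Sprintf("%s/api/v1/pages/%s/file/%s",
		strings.TrimRight(base, "/"),
		url.PathEscape(pageID),
		url.PathEscape(fileID))
}

func main() {
	// The ids here are placeholders, not real values.
	fmt.Println(fileURL("http://localhost:5001", "PAGE_ID", "FILE_ID"))
}
```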

5. List all stored pages
curl -X GET --location "http://localhost:5001/api/v1/pages" | jq .

Roadmap

  • Save page to pdf
  • Save URL headers
  • Save page to the single-page html
  • Save page to html with separate resource files (?)
  • Basic web UI
  • Optional authentication
  • Multi-user access
  • Support SQL database with or without separate files storage
  • Tags/Categories
  • Save page to markdown

Directories

Path Synopsis
adapters
api
openapi
Code generated by ogen, DO NOT EDIT.
cmd
service command
ports