Commit Graph

4967 Commits

Author SHA1 Message Date
Nick Sweeting
311e4340ec Fix add CLI input handling and lint regressions 2026-03-15 19:04:13 -07:00
Nick Sweeting
5f0cfe5251 add new persona tests 2026-03-15 18:46:45 -07:00
Nick Sweeting
934e02695b fix lint 2026-03-15 18:45:29 -07:00
Nick Sweeting
f97725d16f Mark version as 0.9.10rc0 (pre-release) per PEP 440
Uses rc suffix so docs system correctly identifies 0.9.x as pre-release.
Remove the rc suffix when ready to declare 0.9.x stable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 18:44:31 -07:00
Nick Sweeting
70c9358cf9 Improve scheduling, runtime paths, and API behavior 2026-03-15 18:31:56 -07:00
Nick Sweeting
7d42c6c8b5 bump versions and fix docs 2026-03-15 17:43:07 -07:00
Nick Sweeting
e598614b05 Avoid filesystem lookups in snapshot admin list 2026-03-15 17:18:53 -07:00
Nick Sweeting
21a0a27091 Remove 7 dead functions and 4 unused imports from hooks.py
Dead functions: extract_step, run_hooks, is_parser_plugin,
get_all_plugin_icons, discover_plugin_templates, find_binary_for_cmd,
create_model_record, get_parser_plugins

Dead imports: re, signal, subprocess, django.utils.timezone

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 16:34:20 -07:00
Nick Sweeting
002de811e2 bump dep versions 2026-03-15 15:28:19 -07:00
Nick Sweeting
f0b255914d bump dep versions 2026-03-15 14:57:01 -07:00
Nick Sweeting
0ac83c8799 Wait for crawl hook records before advancing 2026-03-15 14:15:04 -07:00
Nick Sweeting
1d16038ceb Relax archive output readiness check 2026-03-15 13:31:05 -07:00
Nick Sweeting
2585ef5870 Use npm package for readability extractor installs 2026-03-15 13:09:18 -07:00
Nick Sweeting
957387fd88 Fix plugin hook env and extractor retries 2026-03-15 12:39:27 -07:00
Nick Sweeting
1fc860e901 Remove legacy binary override coercion 2026-03-15 11:45:04 -07:00
Nick Sweeting
f92ca93ae9 Skip puppeteer browser download during package install 2026-03-15 11:39:43 -07:00
Nick Sweeting
7c55259ed0 Update title HTML test for search export 2026-03-15 11:17:58 -07:00
Nick Sweeting
86fdc3be1e Refresh worker config from resolved plugin installs 2026-03-15 11:07:55 -07:00
Nick Sweeting
47f540c094 Resolve crawl provider dependencies lazily 2026-03-15 10:18:49 -07:00
Nick Sweeting
d4be507a6b Keep provider plugins enabled under whitelists 2026-03-15 09:49:45 -07:00
Nick Sweeting
82bfd7e655 Filter binary hooks by allowed providers 2026-03-15 09:32:32 -07:00
Nick Sweeting
941135d6d0 Bound URL fixture archive wait 2026-03-15 09:07:25 -07:00
Nick Sweeting
50901e5367 Align worker config propagation expectations 2026-03-15 08:47:00 -07:00
Nick Sweeting
9ef8a1b23a Stabilize secret-backed plugin CI 2026-03-15 08:37:41 -07:00
Nick Sweeting
31e883ec53 Stabilize plugin and crawl integration tests 2026-03-15 08:16:52 -07:00
Nick Sweeting
bfc1e76ff5 Update extractor tests for plugin output dirs 2026-03-15 07:32:11 -07:00
Nick Sweeting
b62064f63e Avoid recursive crawl timeout regressions 2026-03-15 07:09:15 -07:00
Nick Sweeting
5fb3709281 Run recursive crawl tests to completion 2026-03-15 06:55:35 -07:00
Nick Sweeting
68b9f75dab Stabilize recursive crawl CI coverage 2026-03-15 06:49:40 -07:00
Nick Sweeting
760cf9d6b2 Stabilize CI against expanded plugin surface 2026-03-15 06:31:41 -07:00
Nick Sweeting
1f792d7199 Restore CLI compat and plugin dependency handling 2026-03-15 06:06:18 -07:00
Nick Sweeting
6b482c62df Restore top-level list command compatibility 2026-03-15 05:04:31 -07:00
Nick Sweeting
c4d30a853f Restore index-only snapshot output links 2026-03-15 04:58:46 -07:00
Nick Sweeting
cc3e72b92f Preserve tags for index-only adds 2026-03-15 04:54:55 -07:00
Nick Sweeting
58f801c220 Fix update orphan import and host-aware tests 2026-03-15 04:51:06 -07:00
Nick Sweeting
ea94029759 Update deprecated GitHub Pages actions 2026-03-15 04:39:26 -07:00
Nick Sweeting
4fa701fafe Update abx dependencies and plugin test harness 2026-03-15 04:37:32 -07:00
Nick Sweeting
ecb1764590 switch to external plugins 2026-03-15 03:46:23 -07:00
Nick Sweeting
07dc880d0b Harden AddView config overrides to admin-only 2026-03-15 03:45:57 -07:00
Nick Sweeting
4cd50dbc58 Add Debian and Homebrew package build automation (#1771)
## Summary

This PR adds automated build and release workflows for Debian (.deb) and
Homebrew packages, replacing the legacy PPA-based distribution model. It
includes build scripts, packaging configurations, and updated GitHub
Actions workflows to generate and publish packages on release.

## Related issues

Improves package distribution and installation experience for Linux and
macOS users.

## Changes these areas

- [x] Internal architecture
- [x] Feature behavior (installation methods)

## Details

### New Build Infrastructure

**Debian Packages:**
- Added `bin/build_deb.sh` - Builds .deb packages using nfpm with
support for multiple architectures (amd64, arm64)
- Added `pkg/debian/nfpm.yaml` - nfpm configuration defining package
metadata, dependencies, and contents
- Added `pkg/debian/archivebox` - Wrapper script that activates the
virtualenv and runs archivebox CLI
- Added `pkg/debian/install.sh` - Post-install setup script that creates
virtualenv and installs dependencies
- Added `pkg/debian/archivebox.service` - Systemd service file for
running archivebox as a service
- Added `pkg/debian/scripts/postinstall.sh` and `preremove.sh` - Package
lifecycle scripts
- Added `bin/release_deb.sh` - Uploads built .deb packages to GitHub
Releases

**Homebrew Packages:**
- Added `bin/build_brew.sh` - Generates Homebrew formula using
homebrew-pypi-poet, auto-populating dependencies and checksums
- Added `brew_dist/archivebox.rb` - Homebrew formula template with
virtualenv support and service definition
- Added `bin/release_brew.sh` - Pushes updated formula to the
homebrew-archivebox tap repository

### Updated Workflows

**`.github/workflows/debian.yml`:**
- Changed trigger from `push` to `release: [published]` for
release-based builds
- Updated to Ubuntu 24.04 and added multi-architecture build matrix
(amd64, arm64)
- Replaced legacy stdeb-based build with nfpm
- Added artifact upload and GitHub Release integration
- Added test job that installs and verifies the .deb package

**`.github/workflows/homebrew.yml`:**
- Changed trigger from `push` to `release: [published]`
- Replaced bottle building with formula generation using
homebrew-pypi-poet
- Added artifact upload and automatic push to homebrew-archivebox tap
- Added test job that installs and verifies the formula

### Updated Installation Instructions

- Modified `bin/setup.sh` to download and install from GitHub Releases
.deb instead of PPA
- Updated `README.md` to document the new .deb installation method
- Updated `bin/build.sh` and `bin/release.sh` to include new build and
release scripts

## Test Plan

- GitHub Actions workflows will test package builds on release events
- Debian workflow includes automated test job that installs the .deb and
verifies `archivebox version`
- Homebrew workflow includes automated test job that installs the
formula and verifies the CLI
- Manual testing can verify installation via: `sudo apt install
./dist/archivebox*.deb` or `brew install archivebox` (after tap setup)

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/archivebox/archivebox/pull/1771"
target="_blank">
  <picture>
<source media="(prefers-color-scheme: dark)"
srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img
src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1"
alt="Open with Devin">
  </picture>
</a>
<!-- devin-review-badge-end -->

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds automated Debian (.deb) and Homebrew packaging with CI + release
orchestration, replacing the PPA and making installs via `apt` and
`brew` straightforward on Linux and macOS. Packages are thin wrappers
around pip to reduce dependencies; also fixes Docker release tags when
run via the orchestrator.

- **New Features**
- Debian `.deb` via `nfpm` (amd64/arm64): venv at
`/opt/archivebox/venv`, `/usr/bin/archivebox` wrapper (exports PATH)
used by the `systemd` service (no `--setup`); postinstall creates the
`archivebox` user, pins to the package version, enforces Python ≥ 3.13
in `install.sh`, and restarts the service after upgrades. Package
depends only on `python3`/`pip`/`venv`, recommends `git`/`curl`/`wget`.
CI pre-seeds the venv with a local wheel, runs init/status/add on native
`amd64`/`arm64`, and uploads to Releases only after tests pass; release
upload now fails loudly if the Release is missing when triggered by a
release event (skips gracefully on manual runs). `preremove.sh` stops
the service during upgrades to avoid stale venv binaries, and only
disables/removes the venv on full removal.
- Homebrew formula auto-generated with `homebrew-pypi-poet`: virtualenv
install with a single dependency on `python@3.13`, a `post_install` that
initializes in `var/archivebox` with `DATA_DIR` set, and a `caveats`
message showing the data directory. CI generates from a local sdist on
push and from PyPI on release, tests on macOS and Linuxbrew, then pushes
to the `homebrew-archivebox` tap.
- GitHub Actions: `release.yml` orchestrates `pip` → `.deb`/brew →
docker; Docker workflow treats `workflow_call` as a release so
semver/latest tags are published. Tests hardened (e.g., `tests/out`),
README `.deb` install detects arch via `dpkg` and fetches the latest
version via the GitHub API.

- **Migration**
- Ubuntu/Debian: install from Releases (`sudo apt install
/tmp/archivebox.deb`), replacing the PPA.
- macOS/Linuxbrew: `brew tap archivebox/archivebox && brew install
archivebox`.

<sup>Written for commit 36b4055304.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-03-14 22:43:27 -07:00
Nick Sweeting
c64e14c656 Merge branch 'dev' into claude/restore-package-managers-GlgBJ 2026-03-14 22:43:15 -07:00
Claude
36b4055304 Add caveats block to Homebrew formula showing data directory
The post_install initializes var/archivebox as the data directory,
but users need to know where it is for subsequent commands. The
caveats block is shown after brew install/upgrade to guide users.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:56:25 +00:00
Claude
2eee9d95a3 Fix postinstall.sh: restart service after upgrade
preremove.sh stops the service during upgrades (to avoid stale venv
binaries), but postinstall.sh wasn't restarting it. Now postinstall
checks if the service was enabled and restarts it after the venv is
rebuilt, so upgrades don't leave a previously running service down.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:52:32 +00:00
Claude
fbde2dee03 Fix remaining PR review comments across packaging files
- preremove.sh: Stop the systemd service during upgrades (not just
  remove/purge) so the running process doesn't use stale venv binaries
  while postinstall replaces them. Only disable + remove venv on full
  removal.

- debian.yml: Fail loudly when release lookup fails during a release
  event (exit 1), but still skip gracefully for workflow_dispatch
  manual testing (exit 0). Prevents silently broken .deb publication.

- archivebox.rb + build_brew.sh + homebrew.yml: Add post_install that
  initializes ArchiveBox in var/archivebox with DATA_DIR set, using
  correct Homebrew system() syntax (no env hash as first arg).

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:42:01 +00:00
Claude
c319b417c3 Fix Docker workflow missing semver/latest tags when called from release.yml
docker.yml uses github.event_name to decide between full release tags
(semver, latest) vs CI-only tags (branch, sha). When release.yml calls
it via `uses:`, the child sees event_name='workflow_call' which was
hitting the non-release path. Now workflow_call is treated the same as
workflow_dispatch so published releases get proper Docker tags.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:32:15 +00:00
Claude
7300892b08 Fix PR review comments: version check, runtime deps, CI test deps, release guard
- install.sh: Replace awk float comparison with integer major/minor check
  so Python 3.9 is correctly rejected as < 3.13
- nfpm.yaml: Add git/curl/wget as recommends so users have basic archiving
  tools out of the box without needing `archivebox install`
- debian.yml: Restore git/curl/wget in CI smoke test to match what the
  package recommends, instead of relying on runner preinstalls
- debian.yml: Guard release upload to skip gracefully when no GitHub
  Release exists (fixes workflow_dispatch failures)

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:28:46 +00:00
Claude
3501b393eb Deduplicate homebrew.yml release job by reusing build_brew.sh
The release job was duplicating ~70 lines of PyPI fetching + formula
generation that build_brew.sh already handles. Now it just calls
./bin/build_brew.sh. Also merged the two Linux-only brew setup steps.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:15:23 +00:00
Claude
82932812ae Simplify deb/brew packages to thin wrappers around pip install
Remove all non-essential dependencies from both package formats.
The .deb now only depends on python3, pip, and venv. The brew
formula only depends on python@3.13. All other runtime deps
(node, chrome, yt-dlp, wget, ripgrep, etc.) are installed
on-demand by `archivebox install` at runtime.

Removed from brew formula:
- 6 depends_on entries (node, git, wget, curl, ripgrep, yt-dlp)
- on_linux block (pkg-config, openssl, libffi)
- post_install hook (was running archivebox install)
- service block (users run archivebox server directly)

Removed from .deb:
- 6 depends entries (nodejs, npm, git, wget, curl, ripgrep)
- recommends section (yt-dlp, ffmpeg, chromium)

Net: -126 lines across packaging files.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:56:39 +00:00
Claude
0fac8a7346 Fix remaining PR review comments from all review rounds
- systemd service: use /usr/bin/archivebox wrapper (exports venv PATH
  for bundled tools like yt-dlp) instead of direct venv binary
- install.sh: prefer python3.13, fail early with clear error if < 3.13,
  add comment clarifying the manual (unpinned) fallback behavior
- debian.yml release job: fall back to pyproject.toml version when
  github.event.release.tag_name is empty (workflow_dispatch path)
- nfpm.yaml: clarify that install.sh enforces the real >= 3.13 constraint
- CI pre-seed step: expanded comment explaining why we pre-seed with
  python3.13 and how it relates to real installs

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:44:49 +00:00
Claude
68fea71933 Address remaining PR review comments
- Pin cache-apt-pkgs-action to commit SHA for supply-chain safety
- Fix Homebrew post_install to use with_env block instead of env hash
  in system() call (idiomatic Homebrew pattern)
- Add clarifying comments to service file, preremove.sh, and nfpm.yaml
  explaining user/group creation, directory ownership, and upgrade handling

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:39:33 +00:00