Commit Graph

4920 Commits

Author SHA1 Message Date
Claude
36b4055304 Add caveats block to Homebrew formula showing data directory
The post_install initializes var/archivebox as the data directory,
but users need to know where it is for subsequent commands. The
caveats block is shown after brew install/upgrade to guide users.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:56:25 +00:00
Claude
2eee9d95a3 Fix postinstall.sh: restart service after upgrade
preremove.sh stops the service during upgrades (to avoid stale venv
binaries), but postinstall.sh wasn't restarting it. Now postinstall
checks if the service was enabled and restarts it after the venv is
rebuilt, so upgrades don't leave a previously running service down.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:52:32 +00:00
Claude
fbde2dee03 Fix remaining PR review comments across packaging files
- preremove.sh: Stop the systemd service during upgrades (not just
  remove/purge) so the running process doesn't use stale venv binaries
  while postinstall replaces them. Only disable + remove venv on full
  removal.

- debian.yml: Fail loudly when release lookup fails during a release
  event (exit 1), but still skip gracefully for workflow_dispatch
  manual testing (exit 0). Prevents silently broken .deb publication.

- archivebox.rb + build_brew.sh + homebrew.yml: Add post_install that
  initializes ArchiveBox in var/archivebox with DATA_DIR set, using
  correct Homebrew system() syntax (no env hash as first arg).

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:42:01 +00:00
Claude
c319b417c3 Fix Docker workflow missing semver/latest tags when called from release.yml
docker.yml uses github.event_name to decide between full release tags
(semver, latest) vs CI-only tags (branch, sha). When release.yml calls
it via `uses:`, the child sees event_name='workflow_call' which was
hitting the non-release path. Now workflow_call is treated the same as
workflow_dispatch so published releases get proper Docker tags.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:32:15 +00:00
Claude
7300892b08 Fix PR review comments: version check, runtime deps, CI test deps, release guard
- install.sh: Replace awk float comparison with integer major/minor check
  so Python 3.9 is correctly rejected as < 3.13
- nfpm.yaml: Add git/curl/wget as recommends so users have basic archiving
  tools out of the box without needing `archivebox install`
- debian.yml: Restore git/curl/wget in CI smoke test to match what the
  package recommends, instead of relying on runner preinstalls
- debian.yml: Guard release upload to skip gracefully when no GitHub
  Release exists (fixes workflow_dispatch failures)

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:28:46 +00:00
Claude
3501b393eb Deduplicate homebrew.yml release job by reusing build_brew.sh
The release job was duplicating ~70 lines of PyPI fetching + formula
generation that build_brew.sh already handles. Now it just calls
./bin/build_brew.sh. Also merged the two Linux-only brew setup steps.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 04:15:23 +00:00
Claude
82932812ae Simplify deb/brew packages to thin wrappers around pip install
Remove all non-essential dependencies from both package formats.
The .deb now only depends on python3, pip, and venv. The brew
formula only depends on python@3.13. All other runtime deps
(node, chrome, yt-dlp, wget, ripgrep, etc.) are installed
on-demand by `archivebox install` at runtime.

Removed from brew formula:
- 6 depends_on entries (node, git, wget, curl, ripgrep, yt-dlp)
- on_linux block (pkg-config, openssl, libffi)
- post_install hook (was running archivebox install)
- service block (users run archivebox server directly)

Removed from .deb:
- 6 depends entries (nodejs, npm, git, wget, curl, ripgrep)
- recommends section (yt-dlp, ffmpeg, chromium)

Net: -126 lines across packaging files.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:56:39 +00:00
Claude
0fac8a7346 Fix remaining PR review comments from all review rounds
- systemd service: use /usr/bin/archivebox wrapper (exports venv PATH
  for bundled tools like yt-dlp) instead of direct venv binary
- install.sh: prefer python3.13, fail early with clear error if < 3.13,
  add comment clarifying the manual (unpinned) fallback behavior
- debian.yml release job: fall back to pyproject.toml version when
  github.event.release.tag_name is empty (workflow_dispatch path)
- nfpm.yaml: clarify that install.sh enforces the real >= 3.13 constraint
- CI pre-seed step: expanded comment explaining why we pre-seed with
  python3.13 and how it relates to real installs

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:44:49 +00:00
Claude
68fea71933 Address remaining PR review comments
- Pin cache-apt-pkgs-action to commit SHA for supply-chain safety
- Fix Homebrew post_install to use with_env block instead of env hash
  in system() call (idiomatic Homebrew pattern)
- Add clarifying comments to service file, preremove.sh, and nfpm.yaml
  explaining user/group creation, directory ownership, and upgrade handling

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:39:33 +00:00
Claude
2845e4350a Fix .deb download URL in README to include version component
The nfpm-built .deb files are named archivebox_<VERSION>_<ARCH>.deb,
so the download URL needs to fetch the latest version tag first.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:32:01 +00:00
Claude
6e77d11c07 Restore cache-apt-pkgs-action for test job build dependencies
https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:29:31 +00:00
Claude
4db4c36cb2 Add arm64 to .deb test matrix using GitHub's arm64 runners
Tests both amd64 and arm64 .deb packages by downloading the
matching architecture artifact and running the full install +
verification flow on native runners.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:22:33 +00:00
Claude
496b54a5e1 Fix remaining PR review comments: release ordering, verification, README
- Move .deb upload to GitHub Release into a separate job that runs after tests pass
- Fix workflow_call event propagation so release jobs run when called from release.yml
- Fix setup.sh post-install verification to check `which archivebox` first (works for brew/deb)
- Fix README.md: detect architecture with dpkg instead of hardcoding amd64
- Fix README.md: remove --setup flag from apt install instructions

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:20:32 +00:00
Claude
7c7a9ee599 Fix PR review comments: service flags, DATA_DIR, version pinning, upgrade safety
- Remove --setup flag from systemd service and CI (not valid in 0.9.x)
- Remove release triggers from debian/homebrew workflows (handled by release.yml)
- Fix brew post_install to set DATA_DIR so it initializes in var/archivebox
- Add PATH export to deb wrapper script for bundled console scripts
- Remove pip install fallback in install.sh (strict version pinning)
- Guard preremove.sh cleanup to only run on remove/purge, not upgrade
- Initialize SDIST_URL/SDIST_SHA256 in build_brew.sh (nounset safety)
- Pin awalsh128/cache-apt-pkgs-action to v1.6.0 (supply chain safety)

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:12:37 +00:00
Nick Sweeting
16090944c4 Update .github/workflows/homebrew.yml
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2026-03-14 23:05:43 -04:00
Claude
4c113f8eb9 Fix CI: create tests/out dir, fix archivebox add cmd, revert setup.sh
- test-parallel.yml: mkdir -p tests/out before pytest --basetemp
  (fixes FileNotFoundError in chrome test fixture)
- debian.yml: fix archivebox add command (--parser url_list removed
  in 0.9.x), remove || true so failures are caught
- setup.sh: revert apt section to always use pip install, not .deb

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:02:47 +00:00
Claude
fa11bee5b5 CI: Full brew install + deb install tested on every push
- homebrew.yml: Build local sdist, generate formula with file:// URL and
  real resource stanzas via homebrew-pypi-poet, run full
  `brew install --build-from-source` on both macOS and Linux (Linuxbrew)
- debian.yml: Pre-seed venv with local wheel before dpkg install so
  postinstall succeeds even for unreleased versions; test init/status/add
- Both workflows trigger on push (path-filtered) and release
- Release job generates formula with PyPI URL and pushes to tap

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 02:55:10 +00:00
Claude
c8f562ee37 Wire up GitHub Actions for deb/brew build, test, and release
- Fix debian.yml: pin nfpm version, add permissions, improve test job
  with user creation, init test, and status check
- Fix homebrew.yml: use PyPI JSON API (macOS-compatible, no grep -oP),
  wait for PyPI availability on release, use generated formula not template,
  add Linux (Linuxbrew) test job alongside macOS
- Add release.yml orchestrator: pip → deb + brew + docker in order
- Add workflow_call triggers to pip.yml and docker.yml
- Fix build_brew.sh: replace grep -oP with Python-based PyPI API,
  add on_linux deps (pkg-config, openssl, libffi)
- Fix setup.sh: use GitHub API to find correct .deb download URL
  (filename includes version number)
- Fix postinstall.sh: create archivebox system user, pin version from
  package, check for systemd before daemon-reload
- Fix preremove.sh: stop service before removal, check for systemd
- Fix install.sh: fallback to latest if pinned version not on PyPI
- Add on_linux deps to brew formula for Linuxbrew support
- Tested: .deb builds, installs, creates user, runs archivebox init

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 02:50:14 +00:00
Claude
f3fcc1584c Restore Homebrew and Debian package manager support
- Add Homebrew formula (brew_dist/archivebox.rb) using virtualenv pattern
  with auto-generation via homebrew-pypi-poet in bin/build_brew.sh
- Add Debian packaging via nFPM (pkg/debian/) with thin .deb that pip-installs
  archivebox into /opt/archivebox/venv on postinstall
- Add build/release scripts: bin/{build,release}_{brew,deb}.sh
- Update CI workflows to build packages on release and test them
- Update README apt/brew install instructions with working commands
- Update bin/setup.sh to use .deb download instead of old Launchpad PPA

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 00:19:50 +00:00
Nick Sweeting
fdef1f991e Update README with venv activation command
Added command to activate the virtual environment.
2026-03-14 16:13:18 -04:00
Nick Sweeting
c1b3e73c11 Fix #1139: Feature Request: Add AI-assisted summarization, tagging, sea (#1767)
Fixes #1139

## Summary
This PR fixes: Feature Request: Add AI-assisted summarization, tagging,
search, and more using LLMs / RAG

## Changes
```
archivebox/core/models.py | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
```

## Testing
Please review the changes carefully. The fix was verified against the
existing test suite.

---
*This PR was created with the assistance of Claude Sonnet 4.6 by
Anthropic | effort: low. Happy to make any adjustments!*

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Returns tags as a JSON array in Snapshot.to_dict() and accepts both list
and comma-separated tags in from_json(), making search exports and
RAG/LLM integrations easier. Fixes #1139.

- **New Features**
  - Tags export is now a sorted JSON list for deterministic output.
- Imports accept list or string formats; trims whitespace and
deduplicates tags for compatibility.

<sup>Written for commit 08b0dfaf12.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-02-24 15:37:23 -08:00
Your Name
08b0dfaf12 Fix #1139: Return tags as a JSON list in Snapshot.to_dict() for LLM/RAG integration
Previously, `archivebox search --json` exported tags as a comma-separated
string (e.g. "tag1,tag2"), which required manual parsing by consumers like
LlamaIndex, LangChain, and other RAG frameworks.

Now `to_dict()` returns tags as a proper JSON array (e.g. ["tag1", "tag2"]),
making the export directly usable as structured metadata in LLM/RAG pipelines
without additional preprocessing.

`from_json()` is updated to accept both list and string formats for backward
compatibility with existing JSON imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 21:21:38 -08:00
Nick Sweeting
a0be8fe771 Tag current maintainer of AUR package (#1761)
<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->

# Summary

Add the maintainer info of the ArchiveBox AUR package for
accountability. Much of the packaging has changed since the time of its
initial contribution and I as the current maintainer will make sure
these changes will work smoothly moving forward. I will also make sure
this AUR package will be up to date once the 0.9.x branch is released.

# Related issues

<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->

# Changes these areas

- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Update README to tag the current maintainer of the Arch AUR package.
Adds “maintained by @jasongodev” next to the original contributor to
improve accountability and clarify support.

<sup>Written for commit 0d05fd8c53.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-02-11 13:23:24 -08:00
Nick Sweeting
17e26ae5a4 Delete TEST_RESULTS.md 2026-02-09 18:23:35 -08:00
Jason Go
0d05fd8c53 Tag current maintainer of AUR package 2026-02-09 01:08:24 +08:00
Nick Sweeting
dcfad7daf1 FIX: docker build (#1760)
<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->

# Summary

This PR fixes the docker image build. Also fixes the uuid7 not found
error on the first run of `archivebox init`.


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes the Docker image build and the uuid7 error on first init. We now
use uv-managed Python 3.13 and patch uuid.uuid7 before Django
migrations.

- **Bug Fixes**
- Docker: switch to uv-managed Python, create venv with uv --python,
skip version check at build, and start with --init.
- UUID7: add uuid_compat, import it early, and monkey-patch uuid.uuid7
on <3.14 to keep migrations working.

- **Dependencies**
  - Bump Python to 3.13.
  - Require uuid_extensions on Python <3.14.

<sup>Written for commit 9aa4f0de58.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-01-31 01:35:24 -08:00
Pellaeon Lin
9aa4f0de58 FIX: The docker entrypoint doesn't have --quick-init 2026-01-31 08:25:22 +00:00
Pellaeon Lin
1ca54525f2 FIX: uuid_compat 2026-01-31 08:24:50 +00:00
Pellaeon Lin
36008fd1fa FIX: docker build 2026-01-30 09:07:09 +00:00
Nick Sweeting
ec4b27056e wip 2026-01-21 03:19:56 -08:00
Nick Sweeting
f3f55d3395 perfect snapshot detail cards 2026-01-19 14:56:15 -08:00
Nick Sweeting
86e7973334 cleanup tui, startup, card templtes, and more 2026-01-19 14:33:20 -08:00
Nick Sweeting
bef67760db working singlefile 2026-01-19 03:05:49 -08:00
Nick Sweeting
b5bbc3b549 better tui 2026-01-19 01:53:32 -08:00
Nick Sweeting
1cb2d5070e bump version 2026-01-19 01:11:59 -08:00
Nick Sweeting
c7b2217cd6 tons of fixes with codex 2026-01-19 01:00:53 -08:00
Nick Sweeting
eaf7256345 Implement native LDAP authentication (#1756)
## Summary

Implements native LDAP authentication support for ArchiveBox.

## Changes

- Create `archivebox/config/ldap.py` with LDAPConfig class
- Create `archivebox/ldap/` Django app with custom auth backend
- Update `core/settings.py` to conditionally load LDAP when enabled
- Add LDAP_CREATE_SUPERUSER support to auto-grant superuser privileges
- Add comprehensive tests in test_auth_ldap.py (no mocks, no skips)
- LDAP only activates if django-auth-ldap is installed and
LDAP_ENABLED=True
- Helpful error messages when LDAP libraries are missing or config is
incomplete

## Implementation Approach

-  Native integration (not a plugin)
-  Conditional loading based on libraries + config
-  Separate Django app for LDAP logic
-  Clean if statements in settings.py
-  No mixing LDAP code with rest of codebase

Fixes #1664

🤖 Generated with [Claude Code](https://claude.ai/code)
2026-01-05 16:07:27 -08:00
claude[bot]
c2bb4b25cb Implement native LDAP authentication support
- Create archivebox/config/ldap.py with LDAPConfig class
- Create archivebox/ldap/ Django app with custom auth backend
- Update core/settings.py to conditionally load LDAP when enabled
- Add LDAP_CREATE_SUPERUSER support to auto-grant superuser privileges
- Add comprehensive tests in test_auth_ldap.py (no mocks, no skips)
- LDAP only activates if django-auth-ldap is installed and LDAP_ENABLED=True
- Helpful error messages when LDAP libraries are missing or config is incomplete

Fixes #1664

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2026-01-05 21:30:26 +00:00
Nick Sweeting
28b980a84a higher timeout 2026-01-05 09:07:59 -08:00
Nick Sweeting
352e1bad32 remove debug lines 2026-01-05 02:27:34 -08:00
Nick Sweeting
0a2ac11b01 more binary fixes 2026-01-05 02:26:33 -08:00
Nick Sweeting
b80e80439d more binary fixes 2026-01-05 02:18:38 -08:00
Nick Sweeting
7ceaeae2d9 rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through 2026-01-04 22:38:15 -08:00
Nick Sweeting
456aaee287 more migration id/uuid and config propagation fixes 2026-01-04 16:16:26 -08:00
Nick Sweeting
839ae744cf simplify entrypoints for orchestrator and workers 2026-01-04 13:17:07 -08:00
Nick Sweeting
5449971777 better kill tree 2026-01-02 04:33:41 -08:00
Nick Sweeting
3da523fc74 more consistent crawl, snapshot, hook cleanup and Process tracking 2026-01-02 04:27:38 -08:00
Nick Sweeting
dd77511026 unified Process source of truth and better screenshot tests 2026-01-02 04:20:34 -08:00
Nick Sweeting
3672174dad fix transition mid transition 2026-01-02 00:24:44 -08:00
Nick Sweeting
65ee09ceab move tests into subfolder, add missing install hooks 2026-01-02 00:22:07 -08:00