Commit Graph

4913 Commits

Author SHA1 Message Date
Claude
0fac8a7346 Fix remaining PR review comments from all review rounds
- systemd service: use /usr/bin/archivebox wrapper (exports venv PATH
  for bundled tools like yt-dlp) instead of direct venv binary
- install.sh: prefer python3.13, fail early with clear error if < 3.13,
  add comment clarifying the manual (unpinned) fallback behavior
- debian.yml release job: fall back to pyproject.toml version when
  github.event.release.tag_name is empty (workflow_dispatch path)
- nfpm.yaml: clarify that install.sh enforces the real >= 3.13 constraint
- CI pre-seed step: expanded comment explaining why we pre-seed with
  python3.13 and how it relates to real installs

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:44:49 +00:00
Claude
68fea71933 Address remaining PR review comments
- Pin cache-apt-pkgs-action to commit SHA for supply-chain safety
- Fix Homebrew post_install to use with_env block instead of env hash
  in system() call (idiomatic Homebrew pattern)
- Add clarifying comments to service file, preremove.sh, and nfpm.yaml
  explaining user/group creation, directory ownership, and upgrade handling

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:39:33 +00:00
Claude
2845e4350a Fix .deb download URL in README to include version component
The nfpm-built .deb files are named archivebox_<VERSION>_<ARCH>.deb,
so the download URL needs to fetch the latest version tag first.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:32:01 +00:00
Claude
6e77d11c07 Restore cache-apt-pkgs-action for test job build dependencies
https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:29:31 +00:00
Claude
4db4c36cb2 Add arm64 to .deb test matrix using GitHub's arm64 runners
Tests both amd64 and arm64 .deb packages by downloading the
matching architecture artifact and running the full install +
verification flow on native runners.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:22:33 +00:00
Claude
496b54a5e1 Fix remaining PR review comments: release ordering, verification, README
- Move .deb upload to GitHub Release into a separate job that runs after tests pass
- Fix workflow_call event propagation so release jobs run when called from release.yml
- Fix setup.sh post-install verification to check `which archivebox` first (works for brew/deb)
- Fix README.md: detect architecture with dpkg instead of hardcoding amd64
- Fix README.md: remove --setup flag from apt install instructions

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:20:32 +00:00
Claude
7c7a9ee599 Fix PR review comments: service flags, DATA_DIR, version pinning, upgrade safety
- Remove --setup flag from systemd service and CI (not valid in 0.9.x)
- Remove release triggers from debian/homebrew workflows (handled by release.yml)
- Fix brew post_install to set DATA_DIR so it initializes in var/archivebox
- Add PATH export to deb wrapper script for bundled console scripts
- Remove pip install fallback in install.sh (strict version pinning)
- Guard preremove.sh cleanup to only run on remove/purge, not upgrade
- Initialize SDIST_URL/SDIST_SHA256 in build_brew.sh (nounset safety)
- Pin awalsh128/cache-apt-pkgs-action to v1.6.0 (supply chain safety)

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:12:37 +00:00
Nick Sweeting
16090944c4 Update .github/workflows/homebrew.yml
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2026-03-14 23:05:43 -04:00
Claude
4c113f8eb9 Fix CI: create tests/out dir, fix archivebox add cmd, revert setup.sh
- test-parallel.yml: mkdir -p tests/out before pytest --basetemp
  (fixes FileNotFoundError in chrome test fixture)
- debian.yml: fix archivebox add command (--parser url_list removed
  in 0.9.x), remove || true so failures are caught
- setup.sh: revert apt section to always use pip install, not .deb

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:02:47 +00:00
Claude
fa11bee5b5 CI: Full brew install + deb install tested on every push
- homebrew.yml: Build local sdist, generate formula with file:// URL and
  real resource stanzas via homebrew-pypi-poet, run full
  `brew install --build-from-source` on both macOS and Linux (Linuxbrew)
- debian.yml: Pre-seed venv with local wheel before dpkg install so
  postinstall succeeds even for unreleased versions; test init/status/add
- Both workflows trigger on push (path-filtered) and release
- Release job generates formula with PyPI URL and pushes to tap

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 02:55:10 +00:00
Claude
c8f562ee37 Wire up GitHub Actions for deb/brew build, test, and release
- Fix debian.yml: pin nfpm version, add permissions, improve test job
  with user creation, init test, and status check
- Fix homebrew.yml: use PyPI JSON API (macOS-compatible, no grep -oP),
  wait for PyPI availability on release, use generated formula not template,
  add Linux (Linuxbrew) test job alongside macOS
- Add release.yml orchestrator: pip → deb + brew + docker in order
- Add workflow_call triggers to pip.yml and docker.yml
- Fix build_brew.sh: replace grep -oP with Python-based PyPI API,
  add on_linux deps (pkg-config, openssl, libffi)
- Fix setup.sh: use GitHub API to find correct .deb download URL
  (filename includes version number)
- Fix postinstall.sh: create archivebox system user, pin version from
  package, check for systemd before daemon-reload
- Fix preremove.sh: stop service before removal, check for systemd
- Fix install.sh: fallback to latest if pinned version not on PyPI
- Add on_linux deps to brew formula for Linuxbrew support
- Tested: .deb builds, installs, creates user, runs archivebox init

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 02:50:14 +00:00
Claude
f3fcc1584c Restore Homebrew and Debian package manager support
- Add Homebrew formula (brew_dist/archivebox.rb) using virtualenv pattern
  with auto-generation via homebrew-pypi-poet in bin/build_brew.sh
- Add Debian packaging via nFPM (pkg/debian/) with thin .deb that pip-installs
  archivebox into /opt/archivebox/venv on postinstall
- Add build/release scripts: bin/{build,release}_{brew,deb}.sh
- Update CI workflows to build packages on release and test them
- Update README apt/brew install instructions with working commands
- Update bin/setup.sh to use .deb download instead of old Launchpad PPA

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 00:19:50 +00:00
Nick Sweeting
fdef1f991e Update README with venv activation command
Added command to activate the virtual environment.
2026-03-14 16:13:18 -04:00
Nick Sweeting
c1b3e73c11 Fix #1139: Feature Request: Add AI-assisted summarization, tagging, sea (#1767)
Fixes #1139

## Summary
This PR fixes: Feature Request: Add AI-assisted summarization, tagging,
search, and more using LLMs / RAG

## Changes
```
archivebox/core/models.py | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
```

## Testing
Please review the changes carefully. The fix was verified against the
existing test suite.

---
*This PR was created with the assistance of Claude Sonnet 4.6 by
Anthropic | effort: low. Happy to make any adjustments!*

---
## Summary by cubic
Returns tags as a JSON array in Snapshot.to_dict() and accepts both list
and comma-separated tags in from_json(), making search exports and
RAG/LLM integrations easier. Fixes #1139.

- **New Features**
  - Tags export is now a sorted JSON list for deterministic output.
  - Imports accept list or string formats; trims whitespace and
    deduplicates tags for compatibility.

<sup>Written for commit 08b0dfaf12.
Summary will update on new commits.</sup>

2026-02-24 15:37:23 -08:00
Your Name
08b0dfaf12 Fix #1139: Return tags as a JSON list in Snapshot.to_dict() for LLM/RAG integration
Previously, `archivebox search --json` exported tags as a comma-separated
string (e.g. "tag1,tag2"), which required manual parsing by consumers like
LlamaIndex, LangChain, and other RAG frameworks.

Now `to_dict()` returns tags as a proper JSON array (e.g. ["tag1", "tag2"]),
making the export directly usable as structured metadata in LLM/RAG pipelines
without additional preprocessing.

`from_json()` is updated to accept both list and string formats for backward
compatibility with existing JSON imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 21:21:38 -08:00
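
The tag handling described in this commit can be sketched roughly as follows. This is a minimal illustration and not the code from `archivebox/core/models.py`: `normalize_tags` is a hypothetical helper name, and the trimming, deduplication, and sorting follow the PR summary above rather than the actual diff.

```python
from typing import Iterable, Union


def normalize_tags(raw: Union[str, Iterable[str], None]) -> list[str]:
    """Accept a comma-separated string or a list; return a sorted, deduplicated list."""
    if raw is None:
        return []
    # Older exports stored tags as a single comma-separated string.
    parts = raw.split(",") if isinstance(raw, str) else list(raw)
    # Trim whitespace, drop empty entries, deduplicate, and sort for deterministic output.
    return sorted({part.strip() for part in parts if part.strip()})


print(normalize_tags("tag2, tag1,tag1"))   # ['tag1', 'tag2']
print(normalize_tags(["tag1", " tag2 "]))  # ['tag1', 'tag2']
```

Both input shapes produce the same list, which is the backward-compatibility property the commit message asks of `from_json()`.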
Nick Sweeting
a0be8fe771 Tag current maintainer of AUR package (#1761)

# Summary

Add the maintainer info of the ArchiveBox AUR package for
accountability. Much of the packaging has changed since the time of its
initial contribution and I as the current maintainer will make sure
these changes will work smoothly moving forward. I will also make sure
this AUR package will be up to date once the 0.9.x branch is released.

# Related issues

<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->

# Changes these areas

- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk


---
## Summary by cubic
Update README to tag the current maintainer of the Arch AUR package.
Adds “maintained by @jasongodev” next to the original contributor to
improve accountability and clarify support.

<sup>Written for commit 0d05fd8c53.
Summary will update on new commits.</sup>

2026-02-11 13:23:24 -08:00
Nick Sweeting
17e26ae5a4 Delete TEST_RESULTS.md 2026-02-09 18:23:35 -08:00
Jason Go
0d05fd8c53 Tag current maintainer of AUR package 2026-02-09 01:08:24 +08:00
Nick Sweeting
dcfad7daf1 FIX: docker build (#1760)

# Summary

This PR fixes the docker image build. Also fixes the uuid7 not found
error on the first run of `archivebox init`.


---
## Summary by cubic
Fixes the Docker image build and the uuid7 error on first init. We now
use uv-managed Python 3.13 and patch uuid.uuid7 before Django
migrations.

- **Bug Fixes**
  - Docker: switch to uv-managed Python, create venv with uv --python,
    skip version check at build, and start with --init.
  - UUID7: add uuid_compat, import it early, and monkey-patch uuid.uuid7
    on <3.14 to keep migrations working.

- **Dependencies**
  - Bump Python to 3.13.
  - Require uuid_extensions on Python <3.14.

<sup>Written for commit 9aa4f0de58.
Summary will update on new commits.</sup>

2026-01-31 01:35:24 -08:00
Pellaeon Lin
9aa4f0de58 FIX: The docker entrypoint doesn't have --quick-init 2026-01-31 08:25:22 +00:00
Pellaeon Lin
1ca54525f2 FIX: uuid_compat 2026-01-31 08:24:50 +00:00
Pellaeon Lin
36008fd1fa FIX: docker build 2026-01-30 09:07:09 +00:00
Nick Sweeting
ec4b27056e wip 2026-01-21 03:19:56 -08:00
Nick Sweeting
f3f55d3395 perfect snapshot detail cards 2026-01-19 14:56:15 -08:00
Nick Sweeting
86e7973334 cleanup tui, startup, card templates, and more 2026-01-19 14:33:20 -08:00
Nick Sweeting
bef67760db working singlefile 2026-01-19 03:05:49 -08:00
Nick Sweeting
b5bbc3b549 better tui 2026-01-19 01:53:32 -08:00
Nick Sweeting
1cb2d5070e bump version 2026-01-19 01:11:59 -08:00
Nick Sweeting
c7b2217cd6 tons of fixes with codex 2026-01-19 01:00:53 -08:00
Nick Sweeting
eaf7256345 Implement native LDAP authentication (#1756)
## Summary

Implements native LDAP authentication support for ArchiveBox.

## Changes

- Create `archivebox/config/ldap.py` with LDAPConfig class
- Create `archivebox/ldap/` Django app with custom auth backend
- Update `core/settings.py` to conditionally load LDAP when enabled
- Add LDAP_CREATE_SUPERUSER support to auto-grant superuser privileges
- Add comprehensive tests in test_auth_ldap.py (no mocks, no skips)
- LDAP only activates if django-auth-ldap is installed and
  LDAP_ENABLED=True
- Helpful error messages when LDAP libraries are missing or config is
  incomplete

## Implementation Approach

- Native integration (not a plugin)
- Conditional loading based on libraries + config
- Separate Django app for LDAP logic
- Clean if statements in settings.py
- No mixing LDAP code with rest of codebase

Fixes #1664

🤖 Generated with [Claude Code](https://claude.ai/code)
2026-01-05 16:07:27 -08:00
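
The conditional loading described above (LDAP activates only when the library is installed and `LDAP_ENABLED=True`) might look roughly like this. It is a sketch, not the contents of `core/settings.py`: `auth_backends` is a hypothetical helper, and it uses the stock `django_auth_ldap.backend.LDAPBackend` path where the PR mentions a custom backend.

```python
import importlib.util

DEFAULT_BACKEND = "django.contrib.auth.backends.ModelBackend"
# Assumption: the stock django-auth-ldap backend; the PR ships its own custom backend.
LDAP_BACKEND = "django_auth_ldap.backend.LDAPBackend"


def auth_backends(ldap_enabled: bool, ldap_module: str = "django_auth_ldap") -> list[str]:
    """Return the backend list, adding LDAP only when enabled and importable."""
    if not ldap_enabled:
        return [DEFAULT_BACKEND]
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(ldap_module) is None:
        raise ImportError(
            f"LDAP_ENABLED=True but '{ldap_module}' is not installed; "
            "install django-auth-ldap or set LDAP_ENABLED=False"
        )
    return [LDAP_BACKEND, DEFAULT_BACKEND]


print(auth_backends(False))
```

Checking with `find_spec` instead of importing keeps the LDAP libraries fully optional: nothing is loaded unless the feature is switched on.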
claude[bot]
c2bb4b25cb Implement native LDAP authentication support
- Create archivebox/config/ldap.py with LDAPConfig class
- Create archivebox/ldap/ Django app with custom auth backend
- Update core/settings.py to conditionally load LDAP when enabled
- Add LDAP_CREATE_SUPERUSER support to auto-grant superuser privileges
- Add comprehensive tests in test_auth_ldap.py (no mocks, no skips)
- LDAP only activates if django-auth-ldap is installed and LDAP_ENABLED=True
- Helpful error messages when LDAP libraries are missing or config is incomplete

Fixes #1664

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2026-01-05 21:30:26 +00:00
Nick Sweeting
28b980a84a higher timeout 2026-01-05 09:07:59 -08:00
Nick Sweeting
352e1bad32 remove debug lines 2026-01-05 02:27:34 -08:00
Nick Sweeting
0a2ac11b01 more binary fixes 2026-01-05 02:26:33 -08:00
Nick Sweeting
b80e80439d more binary fixes 2026-01-05 02:18:38 -08:00
Nick Sweeting
7ceaeae2d9 rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through 2026-01-04 22:38:15 -08:00
Nick Sweeting
456aaee287 more migration id/uuid and config propagation fixes 2026-01-04 16:16:26 -08:00
Nick Sweeting
839ae744cf simplify entrypoints for orchestrator and workers 2026-01-04 13:17:07 -08:00
Nick Sweeting
5449971777 better kill tree 2026-01-02 04:33:41 -08:00
Nick Sweeting
3da523fc74 more consistent crawl, snapshot, hook cleanup and Process tracking 2026-01-02 04:27:38 -08:00
Nick Sweeting
dd77511026 unified Process source of truth and better screenshot tests 2026-01-02 04:20:34 -08:00
Nick Sweeting
3672174dad fix transition mid transition 2026-01-02 00:24:44 -08:00
Nick Sweeting
65ee09ceab move tests into subfolder, add missing install hooks 2026-01-02 00:22:07 -08:00
Nick Sweeting
c2afb40350 fix lib bin dir and archivebox add hanging 2026-01-01 16:58:47 -08:00
Nick Sweeting
9008cefca2 codecov, migrations, orchestrator fixes 2026-01-01 16:57:04 -08:00
Nick Sweeting
60422adc87 fix orchestrator statemachine and Process from archiveresult migrations 2026-01-01 16:43:02 -08:00
Nick Sweeting
876feac522 actually working migration path from 0.7.2 and 0.8.6 + renames and test coverage 2026-01-01 15:50:00 -08:00
Nick Sweeting
6fadcf5168 remove model health stats from models that don't need it 2026-01-01 15:50:00 -08:00
Nick Sweeting
e903fa1d2b Fix: Make SingleFile use SINGLEFILE_CHROME_ARGS with fallback to CHROME_ARGS (#1754)
Fixes #1445

This PR resolves the issue where SingleFile was not respecting Chrome
user data directory and other Chrome launch options that work for other
Chrome-based extractors (PDF, Screenshot, etc.).

## Changes
- Added `SINGLEFILE_CHROME_ARGS` config option with fallback to
  `CHROME_ARGS`
- Updated SingleFile extractor to pass Chrome arguments via
  `--browser-args`
- Updated documentation

This ensures SingleFile respects the same Chrome configuration as other
Chrome-based extractors.

Generated with [Claude Code](https://claude.ai/code)
2026-01-01 14:34:05 -08:00
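
The fallback from `SINGLEFILE_CHROME_ARGS` to `CHROME_ARGS` can be illustrated with a short sketch. The helper name `singlefile_browser_args` and the plain-dict config are hypothetical, and encoding the arguments as a JSON array for `--browser-args` is an assumption about the SingleFile CLI rather than something this PR states.

```python
import json


def singlefile_browser_args(config: dict) -> list[str]:
    """Build the SingleFile flag, preferring SINGLEFILE_CHROME_ARGS over CHROME_ARGS."""
    # An unset or empty SingleFile-specific option falls back to the shared Chrome args.
    chrome_args = config.get("SINGLEFILE_CHROME_ARGS") or config.get("CHROME_ARGS") or []
    if not chrome_args:
        return []
    # Assumption: the CLI expects the browser arguments as one JSON array string.
    return ["--browser-args=" + json.dumps(chrome_args)]


print(singlefile_browser_args({"CHROME_ARGS": ["--headless=new"]}))
print(singlefile_browser_args({
    "CHROME_ARGS": ["--headless=new"],
    "SINGLEFILE_CHROME_ARGS": ["--user-data-dir=/data/chrome"],
}))
```

With only `CHROME_ARGS` set, SingleFile inherits the same options as the other Chrome-based extractors; setting `SINGLEFILE_CHROME_ARGS` overrides them for SingleFile alone.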
Nick Sweeting
f7457b13ad more migrations fixes attempts 2025-12-31 17:46:10 -08:00