Commit Graph

4909 Commits

Author SHA1 Message Date
Claude
4db4c36cb2 Add arm64 to .deb test matrix using GitHub's arm64 runners
Tests both amd64 and arm64 .deb packages by downloading the
matching architecture artifact and running the full install +
verification flow on native runners.

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:22:33 +00:00
Claude
496b54a5e1 Fix remaining PR review comments: release ordering, verification, README
- Move .deb upload to GitHub Release into a separate job that runs after tests pass
- Fix workflow_call event propagation so release jobs run when called from release.yml
- Fix setup.sh post-install verification to check `which archivebox` first (works for brew/deb)
- Fix README.md: detect architecture with dpkg instead of hardcoding amd64
- Fix README.md: remove --setup flag from apt install instructions

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:20:32 +00:00
Claude
7c7a9ee599 Fix PR review comments: service flags, DATA_DIR, version pinning, upgrade safety
- Remove --setup flag from systemd service and CI (not valid in 0.9.x)
- Remove release triggers from debian/homebrew workflows (handled by release.yml)
- Fix brew post_install to set DATA_DIR so it initializes in var/archivebox
- Add PATH export to deb wrapper script for bundled console scripts
- Remove pip install fallback in install.sh (strict version pinning)
- Guard preremove.sh cleanup to only run on remove/purge, not upgrade
- Initialize SDIST_URL/SDIST_SHA256 in build_brew.sh (nounset safety)
- Pin awalsh128/cache-apt-pkgs-action to v1.6.0 (supply chain safety)

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:12:37 +00:00
Nick Sweeting
16090944c4 Update .github/workflows/homebrew.yml
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2026-03-14 23:05:43 -04:00
Claude
4c113f8eb9 Fix CI: create tests/out dir, fix archivebox add cmd, revert setup.sh
- test-parallel.yml: mkdir -p tests/out before pytest --basetemp
  (fixes FileNotFoundError in chrome test fixture)
- debian.yml: fix archivebox add command (--parser url_list removed
  in 0.9.x), remove || true so failures are caught
- setup.sh: revert apt section to always use pip install, not .deb

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 03:02:47 +00:00
Claude
fa11bee5b5 CI: Full brew install + deb install tested on every push
- homebrew.yml: Build local sdist, generate formula with file:// URL and
  real resource stanzas via homebrew-pypi-poet, run full
  `brew install --build-from-source` on both macOS and Linux (Linuxbrew)
- debian.yml: Pre-seed venv with local wheel before dpkg install so
  postinstall succeeds even for unreleased versions; test init/status/add
- Both workflows trigger on push (path-filtered) and release
- Release job generates formula with PyPI URL and pushes to tap

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 02:55:10 +00:00
Claude
c8f562ee37 Wire up GitHub Actions for deb/brew build, test, and release
- Fix debian.yml: pin nfpm version, add permissions, improve test job
  with user creation, init test, and status check
- Fix homebrew.yml: use PyPI JSON API (macOS-compatible, no grep -oP),
  wait for PyPI availability on release, use generated formula not template,
  add Linux (Linuxbrew) test job alongside macOS
- Add release.yml orchestrator: pip → deb + brew + docker in order
- Add workflow_call triggers to pip.yml and docker.yml
- Fix build_brew.sh: replace grep -oP with Python-based PyPI API,
  add on_linux deps (pkg-config, openssl, libffi)
- Fix setup.sh: use GitHub API to find correct .deb download URL
  (filename includes version number)
- Fix postinstall.sh: create archivebox system user, pin version from
  package, check for systemd before daemon-reload
- Fix preremove.sh: stop service before removal, check for systemd
- Fix install.sh: fallback to latest if pinned version not on PyPI
- Add on_linux deps to brew formula for Linuxbrew support
- Tested: .deb builds, installs, creates user, runs archivebox init

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 02:50:14 +00:00
Claude
f3fcc1584c Restore Homebrew and Debian package manager support
- Add Homebrew formula (brew_dist/archivebox.rb) using virtualenv pattern
  with auto-generation via homebrew-pypi-poet in bin/build_brew.sh
- Add Debian packaging via nFPM (pkg/debian/) with thin .deb that pip-installs
  archivebox into /opt/archivebox/venv on postinstall
- Add build/release scripts: bin/{build,release}_{brew,deb}.sh
- Update CI workflows to build packages on release and test them
- Update README apt/brew install instructions with working commands
- Update bin/setup.sh to use .deb download instead of old Launchpad PPA

https://claude.ai/code/session_01Vx1EsNrNySgsc8Y67dGzCn
2026-03-15 00:19:50 +00:00
Nick Sweeting
fdef1f991e Update README with venv activation command
Added command to activate the virtual environment.
2026-03-14 16:13:18 -04:00
Nick Sweeting
c1b3e73c11 Fix #1139: Feature Request: Add AI-assisted summarization, tagging, sea (#1767)
Fixes #1139

## Summary
This PR fixes: Feature Request: Add AI-assisted summarization, tagging,
search, and more using LLMs / RAG

## Changes
```
archivebox/core/models.py | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
```

## Testing
Please review the changes carefully. The fix was verified against the
existing test suite.

---
*This PR was created with the assistance of Claude Sonnet 4.6 by
Anthropic | effort: low. Happy to make any adjustments!*

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Returns tags as a JSON array in Snapshot.to_dict() and accepts both list
and comma-separated tags in from_json(), making search exports and
RAG/LLM integrations easier. Fixes #1139.

- **New Features**
  - Tags export is now a sorted JSON list for deterministic output.
- Imports accept list or string formats; trims whitespace and
deduplicates tags for compatibility.

<sup>Written for commit 08b0dfaf12.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-02-24 15:37:23 -08:00
Your Name
08b0dfaf12 Fix #1139: Return tags as a JSON list in Snapshot.to_dict() for LLM/RAG integration
Previously, `archivebox search --json` exported tags as a comma-separated
string (e.g. "tag1,tag2"), which required manual parsing by consumers like
LlamaIndex, LangChain, and other RAG frameworks.

Now `to_dict()` returns tags as a proper JSON array (e.g. ["tag1", "tag2"]),
making the export directly usable as structured metadata in LLM/RAG pipelines
without additional preprocessing.

`from_json()` is updated to accept both list and string formats for backward
compatibility with existing JSON imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 21:21:38 -08:00
Nick Sweeting
a0be8fe771 Tag current maintainer of AUR package (#1761)
<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->

# Summary

Add the maintainer info of the ArchiveBox AUR package for
accountability. Much of the packaging has changed since the time of its
initial contribution and I as the current maintainer will make sure
these changes will work smoothly moving forward. I will also make sure
this AUR package will be up to date once the 0.9.x branch is released.

# Related issues

<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->

# Changes these areas

- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Update README to tag the current maintainer of the Arch AUR package.
Adds “maintained by @jasongodev” next to the original contributor to
improve accountability and clarify support.

<sup>Written for commit 0d05fd8c53.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-02-11 13:23:24 -08:00
Nick Sweeting
17e26ae5a4 Delete TEST_RESULTS.md 2026-02-09 18:23:35 -08:00
Jason Go
0d05fd8c53 Tag current maintainer of AUR package 2026-02-09 01:08:24 +08:00
Nick Sweeting
dcfad7daf1 FIX: docker build (#1760)
<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->

# Summary

This PR fixes the docker image build. Also fixes the uuid7 not found
error on the first run of `archivebox init`.


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes the Docker image build and the uuid7 error on first init. We now
use uv-managed Python 3.13 and patch uuid.uuid7 before Django
migrations.

- **Bug Fixes**
- Docker: switch to uv-managed Python, create venv with uv --python,
skip version check at build, and start with --init.
- UUID7: add uuid_compat, import it early, and monkey-patch uuid.uuid7
on <3.14 to keep migrations working.

- **Dependencies**
  - Bump Python to 3.13.
  - Require uuid_extensions on Python <3.14.

<sup>Written for commit 9aa4f0de58.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-01-31 01:35:24 -08:00
Pellaeon Lin
9aa4f0de58 FIX: The docker entrypoint doesn't have --quick-init 2026-01-31 08:25:22 +00:00
Pellaeon Lin
1ca54525f2 FIX: uuid_compat 2026-01-31 08:24:50 +00:00
Pellaeon Lin
36008fd1fa FIX: docker build 2026-01-30 09:07:09 +00:00
Nick Sweeting
ec4b27056e wip 2026-01-21 03:19:56 -08:00
Nick Sweeting
f3f55d3395 perfect snapshot detail cards 2026-01-19 14:56:15 -08:00
Nick Sweeting
86e7973334 cleanup tui, startup, card templtes, and more 2026-01-19 14:33:20 -08:00
Nick Sweeting
bef67760db working singlefile 2026-01-19 03:05:49 -08:00
Nick Sweeting
b5bbc3b549 better tui 2026-01-19 01:53:32 -08:00
Nick Sweeting
1cb2d5070e bump version 2026-01-19 01:11:59 -08:00
Nick Sweeting
c7b2217cd6 tons of fixes with codex 2026-01-19 01:00:53 -08:00
Nick Sweeting
eaf7256345 Implement native LDAP authentication (#1756)
## Summary

Implements native LDAP authentication support for ArchiveBox.

## Changes

- Create `archivebox/config/ldap.py` with LDAPConfig class
- Create `archivebox/ldap/` Django app with custom auth backend
- Update `core/settings.py` to conditionally load LDAP when enabled
- Add LDAP_CREATE_SUPERUSER support to auto-grant superuser privileges
- Add comprehensive tests in test_auth_ldap.py (no mocks, no skips)
- LDAP only activates if django-auth-ldap is installed and
LDAP_ENABLED=True
- Helpful error messages when LDAP libraries are missing or config is
incomplete

## Implementation Approach

-  Native integration (not a plugin)
-  Conditional loading based on libraries + config
-  Separate Django app for LDAP logic
-  Clean if statements in settings.py
-  No mixing LDAP code with rest of codebase

Fixes #1664

🤖 Generated with [Claude Code](https://claude.ai/code)
2026-01-05 16:07:27 -08:00
claude[bot]
c2bb4b25cb Implement native LDAP authentication support
- Create archivebox/config/ldap.py with LDAPConfig class
- Create archivebox/ldap/ Django app with custom auth backend
- Update core/settings.py to conditionally load LDAP when enabled
- Add LDAP_CREATE_SUPERUSER support to auto-grant superuser privileges
- Add comprehensive tests in test_auth_ldap.py (no mocks, no skips)
- LDAP only activates if django-auth-ldap is installed and LDAP_ENABLED=True
- Helpful error messages when LDAP libraries are missing or config is incomplete

Fixes #1664

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2026-01-05 21:30:26 +00:00
Nick Sweeting
28b980a84a higher timeout 2026-01-05 09:07:59 -08:00
Nick Sweeting
352e1bad32 remove debug lines 2026-01-05 02:27:34 -08:00
Nick Sweeting
0a2ac11b01 more binary fixes 2026-01-05 02:26:33 -08:00
Nick Sweeting
b80e80439d more binary fixes 2026-01-05 02:18:38 -08:00
Nick Sweeting
7ceaeae2d9 rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through 2026-01-04 22:38:15 -08:00
Nick Sweeting
456aaee287 more migration id/uuid and config propagation fixes 2026-01-04 16:16:26 -08:00
Nick Sweeting
839ae744cf simplify entrypoints for orchestrator and workers 2026-01-04 13:17:07 -08:00
Nick Sweeting
5449971777 better kill tree 2026-01-02 04:33:41 -08:00
Nick Sweeting
3da523fc74 more consistent crawl, snapshot, hook cleanup and Process tracking 2026-01-02 04:27:38 -08:00
Nick Sweeting
dd77511026 unified Process source of truth and better screenshot tests 2026-01-02 04:20:34 -08:00
Nick Sweeting
3672174dad fix transition mid transition 2026-01-02 00:24:44 -08:00
Nick Sweeting
65ee09ceab move tests into subfolder, add missing install hooks 2026-01-02 00:22:07 -08:00
Nick Sweeting
c2afb40350 fix lib bin dir and archivebox add hanging 2026-01-01 16:58:47 -08:00
Nick Sweeting
9008cefca2 codecov, migrations, orchestrator fixes 2026-01-01 16:57:04 -08:00
Nick Sweeting
60422adc87 fix orchestrator statemachine and Process from archiveresult migrations 2026-01-01 16:43:02 -08:00
Nick Sweeting
876feac522 actually working migration path from 0.7.2 and 0.8.6 + renames and test coverage 2026-01-01 15:50:00 -08:00
Nick Sweeting
6fadcf5168 remove model health stats from models that dont need it 2026-01-01 15:50:00 -08:00
Nick Sweeting
e903fa1d2b Fix: Make SingleFile use SINGLEFILE_CHROME_ARGS with fallback to CHROME_ARGS (#1754)
Fixes #1445

This PR resolves the issue where SingleFile was not respecting Chrome
user data directory and other Chrome launch options that work for other
Chrome-based extractors (PDF, Screenshot, etc.).

## Changes
- Added `SINGLEFILE_CHROME_ARGS` config option with fallback to
`CHROME_ARGS`
- Updated SingleFile extractor to pass Chrome arguments via
`--browser-args`
- Updated documentation

This ensures SingleFile respects the same Chrome configuration as other
Chrome-based extractors.

Generated with [Claude Code](https://claude.ai/code)
2026-01-01 14:34:05 -08:00
Nick Sweeting
f7457b13ad more migrations fixes attempts 2025-12-31 17:46:10 -08:00
Nick Sweeting
b08f60a267 Add thumbnail previews to live progress header (#1753)
Show small thumbnails of recently completed ArchiveResult content in the
progress header. The thumbnail strip appears below the stats bar and
shows the last 20 successfully archived items with embeddable content
(screenshots, favicons, DOM snapshots, etc.).

Features:
- API returns recent_thumbnails with embed paths for succeeded results
- Thumbnails display with plugin-specific icons as fallback
- New thumbnails animate in with a pop effect
- Clicking a thumbnail navigates to the snapshot admin page
- Horizontal scrollable strip with custom scrollbar styling

<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->

# Summary

<!--e.g. This PR fixes ABC or adds the ability to do XYZ...-->

# Related issues

<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->

# Changes these areas

- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds a thumbnail strip to the live progress header. It shows previews of
the last 20 successful archived items for quick visual feedback and
one-click navigation.

- **New Features**
- API returns recent_thumbnails with embed paths for succeeded results.
  - Horizontal, scrollable thumbnail strip under the header.
  - Uses preview images when available; plugin icons as fallback.
  - New thumbnails animate in with a pop effect.
  - Clicking a thumbnail opens the snapshot admin page.

<sup>Written for commit 17029ba8b8.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2025-12-31 17:46:00 -08:00
Nick Sweeting
2e2bc31e6c Remove redundant chrome_validate hook, rename wget_validate to wget_i… (#1752)
…nstall

- Delete chrome/on_Crawl__10_chrome_validate.py (duplicates
chrome_install)
- Rename wget/on_Crawl__11_wget_validate.py →
on_Crawl__06_wget_install.py

All hooks now follow consistent naming: install, launch, or config

<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->

# Summary

<!--e.g. This PR fixes ABC or adds the ability to do XYZ...-->

# Related issues

<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->

# Changes these areas

- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Removed the redundant Chrome validate hook, renamed the Wget validate
hook to wget_install, and standardized hook names and priorities to
match the install/launch/config lifecycle. This removes duplicate logic
and fixes priority conflicts across Crawl, Binary, and Snapshot hooks.

- **Refactors**
- Deleted chrome/on_Crawl__10_chrome_validate.py (dup of chrome_install)
  - Renamed wget validate to on_Crawl__06_wget_install.py
- Standardized on_Binary hook priorities: npm 10, pip 11, brew 12, apt
13, custom 14, env 15
- Fixed on_Snapshot order: staticfile 32, readability 56, mercury 57,
htmltotext 58

<sup>Written for commit 09a1ca3134.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2025-12-31 17:42:21 -08:00
Claude
09a1ca3134 Fix hook priority conflicts and standardize on_Binary naming
on_Snapshot priority fixes:
- redirects.bg.js stays at 31, staticfile.bg.js → 32
- headers.js stays at 55, readability.py → 56
- mercury.py → 57, htmltotext.py → 58

on_Binary hooks now have numeric priorities:
- 10: npm_install.py
- 11: pip_install.py
- 12: brew_install.py
- 13: apt_install.py
- 14: custom_install.py
- 15: env_install.py
2026-01-01 01:31:52 +00:00
Nick Sweeting
1c7b0cb2d3 working migrations again 2025-12-31 16:19:50 -08:00