84 Commits

Author SHA1 Message Date
Nick Sweeting
3e7b83ac91 bump versions 2026-04-04 23:10:12 -07:00
Nick Sweeting
f3622d8cd3 update working changes 2026-03-25 05:36:07 -07:00
Nick Sweeting
80243accfd Fix archivebox CI regressions 2026-03-24 15:36:23 -07:00
Nick Sweeting
68d9e30c5f Fix pytest basetemp handling in test harness 2026-03-24 14:46:05 -07:00
Nick Sweeting
ed1ddbc95e Fix CI workflows and migration tests 2026-03-24 13:37:02 -07:00
Nick Sweeting
50286d3c38 Reuse cached binaries in archivebox runtime 2026-03-24 11:03:43 -07:00
Nick Sweeting
e1eb5693c9 split CrawlSetup into Install phase with new Binary + BinaryRequest events 2026-03-23 13:16:47 -07:00
Nick Sweeting
25f935b9d1 split CrawlSetup into Install phase with new Binary + BinaryRequest events 2026-03-23 13:15:41 -07:00
Nick Sweeting
8a25704aac add harness tests 2026-03-23 04:12:46 -07:00
Nick Sweeting
1d94645abd test fixes 2026-03-23 04:12:31 -07:00
Nick Sweeting
b749b26c5d wip 2026-03-23 03:58:32 -07:00
Nick Sweeting
f400a2cd67 WIP: checkpoint working tree before rebasing onto dev 2026-03-22 20:25:18 -07:00
Nick Sweeting
a6548df8d0 Add configurable server security modes (#1773)
Fixes https://github.com/ArchiveBox/ArchiveBox/issues/239

## Summary
- add `SERVER_SECURITY_MODE` presets for safe subdomain replay, safe
one-domain no-JS replay, unsafe one-domain no-admin, and dangerous
one-domain full replay
- make host routing, replay URLs, static serving, and control-plane
access mode-aware
- add strict routing/header coverage plus a browser-backed
Chrome/Puppeteer test that verifies real same-origin behavior in all
four modes

## Testing
- `uv run pytest archivebox/tests/test_urls.py -v`
- `uv run pytest archivebox/tests/test_admin_views.py -v`
- `uv run pytest archivebox/tests/test_server_security_browser.py -v`

<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/archivebox/archivebox/pull/1773"
target="_blank">
  <picture>
<source media="(prefers-color-scheme: dark)"
srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img
src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1"
alt="Open with Devin">
  </picture>
</a>
<!-- devin-review-badge-end -->


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds configurable server security modes to isolate admin/API from
archived content, with a safe subdomain default and single-domain
fallbacks. Routing, replay endpoints, headers, and middleware are
mode-aware, with browser tests validating same-origin behavior.

- New Features
- Introduced SERVER_SECURITY_MODE with presets:
safe-subdomains-fullreplay (default), safe-onedomain-nojsreplay,
unsafe-onedomain-noadmin, danger-onedomain-fullreplay.
- Mode-aware routing and base URLs; one-domain modes use path-based
replay: /snapshot/<id>/... and /original/<domain>/....
- Control plane gate: block admin/API and non-GET methods in
unsafe-onedomain-noadmin; allow full access in
danger-onedomain-fullreplay.
- Safer replay: detect risky HTML/SVG and apply CSP sandbox (no scripts)
in safe-onedomain-nojsreplay; add X-ArchiveBox-Security-Mode and
X-Content-Type-Options: nosniff on replay responses.
- Middleware and serving: added ServerSecurityModeMiddleware, improved
HostRouting, and static server byte-range/CSP handling.
- Tests: added Chrome/Puppeteer browser tests and stricter URL routing
tests covering all modes.

- Migration
- Default requires wildcard subdomains for full isolation (admin., web.,
api., and snapshot-id.<base>).
- To run on one domain, set SERVER_SECURITY_MODE to a one-domain preset;
URLs switch to /snapshot/<id>/ and /original/<domain>/ paths.
- For production, prefer safe-subdomains-fullreplay; lower-security
modes print a startup warning.

<sup>Written for commit ad41b15581.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-03-22 20:17:21 -07:00
Nick Sweeting
c87079aa0a Refactor ArchiveBox onto abx-dl bus runner 2026-03-21 11:47:57 -07:00
Nick Sweeting
ad41b15581 Add configurable server security modes 2026-03-15 23:34:40 -07:00
Nick Sweeting
57e11879ec cleanup archivebox tests 2026-03-15 22:09:56 -07:00
Nick Sweeting
bc21d4bfdb type and test fixes 2026-03-15 20:12:27 -07:00
Nick Sweeting
44cabac8d0 fix typing 2026-03-15 19:47:36 -07:00
Nick Sweeting
f932054915 add stricter locking around stage machine models 2026-03-15 19:21:41 -07:00
Nick Sweeting
311e4340ec Fix add CLI input handling and lint regressions 2026-03-15 19:04:13 -07:00
Nick Sweeting
5f0cfe5251 add new persona tests 2026-03-15 18:46:45 -07:00
Nick Sweeting
934e02695b fix lint 2026-03-15 18:45:29 -07:00
Nick Sweeting
70c9358cf9 Improve scheduling, runtime paths, and API behavior 2026-03-15 18:31:56 -07:00
Nick Sweeting
7d42c6c8b5 bump versions and fix docs 2026-03-15 17:43:07 -07:00
Nick Sweeting
e598614b05 Avoid filesystem lookups in snapshot admin list 2026-03-15 17:18:53 -07:00
Nick Sweeting
1d16038ceb Relax archive output readiness check 2026-03-15 13:31:05 -07:00
Nick Sweeting
957387fd88 Fix plugin hook env and extractor retries 2026-03-15 12:39:27 -07:00
Nick Sweeting
f92ca93ae9 Skip puppeteer browser download during package install 2026-03-15 11:39:43 -07:00
Nick Sweeting
7c55259ed0 Update title HTML test for search export 2026-03-15 11:17:58 -07:00
Nick Sweeting
86fdc3be1e Refresh worker config from resolved plugin installs 2026-03-15 11:07:55 -07:00
Nick Sweeting
47f540c094 Resolve crawl provider dependencies lazily 2026-03-15 10:18:49 -07:00
Nick Sweeting
d4be507a6b Keep provider plugins enabled under whitelists 2026-03-15 09:49:45 -07:00
Nick Sweeting
82bfd7e655 Filter binary hooks by allowed providers 2026-03-15 09:32:32 -07:00
Nick Sweeting
941135d6d0 Bound URL fixture archive wait 2026-03-15 09:07:25 -07:00
Nick Sweeting
50901e5367 Align worker config propagation expectations 2026-03-15 08:47:00 -07:00
Nick Sweeting
31e883ec53 Stabilize plugin and crawl integration tests 2026-03-15 08:16:52 -07:00
Nick Sweeting
bfc1e76ff5 Update extractor tests for plugin output dirs 2026-03-15 07:32:11 -07:00
Nick Sweeting
b62064f63e Avoid recursive crawl timeout regressions 2026-03-15 07:09:15 -07:00
Nick Sweeting
5fb3709281 Run recursive crawl tests to completion 2026-03-15 06:55:35 -07:00
Nick Sweeting
68b9f75dab Stabilize recursive crawl CI coverage 2026-03-15 06:49:40 -07:00
Nick Sweeting
760cf9d6b2 Stabilize CI against expanded plugin surface 2026-03-15 06:31:41 -07:00
Nick Sweeting
1f792d7199 Restore CLI compat and plugin dependency handling 2026-03-15 06:06:18 -07:00
Nick Sweeting
6b482c62df Restore top-level list command compatibility 2026-03-15 05:04:31 -07:00
Nick Sweeting
58f801c220 Fix update orphan import and host-aware tests 2026-03-15 04:51:06 -07:00
Nick Sweeting
4fa701fafe Update abx dependencies and plugin test harness 2026-03-15 04:37:32 -07:00
Nick Sweeting
ecb1764590 switch to external plugins 2026-03-15 03:46:23 -07:00
Nick Sweeting
ec4b27056e wip 2026-01-21 03:19:56 -08:00
Nick Sweeting
c7b2217cd6 tons of fixes with codex 2026-01-19 01:00:53 -08:00
claude[bot]
c2bb4b25cb Implement native LDAP authentication support
- Create archivebox/config/ldap.py with LDAPConfig class
- Create archivebox/ldap/ Django app with custom auth backend
- Update core/settings.py to conditionally load LDAP when enabled
- Add LDAP_CREATE_SUPERUSER support to auto-grant superuser privileges
- Add comprehensive tests in test_auth_ldap.py (no mocks, no skips)
- LDAP only activates if django-auth-ldap is installed and LDAP_ENABLED=True
- Helpful error messages when LDAP libraries are missing or config is incomplete

Fixes #1664

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2026-01-05 21:30:26 +00:00
Nick Sweeting
28b980a84a higher timeout 2026-01-05 09:07:59 -08:00