Commit Graph

4997 Commits

Author SHA1 Message Date
Nick Sweeting
f3622d8cd3 update working changes 2026-03-25 05:36:07 -07:00
Nick Sweeting
80243accfd Fix archivebox CI regressions 2026-03-24 15:36:23 -07:00
Nick Sweeting
68d9e30c5f Fix pytest basetemp handling in test harness 2026-03-24 14:46:05 -07:00
Nick Sweeting
ed1ddbc95e Fix CI workflows and migration tests 2026-03-24 13:37:02 -07:00
Nick Sweeting
50286d3c38 Reuse cached binaries in archivebox runtime 2026-03-24 11:03:43 -07:00
Nick Sweeting
39450111dd Update CI uv handling and runner changes 2026-03-23 13:27:23 -07:00
Nick Sweeting
e1eb5693c9 split CrawlSetup into Install phase with new Binary + BinaryRequest events 2026-03-23 13:16:47 -07:00
Nick Sweeting
25f935b9d1 split CrawlSetup into Install phase with new Binary + BinaryRequest events 2026-03-23 13:15:41 -07:00
Nick Sweeting
f2c81142e1 tweak release script 2026-03-23 04:21:12 -07:00
Nick Sweeting
8a25704aac add harness tests 2026-03-23 04:12:46 -07:00
Nick Sweeting
1d94645abd test fixes 2026-03-23 04:12:31 -07:00
Nick Sweeting
b749b26c5d wip 2026-03-23 03:58:32 -07:00
Nick Sweeting
268856bcfb Preserve common config console handling after rebase 2026-03-22 20:25:53 -07:00
Nick Sweeting
f400a2cd67 WIP: checkpoint working tree before rebasing onto dev 2026-03-22 20:25:18 -07:00
Nick Sweeting
a6548df8d0 Add configurable server security modes (#1773)
Fixes https://github.com/ArchiveBox/ArchiveBox/issues/239

## Summary
- add `SERVER_SECURITY_MODE` presets for safe subdomain replay, safe
one-domain no-JS replay, unsafe one-domain no-admin, and dangerous
one-domain full replay
- make host routing, replay URLs, static serving, and control-plane
access mode-aware
- add strict routing/header coverage plus a browser-backed
Chrome/Puppeteer test that verifies real same-origin behavior in all
four modes

## Testing
- `uv run pytest archivebox/tests/test_urls.py -v`
- `uv run pytest archivebox/tests/test_admin_views.py -v`
- `uv run pytest archivebox/tests/test_server_security_browser.py -v`


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds configurable server security modes to isolate admin/API from
archived content, with a safe subdomain default and single-domain
fallbacks. Routing, replay endpoints, headers, and middleware are
mode-aware, with browser tests validating same-origin behavior.

- New Features
  - Introduced SERVER_SECURITY_MODE with presets:
    safe-subdomains-fullreplay (default), safe-onedomain-nojsreplay,
    unsafe-onedomain-noadmin, danger-onedomain-fullreplay.
  - Mode-aware routing and base URLs; one-domain modes use path-based
    replay: /snapshot/<id>/... and /original/<domain>/....
  - Control plane gate: block admin/API and non-GET methods in
    unsafe-onedomain-noadmin; allow full access in
    danger-onedomain-fullreplay.
  - Safer replay: detect risky HTML/SVG and apply CSP sandbox (no scripts)
    in safe-onedomain-nojsreplay; add X-ArchiveBox-Security-Mode and
    X-Content-Type-Options: nosniff on replay responses.
  - Middleware and serving: added ServerSecurityModeMiddleware, improved
    HostRouting, and static server byte-range/CSP handling.
  - Tests: added Chrome/Puppeteer browser tests and stricter URL routing
    tests covering all modes.

- Migration
  - Default requires wildcard subdomains for full isolation (admin., web.,
    api., and snapshot-id.<base>).
  - To run on one domain, set SERVER_SECURITY_MODE to a one-domain preset;
    URLs switch to /snapshot/<id>/ and /original/<domain>/ paths.
  - For production, prefer safe-subdomains-fullreplay; lower-security
    modes print a startup warning.
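
The mode-dependent URL layout and control-plane gate described above can be illustrated with a minimal sketch. This is not the code from the PR: the function names `snapshot_replay_url` and `control_plane_allowed`, the `http` scheme, the use of the snapshot id as the subdomain label, and the behavior of the modes not explicitly covered by the description are all assumptions made for illustration.

```python
# Hypothetical sketch only; names and signatures are illustrative,
# not ArchiveBox's actual API.

ONE_DOMAIN_MODES = {
    "safe-onedomain-nojsreplay",
    "unsafe-onedomain-noadmin",
    "danger-onedomain-fullreplay",
}
ALL_MODES = ONE_DOMAIN_MODES | {"safe-subdomains-fullreplay"}


def snapshot_replay_url(mode: str, base_host: str, snapshot_id: str, path: str = "") -> str:
    """Build a replay URL for a snapshot under the given security mode."""
    if mode not in ALL_MODES:
        raise ValueError(f"unknown SERVER_SECURITY_MODE: {mode!r}")
    path = path.lstrip("/")
    if mode in ONE_DOMAIN_MODES:
        # One-domain presets use path-based replay: /snapshot/<id>/...
        return f"http://{base_host}/snapshot/{snapshot_id}/{path}"
    # Subdomain preset: assumed here to give each snapshot its own subdomain.
    return f"http://{snapshot_id}.{base_host}/{path}"


def control_plane_allowed(mode: str, method: str, is_control_plane: bool) -> bool:
    """Return True if a request should pass the control-plane gate."""
    if mode == "unsafe-onedomain-noadmin":
        # Admin/API routes and non-GET methods are blocked in this mode.
        return method.upper() == "GET" and not is_control_plane
    # Assumption: every other mode lets the request through this gate.
    return True
```

For example, `snapshot_replay_url("safe-onedomain-nojsreplay", "archive.example.com", "abc123", "index.html")` yields a `/snapshot/abc123/index.html` path on the base host, while the default preset places `abc123` in the hostname instead.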

<sup>Written for commit ad41b15581.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-03-22 20:17:21 -07:00
Nick Sweeting
c87079aa0a Refactor ArchiveBox onto abx-dl bus runner 2026-03-21 11:47:57 -07:00
Nick Sweeting
ee9ed440d1 bump dependencies 2026-03-21 10:23:59 -07:00
Nick Sweeting
ad41b15581 Add configurable server security modes 2026-03-15 23:34:40 -07:00
Nick Sweeting
26f6d68cf5 bump dep version 2026-03-15 22:41:06 -07:00
Nick Sweeting
6b0cfbc522 revert docker to use pip again 2026-03-15 22:10:04 -07:00
Nick Sweeting
57e11879ec cleanup archivebox tests 2026-03-15 22:09:56 -07:00
Nick Sweeting
9de084da65 bump package versions 2026-03-15 20:47:28 -07:00
Nick Sweeting
bc21d4bfdb type and test fixes 2026-03-15 20:12:27 -07:00
Nick Sweeting
3889eb4efa Tighten config and admin typing 2026-03-15 19:49:52 -07:00
Nick Sweeting
44cabac8d0 fix typing 2026-03-15 19:47:36 -07:00
Nick Sweeting
4756697a17 Use ruff pyright and ty for linting 2026-03-15 19:43:59 -07:00
Nick Sweeting
49436af869 Tighten CLI and admin typing 2026-03-15 19:33:15 -07:00
Nick Sweeting
5381f7584c Tighten API typing and add return values 2026-03-15 19:24:54 -07:00
Nick Sweeting
95a105feb9 small fixes 2026-03-15 19:22:06 -07:00
Nick Sweeting
f932054915 add stricter locking around stage machine models 2026-03-15 19:21:41 -07:00
Nick Sweeting
311e4340ec Fix add CLI input handling and lint regressions 2026-03-15 19:04:13 -07:00
Nick Sweeting
5f0cfe5251 add new persona tests 2026-03-15 18:46:45 -07:00
Nick Sweeting
934e02695b fix lint 2026-03-15 18:45:29 -07:00
Nick Sweeting
f97725d16f Mark version as 0.9.10rc0 (pre-release) per PEP 440
Uses rc suffix so docs system correctly identifies 0.9.x as pre-release.
Remove the rc suffix when ready to declare 0.9.x stable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 18:44:31 -07:00
Nick Sweeting
70c9358cf9 Improve scheduling, runtime paths, and API behavior 2026-03-15 18:31:56 -07:00
Nick Sweeting
7d42c6c8b5 bump versions and fix docs 2026-03-15 17:43:07 -07:00
Nick Sweeting
e598614b05 Avoid filesystem lookups in snapshot admin list 2026-03-15 17:18:53 -07:00
Nick Sweeting
21a0a27091 Remove 7 dead functions and 4 unused imports from hooks.py
Dead functions: extract_step, run_hooks, is_parser_plugin,
get_all_plugin_icons, discover_plugin_templates, find_binary_for_cmd,
create_model_record, get_parser_plugins

Dead imports: re, signal, subprocess, django.utils.timezone

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 16:34:20 -07:00
Nick Sweeting
002de811e2 bump dep versions 2026-03-15 15:28:19 -07:00
Nick Sweeting
f0b255914d bump dep versions 2026-03-15 14:57:01 -07:00
Nick Sweeting
0ac83c8799 Wait for crawl hook records before advancing 2026-03-15 14:15:04 -07:00
Nick Sweeting
1d16038ceb Relax archive output readiness check 2026-03-15 13:31:05 -07:00
Nick Sweeting
2585ef5870 Use npm package for readability extractor installs 2026-03-15 13:09:18 -07:00
Nick Sweeting
957387fd88 Fix plugin hook env and extractor retries 2026-03-15 12:39:27 -07:00
Nick Sweeting
1fc860e901 Remove legacy binary override coercion 2026-03-15 11:45:04 -07:00
Nick Sweeting
f92ca93ae9 Skip puppeteer browser download during package install 2026-03-15 11:39:43 -07:00
Nick Sweeting
7c55259ed0 Update title HTML test for search export 2026-03-15 11:17:58 -07:00
Nick Sweeting
86fdc3be1e Refresh worker config from resolved plugin installs 2026-03-15 11:07:55 -07:00
Nick Sweeting
47f540c094 Resolve crawl provider dependencies lazily 2026-03-15 10:18:49 -07:00
Nick Sweeting
d4be507a6b Keep provider plugins enabled under whitelists 2026-03-15 09:49:45 -07:00