606 Commits

Author SHA1 Message Date
Nick Sweeting
f3622d8cd3 update working changes 2026-03-25 05:36:07 -07:00
Nick Sweeting
50286d3c38 Reuse cached binaries in archivebox runtime 2026-03-24 11:03:43 -07:00
Nick Sweeting
39450111dd Update CI uv handling and runner changes 2026-03-23 13:27:23 -07:00
Nick Sweeting
25f935b9d1 split CrawlSetup into Install phase with new Binary + BinaryRequest events 2026-03-23 13:15:41 -07:00
Nick Sweeting
b749b26c5d wip 2026-03-23 03:58:32 -07:00
Nick Sweeting
f400a2cd67 WIP: checkpoint working tree before rebasing onto dev 2026-03-22 20:25:18 -07:00
Nick Sweeting
a6548df8d0 Add configurable server security modes (#1773)
Fixes https://github.com/ArchiveBox/ArchiveBox/issues/239

## Summary
- add `SERVER_SECURITY_MODE` presets for safe subdomain replay, safe
one-domain no-JS replay, unsafe one-domain no-admin, and dangerous
one-domain full replay
- make host routing, replay URLs, static serving, and control-plane
access mode-aware
- add strict routing/header coverage plus a browser-backed
Chrome/Puppeteer test that verifies real same-origin behavior in all
four modes

## Testing
- `uv run pytest archivebox/tests/test_urls.py -v`
- `uv run pytest archivebox/tests/test_admin_views.py -v`
- `uv run pytest archivebox/tests/test_server_security_browser.py -v`

<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/archivebox/archivebox/pull/1773"
target="_blank">
  <picture>
<source media="(prefers-color-scheme: dark)"
srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img
src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1"
alt="Open with Devin">
  </picture>
</a>
<!-- devin-review-badge-end -->


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds configurable server security modes to isolate admin/API from
archived content, with a safe subdomain default and single-domain
fallbacks. Routing, replay endpoints, headers, and middleware are
mode-aware, with browser tests validating same-origin behavior.

- New Features
- Introduced SERVER_SECURITY_MODE with presets:
safe-subdomains-fullreplay (default), safe-onedomain-nojsreplay,
unsafe-onedomain-noadmin, danger-onedomain-fullreplay.
- Mode-aware routing and base URLs; one-domain modes use path-based
replay: /snapshot/<id>/... and /original/<domain>/....
- Control plane gate: block admin/API and non-GET methods in
unsafe-onedomain-noadmin; allow full access in
danger-onedomain-fullreplay.
- Safer replay: detect risky HTML/SVG and apply CSP sandbox (no scripts)
in safe-onedomain-nojsreplay; add X-ArchiveBox-Security-Mode and
X-Content-Type-Options: nosniff on replay responses.
- Middleware and serving: added ServerSecurityModeMiddleware, improved
HostRouting, and static server byte-range/CSP handling.
- Tests: added Chrome/Puppeteer browser tests and stricter URL routing
tests covering all modes.

- Migration
- Default requires wildcard subdomains for full isolation (admin., web.,
api., and snapshot-id.<base>).
- To run on one domain, set SERVER_SECURITY_MODE to a one-domain preset;
URLs switch to /snapshot/<id>/ and /original/<domain>/ paths.
- For production, prefer safe-subdomains-fullreplay; lower-security
modes print a startup warning.

<sup>Written for commit ad41b15581.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-03-22 20:17:21 -07:00
Nick Sweeting
c87079aa0a Refactor ArchiveBox onto abx-dl bus runner 2026-03-21 11:47:57 -07:00
Nick Sweeting
ad41b15581 Add configurable server security modes 2026-03-15 23:34:40 -07:00
Nick Sweeting
57e11879ec cleanup archivebox tests 2026-03-15 22:09:56 -07:00
Nick Sweeting
9de084da65 bump package versions 2026-03-15 20:47:28 -07:00
Nick Sweeting
bc21d4bfdb type and test fixes 2026-03-15 20:12:27 -07:00
Nick Sweeting
3889eb4efa Tighten config and admin typing 2026-03-15 19:49:52 -07:00
Nick Sweeting
44cabac8d0 fix typing 2026-03-15 19:47:36 -07:00
Nick Sweeting
934e02695b fix lint 2026-03-15 18:45:29 -07:00
Nick Sweeting
70c9358cf9 Improve scheduling, runtime paths, and API behavior 2026-03-15 18:31:56 -07:00
Nick Sweeting
7d42c6c8b5 bump versions and fix docs 2026-03-15 17:43:07 -07:00
Nick Sweeting
e598614b05 Avoid filesystem lookups in snapshot admin list 2026-03-15 17:18:53 -07:00
Nick Sweeting
31e883ec53 Stabilize plugin and crawl integration tests 2026-03-15 08:16:52 -07:00
Nick Sweeting
1f792d7199 Restore CLI compat and plugin dependency handling 2026-03-15 06:06:18 -07:00
Nick Sweeting
c4d30a853f Restore index-only snapshot output links 2026-03-15 04:58:46 -07:00
Nick Sweeting
4fa701fafe Update abx dependencies and plugin test harness 2026-03-15 04:37:32 -07:00
Nick Sweeting
ecb1764590 switch to external plugins 2026-03-15 03:46:23 -07:00
Nick Sweeting
07dc880d0b Harden AddView config overrides to admin-only 2026-03-15 03:45:57 -07:00
Your Name
08b0dfaf12 Fix #1139: Return tags as a JSON list in Snapshot.to_dict() for LLM/RAG integration
Previously, `archivebox search --json` exported tags as a comma-separated
string (e.g. "tag1,tag2"), which required manual parsing by consumers like
LlamaIndex, LangChain, and other RAG frameworks.

Now `to_dict()` returns tags as a proper JSON array (e.g. ["tag1", "tag2"]),
making the export directly usable as structured metadata in LLM/RAG pipelines
without additional preprocessing.

`from_json()` is updated to accept both list and string formats for backward
compatibility with existing JSON imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 21:21:38 -08:00
Nick Sweeting
ec4b27056e wip 2026-01-21 03:19:56 -08:00
Nick Sweeting
f3f55d3395 perfect snapshot detail cards 2026-01-19 14:56:15 -08:00
Nick Sweeting
86e7973334 cleanup tui, startup, card templtes, and more 2026-01-19 14:33:20 -08:00
Nick Sweeting
c7b2217cd6 tons of fixes with codex 2026-01-19 01:00:53 -08:00
claude[bot]
c2bb4b25cb Implement native LDAP authentication support
- Create archivebox/config/ldap.py with LDAPConfig class
- Create archivebox/ldap/ Django app with custom auth backend
- Update core/settings.py to conditionally load LDAP when enabled
- Add LDAP_CREATE_SUPERUSER support to auto-grant superuser privileges
- Add comprehensive tests in test_auth_ldap.py (no mocks, no skips)
- LDAP only activates if django-auth-ldap is installed and LDAP_ENABLED=True
- Helpful error messages when LDAP libraries are missing or config is incomplete

Fixes #1664

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2026-01-05 21:30:26 +00:00
Nick Sweeting
7ceaeae2d9 rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through 2026-01-04 22:38:15 -08:00
Nick Sweeting
456aaee287 more migration id/uuid and config propagation fixes 2026-01-04 16:16:26 -08:00
Nick Sweeting
839ae744cf simplify entrypoints for orchestrator and workers 2026-01-04 13:17:07 -08:00
Nick Sweeting
3da523fc74 more consistent crawl, snapshot, hook cleanup and Process tracking 2026-01-02 04:27:38 -08:00
Nick Sweeting
dd77511026 unified Process source of truth and better screenshot tests 2026-01-02 04:20:34 -08:00
Nick Sweeting
3672174dad fix transition mid transition 2026-01-02 00:24:44 -08:00
Nick Sweeting
65ee09ceab move tests into subfolder, add missing install hooks 2026-01-02 00:22:07 -08:00
Nick Sweeting
9008cefca2 codecov, migrations, orchestrator fixes 2026-01-01 16:57:04 -08:00
Nick Sweeting
60422adc87 fix orchestrator statemachine and Process from archiveresult migrations 2026-01-01 16:43:02 -08:00
Nick Sweeting
876feac522 actually working migration path from 0.7.2 and 0.8.6 + renames and test coverage 2026-01-01 15:50:00 -08:00
Nick Sweeting
6fadcf5168 remove model health stats from models that dont need it 2026-01-01 15:50:00 -08:00
Nick Sweeting
f7457b13ad more migrations fixes attempts 2025-12-31 17:46:10 -08:00
Nick Sweeting
b08f60a267 Add thumbnail previews to live progress header (#1753)
Show small thumbnails of recently completed ArchiveResult content in the
progress header. The thumbnail strip appears below the stats bar and
shows the last 20 successfully archived items with embeddable content
(screenshots, favicons, DOM snapshots, etc.).

Features:
- API returns recent_thumbnails with embed paths for succeeded results
- Thumbnails display with plugin-specific icons as fallback
- New thumbnails animate in with a pop effect
- Clicking a thumbnail navigates to the snapshot admin page
- Horizontal scrollable strip with custom scrollbar styling

<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->

# Summary

<!--e.g. This PR fixes ABC or adds the ability to do XYZ...-->

# Related issues

<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->

# Changes these areas

- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds a thumbnail strip to the live progress header. It shows previews of
the last 20 successful archived items for quick visual feedback and
one-click navigation.

- **New Features**
- API returns recent_thumbnails with embed paths for succeeded results.
  - Horizontal, scrollable thumbnail strip under the header.
  - Uses preview images when available; plugin icons as fallback.
  - New thumbnails animate in with a pop effect.
  - Clicking a thumbnail opens the snapshot admin page.

<sup>Written for commit 17029ba8b8.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2025-12-31 17:46:00 -08:00
Nick Sweeting
1c7b0cb2d3 working migrations again 2025-12-31 16:19:50 -08:00
Nick Sweeting
6521e7ddda more migrations fixes 2025-12-31 16:10:56 -08:00
Nick Sweeting
a04e4a7345 cleanup migrations, json, jsonl 2025-12-31 15:36:43 -08:00
Claude
17029ba8b8 Add thumbnail strip to live progress monitor
Show small thumbnails of recently completed ArchiveResult content in the
progress header. The thumbnail strip appears below the stats bar and shows
the last 20 successfully archived items with embeddable content (screenshots,
favicons, DOM snapshots, etc.).

Features:
- API returns recent_thumbnails with embed paths for succeeded results
- Thumbnails display with plugin-specific icons as fallback
- New thumbnails animate in with a pop effect
- Clicking a thumbnail navigates to the snapshot admin page
- Horizontal scrollable strip with custom scrollbar styling
2025-12-31 20:38:55 +00:00
Nick Sweeting
73fde81fce more migrations tweaks 2025-12-31 12:34:31 -08:00
Nick Sweeting
72f6a91b31 more progress bar and migrations fixes 2025-12-31 12:34:31 -08:00
Nick Sweeting
d5c0c64dcd fix progress bars 2025-12-31 12:34:29 -08:00