2261 Commits

Author SHA1 Message Date
Nick Sweeting
d4be507a6b Keep provider plugins enabled under whitelists 2026-03-15 09:49:45 -07:00
Nick Sweeting
82bfd7e655 Filter binary hooks by allowed providers 2026-03-15 09:32:32 -07:00
Nick Sweeting
941135d6d0 Bound URL fixture archive wait 2026-03-15 09:07:25 -07:00
Nick Sweeting
50901e5367 Align worker config propagation expectations 2026-03-15 08:47:00 -07:00
Nick Sweeting
31e883ec53 Stabilize plugin and crawl integration tests 2026-03-15 08:16:52 -07:00
Nick Sweeting
bfc1e76ff5 Update extractor tests for plugin output dirs 2026-03-15 07:32:11 -07:00
Nick Sweeting
b62064f63e Avoid recursive crawl timeout regressions 2026-03-15 07:09:15 -07:00
Nick Sweeting
5fb3709281 Run recursive crawl tests to completion 2026-03-15 06:55:35 -07:00
Nick Sweeting
68b9f75dab Stabilize recursive crawl CI coverage 2026-03-15 06:49:40 -07:00
Nick Sweeting
760cf9d6b2 Stabilize CI against expanded plugin surface 2026-03-15 06:31:41 -07:00
Nick Sweeting
1f792d7199 Restore CLI compat and plugin dependency handling 2026-03-15 06:06:18 -07:00
Nick Sweeting
6b482c62df Restore top-level list command compatibility 2026-03-15 05:04:31 -07:00
Nick Sweeting
c4d30a853f Restore index-only snapshot output links 2026-03-15 04:58:46 -07:00
Nick Sweeting
cc3e72b92f Preserve tags for index-only adds 2026-03-15 04:54:55 -07:00
Nick Sweeting
58f801c220 Fix update orphan import and host-aware tests 2026-03-15 04:51:06 -07:00
Nick Sweeting
4fa701fafe Update abx dependencies and plugin test harness 2026-03-15 04:37:32 -07:00
Nick Sweeting
ecb1764590 switch to external plugins 2026-03-15 03:46:23 -07:00
Nick Sweeting
07dc880d0b Harden AddView config overrides to admin-only 2026-03-15 03:45:57 -07:00
Your Name
08b0dfaf12 Fix #1139: Return tags as a JSON list in Snapshot.to_dict() for LLM/RAG integration
Previously, `archivebox search --json` exported tags as a comma-separated
string (e.g. "tag1,tag2"), which required manual parsing by consumers like
LlamaIndex, LangChain, and other RAG frameworks.

Now `to_dict()` returns tags as a proper JSON array (e.g. ["tag1", "tag2"]),
making the export directly usable as structured metadata in LLM/RAG pipelines
without additional preprocessing.

`from_json()` is updated to accept both list and string formats for backward
compatibility with existing JSON imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 21:21:38 -08:00
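The tag handling described in 08b0dfaf12 can be illustrated with a minimal sketch. The helper name `tags_to_list` is hypothetical and is not taken from the ArchiveBox codebase; it only shows how a reader might accept both the legacy comma-separated string and the new JSON list.

```python
import json

def tags_to_list(tags):
    """Accept either a list of tags or a legacy comma-separated string.

    Hypothetical helper for illustration only.
    """
    if tags is None:
        return []
    if isinstance(tags, str):
        tags = tags.split(",")
    # Drop empty entries and surrounding whitespace.
    return [t.strip() for t in tags if t and t.strip()]

# Old export format (string) and new export format (JSON array).
legacy = json.loads('{"url": "https://example.com", "tags": "tag1,tag2"}')
current = json.loads('{"url": "https://example.com", "tags": ["tag1", "tag2"]}')

assert tags_to_list(legacy["tags"]) == ["tag1", "tag2"]
assert tags_to_list(current["tags"]) == ["tag1", "tag2"]
```

Normalizing on read means that exports produced before and after this commit can be consumed by the same code path.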
Nick Sweeting
17e26ae5a4 Delete TEST_RESULTS.md 2026-02-09 18:23:35 -08:00
Pellaeon Lin
1ca54525f2 FIX: uuid_compat 2026-01-31 08:24:50 +00:00
Nick Sweeting
ec4b27056e wip 2026-01-21 03:19:56 -08:00
Nick Sweeting
f3f55d3395 perfect snapshot detail cards 2026-01-19 14:56:15 -08:00
Nick Sweeting
86e7973334 cleanup tui, startup, card templates, and more 2026-01-19 14:33:20 -08:00
Nick Sweeting
bef67760db working singlefile 2026-01-19 03:05:49 -08:00
Nick Sweeting
b5bbc3b549 better tui 2026-01-19 01:53:32 -08:00
Nick Sweeting
1cb2d5070e bump version 2026-01-19 01:11:59 -08:00
Nick Sweeting
c7b2217cd6 tons of fixes with codex 2026-01-19 01:00:53 -08:00
claude[bot]
c2bb4b25cb Implement native LDAP authentication support
- Create archivebox/config/ldap.py with LDAPConfig class
- Create archivebox/ldap/ Django app with custom auth backend
- Update core/settings.py to conditionally load LDAP when enabled
- Add LDAP_CREATE_SUPERUSER support to auto-grant superuser privileges
- Add comprehensive tests in test_auth_ldap.py (no mocks, no skips)
- LDAP only activates if django-auth-ldap is installed and LDAP_ENABLED=True
- Helpful error messages when LDAP libraries are missing or config is incomplete

Fixes #1664

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2026-01-05 21:30:26 +00:00
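The activation rule stated in c2bb4b25cb (LDAP is used only when `LDAP_ENABLED=True` and django-auth-ldap is installed) could be checked roughly as follows. This is a sketch under assumptions: the function name `ldap_is_active` and its `module` parameter are hypothetical, and the real settings code may raise an error instead of returning `False` when the library is missing.

```python
import importlib.util

def ldap_is_active(ldap_enabled, module="django_auth_ldap"):
    """Return True only if LDAP is enabled AND the auth library is importable.

    `module` defaults to the import name of the django-auth-ldap package;
    it is a parameter here only so the check can be exercised in isolation.
    """
    if not ldap_enabled:
        return False
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module) is not None

# Disabled in config: never active, regardless of installed packages.
assert ldap_is_active(False) is False
```

Checking for the module with `find_spec` avoids importing it, so installations without LDAP libraries pay no cost at startup.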
Nick Sweeting
28b980a84a higher timeout 2026-01-05 09:07:59 -08:00
Nick Sweeting
352e1bad32 remove debug lines 2026-01-05 02:27:34 -08:00
Nick Sweeting
0a2ac11b01 more binary fixes 2026-01-05 02:26:33 -08:00
Nick Sweeting
b80e80439d more binary fixes 2026-01-05 02:18:38 -08:00
Nick Sweeting
7ceaeae2d9 rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through 2026-01-04 22:38:15 -08:00
Nick Sweeting
456aaee287 more migration id/uuid and config propagation fixes 2026-01-04 16:16:26 -08:00
Nick Sweeting
839ae744cf simplify entrypoints for orchestrator and workers 2026-01-04 13:17:07 -08:00
Nick Sweeting
5449971777 better kill tree 2026-01-02 04:33:41 -08:00
Nick Sweeting
3da523fc74 more consistent crawl, snapshot, hook cleanup and Process tracking 2026-01-02 04:27:38 -08:00
Nick Sweeting
dd77511026 unified Process source of truth and better screenshot tests 2026-01-02 04:20:34 -08:00
Nick Sweeting
3672174dad fix transition mid transition 2026-01-02 00:24:44 -08:00
Nick Sweeting
65ee09ceab move tests into subfolder, add missing install hooks 2026-01-02 00:22:07 -08:00
Nick Sweeting
c2afb40350 fix lib bin dir and archivebox add hanging 2026-01-01 16:58:47 -08:00
Nick Sweeting
9008cefca2 codecov, migrations, orchestrator fixes 2026-01-01 16:57:04 -08:00
Nick Sweeting
60422adc87 fix orchestrator statemachine and Process from archiveresult migrations 2026-01-01 16:43:02 -08:00
Nick Sweeting
876feac522 actually working migration path from 0.7.2 and 0.8.6 + renames and test coverage 2026-01-01 15:50:00 -08:00
Nick Sweeting
6fadcf5168 remove model health stats from models that don't need it 2026-01-01 15:50:00 -08:00
Nick Sweeting
e903fa1d2b Fix: Make SingleFile use SINGLEFILE_CHROME_ARGS with fallback to CHROME_ARGS (#1754)
Fixes #1445

This PR resolves the issue where SingleFile was not respecting Chrome
user data directory and other Chrome launch options that work for other
Chrome-based extractors (PDF, Screenshot, etc.).

## Changes
- Added `SINGLEFILE_CHROME_ARGS` config option with fallback to
`CHROME_ARGS`
- Updated SingleFile extractor to pass Chrome arguments via
`--browser-args`
- Updated documentation

This ensures SingleFile respects the same Chrome configuration as other
Chrome-based extractors.

Generated with [Claude Code](https://claude.ai/code)
2026-01-01 14:34:05 -08:00
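The fallback described in e903fa1d2b can be sketched as below. The function names are hypothetical, the config is modeled as a plain dict, and two details are assumptions rather than facts from the commit: that only an unset value triggers the fallback, and that the value given to `--browser-args` is encoded as a JSON array.

```python
import json

def resolve_singlefile_chrome_args(config):
    """Return SINGLEFILE_CHROME_ARGS if set, otherwise fall back to CHROME_ARGS."""
    args = config.get("SINGLEFILE_CHROME_ARGS")
    if args is None:  # assumption: only an unset value triggers the fallback
        args = config.get("CHROME_ARGS", [])
    return list(args)

def singlefile_browser_args(config):
    """Build the command-line fragment (JSON-array encoding is an assumption)."""
    args = resolve_singlefile_chrome_args(config)
    return ["--browser-args", json.dumps(args)] if args else []

shared = {"CHROME_ARGS": ["--user-data-dir=/data/chrome"]}
override = {**shared, "SINGLEFILE_CHROME_ARGS": ["--headless=new"]}

assert resolve_singlefile_chrome_args(shared) == ["--user-data-dir=/data/chrome"]
assert resolve_singlefile_chrome_args(override) == ["--headless=new"]
```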
Nick Sweeting
f7457b13ad more migrations fixes attempts 2025-12-31 17:46:10 -08:00
Nick Sweeting
b08f60a267 Add thumbnail previews to live progress header (#1753)
Show small thumbnails of recently completed ArchiveResult content in the
progress header. The thumbnail strip appears below the stats bar and
shows the last 20 successfully archived items with embeddable content
(screenshots, favicons, DOM snapshots, etc.).

Features:
- API returns recent_thumbnails with embed paths for succeeded results
- Thumbnails display with plugin-specific icons as fallback
- New thumbnails animate in with a pop effect
- Clicking a thumbnail navigates to the snapshot admin page
- Horizontal scrollable strip with custom scrollbar styling

---
## Summary by cubic
Adds a thumbnail strip to the live progress header. It shows previews of
the last 20 successful archived items for quick visual feedback and
one-click navigation.

- **New Features**
  - API returns recent_thumbnails with embed paths for succeeded results.
  - Horizontal, scrollable thumbnail strip under the header.
  - Uses preview images when available; plugin icons as fallback.
  - New thumbnails animate in with a pop effect.
  - Clicking a thumbnail opens the snapshot admin page.

Written for commit 17029ba8b8.
2025-12-31 17:46:00 -08:00
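The selection rule in b08f60a267 (the last 20 succeeded results that have embeddable content) might look like the sketch below. The `Result` dataclass and its field names are hypothetical stand-ins for the real ArchiveResult model, and the returned dict shape is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    # Hypothetical stand-in for an ArchiveResult row.
    plugin: str
    status: str
    embed_path: Optional[str]
    end_ts: float

def recent_thumbnails(results, limit=20):
    """Return the newest `limit` succeeded results that have an embed path."""
    eligible = [r for r in results if r.status == "succeeded" and r.embed_path]
    eligible.sort(key=lambda r: r.end_ts, reverse=True)  # newest first
    return [{"plugin": r.plugin, "embed_path": r.embed_path} for r in eligible[:limit]]

sample = [Result("screenshot", "succeeded", f"shot{i}.png", float(i)) for i in range(25)]
sample.append(Result("wget", "failed", "index.html", 100.0))  # excluded: not succeeded
sample.append(Result("title", "succeeded", None, 101.0))      # excluded: nothing to embed

assert len(recent_thumbnails(sample)) == 20
```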
Nick Sweeting
2e2bc31e6c Remove redundant chrome_validate hook, rename wget_validate to wget_install (#1752)

- Delete chrome/on_Crawl__10_chrome_validate.py (duplicates
chrome_install)
- Rename wget/on_Crawl__11_wget_validate.py →
on_Crawl__06_wget_install.py

All hooks now follow consistent naming: install, launch, or config

---
## Summary by cubic
Removed the redundant Chrome validate hook, renamed the Wget validate
hook to wget_install, and standardized hook names and priorities to
match the install/launch/config lifecycle. This removes duplicate logic
and fixes priority conflicts across Crawl, Binary, and Snapshot hooks.

- **Refactors**
  - Deleted chrome/on_Crawl__10_chrome_validate.py (dup of chrome_install)
  - Renamed wget validate to on_Crawl__06_wget_install.py
  - Standardized on_Binary hook priorities: npm 10, pip 11, brew 12, apt 13, custom 14, env 15
  - Fixed on_Snapshot order: staticfile 32, readability 56, mercury 57, htmltotext 58

Written for commit 09a1ca3134.
2025-12-31 17:42:21 -08:00
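The hook filenames named in 2e2bc31e6c follow an `on_<Event>__<priority>_<name>.py` pattern. The sketch below parses that pattern and orders hooks by priority. The regular expression and function names are hypothetical, and the assumption that lower numbers run first is inferred from the numbering, not confirmed by the commit.

```python
import re

# Pattern inferred from filenames such as on_Crawl__06_wget_install.py.
HOOK_RE = re.compile(r"^on_(?P<event>[A-Za-z]+)__(?P<priority>\d+)_(?P<name>\w+)\.py$")

def parse_hook(filename):
    """Split a hook filename into (event, priority, name)."""
    m = HOOK_RE.match(filename)
    if m is None:
        raise ValueError(f"not a hook filename: {filename!r}")
    return m["event"], int(m["priority"]), m["name"]

def order_hooks(filenames):
    """Sort hooks by numeric priority (assumption: lower runs first)."""
    return sorted(filenames, key=lambda f: parse_hook(f)[1])

assert parse_hook("on_Crawl__06_wget_install.py") == ("Crawl", 6, "wget_install")
```

Parsing the priority as an integer keeps the ordering numeric, so `06` sorts before `10` even if the zero padding is later dropped.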