Commit Graph

61 Commits

Author SHA1 Message Date
Nick Sweeting
7d42c6c8b5 bump versions and fix docs 2026-03-15 17:43:07 -07:00
Nick Sweeting
e598614b05 Avoid filesystem lookups in snapshot admin list 2026-03-15 17:18:53 -07:00
Nick Sweeting
1d16038ceb Relax archive output readiness check 2026-03-15 13:31:05 -07:00
Nick Sweeting
957387fd88 Fix plugin hook env and extractor retries 2026-03-15 12:39:27 -07:00
Nick Sweeting
f92ca93ae9 Skip puppeteer browser download during package install 2026-03-15 11:39:43 -07:00
Nick Sweeting
7c55259ed0 Update title HTML test for search export 2026-03-15 11:17:58 -07:00
Nick Sweeting
86fdc3be1e Refresh worker config from resolved plugin installs 2026-03-15 11:07:55 -07:00
Nick Sweeting
47f540c094 Resolve crawl provider dependencies lazily 2026-03-15 10:18:49 -07:00
Nick Sweeting
d4be507a6b Keep provider plugins enabled under whitelists 2026-03-15 09:49:45 -07:00
Nick Sweeting
82bfd7e655 Filter binary hooks by allowed providers 2026-03-15 09:32:32 -07:00
Nick Sweeting
941135d6d0 Bound URL fixture archive wait 2026-03-15 09:07:25 -07:00
Nick Sweeting
50901e5367 Align worker config propagation expectations 2026-03-15 08:47:00 -07:00
Nick Sweeting
31e883ec53 Stabilize plugin and crawl integration tests 2026-03-15 08:16:52 -07:00
Nick Sweeting
bfc1e76ff5 Update extractor tests for plugin output dirs 2026-03-15 07:32:11 -07:00
Nick Sweeting
b62064f63e Avoid recursive crawl timeout regressions 2026-03-15 07:09:15 -07:00
Nick Sweeting
5fb3709281 Run recursive crawl tests to completion 2026-03-15 06:55:35 -07:00
Nick Sweeting
68b9f75dab Stabilize recursive crawl CI coverage 2026-03-15 06:49:40 -07:00
Nick Sweeting
760cf9d6b2 Stabilize CI against expanded plugin surface 2026-03-15 06:31:41 -07:00
Nick Sweeting
1f792d7199 Restore CLI compat and plugin dependency handling 2026-03-15 06:06:18 -07:00
Nick Sweeting
6b482c62df Restore top-level list command compatibility 2026-03-15 05:04:31 -07:00
Nick Sweeting
58f801c220 Fix update orphan import and host-aware tests 2026-03-15 04:51:06 -07:00
Nick Sweeting
4fa701fafe Update abx dependencies and plugin test harness 2026-03-15 04:37:32 -07:00
Nick Sweeting
ecb1764590 switch to external plugins 2026-03-15 03:46:23 -07:00
Nick Sweeting
ec4b27056e wip 2026-01-21 03:19:56 -08:00
Nick Sweeting
c7b2217cd6 tons of fixes with codex 2026-01-19 01:00:53 -08:00
claude[bot]
c2bb4b25cb Implement native LDAP authentication support
- Create archivebox/config/ldap.py with LDAPConfig class
- Create archivebox/ldap/ Django app with custom auth backend
- Update core/settings.py to conditionally load LDAP when enabled
- Add LDAP_CREATE_SUPERUSER support to auto-grant superuser privileges
- Add comprehensive tests in test_auth_ldap.py (no mocks, no skips)
- LDAP only activates if django-auth-ldap is installed and LDAP_ENABLED=True
- Helpful error messages when LDAP libraries are missing or config is incomplete

Fixes #1664

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2026-01-05 21:30:26 +00:00
Nick Sweeting
28b980a84a higher timeout 2026-01-05 09:07:59 -08:00
Nick Sweeting
352e1bad32 remove debug lines 2026-01-05 02:27:34 -08:00
Nick Sweeting
b80e80439d more binary fixes 2026-01-05 02:18:38 -08:00
Nick Sweeting
7ceaeae2d9 rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through 2026-01-04 22:38:15 -08:00
Nick Sweeting
456aaee287 more migration id/uuid and config propagation fixes 2026-01-04 16:16:26 -08:00
Nick Sweeting
dd77511026 unified Process source of truth and better screenshot tests 2026-01-02 04:20:34 -08:00
Nick Sweeting
65ee09ceab move tests into subfolder, add missing install hooks 2026-01-02 00:22:07 -08:00
Nick Sweeting
60422adc87 fix orchestrator statemachine and Process from archiveresult migrations 2026-01-01 16:43:02 -08:00
Nick Sweeting
20690fabbf Fix CLI tests to use subprocess and remove mocks (#1746) 2025-12-31 10:20:50 -08:00
Claude
2e6dcb2b87 Improve admin snapshot list/grid views with better UX
- Add prominent view mode switcher with List/Grid toggle buttons
- Improve filter sidebar CSS with modern styling, rounded corners
- Add live progress bar for in-progress snapshots showing hooks status
- Show plugin icons only when output directory has content
- Display archive result output_size sum from new field
- Show hooks succeeded/total count in size column
- Add get_progress_stats() method to Snapshot model
- Add CSS for progress spinner and status badges
- Update grid view template with progress indicator for archiving cards
- Add tests for admin views, search, and progress stats
2025-12-31 11:28:03 +00:00
Claude
b87bbbbecb Fix CLI tests to use subprocess and remove mocks
- Fix conftest.py: use subprocess for init, remove unused cli_env fixture
- Update all test files to use data_dir parameter instead of env
- Remove mock-based TestJSONLOutput class from tests_piping.py
- Remove unused imports (MagicMock, patch)
- Fix file permissions for cli_utils.py

All tests now use real subprocess calls per CLAUDE.md guidelines:
- NO MOCKS - tests exercise real code paths
- NO SKIPS - every test runs
2025-12-31 10:53:45 +00:00
Claude
bb52b5902a Add unit tests for JSONL CLI pipeline commands (Phase 5 & 6)
Add comprehensive unit tests for the CLI piping architecture:
- test_cli_crawl.py: crawl create/list/update/delete tests
- test_cli_snapshot.py: snapshot create/list/update/delete tests
- test_cli_archiveresult.py: archiveresult create/list/update/delete tests
- test_cli_run.py: run command create-or-update and pass-through tests

Extend tests_piping.py with:
- TestPassThroughBehavior: tests for pass-through behavior in all commands
- TestPipelineAccumulation: tests for accumulating records through pipeline

All tests use pytest fixtures from conftest.py with isolated DATA_DIR.
2025-12-31 10:21:05 +00:00
Claude
f3e11b61fd Implement JSONL CLI pipeline architecture (Phases 1-4, 6)
Phase 1: Model Prerequisites
- Add ArchiveResult.from_json() and from_jsonl() methods
- Fix Snapshot.to_json() to use tags_str (consistent with Crawl)

Phase 2: Shared Utilities
- Create archivebox/cli/cli_utils.py with shared apply_filters()
- Update 7 CLI files to import from cli_utils.py instead of duplicating

Phase 3: Pass-Through Behavior
- Add pass-through to crawl create (non-Crawl records pass unchanged)
- Add pass-through to snapshot create (Crawl records + others pass through)
- Add pass-through to archiveresult create (Snapshot records + others)
- Add create-or-update behavior to run command:
  - Records WITHOUT id: Create via Model.from_json()
  - Records WITH id: Lookup existing, re-queue
  - Outputs JSONL of all processed records for chaining

Phase 4: Test Infrastructure
- Create archivebox/tests/conftest.py with pytest-django fixtures
- Include CLI helpers, output assertions, database assertions

Phase 6: Config Update
- Update supervisord_util.py: orchestrator -> run command

This enables Unix-style piping:
  archivebox crawl create URL | archivebox run
  archivebox archiveresult list --status=failed | archivebox run
  curl API | jq transform | archivebox crawl create | archivebox run
2025-12-31 10:07:14 +00:00
Claude
a5654e877f rename media plugin to ytdlp with backwards-compatible aliases
- Rename archivebox/plugins/media/ → archivebox/plugins/ytdlp/
- Rename hook script on_Snapshot__63_media.bg.py → on_Snapshot__63_ytdlp.bg.py
- Update config.json: YTDLP_* as primary keys, MEDIA_* as x-aliases
- Update templates CSS classes: media-* → ytdlp-*
- Fix gallerydl bug: remove incorrect dependency on media plugin output
- Update all codebase references to use YTDLP_* and SAVE_YTDLP
- Add backwards compatibility test for MEDIA_ENABLED alias
2025-12-29 19:09:05 +00:00
Nick Sweeting
30c60eef76 much better tests and add page ui 2025-12-29 04:02:11 -08:00
Nick Sweeting
f4e7820533 use full dotted paths for all archivebox imports, add migrations and more fixes 2025-12-29 00:47:08 -08:00
Nick Sweeting
f0aa19fa7d wip 2025-12-28 17:51:54 -08:00
Nick Sweeting
bd265c0083 rename extractor to plugin everywhere 2025-12-28 04:43:15 -08:00
Nick Sweeting
50e527ec65 way better plugin hooks system wip 2025-12-28 03:39:59 -08:00
Nick Sweeting
a38624a4dd Improve filesystem based hook architecture (#1720)
<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->

# Summary

<!--e.g. This PR fixes ABC or adds the ability to do XYZ...-->

# Related issues

<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->

# Changes these areas

- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk
2025-12-27 13:03:21 -08:00
Claude
d65eb587d9 Add hook architecture unit tests + mark remaining work complete
- Add test_hooks.py with 31 unit tests covering:
  - Background hook detection (.bg. suffix)
  - JSONL parsing (clean format and legacy RESULT_JSON= format)
  - Install hook XYZ_BINARY env var handling
  - Hook discovery and sorting
  - get_extractor_name() function
  - Hook execution with real subprocesses
  - Install hook output format compliance
  - Snapshot hook output format compliance
  - Plugin metadata addition

- Update TODO_hook_architecture.md to mark all tasks complete:
  - Tests: 31 tests in archivebox/tests/test_hooks.py
  - Migrations: 0029 and 0030 applied successfully

All phases of the hook architecture implementation are now complete.
2025-12-27 20:05:09 +00:00
Nick Sweeting
cffbef84ed make Claude.md stricter and improve migration tests 2025-12-27 00:33:51 -08:00
Nick Sweeting
35dd9acafe implement fs_version migrations 2025-12-27 00:25:35 -08:00
Claude
779040db1b Split migration tests into separate files and tighten assertions
- Split tests_migrations.py into focused test modules:
  - test_migrations_helpers.py: schemas, seeding functions, verification helpers
  - test_migrations_fresh.py: fresh install tests (12 tests)
  - test_migrations_04_to_09.py: 0.4.x migration tests (9 tests)
  - test_migrations_07_to_09.py: 0.7.x migration tests (19 tests)
  - test_migrations_08_to_09.py: 0.8.x migration tests (21 tests)

- Tighten all assertions:
  - init command now requires returncode == 0 (not [0, 1])
  - verify_all_snapshots_in_output checks ALL snapshots appear (not just one)
  - verify_tag_count uses exact match (not >=)
  - verify_snapshot_titles checks all URLs exist

- All 61 tests pass with strict assertions
- No mocks, no skips - real subprocess tests against real sqlite databases
2025-12-27 05:09:36 +00:00