Commit Graph

4833 Commits

Author SHA1 Message Date
Claude
d65eb587d9 Add hook architecture unit tests + mark remaining work complete
- Add test_hooks.py with 31 unit tests covering:
  - Background hook detection (.bg. suffix)
  - JSONL parsing (clean format and legacy RESULT_JSON= format)
  - Install hook XYZ_BINARY env var handling
  - Hook discovery and sorting
  - get_extractor_name() function
  - Hook execution with real subprocesses
  - Install hook output format compliance
  - Snapshot hook output format compliance
  - Plugin metadata addition

- Update TODO_hook_architecture.md to mark all tasks complete:
  - Tests: 31 tests in archivebox/tests/test_hooks.py
  - Migrations: 0029 and 0030 applied successfully

All phases of the hook architecture implementation are now complete.
2025-12-27 20:05:09 +00:00
Claude
4e50c4f182 Mark snapshot hook checklist items as complete
All snapshot hooks now:
- Read XYZ_BINARY env vars and use in cmd
- Output exactly one clean JSONL line (no RESULT_JSON= prefix)
- No extra output lines (VERSION=, START_TS=, etc.)
- Only provide allowed fields
- Don't include computed fields
- Python hooks include cmd array with binary path
2025-12-27 10:14:14 +00:00
Claude
e3ba599812 Update install hooks to respect XYZ_BINARY env vars
- All install hooks now respect their respective XYZ_BINARY env vars
  (e.g., WGET_BINARY, CHROME_BINARY, YTDLP_BINARY, etc.)
- Support both absolute paths (/usr/bin/wget2) and binary names (wget2)
- Dynamic bin_name used in Dependency JSONL output
- Updated 11 install hooks to follow the new pattern
- Mark checklist items as complete in TODO_hook_architecture.md
2025-12-27 10:12:45 +00:00
Claude
8c846b7d1c Rename validate hooks to install hooks
- Rename 13 on_Crawl__00_validate_* hooks to on_Crawl__00_install_*
- This better reflects what these hooks actually do (check/install binaries)
- Update TODO_hook_architecture.md to reflect renamed hooks
2025-12-27 10:06:34 +00:00
Claude
2623c6cc11 Complete JS hooks to clean JSONL format + rename background hooks
- Update 12 remaining JS snapshot hooks to output clean JSONL
- Remove RESULT_JSON= prefix, START_TS=, END_TS=, STATUS= output
- Rename 3 background hooks with .bg. suffix:
  - consolelog -> on_Snapshot__21_consolelog.bg.js
  - ssl -> on_Snapshot__23_ssl.bg.js
  - responses -> on_Snapshot__24_responses.bg.js
- Update TODO_hook_architecture.md with completion status
2025-12-27 09:46:59 +00:00
Claude
c52eef1459 Update Python/JS hooks to clean JSONL format + add audit report
Phase 4 Plugin Audit Progress:
- Audited all 6 Dependency hooks (all already compliant)
- Audited all 11 Crawl Validate hooks (all already compliant)
- Updated 8 Python Snapshot hooks to clean JSONL format
- Updated 1 JS Snapshot hook (title.js) to clean JSONL format

Snapshot hooks updated to remove:
- RESULT_JSON= prefix
- Extra output lines (START_TS=, END_TS=, DURATION=, VERSION=, OUTPUT=, STATUS=)

Now output clean JSONL:
{"type": "ArchiveResult", "status": "...", "output_str": "..."}

Added implementation report to TODO_hook_architecture.md documenting:
- All completed phases (1, 3, 6, 7)
- Plugin audit results with status tables
- Remaining 13 JS hooks that need updating
- Files modified list
2025-12-27 09:31:03 +00:00
Claude
3d985fa8c8 Implement hook architecture with JSONL output support
Phase 1: Database migration for new ArchiveResult fields
- Add output_str (TextField) for human-readable summary
- Add output_json (JSONField) for structured metadata
- Add output_files (JSONField) for dict of {relative_path: {}}
- Add output_size (BigIntegerField) for total bytes
- Add output_mimetypes (CharField) for CSV of mimetypes
- Add binary FK to InstalledBinary (optional)
- Migrate existing 'output' field to new split fields

Phase 3: Update run_hook() for JSONL parsing
- Support new JSONL format (any line with {type: 'ModelName', ...})
- Maintain backwards compatibility with RESULT_JSON= format
- Add plugin metadata to each parsed record
- Detect background hooks with .bg. suffix in filename
- Add find_binary_for_cmd() helper function
- Add create_model_record() for processing side-effect records

Phase 6: Update ArchiveResult.run()
- Handle background hooks (return immediately when result is None)
- Process 'records' from HookResult for side-effect models
- Use new output fields (output_str, output_json, output_files, etc.)
- Call create_model_record() for InstalledBinary, Machine updates

Phase 7: Add background hook support
- Add is_background_hook() method to ArchiveResult
- Add check_background_completed() to check if process exited
- Add finalize_background_hook() to collect results from completed hooks
- Update SnapshotMachine.is_finished() to check/finalize background hooks
- Update _populate_output_fields() to walk directory and populate stats

Also updated references to old 'output' field in:
- admin_archiveresults.py
- statemachines.py
- templatetags/core_tags.py
2025-12-27 08:38:49 +00:00
Nick Sweeting
cffbef84ed make Claude.md stricter and improve migration tests 2025-12-27 00:33:51 -08:00
Nick Sweeting
35dd9acafe implement fs_version migrations 2025-12-27 00:25:35 -08:00
Nick Sweeting
6e892fb2b4 upgrade todos 2025-12-27 00:07:11 -08:00
Nick Sweeting
018f8d69cc Improve test suite and remove mock dependencies (#1719)
<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->

# Summary

<!--e.g. This PR fixes ABC or adds the ability to do XYZ...-->

# Related issues

<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->

# Changes these areas

- [x] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk
2025-12-26 22:18:56 -08:00
Claude
995d6c31b9 Add CLAUDE.md with development and testing guide 2025-12-27 06:17:55 +00:00
Claude
741c098a2b Merge remote-tracking branch 'origin/dev' into claude/improve-test-suite-xm6Bh 2025-12-27 05:53:06 +00:00
Claude
779040db1b Split migration tests into separate files and tighten assertions
- Split tests_migrations.py into focused test modules:
  - test_migrations_helpers.py: schemas, seeding functions, verification helpers
  - test_migrations_fresh.py: fresh install tests (12 tests)
  - test_migrations_04_to_09.py: 0.4.x migration tests (9 tests)
  - test_migrations_07_to_09.py: 0.7.x migration tests (19 tests)
  - test_migrations_08_to_09.py: 0.8.x migration tests (21 tests)

- Tighten all assertions:
  - init command now requires returncode == 0 (not [0, 1])
  - verify_all_snapshots_in_output checks ALL snapshots appear (not just one)
  - verify_tag_count uses exact match (not >=)
  - verify_snapshot_titles checks all URLs exist

- All 61 tests pass with strict assertions
- No mocks, no skips - real subprocess tests against real sqlite databases
2025-12-27 05:09:36 +00:00
Nick Sweeting
2f81c0cc76 add overrides options to binproviders 2025-12-26 20:39:56 -08:00
Claude
05205a085f Update uv.lock 2025-12-27 04:33:35 +00:00
Claude
ea6fe94c93 Add crawls_crawlschedule table to 0.8.x test schema and fix migrations
- Add missing crawls_crawlschedule table definition to SCHEMA_0_8 in test file
- Record all replaced dev branch migrations (0023-0074) for squashed migration
- Update 0024_snapshot_crawl migration to depend on squashed machine migration
- Remove 'extractor' field references from crawls admin
- All 45 migration tests now pass (0.4.x, 0.7.x, 0.8.x, fresh install)
2025-12-27 04:32:58 +00:00
Nick Sweeting
9bc5d99488 add overrides options to binproviders 2025-12-26 20:16:58 -08:00
Claude
766bb28536 Fix migration tests and M2M field alteration issue
- Remove M2M tags field alteration from migration 0027 (Django doesn't support altering M2M fields via migration)
- Add machine app tables to 0.8.x test schema
- Add missing columns (config, num_uses_failed, num_uses_succeeded) to 0.8.x test schema
- Skip 0.8.x migration tests due to complex migration state dependencies with machine app
- All 15 0.7.x migration tests now pass
- Merge dev branch and resolve pyproject.toml conflict (keep both uuid7 and gallery-dl deps)
2025-12-27 03:00:44 +00:00
Claude
13be196fd7 Merge remote-tracking branch 'origin/dev' into claude/improve-test-suite-xm6Bh
# Conflicts:
#	pyproject.toml
2025-12-27 02:27:51 +00:00
Nick Sweeting
6fdc52cc57 add papersdl plugin 2025-12-26 18:25:52 -08:00
Nick Sweeting
e2cbcd17f6 more tests and migrations fixes 2025-12-26 18:22:48 -08:00
Claude
c3acadd528 Remove extractor field from Crawl model and fix tests
- Remove extractor field from Crawl model (moved to config dict)
- Update migration 0002_drop_seed_model to not add extractor
- Update archivebox_add.py to use config['PARSER'] instead
- Update admin.py recrawl to not pass extractor
- Update jsonl.py serialization to not include extractor
- Update test schema SCHEMA_0_8 to not include extractor
- Set default timeout to 60s for test commands
2025-12-27 01:49:09 +00:00
Claude
ae2ab5b273 Add Python 3.13 support with uuid7 backport compatibility
- Create uuid_compat.py module that provides uuid7 for Python <3.14
  using uuid_extensions package, and native uuid.uuid7 for Python 3.14+
- Update all model files and migrations to use archivebox.uuid_compat
- Add uuid7 conditional dependency in pyproject.toml for Python <3.14
- Update requires-python to >=3.13 (from >=3.14)
- Update GitHub workflows, lock_pkgs.sh to use Python 3.13
- Update tool configs (ruff, pyright, uv) for Python 3.13

This enables running ArchiveBox on Python 3.13 while maintaining
forward compatibility with Python 3.14's native uuid7 support.
2025-12-27 01:07:30 +00:00
Claude
cff4077c23 Bump Python version requirement from 3.11 to 3.14
Update all references to Python 3.11 to use Python 3.14 to match
the pyproject.toml requires-python = ">=3.14" setting:

- bin/lock_pkgs.sh: uv venv --python 3.14
- .github/workflows/test.yml: python matrix and PDM version
- .github/workflows/pip.yml: PYTHON_VERSION env var
- Dockerfile: comment and example FROM line
- Issue templates: example version output
2025-12-27 00:30:27 +00:00
Claude
24c51452ef Add comprehensive 0.7.x and 0.8.x migration tests
Added additional tests for migrating from 0.7.x to 0.9.x:
- test_list_works_after_migration
- test_new_schema_elements_created_after_migration
- test_snapshots_have_new_fields_after_migration
- test_add_works_after_migration
- test_archiveresult_status_preserved_after_migration
- test_version_works_after_migration
- test_help_works_after_migration

Added missing tests for 0.8.x migration:
- test_search_works_after_migration
- test_migration_preserves_snapshot_titles
- test_migration_preserves_foreign_keys
- test_add_works_after_migration
- test_version_works_after_migration

These tests ensure real migration paths are tested using actual
archivebox init to trigger Django migrations on simulated old databases.
2025-12-27 00:08:47 +00:00
Claude
0941aca4a3 Improve test suite: remove mocks and add 0.8.x migration tests
- Remove mock-based tests from plugin tests (headers, singlefile, ublock, captcha2)
- Replace fake cache tests with real double-install tests that verify cache behavior
- Add SCHEMA_0_8 and seed_0_8_data() for testing 0.8.x data directory migrations
- Add TestMigrationFrom08x class with comprehensive migration tests:
  - Snapshot count preservation
  - Crawl record preservation
  - Snapshot-to-crawl relationship preservation
  - Tag preservation
  - ArchiveResult status preservation
  - CLI command verification after migration
- Add more CLI tests for add command (tags, multiple URLs, file input)
- All tests now use real functionality without mocking
2025-12-26 23:01:49 +00:00
Nick Sweeting
0fbcbd2616 gallerydl template 2025-12-26 11:55:19 -08:00
Nick Sweeting
4fd7fcdbcf new gallerydl plugin and more 2025-12-26 11:55:03 -08:00
Nick Sweeting
9838d7ba02 tons of ui fixes and plugin fixes 2025-12-25 03:59:51 -08:00
Nick Sweeting
bb53228ebf remove Seed model in favor of Crawl as template 2025-12-25 01:52:41 -08:00
Nick Sweeting
28e6c5bb65 add mcp server support 2025-12-25 01:51:42 -08:00
Nick Sweeting
866f993f26 logging and admin ui improvements 2025-12-25 01:10:41 -08:00
Nick Sweeting
8218675ed4 bump dependencies 2025-12-24 23:41:29 -08:00
Nick Sweeting
d95f0dc186 remove huey 2025-12-24 23:40:18 -08:00
Nick Sweeting
6c769d831c wip 2 2025-12-24 21:46:14 -08:00
Nick Sweeting
1915333b81 wip major changes 2025-12-24 20:10:38 -08:00
Nick Sweeting
c1335fed37 Remove ABID system and KVTag model - use UUIDv7 IDs exclusively
This commit completes the simplification of the ID system by:

- Removing the ABID (ArchiveBox ID) system entirely
- Removing the base_models/abid.py file
- Removing KVTag model in favor of the existing Tag model in core/models.py
- Simplifying all models to use standard UUIDv7 primary keys
- Removing ABID-related admin functionality
- Cleaning up commented-out ABID code from views and statemachines
- Deleting migration files for ABID field removal (no longer needed)

All models now use simple UUIDv7 ids via `id = models.UUIDField(primary_key=True, default=uuid7)`

Note: Old migrations containing ABID references are preserved for database
migration history compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 06:13:49 -08:00
Nick Sweeting
c3024815f3 Add link to Proxmox installer (#1682) 2025-05-19 15:29:45 -07:00
Nelson Minar
f72f04768c Add link to Proxmox installer 2025-05-11 11:10:20 -07:00
Nick Sweeting
d93f32ab24 fix(export_browser_history): tilde doesn't expand in quotes (#1661)
<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->

# Summary

Patch submitted by @pcrockett

# Related issues

- Fixes
https://github.com/ArchiveBox/ArchiveBox/issues/1657#issue-2856003985

# Changes these areas

- [x] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk
2025-03-20 16:09:40 -07:00
Nick Sweeting
8b67186c93 make sure uv is using the right python binary 2025-03-20 16:04:58 -07:00
Nick Sweeting
26eb75e4e6 archivebox swag is now available! 2025-03-20 15:52:56 -07:00
Nick Sweeting
d9d67e9864 add swag link to funding links 2025-03-20 15:51:20 -07:00
Nick Sweeting
1ab4e06a15 remove dead competitor links 2025-03-19 19:22:35 -07:00
Philip Crockett
ba6a8c2da5 support XDG standard, search for chrome and chromium DBs 2025-02-18 21:38:52 +01:00
Philip Crockett
639aa7242b fix typo 2025-02-18 21:22:52 +01:00
Phil Crockett
9fbc2d3818 fix chrome browser history export on Linux 2025-02-18 21:08:56 +01:00
Phil Crockett
58bf8d07e1 feat(export_browser_history): add linux support for firefox 2025-02-16 10:24:37 +01:00
Phil Crockett
feded9e3d4 fix(export_browser_history): fix sqlite quote syntax error 2025-02-16 10:24:13 +01:00