Claude
1b5a816022
Implement hook step-based concurrency system
This implements the hook concurrency plan from TODO_hook_concurrency.md:
## Schema Changes
- Add Snapshot.current_step (IntegerField 0-9, default=0)
- Create migration 0034_snapshot_current_step.py
- Fix uuid_compat imports in migrations 0032 and 0003
## Core Logic
- Add extract_step(hook_name) utility - extracts step from __XX_ pattern
- Add is_background_hook(hook_name) utility - checks for .bg. suffix
- Update Snapshot.create_pending_archiveresults() to create one AR per hook
- Update ArchiveResult.run() to handle hook_name field
- Add Snapshot.advance_step_if_ready() method for step advancement
- Integrate with SnapshotMachine.is_finished() to call advance_step_if_ready()
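A minimal sketch of the two utilities named above, not the committed implementation. It assumes that the tens digit of the two-digit number after `__` is the step (consistent with hooks 50-57 belonging to step 5), and that a name without that pattern falls back to a hypothetical `default` step; the `.py` file names in the usage lines are also hypothetical.

```python
import re

# Assumption: hook names embed a two-digit number as "__XX_", and the
# first digit of XX is the step (0-9), e.g. "__53_" belongs to step 5.
_STEP_PATTERN = re.compile(r"__(\d)(\d)_")


def extract_step(hook_name: str, default: int = 9) -> int:
    """Return the step encoded in hook_name, or `default` if no __XX_ is found.

    The fallback value is a guess for illustration; the real utility may
    handle unnumbered hooks differently.
    """
    match = _STEP_PATTERN.search(hook_name)
    return int(match.group(1)) if match else default


def is_background_hook(hook_name: str) -> bool:
    """Return True when the hook name carries the .bg. marker."""
    return ".bg." in hook_name


# Usage with one hook name from this log and one hypothetical name.
print(extract_step("on_Snapshot__21_consolelog.bg.js"))
print(is_background_hook("on_Snapshot__21_consolelog.bg.js"))
print(extract_step("on_Snapshot__50_singlefile.py"))
```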
## Worker Coordination
- Update ArchiveResultWorker.get_queue() for step-based filtering
- ARs are only claimable when their step <= snapshot.current_step
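The claimability rule can be illustrated in plain Python. This is a sketch only: `PendingResult` and `claimable` are hypothetical stand-ins, and the real `ArchiveResultWorker.get_queue()` presumably expresses the same condition as a database query rather than an in-memory filter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingResult:
    """Hypothetical stand-in for a pending ArchiveResult row."""
    hook_name: str
    step: int


def claimable(results: List[PendingResult], current_step: int) -> List[PendingResult]:
    """Keep only results whose step is <= the snapshot's current step."""
    return [r for r in results if r.step <= current_step]


queue = [
    PendingResult("on_Snapshot__21_consolelog.bg.js", step=2),
    PendingResult("on_Snapshot__50_singlefile.py", step=5),  # hypothetical file name
    PendingResult("on_Snapshot__61_wget.py", step=6),        # hypothetical file name
]

# With the snapshot at step 5, the step-6 hook is not yet claimable.
ready = claimable(queue, current_step=5)
print([r.hook_name for r in ready])
```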
## Hook Renumbering
- Step 5 (DOM extraction): singlefile→50, screenshot→51, pdf→52, dom→53,
  title→54, readability→55, headers→55, mercury→56, htmltotext→57
- Step 6 (post-DOM): wget→61, git→62, media→63.bg, gallerydl→64.bg,
  forumdl→65.bg, papersdl→66.bg
- Step 7 (URL extraction): parse_* hooks moved to 70-75
Background hooks (marked with .bg. before the file extension) don't block
step advancement, so long-running downloads can continue while other hooks
proceed.
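The step-advancement rule described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the body of `Snapshot.advance_step_if_ready()`: `HookResult`, its `finished` flag, the `max_step` cap of 9, and the `.py` file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HookResult:
    """Hypothetical stand-in for one ArchiveResult tied to a hook."""
    hook_name: str
    step: int
    finished: bool


def advance_step_if_ready(current_step: int, results: List[HookResult], max_step: int = 9) -> int:
    """Return the next step if no foreground hook is still pending, else current_step.

    Background hooks (".bg." in the name) are ignored when deciding whether
    the step is complete, so they never hold the snapshot back.
    """
    blocking = [
        r for r in results
        if r.step <= current_step
        and ".bg." not in r.hook_name
        and not r.finished
    ]
    if blocking or current_step >= max_step:
        return current_step
    return current_step + 1


results = [
    HookResult("on_Snapshot__61_wget.py", step=6, finished=True),
    HookResult("on_Snapshot__63_media.bg.py", step=6, finished=False),
]

# The unfinished background download does not block the move past step 6.
print(advance_step_if_ready(6, results))
```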
2025-12-28 13:47:25 +00:00
Claude
2623c6cc11
Complete JS hooks to clean JSONL format + rename background hooks
- Update 12 remaining JS snapshot hooks to output clean JSONL
- Remove RESULT_JSON= prefix, START_TS=, END_TS=, STATUS= output
- Rename 3 background hooks with .bg. suffix:
  - consolelog -> on_Snapshot__21_consolelog.bg.js
  - ssl -> on_Snapshot__23_ssl.bg.js
  - responses -> on_Snapshot__24_responses.bg.js
- Update TODO_hook_architecture.md with completion status
2025-12-27 09:46:59 +00:00
Claude
0941aca4a3
Improve test suite: remove mocks and add 0.8.x migration tests
- Remove mock-based tests from plugin tests (headers, singlefile, ublock, captcha2)
- Replace fake cache tests with real double-install tests that verify cache behavior
- Add SCHEMA_0_8 and seed_0_8_data() for testing 0.8.x data directory migrations
- Add TestMigrationFrom08x class with comprehensive migration tests:
  - Snapshot count preservation
  - Crawl record preservation
  - Snapshot-to-crawl relationship preservation
  - Tag preservation
  - ArchiveResult status preservation
  - CLI command verification after migration
- Add more CLI tests for add command (tags, multiple URLs, file input)
- All tests now use real functionality without mocking
2025-12-26 23:01:49 +00:00