Phase 1: Model Prerequisites
- Add ArchiveResult.from_json() and from_jsonl() methods (sketched below)
- Fix Snapshot.to_json() to use tags_str, for consistency with Crawl.to_json()
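
A minimal sketch of the two new classmethods, assuming a Django setup; the field handling and class skeleton here are illustrative, not ArchiveBox's actual model definition:

```python
import json

from django.db import models


class ArchiveResult(models.Model):
    # ...existing fields elided; this skeleton is only for illustration...

    @classmethod
    def from_json(cls, record: dict) -> 'ArchiveResult':
        # Keep only keys that map to concrete model fields, ignoring extras
        # so records emitted by other commands can be fed in safely.
        field_names = {f.name for f in cls._meta.fields}
        return cls(**{k: v for k, v in record.items() if k in field_names})

    @classmethod
    def from_jsonl(cls, text: str) -> list['ArchiveResult']:
        # JSONL: one JSON object per non-empty line.
        return [cls.from_json(json.loads(line))
                for line in text.splitlines() if line.strip()]
```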
Phase 2: Shared Utilities
- Create archivebox/cli/cli_utils.py with a shared apply_filters() helper (see the sketch below)
- Update the 7 CLI files to import it from cli_utils.py instead of each duplicating the logic
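
A sketch of the shared helper, assuming it operates on Django QuerySets; the real signature and filter set may differ:

```python
# archivebox/cli/cli_utils.py (sketch)
from django.db.models import QuerySet


def apply_filters(qs: QuerySet, filters: dict) -> QuerySet:
    """Apply CLI --key=value filters as Django field lookups.

    None values are dropped so unset flags don't constrain the query.
    """
    active = {key: value for key, value in filters.items() if value is not None}
    return qs.filter(**active) if active else qs
```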
Phase 3: Pass-Through Behavior
- Add pass-through to crawl create (non-Crawl records pass through unchanged)
- Add pass-through to snapshot create (Crawl records and all others pass through)
- Add pass-through to archiveresult create (Snapshot records and all others pass through)
- Add create-or-update behavior to the run command (see the sketch after this list):
  - Records WITHOUT an id: create them via Model.from_json()
  - Records WITH an id: look up the existing record and re-queue it
  - Emit JSONL for every processed record so output can be chained
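
A sketch of the create-or-update loop, assuming each JSONL record is a plain JSON object with an optional "id" key; the two helpers are hypothetical stand-ins, not ArchiveBox internals:

```python
import json
import sys


def run(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Create-or-update loop for `archivebox run` (sketch)."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get('id'):
            # Existing record: look it up and re-queue it for processing.
            processed = requeue_existing(record)   # hypothetical helper, returns a dict
        else:
            # New record: create it via the matching Model.from_json().
            processed = create_from_json(record)   # hypothetical helper, returns a dict
        # Echo every processed record back out as JSONL so commands can chain.
        stdout.write(json.dumps(processed) + '\n')
```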
Phase 4: Test Infrastructure
- Create archivebox/tests/conftest.py with pytest-django fixtures
- Include CLI invocation helpers, output assertions, and database assertions (example below)
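
An example of the kind of fixtures and assertion helpers planned, alongside the standard pytest-django fixtures such as `db`; the names here are illustrative, not the final API:

```python
# archivebox/tests/conftest.py (sketch)
import json
import shlex
import subprocess

import pytest


@pytest.fixture
def run_cli():
    """Run `archivebox <args>` in a subprocess, optionally piping JSONL in."""
    def _run(args: str, stdin: str = '') -> tuple[int, str]:
        proc = subprocess.run(
            ['archivebox', *shlex.split(args)],
            input=stdin, capture_output=True, text=True,
        )
        return proc.returncode, proc.stdout
    return _run


def assert_jsonl(output: str) -> list[dict]:
    """Output assertion: every non-empty stdout line must parse as JSON."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]
```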
Phase 6: Config Update
- Update supervisord_util.py to launch the run command in place of the orchestrator (sketched below)
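
A sketch of the intent of the change, assuming supervisord_util.py defines worker programs as config dicts; the key names and structure here are illustrative, not the module's real layout:

```python
# supervisord_util.py (sketch): the background worker now launches
# `archivebox run` instead of the old orchestrator entrypoint.
WORKER = {
    'name': 'worker',
    'command': 'archivebox run',   # previously: the orchestrator command
    'autostart': 'true',
    'autorestart': 'true',
}
```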
This enables Unix-style piping:
archivebox crawl create URL | archivebox run
archivebox archiveresult list --status=failed | archivebox run
curl <api-url> | jq <transform> | archivebox crawl create | archivebox run