ArchiveBox

mirror of https://github.com/ArchiveBox/ArchiveBox.git synced 2026-04-06 07:47:53 +10:00

Files

Claude 69965a2782 fix: correct CLI pipeline data flow for crawl -> snapshot -> extract

- archivebox crawl: creates Crawl records, outputs Crawl JSONL
- archivebox snapshot: accepts Crawl JSONL, creates Snapshots, outputs Snapshot JSONL
- archivebox extract: accepts Snapshot JSONL, runs extractors, outputs ArchiveResult JSONL
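
The three-stage flow above can be sketched as chained generators, each reading one JSONL record type and emitting the next. This is only an illustration of the data flow, not ArchiveBox's code: the function names, the `type`/`url`/`extractor` fields, and the `"title"` extractor name are all assumptions.

```python
import json
from typing import Iterable, Iterator


def crawl_stage(urls: Iterable[str]) -> Iterator[str]:
    """Emit one hypothetical Crawl JSONL line per input URL."""
    for url in urls:
        yield json.dumps({"type": "Crawl", "url": url})


def snapshot_stage(lines: Iterable[str]) -> Iterator[str]:
    """Read Crawl JSONL lines and emit one Snapshot JSONL line for each."""
    for line in lines:
        record = json.loads(line)
        if record.get("type") == "Crawl":
            yield json.dumps({"type": "Snapshot", "url": record["url"]})


def extract_stage(lines: Iterable[str], extractors=("title",)) -> Iterator[str]:
    """Read Snapshot JSONL lines and emit one ArchiveResult line per extractor."""
    for line in lines:
        record = json.loads(line)
        if record.get("type") == "Snapshot":
            for name in extractors:
                yield json.dumps(
                    {"type": "ArchiveResult", "url": record["url"], "extractor": name}
                )


if __name__ == "__main__":
    # Chain the stages the way a shell pipe would connect the three commands.
    for out in extract_stage(snapshot_stage(crawl_stage(["https://example.com"]))):
        print(out)
```

Because each stage only consumes the record type produced by the stage before it, the stages can be run separately or piped together.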

Changes:
- Add Crawl.from_jsonl() method for creating Crawl from JSONL records
- Rewrite archivebox_crawl.py to create Crawl jobs without immediately starting them
- Update archivebox_snapshot.py to accept both Crawl JSONL and plain URLs
- Update jsonl.py docstring to document the pipeline
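
As a rough sketch of what "accept both Crawl JSONL and plain URLs" could look like, the snippet below parses one input line either way. The `Crawl` dataclass here is a hypothetical stand-in, not the project's model; the `url` field, the `from_jsonl()` signature, and the `parse_input_line()` helper are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Crawl:
    """Hypothetical stand-in for a crawl record (not the real model)."""
    url: str

    @classmethod
    def from_jsonl(cls, record: dict) -> "Crawl":
        # Assumes the JSONL record carries the URL in a "url" field.
        return cls(url=record["url"])


def parse_input_line(line: str) -> Optional[Crawl]:
    """Turn one input line into a Crawl, accepting JSONL or a bare URL."""
    line = line.strip()
    if not line:
        return None  # skip blank lines
    if line.startswith("{"):
        # Looks like a JSON object: treat it as a Crawl JSONL record.
        return Crawl.from_jsonl(json.loads(line))
    # Otherwise treat the whole line as a plain URL.
    return Crawl(url=line)
```

Checking for a leading `{` is a simple heuristic that works because a URL never starts with a brace.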

2025-12-30 19:42:41 +00:00

Name         Last commit                                                           Date
migrations   more migration fixes                                                  2025-12-29 22:12:57 -08:00
__init__.py  wip major changes                                                     2025-12-24 20:10:38 -08:00
admin.py     wip                                                                   2025-12-28 17:51:54 -08:00
apps.py      much better tests and add page ui                                     2025-12-29 04:02:11 -08:00
models.py    fix: correct CLI pipeline data flow for crawl -> snapshot -> extract  2025-12-30 19:42:41 +00:00