ArchiveBox

mirror of https://github.com/ArchiveBox/ArchiveBox.git synced 2026-04-06 07:47:53 +10:00

Author	SHA1	Message	Date
Nick Sweeting	b749b26c5d	wip	2026-03-23 03:58:32 -07:00
Nick Sweeting	f400a2cd67	WIP: checkpoint working tree before rebasing onto dev	2026-03-22 20:25:18 -07:00
Nick Sweeting	bc21d4bfdb	type and test fixes	2026-03-15 20:12:27 -07:00
Nick Sweeting	934e02695b	fix lint	2026-03-15 18:45:29 -07:00
Nick Sweeting	ec4b27056e	wip	2026-01-21 03:19:56 -08:00
Nick Sweeting	7ceaeae2d9	rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through	2026-01-04 22:38:15 -08:00
Nick Sweeting	456aaee287	more migration id/uuid and config propagation fixes	2026-01-04 16:16:26 -08:00
Nick Sweeting	dd77511026	unified Process source of truth and better screenshot tests	2026-01-02 04:20:34 -08:00
Nick Sweeting	65ee09ceab	move tests into subfolder, add missing install hooks	2026-01-02 00:22:07 -08:00
Nick Sweeting	60422adc87	fix orchestrator statemachine and Process from archiveresult migrations	2026-01-01 16:43:02 -08:00
Nick Sweeting	876feac522	actually working migration path from 0.7.2 and 0.8.6 + renames and test coverage	2026-01-01 15:50:00 -08:00
Nick Sweeting	6fadcf5168	remove model health stats from models that dont need it	2026-01-01 15:50:00 -08:00
Nick Sweeting	f7457b13ad	more migrations fixes attempts	2025-12-31 17:46:10 -08:00
Nick Sweeting	1c7b0cb2d3	working migrations again	2025-12-31 16:19:50 -08:00
Nick Sweeting	6521e7ddda	more migrations fixes	2025-12-31 16:10:56 -08:00
Nick Sweeting	a04e4a7345	cleanup migrations, json, jsonl	2025-12-31 15:36:43 -08:00
Nick Sweeting	73fde81fce	more migrations tweaks	2025-12-31 12:34:31 -08:00
Nick Sweeting	72f6a91b31	more progress bar and migrations fixes	2025-12-31 12:34:31 -08:00
Nick Sweeting	d5c0c64dcd	fix progress bars	2025-12-31 12:34:29 -08:00
Nick Sweeting	3d8c62ffb1	fix extensions dir paths add personas migration	2025-12-31 01:40:59 -08:00
Nick Sweeting	91375d35a3	more migrations	2025-12-30 10:30:52 -08:00
Nick Sweeting	96ee1bf686	more migration fixes	2025-12-30 09:57:33 -08:00
Nick Sweeting	4cd2fceb8a	even more migration fixes	2025-12-29 22:30:37 -08:00
Nick Sweeting	30c60eef76	much better tests and add page ui	2025-12-29 04:02:11 -08:00
Nick Sweeting	f4e7820533	use full dotted paths for all archivebox imports, add migrations and more fixes	2025-12-29 00:47:08 -08:00
Nick Sweeting	f0aa19fa7d	wip	2025-12-28 17:51:54 -08:00
Claude	1b5a816022	Implement hook step-based concurrency system This implements the hook concurrency plan from TODO_hook_concurrency.md: ## Schema Changes - Add Snapshot.current_step (IntegerField 0-9, default=0) - Create migration 0034_snapshot_current_step.py - Fix uuid_compat imports in migrations 0032 and 0003 ## Core Logic - Add extract_step(hook_name) utility - extracts step from __XX_ pattern - Add is_background_hook(hook_name) utility - checks for .bg. suffix - Update Snapshot.create_pending_archiveresults() to create one AR per hook - Update ArchiveResult.run() to handle hook_name field - Add Snapshot.advance_step_if_ready() method for step advancement - Integrate with SnapshotMachine.is_finished() to call advance_step_if_ready() ## Worker Coordination - Update ArchiveResultWorker.get_queue() for step-based filtering - ARs are only claimable when their step <= snapshot.current_step ## Hook Renumbering - Step 5 (DOM extraction): singlefile→50, screenshot→51, pdf→52, dom→53, title→54, readability→55, headers→55, mercury→56, htmltotext→57 - Step 6 (post-DOM): wget→61, git→62, media→63.bg, gallerydl→64.bg, forumdl→65.bg, papersdl→66.bg - Step 7 (URL extraction): parse_* hooks moved to 70-75 Background hooks (.bg suffix) don't block step advancement, enabling long-running downloads to continue while other hooks proceed.	2025-12-28 13:47:25 +00:00
Nick Sweeting	50e527ec65	way better plugin hooks system wip	2025-12-28 03:39:59 -08:00
Claude	3d985fa8c8	Implement hook architecture with JSONL output support Phase 1: Database migration for new ArchiveResult fields - Add output_str (TextField) for human-readable summary - Add output_json (JSONField) for structured metadata - Add output_files (JSONField) for dict of {relative_path: {}} - Add output_size (BigIntegerField) for total bytes - Add output_mimetypes (CharField) for CSV of mimetypes - Add binary FK to InstalledBinary (optional) - Migrate existing 'output' field to new split fields Phase 3: Update run_hook() for JSONL parsing - Support new JSONL format (any line with {type: 'ModelName', ...}) - Maintain backwards compatibility with RESULT_JSON= format - Add plugin metadata to each parsed record - Detect background hooks with .bg. suffix in filename - Add find_binary_for_cmd() helper function - Add create_model_record() for processing side-effect records Phase 6: Update ArchiveResult.run() - Handle background hooks (return immediately when result is None) - Process 'records' from HookResult for side-effect models - Use new output fields (output_str, output_json, output_files, etc.) - Call create_model_record() for InstalledBinary, Machine updates Phase 7: Add background hook support - Add is_background_hook() method to ArchiveResult - Add check_background_completed() to check if process exited - Add finalize_background_hook() to collect results from completed hooks - Update SnapshotMachine.is_finished() to check/finalize background hooks - Update _populate_output_fields() to walk directory and populate stats Also updated references to old 'output' field in: - admin_archiveresults.py - statemachines.py - templatetags/core_tags.py	2025-12-27 08:38:49 +00:00
Nick Sweeting	35dd9acafe	implement fs_version migrations	2025-12-27 00:25:35 -08:00
Claude	ea6fe94c93	Add crawls_crawlschedule table to 0.8.x test schema and fix migrations - Add missing crawls_crawlschedule table definition to SCHEMA_0_8 in test file - Record all replaced dev branch migrations (0023-0074) for squashed migration - Update 0024_snapshot_crawl migration to depend on squashed machine migration - Remove 'extractor' field references from crawls admin - All 45 migration tests now pass (0.4.x, 0.7.x, 0.8.x, fresh install)	2025-12-27 04:32:58 +00:00
Claude	766bb28536	Fix migration tests and M2M field alteration issue - Remove M2M tags field alteration from migration 0027 (Django doesn't support altering M2M fields via migration) - Add machine app tables to 0.8.x test schema - Add missing columns (config, num_uses_failed, num_uses_succeeded) to 0.8.x test schema - Skip 0.8.x migration tests due to complex migration state dependencies with machine app - All 15 0.7.x migration tests now pass - Merge dev branch and resolve pyproject.toml conflict (keep both uuid7 and gallery-dl deps)	2025-12-27 03:00:44 +00:00
Claude	13be196fd7	Merge remote-tracking branch 'origin/dev' into claude/improve-test-suite-xm6Bh # Conflicts: # pyproject.toml	2025-12-27 02:27:51 +00:00
Nick Sweeting	e2cbcd17f6	more tests and migrations fixes	2025-12-26 18:22:48 -08:00
Claude	ae2ab5b273	Add Python 3.13 support with uuid7 backport compatibility - Create uuid_compat.py module that provides uuid7 for Python <3.14 using uuid_extensions package, and native uuid.uuid7 for Python 3.14+ - Update all model files and migrations to use archivebox.uuid_compat - Add uuid7 conditional dependency in pyproject.toml for Python <3.14 - Update requires-python to >=3.13 (from >=3.14) - Update GitHub workflows, lock_pkgs.sh to use Python 3.13 - Update tool configs (ruff, pyright, uv) for Python 3.13 This enables running ArchiveBox on Python 3.13 while maintaining forward compatibility with Python 3.14's native uuid7 support.	2025-12-27 01:07:30 +00:00
Nick Sweeting	bb53228ebf	remove Seed model in favor of Crawl as template	2025-12-25 01:52:41 -08:00
Nick Sweeting	866f993f26	logging and admin ui improvements	2025-12-25 01:10:41 -08:00
Nick Sweeting	6c769d831c	wip 2	2025-12-24 21:46:14 -08:00
Nick Sweeting	1915333b81	wip major changes	2025-12-24 20:10:38 -08:00
Nick Sweeting	569081a9eb	rename abid_utils to base_models	2024-11-18 19:40:05 -08:00
Nick Sweeting	d69df359ea	remove Crawl migration in favor of separate app	2024-10-14 17:41:07 -07:00
Nick Sweeting	12f32c4690	fix tmp data dir resolution when running help or version outside data dir	2024-10-04 01:40:41 -07:00
Nick Sweeting	295c5c46e0	add new crawl model	2024-10-01 21:47:16 -07:00
Nick Sweeting	3e5b6ddeae	move config into dedicated global app	2024-09-30 15:59:05 -07:00
Nick Sweeting	3f76e0a87f	fix migrations import errors	2024-09-06 03:48:52 -07:00
Nick Sweeting	ed5357cec9	add migrations for datetime field renames	2024-09-04 23:44:13 -07:00
Nick Sweeting	cbf2a8fdc3	rename datetime fields to _at, massively improve ABID generation safety and determinism	2024-09-04 23:42:36 -07:00
Nick Sweeting	68a39b7392	remove .old_id entirely and make ABID generation only happen once on initial save	2024-09-04 16:40:15 -07:00
Nick Sweeting	1e73a06ba0	change ABIDModel.created to use AutoTimeField seeded on .save instead of auto_now_add so that ts_src for ABID is available on creation before DB row is created	2024-08-28 03:02:37 -07:00
Nick Sweeting	d0fefc0279	add chunk_size=500 to more iterator calls	2024-08-27 19:28:00 -07:00
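
The step-based hook scheduling described in commit 1b5a816022 can be sketched roughly as below. This is a minimal illustration, not the repository's code. It assumes hook filenames of the shape `<prefix>__<XX>_<name>[.bg].<ext>`, where the tens digit of `XX` is the step. The example filenames, the `default` fallback, and the `is_claimable` and `can_advance` helpers are hypothetical. Only the names `extract_step` and `is_background_hook` come from the commit message.

```python
import re

# Assumed naming scheme: "__XX_" where the first digit of XX is the step (0-9).
_STEP_PATTERN = re.compile(r"__(\d)\d_")


def extract_step(hook_name: str, default: int = 9) -> int:
    """Return the step encoded in the hook name, or `default` if none is found.

    The fallback value is an assumption made for this sketch.
    """
    match = _STEP_PATTERN.search(hook_name)
    return int(match.group(1)) if match else default


def is_background_hook(hook_name: str) -> bool:
    """Background hooks are marked by a ".bg." segment in the filename."""
    return ".bg." in hook_name


def is_claimable(hook_name: str, current_step: int) -> bool:
    """Hypothetical helper: a result is claimable once its step <= the snapshot's step."""
    return extract_step(hook_name) <= current_step


def can_advance(pending_hooks: list[str], current_step: int) -> bool:
    """Hypothetical helper: advance when no foreground hook at or below the
    current step is still pending. Background hooks never block advancement."""
    return not any(
        extract_step(name) <= current_step and not is_background_hook(name)
        for name in pending_hooks
    )
```

With these assumptions, a pending `on_Snapshot__63_media.bg.py` would not hold a snapshot at step 6, while a pending `on_Snapshot__61_wget.py` would.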

97 Commits
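
Commit 3d985fa8c8 mentions parsing hook output as JSONL (any line carrying a `type` key) while still accepting the older `RESULT_JSON=` format. One way such a parser might look is sketched below. It is an illustration under assumptions, not the project's `run_hook()` implementation: the function name `parse_hook_output`, the `plugin` key, and the exact shape of the legacy line (`RESULT_JSON=` followed by a JSON object) are all hypothetical.

```python
import json

LEGACY_PREFIX = "RESULT_JSON="  # assumed exact prefix of the legacy format


def parse_hook_output(stdout: str, plugin: str = "") -> list[dict]:
    """Collect structured records from a hook's stdout.

    Accepts JSONL lines that decode to an object with a "type" key, plus
    legacy "RESULT_JSON={...}" lines. Anything else is treated as log noise.
    """
    records = []
    for raw_line in stdout.splitlines():
        line = raw_line.strip()
        legacy = line.startswith(LEGACY_PREFIX)
        payload = line[len(LEGACY_PREFIX):] if legacy else line
        if not payload.startswith("{"):
            continue  # plain log output, not a record
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # malformed JSON is skipped rather than raised
        if not isinstance(record, dict):
            continue
        if not legacy and "type" not in record:
            continue  # new-format records must name their model type
        if plugin:
            record["plugin"] = plugin  # hypothetical metadata key
        records.append(record)
    return records
```

Skipping malformed lines instead of raising keeps a single bad line from discarding the rest of a hook's output. Whether the real parser is that lenient is not stated in the commit message.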