43 Commits

Author SHA1 Message Date
Nick Sweeting
dd77511026 unified Process source of truth and better screenshot tests 2026-01-02 04:20:34 -08:00
Nick Sweeting
3672174dad fix transition being triggered mid-transition 2026-01-02 00:24:44 -08:00
Nick Sweeting
65ee09ceab move tests into subfolder, add missing install hooks 2026-01-02 00:22:07 -08:00
Nick Sweeting
60422adc87 fix orchestrator statemachine and Process from archiveresult migrations 2026-01-01 16:43:02 -08:00
Nick Sweeting
876feac522 actually working migration path from 0.7.2 and 0.8.6 + renames and test coverage 2026-01-01 15:50:00 -08:00
Nick Sweeting
6fadcf5168 remove model health stats from models that don't need them 2026-01-01 15:50:00 -08:00
Nick Sweeting
a04e4a7345 clean up migrations, JSON, JSONL 2025-12-31 15:36:43 -08:00
claude[bot]
b2132d1f14 Fix cubic review issues: process_type detection, cmd storage, PID cleanup, and migration
- Fix Process.current() to store psutil cmdline instead of sys.argv for accurate validation
- Fix worker process_type detection: explicitly set to WORKER after registration
- Fix ArchiveResultWorker.start() to use Process.TypeChoices.WORKER consistently
- Fix migration to be explicitly irreversible (SQLite doesn't support DROP COLUMN)
- Fix get_running_workers() to return process_id instead of incorrectly named worker_id
- Fix safe_kill_process() to wait for termination and escalate to SIGKILL if needed
- Fix migration to include all indexes in state_operations (parent_id, process_type)
- Fix documentation to use Machine.current() scoping and StatusChoices constants

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2025-12-31 11:42:07 +00:00
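
A minimal sketch of the SIGTERM-then-SIGKILL escalation and PID validation described in the commit above. The name safe_kill_process() comes from the commit; the signature, timeout value, and cmdline check are assumptions.

```python
import psutil

def safe_kill_process(pid: int, expected_cmd: list[str] | None = None, timeout: float = 5.0) -> bool:
    """Terminate a PID politely, escalating to SIGKILL if it ignores SIGTERM (sketch)."""
    try:
        proc = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return True                              # already gone, nothing to do

    # validate against the stored psutil cmdline (per the Process.current() fix above)
    if expected_cmd is not None and proc.cmdline() != expected_cmd:
        return False                             # PID was reused by an unrelated process

    proc.terminate()                             # SIGTERM first
    try:
        proc.wait(timeout=timeout)               # give it a chance to exit cleanly
    except psutil.TimeoutExpired:
        proc.kill()                              # escalate to SIGKILL
        proc.wait(timeout=timeout)
    return not proc.is_running()
```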
claude[bot]
ee201a0f83 Fix code review issues in process management refactor
- Add pwd validation in Process.launch() to prevent crashes
- Fix psutil returncode handling (use wait() return value, not returncode attr)
- Add None check for proc.pid in cleanup_stale_running()
- Add stale process cleanup in Orchestrator.is_running()
- Ensure orchestrator process_type is correctly set to ORCHESTRATOR
- Fix KeyboardInterrupt handling (exit code 0 for graceful shutdown)
- Throttle cleanup_stale_running() to once per 30 seconds for performance
- Fix worker process_type to use TypeChoices.WORKER consistently
- Fix get_running_workers() API to return list of dicts (not Process objects)
- Only delete PID files after successful kill or confirmed stale
- Fix migration index names to match between SQL and Django state
- Remove db_index=True from process_type (index created manually)
- Update documentation to reflect actual implementation
- Add explanatory comments to empty except blocks
- Fix exit codes to use Unix convention (128 + signal number)

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2025-12-31 11:14:47 +00:00
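
Two of the fixes above are easy to illustrate: the Unix exit-code convention (128 + signal number) and the 30-second throttle on cleanup_stale_running(). A hedged sketch, assuming a module-level timestamp; names like _last_cleanup are made up.

```python
import time

def exit_code_for_signal(signum: int) -> int:
    # Unix convention: a process killed by signal N exits with status 128 + N,
    # e.g. SIGINT (2) -> 130, SIGKILL (9) -> 137, SIGTERM (15) -> 143
    return 128 + signum

_last_cleanup = 0.0

def cleanup_stale_running_throttled(min_interval: float = 30.0) -> None:
    """Run the stale-process sweep at most once per `min_interval` seconds (sketch)."""
    global _last_cleanup
    now = time.monotonic()
    if now - _last_cleanup < min_interval:
        return                                   # swept recently, skip this call
    _last_cleanup = now
    # ... the real code would call Process.cleanup_stale_running() here ...
```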
Claude
b822352fc3 Delete pid_utils.py and migrate to Process model
DELETED:
- workers/pid_utils.py (-192 lines) - replaced by Process model methods

SIMPLIFIED:
- crawls/models.py Crawl.cleanup() (80 lines -> 10 lines)
- hooks.py: deleted process_is_alive() and kill_process() (-45 lines)

UPDATED to use Process model:
- core/models.py: Snapshot.cleanup() and has_running_background_hooks()
- machine/models.py: Binary.cleanup()
- workers/worker.py: Worker.on_startup/shutdown, get_running_workers, start
- workers/orchestrator.py: Orchestrator.on_startup/shutdown, is_running

All subprocess management now uses:
- Process.current() for registering current process
- Process.get_running() / get_running_count() for querying
- Process.cleanup_stale_running() for cleanup
- safe_kill_process() for validated PID killing

Total line reduction: ~250 lines
2025-12-31 10:15:22 +00:00
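
A hedged usage sketch of the Process-model surface listed above, roughly as the simplified cleanup() callers might use it; the import path and exact signatures are assumptions based on the commit text.

```python
from archivebox.machine.models import Process, safe_kill_process  # import path assumed

def stop_background_work() -> None:
    """Illustrative replacement for the deleted pid_utils.py helpers."""
    Process.current()                    # register/refresh a row for the current process
    Process.cleanup_stale_running()      # drop rows whose PIDs have died or been reused

    # ask every still-registered worker process to stop, with PID validation + escalation
    for proc in Process.get_running():
        safe_kill_process(proc.pid)

    print(f'{Process.get_running_count()} processes still registered as running')
```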
Claude
69965a2782 fix: correct CLI pipeline data flow for crawl -> snapshot -> extract
- archivebox crawl: creates Crawl records, outputs Crawl JSONL
- archivebox snapshot: accepts Crawl JSONL, creates Snapshots, outputs Snapshot JSONL
- archivebox extract: accepts Snapshot JSONL, runs extractors, outputs ArchiveResult JSONL

Changes:
- Add Crawl.from_jsonl() method for creating Crawl from JSONL records
- Rewrite archivebox_crawl.py to create Crawl jobs without immediately starting them
- Update archivebox_snapshot.py to accept both Crawl JSONL and plain URLs
- Update jsonl.py docstring to document the pipeline
2025-12-30 19:42:41 +00:00
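
A hedged sketch of the resulting record flow, with each stage reading JSONL on stdin and writing JSONL on stdout; the record fields shown are illustrative, not the actual schema.

```python
import json
import sys

# stage 1, `archivebox crawl`, emits one Crawl record per job, roughly:
#   {"type": "Crawl", "id": "...", "urls": "https://example.com", "status": "queued"}

def snapshot_stage(stdin=sys.stdin, stdout=sys.stdout):
    """Roughly what `archivebox snapshot` does with Crawl JSONL (or plain URLs) piped in."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)        # a Crawl JSONL record
        except json.JSONDecodeError:
            record = {'urls': line}          # plain URLs are also accepted
        for url in record.get('urls', '').split():
            snapshot = {'type': 'Snapshot', 'url': url, 'crawl_id': record.get('id')}
            stdout.write(json.dumps(snapshot) + '\n')

# stage 3, `archivebox extract`, consumes the Snapshot JSONL the same way and
# emits one ArchiveResult record per plugin run.
```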
Claude
ae648c9bc1 refactor: move remaining JSONL methods to models, clean up jsonl.py
- Add Tag.to_jsonl() method with schema_version
- Add Crawl.to_jsonl() method with schema_version
- Fix Tag.from_jsonl() to not depend on jsonl.py helper
- Update tests to use Snapshot.from_jsonl() instead of non-existent get_or_create_snapshot

Remove model-specific functions from misc/jsonl.py:
- tag_to_jsonl() - use Tag.to_jsonl() instead
- crawl_to_jsonl() - use Crawl.to_jsonl() instead
- get_or_create_tag() - use Tag.from_jsonl() instead
- process_jsonl_records() - use model from_jsonl() methods directly

jsonl.py now only contains generic I/O utilities:
- Type constants (TYPE_SNAPSHOT, etc.)
- parse_line(), read_stdin(), read_file(), read_args_or_stdin()
- write_record(), write_records()
- filter_by_type(), process_records()
2025-12-30 19:30:18 +00:00
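
A hedged sketch of what a model-level to_jsonl()/from_jsonl() pair looks like after this move; the field names and schema_version value are assumptions, not the real Tag model.

```python
from django.db import models

class Tag(models.Model):
    name = models.CharField(max_length=100, unique=True)

    def to_jsonl(self) -> dict:
        # one JSONL-ready record per object, tagged with a type and schema version
        return {'type': 'Tag', 'schema_version': '0.9.0', 'id': str(self.pk), 'name': self.name}

    @classmethod
    def from_jsonl(cls, record: dict) -> 'Tag':
        # idempotent import: reuse an existing row with the same name if one exists
        tag, _created = cls.objects.get_or_create(name=record['name'])
        return tag
```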
Nick Sweeting
80f75126c6 more fixes 2025-12-29 21:03:05 -08:00
Nick Sweeting
30c60eef76 much better tests and add page ui 2025-12-29 04:02:11 -08:00
Nick Sweeting
f4e7820533 use full dotted paths for all archivebox imports, add migrations and more fixes 2025-12-29 00:47:08 -08:00
Nick Sweeting
f0aa19fa7d wip 2025-12-28 17:51:54 -08:00
Nick Sweeting
4ccb0863bb continue renaming extractor to plugin, add plan for hook concurrency, add chrome kill helper script 2025-12-28 05:29:24 -08:00
Nick Sweeting
bd265c0083 rename extractor to plugin everywhere 2025-12-28 04:43:15 -08:00
Nick Sweeting
50e527ec65 way better plugin hooks system wip 2025-12-28 03:39:59 -08:00
Claude
c3acadd528 Remove extractor field from Crawl model and fix tests
- Remove extractor field from Crawl model (moved to config dict)
- Update migration 0002_drop_seed_model to not add extractor
- Update archivebox_add.py to use config['PARSER'] instead
- Update admin.py recrawl to not pass extractor
- Update jsonl.py serialization to not include extractor
- Update test schema SCHEMA_0_8 to not include extractor
- Set default timeout to 60s for test commands
2025-12-27 01:49:09 +00:00
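
A small before/after sketch of the field move; the 'PARSER' key comes from the commit, the fallback default is an assumption.

```python
# before: the parser choice lived on a dedicated Crawl.extractor field
# parser = crawl.extractor

# after: it lives in the Crawl's config dict instead
parser = (crawl.config or {}).get('PARSER', 'auto')
```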
Claude
ae2ab5b273 Add Python 3.13 support with uuid7 backport compatibility
- Create uuid_compat.py module that provides uuid7 for Python <3.14
  using uuid_extensions package, and native uuid.uuid7 for Python 3.14+
- Update all model files and migrations to use archivebox.uuid_compat
- Add uuid7 conditional dependency in pyproject.toml for Python <3.14
- Update requires-python to >=3.13 (from >=3.14)
- Update GitHub workflows, lock_pkgs.sh to use Python 3.13
- Update tool configs (ruff, pyright, uv) for Python 3.13

This enables running ArchiveBox on Python 3.13 while maintaining
forward compatibility with Python 3.14's native uuid7 support.
2025-12-27 01:07:30 +00:00
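
The shim described above likely amounts to only a few lines; a hedged reconstruction, not the file's actual contents:

```python
# archivebox/uuid_compat.py, reconstructed sketch
try:
    from uuid import uuid7              # Python 3.14+ ships uuid.uuid7() natively
except ImportError:
    from uuid_extensions import uuid7   # backport package for Python 3.13

__all__ = ['uuid7']
```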
Nick Sweeting
9838d7ba02 tons of ui fixes and plugin fixes 2025-12-25 03:59:51 -08:00
Nick Sweeting
bb53228ebf remove Seed model in favor of Crawl as template 2025-12-25 01:52:41 -08:00
Nick Sweeting
866f993f26 logging and admin ui improvements 2025-12-25 01:10:41 -08:00
Nick Sweeting
1915333b81 wip major changes 2025-12-24 20:10:38 -08:00
Nick Sweeting
c1335fed37 Remove ABID system and KVTag model - use UUIDv7 IDs exclusively
This commit completes the simplification of the ID system by:

- Removing the ABID (ArchiveBox ID) system entirely
- Removing the base_models/abid.py file
- Removing KVTag model in favor of the existing Tag model in core/models.py
- Simplifying all models to use standard UUIDv7 primary keys
- Removing ABID-related admin functionality
- Cleaning up commented-out ABID code from views and statemachines
- Deleting migration files for ABID field removal (no longer needed)

All models now use simple UUIDv7 ids via `id = models.UUIDField(primary_key=True, default=uuid7)`

Note: Old migrations containing ABID references are preserved for database
migration history compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 06:13:49 -08:00
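
The commit quotes the new primary-key pattern; here it is in context as a minimal sketch, with a hypothetical model name and an assumed uuid7 import path.

```python
from django.db import models
from archivebox.uuid_compat import uuid7   # import path assumed

class ExampleRecord(models.Model):
    # UUIDv7 ids are time-ordered, so sorting by id roughly follows creation order
    id = models.UUIDField(primary_key=True, default=uuid7, editable=False)

    class Meta:
        abstract = True                     # illustrative base pattern only
```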
Nick Sweeting
f6d22a3cc4 tweak worker updated logic and add output_dir_template and symlinks logic 2024-12-13 06:03:52 -08:00
Nick Sweeting
2a1afcf6c2 move crawl models back into dedicated app 2024-12-12 21:45:55 -08:00
Nick Sweeting
b948e49013 add urls log to Crawl model 2024-11-19 06:32:33 -08:00
Nick Sweeting
569081a9eb rename abid_utils to base_models 2024-11-18 19:40:05 -08:00
Nick Sweeting
65afd405b1 merge seeds and crawls apps 2024-11-18 19:23:14 -08:00
Nick Sweeting
e469c5a344 merge queues and actors apps into new workers app 2024-11-18 18:52:48 -08:00
Nick Sweeting
eb53145e4e working state machine flow yay 2024-11-18 04:27:38 -08:00
Nick Sweeting
1ec2753664 fix statemachine create_root_snapshot and retry timing 2024-11-18 04:27:37 -08:00
Nick Sweeting
8f8fbbb7a2 API fixes and add actors endpoints 2024-11-17 20:09:06 -08:00
Nick Sweeting
c8e186f21b fix plugin loading order, admin, abx-pkg 2024-11-16 06:44:12 -08:00
Nick Sweeting
ba26d75079 add notes and label fields, fix model getters 2024-11-16 02:47:35 -08:00
Nick Sweeting
a9a3b153b1 more StateMachine, Actor, and Orchestrator improvements 2024-11-04 07:08:39 -08:00
Nick Sweeting
48f8416762 add new core and crawls statemachine manager 2024-11-03 00:41:11 -07:00
Nick Sweeting
a97cc82979 fix Crawl models 2024-10-21 00:38:29 -07:00
Nick Sweeting
35dd5911d8 WIP add new effects, reads, writes to abx interface 2024-10-15 22:33:22 -07:00
Nick Sweeting
2ebd28aebd fix ABID and uniqueness for new Seed models 2024-10-14 17:39:43 -07:00
Nick Sweeting
ad4657861f add new crawls app 2024-10-11 19:47:05 -07:00