ArchiveBox

mirror of https://github.com/ArchiveBox/ArchiveBox.git synced 2026-04-06 07:47:53 +10:00

Author	SHA1	Message	Date
Nick Sweeting	39450111dd	Update CI uv handling and runner changes	2026-03-23 13:27:23 -07:00
Nick Sweeting	b749b26c5d	wip	2026-03-23 03:58:32 -07:00
Nick Sweeting	f400a2cd67	WIP: checkpoint working tree before rebasing onto dev	2026-03-22 20:25:18 -07:00
Nick Sweeting	c87079aa0a	Refactor ArchiveBox onto abx-dl bus runner	2026-03-21 11:47:57 -07:00
Nick Sweeting	57e11879ec	cleanup archivebox tests	2026-03-15 22:09:56 -07:00
Nick Sweeting	9de084da65	bump package versions	2026-03-15 20:47:28 -07:00
Nick Sweeting	bc21d4bfdb	type and test fixes	2026-03-15 20:12:27 -07:00
Nick Sweeting	44cabac8d0	fix typing	2026-03-15 19:47:36 -07:00
Nick Sweeting	4756697a17	Use ruff pyright and ty for linting	2026-03-15 19:43:59 -07:00
Nick Sweeting	5381f7584c	Tighten API typing and add return values	2026-03-15 19:24:54 -07:00
Nick Sweeting	f932054915	add stricter locking around stage machine models	2026-03-15 19:21:41 -07:00
Nick Sweeting	311e4340ec	Fix add CLI input handling and lint regressions	2026-03-15 19:04:13 -07:00
Nick Sweeting	934e02695b	fix lint	2026-03-15 18:45:29 -07:00
Nick Sweeting	70c9358cf9	Improve scheduling, runtime paths, and API behavior	2026-03-15 18:31:56 -07:00
Nick Sweeting	7d42c6c8b5	bump versions and fix docs	2026-03-15 17:43:07 -07:00
Nick Sweeting	957387fd88	Fix plugin hook env and extractor retries	2026-03-15 12:39:27 -07:00
Nick Sweeting	86fdc3be1e	Refresh worker config from resolved plugin installs	2026-03-15 11:07:55 -07:00
Nick Sweeting	760cf9d6b2	Stabilize CI against expanded plugin surface	2026-03-15 06:31:41 -07:00
Nick Sweeting	4fa701fafe	Update abx dependencies and plugin test harness	2026-03-15 04:37:32 -07:00
Nick Sweeting	ecb1764590	switch to external plugins	2026-03-15 03:46:23 -07:00
Nick Sweeting	ec4b27056e	wip	2026-01-21 03:19:56 -08:00
Nick Sweeting	86e7973334	cleanup tui, startup, card templates, and more	2026-01-19 14:33:20 -08:00
Nick Sweeting	bef67760db	working singlefile	2026-01-19 03:05:49 -08:00
Nick Sweeting	b5bbc3b549	better tui	2026-01-19 01:53:32 -08:00
Nick Sweeting	c7b2217cd6	tons of fixes with codex	2026-01-19 01:00:53 -08:00
Nick Sweeting	352e1bad32	remove debug lines	2026-01-05 02:27:34 -08:00
Nick Sweeting	b80e80439d	more binary fixes	2026-01-05 02:18:38 -08:00
Nick Sweeting	7ceaeae2d9	rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through	2026-01-04 22:38:15 -08:00
Nick Sweeting	456aaee287	more migration id/uuid and config propagation fixes	2026-01-04 16:16:26 -08:00
Nick Sweeting	839ae744cf	simplify entrypoints for orchestrator and workers	2026-01-04 13:17:07 -08:00
Nick Sweeting	3da523fc74	more consistent crawl, snapshot, hook cleanup and Process tracking	2026-01-02 04:27:38 -08:00
Nick Sweeting	dd77511026	unified Process source of truth and better screenshot tests	2026-01-02 04:20:34 -08:00
Nick Sweeting	65ee09ceab	move tests into subfolder, add missing install hooks	2026-01-02 00:22:07 -08:00
Nick Sweeting	c2afb40350	fix lib bin dir and archivebox add hanging	2026-01-01 16:58:47 -08:00
Nick Sweeting	9008cefca2	codecov, migrations, orchestrator fixes	2026-01-01 16:57:04 -08:00
Nick Sweeting	60422adc87	fix orchestrator statemachine and Process from archiveresult migrations	2026-01-01 16:43:02 -08:00
Nick Sweeting	876feac522	actually working migration path from 0.7.2 and 0.8.6 + renames and test coverage	2026-01-01 15:50:00 -08:00
Nick Sweeting	a04e4a7345	cleanup migrations, json, jsonl	2025-12-31 15:36:43 -08:00
Nick Sweeting	469932b469	more	2025-12-31 12:34:31 -08:00
Nick Sweeting	72f6a91b31	more progress bar and migrations fixes	2025-12-31 12:34:31 -08:00
Nick Sweeting	d5c0c64dcd	fix progress bars	2025-12-31 12:34:29 -08:00
Claude	9bf7a520a0	Update tests for new Process model-based architecture - Remove pid_utils tests (module deleted in dev) - Update orchestrator tests to use Process model for tracking - Add tests for Process.current(), cleanup_stale_running(), terminate() - Add tests for Process hierarchy (parent/child, root, depth) - Add tests for Process.get_running(), get_running_count() - Add tests for ProcessMachine state machine - Update machine model tests to match current API (from_jsonl vs from_json)	2025-12-31 11:51:42 +00:00
Claude	a063d8cd43	Merge remote-tracking branch 'origin/dev' into claude/analyze-test-coverage-mWgwv	2025-12-31 11:45:22 +00:00
claude[bot]	b2132d1f14	Fix cubic review issues: process_type detection, cmd storage, PID cleanup, and migration - Fix Process.current() to store psutil cmdline instead of sys.argv for accurate validation - Fix worker process_type detection: explicitly set to WORKER after registration - Fix ArchiveResultWorker.start() to use Process.TypeChoices.WORKER consistently - Fix migration to be explicitly irreversible (SQLite doesn't support DROP COLUMN) - Fix get_running_workers() to return process_id instead of incorrectly named worker_id - Fix safe_kill_process() to wait for termination and escalate to SIGKILL if needed - Fix migration to include all indexes in state_operations (parent_id, process_type) - Fix documentation to use Machine.current() scoping and StatusChoices constants Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>	2025-12-31 11:42:07 +00:00
Claude	0cb5f0712d	Add comprehensive tests for machine/process models, orchestrator, and search backends This adds new test coverage for previously untested areas: Machine module (archivebox/machine/tests/): - Machine, NetworkInterface, Binary, Process model tests - BinaryMachine and ProcessMachine state machine tests - JSONL serialization/deserialization tests - Manager method tests Workers module (archivebox/workers/tests/): - PID file utility tests (write, read, cleanup) - Orchestrator lifecycle and queue management tests - Worker spawning logic tests - Idle detection and exit condition tests Search backends: - SQLite FTS5 search tests with real indexed content - Phrase search, stemming, and unicode support - Ripgrep search tests with archive directory structure - Environment variable configuration tests Binary provider plugins: - pip provider hook tests - npm provider hook tests with PATH updates - apt provider hook tests	2025-12-31 11:33:27 +00:00
claude[bot]	5121b0e5f9	Merge branch 'dev' into claude/refactor-process-management-WcQyZ Resolved conflicts by keeping Process model changes and accepting dev changes for unrelated files. Ensured pid_utils.py remains deleted as intended by this PR. Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>	2025-12-31 11:28:47 +00:00
claude[bot]	ee201a0f83	Fix code review issues in process management refactor - Add pwd validation in Process.launch() to prevent crashes - Fix psutil returncode handling (use wait() return value, not returncode attr) - Add None check for proc.pid in cleanup_stale_running() - Add stale process cleanup in Orchestrator.is_running() - Ensure orchestrator process_type is correctly set to ORCHESTRATOR - Fix KeyboardInterrupt handling (exit code 0 for graceful shutdown) - Throttle cleanup_stale_running() to once per 30 seconds for performance - Fix worker process_type to use TypeChoices.WORKER consistently - Fix get_running_workers() API to return list of dicts (not Process objects) - Only delete PID files after successful kill or confirmed stale - Fix migration index names to match between SQL and Django state - Remove db_index=True from process_type (index created manually) - Update documentation to reflect actual implementation - Add explanatory comments to empty except blocks - Fix exit codes to use Unix convention (128 + signal number) Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>	2025-12-31 11:14:47 +00:00
Claude	b822352fc3	Delete pid_utils.py and migrate to Process model DELETED: - workers/pid_utils.py (-192 lines) - replaced by Process model methods SIMPLIFIED: - crawls/models.py Crawl.cleanup() (80 lines -> 10 lines) - hooks.py: deleted process_is_alive() and kill_process() (-45 lines) UPDATED to use Process model: - core/models.py: Snapshot.cleanup() and has_running_background_hooks() - machine/models.py: Binary.cleanup() - workers/worker.py: Worker.on_startup/shutdown, get_running_workers, start - workers/orchestrator.py: Orchestrator.on_startup/shutdown, is_running All subprocess management now uses: - Process.current() for registering current process - Process.get_running() / get_running_count() for querying - Process.cleanup_stale_running() for cleanup - safe_kill_process() for validated PID killing Total line reduction: ~250 lines	2025-12-31 10:15:22 +00:00
Claude	f3e11b61fd	Implement JSONL CLI pipeline architecture (Phases 1-4, 6) Phase 1: Model Prerequisites - Add ArchiveResult.from_json() and from_jsonl() methods - Fix Snapshot.to_json() to use tags_str (consistent with Crawl) Phase 2: Shared Utilities - Create archivebox/cli/cli_utils.py with shared apply_filters() - Update 7 CLI files to import from cli_utils.py instead of duplicating Phase 3: Pass-Through Behavior - Add pass-through to crawl create (non-Crawl records pass unchanged) - Add pass-through to snapshot create (Crawl records + others pass through) - Add pass-through to archiveresult create (Snapshot records + others) - Add create-or-update behavior to run command: - Records WITHOUT id: Create via Model.from_json() - Records WITH id: Lookup existing, re-queue - Outputs JSONL of all processed records for chaining Phase 4: Test Infrastructure - Create archivebox/tests/conftest.py with pytest-django fixtures - Include CLI helpers, output assertions, database assertions Phase 6: Config Update - Update supervisord_util.py: orchestrator -> run command This enables Unix-style piping: archivebox crawl create URL | archivebox run; archivebox archiveresult list --status=failed | archivebox run; curl API | jq transform | archivebox crawl create | archivebox run	2025-12-31 10:07:14 +00:00
Nick Sweeting	30c60eef76	much better tests and add page ui	2025-12-29 04:02:11 -08:00


76 Commits
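
Commit f3e11b61fd above describes a create-or-update rule for the run command: records without an id are created, records with an id are looked up and re-queued, and all processed records are written back out as JSONL for chaining. The following is a minimal Python sketch of that rule, not ArchiveBox's actual code. It assumes a plain dict stands in for the database; the function name `process_records`, the `status` field, and the choice to pass unknown ids through unchanged are all hypothetical.

```python
import json
import uuid
from typing import Dict, Iterable, Iterator


def process_records(lines: Iterable[str], store: Dict[str, dict]) -> Iterator[str]:
    """Hypothetical create-or-update stage over JSONL input.

    `store` is an in-memory stand-in for a database table keyed by record id.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between records
        record = json.loads(raw)
        record_id = record.get("id")
        if record_id is None:
            # No id: create a new record and queue it.
            record_id = uuid.uuid4().hex
            record = {**record, "id": record_id, "status": "queued"}
            store[record_id] = record
        elif record_id in store:
            # Known id: look up the existing record and re-queue it.
            store[record_id]["status"] = "queued"
            record = store[record_id]
        # Assumption: records with an unknown id pass through unchanged.
        yield json.dumps(record)
```

Because each input record yields one output line, a stage written this way can sit in the middle of a shell pipeline and hand its output to the next stage.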