ArchiveBox

mirror of https://github.com/ArchiveBox/ArchiveBox.git synced 2026-04-06 07:47:53 +10:00

Author	SHA1	Message	Date
Nick Sweeting	f3622d8cd3	update working changes	2026-03-25 05:36:07 -07:00
Nick Sweeting	50286d3c38	Reuse cached binaries in archivebox runtime	2026-03-24 11:03:43 -07:00
Nick Sweeting	39450111dd	Update CI uv handling and runner changes	2026-03-23 13:27:23 -07:00
Nick Sweeting	25f935b9d1	split CrawlSetup into Install phase with new Binary + BinaryRequest events	2026-03-23 13:15:41 -07:00
Nick Sweeting	b749b26c5d	wip	2026-03-23 03:58:32 -07:00
Nick Sweeting	f400a2cd67	WIP: checkpoint working tree before rebasing onto dev	2026-03-22 20:25:18 -07:00
Nick Sweeting	c87079aa0a	Refactor ArchiveBox onto abx-dl bus runner	2026-03-21 11:47:57 -07:00
Nick Sweeting	57e11879ec	cleanup archivebox tests	2026-03-15 22:09:56 -07:00
Nick Sweeting	9de084da65	bump package versions	2026-03-15 20:47:28 -07:00
Nick Sweeting	bc21d4bfdb	type and test fixes	2026-03-15 20:12:27 -07:00
Nick Sweeting	4756697a17	Use ruff pyright and ty for linting	2026-03-15 19:43:59 -07:00
Nick Sweeting	49436af869	Tighten CLI and admin typing	2026-03-15 19:33:15 -07:00
Nick Sweeting	5381f7584c	Tighten API typing and add return values	2026-03-15 19:24:54 -07:00
Nick Sweeting	f932054915	add stricter locking around stage machine models	2026-03-15 19:21:41 -07:00
Nick Sweeting	311e4340ec	Fix add CLI input handling and lint regressions	2026-03-15 19:04:13 -07:00
Nick Sweeting	934e02695b	fix lint	2026-03-15 18:45:29 -07:00
Nick Sweeting	70c9358cf9	Improve scheduling, runtime paths, and API behavior	2026-03-15 18:31:56 -07:00
Nick Sweeting	7d42c6c8b5	bump versions and fix docs	2026-03-15 17:43:07 -07:00
Nick Sweeting	1f792d7199	Restore CLI compat and plugin dependency handling	2026-03-15 06:06:18 -07:00
Nick Sweeting	6b482c62df	Restore top-level list command compatibility	2026-03-15 05:04:31 -07:00
Nick Sweeting	c4d30a853f	Restore index-only snapshot output links	2026-03-15 04:58:46 -07:00
Nick Sweeting	cc3e72b92f	Preserve tags for index-only adds	2026-03-15 04:54:55 -07:00
Nick Sweeting	58f801c220	Fix update orphan import and host-aware tests	2026-03-15 04:51:06 -07:00
Nick Sweeting	ecb1764590	switch to external plugins	2026-03-15 03:46:23 -07:00
Nick Sweeting	ec4b27056e	wip	2026-01-21 03:19:56 -08:00
Nick Sweeting	86e7973334	cleanup tui, startup, card templates, and more	2026-01-19 14:33:20 -08:00
Nick Sweeting	c7b2217cd6	tons of fixes with codex	2026-01-19 01:00:53 -08:00
Nick Sweeting	b80e80439d	more binary fixes	2026-01-05 02:18:38 -08:00
Nick Sweeting	7ceaeae2d9	rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through	2026-01-04 22:38:15 -08:00
Nick Sweeting	839ae744cf	simplify entrypoints for orchestrator and workers	2026-01-04 13:17:07 -08:00
Nick Sweeting	dd77511026	unified Process source of truth and better screenshot tests	2026-01-02 04:20:34 -08:00
Nick Sweeting	65ee09ceab	move tests into subfolder, add missing install hooks	2026-01-02 00:22:07 -08:00
Nick Sweeting	c2afb40350	fix lib bin dir and archivebox add hanging	2026-01-01 16:58:47 -08:00
Nick Sweeting	a04e4a7345	cleanup migrations, json, jsonl	2025-12-31 15:36:43 -08:00
Nick Sweeting	edc83bfac6	Add persona CLI command with browser cookie import (#1747)	2025-12-31 10:56:40 -08:00
claude[bot]	3659adeb7e	Fix path traversal vulnerabilities in persona management Add input validation and path safety checks to prevent path traversal attacks in persona name handling: - Add validate_persona_name() to block dangerous characters (/, \, .., etc) - Add ensure_path_within_personas_dir() to verify resolved paths stay within PERSONAS_DIR - Apply validation at persona creation, renaming, and deletion operations Fixes security issues identified by cubic-dev-ai in PR review. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>	2025-12-31 18:30:26 +00:00
Nick Sweeting	20690fabbf	Fix CLI tests to use subprocess and remove mocks (#1746)	2025-12-31 10:20:50 -08:00
Claude	73425fa984	Add persona CLI command with browser cookie import - Add `archivebox persona create/list/update/delete` commands - Support `--import=chrome|firefox|brave` to copy browser profile - Extract cookies via CDP to generate cookies.txt for non-browser tools - Fix JSDoc comment parsing issue in chrome_utils.js	2025-12-31 12:13:07 +00:00
claude[bot]	5121b0e5f9	Merge branch 'dev' into claude/refactor-process-management-WcQyZ Resolved conflicts by keeping Process model changes and accepting dev changes for unrelated files. Ensured pid_utils.py remains deleted as intended by this PR. Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>	2025-12-31 11:28:47 +00:00
Claude	b87bbbbecb	Fix CLI tests to use subprocess and remove mocks - Fix conftest.py: use subprocess for init, remove unused cli_env fixture - Update all test files to use data_dir parameter instead of env - Remove mock-based TestJSONLOutput class from tests_piping.py - Remove unused imports (MagicMock, patch) - Fix file permissions for cli_utils.py All tests now use real subprocess calls per CLAUDE.md guidelines: - NO MOCKS - tests exercise real code paths - NO SKIPS - every test runs	2025-12-31 10:53:45 +00:00
Nick Sweeting	7dd2d65770	Add pluginmap management command (#1742)	2025-12-31 02:29:24 -08:00
Claude	bb52b5902a	Add unit tests for JSONL CLI pipeline commands (Phase 5 & 6) Add comprehensive unit tests for the CLI piping architecture: - test_cli_crawl.py: crawl create/list/update/delete tests - test_cli_snapshot.py: snapshot create/list/update/delete tests - test_cli_archiveresult.py: archiveresult create/list/update/delete tests - test_cli_run.py: run command create-or-update and pass-through tests Extend tests_piping.py with: - TestPassThroughBehavior: tests for pass-through behavior in all commands - TestPipelineAccumulation: tests for accumulating records through pipeline All tests use pytest fixtures from conftest.py with isolated DATA_DIR.	2025-12-31 10:21:05 +00:00
Claude	672ccf918d	Add pluginmap management command Adds a new CLI command `archivebox pluginmap` that displays: - ASCII art diagrams of all core state machines (Crawl, Snapshot, ArchiveResult, Binary) - Lists all auto-detected on_Modelname_xyz hooks grouped by model/event - Shows hook execution order (step 0-9), plugin name, and background status Usage: archivebox pluginmap # Show all diagrams and hooks archivebox pluginmap -m Snapshot # Filter to specific model archivebox pluginmap -a # Include disabled plugins archivebox pluginmap -q # Output JSON only	2025-12-31 10:19:58 +00:00
Claude	f3e11b61fd	Implement JSONL CLI pipeline architecture (Phases 1-4, 6) Phase 1: Model Prerequisites - Add ArchiveResult.from_json() and from_jsonl() methods - Fix Snapshot.to_json() to use tags_str (consistent with Crawl) Phase 2: Shared Utilities - Create archivebox/cli/cli_utils.py with shared apply_filters() - Update 7 CLI files to import from cli_utils.py instead of duplicating Phase 3: Pass-Through Behavior - Add pass-through to crawl create (non-Crawl records pass unchanged) - Add pass-through to snapshot create (Crawl records + others pass through) - Add pass-through to archiveresult create (Snapshot records + others) - Add create-or-update behavior to run command: - Records WITHOUT id: Create via Model.from_json() - Records WITH id: Lookup existing, re-queue - Outputs JSONL of all processed records for chaining Phase 4: Test Infrastructure - Create archivebox/tests/conftest.py with pytest-django fixtures - Include CLI helpers, output assertions, database assertions Phase 6: Config Update - Update supervisord_util.py: orchestrator -> run command This enables Unix-style piping: archivebox crawl create URL | archivebox run archivebox archiveresult list --status=failed | archivebox run curl API | jq transform | archivebox crawl create | archivebox run	2025-12-31 10:07:14 +00:00
Nick Sweeting	dd2302ad92	new jsonl cli interface	2025-12-30 16:12:53 -08:00
Nick Sweeting	08366cfa46	document chrome configs	2025-12-30 12:42:50 -08:00
claude[bot]	251fe33e49	fix: rename --plugin to --plugins for consistency Changed from singular --plugin to plural --plugins in both snapshot and extract commands to match the pattern in archivebox add command. Updated to accept comma-separated plugin names (e.g., --plugins=screenshot,singlefile,title). - Updated CLI option from --plugin to --plugins - Added parsing for comma-separated plugin names - Updated function signatures and logic to handle multiple plugins - Updated help text, docstrings, and examples Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>	2025-12-30 20:20:29 +00:00
claude[bot]	64db6deab3	fix: revert incorrect --extract renaming, restore --plugin parameter The --plugins parameter was incorrectly renamed to --extract (boolean). This restores --plugin (singular, matching extract command) with correct semantics: specify which plugin to run after creating snapshots. - Changed --extract/--no-extract back to --plugin (string parameter) - Updated function signature and logic to use plugin parameter - Added ArchiveResult creation for specific plugin when --plugin is passed - Updated docstring and examples Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>	2025-12-30 20:15:48 +00:00
claude[bot]	762cddc8c5	fix: address PR review comments from cubic-dev-ai - Add JSONL_INDEX_FILENAME to ALLOWED_IN_DATA_DIR for consistency - Fix fallback logic in legacy.py to try JSON when JSONL parsing fails - Replace bare except clauses with specific exception types - Fix stdin double-consumption in archivebox_crawl.py - Merge CLI --tag option with crawl tags in archivebox_snapshot.py - Remove tautological mock tests (covered by integration tests) Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>	2025-12-30 20:09:51 +00:00
Claude	cf387ed59f	refactor: batch all URLs into single Crawl, update tests - archivebox crawl now creates one Crawl with all URLs as newline-separated string - Updated tests to reflect new pipeline: crawl -> snapshot -> extract - Added tests for Crawl JSONL parsing and output - Tests verify Crawl.from_jsonl() handles multiple URLs correctly	2025-12-30 20:06:56 +00:00
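
Commit 3659adeb7e above describes two checks for persona names: rejecting dangerous characters and verifying that resolved paths stay inside PERSONAS_DIR. The following is a minimal sketch of how such checks might look, not the code from that commit. The function names are taken from the commit message; their bodies, signatures, the exact set of rejected characters, and the `PERSONAS_DIR` value are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical location for illustration; the real value comes from ArchiveBox config.
PERSONAS_DIR = Path("/data/personas")


def validate_persona_name(name: str) -> str:
    """Reject empty names and names containing separators or '..' (assumed rule set)."""
    if not name or not name.strip():
        raise ValueError("persona name must not be empty")
    if any(token in name for token in ("/", "\\", "..", "\0")):
        raise ValueError(f"persona name contains unsafe characters: {name!r}")
    return name


def ensure_path_within_personas_dir(path: Path, base: Path = PERSONAS_DIR) -> Path:
    """Resolve the path and fail if it lands outside the base directory."""
    resolved = path.resolve()
    base_resolved = base.resolve()
    try:
        # relative_to raises ValueError when resolved is not under base_resolved
        resolved.relative_to(base_resolved)
    except ValueError:
        raise ValueError(f"{resolved} is outside {base_resolved}") from None
    return resolved
```

Applying both checks is deliberate in this sketch: the name check gives a clear error early, while the resolved-path check still catches cases the character list misses, such as a symlink pointing outside the base directory.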


254 Commits