24 Commits

Author SHA1 Message Date
Nick Sweeting
9008cefca2 codecov, migrations, orchestrator fixes 2026-01-01 16:57:04 -08:00
Claude
4c77949197 Clean up on_Crawl hooks: remove duplicates and standardize naming
Deleted dead/duplicate hooks:
- wget/on_Crawl__10_install_wget.py (duplicate of __10_wget_validate_config.py)
- chrome/on_Crawl__00_chrome_install.py (simpler version, kept full one)
- chrome/on_Crawl__20_chrome_launch.bg.js (legacy, kept __30 version)
- singlefile/on_Crawl__20_install_singlefile_extension.js (disabled/dead)
- istilldontcareaboutcookies/on_Crawl__20_install_*.js (legacy)
- ublock/on_Crawl__03_ublock.js (legacy, kept __20 version)
- Entire captcha2/ plugin (legacy version of twocaptcha/)

Renamed hooks to follow consistent pattern: on_Crawl__XX_<plugin>_<action>.<ext>
Priority bands:
  00-09: Binary/extension installation
  10-19: Config validation
  20-29: Browser launch and post-launch config

Final hooks:
  00 ripgrep_install.py, 01 chrome_install.py
  02 istilldontcareaboutcookies_install.js
  03 ublock_install.js, 04 singlefile_install.js
  05 twocaptcha_install.js
  10 chrome_validate.py, 11 wget_validate.py
  20 chrome_launch.bg.js, 25 twocaptcha_config.js
2025-12-31 22:47:36 +00:00
Nick Sweeting
73fde81fce more migrations tweaks 2025-12-31 12:34:31 -08:00
Nick Sweeting
469932b469 more 2025-12-31 12:34:31 -08:00
Nick Sweeting
72f6a91b31 more progress bar and migrations fixes 2025-12-31 12:34:31 -08:00
Nick Sweeting
d5c0c64dcd fix progress bars 2025-12-31 12:34:29 -08:00
claude[bot]
5121b0e5f9 Merge branch 'dev' into claude/refactor-process-management-WcQyZ
Resolved conflicts by keeping Process model changes and accepting dev changes for unrelated files. Ensured pid_utils.py remains deleted as intended by this PR.

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2025-12-31 11:28:47 +00:00
Claude
65c839032a Consolidate Chrome test helpers across all plugin tests
- Add setup_test_env, launch_chromium_session, kill_chromium_session
  to chrome_test_helpers.py for extension tests
- Add chromium_session context manager for cleaner test code
- Refactor ublock, istilldontcareaboutcookies, twocaptcha tests to use
  shared helpers (~450 lines removed)
- Refactor screenshot, dom, pdf tests to use shared get_test_env
  and get_lib_dir (~60 lines removed)
- Net reduction: 228 lines of duplicate code
2025-12-31 08:30:14 +00:00
Claude
fd9ba86220 Reduce Chrome-related code duplication across JS and Python
This change consolidates duplicated logic between chrome_utils.js and
extension installer hooks, as well as between Python plugin tests:

JavaScript changes:
- Add getExtensionsDir() to centralize extension directory path calculation
- Add installExtensionWithCache() to handle extension install + cache workflow
- Add CLI commands for new utilities
- Refactor all 3 extension installers (ublock, istilldontcareaboutcookies,
  twocaptcha) to use shared utilities, reducing each from ~115 lines to ~60
- Update chrome_launch hook to use getExtensionsDir()

Python test changes:
- Add chrome_test_helpers.py with shared Chrome session management utilities
- Refactor infiniscroll and modalcloser tests to use shared helpers
- setup_chrome_session(), cleanup_chrome(), get_test_env() now centralized
- Add chrome_session() context manager for automatic cleanup

Net result: ~208 lines of code removed while maintaining same functionality.
2025-12-31 08:13:00 +00:00
Nick Sweeting
42d3fb7025 extension test fixes 2025-12-30 18:28:14 -08:00
Nick Sweeting
dd2302ad92 new jsonl cli interface 2025-12-30 16:12:53 -08:00
Nick Sweeting
64dccb7a19 passing 2025-12-29 18:55:57 -08:00
Nick Sweeting
5549a79869 more speed fixes 2025-12-29 18:55:37 -08:00
Nick Sweeting
abf5f44134 more debug logging 2025-12-29 18:53:52 -08:00
Nick Sweeting
bcf0513d05 more debug logging 2025-12-29 18:50:04 -08:00
Nick Sweeting
7e6e3be9e7 messing with chrome install process to reuse cached chromium with pinned version 2025-12-29 18:49:36 -08:00
Nick Sweeting
b670612685 centralize chrome pid and zombie logic in chrome_utils 2025-12-29 17:57:23 -08:00
Nick Sweeting
4ba3e8d120 fix extension loading and consolidate chromium logic 2025-12-29 17:47:37 -08:00
Nick Sweeting
1e4d3ffd11 improve plugin tests and config 2025-12-29 00:45:23 -08:00
Nick Sweeting
f0aa19fa7d wip 2025-12-28 17:51:54 -08:00
Nick Sweeting
50e527ec65 way better plugin hooks system wip 2025-12-28 03:39:59 -08:00
Claude
0941aca4a3 Improve test suite: remove mocks and add 0.8.x migration tests
- Remove mock-based tests from plugin tests (headers, singlefile, ublock, captcha2)
- Replace fake cache tests with real double-install tests that verify cache behavior
- Add SCHEMA_0_8 and seed_0_8_data() for testing 0.8.x data directory migrations
- Add TestMigrationFrom08x class with comprehensive migration tests:
  - Snapshot count preservation
  - Crawl record preservation
  - Snapshot-to-crawl relationship preservation
  - Tag preservation
  - ArchiveResult status preservation
  - CLI command verification after migration
- Add more CLI tests for add command (tags, multiple URLs, file input)
- All tests now use real functionality without mocking
2025-12-26 23:01:49 +00:00
Nick Sweeting
9838d7ba02 tons of ui fixes and plugin fixes 2025-12-25 03:59:51 -08:00
Nick Sweeting
1915333b81 wip major changes 2025-12-24 20:10:38 -08:00