- Add `archivebox persona create/list/update/delete` commands
- Support `--import=chrome|firefox|brave` to copy an existing browser profile
- Extract cookies via CDP to generate cookies.txt for non-browser tools
- Fix JSDoc comment parsing issue in chrome_utils.js
- Add getMachineType, getLibDir, getNodeModulesDir, getTestEnv CLI commands to chrome_utils.js;
  these are now the single source of truth for path calculations
- Update chrome_test_helpers.py with call_chrome_utils() dispatcher
- Add get_test_env_from_js(), get_machine_type_from_js(), kill_chrome_via_js() helpers
- Update cleanup_chrome and kill_chromium_session to use JS killChrome
- Remove unused Chrome binary search lists from singlefile hook (~25 lines)
- Update readability, mercury, favicon, title tests to use shared helpers
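
A minimal sketch of how a call_chrome_utils() dispatcher and its helpers could look on the Python side. The script path, the `node <script> <command>` invocation shape, the injectable `run` parameter, and the assumption that getTestEnv prints JSON are all illustrative guesses, not taken from the actual chrome_test_helpers.py:

```python
import json
import subprocess
from pathlib import Path

# Hypothetical location of the script; the real path depends on the plugin layout.
CHROME_UTILS_JS = Path("plugins/chrome/chrome_utils.js")


def call_chrome_utils(command, *args, script=CHROME_UTILS_JS, run=subprocess.run, timeout=30):
    """Run `node <script> <command> [args...]` and return (ok, stdout, stderr)."""
    cmd = ["node", str(script), command, *map(str, args)]
    proc = run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode == 0, proc.stdout.strip(), proc.stderr.strip()


def get_machine_type_from_js(**kwargs):
    """Return the machine type reported by the JS side, or None on failure."""
    ok, out, _ = call_chrome_utils("getMachineType", **kwargs)
    return out if ok and out else None


def get_test_env_from_js(**kwargs):
    """Return the test environment as a dict (assumes the command prints JSON)."""
    ok, out, _ = call_chrome_utils("getTestEnv", **kwargs)
    if not ok:
        return None
    try:
        return json.loads(out)
    except json.JSONDecodeError:
        return None
```

Routing every helper through one dispatcher keeps the subprocess handling in a single place, so the Python tests never recompute paths that chrome_utils.js already knows.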
This change consolidates logic duplicated between chrome_utils.js and the
extension installer hooks, as well as across the Python plugin tests:
JavaScript changes:
- Add getExtensionsDir() to centralize extension directory path calculation
- Add installExtensionWithCache() to handle extension install + cache workflow
- Add CLI commands for new utilities
- Refactor all 3 extension installers (ublock, istilldontcareaboutcookies,
twocaptcha) to use shared utilities, reducing each from ~115 lines to ~60
- Update chrome_launch hook to use getExtensionsDir()
Python test changes:
- Add chrome_test_helpers.py with shared Chrome session management utilities
- Refactor infiniscroll and modalcloser tests to use shared helpers
- Centralize setup_chrome_session(), cleanup_chrome(), and get_test_env() in the shared helpers
- Add chrome_session() context manager for automatic cleanup
Net result: ~208 lines of code removed while maintaining the same functionality.
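
As an illustration of the chrome_session() context manager mentioned above, here is a minimal sketch of the setup/cleanup pattern. The real signatures of setup_chrome_session() and cleanup_chrome() are not shown in this changelog, so the sketch takes them as plain callables (`setup` and `cleanup` are hypothetical parameter names):

```python
from contextlib import contextmanager


@contextmanager
def chrome_session(setup, cleanup, *setup_args, **setup_kwargs):
    """Yield whatever `setup` returns and always pass it to `cleanup` afterwards."""
    session = setup(*setup_args, **setup_kwargs)
    try:
        yield session
    finally:
        # Runs on normal exit and when the test body raises.
        cleanup(session)
```

A test would then wrap its body in `with chrome_session(...) as session:` so that Chrome is cleaned up even when an assertion fails partway through.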
Simplifies the getEnvArray() parsing logic to:
- If value contains '[', parse as JSON array
- Otherwise, parse as comma-separated values
This prevents incorrect splitting of arguments containing internal commas
when there's only one argument. For arguments with commas, users should
use JSON format: CHROME_ARGS='["--arg1,val", "--arg2"]'
Also exports getEnvArray in module.exports for consistency.
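
The two-branch rule above can be sketched as follows. This is a Python rendering for illustration only (the actual helper lives in chrome_utils.js); the `default` and `env` parameters and the fallback for malformed JSON are assumptions:

```python
import json
import os


def get_env_array(name, default=None, env=None):
    """Parse an env var as a JSON array if it contains '[', else split on commas."""
    env = os.environ if env is None else env
    raw = (env.get(name) or "").strip()
    if not raw:
        return list(default or [])
    if "[" in raw:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
        # Assumption: malformed JSON falls back to the default instead of being split.
        return list(default or [])
    # Plain comma-separated form; empty segments are dropped.
    return [part.strip() for part in raw.split(",") if part.strip()]
```

With this rule, `CHROME_ARGS='["--arg1,val", "--arg2"]'` keeps the comma inside the first argument, while `CHROME_ARGS='--a,--b'` is split into two arguments.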
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
Multiple hooks in the same plugin directory were overwriting each
other's stdout.log, stderr.log, hook.pid, and cmd.sh files. Now
each hook uses filenames prefixed with its hook name:
- on_Snapshot__20_chrome_tab.bg.stdout.log
- on_Snapshot__20_chrome_tab.bg.stderr.log
- on_Snapshot__20_chrome_tab.bg.pid
- on_Snapshot__20_chrome_tab.bg.sh
Updated:
- hooks.py run_hook() to use hook-specific names
- core/models.py cleanup and update_from_output methods
- Plugin scripts to no longer write redundant hook.pid files
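
A minimal sketch of the naming scheme, assuming the prefix is the hook script's filename with only its final extension removed (which matches the example filenames above). The function name and the returned dict keys are hypothetical, not the actual run_hook() internals:

```python
from pathlib import Path


def hook_output_paths(hook_script, output_dir):
    """Map a hook script to its per-hook stdout/stderr/pid/cmd files."""
    prefix = Path(hook_script).stem  # drops only the final extension, e.g. ".js"
    out = Path(output_dir)
    return {
        "stdout": out / f"{prefix}.stdout.log",
        "stderr": out / f"{prefix}.stderr.log",
        "pid": out / f"{prefix}.pid",
        "cmd": out / f"{prefix}.sh",
    }
```

Because each hook script has a distinct filename, two hooks writing into the same plugin directory no longer share any output file.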
- Add comprehensive default CHROME_ARGS in config.json with 55+ flags
for deterministic rendering, security, performance, and UI suppression
- Update chrome_utils.js launchChromium() to read CHROME_ARGS and
CHROME_ARGS_EXTRA from environment variables (set by get_config())
- Add getEnvArray() helper to parse JSON arrays or comma-separated
strings from environment variables
- Separate args into three categories:
1. baseArgs: Static flags from CHROME_ARGS config (configurable)
2. dynamicArgs: Runtime-computed flags (port, sandbox, headless, etc.)
3. extraArgs: User overrides from CHROME_ARGS_EXTRA
- Add CHROME_SANDBOX config option to control --no-sandbox flag
Args are now configurable via:
- config.json defaults
- ArchiveBox.conf file
- Environment variables
- Per-crawl/snapshot config overrides
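
The three-category split can be sketched like this. It is a Python rendering for illustration (launchChromium() itself is JavaScript); the function name, the exact set of dynamic flags, and the ordering with extraArgs last are assumptions:

```python
def build_chrome_args(base_args, extra_args, *, port, headless=True, sandbox=True, user_data_dir=None):
    """Combine static, runtime-computed, and user-supplied Chrome flags in that order."""
    # dynamicArgs: values only known at launch time.
    dynamic_args = [f"--remote-debugging-port={port}"]
    if headless:
        dynamic_args.append("--headless=new")
    if not sandbox:
        # Mirrors a CHROME_SANDBOX=false setting.
        dynamic_args.append("--no-sandbox")
    if user_data_dir:
        dynamic_args.append(f"--user-data-dir={user_data_dir}")
    # Assumption: extra args go last so user overrides appear after the defaults.
    return [*base_args, *dynamic_args, *extra_args]
```

Keeping the static defaults in config and computing only the runtime-dependent flags in code is what lets CHROME_ARGS be overridden at any config level without touching the launcher.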
- Add _derive_persona_paths() in configset.py to automatically derive
CHROME_USER_DATA_DIR and CHROME_EXTENSIONS_DIR from ACTIVE_PERSONA
when not explicitly set. This allows plugins to use these paths
without knowing about the persona system.
- Update chrome_utils.js launchChromium() to accept userDataDir option
and pass --user-data-dir to Chrome. Also cleans up SingletonLock
before launch.
- Update killZombieChrome() to clean up SingletonLock files from all
persona chrome_user_data directories after killing zombies.
- Update chrome_cleanup() in misc/util.py to handle persona-based
user data directories when cleaning up stale Chrome state.
- Simplify on_Crawl__20_chrome_launch.bg.js to use CHROME_USER_DATA_DIR
and CHROME_EXTENSIONS_DIR from env (derived by get_config()).
Config priority flow:
  ACTIVE_PERSONA=WorkAccount (set on crawl/snapshot)
    -> get_config() derives:
         CHROME_USER_DATA_DIR = PERSONAS_DIR/WorkAccount/chrome_user_data
         CHROME_EXTENSIONS_DIR = PERSONAS_DIR/WorkAccount/chrome_extensions
    -> hooks receive these as env vars without needing persona logic
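
A minimal sketch of the derivation step, assuming the config is a flat dict and that empty values count as "not explicitly set". The function below is illustrative and is not the actual _derive_persona_paths() from configset.py:

```python
from pathlib import Path


def derive_persona_paths(config):
    """Fill in Chrome paths from ACTIVE_PERSONA unless they are already set."""
    derived = dict(config)  # leave the caller's config untouched
    persona = derived.get("ACTIVE_PERSONA")
    personas_dir = derived.get("PERSONAS_DIR")
    if not persona or not personas_dir:
        return derived
    persona_dir = Path(personas_dir) / persona
    # Assumption: empty or missing values are treated as unset; explicit values win.
    if not derived.get("CHROME_USER_DATA_DIR"):
        derived["CHROME_USER_DATA_DIR"] = str(persona_dir / "chrome_user_data")
    if not derived.get("CHROME_EXTENSIONS_DIR"):
        derived["CHROME_EXTENSIONS_DIR"] = str(persona_dir / "chrome_extensions")
    return derived
```

Since explicit settings take precedence, a crawl can still pin CHROME_USER_DATA_DIR by hand while inheriting the persona's extensions directory.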