- Update Crawl.output_dir_parent to use username instead of user_id
for consistency with Snapshot paths
- Add domain from first URL to Crawl path structure for easier debugging:
users/{username}/crawls/YYYYMMDD/{domain}/{crawl_id}/
- Add CRAWL_OUTPUT_DIR to config passed to Snapshot hooks so chrome_tab
can find the shared Chrome session from the Crawl
- Update comment in chrome_tab hook to reflect new config source
- Convert Persona from plain Python class to Django model with ModelWithConfig
- Add config JSONField for persona-specific config overrides
- Add get_derived_config() method that returns config with derived paths:
- CHROME_USER_DATA_DIR, CHROME_EXTENSIONS_DIR, COOKIES_FILE, ACTIVE_PERSONA
- Update get_config() to accept persona parameter in merge chain:
get_config(persona=crawl.persona, crawl=crawl, snapshot=snapshot)
- Remove _derive_persona_paths() - derivation now happens in Persona model
- Merge order (highest to lowest priority):
1. snapshot.config
2. crawl.config
3. user.config
4. persona.get_derived_config() <- NEW
5. environment variables
6. ArchiveBox.conf file
7. plugin defaults
8. core defaults
Usage:
config = get_config(persona=crawl.persona, crawl=crawl)
config['CHROME_USER_DATA_DIR'] # derived from persona
- Add _derive_persona_paths() in configset.py to automatically derive
CHROME_USER_DATA_DIR and CHROME_EXTENSIONS_DIR from ACTIVE_PERSONA
when not explicitly set. This allows plugins to use these paths
without knowing about the persona system.
- Update chrome_utils.js launchChromium() to accept userDataDir option
and pass --user-data-dir to Chrome. Also cleans up SingletonLock
before launch.
- Update killZombieChrome() to clean up SingletonLock files from all
persona chrome_user_data directories after killing zombies.
- Update chrome_cleanup() in misc/util.py to handle persona-based
user data directories when cleaning up stale Chrome state.
- Simplify on_Crawl__20_chrome_launch.bg.js to use CHROME_USER_DATA_DIR
and CHROME_EXTENSIONS_DIR from env (derived by get_config()).
Config priority flow:
ACTIVE_PERSONA=WorkAccount (set on crawl/snapshot)
-> get_config() derives:
CHROME_USER_DATA_DIR = PERSONAS_DIR/WorkAccount/chrome_user_data
CHROME_EXTENSIONS_DIR = PERSONAS_DIR/WorkAccount/chrome_extensions
-> hooks receive these as env vars without needing persona logic