Simplifies the comma-separated parsing logic to:
- If value contains '[', parse as JSON array
- Otherwise, parse as comma-separated values
This prevents incorrect splitting of arguments containing internal commas
when there's only one argument. For arguments with commas, users should
use JSON format: CHROME_ARGS='["--arg1,val", "--arg2"]'
Also exports getEnvArray in module.exports for consistency.
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
- Convert Persona from plain Python class to Django model with ModelWithConfig
- Add config JSONField for persona-specific config overrides
- Add get_derived_config() method that returns config with derived paths:
- CHROME_USER_DATA_DIR, CHROME_EXTENSIONS_DIR, COOKIES_FILE, ACTIVE_PERSONA
- Update get_config() to accept persona parameter in merge chain:
get_config(persona=crawl.persona, crawl=crawl, snapshot=snapshot)
- Remove _derive_persona_paths() - derivation now happens in Persona model
- Merge order (highest to lowest priority):
1. snapshot.config
2. crawl.config
3. user.config
4. persona.get_derived_config() <- NEW
5. environment variables
6. ArchiveBox.conf file
7. plugin defaults
8. core defaults
Usage:
config = get_config(persona=crawl.persona, crawl=crawl)
config['CHROME_USER_DATA_DIR'] # derived from persona
- Remove standalone convenience functions (cleanup_chrome_for_persona,
cleanup_chrome_all_personas) to reduce LOC
- Change Persona.get_active(config) to accept config dict as argument
instead of calling get_config() internally, since the caller needs
to pass user/crawl/snapshot/archiveresult context for proper config
- Create Persona class in personas/models.py for managing browser
profiles/identities used for archiving sessions
- Each Persona has:
- chrome_user_data_dir: Chrome profile directory
- chrome_extensions_dir: Installed extensions
- cookies_file: Cookies for wget/curl
- config_file: Persona-specific config overrides
- Add Persona methods:
- cleanup_chrome(): Remove stale SingletonLock/SingletonSocket files
- get_config(): Load persona config from config.json
- save_config(): Save persona config to config.json
- ensure_dirs(): Create persona directory structure
- all(): Iterator over all personas
- get_active(): Get persona based on ACTIVE_PERSONA config
- cleanup_chrome_all(): Clean up all personas
- Update chrome_cleanup() in misc/util.py to use Persona.cleanup_chrome_all()
instead of manual directory iteration
- Add convenience functions:
- cleanup_chrome_for_persona(name)
- cleanup_chrome_all_personas()
- Add comprehensive default CHROME_ARGS in config.json with 55+ flags
for deterministic rendering, security, performance, and UI suppression
- Update chrome_utils.js launchChromium() to read CHROME_ARGS and
CHROME_ARGS_EXTRA from environment variables (set by get_config())
- Add getEnvArray() helper to parse JSON arrays or comma-separated
strings from environment variables
- Separate args into three categories:
1. baseArgs: Static flags from CHROME_ARGS config (configurable)
2. dynamicArgs: Runtime-computed flags (port, sandbox, headless, etc.)
3. extraArgs: User overrides from CHROME_ARGS_EXTRA
- Add CHROME_SANDBOX config option to control --no-sandbox flag
Args are now configurable via:
- config.json defaults
- ArchiveBox.conf file
- Environment variables
- Per-crawl/snapshot config overrides
- Add _derive_persona_paths() in configset.py to automatically derive
CHROME_USER_DATA_DIR and CHROME_EXTENSIONS_DIR from ACTIVE_PERSONA
when not explicitly set. This allows plugins to use these paths
without knowing about the persona system.
- Update chrome_utils.js launchChromium() to accept userDataDir option
and pass --user-data-dir to Chrome. Also cleans up SingletonLock
before launch.
- Update killZombieChrome() to clean up SingletonLock files from all
persona chrome_user_data directories after killing zombies.
- Update chrome_cleanup() in misc/util.py to handle persona-based
user data directories when cleaning up stale Chrome state.
- Simplify on_Crawl__20_chrome_launch.bg.js to use CHROME_USER_DATA_DIR
and CHROME_EXTENSIONS_DIR from env (derived by get_config()).
Config priority flow:
ACTIVE_PERSONA=WorkAccount (set on crawl/snapshot)
-> get_config() derives:
CHROME_USER_DATA_DIR = PERSONAS_DIR/WorkAccount/chrome_user_data
CHROME_EXTENSIONS_DIR = PERSONAS_DIR/WorkAccount/chrome_extensions
-> hooks receive these as env vars without needing persona logic
<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->
# Summary
<!--e.g. This PR fixes ABC or adds the ability to do XYZ...-->
# Related issues
<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->
# Changes these areas
- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [x] Snapshot data layout on disk
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Switch snapshot index storage from index.json to a flat index.jsonl
format for easier parsing and extensibility. Includes automatic
migration and backward-compatible reading, plus updated CLI pipeline to
emit/consume JSONL records.
- **New Features**
- Write and read index.jsonl with per-line records (Snapshot,
ArchiveResult, Binary, Process); reconcile prefers JSONL.
- Auto-convert legacy index.json to JSONL during migration/update;
load_from_directory/create_from_directory support both formats.
- Serialization moved to model to_jsonl methods; added schema_version to
all records, including Tag, Crawl, Binary, and Process.
- CLI pipeline updated: crawl creates a single Crawl job from all input
URLs and outputs Crawl JSONL (no immediate crawling); snapshot accepts
Crawl JSONL/IDs and outputs Snapshot JSONL; extract outputs
ArchiveResult JSONL via model methods.
- **Migration**
- Conversion runs during filesystem migration and reconcile; no manual
steps.
- Legacy index.json is deleted after conversion; external tools should
switch to index.jsonl.
<sup>Written for commit 251fe33e49.
Summary will update on new commits.</sup>
<!-- End of auto-generated description by cubic. -->
Changed from singular --plugin to plural --plugins in both snapshot and extract
commands to match the pattern in archivebox add command. Updated to accept
comma-separated plugin names (e.g., --plugins=screenshot,singlefile,title).
- Updated CLI option from --plugin to --plugins
- Added parsing for comma-separated plugin names
- Updated function signatures and logic to handle multiple plugins
- Updated help text, docstrings, and examples
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
The --plugins parameter was incorrectly renamed to --extract (boolean).
This restores --plugin (singular, matching extract command) with correct
semantics: specify which plugin to run after creating snapshots.
- Changed --extract/--no-extract back to --plugin (string parameter)
- Updated function signature and logic to use plugin parameter
- Added ArchiveResult creation for specific plugin when --plugin is passed
- Updated docstring and examples
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
- Add JSONL_INDEX_FILENAME to ALLOWED_IN_DATA_DIR for consistency
- Fix fallback logic in legacy.py to try JSON when JSONL parsing fails
- Replace bare except clauses with specific exception types
- Fix stdin double-consumption in archivebox_crawl.py
- Merge CLI --tag option with crawl tags in archivebox_snapshot.py
- Remove tautological mock tests (covered by integration tests)
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
- archivebox crawl now creates one Crawl with all URLs as newline-separated string
- Updated tests to reflect new pipeline: crawl -> snapshot -> extract
- Added tests for Crawl JSONL parsing and output
- Tests verify Crawl.from_jsonl() handles multiple URLs correctly
Implement a sleek inline tag editor with autocomplete and AJAX support:
- Create TagEditorWidget and InlineTagEditorWidget in core/widgets.py
- Pills display with X remove button, sorted alphabetically
- Text input with HTML5 datalist autocomplete
- Enter/Space/Comma to add tags, auto-creates if doesn't exist
- Backspace removes last tag when input is empty
- Add API endpoints in api/v1_core.py
- GET /tags/autocomplete/ - search tags by name
- POST /tags/create/ - get_or_create tag
- POST /tags/add-to-snapshot/ - add tag to snapshot via AJAX
- POST /tags/remove-from-snapshot/ - remove tag from snapshot
- Update admin_snapshots.py
- Replace FilteredSelectMultiple with TagEditorWidget in bulk actions
- Create SnapshotAdminForm with tags_editor field
- Update title_str() to render inline tag editor in list view
- Remove TagInline, use widget instead
- Add CSS styles in templates/admin/base.html
- Blue gradient pill styling matching admin theme
- Focus ring and hover states
- Compact inline variant for list view
<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->
# Summary
<!--e.g. This PR fixes ABC or adds the ability to do XYZ...-->
# Related issues
<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->
# Changes these areas
- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Implemented a new interactive tags editor for Django admin with
autocomplete and AJAX add/remove, replacing the old multi-select and
inline. This makes tagging snapshots faster and safer in detail, list,
and bulk actions.
- **New Features**
- TagEditorWidget and InlineTagEditorWidget with pill UI and remove
buttons, XSS-safe rendering, and delegated events.
- Keyboard support: Enter/Space/Comma to add, Backspace to remove last
when input is empty.
- Datalist autocomplete and debounced search via GET
/tags/autocomplete/.
- AJAX endpoints: POST /tags/create/, /tags/add-to-snapshot/,
/tags/remove-from-snapshot/.
- **Refactors**
- Replaced FilteredSelectMultiple with TagEditorWidget in bulk actions;
parse comma-separated tags and use bulk_create/delete for efficient
add/remove.
- Added SnapshotAdminForm with tags_editor field; saves tags
case-insensitively and fixes remove_tags matching.
- Rendered inline tag editor in list view via title_str; removed
TagInline.
- Added CSS in admin/base.html for pill styling, focus ring, and compact
inline variant.
<sup>Written for commit 0dee662f41.
Summary will update on new commits.</sup>
<!-- End of auto-generated description by cubic. -->
- Add Tag.to_jsonl() method with schema_version
- Add Crawl.to_jsonl() method with schema_version
- Fix Tag.from_jsonl() to not depend on jsonl.py helper
- Update tests to use Snapshot.from_jsonl() instead of non-existent get_or_create_snapshot
Remove model-specific functions from misc/jsonl.py:
- tag_to_jsonl() - use Tag.to_jsonl() instead
- crawl_to_jsonl() - use Crawl.to_jsonl() instead
- get_or_create_tag() - use Tag.from_jsonl() instead
- process_jsonl_records() - use model from_jsonl() methods directly
jsonl.py now only contains generic I/O utilities:
- Type constants (TYPE_SNAPSHOT, etc.)
- parse_line(), read_stdin(), read_file(), read_args_or_stdin()
- write_record(), write_records()
- filter_by_type(), process_records()
- add_tags: Uses SnapshotTag.objects.bulk_create() with ignore_conflicts
Instead of N calls to obj.tags.add(), now makes 1 query per tag
- remove_tags: Uses single SnapshotTag.objects.filter().delete()
Instead of N calls to obj.tags.remove(), now makes 1 query total
Works correctly with "select all across pages" via queryset.values_list()
- Fix case-sensitivity mismatch in remove_tags (use name__iexact)
- Fix XSS vulnerability by removing onclick attributes
- Use data attributes and event delegation instead
- Apply DOM APIs to prevent injection attacks
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
Move JSONL serialization from standalone functions to model methods
to mirror the from_jsonl() pattern:
- Add Binary.to_jsonl() method
- Add Process.to_jsonl() method
- Add ArchiveResult.to_jsonl() method
- Add Snapshot.to_jsonl() method
- Update write_index_jsonl() to use model methods
- Update jsonl.py functions to be thin wrappers
Switch from hierarchical index.json to flat index.jsonl format for
snapshot metadata storage. Each line is a self-contained JSON record
with a 'type' field (Snapshot, ArchiveResult, Binary, Process).
Changes:
- Add JSONL_INDEX_FILENAME constant to constants.py
- Add TYPE_PROCESS and TYPE_MACHINE to jsonl.py type constants
- Add binary_to_jsonl(), process_to_jsonl(), machine_to_jsonl() converters
- Add Snapshot.write_index_jsonl() to write new format
- Add Snapshot.read_index_jsonl() to read new format
- Add Snapshot.convert_index_json_to_jsonl() for migration
- Update Snapshot.reconcile_with_index() to handle both formats
- Update fs_migrate to convert during filesystem migration
- Update load_from_directory/create_from_directory for both formats
- Update legacy.py parse_json_links_details for JSONL support
The new format is easier to parse, extend, and mix record types.
Implement a sleek inline tag editor with autocomplete and AJAX support:
- Create TagEditorWidget and InlineTagEditorWidget in core/widgets.py
- Pills display with X remove button, sorted alphabetically
- Text input with HTML5 datalist autocomplete
- Enter/Space/Comma to add tags, auto-creates if doesn't exist
- Backspace removes last tag when input is empty
- Add API endpoints in api/v1_core.py
- GET /tags/autocomplete/ - search tags by name
- POST /tags/create/ - get_or_create tag
- POST /tags/add-to-snapshot/ - add tag to snapshot via AJAX
- POST /tags/remove-from-snapshot/ - remove tag from snapshot
- Update admin_snapshots.py
- Replace FilteredSelectMultiple with TagEditorWidget in bulk actions
- Create SnapshotAdminForm with tags_editor field
- Update title_str() to render inline tag editor in list view
- Remove TagInline, use widget instead
- Add CSS styles in templates/admin/base.html
- Blue gradient pill styling matching admin theme
- Focus ring and hover states
- Compact inline variant for list view
## Summary
Resolves#1484 where CUSTOM_TEMPLATES_DIR configuration was being
ignored.
The setting was previously removed from ServerConfig and hardcoded as a
constant, preventing users from customizing the templates directory
location.
## Changes
- Added CUSTOM_TEMPLATES_DIR field to StorageConfig in common.py
- Updated settings.py to use STORAGE_CONFIG.CUSTOM_TEMPLATES_DIR
- Updated paths.py to use configurable value in version output
## Usage
Users can now configure the custom templates directory via:
- ArchiveBox.conf: `CUSTOM_TEMPLATES_DIR = ./custom_templates`
- Environment variable: `export CUSTOM_TEMPLATES_DIR=/path/to/templates`
- Defaults to DATA_DIR/user_templates if not configured
Generated with [Claude Code](https://claude.ai/code)
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Restores CUSTOM_TEMPLATES_DIR configurability so users can override the
templates directory. Fixes issue #1484 and updates the app to
consistently use the configured path.
- **Bug Fixes**
- Added CUSTOM_TEMPLATES_DIR to StorageConfig.
- Updated settings.py and paths.py to read
STORAGE_CONFIG.CUSTOM_TEMPLATES_DIR.
- **Migration**
- Configure via ArchiveBox.conf or the CUSTOM_TEMPLATES_DIR env var.
- Defaults to DATA_DIR/user_templates if not set.
<sup>Written for commit 329d185d95.
Summary will update automatically on new commits.</sup>
<!-- End of auto-generated description by cubic. -->
### Summary
Fixed the bug where users created via the web GUI cannot login.
### Root Cause
The issue was in `archivebox/core/admin.py` which imported and
registered Django's default `UserAdmin` instead of the custom
`CustomUserAdmin` class. This bypassed all custom admin logic.
Additionally, `CustomUserAdmin` was modifying `fieldsets` without
explicitly preserving `add_fieldsets`, which could cause Django to not
properly handle the user creation form.
### Changes
- Updated `admin.py` to import and register `CustomUserAdmin`
- Explicitly set `add_fieldsets` in `CustomUserAdmin` to preserve
Django's default user creation behavior
- Added explanatory comments
### Testing
To verify the fix:
1. Start ArchiveBox web server
2. Navigate to the admin user creation page (`/admin/auth/user/add/`)
3. Create a new user with staff and superuser permissions
4. Log out and attempt to log in with the new user's credentials
5. Login should now succeed
Fixes#1707
Generated with [Claude Code](https://claude.ai/code)
Resolves issue #1484 where CUSTOM_TEMPLATES_DIR configuration was
being ignored. The setting was previously removed from ServerConfig
and hardcoded as a constant, preventing users from customizing the
templates directory location.
Changes:
- Added CUSTOM_TEMPLATES_DIR field to StorageConfig in common.py
- Updated settings.py to use STORAGE_CONFIG.CUSTOM_TEMPLATES_DIR
- Updated paths.py to use configurable value in version output
Users can now configure the custom templates directory via:
- ArchiveBox.conf: CUSTOM_TEMPLATES_DIR = ./custom_templates
- Environment variable: export CUSTOM_TEMPLATES_DIR=/path/to/templates
- Defaults to DATA_DIR/user_templates if not configured
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
The bug was caused by importing Django's default UserAdmin instead of
CustomUserAdmin in admin.py. This bypassed all custom admin logic.
Additionally, CustomUserAdmin was modifying fieldsets without explicitly
preserving add_fieldsets, which can cause Django to not properly handle
the user creation form, leading to password hashing issues.
Changes:
- Updated admin.py to import and register CustomUserAdmin
- Explicitly set add_fieldsets in CustomUserAdmin to preserve Django's
default user creation behavior and ensure passwords are properly hashed
- Added explanatory comments
Fixes#1707
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>