### Summary
Fixed the bug where users created via the web GUI could not log in.
### Root Cause
The issue was in `archivebox/core/admin.py` which imported and
registered Django's default `UserAdmin` instead of the custom
`CustomUserAdmin` class. This bypassed all custom admin logic.
Additionally, `CustomUserAdmin` was modifying `fieldsets` without
explicitly preserving `add_fieldsets`, which could cause Django to not
properly handle the user creation form.
### Changes
- Updated `admin.py` to import and register `CustomUserAdmin`
- Explicitly set `add_fieldsets` in `CustomUserAdmin` to preserve
Django's default user creation behavior
- Added explanatory comments
### Testing
To verify the fix:
1. Start ArchiveBox web server
2. Navigate to the admin user creation page (`/admin/auth/user/add/`)
3. Create a new user with staff and superuser permissions
4. Log out and attempt to log in with the new user's credentials
5. Login should now succeed
Fixes #1707
Generated with [Claude Code](https://claude.ai/code)
The bug was caused by importing Django's default UserAdmin instead of
CustomUserAdmin in admin.py. This bypassed all custom admin logic.
Additionally, CustomUserAdmin was modifying fieldsets without explicitly
preserving add_fieldsets, which can cause Django to not properly handle
the user creation form, leading to password hashing issues.
Changes:
- Updated admin.py to import and register CustomUserAdmin
- Explicitly set add_fieldsets in CustomUserAdmin to preserve Django's
default user creation behavior and ensure passwords are properly hashed
- Added explanatory comments
Fixes #1707
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
## 🤖 Installing Claude Code GitHub App
This PR adds a GitHub Actions workflow that enables Claude Code
integration in our repository.
### What is Claude Code?
[Claude Code](https://claude.com/claude-code) is an AI coding agent that
can help with:
- Bug fixes and improvements
- Documentation updates
- Implementing new features
- Code reviews and suggestions
- Writing tests
- And more!
### How it works
Once this PR is merged, we'll be able to interact with Claude by
mentioning @claude in a pull request or issue comment.
Once the workflow is triggered, Claude will analyze the comment and
surrounding context, and carry out the request in a GitHub Actions run.
### Important Notes
- **This workflow won't take effect until this PR is merged**
- **@claude mentions won't work until after the merge is complete**
- The workflow runs automatically whenever Claude is mentioned in PR or
issue comments
- Claude gets access to the entire PR or issue context including files,
diffs, and previous comments
### Security
- Our Anthropic API key is securely stored as a GitHub Actions secret
- Only users with write access to the repository can trigger the
workflow
- All Claude runs are stored in the GitHub Actions run history
- Claude's default tools are limited to reading/writing files and
interacting with our repo by creating comments, branches, and commits.
- We can add more allowed tools by adding them to the workflow file
like:
```
allowed_tools: Bash(npm install),Bash(npm run build),Bash(npm run lint),Bash(npm run test)
```
There's more information in the [Claude Code action
repo](https://github.com/anthropics/claude-code-action).
After merging this PR, let's try mentioning @claude in a comment on any
PR to get started!
…lures
Adds a new MAX_URL_ATTEMPTS configuration option (default: 50) that
stops retrying ArchiveResult hooks for a snapshot once that many
failures have been recorded. This prevents infinite retry loops for
problematic URLs.
When the limit is reached, any pending ArchiveResults for that snapshot
are marked as SKIPPED with an explanatory message.
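
A minimal sketch of the retry cap described above, using an in-memory stand-in rather than the real Django model. The `Result` class, the `enforce_max_attempts` helper, and the status strings (`queued`, `failed`, `skipped`) are assumptions for illustration; only the option name and its default come from this PR.

```python
from dataclasses import dataclass

MAX_URL_ATTEMPTS = 50  # default stated in the description above


@dataclass
class Result:
    """Hypothetical stand-in for an ArchiveResult row."""
    status: str  # assumed values: "queued", "failed", "succeeded", "skipped"
    output_str: str = ""


def enforce_max_attempts(results, max_attempts=MAX_URL_ATTEMPTS):
    """Mark pending results as skipped once the failure count reaches the cap.

    Returns True if the cap was reached, False otherwise.
    """
    failures = sum(1 for r in results if r.status == "failed")
    if failures < max_attempts:
        return False
    for r in results:
        if r.status == "queued":
            r.status = "skipped"
            r.output_str = f"Skipped: snapshot reached {max_attempts} failed attempts"
    return True
```

Counting failures per snapshot (rather than per hook) matches the wording above, where one limit governs all remaining ArchiveResults for that snapshot.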
<!-- IMPORTANT: Do not submit PRs with only formatting / PEP8 / line
length changes. -->
# Summary
<!--e.g. This PR fixes ABC or adds the ability to do XYZ...-->
# Related issues
<!-- e.g. #123 or Roadmap goal #
https://github.com/pirate/ArchiveBox/wiki/Roadmap -->
# Changes these areas
- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk
- Rename archivebox/plugins/media/ → archivebox/plugins/ytdlp/
- Rename hook script on_Snapshot__63_media.bg.py →
on_Snapshot__63_ytdlp.bg.py
- Update config.json: YTDLP_* as primary keys, MEDIA_* as x-aliases
- Update templates CSS classes: media-* → ytdlp-*
- Fix gallerydl bug: remove incorrect dependency on media plugin output
- Update all codebase references to use YTDLP_* and SAVE_YTDLP
- Add backwards compatibility test for MEDIA_ENABLED alias
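
The alias fallback can be sketched roughly as follows. The schema shape, the `resolve` helper, and the `YTDLP_ENABLED` key name are assumptions made for illustration; only the `x-aliases` idea and the `MEDIA_ENABLED` alias are taken from the list above.

```python
# Hypothetical excerpt of a plugin config schema; the real config.json
# layout may differ. Only the "x-aliases" concept comes from the PR text.
SCHEMA = {
    "YTDLP_ENABLED": {"default": "true", "x-aliases": ["MEDIA_ENABLED"]},
}


def resolve(key, env, schema=SCHEMA):
    """Return the value for `key`, falling back to legacy aliases, then the default."""
    spec = schema[key]
    if key in env:
        return env[key]  # the primary key always wins
    for alias in spec.get("x-aliases", []):
        if alias in env:
            return env[alias]  # legacy name still honored
    return spec["default"]
```

With this ordering, an existing deployment that only sets `MEDIA_ENABLED=false` keeps its behavior, while a new `YTDLP_*` value takes precedence when both are present.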
- Move hardcoded default args from Python to config.json YTDLP_ARGS
- Add get_ytdlp_args() function to read from YTDLP_ARGS env var
- Keep format arg with max_size in code (depends on YTDLP_MAX_SIZE)
- YTDLP_ARGS can be overridden as JSON array in environment
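
A rough sketch of how reading `YTDLP_ARGS` as a JSON array might look. The function body, the fallback behavior on invalid input, and the placeholder default list are assumptions; the actual `get_ytdlp_args()` in this PR may differ.

```python
import json
import os

# Placeholder default for illustration only; the real defaults live in config.json.
DEFAULT_YTDLP_ARGS = ["--no-progress"]


def get_ytdlp_args(env=None):
    """Read YTDLP_ARGS as a JSON array of strings, falling back to the defaults."""
    env = os.environ if env is None else env
    raw = env.get("YTDLP_ARGS", "").strip()
    if not raw:
        return list(DEFAULT_YTDLP_ARGS)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return list(DEFAULT_YTDLP_ARGS)  # assumed: ignore malformed overrides
    if not isinstance(parsed, list):
        return list(DEFAULT_YTDLP_ARGS)  # assumed: only arrays are accepted
    return [str(arg) for arg in parsed]
```

For example, setting `YTDLP_ARGS='["--write-subs", "--embed-metadata"]'` in the environment would replace the default list entirely in this sketch.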
Remove imports of deleted folder utility functions and rewrite
status command to query Snapshot model directly. This aligns with
the fs_version refactor where the DB is the single source of truth.
- Use Snapshot.objects queries for indexed/archived/unarchived counts
- Scan filesystem directly for present/orphaned directory counts
- Simplify output to focus on essential status information
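
The filesystem half of the status check could look something like this. The `count_archive_dirs` helper and the assumption that each snapshot directory is named after an identifier stored in the DB are illustrative only; the indexed names would come from a `Snapshot.objects` query in the real command.

```python
import os


def count_archive_dirs(archive_dir, indexed_names):
    """Count snapshot directories on disk and those with no matching DB row.

    `indexed_names` is the set of directory names known to the database.
    """
    if not os.path.isdir(archive_dir):
        # A missing archive directory is treated as empty rather than an error.
        return {"present": 0, "orphaned": 0}
    with os.scandir(archive_dir) as entries:
        present = [entry.name for entry in entries if entry.is_dir()]
    orphaned = [name for name in present if name not in indexed_names]
    return {"present": len(present), "orphaned": len(orphaned)}
```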
Restored 10 folder status functions that were accidentally removed:
- get_indexed_folders, get_archived_folders, get_unarchived_folders
- get_present_folders, get_valid_folders, get_invalid_folders
- get_duplicate_folders, get_orphaned_folders
- get_corrupted_folders, get_unrecognized_folders
These are required by archivebox_status.py for the status command.
Added safety checks for non-existent archive directories.
Replace old `output` field with new fields across the codebase:
- output_str: Human-readable output summary
- output_json: Structured metadata (optional)
- output_files: Dict of output files with metadata
- output_size: Total size in bytes
- output_mimetypes: CSV of file mimetypes
Files updated:
- api/v1_core.py: Update MinimalArchiveResultSchema to expose new fields
- api/v1_core.py: Update ArchiveResultFilterSchema to search output_str
- cli/archivebox_extract.py: Use output_str in CLI output
- core/admin_archiveresults.py: Update admin fields, search, and fieldsets
- core/admin_archiveresults.py: Fix output_html variable name bug in output_summary
- misc/jsonl.py: Update archiveresult_to_jsonl() to include new fields
- plugins/extractor_utils.py: Update ExtractorResult helper class
The embed_path() method already uses output_files and output_str,
so snapshot detail page and template tags work correctly.
- Add test_hooks.py with 31 unit tests covering:
  - Background hook detection (.bg. suffix)
  - JSONL parsing (clean format and legacy RESULT_JSON= format)
  - Install hook XYZ_BINARY env var handling
  - Hook discovery and sorting
  - get_extractor_name() function
  - Hook execution with real subprocesses
  - Install hook output format compliance
  - Snapshot hook output format compliance
  - Plugin metadata addition
- Update TODO_hook_architecture.md to mark all tasks complete:
  - Tests: 31 tests in archivebox/tests/test_hooks.py
  - Migrations: 0029 and 0030 applied successfully
All phases of the hook architecture implementation are now complete.
All snapshot hooks now:
- Read XYZ_BINARY env vars and use them in cmd
- Output exactly one clean JSONL line (no RESULT_JSON= prefix)
- Emit no extra output lines (VERSION=, START_TS=, etc.)
- Provide only allowed fields
- Omit computed fields
- Include a cmd array with the binary path (Python hooks)
- All install hooks now respect their respective XYZ_BINARY env vars
(e.g., WGET_BINARY, CHROME_BINARY, YTDLP_BINARY, etc.)
- Support both absolute paths (/usr/bin/wget2) and binary names (wget2)
- Dynamic bin_name used in Dependency JSONL output
- Updated 11 install hooks to follow the new pattern
- Mark checklist items as complete in TODO_hook_architecture.md
- Rename 13 on_Crawl__00_validate_* hooks to on_Crawl__00_install_*
- This better reflects what these hooks actually do (check/install binaries)
- Update TODO_hook_architecture.md to reflect renamed hooks
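
The XYZ_BINARY handling in the install hooks above can be sketched as follows. The `resolve_binary` and `dependency_record` helpers are hypothetical, and every record field other than `type` and `bin_name` is an assumption rather than the actual hook output schema.

```python
import json
import os
import shutil


def resolve_binary(env_var, default_name, env=None):
    """Resolve a binary from an XYZ_BINARY-style env var.

    Accepts either an absolute path (/usr/bin/wget2) or a bare name (wget2).
    Returns (bin_name, abspath or None if the binary cannot be found).
    """
    env = os.environ if env is None else env
    configured = env.get(env_var, "").strip() or default_name
    bin_name = os.path.basename(configured)
    if os.path.isabs(configured):
        usable = os.path.isfile(configured) and os.access(configured, os.X_OK)
        return bin_name, (configured if usable else None)
    # Bare names are looked up on PATH.
    return bin_name, shutil.which(configured)


def dependency_record(bin_name, abspath):
    """Build one JSONL line; the `abspath` key is an assumed field name."""
    return json.dumps({"type": "Dependency", "bin_name": bin_name, "abspath": abspath})
```

Deriving `bin_name` from the configured value is what makes the output "dynamic": with `WGET_BINARY=/usr/bin/wget2`, the record reports `wget2` rather than a hardcoded `wget`.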
Phase 1: Database migration for new ArchiveResult fields
- Add output_str (TextField) for human-readable summary
- Add output_json (JSONField) for structured metadata
- Add output_files (JSONField) for dict of {relative_path: {}}
- Add output_size (BigIntegerField) for total bytes
- Add output_mimetypes (CharField) for CSV of mimetypes
- Add binary FK to InstalledBinary (optional)
- Migrate existing 'output' field to new split fields
Phase 3: Update run_hook() for JSONL parsing
- Support new JSONL format (any line with {type: 'ModelName', ...})
- Maintain backwards compatibility with RESULT_JSON= format
- Add plugin metadata to each parsed record
- Detect background hooks with .bg. suffix in filename
- Add find_binary_for_cmd() helper function
- Add create_model_record() for processing side-effect records
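
A simplified sketch of the Phase 3 parsing rules, not the actual `run_hook()` code. The `parse_hook_output` helper, the `plugin` key name, and the assumption that legacy `RESULT_JSON=` payloads also carry a `type` field are all illustrative.

```python
import json


def is_background_hook(filename):
    """Background hooks are marked by a `.bg.` segment in the filename."""
    return ".bg." in filename


def parse_hook_output(stdout, plugin):
    """Collect JSON records from hook stdout, tagging each with its plugin.

    Handles both clean JSONL lines and the legacy `RESULT_JSON=` prefix;
    log lines, blank lines, and malformed JSON are ignored.
    """
    records = []
    for line in stdout.splitlines():
        line = line.strip()
        if line.startswith("RESULT_JSON="):
            line = line[len("RESULT_JSON="):]
        if not line.startswith("{"):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and "type" in record:
            record["plugin"] = plugin  # metadata key name is an assumption
            records.append(record)
    return records
```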
Phase 6: Update ArchiveResult.run()
- Handle background hooks (return immediately when result is None)
- Process 'records' from HookResult for side-effect models
- Use new output fields (output_str, output_json, output_files, etc.)
- Call create_model_record() for InstalledBinary, Machine updates
Phase 7: Add background hook support
- Add is_background_hook() method to ArchiveResult
- Add check_background_completed() to check if process exited
- Add finalize_background_hook() to collect results from completed hooks
- Update SnapshotMachine.is_finished() to check/finalize background hooks
- Update _populate_output_fields() to walk directory and populate stats
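
For illustration, walking an output directory to fill the new fields might look like this. It is a standalone sketch, not the actual `_populate_output_fields()` method; the per-file metadata keys (`size`, `mimetype`) and the `application/octet-stream` fallback are assumptions.

```python
import mimetypes
import os


def populate_output_fields(output_dir):
    """Walk output_dir and derive output_files, output_size, output_mimetypes."""
    output_files = {}
    total_size = 0
    seen_types = []
    for root, _dirs, files in os.walk(output_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, output_dir)
            size = os.path.getsize(path)
            mime, _encoding = mimetypes.guess_type(name)
            mime = mime or "application/octet-stream"
            output_files[rel] = {"size": size, "mimetype": mime}  # assumed metadata keys
            total_size += size
            if mime not in seen_types:
                seen_types.append(mime)
    return {
        "output_files": output_files,            # dict keyed by relative path
        "output_size": total_size,               # total bytes
        "output_mimetypes": ",".join(seen_types),  # CSV of distinct mimetypes
    }
```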
Also updated references to old 'output' field in:
- admin_archiveresults.py
- statemachines.py
- templatetags/core_tags.py