4741 Commits

Author SHA1 Message Date
Nick Sweeting
4cd2fceb8a even more migration fixes 2025-12-29 22:30:37 -08:00
Nick Sweeting
95beddc5fc more migration fixes 2025-12-29 22:12:57 -08:00
Nick Sweeting
2e350d317d fix initial migrations 2025-12-29 21:27:31 -08:00
Nick Sweeting
3dd329600e comment updates 2025-12-29 21:05:34 -08:00
Nick Sweeting
80f75126c6 more fixes 2025-12-29 21:03:05 -08:00
Nick Sweeting
147d567d3f fix migrations 2025-12-29 19:25:26 -08:00
Nick Sweeting
64dccb7a19 passing 2025-12-29 18:55:57 -08:00
Nick Sweeting
5549a79869 more speed fixes 2025-12-29 18:55:37 -08:00
Nick Sweeting
abf5f44134 more debug logging 2025-12-29 18:53:52 -08:00
Nick Sweeting
bcf0513d05 more debug logging 2025-12-29 18:50:04 -08:00
Nick Sweeting
7e6e3be9e7 messing with chrome install process to reuse cached chromium with pinned version 2025-12-29 18:49:36 -08:00
Nick Sweeting
b670612685 centralize chrome pid and zombie logic in chrome_utils 2025-12-29 17:57:23 -08:00
Nick Sweeting
4ba3e8d120 fix extension loading and consolidate chromium logic 2025-12-29 17:47:37 -08:00
Nick Sweeting
638b3ba774 add modalcloser plugin 2025-12-29 14:36:15 -08:00
Nick Sweeting
bdec5cb590 Fix: Make CUSTOM_TEMPLATES_DIR configurable again (#1725)
## Summary

Resolves #1484 where CUSTOM_TEMPLATES_DIR configuration was being
ignored.

The setting was previously removed from ServerConfig and hardcoded as a
constant, preventing users from customizing the templates directory
location.

## Changes

- Added CUSTOM_TEMPLATES_DIR field to StorageConfig in common.py
- Updated settings.py to use STORAGE_CONFIG.CUSTOM_TEMPLATES_DIR
- Updated paths.py to use configurable value in version output

## Usage

Users can now configure the custom templates directory via:
- ArchiveBox.conf: `CUSTOM_TEMPLATES_DIR = ./custom_templates`
- Environment variable: `export CUSTOM_TEMPLATES_DIR=/path/to/templates`
- Defaults to DATA_DIR/user_templates if not configured
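
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `resolve_custom_templates_dir` is a hypothetical helper, and both the precedence (environment variable before ArchiveBox.conf) and the handling of relative paths (resolved against DATA_DIR) are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_custom_templates_dir(
    data_dir: Path,
    conf_file: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Hypothetical helper: env var first, then conf file, then the default."""
    env = os.environ if env is None else env
    conf_file = conf_file or {}
    # Assumption: the environment variable wins over the ArchiveBox.conf value.
    raw = env.get("CUSTOM_TEMPLATES_DIR") or conf_file.get("CUSTOM_TEMPLATES_DIR")
    if not raw:
        # Default stated in the PR description: DATA_DIR/user_templates.
        return data_dir / "user_templates"
    path = Path(raw).expanduser()
    # Assumption: relative paths are interpreted relative to DATA_DIR.
    return path if path.is_absolute() else data_dir / path
```

Passing `env` and `conf_file` explicitly keeps the sketch easy to exercise without touching the real process environment.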

Generated with [Claude Code](https://claude.ai/code)

2025-12-29 13:58:05 -08:00
Nick Sweeting
2be21ac592 fix: Use CustomUserAdmin to fix user creation bug (#1726)
### Summary

Fixed the bug where users created via the web GUI cannot log in.

### Root Cause

The issue was in `archivebox/core/admin.py` which imported and
registered Django's default `UserAdmin` instead of the custom
`CustomUserAdmin` class. This bypassed all custom admin logic.
Additionally, `CustomUserAdmin` was modifying `fieldsets` without
explicitly preserving `add_fieldsets`, which could cause Django to not
properly handle the user creation form.

### Changes

- Updated `admin.py` to import and register `CustomUserAdmin`
- Explicitly set `add_fieldsets` in `CustomUserAdmin` to preserve
Django's default user creation behavior
- Added explanatory comments

### Testing

To verify the fix:
1. Start ArchiveBox web server
2. Navigate to the admin user creation page (`/admin/auth/user/add/`)
3. Create a new user with staff and superuser permissions
4. Log out and attempt to log in with the new user's credentials
5. Login should now succeed

Fixes #1707

Generated with [Claude Code](https://claude.ai/code)
2025-12-29 13:57:31 -08:00
Nick Sweeting
8c69124935 make infiniscroll plugin also expand details and comments sections 2025-12-29 13:55:27 -08:00
Nick Sweeting
621359c37c add duplicate issue detection bot with opencode 2025-12-29 13:55:26 -08:00
Nick Sweeting
b649db5294 fix infiniscroll plugin 2025-12-29 13:55:26 -08:00
claude[bot]
329d185d95 Fix: Make CUSTOM_TEMPLATES_DIR configurable again
Resolves issue #1484 where CUSTOM_TEMPLATES_DIR configuration was
being ignored. The setting was previously removed from ServerConfig
and hardcoded as a constant, preventing users from customizing the
templates directory location.

Changes:
- Added CUSTOM_TEMPLATES_DIR field to StorageConfig in common.py
- Updated settings.py to use STORAGE_CONFIG.CUSTOM_TEMPLATES_DIR
- Updated paths.py to use configurable value in version output

Users can now configure the custom templates directory via:
- ArchiveBox.conf: CUSTOM_TEMPLATES_DIR = ./custom_templates
- Environment variable: export CUSTOM_TEMPLATES_DIR=/path/to/templates
- Defaults to DATA_DIR/user_templates if not configured

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2025-12-29 21:50:21 +00:00
claude[bot]
2e1093f840 fix: Use CustomUserAdmin instead of Django's default UserAdmin to fix user creation bug
The bug was caused by importing Django's default UserAdmin instead of
CustomUserAdmin in admin.py. This bypassed all custom admin logic.

Additionally, CustomUserAdmin was modifying fieldsets without explicitly
preserving add_fieldsets, which can cause Django to not properly handle
the user creation form, leading to password hashing issues.

Changes:
- Updated admin.py to import and register CustomUserAdmin
- Explicitly set add_fieldsets in CustomUserAdmin to preserve Django's
  default user creation behavior and ensure passwords are properly hashed
- Added explanatory comments

Fixes #1707

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2025-12-29 21:47:53 +00:00
Nick Sweeting
9f015df0d8 Add Claude Code GitHub Workflow (#1724)
## 🤖 Installing Claude Code GitHub App

This PR adds a GitHub Actions workflow that enables Claude Code
integration in our repository.

### What is Claude Code?

[Claude Code](https://claude.com/claude-code) is an AI coding agent that
can help with:
- Bug fixes and improvements  
- Documentation updates
- Implementing new features
- Code reviews and suggestions
- Writing tests
- And more!

### How it works

Once this PR is merged, we'll be able to interact with Claude by
mentioning @claude in a pull request or issue comment.
Once the workflow is triggered, Claude will analyze the comment and
surrounding context, and execute on the request in a GitHub action.

### Important Notes

- **This workflow won't take effect until this PR is merged**
- **@claude mentions won't work until after the merge is complete**
- The workflow runs automatically whenever Claude is mentioned in PR or
issue comments
- Claude gets access to the entire PR or issue context including files,
diffs, and previous comments

### Security

- Our Anthropic API key is securely stored as a GitHub Actions secret
- Only users with write access to the repository can trigger the
workflow
- All Claude runs are stored in the GitHub Actions run history
- Claude's default tools are limited to reading/writing files and
interacting with our repo by creating comments, branches, and commits.
- We can add more allowed tools by adding them to the workflow file
like:

```
allowed_tools: Bash(npm install),Bash(npm run build),Bash(npm run lint),Bash(npm run test)
```

There's more information in the [Claude Code action
repo](https://github.com/anthropics/claude-code-action).

After merging this PR, let's try mentioning @claude in a comment on any
PR to get started!
2025-12-29 13:43:10 -08:00
Nick Sweeting
8c280100c7 Change permissions for pull-requests and issues 2025-12-29 13:42:59 -08:00
Nick Sweeting
d8b10d0827 Delete .github/workflows/claude-code-review.yml 2025-12-29 13:40:55 -08:00
Nick Sweeting
58b7f9c334 "Claude Code Review workflow" 2025-12-29 13:40:20 -08:00
Nick Sweeting
0162ee2434 "Claude PR Assistant workflow" 2025-12-29 13:40:18 -08:00
Nick Sweeting
34d03be891 Add MAX_URL_ATTEMPTS option to ArchiveBox (#1723)
…lures

Adds a new MAX_URL_ATTEMPTS configuration option (default: 50) that
stops retrying ArchiveResult hooks for a snapshot once that many
failures have been recorded. This prevents infinite retry loops for
problematic URLs.

When the limit is reached, any pending ArchiveResults for that snapshot
are marked as SKIPPED with an explanatory message.

2025-12-29 13:32:11 -08:00
Nick Sweeting
690f0669cd remove unneeded test 2025-12-29 13:30:25 -08:00
Claude
f88182df7a Merge remote-tracking branch 'origin/dev' into claude/add-max-url-attempts-oBHCD 2025-12-29 21:29:01 +00:00
Nick Sweeting
73e977ea97 ytdlp fixes 2025-12-29 13:26:50 -08:00
Nick Sweeting
92c26124a3 remove more hardcoded plugin names from codebase 2025-12-29 13:21:47 -08:00
Nick Sweeting
967c5d53e0 make plugin config more consistent 2025-12-29 13:21:46 -08:00
Nick Sweeting
8d76b2b0c6 add infiniscroll plugin 2025-12-29 13:14:40 -08:00
Nick Sweeting
e20fdae2a5 fix gh ci cd 2025-12-29 13:14:40 -08:00
Claude
88d7906033 Add MAX_URL_ATTEMPTS config option to stop retries after too many failures
Adds a new MAX_URL_ATTEMPTS configuration option (default: 50) that stops
retrying ArchiveResult hooks for a snapshot once that many failures have
been recorded. This prevents infinite retry loops for problematic URLs.

When the limit is reached, any pending ArchiveResults for that snapshot
are marked as SKIPPED with an explanatory message.
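
A rough sketch of the cutoff described above, using plain dataclasses rather than the real Django models. `Result`, `skip_if_too_many_failures`, the lowercase status strings, and the wording of the skip message are all hypothetical; only the option name and its default of 50 come from this commit message.

```python
from dataclasses import dataclass
from typing import List

MAX_URL_ATTEMPTS = 50  # default stated in the commit message


@dataclass
class Result:
    """Stand-in for an ArchiveResult row (hypothetical, not the Django model)."""
    hook_name: str
    status: str = "queued"  # assumed values: queued, succeeded, failed, skipped
    output: str = ""


def skip_if_too_many_failures(results: List[Result], limit: int = MAX_URL_ATTEMPTS) -> bool:
    """Mark pending results as skipped once `limit` failures are recorded.

    Returns True if the limit was reached, False otherwise.
    """
    failures = sum(1 for r in results if r.status == "failed")
    if failures < limit:
        return False
    for r in results:
        if r.status == "queued":
            r.status = "skipped"
            r.output = f"Skipped: snapshot reached MAX_URL_ATTEMPTS={limit} failures"
    return True
```

Results that already succeeded or failed are left untouched; only pending ones are rewritten, which matches the behavior described in the message.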
2025-12-29 20:20:50 +00:00
Nick Sweeting
e38ddf3a25 Rename media plugin to ytdlp (#1722)
- Rename archivebox/plugins/media/ → archivebox/plugins/ytdlp/
- Rename hook script on_Snapshot__63_media.bg.py →
on_Snapshot__63_ytdlp.bg.py
- Update config.json: YTDLP_* as primary keys, MEDIA_* as x-aliases
- Update templates CSS classes: media-* → ytdlp-*
- Fix gallerydl bug: remove incorrect dependency on media plugin output
- Update all codebase references to use YTDLP_* and SAVE_YTDLP
- Add backwards compatibility test for MEDIA_ENABLED alias

2025-12-29 11:47:05 -08:00
Claude
ac64c77341 move default yt-dlp args to config.json YTDLP_ARGS for user override
- Move hardcoded default args from Python to config.json YTDLP_ARGS
- Add get_ytdlp_args() function to read from YTDLP_ARGS env var
- Keep format arg with max_size in code (depends on YTDLP_MAX_SIZE)
- YTDLP_ARGS can be overridden as JSON array in environment
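
One way the JSON-array override might be read, as a hedged sketch. The function name `get_ytdlp_args` comes from this commit, but the body below is an assumption, as are the `env` parameter and the `DEFAULT_YTDLP_ARGS` placeholder (the real defaults live in config.json).

```python
import json
import os
from typing import List, Mapping, Optional

# Placeholder fallback for illustration only; not the plugin's real defaults.
DEFAULT_YTDLP_ARGS: List[str] = ["--no-progress"]


def get_ytdlp_args(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Read YTDLP_ARGS as a JSON array, falling back to defaults on bad input."""
    env = os.environ if env is None else env
    raw = env.get("YTDLP_ARGS", "").strip()
    if not raw:
        return list(DEFAULT_YTDLP_ARGS)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: malformed JSON is ignored rather than raising.
        return list(DEFAULT_YTDLP_ARGS)
    if not isinstance(parsed, list):
        return list(DEFAULT_YTDLP_ARGS)
    return [str(arg) for arg in parsed]
```

For example, `YTDLP_ARGS='["--write-subs", "--embed-thumbnail"]'` would yield those two arguments in place of the defaults.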
2025-12-29 19:38:37 +00:00
Claude
a5654e877f rename media plugin to ytdlp with backwards-compatible aliases
- Rename archivebox/plugins/media/ → archivebox/plugins/ytdlp/
- Rename hook script on_Snapshot__63_media.bg.py → on_Snapshot__63_ytdlp.bg.py
- Update config.json: YTDLP_* as primary keys, MEDIA_* as x-aliases
- Update templates CSS classes: media-* → ytdlp-*
- Fix gallerydl bug: remove incorrect dependency on media plugin output
- Update all codebase references to use YTDLP_* and SAVE_YTDLP
- Add backwards compatibility test for MEDIA_ENABLED alias
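
The alias behavior can be pictured with a small resolver like the one below. It is a sketch under assumptions: the `SCHEMA` shape, the default value, and the `resolve` helper are hypothetical; only the `x-aliases` key name and the `MEDIA_ENABLED` alias are taken from this commit.

```python
from typing import Any, Dict, Mapping

# Assumed shape of a config.json entry; the default value is illustrative.
SCHEMA: Dict[str, Dict[str, Any]] = {
    "YTDLP_ENABLED": {"default": "true", "x-aliases": ["MEDIA_ENABLED"]},
}


def resolve(key: str, env: Mapping[str, str], schema: Mapping[str, Dict[str, Any]] = SCHEMA) -> str:
    """Return the primary key's value, then any alias, then the default."""
    spec = schema[key]
    if key in env:
        return env[key]
    for alias in spec.get("x-aliases", []):
        if alias in env:
            return env[alias]
    return spec["default"]
```

Checking the primary key first means an old `MEDIA_ENABLED` setting keeps working, while a new `YTDLP_ENABLED` setting takes precedence when both are present.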
2025-12-29 19:09:05 +00:00
Nick Sweeting
30c60eef76 much better tests and add page ui 2025-12-29 04:02:11 -08:00
Nick Sweeting
9487f8a0de add ci for parallel tests 2025-12-29 02:39:24 -08:00
Nick Sweeting
f4e7820533 use full dotted paths for all archivebox imports, add migrations and more fixes 2025-12-29 00:47:08 -08:00
Nick Sweeting
1e4d3ffd11 improve plugin tests and config 2025-12-29 00:45:23 -08:00
Nick Sweeting
f0aa19fa7d wip 2025-12-28 17:51:54 -08:00
Nick Sweeting
54f91c1339 Improve concurrency control between plugin hooks (#1721)
2025-12-28 12:48:53 -08:00
Nick Sweeting
6d991a08ea fix final_status unneeded 2025-12-28 12:47:36 -08:00
Claude
057b49ad85 Update status command to use DB as source of truth
Remove imports of deleted folder utility functions and rewrite
status command to query Snapshot model directly. This aligns with
the fs_version refactor where the DB is the single source of truth.

- Use Snapshot.objects queries for indexed/archived/unarchived counts
- Scan filesystem directly for present/orphaned directory counts
- Simplify output to focus on essential status information
2025-12-28 19:19:03 +00:00
Claude
767458e4e0 Revert "Restore missing folder utility functions"
This reverts commit 32bcf0896d.
2025-12-28 19:16:52 +00:00
Claude
32bcf0896d Restore missing folder utility functions
Restored 10 folder status functions that were accidentally removed:
- get_indexed_folders, get_archived_folders, get_unarchived_folders
- get_present_folders, get_valid_folders, get_invalid_folders
- get_duplicate_folders, get_orphaned_folders
- get_corrupted_folders, get_unrecognized_folders

These are required by archivebox_status.py for the status command.
Added safety checks for non-existent archive directories.
2025-12-28 14:00:48 +00:00
Claude
6b3c87276f Mark hook renumbering testing as complete in TODO
All hook utility tests pass (extract_step, is_background_hook, discover_hooks).
Model fields and methods verified (current_step, hook_name, advance_step_if_ready).
2025-12-28 13:48:11 +00:00
Claude
1b5a816022 Implement hook step-based concurrency system
This implements the hook concurrency plan from TODO_hook_concurrency.md:

## Schema Changes
- Add Snapshot.current_step (IntegerField 0-9, default=0)
- Create migration 0034_snapshot_current_step.py
- Fix uuid_compat imports in migrations 0032 and 0003

## Core Logic
- Add extract_step(hook_name) utility - extracts step from __XX_ pattern
- Add is_background_hook(hook_name) utility - checks for .bg. suffix
- Update Snapshot.create_pending_archiveresults() to create one AR per hook
- Update ArchiveResult.run() to handle hook_name field
- Add Snapshot.advance_step_if_ready() method for step advancement
- Integrate with SnapshotMachine.is_finished() to call advance_step_if_ready()

## Worker Coordination
- Update ArchiveResultWorker.get_queue() for step-based filtering
- ARs are only claimable when their step <= snapshot.current_step

## Hook Renumbering
- Step 5 (DOM extraction): singlefile→50, screenshot→51, pdf→52, dom→53,
  title→54, readability→55, headers→55, mercury→56, htmltotext→57
- Step 6 (post-DOM): wget→61, git→62, media→63.bg, gallerydl→64.bg,
  forumdl→65.bg, papersdl→66.bg
- Step 7 (URL extraction): parse_* hooks moved to 70-75

Background hooks (.bg suffix) don't block step advancement, enabling
long-running downloads to continue while other hooks proceed.
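
The utilities and the claim rule above can be sketched as follows. The function names `extract_step` and `is_background_hook` come from this commit, but the bodies are assumptions inferred from the description (step taken as the first digit of the `__XX_` prefix, `None` when no prefix is found). `is_claimable`, `step_is_done`, and the example hook filenames other than the ytdlp one are hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches the two-digit __XX_ prefix, e.g. "__63_" in "on_Snapshot__63_ytdlp.bg.py".
_STEP_RE = re.compile(r"__(\d)(\d)_")


def extract_step(hook_name: str) -> Optional[int]:
    """Return the step (first digit of the __XX_ prefix), or None if absent."""
    match = _STEP_RE.search(hook_name)
    return int(match.group(1)) if match else None


def is_background_hook(hook_name: str) -> bool:
    """Background hooks carry a .bg. marker in their filename."""
    return ".bg." in hook_name


def is_claimable(hook_name: str, current_step: int) -> bool:
    """An ArchiveResult is claimable when its step <= snapshot.current_step."""
    step = extract_step(hook_name)
    return step is not None and step <= current_step


def step_is_done(pending_hooks: Iterable[str], current_step: int) -> bool:
    """True when no foreground hook at current_step is still pending."""
    return not any(
        extract_step(h) == current_step and not is_background_hook(h)
        for h in pending_hooks
    )
```

Under these assumptions, a pending `on_Snapshot__63_ytdlp.bg.py` does not hold step 6 open, while a pending foreground hook numbered 61 would.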
2025-12-28 13:47:25 +00:00