Nick Sweeting
30c60eef76
much better tests and add page ui
2025-12-29 04:02:11 -08:00
Nick Sweeting
f4e7820533
use full dotted paths for all archivebox imports, add migrations and more fixes
2025-12-29 00:47:08 -08:00
Nick Sweeting
f0aa19fa7d
wip
2025-12-28 17:51:54 -08:00
Nick Sweeting
54f91c1339
Improve concurrency control between plugin hooks (#1721)
# Summary
# Related issues
# Changes these areas
- [ ] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk
2025-12-28 12:48:53 -08:00
Nick Sweeting
6d991a08ea
fix final_status unneeded
2025-12-28 12:47:36 -08:00
Claude
1b5a816022
Implement hook step-based concurrency system
This implements the hook concurrency plan from TODO_hook_concurrency.md:
## Schema Changes
- Add Snapshot.current_step (IntegerField 0-9, default=0)
- Create migration 0034_snapshot_current_step.py
- Fix uuid_compat imports in migrations 0032 and 0003
## Core Logic
- Add extract_step(hook_name) utility - extracts step from __XX_ pattern
- Add is_background_hook(hook_name) utility - checks for .bg. suffix
- Update Snapshot.create_pending_archiveresults() to create one AR per hook
- Update ArchiveResult.run() to handle hook_name field
- Add Snapshot.advance_step_if_ready() method for step advancement
- Integrate with SnapshotMachine.is_finished() to call advance_step_if_ready()
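A minimal sketch of the two utilities. It assumes hook filenames like `on_Snapshot__63_media.bg.py` (hypothetical). It also assumes the step is the tens digit of the `__XX_` number, which is consistent with the renumbering below (singlefile→50 is step 5, wget→61 is step 6). The fallback step of 9 for unnumbered hooks is likewise an assumption, not something this commit states.

```python
import re

# Assumed filename convention: "<event>__<XX>_<name>[.bg].<ext>", e.g.
# "on_Snapshot__50_singlefile.py" or "on_Snapshot__63_media.bg.py" (hypothetical).
_STEP_RE = re.compile(r"__(\d)(\d)_")

# Assumption: hooks without a __XX_ prefix run in the last step.
DEFAULT_STEP = 9


def extract_step(hook_name: str) -> int:
    """Return the step (0-9): the tens digit of the first __XX_ number."""
    match = _STEP_RE.search(hook_name)
    if match is None:
        return DEFAULT_STEP
    return int(match.group(1))


def is_background_hook(hook_name: str) -> bool:
    """Background hooks carry a ".bg." marker before the file extension."""
    return ".bg." in hook_name
```

Under these assumptions, `extract_step("on_Snapshot__55_readability.py")` gives 5, and only names containing `.bg.` are treated as background hooks.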
## Worker Coordination
- Update ArchiveResultWorker.get_queue() for step-based filtering
- ARs are only claimable when their step <= snapshot.current_step
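A minimal in-memory sketch of the claimability rule. It assumes each ArchiveResult carries a precomputed `step` (for example from `extract_step(hook_name)`) and a `status` string. The dataclasses and the `"queued"` value are hypothetical stand-ins for the Django models, and the real worker most likely applies this filter in the database query rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    current_step: int = 0  # mirrors Snapshot.current_step (0-9)


@dataclass
class ArchiveResult:
    hook_name: str
    step: int  # assumed precomputed from hook_name
    snapshot: Snapshot
    status: str = "queued"  # hypothetical status value


def get_queue(results: list[ArchiveResult]) -> list[ArchiveResult]:
    """Return queued results whose step has been reached by their snapshot."""
    return [
        ar
        for ar in results
        if ar.status == "queued" and ar.step <= ar.snapshot.current_step
    ]
```

Because the comparison is `<=`, results from earlier steps that are still queued remain claimable after the snapshot moves on.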
## Hook Renumbering
- Step 5 (DOM extraction): singlefile→50, screenshot→51, pdf→52, dom→53,
title→54, readability→55, headers→55, mercury→56, htmltotext→57
- Step 6 (post-DOM): wget→61, git→62, media→63.bg, gallerydl→64.bg,
forumdl→65.bg, papersdl→66.bg
- Step 7 (URL extraction): parse_* hooks moved to 70-75
Background hooks (.bg suffix) don't block step advancement, enabling
long-running downloads to continue while other hooks proceed.
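A minimal sketch of step advancement in which background hooks never block. The `Result` dataclass, the final-status names, and the one-step-per-call behaviour are assumptions for illustration; they are not taken from the actual `Snapshot.advance_step_if_ready()`.

```python
from dataclasses import dataclass, field

MAX_STEP = 9  # Snapshot.current_step ranges over 0-9
FINAL_STATUSES = {"succeeded", "failed", "skipped"}  # hypothetical status names


@dataclass
class Result:
    step: int
    background: bool  # e.g. is_background_hook(hook_name)
    status: str = "queued"


@dataclass
class Snapshot:
    current_step: int = 0
    results: list[Result] = field(default_factory=list)

    def advance_step_if_ready(self) -> bool:
        """Advance current_step by one if no foreground work remains at or below it."""
        if self.current_step >= MAX_STEP:
            return False
        blocking = [
            r
            for r in self.results
            if r.step <= self.current_step
            and not r.background  # background hooks never block advancement
            and r.status not in FINAL_STATUSES
        ]
        if blocking:
            return False
        self.current_step += 1
        return True
```

This sketch advances at most one step per call. Because `SnapshotMachine.is_finished()` calls it on each state-machine tick, repeated ticks would walk the snapshot forward, including past steps that have no hooks.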
2025-12-28 13:47:25 +00:00
Nick Sweeting
bd265c0083
rename extractor to plugin everywhere
2025-12-28 04:43:15 -08:00
Nick Sweeting
50e527ec65
way better plugin hooks system wip
2025-12-28 03:39:59 -08:00
Nick Sweeting
9b533ad3c8
tweak concurrency for more speed
2025-12-27 12:08:53 -08:00
Nick Sweeting
9838d7ba02
tons of ui fixes and plugin fixes
2025-12-25 03:59:51 -08:00
Nick Sweeting
bb53228ebf
remove Seed model in favor of Crawl as template
2025-12-25 01:52:41 -08:00
Nick Sweeting
866f993f26
logging and admin ui improvements
2025-12-25 01:10:41 -08:00
Nick Sweeting
d95f0dc186
remove huey
2025-12-24 23:40:18 -08:00
Nick Sweeting
6c769d831c
wip 2
2025-12-24 21:46:14 -08:00
Nick Sweeting
1915333b81
wip major changes
2025-12-24 20:10:38 -08:00
Nick Sweeting
c1335fed37
Remove ABID system and KVTag model - use UUIDv7 IDs exclusively
This commit completes the simplification of the ID system by:
- Removing the ABID (ArchiveBox ID) system entirely
- Removing the base_models/abid.py file
- Removing KVTag model in favor of the existing Tag model in core/models.py
- Simplifying all models to use standard UUIDv7 primary keys
- Removing ABID-related admin functionality
- Cleaning up commented-out ABID code from views and statemachines
- Deleting migration files for ABID field removal (no longer needed)
All models now use simple UUIDv7 ids via `id = models.UUIDField(primary_key=True, default=uuid7)`
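UUIDv7 places a 48-bit Unix millisecond timestamp in the most significant bits, so ids created in later milliseconds sort after earlier ones. That is the usual reason to prefer it for primary keys. The following is a minimal sketch of that layout, following RFC 9562. `uuid7_sketch` is a hypothetical helper for illustration only; it is not the `uuid7` callable the models pass as `default`.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7 (RFC 9562): 48-bit Unix ms timestamp, then version, variant and random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= int.from_bytes(os.urandom(10), "big")  # 80 random bits below it
    value = (value & ~(0xF << 76)) | (0x7 << 76)  # version nibble = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)  # RFC variant bits = 0b10
    return uuid.UUID(int=value)
```

Within a single millisecond the order is random, because this sketch has no monotonic counter.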
Note: Old migrations containing ABID references are preserved for database
migration history compatibility.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 06:13:49 -08:00
Nick Sweeting
f6d22a3cc4
tweak worker updated logic and add output_dir_template and symlinks logic
2024-12-13 06:03:52 -08:00
Nick Sweeting
c11a1b54f1
add new worker test
2024-12-12 22:08:18 -08:00
Nick Sweeting
5c06b8ff00
add new Event model to workers/models
2024-12-12 22:08:17 -08:00
Nick Sweeting
2a1afcf6c2
move crawl models back into dedicated app
2024-12-12 21:45:55 -08:00
Nick Sweeting
5cf7725f0e
add new archivebox worker implementation based on better distributed systems principles
2024-12-12 21:41:45 -08:00
Nick Sweeting
28386ff172
add jobs_dashboard.html back
2024-11-19 05:35:52 -08:00
Nick Sweeting
2595139180
improve statemachine logging and archivebox update CLI cmd
2024-11-19 03:31:05 -08:00
Nick Sweeting
c9a05c9d94
working archivebox update CLI cmd
2024-11-19 02:32:05 -08:00
Nick Sweeting
328eb98a38
move main funcs into cli files and switch to using click for CLI
2024-11-19 00:18:51 -08:00
Nick Sweeting
4a5d607296
move logging_util into archivebox.misc subfolder
2024-11-18 19:08:49 -08:00
Nick Sweeting
e469c5a344
merge queues and actors apps into new workers app
2024-11-18 18:52:48 -08:00