Commit Graph

32 Commits

Author SHA1 Message Date
Nick Sweeting
f4e7820533 use full dotted paths for all archivebox imports, add migrations and more fixes 2025-12-29 00:47:08 -08:00
Nick Sweeting
f0aa19fa7d wip 2025-12-28 17:51:54 -08:00
Claude
1b5a816022 Implement hook step-based concurrency system
This implements the hook concurrency plan from TODO_hook_concurrency.md:

## Schema Changes
- Add Snapshot.current_step (IntegerField 0-9, default=0)
- Create migration 0034_snapshot_current_step.py
- Fix uuid_compat imports in migrations 0032 and 0003

## Core Logic
- Add extract_step(hook_name) utility - extracts step from __XX_ pattern
- Add is_background_hook(hook_name) utility - checks for .bg. suffix
- Update Snapshot.create_pending_archiveresults() to create one AR per hook
- Update ArchiveResult.run() to handle hook_name field
- Add Snapshot.advance_step_if_ready() method for step advancement
- Integrate with SnapshotMachine.is_finished() to call advance_step_if_ready()

## Worker Coordination
- Update ArchiveResultWorker.get_queue() for step-based filtering
- ARs are only claimable when their step <= snapshot.current_step

## Hook Renumbering
- Step 5 (DOM extraction): singlefile→50, screenshot→51, pdf→52, dom→53,
  title→54, readability→55, headers→55, mercury→56, htmltotext→57
- Step 6 (post-DOM): wget→61, git→62, media→63.bg, gallerydl→64.bg,
  forumdl→65.bg, papersdl→66.bg
- Step 7 (URL extraction): parse_* hooks moved to 70-75

Background hooks (.bg suffix) don't block step advancement, enabling
long-running downloads to continue while other hooks proceed.
2025-12-28 13:47:25 +00:00
Nick Sweeting
50e527ec65 way better plugin hooks system wip 2025-12-28 03:39:59 -08:00
Claude
741c098a2b Merge remote-tracking branch 'origin/dev' into claude/improve-test-suite-xm6Bh 2025-12-27 05:53:06 +00:00
Nick Sweeting
2f81c0cc76 add overrides options to binproviders 2025-12-26 20:39:56 -08:00
Claude
ae2ab5b273 Add Python 3.13 support with uuid7 backport compatibility
- Create uuid_compat.py module that provides uuid7 for Python <3.14
  using uuid_extensions package, and native uuid.uuid7 for Python 3.14+
- Update all model files and migrations to use archivebox.uuid_compat
- Add uuid7 conditional dependency in pyproject.toml for Python <3.14
- Update requires-python to >=3.13 (from >=3.14)
- Update GitHub workflows, lock_pkgs.sh to use Python 3.13
- Update tool configs (ruff, pyright, uv) for Python 3.13

This enables running ArchiveBox on Python 3.13 while maintaining
forward compatibility with Python 3.14's native uuid7 support.
2025-12-27 01:07:30 +00:00
Nick Sweeting
bb53228ebf remove Seed model in favor of Crawl as template 2025-12-25 01:52:41 -08:00
Nick Sweeting
866f993f26 logging and admin ui improvements 2025-12-25 01:10:41 -08:00
Nick Sweeting
d95f0dc186 remove huey 2025-12-24 23:40:18 -08:00
Nick Sweeting
6c769d831c wip 2 2025-12-24 21:46:14 -08:00
Nick Sweeting
1915333b81 wip major changes 2025-12-24 20:10:38 -08:00
Nick Sweeting
c1335fed37 Remove ABID system and KVTag model - use UUIDv7 IDs exclusively
This commit completes the simplification of the ID system by:

- Removing the ABID (ArchiveBox ID) system entirely
- Removing the base_models/abid.py file
- Removing KVTag model in favor of the existing Tag model in core/models.py
- Simplifying all models to use standard UUIDv7 primary keys
- Removing ABID-related admin functionality
- Cleaning up commented-out ABID code from views and statemachines
- Deleting migration files for ABID field removal (no longer needed)

All models now use simple UUIDv7 ids via `id = models.UUIDField(primary_key=True, default=uuid7)`

Note: Old migrations containing ABID references are preserved for database
migration history compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 06:13:49 -08:00
Nick Sweeting
651ba0b11c add new Process model to Machine models 2024-12-12 21:45:55 -08:00
Nick Sweeting
569081a9eb rename abid_utils to base_models 2024-11-18 19:40:05 -08:00
Nick Sweeting
c8e186f21b fix plugin loading order, admin, abx-pkg 2024-11-16 06:44:12 -08:00
Nick Sweeting
b3c1cb716e move abx plugins inside vendor dir 2024-10-28 04:07:35 -07:00
Nick Sweeting
80d8a6b667 split archivebox.use into archivebox.reads and archivebox.writes 2024-10-15 01:03:01 -07:00
Nick Sweeting
aaf069fab0 remove tags field from Machine admin 2024-10-15 01:02:13 -07:00
Nick Sweeting
01ba6d49d3 new vastly simplified plugin spec without pydantic 2024-10-14 21:50:47 -07:00
Nick Sweeting
c0b7887fd7 fix admin registration using abx hooks 2024-10-14 17:38:38 -07:00
Nick Sweeting
f75ae805f8 comment out Crawl api methods temporarily 2024-10-14 15:41:58 -07:00
Nick Sweeting
536a5ea745 clear up Machine models cache vars 2024-10-14 15:38:32 -07:00
Nick Sweeting
6e7071bd19 add new binproviders and binaries args to install and version, bump pydantic-pkgr version 2024-10-11 00:45:59 -07:00
Nick Sweeting
73e69ccb8b fixes for docs generation 2024-10-04 19:16:46 -07:00
Nick Sweeting
12f32c4690 fix tmp data dir resolution when running help or version outside data dir 2024-10-04 01:40:41 -07:00
Nick Sweeting
aae9deccc2 add machine migrations 2024-10-03 04:06:28 -07:00
Nick Sweeting
b072fd8ef4 load all binaries from cache by default 2024-10-03 04:06:17 -07:00
Nick Sweeting
490e5ba11d fallback to localhost if detecting dnsserver fails 2024-10-03 03:53:50 -07:00
Nick Sweeting
161afc7297 add health stats counters to machine models 2024-10-03 03:11:04 -07:00
Nick Sweeting
e315905721 add new InstalledBinary model to cache binaries on host machine 2024-10-03 03:10:22 -07:00
Nick Sweeting
f46d62a114 add py-machineid lib for new machine app 2024-10-01 21:46:35 -07:00