Commit Graph

129 Commits

Author SHA1 Message Date
Nick Sweeting
bd265c0083 rename extractor to plugin everywhere 2025-12-28 04:43:15 -08:00
Nick Sweeting
50e527ec65 way better plugin hooks system wip 2025-12-28 03:39:59 -08:00
Claude
3d985fa8c8 Implement hook architecture with JSONL output support
Phase 1: Database migration for new ArchiveResult fields
- Add output_str (TextField) for human-readable summary
- Add output_json (JSONField) for structured metadata
- Add output_files (JSONField) for dict of {relative_path: {}}
- Add output_size (BigIntegerField) for total bytes
- Add output_mimetypes (CharField) for CSV of mimetypes
- Add binary FK to InstalledBinary (optional)
- Migrate existing 'output' field to new split fields

Phase 3: Update run_hook() for JSONL parsing
- Support new JSONL format (any line with {type: 'ModelName', ...})
- Maintain backwards compatibility with RESULT_JSON= format
- Add plugin metadata to each parsed record
- Detect background hooks with .bg. suffix in filename
- Add find_binary_for_cmd() helper function
- Add create_model_record() for processing side-effect records

Phase 6: Update ArchiveResult.run()
- Handle background hooks (return immediately when result is None)
- Process 'records' from HookResult for side-effect models
- Use new output fields (output_str, output_json, output_files, etc.)
- Call create_model_record() for InstalledBinary, Machine updates

Phase 7: Add background hook support
- Add is_background_hook() method to ArchiveResult
- Add check_background_completed() to check if process exited
- Add finalize_background_hook() to collect results from completed hooks
- Update SnapshotMachine.is_finished() to check/finalize background hooks
- Update _populate_output_fields() to walk directory and populate stats

Also updated references to old 'output' field in:
- admin_archiveresults.py
- statemachines.py
- templatetags/core_tags.py
2025-12-27 08:38:49 +00:00
Nick Sweeting
35dd9acafe implement fs_version migrations 2025-12-27 00:25:35 -08:00
Claude
766bb28536 Fix migration tests and M2M field alteration issue
- Remove M2M tags field alteration from migration 0027 (Django doesn't support altering M2M fields via migration)
- Add machine app tables to 0.8.x test schema
- Add missing columns (config, num_uses_failed, num_uses_succeeded) to 0.8.x test schema
- Skip 0.8.x migration tests due to complex migration state dependencies with machine app
- All 15 0.7.x migration tests now pass
- Merge dev branch and resolve pyproject.toml conflict (keep both uuid7 and gallery-dl deps)
2025-12-27 03:00:44 +00:00
Claude
ae2ab5b273 Add Python 3.13 support with uuid7 backport compatibility
- Create uuid_compat.py module that provides uuid7 for Python <3.14
  using uuid_extensions package, and native uuid.uuid7 for Python 3.14+
- Update all model files and migrations to use archivebox.uuid_compat
- Add uuid7 conditional dependency in pyproject.toml for Python <3.14
- Update requires-python to >=3.13 (from >=3.14)
- Update GitHub workflows, lock_pkgs.sh to use Python 3.13
- Update tool configs (ruff, pyright, uv) for Python 3.13

This enables running ArchiveBox on Python 3.13 while maintaining
forward compatibility with Python 3.14's native uuid7 support.
2025-12-27 01:07:30 +00:00
Nick Sweeting
4fd7fcdbcf new gallerydl plugin and more 2025-12-26 11:55:03 -08:00
Nick Sweeting
9838d7ba02 tons of ui fixes and plugin fixes 2025-12-25 03:59:51 -08:00
Nick Sweeting
866f993f26 logging and admin ui improvements 2025-12-25 01:10:41 -08:00
Nick Sweeting
d95f0dc186 remove huey 2025-12-24 23:40:18 -08:00
Nick Sweeting
6c769d831c wip 2 2025-12-24 21:46:14 -08:00
Nick Sweeting
1915333b81 wip major changes 2025-12-24 20:10:38 -08:00
Nick Sweeting
c1335fed37 Remove ABID system and KVTag model - use UUIDv7 IDs exclusively
This commit completes the simplification of the ID system by:

- Removing the ABID (ArchiveBox ID) system entirely
- Removing the base_models/abid.py file
- Removing KVTag model in favor of the existing Tag model in core/models.py
- Simplifying all models to use standard UUIDv7 primary keys
- Removing ABID-related admin functionality
- Cleaning up commented-out ABID code from views and statemachines
- Deleting migration files for ABID field removal (no longer needed)

All models now use simple UUIDv7 ids via `id = models.UUIDField(primary_key=True, default=uuid7)`

Note: Old migrations containing ABID references are preserved for database
migration history compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 06:13:49 -08:00
dish
9ca66c6a2b fix syntax error in archivebox/core/models.py 2024-12-18 18:17:17 -05:00
Nick Sweeting
2a1afcf6c2 move crawl models back into dedicated app 2024-12-12 21:45:55 -08:00
Nick Sweeting
bd5dd2f949 clearer core models separation of concerns using new basemodels 2024-12-12 21:45:53 -08:00
Nick Sweeting
ac53fdf677 make chrome binary and configs directly runnable and make extractor use external bin 2024-12-06 02:06:39 -08:00
Nick Sweeting
1ceaa1ac7a add ABID model check and fix model inheritance 2024-12-03 02:14:21 -08:00
Nick Sweeting
b948e49013 add urls log to Crawl model 2024-11-19 06:32:33 -08:00
Nick Sweeting
c9a05c9d94 working archivebox update CLI cmd 2024-11-19 02:32:05 -08:00
Nick Sweeting
569081a9eb rename abid_utils to base_models 2024-11-18 19:40:05 -08:00
Nick Sweeting
e469c5a344 merge queues and actors apps into new workers app 2024-11-18 18:52:48 -08:00
Nick Sweeting
0acd388c02 fix imports and deps 2024-11-18 18:07:34 -08:00
Nick Sweeting
385ccaa14d extend core models with ModelWithOutputDir 2024-11-18 04:27:38 -08:00
Nick Sweeting
1ec2753664 fix statemachine create_root_snapshot and retry timing 2024-11-18 04:27:37 -08:00
Nick Sweeting
c8e186f21b fix plugin loading order, admin, abx-pkg 2024-11-16 06:44:12 -08:00
Nick Sweeting
ba26d75079 add notes and label fields, fix model getters 2024-11-16 02:47:35 -08:00
Nick Sweeting
a9a3b153b1 more StateMachine, Actor, and Orchestrator improvements 2024-11-04 07:08:39 -08:00
Nick Sweeting
48f8416762 add new core and crawsl statemachine manager 2024-11-03 00:41:11 -07:00
Nick Sweeting
02a1fc3049 rename sessions app in INSTALLED_APPS to personas 2024-10-21 00:37:57 -07:00
Nick Sweeting
59b669691f fix Admin data view for Config to render both sections and individual values 2024-10-14 17:39:14 -07:00
Nick Sweeting
1d7f0ab20d fix Tag creation via admin erroring because slug field is not filled 2024-10-14 17:38:53 -07:00
Nick Sweeting
c0b7887fd7 fix admin registration using abx hooks 2024-10-14 17:38:38 -07:00
Nick Sweeting
a0bef4e27b move crawl model out of core 2024-10-14 15:42:36 -07:00
Nick Sweeting
de2ab43f7f switch .is_dir and .exists for os.access to avoid PermissionError on startup 2024-10-08 03:02:34 -07:00
Nick Sweeting
e315905721 add new InstalledBinary model to cache binaries on host machine 2024-10-03 03:10:22 -07:00
Nick Sweeting
295c5c46e0 add new crawl model 2024-10-01 21:47:16 -07:00
Nick Sweeting
363a499289 move util.py into misc folder 2024-09-30 17:25:15 -07:00
Nick Sweeting
dfca4b13b2 move system.py into misc folder 2024-09-30 17:13:55 -07:00
Nick Sweeting
3e5b6ddeae move config into dedicated global app 2024-09-30 15:59:05 -07:00
Nick Sweeting
d8a9dca0f6 use constants in more places 2024-09-26 02:38:45 -07:00
Nick Sweeting
e99260feb2 fix rich logging issues 2024-09-24 21:17:07 -07:00
Nick Sweeting
b56b1cac35 cleanup plugantic and pkg apps, make BaseHook actually create its own settings 2024-09-06 01:48:18 -07:00
Nick Sweeting
2e1e1945f2 add django-object-actions to provide Regenerate ABID button 2024-09-05 23:19:21 -07:00
Nick Sweeting
44669fab73 add BaseHook concept to underlie all Plugin hooks 2024-09-05 03:36:18 -07:00
Nick Sweeting
cbf2a8fdc3 rename datetime fields to _at, massively improve ABID generation safety and determinism 2024-09-04 23:42:36 -07:00
Nick Sweeting
68a39b7392 remove .old_id entirely and make ABID generation only happen once on initial save 2024-09-04 16:40:15 -07:00
Nick Sweeting
4427869ae8 fix ABID generation by chopping ts_src precision to consistent length 2024-09-04 02:02:29 -07:00
Nick Sweeting
ae13f1811f better ABID display in admin UI 2024-09-03 17:11:10 -07:00
Nick Sweeting
1e73a06ba0 change ABIDModel.created to use AutoTimeField seeded on .save instead of auto_now_add so that ts_src for ABID is available on creation before DB row is created 2024-08-28 03:02:37 -07:00