61 Commits

Author SHA1 Message Date
Nick Sweeting
f0aa19fa7d wip 2025-12-28 17:51:54 -08:00
Nick Sweeting
4ccb0863bb continue renaming extractor to plugin, add plan for hook concurrency, add chrome kill helper script 2025-12-28 05:29:24 -08:00
Nick Sweeting
bd265c0083 rename extractor to plugin everywhere 2025-12-28 04:43:15 -08:00
Nick Sweeting
50e527ec65 way better plugin hooks system wip 2025-12-28 03:39:59 -08:00
Claude
b632894bc9 Update views, API, and exports for new ArchiveResult output fields
Replace old `output` field with new fields across the codebase:
- output_str: Human-readable output summary
- output_json: Structured metadata (optional)
- output_files: Dict of output files with metadata
- output_size: Total size in bytes
- output_mimetypes: CSV of file mimetypes

Files updated:
- api/v1_core.py: Update MinimalArchiveResultSchema to expose new fields
- api/v1_core.py: Update ArchiveResultFilterSchema to search output_str
- cli/archivebox_extract.py: Use output_str in CLI output
- core/admin_archiveresults.py: Update admin fields, search, and fieldsets
- core/admin_archiveresults.py: Fix output_html variable name bug in output_summary
- misc/jsonl.py: Update archiveresult_to_jsonl() to include new fields
- plugins/extractor_utils.py: Update ExtractorResult helper class

The embed_path() method already uses output_files and output_str,
so snapshot detail page and template tags work correctly.
2025-12-27 20:28:22 +00:00
Claude
c3acadd528 Remove extractor field from Crawl model and fix tests
- Remove extractor field from Crawl model (moved to config dict)
- Update migration 0002_drop_seed_model to not add extractor
- Update archivebox_add.py to use config['PARSER'] instead
- Update admin.py recrawl to not pass extractor
- Update jsonl.py serialization to not include extractor
- Update test schema SCHEMA_0_8 to not include extractor
- Set default timeout to 60s for test commands
2025-12-27 01:49:09 +00:00
Nick Sweeting
4fd7fcdbcf new gallerydl plugin and more 2025-12-26 11:55:03 -08:00
Nick Sweeting
9838d7ba02 tons of ui fixes and plugin fixes 2025-12-25 03:59:51 -08:00
Nick Sweeting
bb53228ebf remove Seed model in favor of Crawl as template 2025-12-25 01:52:41 -08:00
Nick Sweeting
866f993f26 logging and admin ui improvements 2025-12-25 01:10:41 -08:00
Nick Sweeting
d95f0dc186 remove huey 2025-12-24 23:40:18 -08:00
Nick Sweeting
6c769d831c wip 2 2025-12-24 21:46:14 -08:00
Nick Sweeting
1915333b81 wip major changes 2025-12-24 20:10:38 -08:00
Ben Muthalaly
71c02ca4eb Update archivebox/misc/logging_util.py
Co-authored-by: Nick Sweeting <git@sweeting.me>
2025-02-05 17:55:45 -06:00
Ben Muthalaly
9f4cf0a8e1 Kill the timer process if it doesn't properly terminate. 2025-02-03 02:47:33 -06:00
Nick Sweeting
c5fc4068f4 fix unneeded import 2024-12-18 18:09:21 -08:00
Nick Sweeting
7975b47c85 remove dependencies on unneeded libraries 2024-12-18 18:07:35 -08:00
Nick Sweeting
d192eb5c48 add filestore content addressible store draft 2024-12-04 02:15:04 -08:00
Nick Sweeting
a3fe78afaa add basename to hashing get_dir_info 2024-12-04 02:15:04 -08:00
Nick Sweeting
eae7ed8447 add hashing misc library for merkle tree generation 2024-12-03 02:12:20 -08:00
Nick Sweeting
2595139180 improve statemachine logging and archivebox update CLI cmd 2024-11-19 03:31:05 -08:00
Nick Sweeting
c9a05c9d94 working archivebox update CLI cmd 2024-11-19 02:32:05 -08:00
Nick Sweeting
328eb98a38 move main funcs into cli files and switch to using click for CLI 2024-11-19 00:18:51 -08:00
Nick Sweeting
4c25e90378 move monkey_patches.py into archivebox.misc subfolder 2024-11-18 19:10:42 -08:00
Nick Sweeting
4a5d607296 move logging_util into archivebox.misc subfolder 2024-11-18 19:08:49 -08:00
Nick Sweeting
b3c1cb716e move abx plugins inside vendor dir 2024-10-28 04:07:35 -07:00
Nick Sweeting
4b6f08b0fe swap more direct settings.CONFIG access to abx getters 2024-10-24 15:42:19 -07:00
Nick Sweeting
60f0458c77 rename configfile to collection 2024-10-24 15:40:24 -07:00
Nick Sweeting
657eec479b fix CONSTANTS.LIB_DIR old style access 2024-10-21 03:20:20 -07:00
Nick Sweeting
b3107ab830 move final legacy config to plugins and fix archivebox config cmd and add search opt 2024-10-21 02:56:00 -07:00
Nick Sweeting
a211461ffc fix LIB_DIR and TMP_DIR loading when primary option isnt available 2024-10-21 00:35:56 -07:00
Nick Sweeting
bb9c3fda14 fix makemigrations being blocked by check_migrations func 2024-10-14 17:40:06 -07:00
Nick Sweeting
9a04ed7c76 move serve_static and shell_welcome_message into misc 2024-10-14 17:35:28 -07:00
Nick Sweeting
f75ae805f8 comment out Crawl api methods temporarily 2024-10-14 15:41:58 -07:00
Nick Sweeting
2f68a1d476 fix ldap lib loading after apt install 2024-10-09 04:03:02 -07:00
Nick Sweeting
afc24e802a tweak version log output 2024-10-09 03:18:22 -07:00
Nick Sweeting
613caec8eb improve install flow with sudo, check package managers, and fix docker build 2024-10-09 00:41:16 -07:00
Nick Sweeting
9f274cf9f4 remove platformdirs dependency 2024-10-08 19:17:18 -07:00
Nick Sweeting
3e4a846488 fix more installer bugs 2024-10-08 18:06:57 -07:00
Nick Sweeting
4b34b729ab fuck it go back to nested lib and tmp dirs with supervisord sock workaround 2024-10-08 17:48:59 -07:00
Nick Sweeting
35c7019772 handle failure on tmp_dir and lib_dir detection better 2024-10-08 16:56:25 -07:00
Nick Sweeting
de2ab43f7f switch .is_dir and .exists for os.access to avoid PermissionError on startup 2024-10-08 03:02:34 -07:00
Nick Sweeting
611a2b7c1b fix a few small nits 2024-10-08 02:10:08 -07:00
Nick Sweeting
46c0463539 safer import handling 2024-10-08 00:51:58 -07:00
Nick Sweeting
cf1ea8f80f improve config loading of TMP_DIR, LIB_DIR, move to separate files 2024-10-07 23:45:11 -07:00
Nick Sweeting
0c7d7a2225 fix archivebox init colors and dir status checking 2024-10-04 21:34:19 -07:00
Nick Sweeting
da274fd8e8 remove dead code 2024-10-04 14:48:20 -07:00
Nick Sweeting
12f32c4690 fix tmp data dir resolution when running help or version outside data dir 2024-10-04 01:40:41 -07:00
Nick Sweeting
035a14b6ea better help text output 2024-10-02 19:46:31 -07:00
Nick Sweeting
18474f452b move config moved out of legacy files and better version output 2024-09-30 23:52:00 -07:00