60 Commits

Author SHA1 Message Date
Nick Sweeting
dd77511026 unified Process source of truth and better screenshot tests 2026-01-02 04:20:34 -08:00
Nick Sweeting
c2afb40350 fix lib bin dir and archivebox add hanging 2026-01-01 16:58:47 -08:00
Nick Sweeting
2e350d317d fix initial migrations 2025-12-29 21:27:31 -08:00
Nick Sweeting
f4e7820533 use full dotted paths for all archivebox imports, add migrations and more fixes 2025-12-29 00:47:08 -08:00
Nick Sweeting
f0aa19fa7d wip 2025-12-28 17:51:54 -08:00
Nick Sweeting
bd265c0083 rename extractor to plugin everywhere 2025-12-28 04:43:15 -08:00
Claude
c3acadd528 Remove extractor field from Crawl model and fix tests
- Remove extractor field from Crawl model (moved to config dict)
- Update migration 0002_drop_seed_model to not add extractor
- Update archivebox_add.py to use config['PARSER'] instead
- Update admin.py recrawl to not pass extractor
- Update jsonl.py serialization to not include extractor
- Update test schema SCHEMA_0_8 to not include extractor
- Set default timeout to 60s for test commands
2025-12-27 01:49:09 +00:00
Nick Sweeting
bb53228ebf remove Seed model in favor of Crawl as template 2025-12-25 01:52:41 -08:00
Nick Sweeting
866f993f26 logging and admin ui improvements 2025-12-25 01:10:41 -08:00
Nick Sweeting
d95f0dc186 remove huey 2025-12-24 23:40:18 -08:00
Nick Sweeting
1915333b81 wip major changes 2025-12-24 20:10:38 -08:00
Nick Sweeting
b948e49013 add urls log to Crawl model 2024-11-19 06:32:33 -08:00
Nick Sweeting
6740202d78 fix cli loading edge case where setup_django wasn't running when it should 2024-11-19 04:20:00 -08:00
Nick Sweeting
0347b911aa archivebox add and remove CLI cmds 2024-11-19 03:40:01 -08:00
Nick Sweeting
328eb98a38 move main funcs into cli files and switch to using click for CLI 2024-11-19 00:18:51 -08:00
Nick Sweeting
569081a9eb rename abid_utils to base_models 2024-11-18 19:40:05 -08:00
Nick Sweeting
65afd405b1 merge seeds and crawls apps 2024-11-18 19:23:14 -08:00
Nick Sweeting
4a5d607296 move logging_util into archivebox.misc subfolder 2024-11-18 19:08:49 -08:00
Nick Sweeting
e469c5a344 merge queues and actors apps into new workers app 2024-11-18 18:52:48 -08:00
Nick Sweeting
0acd388c02 fix imports and deps 2024-11-18 18:07:34 -08:00
Nick Sweeting
eeb2671e4d API improvements 2024-11-18 04:27:38 -08:00
Nick Sweeting
1e3ce67834 fix API and CLI calls 2024-11-18 04:27:38 -08:00
Nick Sweeting
b4a5da3ffd update archivebox add CLI command to use new actor system 2024-11-16 02:45:37 -08:00
Nick Sweeting
cf1ea8f80f improve config loading of TMP_DIR, LIB_DIR, move to separate files 2024-10-07 23:45:11 -07:00
Nick Sweeting
b913e6f426 rename OUTPUT_DIR to DATA_DIR 2024-09-30 17:44:18 -07:00
Nick Sweeting
363a499289 move util.py into misc folder 2024-09-30 17:25:15 -07:00
Nick Sweeting
3e5b6ddeae move config into dedicated global app 2024-09-30 15:59:05 -07:00
Nick Sweeting
8cfe6f4afb clean up update flag handling and show better logging to clarify when it's working 2022-05-09 20:15:55 -07:00
Nick Sweeting
36f0646501 Merge pull request #669 from FliegendeWurst/fix-issue-235
add command: --parser option (fixes #235)
2021-03-31 00:53:47 -04:00
Nick Sweeting
2656e59215 change list style 2021-03-31 00:47:42 -04:00
FliegendeWurst
60bd9a902e add command: --parser option 2021-03-28 10:09:11 +02:00
Nick Sweeting
fea0b89dbe add tag cli option 2021-03-27 03:57:05 -04:00
Nick Sweeting
49939f3eaa only accept stdin if args are not passed, fix stdin hang in docker 2021-02-16 01:20:47 -05:00
Nick Sweeting
9fa70b3452 add extractors arg to oneshot command and bump version to v0.5.1 2020-12-11 15:48:46 +02:00
Nick Sweeting
257d3f2a98 Update archivebox/cli/archivebox_add.py 2020-11-13 14:52:21 -05:00
Cristian
54df0a035b fix: Move csv split to the add function to avoid optional nullable argument 2020-11-13 13:10:17 -05:00
Cristian
1ec8276514 fix: Use a comma separated input instead of nargs for the extract flag 2020-11-13 13:01:11 -05:00
Cristian
44eede96e5 feat: Add extract flag to add command 2020-11-13 09:24:34 -05:00
Nick Sweeting
718d39e242 add common code extensions to default blacklist 2020-08-18 08:12:10 -04:00
Nick Sweeting
b681a477ae add overwrite flag to add command to force re-archiving 2020-08-18 04:37:54 -04:00
Cristian
6006b4f93b refactor: Organize code to remove flake8 issues 2020-07-24 12:25:25 -05:00
Cristian
a5550b2105 fix: Rename logging folder to avoid naming conflicts (and circular import issues) 2020-07-22 11:02:13 -05:00
Cristian
f4d1b5121e refactor: Move logging.py to main module to avoid circular import issues 2020-07-17 18:00:04 -05:00
Nick Sweeting
d3bfa98a91 fix depth flag and tweak logging 2020-07-13 11:26:34 -04:00
Cristian
4ebf929606 refactor: Change wording on CLI help 2020-07-08 08:30:07 -05:00
Cristian
f12bfeb322 refactor: Change add() to receive url and depth instead of import_str and import_path 2020-07-08 08:17:47 -05:00
Cristian
c1d8a74e4f feat: Make input sent via stdin behave the same as using args 2020-07-07 15:49:40 -05:00
Cristian
b68c13918f feat: Disable stdin from archivebox add 2020-07-07 12:39:36 -05:00
Cristian
a6940092bb feat: Make sure that depth can only be either 1 or 0 2020-07-07 10:25:02 -05:00
Cristian
32e790979e feat: Enable depth=1 functionality 2020-07-07 10:07:44 -05:00