Commit Graph

40 Commits

Author SHA1 Message Date
Nick Sweeting
b749b26c5d wip 2026-03-23 03:58:32 -07:00
Nick Sweeting
bc21d4bfdb type and test fixes 2026-03-15 20:12:27 -07:00
Nick Sweeting
70c9358cf9 Improve scheduling, runtime paths, and API behavior 2026-03-15 18:31:56 -07:00
Nick Sweeting
ec4b27056e wip 2026-01-21 03:19:56 -08:00
Nick Sweeting
456aaee287 more migration id/uuid and config propagation fixes 2026-01-04 16:16:26 -08:00
Nick Sweeting
c2afb40350 fix lib bin dir and archivebox add hanging 2026-01-01 16:58:47 -08:00
Nick Sweeting
60422adc87 fix orchestrator statemachine and Process from archiveresult migrations 2026-01-01 16:43:02 -08:00
Nick Sweeting
876feac522 actually working migration path from 0.7.2 and 0.8.6 + renames and test coverage 2026-01-01 15:50:00 -08:00
claude[bot]
762cddc8c5 fix: address PR review comments from cubic-dev-ai
- Add JSONL_INDEX_FILENAME to ALLOWED_IN_DATA_DIR for consistency
- Fix fallback logic in legacy.py to try JSON when JSONL parsing fails
- Replace bare except clauses with specific exception types
- Fix stdin double-consumption in archivebox_crawl.py
- Merge CLI --tag option with crawl tags in archivebox_snapshot.py
- Remove tautological mock tests (covered by integration tests)

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2025-12-30 20:09:51 +00:00
Claude
d36079829b feat: replace index.json with index.jsonl flat JSONL format
Switch from hierarchical index.json to flat index.jsonl format for
snapshot metadata storage. Each line is a self-contained JSON record
with a 'type' field (Snapshot, ArchiveResult, Binary, Process).

Changes:
- Add JSONL_INDEX_FILENAME constant to constants.py
- Add TYPE_PROCESS and TYPE_MACHINE to jsonl.py type constants
- Add binary_to_jsonl(), process_to_jsonl(), machine_to_jsonl() converters
- Add Snapshot.write_index_jsonl() to write new format
- Add Snapshot.read_index_jsonl() to read new format
- Add Snapshot.convert_index_json_to_jsonl() for migration
- Update Snapshot.reconcile_with_index() to handle both formats
- Update fs_migrate to convert during filesystem migration
- Update load_from_directory/create_from_directory for both formats
- Update legacy.py parse_json_links_details for JSONL support

The new format is easier to parse, extend, and mix record types.
2025-12-30 18:21:06 +00:00
Nick Sweeting
bb53228ebf remove Seed model in favor of Crawl as template 2025-12-25 01:52:41 -08:00
Nick Sweeting
4a5d607296 move logging_util into archivebox.misc subfolder 2024-11-18 19:08:49 -08:00
Nick Sweeting
60f0458c77 rename configfile to collection 2024-10-24 15:40:24 -07:00
Nick Sweeting
a211461ffc fix LIB_DIR and TMP_DIR loading when primary option isnt available 2024-10-21 00:35:56 -07:00
Nick Sweeting
613caec8eb improve install flow with sudo, check package managers, and fix docker build 2024-10-09 00:41:16 -07:00
Nick Sweeting
9f274cf9f4 remove platformdirs dependency 2024-10-08 19:17:18 -07:00
Nick Sweeting
4b34b729ab fuck it go back to nested lib and tmp dirs with supervisord sock workaround 2024-10-08 17:48:59 -07:00
Nick Sweeting
35c7019772 handle failure on tmp_dir and lib_dir detection better 2024-10-08 16:56:25 -07:00
Nick Sweeting
216e885b85 bump pydantic-pkgr 2024-10-08 03:53:41 -07:00
Nick Sweeting
de2ab43f7f switch .is_dir and .exists for os.access to avoid PermissionError on startup 2024-10-08 03:02:34 -07:00
Nick Sweeting
cf1ea8f80f improve config loading of TMP_DIR, LIB_DIR, move to separate files 2024-10-07 23:45:11 -07:00
Nick Sweeting
db10a2142e remove extra files from repo root and move package.json into etc 2024-10-05 03:53:23 -07:00
Nick Sweeting
66a785bb35 only use system tmp dirs because of socket path length restrictions 2024-10-05 03:16:27 -07:00
Nick Sweeting
35446ce742 include sonic-client by default and allow ldap to be installed at runtime 2024-10-05 03:11:48 -07:00
Nick Sweeting
ce2e19a429 switch to uv builds and rc1 versioning system 2024-10-04 23:48:25 -07:00
Nick Sweeting
ac96cc62fc fix CUSTOM_TEMPLATES_DIR loading 2024-10-04 21:40:36 -07:00
Nick Sweeting
0c7d7a2225 fix archivebox init colors and dir status checking 2024-10-04 21:34:19 -07:00
Nick Sweeting
d747cf7f31 fix SYSTEM_TMP_DIR and SYSTEM_LIB_DIR in docker 2024-10-04 21:03:02 -07:00
Nick Sweeting
811f9a8d93 move queue db name into constants and fix file detection at startup 2024-10-04 19:38:36 -07:00
Nick Sweeting
396a7ffcd8 move tmp dir to machine-id scoped dir 2024-10-04 03:24:15 -07:00
Nick Sweeting
12f32c4690 fix tmp data dir resolution when running help or version outside data dir 2024-10-04 01:40:41 -07:00
Nick Sweeting
152b530249 scope LIB_DIR by os, arch, and docker status 2024-10-04 00:08:44 -07:00
Nick Sweeting
b36e89d086 relocate LIB_DIR and TMP_DIR inside docker so it doesnt clash with outside docker 2024-10-03 03:43:02 -07:00
Nick Sweeting
18474f452b move config moved out of legacy files and better version output 2024-09-30 23:52:00 -07:00
Nick Sweeting
66cd711df9 improve version detection 2024-09-30 18:12:48 -07:00
Nick Sweeting
b913e6f426 rename OUTPUT_DIR to DATA_DIR 2024-09-30 17:44:18 -07:00
Nick Sweeting
363a499289 move util.py into misc folder 2024-09-30 17:25:15 -07:00
Nick Sweeting
dfca4b13b2 move system.py into misc folder 2024-09-30 17:13:55 -07:00
Nick Sweeting
7a41b6ae46 remove ConfigSectionName and add type hints to CONSTANTS 2024-09-30 16:50:36 -07:00
Nick Sweeting
3e5b6ddeae move config into dedicated global app 2024-09-30 15:59:05 -07:00
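
Commit d36079829b describes index.jsonl as a flat file where each line is a self-contained JSON record with a 'type' field, and commit 762cddc8c5 mentions falling back to JSON when JSONL parsing fails, using specific exception types rather than a bare except. The sketch below illustrates that idea only. It is not the code from those commits: the helper names write_jsonl and read_index, the "legacy" grouping key, and the record fields other than 'type' are assumptions made for illustration.

```python
import json
from collections import defaultdict


def write_jsonl(records):
    """Serialize records to JSONL text: one compact JSON object per line."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


def read_index(text):
    """Group records by their 'type' field.

    Tries JSONL first. If a line is not valid JSON, or a record has no
    'type' field, the whole text is re-parsed as one legacy JSON document
    and stored under the hypothetical key "legacy".
    """
    grouped = defaultdict(list)
    try:
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            grouped[record["type"]].append(record)
    except (json.JSONDecodeError, KeyError, TypeError):
        # Specific exception types only, so unrelated errors still surface.
        grouped.clear()
        grouped["legacy"].append(json.loads(text))
    return dict(grouped)


if __name__ == "__main__":
    # Field names other than 'type' are made up for this example.
    sample = write_jsonl([
        {"type": "Snapshot", "url": "https://example.com"},
        {"type": "ArchiveResult", "extractor": "wget"},
    ])
    print(read_index(sample))
```

Because every line stands alone, mixed record types (Snapshot, ArchiveResult, Binary, Process) can share one file and be dispatched on 'type' while reading, which is the property the commit message points to when it calls the format easier to parse and extend.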