mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-01-02 17:05:38 +10:00
- Rename archivebox/plugins/media/ → archivebox/plugins/ytdlp/
- Rename hook script on_Snapshot__63_media.bg.py → on_Snapshot__63_ytdlp.bg.py
- Update config.json: YTDLP_* as primary keys, MEDIA_* as x-aliases
- Update templates CSS classes: media-* → ytdlp-*
- Fix gallerydl bug: remove incorrect dependency on media plugin output
- Update all codebase references to use YTDLP_* and SAVE_YTDLP
- Add backwards compatibility test for MEDIA_ENABLED alias
533 lines
19 KiB
Markdown
# ArchiveBox Hook Script Concurrency & Execution Plan

## Overview

`Snapshot.run()` should enforce that snapshot hooks are run in **10 discrete, sequential "steps"**: `0*`, `1*`, `2*`, `3*`, `4*`, `5*`, `6*`, `7*`, `8*`, `9*`.

For every discovered hook script, ArchiveBox should create an ArchiveResult in `queued` state, then manage running them using `retry_at` and inline logic to enforce this ordering.

## Design Decisions

### ArchiveResult Schema
- Add `ArchiveResult.hook_name` (CharField, nullable) - just filename, e.g., `'on_Snapshot__20_chrome_tab.bg.js'`
- Keep `ArchiveResult.plugin` - still important (plugin directory name)
- Step number derived on-the-fly from `hook_name` via `extract_step(hook_name)` - not stored

### Snapshot Schema
- Add `Snapshot.current_step` (IntegerField 0-9, default=0)
- Integrate with `SnapshotMachine` state transitions for step advancement

### Hook Discovery & Execution
- `Snapshot.run()` discovers all hooks upfront, creates one AR per hook with `hook_name` set
- All ARs for a given step can be claimed and executed in parallel by workers
- Workers claim ARs where `extract_step(ar.hook_name) <= snapshot.current_step`
- `Snapshot.advance_step_if_ready()` increments `current_step` when:
  - All **foreground** hooks in current step are finished (SUCCEEDED/FAILED/SKIPPED)
  - Background hooks don't block advancement (they continue running)
  - Called from `SnapshotMachine` state transitions

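The claiming rule above can be sketched in plain Python. This is a minimal illustration rather than the actual worker query: `QueuedResult` and `claimable` are hypothetical names, and the inline `extract_step` assumes the `__XX_` numbering described in this plan.

```python
import re
from dataclasses import dataclass


def extract_step(hook_name: str) -> int:
    """First digit of the two-digit run order; 9 when the hook is unnumbered."""
    match = re.search(r'__(\d{2})_', hook_name)
    return int(match.group(1)[0]) if match else 9


@dataclass
class QueuedResult:
    """Hypothetical stand-in for an ArchiveResult row."""
    hook_name: str
    status: str = 'queued'


def claimable(results, current_step):
    """Queued results whose step is <= current_step, ordered by hook_name."""
    ready = [
        r for r in results
        if r.status == 'queued' and extract_step(r.hook_name) <= current_step
    ]
    return sorted(ready, key=lambda r: r.hook_name)
```

Sorting by `hook_name` keeps the claim order deterministic within a step, since the two-digit run order is part of the filename.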
### ArchiveResult.run() Behavior
- If `self.hook_name` is set: run that single hook
- If `self.hook_name` is None: discover all hooks for `self.plugin` and run sequentially
- Background hooks detected by `.bg.` in filename (e.g., `on_Snapshot__20_chrome_tab.bg.js`)
- Background hooks return immediately (ArchiveResult stays in STARTED state)
- Foreground hooks wait for completion, update status from JSONL output

### Hook Execution Flow
1. **Within a step**: Workers claim all ARs for current step in parallel
2. **Foreground hooks** (no .bg): ArchiveResult waits for completion, transitions to SUCCEEDED/FAILED/SKIPPED
3. **Background hooks** (.bg): ArchiveResult transitions to STARTED, hook continues running
4. **Step advancement**: `Snapshot.advance_step_if_ready()` checks:
   - Are all foreground ARs in current step finished? (SUCCEEDED/FAILED/SKIPPED)
   - Ignore ARs still in STARTED (background hooks)
   - If yes, increment `current_step`
5. **Snapshot sealing**: When `current_step=9` and all foreground hooks are done, kill background hooks via `Snapshot.cleanup()`

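A minimal sketch of the step-advancement check described above, using plain dataclasses instead of the real Django models. `Result`, `SnapshotState`, `foreground_done` and `ready_to_seal` are hypothetical stand-ins; only `advance_step_if_ready` and `current_step` are names taken from this plan.

```python
from dataclasses import dataclass, field

FINISHED = {'succeeded', 'failed', 'skipped'}


@dataclass
class Result:
    hook_name: str
    step: int
    status: str = 'queued'

    @property
    def is_background(self) -> bool:
        return '.bg.' in self.hook_name


@dataclass
class SnapshotState:
    results: list = field(default_factory=list)
    current_step: int = 0

    def foreground_done(self) -> bool:
        """True when every foreground result in the current step is finished."""
        return all(
            r.status in FINISHED
            for r in self.results
            if r.step == self.current_step and not r.is_background
        )

    def advance_step_if_ready(self) -> bool:
        """Move to the next step once foreground hooks finish; background hooks are ignored."""
        if self.current_step < 9 and self.foreground_done():
            self.current_step += 1
            return True
        return False

    def ready_to_seal(self) -> bool:
        """At step 9 with all foreground hooks done, background hooks can be cleaned up."""
        return self.current_step == 9 and self.foreground_done()
```

Because `all()` over an empty sequence is true, a step with no foreground hooks advances immediately in this sketch.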
### Unnumbered Hooks
- Extract the two-digit run order via `re.search(r'__(\d{2})_', hook_name)` and use its first digit as the step; default to step 9 if there is no match
- Log warning for unnumbered hooks
- Purely runtime derivation - no stored field

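One possible shape for the two helpers, assuming only the regex and the `.bg.` convention given above (the logger name and warning message are illustrative):

```python
import logging
import re

logger = logging.getLogger(__name__)
STEP_PATTERN = re.compile(r'__(\d{2})_')


def extract_step(hook_name: str) -> int:
    """Return the step (0-9) encoded in a hook filename, or 9 if unnumbered."""
    match = STEP_PATTERN.search(hook_name)
    if match is None:
        logger.warning('Hook %s has no __XX_ run order, defaulting to step 9', hook_name)
        return 9
    # The first digit of the two-digit run order is the step
    return int(match.group(1)[0])


def is_background_hook(hook_name: str) -> bool:
    """Background hooks carry a .bg. marker before the file extension."""
    return '.bg.' in hook_name
```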
## Hook Numbering Convention

Hook scripts are numbered `00` to `99` to control:
- **First digit (0-9)**: Which step they are part of
- **Second digit (0-9)**: Order within that step

Hook scripts are launched in alphabetical filename order, so steps run **strictly sequentially**: the set of hooks belonging to one step runs before ArchiveBox moves on to the next step.

**Naming Format:**
```
on_{ModelName}__{run_order}_{human_readable_description}[.bg].{ext}
```

**Examples:**
```
on_Snapshot__00_this_would_run_first.sh
on_Snapshot__05_start_ytdlp_download.bg.sh
on_Snapshot__10_chrome_tab_opened.js
on_Snapshot__50_screenshot.js
on_Snapshot__63_ytdlp.bg.py
```

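To make the format concrete, the sketch below splits a hook filename into its parts with a regular expression. `HookName` and `parse_hook_name` are hypothetical helpers written for this example, and the pattern assumes model names contain only letters.

```python
import re
from typing import NamedTuple, Optional

HOOK_RE = re.compile(
    r'^on_(?P<model>[A-Za-z]+)__(?P<order>\d{2})_'
    r'(?P<description>.+?)(?P<bg>\.bg)?\.(?P<ext>[A-Za-z0-9]+)$'
)


class HookName(NamedTuple):
    model: str
    run_order: int
    description: str
    background: bool
    ext: str


def parse_hook_name(filename: str) -> Optional[HookName]:
    """Split a hook filename into its parts, or return None if it does not match."""
    m = HOOK_RE.match(filename)
    if m is None:
        return None
    return HookName(
        model=m['model'],
        run_order=int(m['order']),
        description=m['description'],
        background=m['bg'] is not None,
        ext=m['ext'],
    )
```

The step is then `run_order // 10` and the order within the step is `run_order % 10`.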
## Background (.bg) vs Foreground Scripts

### Foreground Scripts (no .bg suffix)
- Launch in parallel with other hooks in their step
- Step waits for all foreground hooks to complete or timeout
- Get killed with SIGTERM if they exceed their `PLUGINNAME_TIMEOUT`
- Step advances when all foreground hooks finish

### Background Scripts (.bg suffix)
- Launch in parallel with other hooks in their step
- Do NOT block step progression - step can advance while they run
- Continue running across step boundaries until complete or timeout
- Get killed with SIGTERM when Snapshot transitions to SEALED (via `Snapshot.cleanup()`)
- Should exit naturally when work is complete (best case)

**Important:** A .bg script started in step 2 can keep running through steps 3, 4, 5... until the Snapshot seals or the hook exits naturally.

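The difference between the two launch modes can be illustrated with the standard `subprocess` module. This is a sketch under the assumptions stated in the comments; `run_foreground` and `launch_background` are hypothetical names, not ArchiveBox functions.

```python
import subprocess
import sys


def run_foreground(cmd, timeout):
    """Run cmd, wait up to timeout seconds, and terminate it on overrun.

    Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        proc.terminate()  # sends SIGTERM on POSIX
        return proc.wait(), True


def launch_background(cmd):
    """Start cmd and return immediately.

    The caller keeps the handle so the process can be terminated later,
    for example when the Snapshot seals.
    """
    return subprocess.Popen(cmd)


if __name__ == '__main__':
    # Foreground: blocks until the child exits or the timeout passes
    print(run_foreground([sys.executable, '-c', 'pass'], timeout=10))
    # Background: returns at once; wait here only to keep the demo tidy
    handle = launch_background([sys.executable, '-c', 'pass'])
    handle.wait()
```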
## Execution Step Guidelines

These are **naming conventions and guidelines**, not enforced checkpoints. They provide semantic organization for plugin ordering:

### Step 0: Pre-Setup
```
00-09: Initial setup, validation, feature detection
```

### Step 1: Chrome Launch & Tab Creation
```
10-19: Browser/tab lifecycle setup
  - Chrome browser launch
  - Tab creation and CDP connection
```

### Step 2: Navigation & Settlement
```
20-29: Page loading and settling
  - Navigate to URL
  - Wait for page load
  - Initial response capture (responses, ssl, consolelog as .bg listeners)
```

### Step 3: Page Adjustment
```
30-39: DOM manipulation before archiving
  - Hide popups/banners
  - Solve captchas
  - Expand comments/details sections
  - Inject custom CSS/JS
  - Accessibility modifications
```

### Step 4: Ready for Archiving
```
40-49: Final pre-archiving checks
  - Verify page is fully adjusted
  - Wait for any pending modifications
```

### Step 5: DOM Extraction (Sequential, Non-BG)
```
50-59: Extractors that need exclusive DOM access
  - singlefile (MUST NOT be .bg)
  - screenshot (MUST NOT be .bg)
  - pdf (MUST NOT be .bg)
  - dom (MUST NOT be .bg)
  - title
  - headers
  - readability
  - mercury

These MUST run sequentially as they temporarily modify the DOM
during extraction, then revert it. Running in parallel would corrupt results.
```

### Step 6: Post-DOM Extraction
```
60-69: Extractors that don't need DOM or run on downloaded files
  - wget
  - git
  - ytdlp (formerly media; .bg - can run for hours)
  - gallerydl (.bg)
  - forumdl (.bg)
  - papersdl (.bg)
```

### Step 7: Chrome Cleanup
```
70-79: Browser/tab teardown
  - Close tabs
  - Cleanup Chrome resources
```

### Step 8: Post-Processing
```
80-89: Reprocess outputs from earlier extractors
  - OCR of images
  - Audio/video transcription
  - URL parsing from downloaded content (rss, html, json, txt, csv, md)
  - LLM analysis/summarization of outputs
```

### Step 9: Indexing & Finalization
```
90-99: Save to indexes and finalize
  - Index text content to Sonic/SQLite FTS
  - Create symlinks
  - Generate merkle trees
  - Final status updates
```

## Hook Script Interface

### Input: CLI Arguments (NOT stdin)
Hooks receive configuration as CLI flags (CSV or JSON-encoded):

```bash
--url="https://example.com"
--snapshot-id="1234-5678-uuid"
--config='{"some_key": "some_value"}'
--plugins=git,media,favicon,title
--timeout=50
--enable-something
```

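A hook written in Python could read these flags with `argparse`. The sketch below assumes the flag names shown above; the defaults and the `parse_hook_args` name are illustrative assumptions.

```python
import argparse
import json


def parse_hook_args(argv):
    """Parse the example hook flags; --config is JSON and --plugins is CSV."""
    parser = argparse.ArgumentParser(description='Example hook argument parser')
    parser.add_argument('--url', required=True)
    parser.add_argument('--snapshot-id', required=True)
    parser.add_argument('--config', type=json.loads, default={})
    parser.add_argument(
        '--plugins',
        type=lambda s: [p for p in s.split(',') if p],
        default=[],
    )
    parser.add_argument('--timeout', type=int, default=60)  # default is an assumption
    parser.add_argument('--enable-something', action='store_true')
    return parser.parse_args(argv)
```

In a real hook this would be called as `parse_hook_args(sys.argv[1:])`.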
### Input: Environment Variables
Plugin-level configuration is passed via env vars, which are defined in each plugin's `plugin_dir/config.json` JSONSchema:

```bash
WGET_BINARY=/usr/bin/wget
WGET_TIMEOUT=60
WGET_USER_AGENT="Mozilla/5.0..."
WGET_EXTRA_ARGS="--no-check-certificate"
SAVE_WGET=True
```

**Required:** Every plugin must support `PLUGINNAME_TIMEOUT` for self-termination.

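A hook might read these values along the following lines. The helper names (`get_env_int`, `get_env_bool`, `plugin_timeout`), the fallback to a global `TIMEOUT`, and the accepted boolean spellings are assumptions made for this sketch:

```python
import os

TRUE_VALUES = {'1', 'true', 'yes', 'on'}


def get_env_int(name, default=None, env=None):
    """Read an integer env var, returning default if unset or not an integer."""
    env = os.environ if env is None else env
    try:
        return int(env.get(name, '').strip())
    except ValueError:
        return default


def get_env_bool(name, default=False, env=None):
    """Read a boolean env var such as SAVE_WGET=True."""
    env = os.environ if env is None else env
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in TRUE_VALUES


def plugin_timeout(plugin, fallback=60, env=None):
    """PLUGINNAME_TIMEOUT if set, else TIMEOUT, else fallback."""
    prefix = plugin.upper()
    return get_env_int(f'{prefix}_TIMEOUT', env=env) or get_env_int('TIMEOUT', fallback, env=env)
```

Note that the `or` chain treats a timeout of `0` the same as an unset value.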
### Output: Filesystem (CWD)
Hooks read/write files to:
- `$CWD`: Their own output subdirectory (e.g., `archive/snapshots/{id}/wget/`)
- `$CWD/..`: Parent directory (to read outputs from other hooks)

This allows hooks to:
- Access files created by other hooks
- Keep their outputs separate by default
- Use semaphore files for coordination (if needed)

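Reading another hook's output then amounts to looking one directory up. A small sketch, where `sibling_output` is a hypothetical helper and the `wget/index.html` path is only an example:

```python
from pathlib import Path


def sibling_output(plugin, relative_path, cwd=None):
    """Return the path to another plugin's output file, or None if it is absent.

    Assumes each plugin writes into its own subdirectory of the snapshot
    directory, so siblings live under the parent of the current directory.
    """
    base = Path.cwd() if cwd is None else Path(cwd)
    candidate = base.parent / plugin / relative_path
    return candidate if candidate.exists() else None


if __name__ == '__main__':
    html = sibling_output('wget', 'index.html')
    print(html if html else 'wget output not available yet')
```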
### Output: JSONL to stdout
Hooks emit one JSONL line per database record they want to create or update:

```jsonl
{"type": "Tag", "name": "sci-fi"}
{"type": "ArchiveResult", "id": "1234-uuid", "status": "succeeded", "output_str": "wget/index.html"}
{"type": "Snapshot", "id": "5678-uuid", "title": "Example Page"}
```

See `archivebox/misc/jsonl.py` and model `from_json()` / `from_jsonl()` methods for full list of supported types and fields.

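Both sides of this exchange can be sketched with the standard `json` module. `emit_record` and `parse_jsonl` are hypothetical helpers for illustration, not the functions in `archivebox/misc/jsonl.py`; skipping unparseable lines is an assumption of this sketch.

```python
import json
import sys


def emit_record(record_type, stream=None, **fields):
    """Write one JSONL record, e.g. emit_record('Tag', name='sci-fi')."""
    stream = sys.stdout if stream is None else stream
    stream.write(json.dumps({'type': record_type, **fields}) + '\n')
    stream.flush()  # flush so records survive a later crash


def parse_jsonl(text):
    """Return the typed records found in hook stdout, ignoring other lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: stray non-JSON output is skipped
        if isinstance(record, dict) and 'type' in record:
            records.append(record)
    return records
```

Flushing after every record means that whatever was emitted before a crash is still available to the parent process.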
### Output: stderr for Human Logs
Hooks should emit human-readable output or debug info to **stderr**. There are no guarantees this will be persisted long-term. Use stdout JSONL or filesystem for outputs that matter.

### Cleanup: Delete Cruft
If hooks emit no meaningful long-term outputs, they should delete any temporary files themselves to avoid wasting space. However, the ArchiveResult DB row should be kept so we know:
- It doesn't need to be retried
- It isn't missing
- What happened (status, error message)

### Signal Handling: SIGINT/SIGTERM
Hooks are expected to handle a polite `SIGINT`/`SIGTERM` by wrapping up quickly and exiting cleanly. If they do not, they may be killed with `SIGKILL` at ArchiveBox's discretion.

**If hooks double-fork or spawn long-running processes:** They must write a `.pid` file in their directory so zombies can be swept safely.

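A hook could meet both expectations roughly as follows. This is a sketch: the `GracefulExit` class, the `write_pid_file` helper and the `hook.pid` filename are assumptions made for this example, since the plan only requires "a `.pid` file".

```python
import os
import signal
from pathlib import Path


class GracefulExit:
    """Records that SIGINT/SIGTERM arrived so the main loop can wind down."""

    def __init__(self):
        self.requested = False

    def install(self):
        # Must be called from the main thread
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        self.requested = True


def write_pid_file(directory, name='hook.pid', pid=None):
    """Write a PID (this process by default) to directory/name and return the path."""
    path = Path(directory) / name
    path.write_text(str(os.getpid() if pid is None else pid))
    return path


if __name__ == '__main__':
    stop = GracefulExit()
    stop.install()
    pid_file = write_pid_file('.')
    for chunk in range(3):
        if stop.requested:
            break  # finish quickly, then fall through to cleanup
        # ... do one unit of work here ...
    pid_file.unlink()
```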
## Hook Failure Modes & Retry Logic

Hooks can fail in several ways. ArchiveBox handles each differently:

### 1. Soft Failure (Record & Don't Retry)
**Exit:** `0` (success)
**JSONL:** `{"type": "ArchiveResult", "status": "failed", "output_str": "404 Not Found"}`

This means: "I ran successfully, but the resource wasn't available." Don't retry this.

**Use cases:**
- 404 errors
- Content not available
- Feature not applicable to this URL

### 2. Hard Failure / Temporary Error (Retry Later)
**Exit:** Non-zero (1, 2, etc.)
**JSONL:** None (or incomplete)

This means: "Something went wrong, I couldn't complete." Treat this ArchiveResult as "missing" and set `retry_at` for later.

**Use cases:**
- 500 server errors
- Network timeouts
- Binary not found / crashed
- Transient errors

**Behavior:**
- ArchiveBox sets `retry_at` on the ArchiveResult
- Hook will be retried during next `archivebox update`

### 3. Partial Success (Update & Continue)
**Exit:** Non-zero
**JSONL:** Partial records emitted before crash

**Behavior:**
- Update ArchiveResult with whatever was emitted
- Mark remaining work as "missing" with `retry_at`

### 4. Success (Record & Continue)
**Exit:** `0`
**JSONL:** `{"type": "ArchiveResult", "status": "succeeded", "output_str": "output/file.html"}`

This is the happy path.

### Error Handling Rules

- **DO NOT skip hooks** based on failures
- **Continue to next hook** regardless of foreground or background failures
- **Update ArchiveResults** with whatever information is available
- **Set retry_at** for "missing" or temporarily-failed hooks
- **Let background scripts continue** even if foreground scripts fail

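The four failure modes above reduce to a decision on the exit code and the parsed JSONL records. A minimal sketch, in which the function names and outcome labels are hypothetical and the `unspecified` case (exit `0` with no ArchiveResult status) is something this plan does not define:

```python
RETRYABLE = {'hard_failure', 'partial'}


def classify_hook_outcome(exit_code, records):
    """Map a hook's exit code and emitted records to one of the failure modes."""
    statuses = {
        r.get('status') for r in records if r.get('type') == 'ArchiveResult'
    }
    if exit_code != 0:
        # Non-zero exit: keep whatever was emitted, retry the rest later
        return 'partial' if records else 'hard_failure'
    if 'failed' in statuses:
        return 'soft_failure'  # recorded failure, not retried
    if 'succeeded' in statuses:
        return 'success'
    return 'unspecified'  # not covered by the plan


def should_retry(outcome):
    """True for outcomes where retry_at would be set."""
    return outcome in RETRYABLE
```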
## File Structure

```
archivebox/plugins/{plugin_name}/
├── config.json                    # JSONSchema: env var config options
├── binaries.jsonl                 # Runtime dependencies: apt|brew|pip|npm|env
├── on_Snapshot__XX_name.py        # Hook script (foreground)
├── on_Snapshot__XX_name.bg.py     # Hook script (background)
└── tests/
    └── test_name.py
```

## Implementation Checklist

### Phase 1: Schema Migration ✅
- [x] Add `Snapshot.current_step` (IntegerField 0-9, default=0)
- [x] Add `ArchiveResult.hook_name` (CharField, nullable) - just filename
- [x] Create migration: `0034_snapshot_current_step.py`

### Phase 2: Core Logic Updates ✅
- [x] Add `extract_step(hook_name)` utility in `archivebox/hooks.py`
  - Extract first digit from `__XX_` pattern
  - Default to 9 for unnumbered hooks
- [x] Add `is_background_hook(hook_name)` utility in `archivebox/hooks.py`
  - Check for `.bg.` in filename
- [x] Update `Snapshot.create_pending_archiveresults()` in `archivebox/core/models.py`:
  - Discover all hooks (not plugins)
  - Create one AR per hook with `hook_name` set
- [x] Update `ArchiveResult.run()` in `archivebox/core/models.py`:
  - If `hook_name` set: run single hook
  - If `hook_name` None: discover all plugin hooks (existing behavior)
- [x] Add `Snapshot.advance_step_if_ready()` method:
  - Check if all foreground ARs in current step finished
  - Increment `current_step` if ready
  - Ignore background hooks (.bg) in completion check
- [x] Integrate with `SnapshotMachine.is_finished()` in `archivebox/core/statemachines.py`:
  - Call `advance_step_if_ready()` before checking if done

### Phase 3: Worker Coordination ✅
- [x] Update worker AR claiming query in `archivebox/workers/worker.py`:
  - Filter: `extract_step(ar.hook_name) <= snapshot.current_step`
  - Claims ARs in QUEUED state, checks step in Python before processing
  - Orders by hook_name for deterministic execution within step

### Phase 4: Hook Renumbering ✅
- [x] Renumber hooks per renumbering map below
- [x] Add `.bg` suffix to long-running hooks (media, gallerydl, forumdl, papersdl)
- [x] Move parse_* hooks to step 7 (70-79)
- [x] Test all hooks still work after renumbering

## Migration Path

### Natural Compatibility
No special migration needed:
1. Existing ARs with `hook_name=None` continue to work (discover all plugin hooks at runtime)
2. New ARs get `hook_name` set (single hook per AR)
3. `ArchiveResult.run()` handles both cases naturally
4. Unnumbered hooks default to step 9 (log warning)

### Renumbering Map

**Completed Renames:**
```
# Step 5: DOM Extraction (sequential, non-background)
singlefile/on_Snapshot__37_singlefile.py → singlefile/on_Snapshot__50_singlefile.py ✅
screenshot/on_Snapshot__34_screenshot.js → screenshot/on_Snapshot__51_screenshot.js ✅
pdf/on_Snapshot__35_pdf.js → pdf/on_Snapshot__52_pdf.js ✅
dom/on_Snapshot__36_dom.js → dom/on_Snapshot__53_dom.js ✅
title/on_Snapshot__32_title.js → title/on_Snapshot__54_title.js ✅
readability/on_Snapshot__52_readability.py → readability/on_Snapshot__55_readability.py ✅
headers/on_Snapshot__33_headers.js → headers/on_Snapshot__55_headers.js ✅
mercury/on_Snapshot__53_mercury.py → mercury/on_Snapshot__56_mercury.py ✅
htmltotext/on_Snapshot__54_htmltotext.py → htmltotext/on_Snapshot__57_htmltotext.py ✅

# Step 6: Post-DOM Extraction (background for long-running)
wget/on_Snapshot__50_wget.py → wget/on_Snapshot__61_wget.py ✅
git/on_Snapshot__12_git.py → git/on_Snapshot__62_git.py ✅
media/on_Snapshot__51_media.py → media/on_Snapshot__63_media.bg.py ✅
gallerydl/on_Snapshot__52_gallerydl.py → gallerydl/on_Snapshot__64_gallerydl.bg.py ✅
forumdl/on_Snapshot__53_forumdl.py → forumdl/on_Snapshot__65_forumdl.bg.py ✅
papersdl/on_Snapshot__54_papersdl.py → papersdl/on_Snapshot__66_papersdl.bg.py ✅

# Step 7: URL Extraction (parse_* hooks moved from step 6)
parse_html_urls/on_Snapshot__60_parse_html_urls.py → parse_html_urls/on_Snapshot__70_parse_html_urls.py ✅
parse_txt_urls/on_Snapshot__62_parse_txt_urls.py → parse_txt_urls/on_Snapshot__71_parse_txt_urls.py ✅
parse_rss_urls/on_Snapshot__61_parse_rss_urls.py → parse_rss_urls/on_Snapshot__72_parse_rss_urls.py ✅
parse_netscape_urls/on_Snapshot__63_parse_netscape_urls.py → parse_netscape_urls/on_Snapshot__73_parse_netscape_urls.py ✅
parse_jsonl_urls/on_Snapshot__64_parse_jsonl_urls.py → parse_jsonl_urls/on_Snapshot__74_parse_jsonl_urls.py ✅
parse_dom_outlinks/on_Snapshot__40_parse_dom_outlinks.js → parse_dom_outlinks/on_Snapshot__75_parse_dom_outlinks.js ✅
```

## Testing Strategy

### Unit Tests
- Test hook ordering (00-99)
- Test step grouping (first digit)
- Test .bg vs foreground execution
- Test timeout enforcement
- Test JSONL parsing
- Test failure modes & retry_at logic

### Integration Tests
- Test full Snapshot.run() with mixed hooks
- Test .bg scripts running beyond the final step (step 9)
- Test zombie process cleanup
- Test graceful SIGTERM handling
- Test concurrent .bg script coordination

### Performance Tests
- Measure overhead of per-hook ArchiveResults
- Test with 50+ concurrent .bg scripts
- Test filesystem contention with many hooks

## Open Questions

### Q: Should we provide semaphore utilities?
**A:** No. Keep plugins decoupled. Let them use simple filesystem coordination if needed.

### Q: What happens if ArchiveResult table gets huge?
**A:** We can delete old successful ArchiveResults periodically, or archive them to cold storage. The important data is in the filesystem outputs.

### Q: Should naturally-exiting .bg scripts still be .bg?
**A:** Yes. The .bg suffix means "don't block step progression," not "run until the final step." Natural exit is the best case.

## Examples

### Foreground Hook (Sequential DOM Access)
```python
#!/usr/bin/env python3
# Illustrative Python sketch of a foreground hook such as
# archivebox/plugins/screenshot/on_Snapshot__51_screenshot.js

# Runs at step 5, blocks step progression until complete
# Gets killed if it exceeds SCREENSHOT_TIMEOUT

import json
import os
import subprocess
import sys


def get_env_int(name, default=None):
    """Minimal stand-in helper: read an integer env var, or return default."""
    try:
        return int(os.environ[name])
    except (KeyError, ValueError):
        return default


timeout = get_env_int('SCREENSHOT_TIMEOUT') or get_env_int('TIMEOUT', 60)
cmd = [sys.executable, '-c', 'pass']  # placeholder for the real screenshot command

try:
    result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    if result.returncode == 0:
        print(json.dumps({
            "type": "ArchiveResult",
            "status": "succeeded",
            "output_str": "screenshot.png"
        }))
        sys.exit(0)
    else:
        # Temporary failure - will be retried
        sys.exit(1)
except subprocess.TimeoutExpired:
    # Timeout - will be retried
    sys.exit(1)
```

### Background Hook (Long-Running Download)
```python
#!/usr/bin/env python3
# archivebox/plugins/ytdlp/on_Snapshot__63_ytdlp.bg.py

# Runs at step 6, doesn't block step progression
# Gets full YTDLP_TIMEOUT (e.g., 3600s) regardless of when step 9 completes

import argparse
import json
import os
import subprocess
import sys


def get_env_int(name, default=None):
    """Minimal stand-in helper: read an integer env var, or return default."""
    try:
        return int(os.environ[name])
    except (KeyError, ValueError):
        return default


# Only --url is needed here; any other hook flags are ignored
parser = argparse.ArgumentParser()
parser.add_argument('--url', required=True)
url = parser.parse_known_args()[0].url

timeout = get_env_int('YTDLP_TIMEOUT') or get_env_int('TIMEOUT', 3600)

try:
    result = subprocess.run(['yt-dlp', url], capture_output=True, timeout=timeout)
    if result.returncode == 0:
        print(json.dumps({
            "type": "ArchiveResult",
            "status": "succeeded",
            "output_str": "media/"
        }))
        sys.exit(0)
    else:
        # Soft failure - record it, don't retry
        print(json.dumps({
            "type": "ArchiveResult",
            "status": "failed",
            "output_str": "Video unavailable"
        }))
        sys.exit(0)  # Exit 0 to record the failure
except subprocess.TimeoutExpired:
    # Timeout - will be retried
    sys.exit(1)
```

### Background Hook with Natural Exit
```javascript
#!/usr/bin/env node
// archivebox/plugins/ssl/on_Snapshot__23_ssl.bg.js

// Sets up listener, captures SSL info, then exits naturally
// No SIGTERM handler needed - already exits when done
// connectToChrome() and waitForNavigation() are assumed to be provided by shared hook helpers

const fs = require('fs');

async function main() {
    const page = await connectToChrome();

    // Set up listener
    page.on('response', async (response) => {
        const securityDetails = response.securityDetails();
        if (securityDetails) {
            fs.writeFileSync('ssl.json', JSON.stringify(securityDetails));
        }
    });

    // Wait for navigation (done by other hook)
    await waitForNavigation();

    // Emit result
    console.log(JSON.stringify({
        type: 'ArchiveResult',
        status: 'succeeded',
        output_str: 'ssl.json'
    }));

    process.exit(0);  // Natural exit - does not wait indefinitely
}

main().catch(e => {
    console.error(`ERROR: ${e.message}`);
    process.exit(1);  // Will be retried
});
```

## Summary

This plan provides:
- ✅ Clear execution ordering (10 steps, 00-99 numbering)
- ✅ Async support (.bg suffix)
- ✅ Independent timeout control per plugin
- ✅ Flexible failure handling & retry logic
- ✅ Streaming JSONL output for DB updates
- ✅ Simple filesystem-based coordination
- ✅ Backward compatibility during migration

The main implementation work is refactoring `Snapshot.run()` to enforce step ordering and manage .bg script lifecycles. Plugin renumbering is straightforward mechanical work.