ArchiveBox/archivebox-ts/package.json
Claude 891409a1cc Add Chrome extension support with 2captcha extractor and update singlefile
Implements a Chrome extension management system that allows extractors to use browser extensions:

New 2captcha extractor (runs BEFORE puppeteer):
- Downloads Chrome extensions from Web Store (.crx files)
- Unpacks extensions to ./extensions/ directory
- Writes CHROME_EXTENSIONS_PATHS and CHROME_EXTENSIONS_IDS to .env
- Supports 2captcha (CAPTCHA solving), singlefile, uBlock, cookie consent blocker
- Configurable via API_KEY_2CAPTCHA and EXTENSIONS_ENABLED env vars
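
The .env step above might look roughly like the sketch below. It is not the commit's actual code: the `UnpackedExtension` shape, the `renderExtensionEnv` name, and the comma separator are all assumptions made for illustration.

```typescript
// Hypothetical record for one unpacked extension (field names are assumptions).
interface UnpackedExtension {
  name: string; // human-readable label, e.g. "singlefile"
  id: string;   // Chrome Web Store extension ID
  path: string; // directory the .crx was unpacked into
}

// Merge the two extension variables into existing .env text.
// Earlier values of the same two keys are replaced; other entries are kept.
// Blank lines are dropped for simplicity.
function renderExtensionEnv(existing: string, extensions: UnpackedExtension[]): string {
  const managed = ["CHROME_EXTENSIONS_PATHS", "CHROME_EXTENSIONS_IDS"];
  const kept = existing
    .split("\n")
    .filter((line) => line !== "" && !managed.some((key) => line.startsWith(`${key}=`)));
  // Assumption: multiple values are joined with commas.
  kept.push(`CHROME_EXTENSIONS_PATHS=${extensions.map((e) => e.path).join(",")}`);
  kept.push(`CHROME_EXTENSIONS_IDS=${extensions.map((e) => e.id).join(",")}`);
  return kept.join("\n") + "\n";
}
```

Keeping this as a pure string transformation means the file read and write can stay in the caller, and rerunning the extractor does not accumulate duplicate keys.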

Updated puppeteer extractor:
- Reads CHROME_EXTENSIONS_PATHS from .env
- Loads extensions when launching Chrome
- Runs in headed mode when extensions are present (extensions require visible browser)
- Passes unpacked extension paths to Chrome via --load-extension and extension IDs via --allowlisted-extension-id
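
A minimal sketch of how the launch options could be derived from the .env values, assuming comma-separated lists. `buildExtensionArgs` is a hypothetical helper, not a function from this commit. Emitting one --allowlisted-extension-id flag per ID is also an assumption, and whether Chrome honors more than one such flag is not verified here.

```typescript
// Hypothetical helper: turn the two .env values into Chrome launch options.
function buildExtensionArgs(
  env: Record<string, string | undefined>,
): { headless: boolean; args: string[] } {
  // Split a comma-separated value, dropping empty entries and stray whitespace.
  const split = (value: string | undefined): string[] =>
    (value ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);

  const paths = split(env["CHROME_EXTENSIONS_PATHS"]);
  const ids = split(env["CHROME_EXTENSIONS_IDS"]);

  // No extensions configured: keep the default headless launch.
  if (paths.length === 0) {
    return { headless: true, args: [] };
  }

  // Extensions present: run headed, as the commit message describes.
  return {
    headless: false,
    args: [
      `--load-extension=${paths.join(",")}`,
      ...ids.map((id) => `--allowlisted-extension-id=${id}`), // assumption: one flag per ID
    ],
  };
}
```

The returned object has the same `headless` and `args` fields that `puppeteer.launch()` accepts, so it could be spread into the launch options.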

Updated singlefile extractor (now uses extension instead of CLI):
- Connects to existing Chrome browser via CDP
- Triggers SingleFile extension via Ctrl+Shift+Y keyboard shortcut
- Waits for downloaded file to appear in Chrome downloads directory
- More reliable than single-file-cli and better quality output
- Fully integrates with Chrome's extension ecosystem
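
The "wait for the downloaded file" step can be sketched as a simple polling loop. This is an illustration under assumptions, not the extractor's real code: `newFinishedFiles`, `waitForDownload`, the timeout, and the polling interval are all hypothetical. The directory listing is injected as a function so the logic does not depend on a particular filesystem API.

```typescript
// Names present in `after` but not in `before`, skipping Chrome's
// in-progress ".crdownload" partial files.
function newFinishedFiles(before: string[], after: string[]): string[] {
  const seen = new Set(before);
  return after.filter((name) => !seen.has(name) && !name.endsWith(".crdownload"));
}

// Poll a directory listing until a new, finished file shows up.
// `listDir` is supplied by the caller; `before` is the listing taken
// just before the SingleFile shortcut was triggered.
async function waitForDownload(
  listDir: () => Promise<string[]>,
  before: string[],
  timeoutMs: number = 30000,
  intervalMs: number = 250,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const first = newFinishedFiles(before, await listDir())[0];
    if (first !== undefined) {
      return first;
    }
    // Sleep before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`No new download appeared within ${timeoutMs} ms`);
}
```

In Node, `listDir` could be `() => fs.promises.readdir(downloadsDir)`, where `downloadsDir` stands for whatever directory Chrome is configured to download into.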

Benefits:
- Automatic CAPTCHA solving via 2captcha extension
- Better ad/cookie blocking via uBlock and cookie consent extensions
- Higher quality single-file archives using official SingleFile extension
- Extensions share browser state (cookies, local storage, etc.)
- Foundation for adding more browser extensions in the future

Dependencies:
- Added unzip-crx-3 for unpacking .crx extension files
- Updated extractors to use puppeteer-core for CDP connections

Execution order:
1. 2captcha downloads/configures extensions
2. puppeteer launches Chrome with extensions loaded
3. All other extractors reuse the same Chrome instance with extensions active
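
One way to express this ordering, as a rough sketch: pin the two setup extractors to the front and leave the rest in their given order. `orderExtractors` and the extra extractor names in the usage note are hypothetical. How the project actually schedules extractors is not shown in this commit message.

```typescript
// Extractors that must run first, in this order (from the list above).
const SETUP_ORDER: string[] = ["2captcha", "puppeteer"];

// Return a copy of `names` with the setup extractors moved to the front.
// Array.prototype.sort is stable, so the remaining extractors keep
// their relative order.
function orderExtractors(names: string[]): string[] {
  const rank = (name: string): number => {
    const index = SETUP_ORDER.indexOf(name);
    return index === -1 ? SETUP_ORDER.length : index;
  };
  return [...names].sort((a, b) => rank(a) - rank(b));
}
```

For example, `orderExtractors(["singlefile", "puppeteer", "readability", "2captcha"])` yields `["2captcha", "puppeteer", "singlefile", "readability"]`.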
2025-11-03 21:03:18 +00:00


{
  "name": "archivebox-ts",
  "version": "0.1.0",
  "description": "TypeScript-based version of ArchiveBox with simplified architecture",
  "main": "dist/cli.js",
  "bin": {
    "archivebox-ts": "./dist/cli.js"
  },
  "scripts": {
    "build": "tsc",
    "dev": "tsc --watch",
    "start": "node dist/cli.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [
    "archiving",
    "web-archiving",
    "snapshot"
  ],
  "author": "",
  "license": "MIT",
  "dependencies": {
    "@mozilla/readability": "^0.6.0",
    "better-sqlite3": "^11.0.0",
    "commander": "^12.0.0",
    "jsdom": "^27.1.0",
    "nanoid": "^3.3.7",
    "puppeteer": "^24.28.0",
    "puppeteer-core": "^24.28.0",
    "unzip-crx-3": "^0.2.0"
  },
  "devDependencies": {
    "@types/better-sqlite3": "^7.6.9",
    "@types/node": "^20.11.0",
    "typescript": "^5.3.3"
  }
}