mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-01-03 17:35:45 +10:00
Implements a Chrome extension management system that allows extractors to use browser extensions:

New 2captcha extractor (runs BEFORE puppeteer):
- Downloads Chrome extensions from Web Store (.crx files)
- Unpacks extensions to ./extensions/ directory
- Writes CHROME_EXTENSIONS_PATHS and CHROME_EXTENSIONS_IDS to .env
- Supports 2captcha (CAPTCHA solving), singlefile, uBlock, cookie consent blocker
- Configurable via API_KEY_2CAPTCHA and EXTENSIONS_ENABLED env vars

Updated puppeteer extractor:
- Reads CHROME_EXTENSIONS_PATHS from .env
- Loads extensions when launching Chrome
- Runs in headed mode when extensions are present (extensions require visible browser)
- Passes extension IDs to Chrome via --load-extension and --allowlisted-extension-id

Updated singlefile extractor (now uses extension instead of CLI):
- Connects to existing Chrome browser via CDP
- Triggers SingleFile extension via Ctrl+Shift+Y keyboard shortcut
- Waits for downloaded file to appear in Chrome downloads directory
- More reliable than single-file-cli and better quality output
- Fully integrates with Chrome's extension ecosystem

Benefits:
- Automatic CAPTCHA solving via 2captcha extension
- Better ad/cookie blocking via uBlock and cookie consent extensions
- Higher quality single-file archives using official SingleFile extension
- Extensions share browser state (cookies, local storage, etc.)
- Foundation for adding more browser extensions in the future

Dependencies:
- Added unzip-crx-3 for unpacking .crx extension files
- Updated extractors to use puppeteer-core for CDP connections

Execution order:
1. 2captcha downloads/configures extensions
2. puppeteer launches Chrome with extensions loaded
3. All other extractors reuse the same Chrome instance with extensions active
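
The puppeteer step described above (read the extension variables, then launch Chrome headed with the extensions loaded) might look roughly like the sketch below. It is illustrative only and not taken from the ArchiveBox sources: the names `parseList`, `planChromeLaunch`, and `ChromeLaunchPlan` are hypothetical, and it assumes that both CHROME_EXTENSIONS_PATHS and CHROME_EXTENSIONS_IDS hold comma-separated lists, which the commit message does not specify.

```typescript
// Hypothetical sketch: helper names and the comma-separated format are
// assumptions, not ArchiveBox's actual implementation.

interface ChromeLaunchPlan {
  headless: boolean; // false when extensions are present (headed mode)
  args: string[];    // extra Chrome command-line switches
}

// Split a comma-separated value into trimmed, non-empty entries.
function parseList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// Decide how Chrome should be launched based on the extension variables.
function planChromeLaunch(
  env: Record<string, string | undefined>
): ChromeLaunchPlan {
  const paths = parseList(env.CHROME_EXTENSIONS_PATHS);
  const ids = parseList(env.CHROME_EXTENSIONS_IDS);

  // No extensions configured: keep the default headless launch.
  if (paths.length === 0) {
    return { headless: true, args: [] };
  }

  const args = [
    // One switch carrying every unpacked extension directory.
    `--load-extension=${paths.join(",")}`,
    // One switch per extension ID (assumed usage of this switch).
    ...ids.map((id) => `--allowlisted-extension-id=${id}`),
  ];

  // The commit message states extensions require a visible browser.
  return { headless: false, args };
}
```

The resulting `headless` and `args` values could then be passed to `puppeteer.launch({ headless, args })`. Keeping the decision in a pure function makes it possible to check the headed/headless switch without starting a browser.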
38 lines
873 B
JSON
{
  "name": "archivebox-ts",
  "version": "0.1.0",
  "description": "TypeScript-based version of ArchiveBox with simplified architecture",
  "main": "dist/cli.js",
  "bin": {
    "archivebox-ts": "./dist/cli.js"
  },
  "scripts": {
    "build": "tsc",
    "dev": "tsc --watch",
    "start": "node dist/cli.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [
    "archiving",
    "web-archiving",
    "snapshot"
  ],
  "author": "",
  "license": "MIT",
  "dependencies": {
    "@mozilla/readability": "^0.6.0",
    "better-sqlite3": "^11.0.0",
    "commander": "^12.0.0",
    "jsdom": "^27.1.0",
    "nanoid": "^3.3.7",
    "puppeteer": "^24.28.0",
    "puppeteer-core": "^24.28.0",
    "unzip-crx-3": "^0.2.0"
  },
  "devDependencies": {
    "@types/better-sqlite3": "^7.6.9",
    "@types/node": "^20.11.0",
    "typescript": "^5.3.3"
  }
}