mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-01-06 02:46:05 +10:00
Claude Code Development Guide for ArchiveBox
Quick Start
# Set up dev environment
uv sync --dev
# Run tests as non-root user (required - ArchiveBox refuses to run as root)
sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/ -v'
Development Environment Setup
Prerequisites
- Python 3.11+ (3.13 recommended)
- uv package manager
- A non-root user for running tests (e.g., testuser)
Install Dependencies
uv sync --dev
Activate Virtual Environment
source .venv/bin/activate
Running Tests
CRITICAL: Never Run as Root
ArchiveBox has a root check that prevents it from running as the root user. Always run tests as a non-root user:
# Run all migration tests
sudo -u testuser bash -c 'source /path/to/.venv/bin/activate && python -m pytest archivebox/tests/test_migrations_*.py -v'
# Run specific test file
sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/test_migrations_08_to_09.py -v'
# Run single test
sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/test_migrations_fresh.py::TestFreshInstall::test_init_creates_database -xvs'
Test File Structure
archivebox/tests/
├── test_migrations_helpers.py # Schemas, seeding functions, verification helpers
├── test_migrations_fresh.py # Fresh install tests
├── test_migrations_04_to_09.py # 0.4.x → 0.9.x migration tests
├── test_migrations_07_to_09.py # 0.7.x → 0.9.x migration tests
└── test_migrations_08_to_09.py # 0.8.x → 0.9.x migration tests
Test Writing Standards
NO MOCKS - Real Tests Only
Tests must exercise real code paths:
- Create real SQLite databases with version-specific schemas
- Seed with realistic test data
- Run actual python -m archivebox commands via subprocess
- Query SQLite directly to verify results
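A minimal sketch of the create, seed, and query steps above, using only the standard library. The snapshot table and its columns are a hypothetical stand-in, not ArchiveBox's actual schema (the real version-specific schemas live in test_migrations_helpers.py), and the function names are made up for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical, simplified schema for illustration only.
DEMO_SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshot (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL UNIQUE,
    title TEXT
);
"""


def create_demo_db(db_path, urls):
    """Create a real SQLite file on disk and seed it with one row per URL."""
    conn = sqlite3.connect(str(db_path))
    try:
        conn.executescript(DEMO_SCHEMA)
        conn.executemany(
            "INSERT INTO snapshot (url, title) VALUES (?, ?)",
            [(url, f"Title for {url}") for url in urls],
        )
        conn.commit()
    finally:
        conn.close()


def fetch_urls(db_path):
    """Query the database file directly, bypassing any ORM layer."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute("SELECT url FROM snapshot ORDER BY id").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "index.sqlite3"
        create_demo_db(db_path, ["https://example.com/a", "https://example.com/b"])
        print(fetch_urls(db_path))
```

Because the database is a real file rather than a mock, the same path can be handed to a subprocess and inspected again afterwards.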
NO SKIPS
Never use @skip, skipTest, or pytest.mark.skip. Every test must run.
Strict Assertions
- init command must return exit code 0 (not [0, 1])
- Verify ALL data is preserved, not just "at least one"
- Use exact counts (==) not loose bounds (>=)
Example Test Pattern
def test_migration_preserves_snapshots(self):
    """Migration should preserve all snapshots."""
    result = run_archivebox(self.work_dir, ['init'], timeout=45)
    self.assertEqual(result.returncode, 0, f"Init failed: {result.stderr}")
    ok, msg = verify_snapshot_count(self.db_path, expected_count)
    self.assertTrue(ok, msg)
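The body of verify_snapshot_count is not shown in this guide. The following is a guess at the shape of such a helper, assuming it returns an (ok, message) pair as the example uses it. The name verify_row_count and its table parameter are hypothetical.

```python
import sqlite3


def verify_row_count(db_path, table, expected_count):
    """Return (ok, message); ok is True only when the count matches exactly."""
    conn = sqlite3.connect(str(db_path))
    try:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        (actual,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
    finally:
        conn.close()
    if actual == expected_count:
        return True, f"{table}: found {actual} rows as expected"
    return False, f"{table}: expected exactly {expected_count} rows, found {actual}"
```

Comparing with == rather than >= means that both lost rows and unexpected extra rows fail the test, and the message says which count was seen.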
Migration Testing
Schema Versions
- 0.4.x: First Django version. Tags as comma-separated string, no ArchiveResult model
- 0.7.x: Tag model with M2M, ArchiveResult model, AutoField PKs
- 0.8.x: Crawl/Seed models, UUID PKs, status fields, depth/retry_at
- 0.9.x: Seed model removed, seed_id FK removed from Crawl
Testing a Migration Path
- Create SQLite DB with source version schema (from test_migrations_helpers.py)
- Seed with realistic test data using seed_0_X_data()
- Run archivebox init to trigger migrations
- Verify data preservation with verify_* functions
- Test CLI commands work post-migration (status, list, add, etc.)
Squashed Migrations
When testing 0.8.x (dev branch), you must record ALL replaced migrations:
# The squashed migration replaces these - all must be recorded
('core', '0023_alter_archiveresult_options_archiveresult_abid_and_more'),
('core', '0024_auto_20240513_1143'),
# ... all 52 migrations from 0023-0074 ...
('core', '0023_new_schema'), # Also record the squashed migration itself
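One way to record these rows when building a test database by hand is to insert them straight into the django_migrations table. The helper name record_migrations is hypothetical; the column layout (id, app, name, applied) is the one Django creates for that table. The caller must supply the complete list; this sketch does not generate the elided names.

```python
import sqlite3
from datetime import datetime, timezone


def record_migrations(db_path, migrations):
    """Insert (app, name) pairs into django_migrations; return how many were added."""
    conn = sqlite3.connect(str(db_path))
    try:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS django_migrations (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                app VARCHAR(255) NOT NULL,
                name VARCHAR(255) NOT NULL,
                applied DATETIME NOT NULL
            )
            """
        )
        # Store the timestamp as naive UTC text, e.g. "2024-05-13 11:43:00.000000".
        now = datetime.now(timezone.utc).replace(tzinfo=None).isoformat(sep=" ")
        conn.executemany(
            "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
            [(app, name, now) for app, name in migrations],
        )
        conn.commit()
    finally:
        conn.close()
    return len(migrations)
```

Passing the same list of tuples that appears in the squashed migration's replaces attribute, plus the squashed migration itself, keeps the recorded state and the migration file in agreement.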
Common Gotchas
1. File Permissions
New files created by root need permissions fixed for testuser:
chmod 644 archivebox/tests/test_*.py
2. DATA_DIR Environment Variable
Tests use temp directories. The run_archivebox() helper sets DATA_DIR automatically.
3. Extractors Disabled for Speed
Tests disable all extractors via environment variables for faster execution:
env['SAVE_TITLE'] = 'False'
env['SAVE_FAVICON'] = 'False'
# ... etc
4. Timeout Settings
Use appropriate timeouts for migration tests (45s for init, 60s default).
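Gotchas 2 through 4 can be combined in a single subprocess wrapper. This is a sketch of what a helper like run_archivebox() might do, not its actual implementation: the names build_test_env and run_module, the base_env parameter, and the module parameter are hypothetical, and only the two SAVE_* flags named above are set.

```python
import os
import subprocess
import sys


def build_test_env(data_dir, base_env=None):
    """Copy the environment, point DATA_DIR at the temp dir, disable extractors."""
    env = dict(os.environ if base_env is None else base_env)
    env["DATA_DIR"] = str(data_dir)
    # Only the two flags shown above; a real helper would disable the rest too.
    env["SAVE_TITLE"] = "False"
    env["SAVE_FAVICON"] = "False"
    return env


def run_module(data_dir, args, timeout=60, module="archivebox"):
    """Run `python -m <module> <args>` inside data_dir with a hard timeout."""
    return subprocess.run(
        [sys.executable, "-m", module, *args],
        cwd=str(data_dir),
        env=build_test_env(data_dir),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

A call such as run_module(work_dir, ["init"], timeout=45) would mirror the example test pattern. If the timeout is exceeded, subprocess.run raises subprocess.TimeoutExpired, so a hung migration fails the test instead of stalling the suite.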
5. Circular FK References in Schemas
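SQLite does not check that a foreign key's target table exists when a table is created, so schemas with circular references can be created in any order, and CREATE TABLE IF NOT EXISTS keeps schema creation safe to repeat. Order matters less than in other DBs.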
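A small demonstration of the point about table order, using an in-memory database. The demo_a and demo_b tables are hypothetical and unrelated to ArchiveBox's schema.

```python
import sqlite3


def create_circular_schema(conn):
    """Create two tables that reference each other; return the table names found."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS demo_a (
            id INTEGER PRIMARY KEY,
            b_id INTEGER REFERENCES demo_b(id)  -- demo_b does not exist yet
        );
        CREATE TABLE IF NOT EXISTS demo_b (
            id INTEGER PRIMARY KEY,
            a_id INTEGER REFERENCES demo_a(id)
        );
        """
    )
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
tables = create_circular_schema(conn)
# IF NOT EXISTS makes a second run a no-op instead of an error.
tables_again = create_circular_schema(conn)
conn.close()
```

The first CREATE TABLE succeeds even though demo_b is not defined yet, because SQLite only resolves the reference when rows are modified with foreign key enforcement turned on.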
Architecture Notes
Crawl Model (0.9.x)
- Crawl groups multiple Snapshots from a single add command
- Each add creates one Crawl with one or more Snapshots
- Seed model was removed - crawls now store URLs directly
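The relationship described above can be pictured with plain dataclasses. This is a conceptual sketch only: these are not the Django models, the field names are hypothetical, and creating exactly one Snapshot per URL is an assumption made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    url: str


@dataclass
class Crawl:
    # URLs live on the crawl itself; there is no separate Seed object.
    urls: list[str]
    snapshots: list[Snapshot] = field(default_factory=list)


def crawl_from_add(urls):
    """Model one `add` invocation: a single Crawl holding its Snapshots."""
    crawl = Crawl(urls=list(urls))
    crawl.snapshots = [Snapshot(url=u) for u in crawl.urls]
    return crawl
```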
Migration Strategy
- Squashed migrations for clean installs
- Individual migrations recorded for upgrades from dev branch
- replaces attribute in squashed migrations lists what they replace
Debugging Tips
Check Migration State
sqlite3 /path/to/index.sqlite3 "SELECT app, name FROM django_migrations WHERE app='core' ORDER BY id;"
Check Table Schema
sqlite3 /path/to/index.sqlite3 "PRAGMA table_info(core_snapshot);"
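The same two checks can be run from Python, which is convenient inside a failing test. A minimal sketch using the standard sqlite3 module; the function names applied_migrations and column_names are hypothetical.

```python
import sqlite3


def applied_migrations(db_path, app="core"):
    """Return the migration names recorded for one app, in applied order."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM django_migrations WHERE app = ? ORDER BY id",
            (app,),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


def column_names(db_path, table):
    """Return the column names of a table via PRAGMA table_info."""
    conn = sqlite3.connect(str(db_path))
    try:
        # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    finally:
        conn.close()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in rows]
```

For example, column_names("/path/to/index.sqlite3", "core_snapshot") returns an empty list if the table does not exist, which is a quick way to spot a migration that never ran.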
Verbose Test Output
sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/test_migrations_08_to_09.py -xvs 2>&1 | head -200'