{% extends "core/base.html" %} {% load static %} {% load i18n %} {% block breadcrumbs %} {% endblock %} {% block extra_head %} {% endblock %} {% block body %}


{% if stdout %}

Add new URLs to your archive: results

                {{ stdout | safe }}
                


  Add more URLs ➕
{% else %}
{% csrf_token %}

Create a new Crawl

A Crawl is a job that processes URLs and creates Snapshots (archived copies) for each URL discovered. The settings below apply to the entire crawl and all snapshots it creates.

{{ form.url.label_tag }}
0 URLs detected
{{ form.url }}
{% if form.url.errors %}
{{ form.url.errors }}
{% endif %}
Enter URLs to archive, one per line, or as CSV, JSON, or URLs embedded in text (e.g. Markdown, HTML). Examples:
https://example.com
https://news.ycombinator.com,https://news.google.com
[ArchiveBox](https://github.com/ArchiveBox/ArchiveBox)
{{ form.tag.label_tag }} {{ form.tag }} {% if form.tag.errors %}
{{ form.tag.errors }}
{% endif %}
Tags will be applied to all snapshots created by this crawl.
{{ form.depth.label_tag }} {{ form.depth }} {% if form.depth.errors %}
{{ form.depth.errors }}
{% endif %}
Controls how many links deep the crawl will follow from the starting URLs.
{{ form.max_urls.label_tag }} {{ form.max_urls }} {% if form.max_urls.errors %}
{{ form.max_urls.errors }}
{% endif %}
0 means unlimited. When set, only the first N URLs that pass the filters will be snapshotted.
{{ form.max_size.label_tag }} {{ form.max_size }} {% if form.max_size.errors %}
{{ form.max_size.errors }}
{% endif %}
0 means unlimited. Accepts a size in bytes, or human-readable units like 45mb and 1gb.
{{ form.url_filters.label_tag }} {{ form.url_filters }} {% if form.url_filters.errors %}
{{ form.url_filters.errors }}
{% endif %}
{{ form.notes.label_tag }} {{ form.notes }} {% if form.notes.errors %}
{{ form.notes.errors }}
{% endif %}
Optional description for this crawl (visible in the admin interface).
{{ form.persona.label_tag }} {{ form.persona }} {% if form.persona.errors %}
{{ form.persona.errors }}
{% endif %}
Authentication profile (Chrome profile, cookies, etc.) to use when accessing URLs. Create new persona / import from Chrome →

Crawl Plugins

Select which archiving methods to run for all snapshots in this crawl. If none are selected, all available plugins will be used. View plugin details →

Quick Select:
{{ form.chrome_plugins }}
{{ form.archiving_plugins }}
{{ form.parsing_plugins }}
(defaults to SEARCH_BACKEND_ENGINE)
{{ form.search_plugins }}
{{ form.binary_plugins }}
{{ form.extension_plugins }}

Advanced Crawl Options

Additional settings that control how this crawl processes URLs and creates snapshots.

{{ form.schedule.label_tag }} {{ form.schedule }} {% if form.schedule.errors %}
{{ form.schedule.errors }}
{% endif %}
Optional: Schedule this crawl to repeat automatically. Examples:
daily - Run once per day
weekly - Run once per week
0 */6 * * * - Every 6 hours (cron format)
0 0 * * 0 - Every Sunday at midnight (cron format)
{{ form.index_only }} {{ form.index_only.label_tag }} {% if form.index_only.errors %}
{{ form.index_only.errors }}
{% endif %}
Create the crawl and queue snapshots without running archive plugins yet.
{{ form.config.label_tag }} {{ form.config }} {% if form.config.errors %}
{{ form.config.errors }}
{% endif %}
Override any config option for this crawl (e.g. TIMEOUT, USER_AGENT, CHROME_BINARY). URL_ALLOWLIST, URL_DENYLIST, and ENABLED_PLUGINS are updated automatically from the fields above.



{% if absolute_add_path %} {% endif %} {% endif %}
{% endblock %} {% block footer %}{% endblock %} {% block sidebar %}{% endblock %}