ArchiveBox/archivebox/search
Ross Williams b6a20c962a Extract text from singlefile.html when indexing
singlefile.html contains many large strings in the form of `data:`
URLs, which would otherwise be stored in full in full-text indices for
no benefit. Large chunks of JavaScript shouldn't be indexed either, as
they pollute the results of searches about JS functions, etc.

This commit takes a blanket approach of parsing singlefile.html as it is
read and only outputting text and selected textual attributes (like
`alt`) for indexing.
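
The approach described above could be sketched roughly as follows. This is a minimal illustration using the standard library's `html.parser`, not the code from this commit: the names `TextExtractor`, `extract_text`, `SKIPPED_TAGS`, and `INDEXED_ATTRS` are hypothetical, and the choice of `title` alongside `alt` is an assumption.

```python
from html.parser import HTMLParser

# Hypothetical settings for this sketch; the commit only names `alt` as an example.
SKIPPED_TAGS = {"script", "style"}   # element bodies that should never be indexed
INDEXED_ATTRS = {"alt", "title"}     # textual attributes worth indexing


class TextExtractor(HTMLParser):
    """Collect visible text and selected textual attributes from HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth:
            return
        # Attributes such as src/href (where data: URLs live) are ignored.
        for name, value in attrs:
            if name in INDEXED_ATTRS and value and value.strip():
                self.chunks.append(value.strip())

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text nodes only when outside skipped elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the indexable text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Because only text nodes and the listed attributes are emitted, `data:` URLs in `src` or `href` and the bodies of `<script>` and `<style>` elements never reach the index in this sketch.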
2023-10-12 13:06:35 -04:00