Mirror of https://github.com/ArchiveBox/ArchiveBox.git, synced 2026-03-15 20:43:02 +10:00
better title regex to match titles surrounded by newlines
@@ -66,9 +66,9 @@ URL_REGEX = re.compile(
     re.IGNORECASE,
 )
 HTML_TITLE_REGEX = re.compile(
-    r'<title>'                      # start matching text after <title> tag
+    r'<title.*?>'                      # start matching text after <title> tag
     r'(.[^<>]+)',                      # get everything up to these symbols
-    re.IGNORECASE,
+    re.IGNORECASE | re.MULTILINE | re.DOTALL | re.UNICODE,
 )
 
 ### Checks & Tests
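A minimal sketch of what the change affects, assuming the two patterns are exactly as shown in the hunk. The names OLD_TITLE_REGEX, NEW_TITLE_REGEX, extract_title and SAMPLE_HTML are hypothetical and exist only for this illustration; they are not taken from the ArchiveBox codebase.

```python
import re

# Pattern as it appears on the removed lines of the hunk.
OLD_TITLE_REGEX = re.compile(
    r'<title>'        # literal tag only, no attributes allowed
    r'(.[^<>]+)',     # first char via '.', then everything up to '<' or '>'
    re.IGNORECASE,
)

# Pattern as it appears on the added lines of the hunk.
NEW_TITLE_REGEX = re.compile(
    r'<title.*?>'     # tag may carry attributes before the closing '>'
    r'(.[^<>]+)',     # with DOTALL, the leading '.' can also match a newline
    re.IGNORECASE | re.MULTILINE | re.DOTALL | re.UNICODE,
)


def extract_title(html, pattern):
    """Hypothetical helper: return the stripped title text, or None if no match."""
    match = pattern.search(html)
    return match.group(1).strip() if match else None


# Title tag with an attribute, and title text surrounded by newlines.
SAMPLE_HTML = '<html><head><title data-x="1">\nExample Page\n</title></head></html>'

old_result = extract_title(SAMPLE_HTML, OLD_TITLE_REGEX)
new_result = extract_title(SAMPLE_HTML, NEW_TITLE_REGEX)
```

In this sketch the old pattern fails for two separate reasons: the literal `<title>` does not match a tag with attributes, and without `re.DOTALL` the leading `.` of the capture group cannot match the newline that follows the tag. The captured group still includes the surrounding newlines, so the helper strips them before returning.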