- Completely refactored the old code base to make it more readable and maintainable
- Introduced unit tests and a Makefile
- URL extraction is no longer limited to HTML href attributes. URLs such as https://www.facebook.de/abc123 are now also detected in paragraphs of the HTML and in the plain-text part of the email. URLs without a protocol, such as facebook.de, are still not detected, since this would probably lead to too many false positives. Also, the email clients I tried (e.g. Outlook) do not make these URLs clickable for the user.
- The output can now be structured. Currently only JSON is supported, but more formats are easy to add in the future if needed.
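
The URL detection and JSON output described above could look roughly like the following. This is only an illustrative sketch, not the project's actual code: the regular expression, the function names `extract_urls` and `to_json`, and the `{"urls": [...]}` output shape are all assumptions made for this example.

```python
import json
import re

# Hypothetical pattern (an assumption, not the project's real one): it only
# matches URLs with an explicit http:// or https:// scheme, so bare domains
# such as "facebook.de" are deliberately skipped.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return the unique scheme-prefixed URLs in text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.rstrip(".,;:!?)")
        if url not in urls:
            urls.append(url)
    return urls


def to_json(urls):
    """Serialize the result as JSON (hypothetical output shape)."""
    return json.dumps({"urls": urls})


if __name__ == "__main__":
    body = (
        "<p>Visit https://www.facebook.de/abc123. Or facebook.de</p>"
        '<a href="https://example.com/x">x</a>'
    )
    print(to_json(extract_urls(body)))
    # prints {"urls": ["https://www.facebook.de/abc123", "https://example.com/x"]}
```

Requiring the scheme keeps the pattern simple and avoids matching things like file names or sentence fragments that merely contain a dot, which is the false-positive risk mentioned above.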