Added
- This CHANGELOG file.
- Metadata (and text with boilerplate removed) can now be extracted from each webpage.
- Documentation.
Changed
- Cached python data created by the tests will be ignored by git.
- If metadata is passed to create_graph, the file name is stored anyway.
- include_content renamed to store_content; only text info is stored.
- Allow choosing subsets of html tags.
- The basename instead of the path is stored in metadata.
- Names of functions and files changed:
  - create_model -> create_graph
  - create_network -> links2graph
  - networks.py -> graphs.py
Bugfix
- Custom metadata couldn't be handled due to a typo.
- When extracting the URL from a metatag, return it as a URL object, not as a string.
- Only install the webdriver if necessary.
- Check whether the HTML is empty before trying to extract metadata.