Easierscrape

Latest version: v0.2.2

0.2.2

- Added year and name to LICENSE
- Added "Overview" section to documentation index
- Added more examples to documentation index
- Updated MANIFEST.in

0.2.1

- Added blacklist and whitelist options to `tree_gen` and `print_tree`
- Added a `download_path` parameter to the `Scraper` class that allows users to choose where downloads go.
- Added a `clear_downloads` method to allow users to delete the `Scraper` download directory.
- Updated tests to verify that the correct number of files is downloaded.
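The download-directory behavior added in 0.2.1 can be modeled with a minimal stdlib-only sketch. The names `download_path` and `clear_downloads` come from the changelog above; everything else (the class and `save` method) is a hypothetical stand-in, not easierscrape's actual implementation:

```python
import shutil
import tempfile
from pathlib import Path

class DownloadingScraper:
    """Toy model of a scraper with a configurable download directory."""

    def __init__(self, download_path="downloads"):
        # Downloads land under a user-chosen directory, as in 0.2.1.
        self.download_path = Path(download_path)
        self.download_path.mkdir(parents=True, exist_ok=True)

    def save(self, name, data):
        # Stand-in for a real download: write bytes into the directory.
        target = self.download_path / name
        target.write_bytes(data)
        return target

    def clear_downloads(self):
        # Delete the whole download directory, as clear_downloads does.
        shutil.rmtree(self.download_path, ignore_errors=True)

# Usage: write one file, confirm it exists, then clear everything.
tmp = Path(tempfile.mkdtemp()) / "dl"
scraper = DownloadingScraper(tmp)
saved = scraper.save("page.html", b"<html></html>")
assert saved.exists()
scraper.clear_downloads()
assert not tmp.exists()
```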

0.2.0

- Rewrote `Scraper` to take the URL in its constructor rather than in each method. Pages are now fully scraped only once instead of once per function: previously a website was opened for every method called on it (once to scrape images, again to scrape tables, etc.), whereas now it is opened a single time and all contents are scraped from that one session. This reduces the number of hits each webpage takes while being scraped and improves runtime.
- For example, instead of doing the following:

```python
scraper = Scraper()
scraper.parse_tables("www.example.com")
scraper.parse_images("www.example.com")
scraper.parse_lists("www.example.com")
```

Do this instead:

```python
scraper = Scraper("www.example.com")
scraper.parse_tables()
scraper.parse_images()
scraper.parse_lists()
```
- The rewrite also replaces the private method `_soup_url` with a class attribute (`soup_url`) that holds its output.
- Rewrote `print_tree` so it no longer depends on `tree_gen`. `tree_gen` behaves as before (it generates an `anytree` tree and returns the head node), but `print_tree` now takes the desired tree depth as a parameter instead of the head `Node` itself.
- For example, instead of doing the following:

```python
scraper = Scraper()
scraper.print_tree(scraper.tree_gen("www.example.com", 2))
```

Do this instead:

```python
scraper = Scraper("www.example.com")
scraper.print_tree(2)
```

- Updated docs and demo recording to match the rewrite.
- Added a destructor to the `Scraper` class.
- Added an xlsx output option for `parse_tables`.
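The depth parameter that `print_tree` now takes can be illustrated with a stdlib-only sketch. The hardcoded `LINKS` graph below is a hypothetical stand-in for real pages, and the function is a conceptual model of depth-limited link-tree printing, not easierscrape's implementation:

```python
# Hypothetical, hardcoded link graph standing in for real web pages.
LINKS = {
    "www.example.com": ["www.example.com/a", "www.example.com/b"],
    "www.example.com/a": ["www.example.com/a/1"],
    "www.example.com/b": [],
    "www.example.com/a/1": [],
}

def print_tree(url, depth, indent=0, out=None):
    """Recursively render a link tree down to the given depth."""
    lines = out if out is not None else []
    lines.append("  " * indent + url)
    if depth > 0:  # stop recursing once the requested depth is reached
        for child in LINKS.get(url, []):
            print_tree(child, depth - 1, indent + 1, lines)
    if out is None:  # top-level call: print the assembled tree
        print("\n".join(lines))
    return lines

# Usage mirrors scraper.print_tree(2): show links up to two levels deep.
tree = print_tree("www.example.com", 2)
```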

0.1.3

- Removed usage of `urllib` legacy interface
- Fixed a bug in `_tree_gen_rec`
- Updated README.md with CONTRIBUTING and LICENSE links
- Updated test_all.py to clean up the download directory after running `make test`
- Added `get_screenshot` method

0.1.2

Fixes issues with the 0.1.1 release:
- Fixed a bug in `_concat_urls` that produced incorrect URLs, noticed after the Selenium release (and corrected the corresponding test).
- Corrected `__main__` to use `Scraper` class instead of function imports.
- Added testing for CLI usage through `main`.
- Updated README and docs to match the Selenium update.
- Added recording for CLI usage.

0.1.1

This patch replaces `requests` with Selenium to allow dynamic scraping and places all functions under a `Scraper` class.
