Advertools

0.13.2
-------------------

* Added
- Crawling recipe showing how to use the ``DEFAULT_REQUEST_HEADERS`` setting
  to change the default request headers (see the sketch at the end of this
  section).

* Changed
- Split long lists of URLs while crawling, regardless of the ``follow_links``
  parameter.

* Fixed
- Clarify that when authenticating for Twitter, only ``app_key`` and
  ``app_secret`` are required, with the option to provide ``oauth_token``
  and ``oauth_token_secret`` if/when needed (example below).
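
A minimal sketch of both recipes, assuming an illustrative site and
placeholder Twitter credentials (the header values and keys below are
hypothetical)::

    import advertools as adv

    # Change the default request headers through Scrapy's
    # DEFAULT_REQUEST_HEADERS setting, passed via custom_settings.
    adv.crawl(
        'https://example.com',
        'example_crawl.jl',
        follow_links=True,
        custom_settings={
            'DEFAULT_REQUEST_HEADERS': {
                'Accept-Language': 'es',
            }
        })

    # Authenticate for the Twitter functions; only app_key and
    # app_secret are required, oauth_token and oauth_token_secret
    # are optional.
    adv.twitter.set_auth_params(
        app_key='YOUR_APP_KEY',
        app_secret='YOUR_APP_SECRET')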

0.13.1
-------------------

* Added
- Command line interface covering most functions
- Interactive documentation for most pages, using ``thebe-sphinx``

* Changed
- Use ``np.nan`` wherever there are missing values in ``url_to_df`` (see the
  example at the end of this section)

* Fixed
- Don't remove double quotes from etags when downloading XML sitemaps
- Replace instances of ``pd.DataFrame.append``, which is deprecated, with
  ``pd.concat``
- Replace empty values with ``np.nan`` for the size column in ``logs_to_df``
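
A quick illustration of the ``np.nan`` behavior in ``url_to_df``, using
made-up URLs; rows that don't have a value for a given directory column
get ``np.nan``::

    import advertools as adv

    urls = [
        'https://example.com/one/two/three',
        'https://example.com/one',
    ]
    url_df = adv.url_to_df(urls)
    # The second URL has no second or third directory, so its dir_2
    # and dir_3 values are np.nan.
    print(url_df[['dir_1', 'dir_2', 'dir_3']])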

0.13.0
-------------------

* Added
- New function ``crawl_headers``: A crawler that only makes ``HEAD`` requests
  to a known list of URLs.
- New function ``reverse_dns_lookup``: A way to get host information for a
  large list of IP addresses concurrently.
- New options for crawling: ``exclude_url_params``, ``include_url_params``,
  ``exclude_url_regex``, and ``include_url_regex`` for controlling which
  links to follow while crawling (see the sketch at the end of this section).

* Fixed
- ``custom_settings`` options given to the ``crawl`` function as dictionaries
  can now be set without issues; previously this failed when the option
  values were not strings.

* Changed
- The ``skip_url_params`` option was removed and replaced with the more
  versatile ``exclude_url_params``, which accepts either ``True`` or a list
  of URL parameters to exclude while following links.
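
A minimal sketch of the new functions and crawl options, using made-up
URLs and IP addresses::

    import advertools as adv

    # HEAD-only crawl of a known list of URLs.
    adv.crawl_headers(
        ['https://example.com/a', 'https://example.com/b'],
        'headers_output.jl')

    # Concurrent reverse DNS lookup; returns a DataFrame of host
    # information, one row per IP address.
    host_df = adv.reverse_dns_lookup(['8.8.8.8', '1.1.1.1'])

    # Follow links, skipping any link that contains URL parameters
    # and only following links whose URL matches the given regex.
    adv.crawl(
        'https://example.com',
        'example_crawl.jl',
        follow_links=True,
        exclude_url_params=True,  # or a list, e.g. ['utm_source']
        include_url_regex='blog')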

0.12.3
-------------------

* Fixed
- Crawler stops when provided with bad URLs in list mode.

0.12.0
-------------------

* Added
- New function ``logs_to_df``: Convert a log file of any non-JSON format
  into a pandas DataFrame and save it to a ``parquet`` file. This also
  compresses the file to a much smaller size (see the sketch at the end
  of this section).
- Crawler extracts all available ``img`` attributes: 'alt', 'crossorigin',
  'height', 'ismap', 'loading', 'longdesc', 'referrerpolicy', 'sizes',
  'src', 'srcset', 'usemap', and 'width' (excluding global HTML attributes
  like ``style`` and ``draggable``).
- New parameter for the ``crawl`` function, ``skip_url_params``: Defaults to
  ``False``, consistent with previous behavior, with the option of not
  following/crawling links containing any URL parameters.
- New column for ``url_to_df``, "last_dir": Extract the value of the last
  directory for each of the URLs.

* Changed
- Query parameter columns in the ``url_to_df`` DataFrame are now sorted by
  how full the columns are (the percentage of values that are not ``NA``)
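
A minimal sketch of ``logs_to_df`` and the new "last_dir" column, with
illustrative file names and URLs::

    import advertools as adv
    import pandas as pd

    # Parse a raw 'combined'-format access log into a compressed
    # parquet file; unparseable lines go to the errors file.
    adv.logs_to_df(
        log_file='access.log',
        output_file='access_logs.parquet',
        errors_file='log_errors.txt',
        log_format='combined')
    logs_df = pd.read_parquet('access_logs.parquet')

    # "last_dir" holds the last directory of each URL ('three' here).
    url_df = adv.url_to_df(['https://example.com/one/two/three'])
    print(url_df['last_dir'])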

0.11.1
-------------------

* Added
- The ``nofollow`` attribute for nav, header, and footer links.

* Fixed
- Timeout error while downloading robots.txt files.
- Make extracting nav, header, and footer links consistent with all other
  links.
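
As context for the robots.txt fix above, a minimal example of downloading
and parsing a robots.txt file, assuming an illustrative URL::

    import advertools as adv

    # Download and parse a robots.txt file into a DataFrame,
    # one row per directive.
    robots_df = adv.robots_to_df('https://www.example.com/robots.txt')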
