Webchanges

Latest version: v3.26.0


3.10.3

===================
2022-07-22

Added
-----
* ``url`` jobs with ``use_browser: true`` that receive an error HTTP status code from the server will now include the
text returned by the server in the error message (e.g. "Rate exceeded.", "upstream request timeout", etc.), except if
HTTP status code 404 - Not Found is received.
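
The behavior described above can be sketched as follows. This is only an illustration, not the code used by **webchanges**: the function name ``format_http_error`` and the message layout are assumptions.

```python
from http import HTTPStatus


def format_http_error(status_code: int, body_text: str = "") -> str:
    """Build an error message, appending the server's text unless the status is 404.

    Hypothetical helper for illustration; not part of webchanges.
    """
    try:
        reason = HTTPStatus(status_code).phrase
    except ValueError:
        # Non-standard status codes have no registered phrase.
        reason = "Unknown"
    message = f"HTTP {status_code} {reason}"
    body_text = body_text.strip()
    # 404 pages are typically full HTML documents, so their text is left out.
    if body_text and status_code != HTTPStatus.NOT_FOUND:
        message += f": {body_text}"
    return message
```

For example, ``format_http_error(429, "Rate exceeded.")`` yields a message that ends with the server's own text, while a 404 yields only the status line.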

Changed
-------
* The command line argument ``--jobs`` used to specify a jobs file now accepts a `glob pattern
<https://en.wikipedia.org/wiki/Glob_(programming)>`__, e.g. wildcards, to specify multiple files. If more than one
file matches the pattern, their contents will be concatenated before a job list is built. Useful e.g. if you have
multiple jobs files that run on different schedules and you want to clean the snapshot database of URLs/commands no
longer monitored ("garbage collect") using ``--gc-cache`` (e.g. ``webchanges --jobs *.yaml --gc-cache``).
* The command line argument ``--list`` will now list the full path of the jobs file(s).
* Traceback information for Python Exceptions is suppressed by default. Use the command line argument ``--verbose``
(or ``-v``) to display it.
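
A minimal sketch of how a glob pattern could be expanded and the matching files concatenated is shown below. The function name ``read_jobs_text``, the sorted ordering, and the newline separator are assumptions made for illustration; the actual implementation may differ.

```python
import glob
from pathlib import Path


def read_jobs_text(pattern: str) -> str:
    """Concatenate the contents of every file matching a glob pattern.

    Hypothetical helper; files are read in sorted order (an assumption).
    """
    paths = sorted(Path(p) for p in glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No jobs file matches {pattern!r}")
    # Join with a newline so the last line of one file cannot run into the next.
    return "\n".join(path.read_text(encoding="utf-8") for path in paths)
```

Note that most shells expand an unquoted ``*.yaml`` themselves before the program sees it, so quoting the pattern may be needed for the program to receive it as a single argument.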

Fixed
-----
* Fixed ``Unicode strings with encoding declaration are not supported.`` error in the ``xpath`` filter using
``method: xml`` under certain conditions (macOS only). Reported by `jprokos <https://github.com/jprokos>`__ in `#42
<https://github.com/mborsetti/webchanges/issues/42>`__.

Internals
---------
* The source distribution is now available on PyPI to support certain packagers like ``fpm``.
* Improved handling and reporting of Playwright browser errors (for ``url`` jobs with ``use_browser: true``).

3.10.2

===================
2022-06-22

⚠ Breaking Changes
------------------
* Due to a fix to the ``html2text`` filter (see below), the first time you run this new version **you may get a change
report with deletions and additions of lines that look identical. This will happen one time only** and will prevent
future such change reports.

Added
-----
* You can now run the command line argument ``--test`` without specifying a JOB; this will check the config
(default: ``config.yaml``) and jobs (default: ``jobs.yaml``) files for syntax errors.
* New job directive ``compared_versions`` allows change detection to be made against multiple saved snapshots;
useful for monitoring websites that change between a set of states (e.g. they are running A/B testing).
* New command line argument ``--check-new`` to check if a new version of **webchanges** is available.
* Error messages for ``url`` jobs failing with HTTP reason codes of 400 and higher now include any text returned by the
website (e.g. "Rate exceeded.", "upstream request timeout", etc.). Not implemented in jobs with ``use_browser: true``
due to limitations in Playwright.
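
The idea behind ``compared_versions`` can be illustrated with the sketch below, which treats new data as unchanged if it matches any of the most recent saved snapshots. The function ``is_changed`` and the list-based history are assumptions for illustration only, not the snapshot database used by **webchanges**.

```python
def is_changed(new_data: str, history: list, compared_versions: int = 1) -> bool:
    """Return True if new_data matches none of the most recent snapshots.

    ``history`` is assumed to be ordered from oldest to newest.
    """
    if compared_versions < 1:
        raise ValueError("compared_versions must be at least 1")
    recent = history[-compared_versions:]
    return new_data not in recent
```

With a site alternating between states ``A`` and ``B``, comparing against only the latest snapshot reports a change on every flip, whereas comparing against the last two does not.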

Changed
-------
* On Linux and macOS systems, for security reasons we now check that the hooks file **and** the directory it is located
in are **owned** and **writeable** by **only** the user who is running the job (and not by its group or by other
users), identical to what we do with the jobs file if any job uses the ``shellpipe`` filter. If the permissions are
not correct, an explanatory ImportWarning message is issued and the import of the hooks module is skipped.
* The command line argument ``-v`` or ``--verbose`` now shows reduced verbosity logging output while ``-vv`` (or
``--verbose --verbose``) shows full verbosity.
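
The ownership and permission check on the hooks file described above could look roughly like this on POSIX systems. The function names are hypothetical and the sketch is not the actual check performed by **webchanges**.

```python
import os
import stat
from pathlib import Path


def is_private_to_user(path: Path) -> bool:
    """True if path is owned by the current user and not writable by group or others.

    POSIX only: os.getuid() is not available on Windows.
    """
    info = path.stat()
    owned = info.st_uid == os.getuid()
    others_can_write = bool(info.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
    return owned and not others_can_write


def hooks_file_is_safe(hooks_file: Path) -> bool:
    """Apply the same test to the hooks file and to the directory containing it."""
    return is_private_to_user(hooks_file) and is_private_to_user(hooks_file.parent)
```

If the check fails, running ``chmod go-w`` on both the file and its directory is typically enough to satisfy it.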

Fixed
-----
* The ``html2text`` filter no longer retains spaces found in the HTML after *the end of the text* on a line. Such
spaces are not displayed in HTML, so retaining them was a bug in the conversion library used. This was causing a
change report to be issued whenever the number of such invisible spaces changed.
* The ``cookies`` directive was not adding cookies correctly to the header for jobs with ``use_browser: true``.
* The ``wait_for_timeout`` job directive was not accepting integers (only floats). Reported by `Markus Weimar
<https://github.com/Markus00000>`__ in `#39 <https://github.com/mborsetti/webchanges/issues/39>`__.
* Improved the usefulness of the message of FileNotFoundError exceptions in filters ``execute`` and ``shellpipe``
and in reporter ``run_command``.
* Fixed an issue in the legacy parser used by the ``xpath`` filter which under specific conditions caused more html
than expected to be returned.
* Fixed how we determine if a new version has been released (due to an API change by PyPI).
* When adding custom JobBase classes through the hooks file, their configuration file entries are no longer causing
warnings to be issued as unrecognized directives.
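
To illustrate the ``html2text`` fix above, the sketch below removes trailing whitespace from each line so that two conversions differing only in invisible spaces compare equal. The function ``strip_trailing_spaces`` is an assumption for illustration and not the filter's actual code.

```python
def strip_trailing_spaces(text: str) -> str:
    """Remove whitespace after the end of the text on every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


# Two renderings that differ only in trailing (invisible) whitespace.
before = "Price: 10 EUR  \nIn stock\n"
after = "Price: 10 EUR\nIn stock   \n"
same = strip_trailing_spaces(before) == strip_trailing_spaces(after)
```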

Internals
---------
* Changed bootstrapping logic so that when using ``-vv`` the logs will include messages relating to the registration of
the various classes.
* Improved execution speed of certain informational command line arguments.
* Updated the vendored version of ``packaging.version.parse()`` to 21.3, released on 2021-11-27.
* Changed the import logic for the ``packaging.version.parse()`` function so that if ``packaging`` is found to be
installed, it will be imported from there instead of from the vendored module.
* ``urllib3`` is now an explicit dependency due to the refactoring of the ``requests`` package (we previously used
``requests.packages.urllib3``). Has no effect since ``urllib3`` is already being installed as a dependency of
``requests``.
* Added ``py.typed`` marker file to implement `PEP 561 <https://peps.python.org/pep-0561/>`__.
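
The import logic for ``packaging.version.parse()`` described above follows a common try/except pattern, sketched here. The fallback below is a deliberately simplistic stand-in (an assumption) rather than the vendored module, and it only handles purely numeric versions.

```python
try:
    # Prefer the installed package when it is available.
    from packaging.version import parse as parse_version
except ImportError:

    def parse_version(version: str):
        """Simplistic stand-in for the vendored parser (numeric versions only)."""
        return tuple(int(part) for part in version.split("."))


newer = parse_version("3.10.2") > parse_version("3.9.2")
```

Comparing parsed versions rather than raw strings matters here: as plain strings, ``"3.10.2"`` sorts before ``"3.9.2"``.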

3.10.1

Not secure
===================
2022-05-03

Fixed
-----
* ``KeyError: 'indent'`` error when using ``beautify`` filter. Reported by `César de Tassis Filho
<https://github.com/CTassisF>`__ in `#37 <https://github.com/mborsetti/webchanges/issues/37>`__.

3.10

Not secure
===================
2022-05-02

⚠ Breaking changes
------------------

Pyppeteer has been replaced with Playwright
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This change only affects jobs with ``use_browser: true`` (i.e. those running on a browser to run JavaScript). If none
of your jobs have ``use_browser: true``, there's nothing new here (and nothing to do).

Must do
~~~~~~~
If *any* of your jobs have ``use_browser: true``, you **MUST**:

1) Install the new dependencies:

.. code-block:: bash

   pip install --upgrade webchanges[use_browser]

2) (Optional) ensure you have an up-to-date Google Chrome browser:

.. code-block:: bash

   webchanges --install-chrome

Additionally, if any of your ``use_browser: true`` jobs use the ``wait_for`` directive, it needs to be replaced with
one of:

* ``wait_for_function`` if you were specifying a JavaScript function (see
`here <https://playwright.dev/python/docs/api/class-frame/#frame-wait-for-function>`__ for full function details).
* ``wait_for_selector`` if you were specifying a selector string or xpath string (see `here
<https://playwright.dev/python/docs/api/class-frame/#frame-wait-for-selector>`__ for full function details), or
* ``wait_for_timeout`` if you were specifying a timeout; however, this function should only be used for debugging
because it "is going to be flaky", so use one of the other two ``wait_for`` directives if you can (full details `here
<https://playwright.dev/python/docs/api/class-frame#frame-wait-for-timeout>`__).

Optionally, the values of ``wait_for_function`` and ``wait_for_selector`` can now be dicts to take full advantage of all
the features offered by those functions in Playwright (see documentation links above).

If you are using the ``wait_for_navigation`` directive, it is now called ``wait_for_url`` and offers both glob pattern
and regex matching; ``wait_for_navigation`` will act as an alias for now, but a deprecation warning will be issued.

If you are using the ``chromium_revision`` or ``_beta_use_playwright`` directives in your configuration file, you
should delete them to prevent future errors (for now only a deprecation warning is issued).

Finally, if you are using the experimental ``block_elements`` sub-directive, it is not (yet?) implemented in Playwright
and is simply ignored.
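
The directive renames described in this section can be sketched as a small helper operating on a job expressed as a dict. Everything here is hypothetical: ``migrate_job`` is not part of **webchanges**, and the rule used to tell a JavaScript function from a selector is a rough heuristic, so string values should be reviewed by hand.

```python
from numbers import Real


def _looks_like_js_function(text: str) -> bool:
    """Rough heuristic (an assumption): arrow functions or the 'function' keyword."""
    stripped = text.strip()
    return "=>" in stripped or stripped.startswith("function")


def migrate_job(job: dict) -> dict:
    """Return a copy of a job dict with the old directive names replaced."""
    migrated = dict(job)
    if "wait_for_navigation" in migrated:
        migrated["wait_for_url"] = migrated.pop("wait_for_navigation")
    if "wait_for" in migrated:
        value = migrated.pop("wait_for")
        if isinstance(value, Real) and not isinstance(value, bool):
            # A number was a timeout.
            migrated["wait_for_timeout"] = value
        elif isinstance(value, str) and _looks_like_js_function(value):
            migrated["wait_for_function"] = value
        else:
            # Selector strings and xpath strings both map to wait_for_selector.
            migrated["wait_for_selector"] = value
    return migrated
```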

Improvements
~~~~~~~~~~~~
``wait_until`` has additional functionality, and now takes one of:

* ``load`` (default): Consider operation to be finished when the ``load`` event is fired.
* ``domcontentloaded``: Consider operation to be finished when the ``DOMContentLoaded`` event is fired.
* ``networkidle`` (old ``networkidle0`` and ``networkidle2`` map into this): Consider operation to be finished when
there are no network connections for at least 500 ms.
* ``commit`` (new): Consider operation to be finished when network response is received and the document started
loading.
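
The mapping of old ``wait_until`` values onto the new ones can be summarized in a few lines. The function ``normalize_wait_until`` is a hypothetical illustration of the rules listed above, not the code used by **webchanges**.

```python
from typing import Optional

# Old Pyppeteer-era values and the value they map into.
_ALIASES = {"networkidle0": "networkidle", "networkidle2": "networkidle"}
_VALID = {"load", "domcontentloaded", "networkidle", "commit"}


def normalize_wait_until(value: Optional[str] = None) -> str:
    """Map a wait_until value onto one of the four accepted values."""
    if value is None:
        return "load"  # the default
    value = _ALIASES.get(value, value)
    if value not in _VALID:
        raise ValueError(f"Unsupported wait_until value: {value!r}")
    return value
```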

New directives
~~~~~~~~~~~~~~
The following directives are new to the Playwright implementation:

* ``referer``: Referer header value (a string). If provided, it will take preference over the referer header value set
by the ``headers`` sub-directive.
* ``initialization_url``: A url to navigate to before the ``url`` (e.g. a home page where some state gets set).
* ``initialization_js``: Only used in conjunction with ``initialization_url``, a JavaScript to execute after
loading ``initialization_url`` and before navigating to the ``url`` (e.g. to emulate a log in). Advanced usage.
* ``ignore_default_args`` directive for ``url`` jobs with ``use_browser: true`` (using Chrome) to control how Playwright
launches Chrome.
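
The precedence of ``referer`` over a referer value set in ``headers`` can be sketched as below. The helper ``effective_headers`` and the dict-based job are assumptions for illustration only.

```python
def effective_headers(job: dict) -> dict:
    """Merge a job's headers, letting the referer directive win over headers."""
    headers = dict(job.get("headers") or {})
    referer = job.get("referer")
    if referer:
        # Header names are case-insensitive, so drop any spelling of "referer" first.
        headers = {k: v for k, v in headers.items() if k.lower() != "referer"}
        headers["Referer"] = referer
    return headers
```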

In addition, the new ``--no-headless`` command line argument will run the Chrome browser in "headed" mode, i.e.
displaying the website as it loads it, to facilitate debugging and testing (e.g. ``webchanges --test 1
--no-headless --test-reporter email``).

See more details of the new directives in the updated documentation.


Freeing space by removing Pyppeteer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can free up disk space if no other packages use Pyppeteer by, in order:

1) Removing the downloaded Chromium images by deleting the entire *directory* (and its subdirectories) shown by running:

.. code-block:: bash

   python -c "import pathlib; from pyppeteer.chromium_downloader import DOWNLOADS_FOLDER; print(pathlib.Path(DOWNLOADS_FOLDER).parent)"

2) Uninstalling the Pyppeteer package by running:

.. code-block:: bash

   pip uninstall pyppeteer


Rationale
~~~~~~~~~
The implementation of ``use_browser: true`` jobs (i.e. those running on a browser to run JavaScript) using Pyppeteer
and the Chromium browser it uses has been very problematic, as the library:

* is in alpha,
* is very slow,
* defaults to years-old obsolete versions of Chromium,
* can be insecure (e.g. found that TLS certificates were disabled for downloading browsers!),
* creates conflicts with imports (e.g. requires obsolete version of websockets),
* is poorly documented,
* is poorly maintained,
* may require OS-specific dependencies that need to be separately installed,
* does not work with Arm-based processors,
* is prone to crashing,
* and outright freezes with the current version of Python (3.10)!

Pyppeteer's `open issues <https://github.com/pyppeteer/pyppeteer/issues>`__ now exceed 130 and are growing almost daily.

`Playwright <https://playwright.dev/python/>`__ has none of the issues above, its core dev team is apparently the same
one that wrote Puppeteer (of which Pyppeteer is a port to Python), and it is supported by the deep pockets of
Microsoft. The Python version is officially supported and up-to-date, and (in our configuration) uses the latest
stable version of Google Chrome out of the box without the contortions of manually having to pick and set revisions.

Playwright has been in beta testing within **webchanges** for months and has been performing very well (significantly
better than Pyppeteer).


Documentation
-------------
* Major updates on anything that has to do with ``use_browser``.
* Fixed two examples of the ``email`` reporter. Reported by `jprokos <https://github.com/jprokos>`__ in
`#34 <https://github.com/mborsetti/webchanges/issues/34>`__.


Advanced
--------
* If you subclassed JobBase in your ``hooks.py`` file, and are defining a ``retrieve`` method, please note that the
number of arguments has been increased to 3 as follows:

.. code-block:: python

   def retrieve(self, job_state: JobState, headless: bool = True) -> tuple[str | bytes, str]:
       """Runs job to retrieve the data, and returns data and ETag.

       :param job_state: The JobState object, to keep track of the state of the retrieval.
       :param headless: For browser-based jobs, whether headless mode should be used.
       :returns: The data retrieved and the ETag.
       """

3.9.2

Not secure
===================
2022-04-13

⚠ Last release using Pyppeteer
------------------------------
* This is the last release using Pyppeteer for jobs with ``use_browser: true``, which will be replaced by Playwright
in release 3.10, forthcoming hopefully in a few weeks. See above for more information on how to prepare -- and start
using Playwright now!

Added
-----
* New ``ignore_dh_key_too_small`` directive for ``url`` jobs to overcome the ``ssl.SSLError: [SSL: DH_KEY_TOO_SMALL] dh
key too small (_ssl.c:1129)`` error.
* New ``indent`` sub-directive for the ``beautify`` filter (requires BeautifulSoup version 4.11.0 or later).
* New ``--dump-history JOB`` command line argument to print all saved snapshot history for a job.
* Playwright only: new ``--no-headless`` command line argument to help with debugging and testing (e.g. run
``webchanges --test 1 --no-headless``). Not available for Pyppeteer.
* Extracted Discord reporting from ``webhooks`` into its own ``discord`` reporter to fix it not working and to
add embedding functionality as well as color. Contributed by `Michał Ciołek <https://github.com/michalciolek>`__
`upstream <https://github.com/thp/urlwatch/issues/683>`__. Reported by `jprokos <https://github.com/jprokos>`__ in
`#33 <https://github.com/mborsetti/webchanges/issues/33>`__.

Fixed
-----
* We are no longer rewriting to disk the entire database at every run. Now it's only rewritten if there are changes
(and minimally) and, obviously, when running with the ``--gc-cache`` or ``--clean-cache`` command line argument.
Reported by `JsBergbau <https://github.com/JsBergbau>`__ `upstream <https://github.com/thp/urlwatch/issues/690>`__.
Also updated documentation suggesting to run ``--clean-cache`` or ``--gc-cache`` periodically.
* A ValueError is no longer raised if an unknown directive is found in the configuration file, but a Warning is
issued instead. Reported by `c0deing <https://github.com/c0deing>`__ in `#26
<https://github.com/mborsetti/webchanges/issues/26>`__.
* The ``kind`` job directive (used for custom job classes in ``hooks.py``) was undocumented and not fully functioning.
* For jobs with ``use_browser: true`` and a ``switch`` directive containing ``--window-size``, turn off Playwright's
default fixed viewport (of 1280x720) as it overrides ``--window-size``.
* Email headers ("From:", "To:", etc.) now have title case per RFC 2076. Reported by `fdelapena
<https://github.com/fdelapena>`__ in `#29 <https://github.com/mborsetti/webchanges/issues/29>`__.
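
The switch from raising a ValueError to issuing a Warning for unknown configuration directives can be illustrated with the standard ``warnings`` module. The function ``check_directives``, the warning category, and the message wording are assumptions for illustration, not the actual checker.

```python
import warnings


def check_directives(config: dict, known: set) -> list:
    """Warn about, rather than reject, directives that are not recognized.

    Returns the sorted list of unknown directive names.
    """
    unknown = sorted(set(config) - known)
    for name in unknown:
        warnings.warn(f"Unrecognized directive {name!r} in configuration file", RuntimeWarning)
    return unknown
```

Because a warning does not interrupt execution, a configuration file containing a stale or misspelled directive no longer prevents the remaining jobs from running.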

Documentation
-------------
* Added warnings for Windows users to run Python in UTF-8 mode. Reported by `Knut Wannheden
<https://github.com/knutwannheden>`__ in `#25 <https://github.com/mborsetti/webchanges/issues/25>`__.
* Added suggestion to run ``--clean-cache`` or ``--gc-cache`` periodically to compact the database file.
* Continued improvements.
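
Regarding UTF-8 mode on Windows: Python enables it when started with ``-X utf8`` or when the ``PYTHONUTF8=1`` environment variable is set. A small check such as the one below (the function name is an assumption) can confirm whether the current interpreter is running in that mode.

```python
import sys


def utf8_mode_enabled() -> bool:
    """Report whether the interpreter was started in UTF-8 mode."""
    return bool(sys.flags.utf8_mode)


if __name__ == "__main__":
    print("UTF-8 mode enabled:", utf8_mode_enabled())
```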

Internals
---------
* Updated licensing file to `GitHub naming standards
<https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-license-to-a-repository>`__
and updated its contents to more clearly state that this software redistributes source code of release 2.21 dated 30
July 2020 of urlwatch (https://github.com/thp/urlwatch/tree/346b25914b0418342ffe2fb0529bed702fddc01f) retaining its
license, which is distributed as part of the source code.
* Pyppeteer has been removed from the test suite.
* Deprecated ``webchanges.jobs.ShellError`` exception in favor of Python's native ``subprocess.SubprocessError`` one and
its subclasses.
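
For code in ``hooks.py`` that previously caught ``webchanges.jobs.ShellError``, catching Python's native exceptions might look like the sketch below. The function ``run_and_report`` and its return conventions are assumptions for illustration only.

```python
import subprocess
import sys


def run_and_report(args: list) -> int:
    """Run a command and return its exit code, or -1 for other subprocess failures."""
    try:
        subprocess.run(args, check=True, capture_output=True, timeout=30)
    except subprocess.CalledProcessError as exc:
        # Subclass of SubprocessError raised on a non-zero exit status.
        return exc.returncode
    except subprocess.SubprocessError:
        # Base class; also covers subprocess.TimeoutExpired.
        return -1
    return 0


exit_code = run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"])
```

Catching the more specific ``CalledProcessError`` first keeps the exit code available, while the base class acts as a catch-all for the remaining subclasses.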

3.9.1

Not secure
===================
2022-01-27

Fixed
-----
* Config file directives checker would incorrectly reject reports added through ``hooks.py``. Reported by `Knut Wannheden
<https://github.com/knutwannheden>`__ in `#24 <https://github.com/mborsetti/webchanges/issues/24>`__.
