Zyte-spider-templates
=====================

Latest version: v0.11.2


0.11.2
-------------------

* Do not log warnings about disabled components.

0.11.1
-------------------

* The :ref:`e-commerce <e-commerce>` and :ref:`job posting <job-posting>`
  spider templates no longer ignore item requests for a different domain.
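
The fix can be pictured, as a rough sketch only, as an offsite check that always lets item requests through while keeping other requests on the start domain. The ``is_allowed`` function and its ``is_item_request`` flag are hypothetical and do not reflect the templates' actual implementation.

.. code-block:: python

    from urllib.parse import urlparse


    def is_allowed(url: str, start_domain: str, is_item_request: bool) -> bool:
        """Hypothetical offsite check where item requests may leave the start domain."""
        if is_item_request:
            # Item requests (e.g. product or job posting pages) are kept even
            # when they point to a different domain.
            return True
        host = (urlparse(url).hostname or "").lower()
        start_domain = start_domain.lower()
        # Other requests must stay on the start domain or its subdomains.
        return host == start_domain or host.endswith("." + start_domain)


    print(is_allowed("https://cdn.other.com/p/1", "example.com", True))  # True
    print(is_allowed("https://other.com/list", "example.com", False))  # False
    print(is_allowed("https://shop.example.com/c", "example.com", False))  # True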

0.11.0
-------------------

* New :ref:`Articles spider template <article>`, built on top of
  Zyte API’s :http:`request:article` and :http:`request:articleNavigation`.

* New :ref:`Job Posting spider template <job-posting>`, built on top of
  Zyte API’s :http:`request:jobPosting` and :http:`request:jobPostingNavigation`.

* :ref:`Search queries <search-queries>` support is added to the
  :ref:`e-commerce spider template <e-commerce>`.
  This makes it possible to pass a list of search queries to the
  spider; the spider finds a search form on the target webpage and submits all the queries.

* ProductList extraction support is added to the
  :ref:`e-commerce spider template <e-commerce>`. This allows spiders to
  extract basic product information without going into product detail pages.

* New features are added to the :ref:`Google Search spider template <google-search>`:

  * An option to follow the result links and extract data
    from the target pages (via the ``extract`` argument)
  * Content Languages (``lr``) parameter
  * Content Countries (``cr``) parameter
  * User Country (``gl``) parameter
  * User Language (``hl``) parameter
  * ``results_per_page`` parameter

* Added a Scrapy add-on, which greatly simplifies the initial
  zyte-spider-templates configuration.

* Bug fix: incorrectly extracted URLs no longer make spiders drop
  other requests.

* Cleaned up the CI; improved the testing suite; cleaned up the documentation.
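
The new Google Search parameters correspond to query parameters of a Google Search results URL. The sketch below builds such a URL under stated assumptions: ``build_search_url`` is a hypothetical helper, ``results_per_page`` is assumed to map to Google's ``num`` parameter and pagination to the ``start`` result offset, and 10 results per page is assumed when ``results_per_page`` is unset. The template itself may build its requests differently.

.. code-block:: python

    from typing import Dict, Optional
    from urllib.parse import urlencode


    def build_search_url(
        query: str,
        page: int = 1,
        results_per_page: Optional[int] = None,
        lr: Optional[str] = None,  # content languages, e.g. "lang_en"
        cr: Optional[str] = None,  # content countries, e.g. "countryUS"
        gl: Optional[str] = None,  # user country, e.g. "us"
        hl: Optional[str] = None,  # user language, e.g. "en"
    ) -> str:
        """Hypothetical helper: build a Google Search URL for one results page."""
        if page < 1:
            raise ValueError("page must be 1 or higher")
        params: Dict[str, str] = {"q": query}
        optional = {"lr": lr, "cr": cr, "gl": gl, "hl": hl}
        # Only include the optional parameters that were actually set.
        params.update({k: v for k, v in optional.items() if v is not None})
        per_page = results_per_page or 10  # assumed default page size
        if results_per_page is not None:
            params["num"] = str(results_per_page)
        if page > 1:
            # Zero-based offset of the first result on the requested page.
            params["start"] = str((page - 1) * per_page)
        return "https://www.google.com/search?" + urlencode(params)


    print(build_search_url("running shoes", page=2, results_per_page=20, gl="us", hl="en"))
    # https://www.google.com/search?q=running+shoes&gl=us&hl=en&num=20&start=20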
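
With the add-on, the initial configuration can shrink to a few lines in the Scrapy project's ``settings.py``. The fragment below assumes Scrapy 2.10 or later, which introduced the ``ADDONS`` setting, and assumes the add-ons are importable as ``scrapy_zyte_api.Addon`` and ``zyte_spider_templates.Addon`` with the priorities shown; confirm the exact paths and priorities against the scrapy-zyte-api and zyte-spider-templates setup documentation.

.. code-block:: python

    # settings.py (sketch): enable the Scrapy add-ons; requires Scrapy >= 2.10.
    # The add-on paths and priority values are assumptions to verify.
    ADDONS = {
        "scrapy_zyte_api.Addon": 500,
        "zyte_spider_templates.Addon": 1000,
    }

    # Zyte API key used by scrapy-zyte-api (placeholder value).
    ZYTE_API_KEY = "YOUR_API_KEY"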

0.10.0
-------------------

* Dropped Python 3.8 support, added Python 3.13 support.

* Increased the minimum required versions of some dependencies:

  * ``pydantic``: ``2`` → ``2.1``

  * ``scrapy-poet``: ``0.21.0`` → ``0.24.0``

  * ``scrapy-spider-metadata``: ``0.1.2`` → ``0.2.0``

  * ``scrapy-zyte-api[provider]``: ``0.16.0`` → ``0.23.0``

  * ``zyte-common-items``: ``0.22.0`` → ``0.23.0``

* Added :ref:`custom attributes <custom-attributes>` support to the
  :ref:`e-commerce spider template <e-commerce>` through its new
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams.custom_attrs_input`
  and
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams.custom_attrs_method`
  parameters.

* The
  :class:`~zyte_spider_templates.spiders.serp.GoogleSearchSpiderParams.max_pages`
  parameter of the :ref:`Google Search spider template <google-search>` can no
  longer be 0 or lower.

* The :ref:`Google Search spider template <google-search>` now follows
  pagination for the results of each query page by page, instead of sending a
  request for every page in parallel. It stops once it reaches a page without
  organic results.

* Improved the description of
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceCrawlStrategy`
  values.

* Fixed type hint issues related to Scrapy.
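
To check an existing environment against the new minimum versions, a small script can compare version tuples. This is a sketch: ``parse_version`` only understands plain dotted versions such as ``0.24.0`` (the ``packaging`` library's ``Version`` class is the robust choice for real-world version strings), and installed versions are passed in as a dict, which could be filled from ``importlib.metadata.version``.

.. code-block:: python

    from typing import Dict, List, Tuple

    # Minimum versions from the 0.10.0 notes. The ``provider`` extra of
    # scrapy-zyte-api does not change the distribution name.
    MINIMUM_VERSIONS = {
        "pydantic": "2.1",
        "scrapy-poet": "0.24.0",
        "scrapy-spider-metadata": "0.2.0",
        "scrapy-zyte-api": "0.23.0",
        "zyte-common-items": "0.23.0",
    }


    def parse_version(version: str) -> Tuple[int, ...]:
        """Turn a plain dotted version such as "0.24.0" into (0, 24, 0)."""
        return tuple(int(part) for part in version.split("."))


    def outdated(installed: Dict[str, str]) -> List[str]:
        """Return the packages that are missing or older than the minimum."""
        problems = []
        for name, minimum in MINIMUM_VERSIONS.items():
            version = installed.get(name)
            if version is None or parse_version(version) < parse_version(minimum):
                problems.append(name)
        return problems


    installed = {
        "pydantic": "2.0",
        "scrapy-poet": "0.24.0",
        "scrapy-spider-metadata": "0.2.0",
        "scrapy-zyte-api": "0.23.1",
        "zyte-common-items": "0.22.0",
    }
    print(outdated(installed))  # ['pydantic', 'zyte-common-items']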
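
The new Google Search pagination behavior can be sketched as a sequential loop: request page 1, request page 2 only after page 1 returned organic results, and so on up to ``max_pages``. In this sketch, ``fetch_organic_results`` is a hypothetical stand-in for a Zyte API SERP request; the actual template schedules Scrapy requests from callbacks rather than running a loop.

.. code-block:: python

    from typing import Callable, List


    def crawl_query(
        query: str,
        max_pages: int,
        fetch_organic_results: Callable[[str, int], List[str]],
    ) -> List[str]:
        """Collect organic results page by page (hypothetical sketch)."""
        if max_pages < 1:
            # Mirrors the rule that max_pages can no longer be 0 or lower.
            raise ValueError("max_pages must be 1 or higher")
        results: List[str] = []
        for page in range(1, max_pages + 1):
            organic = fetch_organic_results(query, page)
            if not organic:
                # Stop at the first page without organic results.
                break
            results.extend(organic)
        return results


    def fake_fetch(query: str, page: int) -> List[str]:
        """Pretend the query has exactly two pages of organic results."""
        if page > 2:
            return []
        return [f"{query} result {page}.{i}" for i in range(2)]


    print(len(crawl_query("laptops", max_pages=5, fetch_organic_results=fake_fetch)))  # 4

Fetching pages one at a time costs some latency compared to parallel requests, but no request is sent for pages past the last one with results.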

0.9.0
------------------

* Now requires ``zyte-common-items >= 0.22.0``.

* New :ref:`Google Search spider template <google-search>`, built on top of
  Zyte API’s :http:`request:serp`.

* The heuristics that the :ref:`e-commerce spider template <e-commerce>` uses to
  ignore certain URLs when following category links now also handle
  subdomains. For example, https://example.com/blog was already ignored;
  now https://blog.example.com is ignored as well.

* In the :ref:`spider parameters JSON schema <params-schema>`, the
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams.crawl_strategy`
  parameter of the :ref:`e-commerce spider template <e-commerce>` moves
  from the last position to between
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams.urls_file`
  and
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams.geolocation`.

* Removed the ``valid_page_types`` attribute of
  :class:`zyte_spider_templates.middlewares.CrawlingLogsMiddleware`.
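
The subdomain handling can be illustrated with a small predicate that checks both the first path segment and the leftmost subdomain label against a set of ignored words. The word list and the ``is_ignored`` function are hypothetical; the template's real heuristics are not spelled out in these notes.

.. code-block:: python

    from urllib.parse import urlparse

    # Hypothetical words marking sections that are not worth crawling.
    IGNORED_WORDS = {"blog", "help", "careers"}


    def is_ignored(url: str) -> bool:
        """Check the first path segment and the leftmost subdomain label."""
        parsed = urlparse(url)
        labels = (parsed.hostname or "").split(".")
        # Naive: treat the leftmost label as a subdomain when the host has more
        # than two labels (this misreads hosts such as example.co.uk).
        subdomain = labels[0] if len(labels) > 2 else ""
        first_segment = parsed.path.strip("/").split("/")[0].lower()
        return subdomain in IGNORED_WORDS or first_segment in IGNORED_WORDS


    print(is_ignored("https://example.com/blog"))  # True (path segment)
    print(is_ignored("https://blog.example.com"))  # True (subdomain)
    print(is_ignored("https://www.example.com/shoes"))  # False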

0.8.0
------------------

* Added new input parameters:

  * ``urls`` accepts a newline-delimited list of URLs.

  * ``urls_file`` accepts a URL that points to a plain-text file with a
    newline-delimited list of URLs.

  Only one of ``url``, ``urls`` and ``urls_file`` should be used at a time.

* Added new crawling strategies:

  * ``automatic`` - uses heuristics to determine whether an input URL is a
    homepage; if it is, it uses a modified ``full`` strategy where other links
    are discovered only on the homepage. Otherwise, it assumes the URL is a
    navigation page and uses the existing ``navigation`` strategy.

  * ``direct_item`` - input URLs are directly extracted as products.

* Added new parameter classes: ``LocationParam`` and ``PostalAddress``. Note
  that these are available for use when customizing the templates and are not
  currently used by any template.

* Backward incompatible changes:

  * ``automatic`` becomes the new default crawling strategy instead of ``full``.

* CI test improvements.
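
The three URL inputs are alternatives to one another. A minimal sketch of resolving them into one list of start URLs follows, assuming the ``urls_file`` content has already been downloaded as text; ``resolve_start_urls`` is a hypothetical helper, it rejects anything other than exactly one input, and it skips blank lines and surrounding whitespace as an assumption about the input format.

.. code-block:: python

    from typing import List, Optional


    def resolve_start_urls(
        url: Optional[str] = None,
        urls: Optional[str] = None,
        urls_file_text: Optional[str] = None,
    ) -> List[str]:
        """Hypothetical helper: turn exactly one URL input into a list of URLs."""
        provided = [value for value in (url, urls, urls_file_text) if value]
        if len(provided) != 1:
            raise ValueError("Use exactly one of url, urls and urls_file")
        if url:
            return [url.strip()]
        # Both ``urls`` and the downloaded ``urls_file`` are newline-delimited.
        text = urls or urls_file_text or ""
        return [line.strip() for line in text.splitlines() if line.strip()]


    print(resolve_start_urls(urls="https://a.example\n\n https://b.example \n"))
    # ['https://a.example', 'https://b.example']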
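
The decision made by the ``automatic`` strategy can be approximated, for illustration only, by treating a URL whose path is empty, ``/`` or ``/index.html`` as a homepage and anything else as a navigation page. The real heuristics are not described in these notes; ``choose_strategy`` and ``HOMEPAGE_PATHS`` are hypothetical stand-ins.

.. code-block:: python

    from urllib.parse import urlparse

    # Hypothetical set of paths treated as a homepage.
    HOMEPAGE_PATHS = {"", "/", "/index.html"}


    def choose_strategy(url: str) -> str:
        """Simplified, hypothetical take on the ``automatic`` decision."""
        if urlparse(url).path in HOMEPAGE_PATHS:
            # Homepage: a modified "full" crawl in which other links are
            # discovered only on the homepage.
            return "full"
        # Anything else is assumed to be a navigation page.
        return "navigation"


    print(choose_strategy("https://example.com"))  # full
    print(choose_strategy("https://example.com/category/shoes"))  # navigation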
