Zyte-spider-templates

Latest version: v0.9.0

Page 1 of 2

0.9.0
------------------

* Now requires ``zyte-common-items >= 0.22.0``.

* New :ref:`Google Search spider template <google-search>`, built on top of
  Zyte API’s :http:`request:serp`.

* The heuristics that the :ref:`e-commerce spider template <e-commerce>` uses to
  ignore certain URLs when following category links now also handle
  subdomains. For example, https://example.com/blog was already ignored; now
  https://blog.example.com is ignored as well.

* In the :ref:`spider parameters JSON schema <params-schema>`, the
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams.crawl_strategy`
  parameter of the :ref:`e-commerce spider template <e-commerce>` switches
  position, from being the last parameter to being between
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams.urls_file`
  and
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams.geolocation`.

* Removed the ``valid_page_types`` attribute of
  :class:`zyte_spider_templates.middlewares.CrawlingLogsMiddleware`.
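
The subdomain handling in the e-commerce template entry above can be pictured with a minimal sketch. This is not the library's implementation: the function name ``looks_ignorable`` and the ``IGNORED_WORDS`` set are hypothetical, and the "last two host labels are the registered domain" rule is a simplifying assumption that does not cover suffixes such as ``co.uk``.

```python
from urllib.parse import urlparse

# Hypothetical words that mark non-product sections of a site.
IGNORED_WORDS = {"blog", "news", "forum"}


def looks_ignorable(url: str) -> bool:
    """Return True if a path segment or a subdomain label is an ignored word."""
    parsed = urlparse(url)
    # Path segments, e.g. "/blog/post-1" -> ["blog", "post-1"].
    path_segments = [segment.lower() for segment in parsed.path.split("/") if segment]
    # Host labels, e.g. "blog.example.com" -> ["blog", "example", "com"].
    host_labels = (parsed.hostname or "").lower().split(".")
    # Assumption: everything before the last two labels is a subdomain.
    subdomain_labels = host_labels[:-2]
    return any(word in IGNORED_WORDS for word in path_segments + subdomain_labels)
```

Under these assumptions, both ``https://example.com/blog`` and ``https://blog.example.com`` are flagged, while a URL such as ``https://example.com/products`` is not.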

0.8.0
------------------

* Added new input parameters:

  * ``urls`` accepts a newline-delimited list of URLs.

  * ``urls_file`` accepts a URL that points to a plain-text file with a
    newline-delimited list of URLs.

  Only one of ``url``, ``urls`` and ``urls_file`` should be used at a time.

* Added new crawling strategies:

  * ``automatic`` - uses heuristics to determine whether an input URL is a
    homepage. If it is, a modified ``full`` strategy is used in which other
    links are discovered only on the homepage. Otherwise, the URL is assumed
    to be a navigation page and the existing ``navigation`` strategy is used.

  * ``direct_item`` - input URLs are directly extracted as products.

* Added new parameter classes: ``LocationParam`` and ``PostalAddress``. Note
  that these are available for use when customizing the templates and are not
  currently used by any template.

* Backward incompatible changes:

  * ``automatic`` replaces ``full`` as the default crawling strategy.

* CI test improvements.
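
The rule that only one of ``url``, ``urls`` and ``urls_file`` should be used at a time, together with the newline-delimited format of ``urls``, can be sketched as follows. The helper names ``pick_input_source`` and ``split_urls`` are hypothetical and exist only for illustration; they are not part of zyte-spider-templates, whose own validation may behave differently.

```python
from typing import List, Optional


def pick_input_source(
    url: Optional[str] = None,
    urls: Optional[str] = None,
    urls_file: Optional[str] = None,
) -> str:
    """Return the name of the single input parameter that was provided."""
    candidates = {"url": url, "urls": urls, "urls_file": urls_file}
    provided = [name for name, value in candidates.items() if value]
    if len(provided) != 1:
        raise ValueError(
            "Expected exactly one of url, urls and urls_file, got: "
            + (", ".join(provided) or "none")
        )
    return provided[0]


def split_urls(urls: str) -> List[str]:
    """Split a newline-delimited string into URLs, skipping blank lines."""
    return [line.strip() for line in urls.splitlines() if line.strip()]
```

In this sketch, passing both ``url`` and ``urls`` raises ``ValueError``, and blank lines in the ``urls`` value are skipped rather than treated as empty URLs.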

0.7.2
------------------

* Implemented :ref:`mixin classes for spider parameters <parameter-mixins>`, to
  improve reuse.

* Improved the docs, adding an example of overriding existing parameters when
  :ref:`customizing parameters <custom-params>`, and featuring
  :class:`~web_poet.AnyResponse` in the :ref:`example of overriding parsing
  <override-parsing>`.

0.7.1
------------------

* The
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams.crawl_strategy`
  parameter of
  :class:`~zyte_spider_templates.spiders.ecommerce.EcommerceSpider`
  now defaults to
  :attr:`~zyte_spider_templates.spiders.ecommerce.EcommerceCrawlStrategy.full`
  instead of
  :attr:`~zyte_spider_templates.spiders.ecommerce.EcommerceCrawlStrategy.navigation`.
  We also reworded some descriptions of :enum:`~.EcommerceCrawlStrategy` values
  for clarity.

0.7.0
------------------

* Updated requirement versions:

  * :doc:`scrapy-poet <scrapy-poet:index>` >= 0.21.0
  * :doc:`scrapy-zyte-api <scrapy-zyte-api:index>` >= 0.16.0

* The updated dependencies above fix an issue where two separate Zyte API
  requests (*productNavigation* and *httpResponseBody*) were sent for the same
  URL. This issue only occurred when requesting product navigation pages.

* Moved :class:`zyte_spider_templates.spiders.ecommerce.ExtractFrom` into
  :class:`zyte_spider_templates.spiders.base.ExtractFrom`.

0.6.1
------------------

* Improved the :attr:`zyte_spider_templates.spiders.base.BaseSpiderParams.url`
  description.

