Crawl-frontier

Latest version: v0.2.0

0.4.0

A tremendous amount of work was done:
- `distributed-frontera` and `frontera` were merged into a single project, to make it easier to use and understand,
- The backend was completely redesigned. It now consists of `Queue`, `Metadata` and `States` objects for low-level code, and higher-level `Backend` implementations for crawling policies (see the sketch after this list),
- Added definitions of run modes: single process, distributed spiders, and distributed spiders and backend.
- The overall distributed concept is now integrated into Frontera, making the difference between using components in single-process and distributed spiders/backend run modes clearer.
- Significantly restructured and augmented documentation, addressing user needs in a more accessible way.
- A much smaller configuration footprint.
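
The split described above can be pictured with a minimal, self-contained sketch. This is not the actual Frontera API: all class and method names below are illustrative assumptions, showing low-level `Queue`, `Metadata` and `States`-style objects handling storage and a higher-level backend composing them into a crawling policy.

```python
# Minimal sketch of the redesigned backend split (illustrative only, not the
# exact Frontera API): low-level storage objects plus a policy-level backend.

class MemoryQueue:
    """Low-level object: holds requests waiting to be scheduled."""
    def __init__(self):
        self._pending = []

    def put(self, request):
        self._pending.append(request)

    def get_next_requests(self, max_n_requests, **kwargs):
        batch, self._pending = (self._pending[:max_n_requests],
                                self._pending[max_n_requests:])
        return batch


class MemoryMetadata:
    """Low-level object: per-URL metadata such as status codes."""
    def __init__(self):
        self._docs = {}

    def page_crawled(self, url, status):
        self._docs[url] = {'status': status}


class MemoryStates:
    """Low-level object: per-URL crawl state (queued, crawled, ...)."""
    def __init__(self):
        self._states = {}

    def set_state(self, url, state):
        self._states[url] = state


class FifoBackend:
    """Higher-level crawling policy composed from the three objects above."""
    def __init__(self):
        self.queue = MemoryQueue()
        self.metadata = MemoryMetadata()
        self.states = MemoryStates()

    def add_seeds(self, urls):
        for url in urls:
            self.states.set_state(url, 'queued')
            self.queue.put(url)

    def page_crawled(self, url, status):
        self.metadata.page_crawled(url, status)
        self.states.set_state(url, 'crawled')

    def get_next_requests(self, max_n_requests, **kwargs):
        return self.queue.get_next_requests(max_n_requests, **kwargs)
```

The point of the split is that storage concerns (queue, metadata, states) can be swapped independently of the crawling policy built on top of them.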

Enjoy this new year release and let us know what you think!

0.3.3

- tldextract is no longer a minimum required dependency,
- The SQLAlchemy backend now persists headers, cookies and method; a `_create_page` method was also added to ease customization,
- Canonical solver code (needs documentation),
- Other fixes and improvements.

0.3.2

It is now possible to configure Frontera from Scrapy settings. The order of precedence for configuration sources is as follows (see the example after the list):
1. settings defined in the module pointed to by FRONTERA_SETTINGS (highest precedence),
2. settings defined in the Scrapy settings,
3. default frontier settings.
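
As a hedged example of that precedence, a project might keep Frontera options in a dedicated module and point FRONTERA_SETTINGS at it from the Scrapy settings. The BACKEND and MAX_NEXT_REQUESTS option names below are illustrative assumptions, not a complete reference.

```python
# myproject/frontera_settings.py -- dedicated Frontera settings module
# (option names are illustrative; consult the Frontera docs for the full list)
BACKEND = 'frontera.contrib.backends.memory.FIFO'
MAX_NEXT_REQUESTS = 256

# myproject/settings.py -- Scrapy settings
FRONTERA_SETTINGS = 'myproject.frontera_settings'
# A value defined here is used only if the FRONTERA_SETTINGS module does not
# already define it, per the precedence order above.
MAX_NEXT_REQUESTS = 128  # overridden by the value in frontera_settings.py
```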

0.3.1

The main issue solved in this version is that request callbacks and request.meta contents are now serialized and deserialized correctly in the SQLAlchemy-based backend. Therefore, the majority of Scrapy extensions should no longer suffer from losing meta or callbacks when passing requests through Frontera. Second, there is a hotfix for the cold-start problem, where seeds are added but Scrapy quickly finishes with no further activity. A well-thought-out solution for this will be offered later.

0.3.0

- Frontera is the new name for Crawl Frontier.
- The signature of the get_next_requests method has changed; it now accepts arbitrary keyword arguments (see the sketch after this list).
- Overused buffer (subject to removal in the future in favor of the downloader's internal queue).
- Backend internals became more customizable.
- The scheduler now asks for new requests when there is free space in the Scrapy downloader queue, instead of waiting for it to be completely empty.
- Several Frontera middlewares are disabled by default.
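
A rough sketch of what the new signature enables: callers can pass scheduling hints as keyword arguments, and a backend consumes the ones it understands. The hint names used here (key_type, overused_keys) are assumptions for illustration, not a documented contract.

```python
# Illustrative sketch of the keyword-argument form of get_next_requests;
# hint names are assumptions for the example only.

def get_next_requests(pending, max_n_requests, **kwargs):
    overused = set(kwargs.get('overused_keys', ()))
    batch = [r for r in pending if r['host'] not in overused]
    return batch[:max_n_requests]

pending = [{'url': 'http://a.example/', 'host': 'a.example'},
           {'url': 'http://b.example/', 'host': 'b.example'}]
print(get_next_requests(pending, 10, key_type='domain',
                        overused_keys=['a.example']))
# -> [{'url': 'http://b.example/', 'host': 'b.example'}]
```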

0.2.0

- Added documentation (Scrapy Seed Loaders+Tests+Examples)
- Refactored backend tests
- Added requests library example
- Added requests library manager and object converters
- Added FrontierManagerWrapper
- Added frontier object converters
- Fixed script examples for new changes
- Optional Color logging (only if available)
- Changed Scrapy frontier and recorder integration to scheduler+middlewares
- Changed default frontier backend
- Added comment support to seeds
- Added doc requirements for RTD build
- Removed optional dependencies for setup.py and requirements
- Changed tests to pytest
- Updated docstrings and documentation
- Changed frontier components (Backend and Middleware) to ABCs
- Modified Scrapy frontier example to use seed loaders
- Refactored Scrapy Seed loaders
- Added new fields to Request and Response frontier objects
- Added ScrapyFrontierManager (Scrapy wrapper for Frontier Manager)
- Changed frontier core objects (Page/Link to Request/Response)
