Scrapling

Latest version: v0.2.96

Safety actively analyzes 714973 Python packages for vulnerabilities to keep your Python projects secure.

Page 2 of 4

0.2.9

What's changed
New features
1. Introducing the **long-awaited** async support for Scrapling! Now you have the `AsyncFetcher` class version of `Fetcher`, and both `StealthyFetcher` and `PlayWrightFetcher` have a new method called `async_fetch` with the same options.
python
>> from scrapling import StealthyFetcher
>> page = await StealthyFetcher().async_fetch('https://www.browserscan.net/bot-detection') # the async version of fetch
>> page.status == 200
True

2. Now the `StealthyFetcher` class has the `geoip` argument in its fetch methods which when enabled makes the class automatically use IP's longitude, latitude, timezone, country, and locale, then spoof the WebRTC IP address. It will also calculate and spoof the browser's language based on the distribution of language speakers in the target region.
3. Added the `retries` argument to `Fetcher`/`AsyncFetcher` classes so now you can set the number of retries of each request done by `httpx`.
4. Added the `url_join` method to `Adaptor` and Fetchers which takes a relative URL and joins it with the current URL to generate an absolute full URL!
5. Added the `keep_cdata` method to `Adaptor` and Fetchers to stop the parser from removing cdata when needed.
6. Now `Adaptor`/`Response` `body` method returns the raw HTML response when possible (without processing it in the library).
7. Adding logging for the `Response` class so now when you use the Fetchers you will get a log that gives info about the response you got.
Example:

python
>> from scrapling.defaults import Fetcher
>> Fetcher.get('https://books.toscrape.com/index.html')
[2024-12-16 13:33:36] INFO: Fetched (200) <GET https://books.toscrape.com/index.html> (referer: https://www.google.com/search?q=toscrape)
>>

8. Now using all standard string methods on a `TextHandler` like `.replace()` will result in another `TextHandler`. It was returning the standard string before.
9. Big improvements to speed across the library and improvements to stealth in Fetchers classes overall.
10. Added dummy functions like `extract_first`\`extract` which returns the same result as the parent. These functions are added only to make it easy to copy code from Scrapy/Parsel to Scrapling when needed as these functions are used there!
11. Due to refactoring a lot of the code and using caching at the right positions, now doing requests in bulk will have a big speed increase.

Breaking changes
- Now the support for Python 3.8 has been dropped. (Mainly because Playwright stopped supporting it but it was a problematic version anyway)
- The `debug` argument has been removed from all the library, now if you want to set the library to debugging, do this after importing the library:

python
>>> import logging
>>> logging.getLogger("scrapling").setLevel(logging.DEBUG)

Bugs Squashed
1. Now WebGL is enabled by default as a lot of protections are checking if it's enabled now.
2. Some mistakes and typos in the docs/README.

Quality of life changes
1. All logging is now unified under the logger name `scrapling` for easier and cleaner control. We were using the root logger before.
2. Restructured the tests folder into a cleaner structure and added tests for the new features. All the tests were rewritten to a cleaner version and more tests were added for higher coverage.
3. Refactored a big part of the code to be cleaner and easier to maintain.

> All these changes were part of the changes I decided before to add with 0.3 but decided to add them here because it will be some time till the next version. Now the next step is to finish the detailed documentation website and then work on version 0.3

---

> [!NOTE]
> A friendly reminder that maintaining and improving `Scrapling` takes a lot of time and effort which I have been happily doing for months even though it's becoming harder. So, if you like `Scrapling` and want it to keep improving, you can help by supporting me through the [Sponsor button](https://github.com/sponsors/D4Vinci).

0.2.8

What's changed
- This is a small update that includes some must-have quality-of-life changes to the code and fixed a typo in the main README file (20)
---

> [!NOTE]
> A friendly reminder that maintaining and improving `Scrapling` takes a lot of time and effort which I have been happily doing for months even though it's becoming harder. So, if you like `Scrapling` and want it to keep improving, you can help by supporting me through the [Sponsor button](https://github.com/sponsors/D4Vinci).

0.2.7

What's changed
New features
- Now if you used the `wait_selector` argument with `StealthyFetcher` and `PlayWrightFetcher` classes, Scrapling will wait again for the JS to fully load and execute like normal. If you used the `network_idle` argument, Scrapling will wait for it again too after waiting for all of that. If the states are all fulfilled then no waiting happens, of course.
- Now you can enable and disable ads on `StealthyFetcher` with the `disable_ads` argument. This is enabled by default and it installs the `ublock origin` addon.
- Now you can set the locale used by `PlayWrightFetcher` with the `locale` argument. The default value is still `en-US`.
- Now the basic requests done through `Fetcher` can accept proxies in this format `http://username:passwordlocalhost:8030`.
- The stealth mode improved a bit for `PlayWrightFetcher`.

Bugs Squashed/Improvements
1. Now enabling proxies on the `PlayWrightFetcher` class is not tied to the `stealth` mode being on or off (Thanks to [AbdullahY36](https://github.com/AbdullahY36) for pointing that out)
2. Now the `ResponseEncoding` tests if the encoding returned from the response can be used with the page or not. If the returned encoding triggered an error, Scrapling defaults to `utf-8`
---

> [!NOTE]
> A friendly reminder that maintaining and improving `Scrapling` takes a lot of time and effort which I have been happily doing for months even though it's becoming harder. So, if you like `Scrapling` and want it to keep improving, you can help by supporting me through the [Sponsor button](https://github.com/sponsors/D4Vinci).

0.2.6

What's changed
New features
- Now the `PlayWrightFetcher` can use the real browser directly with the `real_chrome` argument passed to the `PlayWrightFetcher.fetch` function but this requires you to have Chrome browser installed. Scrapling will launch an instance of your Chrome browser and you can use most of the options as normal. (Before you only had the `cdp_url` argument to do so)
- Pumped up the version of headers generated for real browsers.

Bugs Squashed
1. Turns out the format of the browser headers generated by `BrowserForge` was outdated which made Scrapling detected by some protections so now `BrowserForge` is only used to generate real useragent.
2. Now the `hide_canvas` argument is turned off by default as it's being detected by Google's ReCaptcha.

---

> [!NOTE]
> A friendly reminder that maintaining and improving `Scrapling` takes a lot of time and effort which I have been happily doing for months even though it's becoming harder. So, if you like `Scrapling` and want it to keep improving, you can help by supporting me through the [Sponsor button](https://github.com/sponsors/D4Vinci).

0.2.5

What's changed

Bugs Squashed
- Handled an error that happens with the 'wait_selector' argument if it resolved to more than 1 element. This affects the `StealthyFetcher` and the `PlayWrightFetcher` classes.
- Fixed the encoding type in cases where the `content_type` header gets value with parameters like `charset` (Thanks to andyfcx for [12](https://github.com/D4Vinci/Scrapling/issues/12) )

Quality of life
- Added more tests to cover new parts of the code and made tests run in threads.
- I updated the docs strings to be readable correctly on Sphinx's apidoc or similar tools.

New Contributors
- andyfcx made their first contribution at [13 ](https://github.com/D4Vinci/Scrapling/pull/13)

---

> [!NOTE]
> A friendly reminder that maintaining and improving `Scrapling` takes a lot of time and effort which I have been happily doing for months even though it's becoming harder. So, if you like `Scrapling` and want it to keep improving, you can help by supporting me through the [Sponsor button](https://github.com/sponsors/D4Vinci).

0.2.4

What's changed

Bugs Squashed
- Fixed a bug when retrieving response bytes after using the `network_idle` argument in both the `StealthyFetcher` and `PlayWrightFetcher` classes. <br/> That was causing the following error message:

Response.body: Protocol error (Network.getResponseBody): No resource with given identifier found

- The PlayWright API sometimes returns empty status text with responses, so now `Scrapling` will calculate it manually if that happens. This affects both the `StealthyFetcher` and `PlayWrightFetcher` classes.

---

> [!NOTE]
> A friendly reminder that maintaining and improving `Scrapling` takes a lot of time and effort which I have been happily doing for months even though it's becoming harder. So, if you like `Scrapling` and want it to keep improving, you can help by supporting me through the [Sponsor button](https://github.com/sponsors/D4Vinci).

Page 2 of 4

Releases

Has known vulnerabilities

Previous Next

Scrapling

Page 2 of 4

0.2.9

0.2.8

0.2.7

0.2.6

0.2.5

0.2.4

Page 2 of 4

Links

Releases