Scrapling

Latest version: v0.2.93


0.2.93

This is an essential update for everyone to fully enjoy Scrapling as intended.

What's changed
1. The return type is now consistent across the whole parser engine, so you will always get one of these return types: `Adaptor`, `Adaptors`, `TextHandler`, `TextHandlers`, `None`, or a list in case you have mixed results (e.g. a combined CSS selector). This allows a better coding experience with minimal manual type checking, makes the library more stable, and makes chaining methods almost always possible.
2. Most of the parser engine, especially the `Adaptor` class, was refactored into a cleaner and, most importantly, faster version. Almost all methods/properties, especially the searching methods, got a speed increase of 5-40%. Some methods got bigger boosts: `find_by_regex`, for example, got a ~60% speed boost! The automatch feature got a small ~5% speed boost.
3. Fixed logic bugs in the `find_all`/`find` methods that made them apply the passed filters sometimes in an OR fashion and other times as an AND. Now all returned elements must fulfill every filter you pass, except the passed tag names.
4. Now all regex-related methods return `TextHandler`/`TextHandlers` for easier methods chaining.
5. Added a new `below_elements` property that returns an `Adaptors` object of all elements under the current element in the DOM tree.
6. All methods/properties that used to return the HTML source as a string now return it as a `TextHandler`, so you can easily run regexes on it, etc.
7. `StealthyFetcher` is now a bit faster and more stealthy. Also, the option that makes it possible to click Captchas like Cloudflare Turnstile is now enabled by default.
8. Auto-completion and type hints improved a lot across nearly half the library, especially `Adaptor`, `TextHandler`, and `TextHandlers`.
9. Slicing a `TextHandler`, accessing it by index, or using the `split` method now returns another `TextHandler` instead of a standard Python string. Almost all standard string operations now return another `TextHandler` to make chaining methods/functions always possible.
10. Fixed some small bugs and typos. For example, the Fetcher's `async_put` was sending a POST request instead of a PUT request 😶‍🌫️
11. Improved the README a bit until I finish the documentation website.
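Items 4, 6, and 9 all rely on the same underlying idea: a `str` subclass whose operations return the subclass again, so calls can be chained. A minimal sketch of that technique (the `ChainText` class and its `re_first` helper below are illustrative stand-ins, not Scrapling's actual implementation):

```python
import re


class ChainText(str):
    """Minimal sketch of a str subclass whose operations return the
    subclass again, so string/regex calls can be chained indefinitely."""

    def __getitem__(self, key):
        # Indexing and slicing stay chainable
        return ChainText(super().__getitem__(key))

    def split(self, sep=None, maxsplit=-1):
        # Each piece is chainable too
        return [ChainText(piece) for piece in super().split(sep, maxsplit)]

    def replace(self, old, new, count=-1):
        return ChainText(super().replace(old, new, count))

    def re_first(self, pattern):
        # Hypothetical regex helper: first match, wrapped for chaining
        match = re.search(pattern, self)
        return ChainText(match.group()) if match else None


price = ChainText('Price: $19.99 (sale)')
print(price.re_first(r'\d+\.\d+').replace('.', ','))  # prints 19,99
```

Because every method hands back the subclass, a regex extraction can flow straight into a string method without any intermediate type checks.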

This was supposed to be a small update before version 0.3, but I thought I'd make it better.
Thanks for all your support!

---

Shoutout to our biggest Sponsor: [Scrapeless](https://www.scrapeless.com/?utm_source=github&utm_medium=ads&utm_campaign=scraping&utm_term=D4Vinci)
[![Scrapeless Banner](https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/scrapeless.jpg)](https://www.scrapeless.com/?utm_source=github&utm_medium=ads&utm_campaign=scraping&utm_term=D4Vinci)

0.2.92

What's changed
- The response returned by browser-based fetchers now uses more reliable data sources in cases where the loaded page uses many iframes.
- **Now installing `Scrapling` is made even easier, you install it with pip then run `scrapling install` in the terminal and you are ready!**
- Fixed an inaccurate type hint in the parser.

---

> [!NOTE]
> A friendly reminder that maintaining and improving `Scrapling` takes a lot of time and effort which I have been happily doing for months even though it's becoming harder. So, if you like `Scrapling` and want it to keep improving, you can help by supporting me through the [Sponsor button](https://github.com/sponsors/D4Vinci).

0.2.91

What's changed
- Fixed a bug where the fetch log message was showing for the first request only.
- The default behavior of the Playwright API while browsing a page is to return the first response that fulfills the load state given to the `goto` method (`["load", "domcontentloaded", "networkidle"]`). So if a website has a waiting page, like Cloudflare's, that redirects you to the real website afterward, Playwright returns the first status code, which in this case would be something like 403. This update solves the issue for both `PlaywrightFetcher` and `StealthyFetcher`, as both use the Playwright API, so the result no longer depends on Playwright's default behavior.
- Added support for SOCKS proxies in the `Fetcher` class.
- Fixed the type hint for the `wait_selector_state` argument, so auto-completion now shows the accurate values you should use.

---


0.2.9

What's changed
New features
1. Introducing the **long-awaited** async support for Scrapling! Now you have the `AsyncFetcher` class version of `Fetcher`, and both `StealthyFetcher` and `PlayWrightFetcher` have a new method called `async_fetch` with the same options.
```python
>>> from scrapling import StealthyFetcher
>>> page = await StealthyFetcher().async_fetch('https://www.browserscan.net/bot-detection')  # the async version of fetch
>>> page.status == 200
True
```

2. The `StealthyFetcher` class now has a `geoip` argument in its fetch methods; when enabled, the class automatically uses the IP's longitude, latitude, timezone, country, and locale, then spoofs the WebRTC IP address. It will also calculate and spoof the browser's language based on the distribution of language speakers in the target region.
3. Added the `retries` argument to `Fetcher`/`AsyncFetcher` classes so now you can set the number of retries of each request done by `httpx`.
4. Added the `url_join` method to `Adaptor` and Fetchers which takes a relative URL and joins it with the current URL to generate an absolute full URL!
5. Added the `keep_cdata` method to `Adaptor` and Fetchers to stop the parser from removing cdata when needed.
6. The `body` method of `Adaptor`/`Response` now returns the raw HTML response when possible (without the library processing it).
7. Added logging for the `Response` class, so when you use the Fetchers you get a log with info about the response you received.
Example:

```python
>>> from scrapling.defaults import Fetcher
>>> Fetcher.get('https://books.toscrape.com/index.html')
[2024-12-16 13:33:36] INFO: Fetched (200) <GET https://books.toscrape.com/index.html> (referer: https://www.google.com/search?q=toscrape)
```

8. Now using all standard string methods on a `TextHandler` like `.replace()` will result in another `TextHandler`. It was returning the standard string before.
9. Big improvements to speed across the library and improvements to stealth in Fetchers classes overall.
10. Added dummy functions like `extract_first`/`extract` which return the same result as the parent. These functions were added only to make it easy to copy code from Scrapy/Parsel to Scrapling when needed, as they are used there!
11. Due to refactoring a lot of the code and using caching at the right positions, now doing requests in bulk will have a big speed increase.
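The `url_join` helper in item 4 presumably behaves like the standard library's `urljoin`; a rough stand-in to illustrate the idea (this is a sketch, not Scrapling's actual code):

```python
from urllib.parse import urljoin


def url_join(current_url: str, relative_url: str) -> str:
    """Sketch of what a url_join-style helper does: resolve a relative
    URL against the page's current URL into an absolute one."""
    return urljoin(current_url, relative_url)


print(url_join('https://books.toscrape.com/index.html', 'catalogue/page-2.html'))
# prints https://books.toscrape.com/catalogue/page-2.html
```

Having this on the response object saves you from passing the page URL around when resolving scraped `href` values.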

Breaking changes
- Support for Python 3.8 has been dropped. (Mainly because Playwright stopped supporting it, but it was a problematic version anyway.)
- The `debug` argument has been removed from the whole library. Now, if you want to set the library to debug mode, do this after importing it:

```python
>>> import logging
>>> logging.getLogger("scrapling").setLevel(logging.DEBUG)
```


Bugs Squashed
1. WebGL is now enabled by default, as a lot of protections now check whether it's enabled.
2. Fixed some mistakes and typos in the docs/README.

Quality of life changes
1. All logging is now unified under the logger name `scrapling` for easier and cleaner control. We were using the root logger before.
2. Restructured the tests folder into a cleaner structure and added tests for the new features. All the tests were rewritten to a cleaner version and more tests were added for higher coverage.
3. Refactored a big part of the code to be cleaner and easier to maintain.

> All these changes were part of what I had planned for version 0.3, but I decided to include them here since it will be some time till the next release. The next step is to finish the detailed documentation website and then work on version 0.3.

---


0.2.8

What's changed
- This is a small update that includes some must-have quality-of-life changes to the code and fixes a typo in the main README file (20)
---


0.2.7

What's changed
New features
- Now if you used the `wait_selector` argument with `StealthyFetcher` and `PlayWrightFetcher` classes, Scrapling will wait again for the JS to fully load and execute like normal. If you used the `network_idle` argument, Scrapling will wait for it again too after waiting for all of that. If the states are all fulfilled then no waiting happens, of course.
- You can now enable and disable ads on `StealthyFetcher` with the `disable_ads` argument. It's enabled by default and installs the `uBlock Origin` addon.
- Now you can set the locale used by `PlayWrightFetcher` with the `locale` argument. The default value is still `en-US`.
- Basic requests done through `Fetcher` can now accept proxies in this format: `http://username:password@localhost:8030`.
- The stealth mode improved a bit for `PlayWrightFetcher`.
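The proxy format mentioned above is a standard URL with embedded credentials; the stdlib can pull the parts out, which makes each component explicit:

```python
from urllib.parse import urlsplit

# Pick apart a credentialed proxy URL of the form the changelog describes
proxy = urlsplit('http://username:password@localhost:8030')
print(proxy.username, proxy.password, proxy.hostname, proxy.port)
# prints username password localhost 8030
```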


Bugs Squashed/Improvements
1. Now enabling proxies on the `PlayWrightFetcher` class is not tied to the `stealth` mode being on or off (Thanks to [AbdullahY36](https://github.com/AbdullahY36) for pointing that out)
2. `ResponseEncoding` now tests whether the encoding returned with the response can actually be used with the page. If the returned encoding triggers an error, Scrapling defaults to `utf-8`
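The encoding fallback in item 2 can be sketched as a simple try-then-default, assuming the behavior described above (the function name here is illustrative, not Scrapling's actual code):

```python
def decode_with_fallback(raw: bytes, declared_encoding: str) -> str:
    """Try the encoding the response declared; if it can't decode the
    body (or the codec doesn't exist), fall back to utf-8."""
    try:
        return raw.decode(declared_encoding)
    except (UnicodeDecodeError, LookupError):
        return raw.decode('utf-8', errors='replace')


print(decode_with_fallback(b'caf\xc3\xa9', 'ascii'))  # prints café
```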
---

