Firecrawl-py

Latest version: v1.13.5

Page 1 of 2

1.5.0

Self-Host Fixes
- **Reworked Guide:** The `SELF_HOST.md` and `docker-compose.yaml` have been updated for clarity and compatibility
- **Kubernetes Improvements:** Updated self-hosted Kubernetes deployment examples for compatibility and consistency (1177)
- **Self-Host Fixes:** Numerous fixes aimed at improving self-host performance and stability (1207)
- **Proxy Support:** Added proxy support tailored for self-hosted environments (1212)
- **Playwright Integration:** Added fixes and continuous integration for the Playwright microservice (1210)
- **Search Endpoint Upgrade:** Added SearXNG support for the `/search` endpoint (1193)

Core Fixes & Enhancements
- **Crawl Status Fixes:** Fixed various race conditions in the crawl status endpoint (1184)
- **Timeout Enforcement:** Added timeout for scrapeURL engines to prevent hanging requests (1183)
- **Query Parameter Retention:** Map function now preserves query parameters in results (1191)
- **Screenshot Action Order:** Ensured screenshots execute after specified actions (1192)
- **PDF Scraping:** Improved handling for PDFs behind anti-bot measures (1198)
- **Map/scrapeURL Abort Control:** Integrated AbortController to stop scraping when the request times out (1205)
- **SDK Timeout Enforcement:** Enforced request timeouts in the SDK (1204)
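
The timeout items above share one idea: wrap a potentially slow operation in a deadline and cancel it when the deadline passes. A minimal Python sketch of that pattern follows. Firecrawl's server code is not Python, and `slow_engine` and `scrape_with_timeout` are hypothetical names used only for illustration; `asyncio.wait_for` stands in here for the AbortController mechanism mentioned in the notes.

```python
import asyncio
from typing import Optional


async def slow_engine(url: str, delay: float) -> str:
    """Hypothetical scrape engine that needs `delay` seconds to respond."""
    await asyncio.sleep(delay)
    return f"<html>content of {url}</html>"


async def scrape_with_timeout(url: str, delay: float, timeout: float) -> Optional[str]:
    """Run the engine and give up if it exceeds `timeout` seconds."""
    try:
        return await asyncio.wait_for(slow_engine(url, delay), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the engine coroutine before raising,
        # so no work keeps running after the deadline.
        return None


if __name__ == "__main__":
    print(asyncio.run(scrape_with_timeout("https://example.com", 0.01, 1.0)))
    print(asyncio.run(scrape_with_timeout("https://example.com", 1.0, 0.01)))
```

Returning `None` on timeout is a choice made for this sketch; a real client might prefer to raise a dedicated exception so callers cannot mistake a timeout for an empty page.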

New Features & Additions
- **Proxy & Stealth Options:** Introduced a proxy option and stealthProxy flag (1196)
- **Deep Research (Alpha):** Launched an alpha implementation of deep research (1202)
- **LLM Text Generator:** Added a new endpoint for llms.txt generation (1201)

Docker & Containerization
- **Production-Ready Docker Image:** A streamlined, production-ready Docker image is now available to simplify self-hosted deployments.

For the complete details, check out the [full changelog](https://github.com/mendableai/firecrawl/compare/v1.4.4...v1.5.0).

What's Changed
* fix(crawl-status): consider concurrency limited jobs as prioritized (FIR-851) by mogery in https://github.com/mendableai/firecrawl/pull/1184
* fix(scrapeURL/sb): enforce timeout (FIR-980) by mogery in https://github.com/mendableai/firecrawl/pull/1183
* fix(map): do not remove query parameters from results (FIR-1015) by mogery in https://github.com/mendableai/firecrawl/pull/1191
* fix(scrapeURL/fire-engine): perform format screenshot after specified actions (FIR-985) by mogery in https://github.com/mendableai/firecrawl/pull/1192
* Update self-hosted Kubernetes deployments examples for compatibility and consistency by tetuyoko in https://github.com/mendableai/firecrawl/pull/1177
* fix(v1/types): fix extract -> json rename (FIR-1072) by mogery in https://github.com/mendableai/firecrawl/pull/1195
* feat(v1): proxy option / stealthProxy flag (FIR-1050) by mogery in https://github.com/mendableai/firecrawl/pull/1196
* fix(v1/types): fix extract -> json rename, ROUND II (FIR-1072) by mogery in https://github.com/mendableai/firecrawl/pull/1199
* (feat/deep-research) Alpha implementation of deep research by nickscamara in https://github.com/mendableai/firecrawl/pull/1202
* Add llmstxt generator endpoint by ericciarla in https://github.com/mendableai/firecrawl/pull/1201
* fix(concurrency-limit): move to renewing a lock on each active job instead of estimating time to complete (FIR-1075) by mogery in https://github.com/mendableai/firecrawl/pull/1197
* SELFHOST FIXES (FIR-1105) by mogery in https://github.com/mendableai/firecrawl/pull/1207
* feat(v1/map): stop mapping if timed out via AbortController (FIR-747) by mogery in https://github.com/mendableai/firecrawl/pull/1205
* Playwright page error schema by makeiteasierapps in https://github.com/mendableai/firecrawl/pull/1172
* feat(ci/self-host): add playwright microservice tests by mogery in https://github.com/mendableai/firecrawl/pull/1210
* feat(scrapeURL): handle PDFs behind anti-bot (FIR-722) by mogery in https://github.com/mendableai/firecrawl/pull/1198
* Use correct list typing for py 3.8 support by niazarak in https://github.com/mendableai/firecrawl/pull/931
* feat(map): mock support (FIR-1109) by mogery in https://github.com/mendableai/firecrawl/pull/1213
* Add searxng for search endpoint by loorisr in https://github.com/mendableai/firecrawl/pull/1193
* feat(sdk): enforce timeout on client-side if set (FIR-864) by mogery in https://github.com/mendableai/firecrawl/pull/1204
* feat(self-host): proxy support (FIR-1111) by mogery in https://github.com/mendableai/firecrawl/pull/1212
* temp by mogery in https://github.com/mendableai/firecrawl/pull/1218
* gemini extractor Implementation by aparupganguly in https://github.com/mendableai/firecrawl/pull/1206

New Contributors
* tetuyoko made their first contribution in https://github.com/mendableai/firecrawl/pull/1177
* makeiteasierapps made their first contribution in https://github.com/mendableai/firecrawl/pull/1172
* niazarak made their first contribution in https://github.com/mendableai/firecrawl/pull/931
* loorisr made their first contribution in https://github.com/mendableai/firecrawl/pull/1193

**Full Changelog**: https://github.com/mendableai/firecrawl/compare/v1.4.4...v1.5.0

1.4.4

🚀 Features & Enhancements

- Scrape API: Added action & wait time validation ([1146](https://github.com/mendableai/firecrawl/pull/1146))
- Extraction Improvements:
  - Added detection of PDF/image sub-links and text extraction via Gemini ([1173](https://github.com/mendableai/firecrawl/pull/1173))
  - Multi-entity prompt enhancements for extraction ([1181](https://github.com/mendableai/firecrawl/pull/1181))
  - Show sources out of `__experimental` in extraction ([1180](https://github.com/mendableai/firecrawl/pull/1180))
- Environment Setup: Added Serper & Search API env vars to docker-compose ([1147](https://github.com/mendableai/firecrawl/pull/1147))
- Credit System Update: Now displays "tokens" instead of "credits" when out of tokens ([1178](https://github.com/mendableai/firecrawl/pull/1178))

✏️ Examples
- Gemini 2.0 Crawler: Implemented new crawling example ([1161](https://github.com/mendableai/firecrawl/pull/1161))
- Gemini TrendFinder: [https://github.com/mendableai/gemini-trendfinder](https://github.com/mendableai/gemini-trendfinder)
- Normal Search to Open Deep Research: [https://github.com/nickscamara/open-deep-research](https://github.com/nickscamara/open-deep-research)

🐛 Fixes

- HTML Transformer: Updated free_string function parameter type ([1163](https://github.com/mendableai/firecrawl/pull/1163))
- Gemini Crawler: Updated library & improved PDF link extraction ([1175](https://github.com/mendableai/firecrawl/pull/1175))
- Crawl Queue Worker: Only reports successful page count in num_docs ([1179](https://github.com/mendableai/firecrawl/pull/1179))
- Scraping & URLs:
  - Fixed relative URL conversion ([584](https://github.com/mendableai/firecrawl/pull/584))
  - Enforced scrape rate limit in batch scraping ([1182](https://github.com/mendableai/firecrawl/pull/1182))

What's Changed
* [FIR-796] feat(api/types): Add action and wait time validation for scrape requests by ftonato in https://github.com/mendableai/firecrawl/pull/1146
* Implemented Gemini 2.0 crawler by aparupganguly in https://github.com/mendableai/firecrawl/pull/1161
* Add Serper and Search API env vars to docker-compose by RealLukeMartin in https://github.com/mendableai/firecrawl/pull/1147
* fix(html-transformer): Update free_string function parameter type by carterlasalle in https://github.com/mendableai/firecrawl/pull/1163
* Add detection of PDF/image sub-links and extract text via Gemini by mayooear in https://github.com/mendableai/firecrawl/pull/1173
* fix: update gemini library. extract pdf links from scraped content by mayooear in https://github.com/mendableai/firecrawl/pull/1175
* feat(v1/checkCredits): say "tokens" instead of "credits" if out of tokens by mogery in https://github.com/mendableai/firecrawl/pull/1178
* feat(v1/extract) Show sources out of __experimental by nickscamara in https://github.com/mendableai/firecrawl/pull/1180
* (feat/extract) Multi-entity prompt improvements by nickscamara in https://github.com/mendableai/firecrawl/pull/1181
* fix(queue-worker/crawl): only report successful page count in num_docs (FIR-960) by mogery in https://github.com/mendableai/firecrawl/pull/1179
* fix: relative url 2 full url use error base url by dolonfly in https://github.com/mendableai/firecrawl/pull/584
* fix(v1/batch/scrape): use scrape rate limit by mogery in https://github.com/mendableai/firecrawl/pull/1182

New Contributors
* RealLukeMartin made their first contribution in https://github.com/mendableai/firecrawl/pull/1147
* carterlasalle made their first contribution in https://github.com/mendableai/firecrawl/pull/1163
* mayooear made their first contribution in https://github.com/mendableai/firecrawl/pull/1173
* dolonfly made their first contribution in https://github.com/mendableai/firecrawl/pull/584

**Full Changelog**: https://github.com/mendableai/firecrawl/compare/v1.4.3...v1.4.4

1.4.3

Summary of changes
- Open Deep Research: An open source version of OpenAI Deep Research. See [here](https://github.com/nickscamara/open-deep-research)
- R1 Web Extractor Feature: New extraction capability added.
- O3-Mini Web Crawler: Introduces a lightweight crawler for specific use cases.
- Updated Model Parameters: Enhancements to o3-mini_company_researcher.
- URL Deduplication: Fixes handling of URLs ending with /, index.html, index.php, etc.
- Improved URL Blocking: Uses tldts parsing for better blocklist management.
- Valid JSON via rawHtml in Scrape: Ensures valid JSON extraction.
- Product Reviews Summarizer: Implements summarization using o3-mini.
- Scrape Options for Extract: Adds more configuration options for extracting data.
- O3-Mini Job Resource Extractor: Extracts job-related resources using o3-mini.
- Cached Scrapes for Extract evals: Improves performance by using cached data for extraction evals.
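
To illustrate the URL deduplication item above, here is a minimal sketch that treats `/docs`, `/docs/`, and `/docs/index.html` as the same page. It is not the project's implementation (the linked fix works on URL permutations inside the crawler). The `INDEX_FILES` list and both function names are assumptions made for this example, and the source's "etc." is deliberately not filled in.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of index file names; the release note only names these two.
INDEX_FILES = ("index.html", "index.php")


def canonical_url(url: str) -> str:
    """Collapse trailing-slash and index-file variants into one form."""
    parts = urlsplit(url)
    path = parts.path
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the directory part only
            break
    path = path.rstrip("/")
    # The query string is preserved; the fragment is dropped because it
    # never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def dedupe(urls):
    """Keep the first URL seen for each canonical form, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


if __name__ == "__main__":
    print(dedupe([
        "https://example.com/docs",
        "https://example.com/docs/",
        "https://example.com/docs/index.html",
        "https://example.com/blog",
    ]))
```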

What's Changed
* You forgot an 'e' by sami0596 in https://github.com/mendableai/firecrawl/pull/1118
* added cached scrapes to extract by rafaelsideguide in https://github.com/mendableai/firecrawl/pull/1107
* Added R1 web extractor feature by aparupganguly in https://github.com/mendableai/firecrawl/pull/1115
* Feature o3-mini web crawler by aparupganguly in https://github.com/mendableai/firecrawl/pull/1120
* Updated Model Parameters (o3-mini_company_researcher) by aparupganguly in https://github.com/mendableai/firecrawl/pull/1130
* Fix corepack and self hosting setup by rothnic in https://github.com/mendableai/firecrawl/pull/1131
* fix(crawl-redis/generateURLPermutations): dedupe index.html/index.php/slash/bare URL ends (FIR-827) by mogery in https://github.com/mendableai/firecrawl/pull/1134
* feat(blocklist): Improve URL blocking with tldts parsing by ftonato in https://github.com/mendableai/firecrawl/pull/1117
* fix(scrape): allow getting valid JSON via rawHtml (FIR-852) by mogery in https://github.com/mendableai/firecrawl/pull/1138
* Implemented prodcut reviews summarizer using o3 mini by aparupganguly in https://github.com/mendableai/firecrawl/pull/1139
* [Feat] Added scrapeOptions to extract by rafaelsideguide in https://github.com/mendableai/firecrawl/pull/1133
* Feature/o3 mini job resource extractor by aparupganguly in https://github.com/mendableai/firecrawl/pull/1144

New Contributors
* sami0596 made their first contribution in https://github.com/mendableai/firecrawl/pull/1118
* aparupganguly made their first contribution in https://github.com/mendableai/firecrawl/pull/1115
* rothnic made their first contribution in https://github.com/mendableai/firecrawl/pull/1131

**Full Changelog**: https://github.com/mendableai/firecrawl/compare/v1.4.2...v1.4.3

1.4.2

We're excited to announce several new features and improvements:

New Features
- Added web search capabilities to the extract endpoint via the `enableWebSearch` parameter
- Introduced source tracking with `__experimental_showSources` parameter
- Added configurable webhook events for crawl and batch operations
- New `timeout` parameter for map endpoint
- Optional ad blocking with `blockAds` parameter (enabled by default)

Infrastructure & UI
- Enhanced proxy selection and infrastructure reliability
- Added domain checker tool to cloud platform
- Redesigned LLMs.txt generator interface for better usability

What's Changed
* (feat/extract) Refactor and Reranker improvements by nickscamara in https://github.com/mendableai/firecrawl/pull/1100
* Fix bad WebSocket URL in CrawlWatcher by ProfHercules in https://github.com/mendableai/firecrawl/pull/1053
* (feat/extract) Add sources to the extraction by nickscamara in https://github.com/mendableai/firecrawl/pull/1101
* feat(v1/map): Timeout parameter (FIR-393) by mogery in https://github.com/mendableai/firecrawl/pull/1105
* fix(scrapeURL/fire-engine): default to separate US-generic proxy list if no location is specified (FIR-728) by mogery in https://github.com/mendableai/firecrawl/pull/1104
* feat(scrapeUrl/fire-engine): add blockAds flag (FIR-692) by mogery in https://github.com/mendableai/firecrawl/pull/1106
* (feat/extract) Logs analyzeSchemaAndPrompt output did not match the schema by nickscamara in https://github.com/mendableai/firecrawl/pull/1108
* (feat/extract) Improved completions to use model's limits by nickscamara in https://github.com/mendableai/firecrawl/pull/1109
* feat(v0): store v0 users (team ID) in Redis for collection (FIR-698) by mogery in https://github.com/mendableai/firecrawl/pull/1111
* feat(github/ci): connect to tailscale (FIR-748) by mogery in https://github.com/mendableai/firecrawl/pull/1112
* (feat/conc) Move fully to a concurrency limit system by nickscamara in https://github.com/mendableai/firecrawl/pull/1045
* Added instructions for empty string to extract prompts by rafaelsideguide in https://github.com/mendableai/firecrawl/pull/1114

New Contributors
* ProfHercules made their first contribution in https://github.com/mendableai/firecrawl/pull/1053

**Full Changelog**: https://github.com/mendableai/firecrawl/compare/1.4.1...v1.4.2
**Firecrawl website changelog**: https://firecrawl.dev/changelog

1.4.1

We've significantly enhanced our data extraction capabilities with several key updates:

- Extract now returns considerably more data thanks to a new re-ranker system
- Improved infrastructure reliability
- Migrated from Cheerio to a high-performance Rust-based parser for faster and more memory-efficient parsing
- Enhanced crawl cancellation functionality for better control over running jobs

What's Changed
* Added "today" to extract prompts by rafaelsideguide in https://github.com/mendableai/firecrawl/pull/1084
* docs: update cancel crawl response by ftonato in https://github.com/mendableai/firecrawl/pull/1087
* port most of cheerio stuff to rust by mogery in https://github.com/mendableai/firecrawl/pull/1089
* Re-ranker changes by nickscamara in https://github.com/mendableai/firecrawl/pull/1090
* Rerank with lower threshold + back to map if length = 0 by rafaelsideguide in https://github.com/mendableai/firecrawl/pull/1086

**Full Changelog**: https://github.com/mendableai/firecrawl/compare/v1.4.0...1.4.1

1.4.0

Get structured web data with /extract

We’re excited to announce the [release of **/extract**](https://www.firecrawl.dev/extract) - get data from any website with just a prompt. With /extract, you can retrieve any information from anywhere on a website without being limited by scraping roadblocks or the typical context constraints of LLMs.

![Frame 46557](https://github.com/user-attachments/assets/551429dc-c78d-42f4-ae41-abb06e965585)

No more manual copy-pasting, broken scraping scripts, or debugging LLM calls: it’s never been easier to **enrich your data**, **create datasets**, or **power AI applications** with clean, structured data from any website.

Companies are already using extract to:
- Enrich CRM data
- Streamline KYB processes
- Monitor competitors
- Supercharge onboarding experiences
- Build targeted prospecting lists

Instead of spending hours manually researching, fixing broken scrapers, or piecing together data from multiple sources, simply specify what information you need and the target website, and let Firecrawl handle the entire retrieval process.

Specifically, you can:
- Extract structured data from entire websites using URL wildcards (https://example.com/*)
- Define custom schemas to capture exactly what you need—from simple product details to complex organizational structures
- Guide the extraction with custom prompts to ensure the LLM focuses on your target information
- Deploy anywhere with comprehensive support for Python, Node, cURL, and other popular tools. For no-code workflows, just connect via Zapier or use our API to set up integrations with other tools.

This versatility translates into a wide range of real-world applications—enabling you to enrich web data for just about any use case.
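
As a rough sketch of how the pieces listed above (a wildcard URL, a custom schema, and a guiding prompt) might fit together in one request body, consider the following. The field names `urls`, `prompt`, and `schema`, the helper `build_extract_request`, and the product schema are assumptions made for illustration; the Extract documentation is the authority on the actual request format. Nothing is sent over the network here.

```python
import json


def build_extract_request(urls, prompt, schema):
    """Assemble a JSON-serialisable body for an /extract call (assumed shape)."""
    if not urls:
        raise ValueError("at least one URL is required")
    return {"urls": list(urls), "prompt": prompt, "schema": schema}


# Hypothetical JSON Schema describing the fields we want back.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name"],
}

body = build_extract_request(
    ["https://example.com/*"],  # wildcard covering the whole site
    "Extract the product name and price from each product page.",
    product_schema,
)

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```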

Limitations (and the road ahead)
Let's be honest: while /extract is pretty awesome at grabbing web data, it's not perfect yet. Here's what we're still working on:
- Big sites are tricky: it can't (yet!) grab every single product on Amazon in one go
- Complex searches need work: things like "find all posts from 2025" aren't quite there
- Sometimes it's a bit quirky: results can vary between runs, though it usually gets what you need

**But here's the exciting part: we're seeing the future of web scraping take shape.**

Try it out
Curious to try /extract out for yourself?
- Visit our [playground](https://www.firecrawl.dev/playground?mode=extract) to try out /extract. You get **500,000 tokens for free**.
- Dive into our [Extract Beta documentation](https://docs.firecrawl.dev/features/extract-beta) for detailed technical guidance and API reference.
- Want a no-code solution? Connect /extract to thousands of applications through our enhanced [Zapier integration](https://zapier.com/apps/firecrawl/integrations).

That's all for now! Happy Extracting from the whole Firecrawl team 🔥

**Full Changelog**: https://github.com/mendableai/firecrawl/compare/v.1.3.0...v1.4.0

v.1.3.0
What's Changed
* feat: new snips test framework (FIR-414) by mogery in https://github.com/mendableai/firecrawl/pull/1033
* (feat/extract) New re-ranker + multi entity extraction by nickscamara in https://github.com/mendableai/firecrawl/pull/1061
* __experimental_streamSteps by nickscamara in https://github.com/mendableai/firecrawl/pull/1063

**Full Changelog**: https://github.com/mendableai/firecrawl/compare/v1.2.1...v.1.3.0

Has known vulnerabilities
