Scaledp

Latest version: v0.2.2


0.2.2

What's Changed
* Integrated with Spark PDF DataSource
* Added Object detection by mykolamelnykml in https://github.com/StabRise/ScaleDP/pull/47
* Update LLM extractors by mykolamelnykml in https://github.com/StabRise/ScaleDP/pull/48
* Improve LLM extractors and other workflows by mykolamelnykml in https://github.com/StabRise/ScaleDP/pull/50
* Added LLM Ocr by mykolamelnykml in https://github.com/StabRise/ScaleDP/pull/52
* Add LLMNer by mykolamelnykml in https://github.com/StabRise/ScaleDP/pull/54
* Added Black and Ruff runs to GitHub Actions by mykolamelnykml in https://github.com/StabRise/ScaleDP/pull/55
* Updated VisualLLMExtractor by mykolamelnykml in https://github.com/StabRise/ScaleDP/pull/58
* Updated tutorials by mykolamelnykml in https://github.com/StabRise/ScaleDP/pull/60
* Improve VisualLLMExtractor by mykolamelnykml in https://github.com/StabRise/ScaleDP/pull/62

Related posts:
* [Structured Data Extraction from PDFs with AI](https://stabrise.com/blog/scaledp-with-spark-pdf/)


![image](https://github.com/user-attachments/assets/75754899-9d89-499b-8d78-288728b869a5)



**Full Changelog**: https://github.com/StabRise/ScaleDP/compare/0.1.0rc10...0.2.2

0.1.0rc10

What's Changed
* Added EasyOcr in https://github.com/StabRise/spark-pdf/pull/39
* Added DocTR in https://github.com/StabRise/spark-pdf/pull/44
* Added Surya Ocr in https://github.com/StabRise/spark-pdf/pull/38
* Added support for the PyTesseract library as a binding to Tesseract in https://github.com/StabRise/spark-pdf/pull/4
* Added line width param to the ImageDrawBoxes in https://github.com/StabRise/spark-pdf/pull/5
* Added textSize param to the ImageDrawBoxes in https://github.com/StabRise/spark-pdf/pull/7
* Added list of displayed data to the imagedrawregions in https://github.com/StabRise/spark-pdf/pull/9
* Initialize sphinx docs in https://github.com/StabRise/spark-pdf/pull/20
* Improved test coverage in https://github.com/StabRise/spark-pdf/pull/22
* Added TextToDocument transformer in https://github.com/StabRise/spark-pdf/pull/27
* Refactoring in https://github.com/StabRise/spark-pdf/pull/30
* Changed Ner transformer to work with raw text in https://github.com/StabRise/spark-pdf/pull/31
* Added Dockerfile in https://github.com/StabRise/spark-pdf/pull/37, https://github.com/StabRise/spark-pdf/pull/16

**Full Changelog**: https://github.com/StabRise/spark-pdf/commits/0.1.0rc10

