Python-doctr

Latest version: v0.11.0


0.3.0

This release adds support for PyTorch backend & rotated text elements.

*Release brought to you by fg-mindee & charlesmindee*

**Note**: doctr 0.3.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights
[beta] Welcome PyTorch :tada:
This release comes with exciting news: we added PyTorch support to the whole library!

<p align="center"><img src="https://pytorch.org/assets/images/pytorch-logo.png" width="200" height="200"></p>

If you have both TensorFlow & PyTorch installed, simply switch the DocTR backend by using the `USE_TORCH` and `USE_TF` environment variables:
```shell
export USE_TORCH='1'
```

DocTR will then take care of the rest so that you can play along with PyTorch:
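```python
import torch
from doctr.models import db_resnet50

# Load the pretrained detection model in evaluation mode
model = db_resnet50(pretrained=True).eval()
# Disable gradient tracking for inference
with torch.no_grad():
    out = model(torch.rand(1, 3, 1024, 1024))
```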
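To illustrate how switching a backend through environment variables can work, here is a minimal sketch of a hypothetical `resolve_backend` helper. It is not DocTR's code: the accepted "truthy" values, the error when both variables are set, and the TensorFlow default when neither is set are all assumptions made for this example.

```python
import os
from typing import Mapping, Optional

# Values treated as "enabled" in this sketch (an assumption, not DocTR's list)
TRUTHY = {"1", "true", "yes", "on"}


def resolve_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: choose "torch" or "tf" from USE_TORCH / USE_TF."""
    env = os.environ if env is None else env
    use_torch = env.get("USE_TORCH", "").strip().lower() in TRUTHY
    use_tf = env.get("USE_TF", "").strip().lower() in TRUTHY
    if use_torch and use_tf:
        raise ValueError("Set at most one of USE_TORCH and USE_TF")
    if use_torch:
        return "torch"
    # Assumed default for this sketch: TensorFlow when nothing is requested
    return "tf"


print(resolve_backend({"USE_TORCH": "1"}))  # torch
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real process environment.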

More pretrained models to come in the next releases!

Support for rotated boxes
This release adds support for rotated text elements, which can now be localized with rotated bounding boxes.

![Rotated bounding boxes](https://user-images.githubusercontent.com/70526046/121030560-df055080-c7a9-11eb-8b19-a3a1a55cf145.png)
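For readers new to rotated boxes, the sketch below computes the four corners of a box described by its center, size and rotation angle. These notes do not describe DocTR's exact geometry format for rotated elements, so the (center, width, height, angle) parametrization here is an assumption chosen for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotated_box_corners(cx: float, cy: float, w: float, h: float, angle_deg: float) -> List[Point]:
    """Return the 4 corners of a w x h box centered at (cx, cy) and rotated by angle_deg."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets from the center, before rotation
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Apply the standard 2D rotation matrix to each offset, then translate back.
    # Whether this looks clockwise or counter-clockwise depends on whether y points down (images) or up.
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t) for dx, dy in offsets]


print(rotated_box_corners(10, 20, 4, 2, 0))  # [(8.0, 19.0), (12.0, 19.0), (12.0, 21.0), (8.0, 21.0)]
```

With a zero angle this reduces to an ordinary axis-aligned box; any other angle keeps the side lengths but tilts the box.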

Page reconstruction
Following up on some feedback about the lack of clarity for visualization of dense predictions, we added a page reconstruction feature.

```python
import matplotlib.pyplot as plt
from doctr.utils.visualization import synthesize_page
from doctr.documents import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Analyze
result = model(doc)

# Reconstruct the first page from its exported dict
reconstructed_page = synthesize_page(result.pages[0].export())
plt.imshow(reconstructed_page)
plt.show()
```

![Original image](https://user-images.githubusercontent.com/70526046/122777414-4e9c3500-d2ac-11eb-8870-109deb1e28a9.png) ![Page reconstruction](https://user-images.githubusercontent.com/70526046/122777419-4f34cb80-d2ac-11eb-8dba-d4546071f361.png)

Using the predictions from our models, we try to synthesize the document with only its textual information!
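A renderer such as `synthesize_page` needs to know where each word goes on the page. As a rough illustration (not DocTR's implementation), the sketch below walks an exported page and converts relative word geometry into pixel boxes where a renderer could then draw each word's text. The nested layout (page, blocks, lines, words), the relative `((xmin, ymin), (xmax, ymax))` geometry and `dimensions` given as `(height, width)` are assumptions about the export format; check the `export()` output of your installed version.

```python
from typing import Any, Dict, List, Tuple

PixelBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def words_to_pixel_boxes(page: Dict[str, Any]) -> List[Tuple[str, PixelBox]]:
    """Collect (word value, pixel box) pairs from an exported page dict."""
    height, width = page["dimensions"]
    boxes = []
    for block in page["blocks"]:
        for line in block["lines"]:
            for word in line["words"]:
                (xmin, ymin), (xmax, ymax) = word["geometry"]
                # Scale relative coordinates back to the page's pixel grid
                box = (round(xmin * width), round(ymin * height),
                       round(xmax * width), round(ymax * height))
                boxes.append((word["value"], box))
    return boxes


# Hypothetical exported page with a single word
page = {"dimensions": (100, 200), "blocks": [{"lines": [{"words": [
    {"value": "hello", "geometry": ((0.1, 0.2), (0.3, 0.4))}]}]}]}
print(words_to_pixel_boxes(page))  # [('hello', (20, 20, 60, 40))]
```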

Breaking changes

Renamed LinkNet

While the paper doesn't introduce different versions of the LinkNet architecture, we want to keep the possibility to add more. In order to stabilize the interface early on, we renamed `linknet` to `linknet16`.

0.2.1

This patch release fixes issues with the preprocessor and greatly improves text detection models.

*Brought to you by fg-mindee & charlesmindee*

**Note**: doctr 0.2.1 requires TensorFlow 2.4.0 or higher.

Highlights
Improved text detection
With this iteration, DocTR brings you a set of newly pretrained parameters for `db_resnet50`, trained using a much wider range of data augmentations!

Architecture | FUNSD recall (%) | FUNSD precision (%) | CORD recall (%) | CORD precision (%)
-- | -- | -- | -- | --
db_resnet50 + crnn_vgg16_bn (v0.2.0) | 64.8 | 70.3 | 67.7 | 78.4
db_resnet50 + crnn_vgg16_bn (v0.2.1) | 70.08 | 74.77 | 82.19 | 79.67
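To compare the two rows with a single number per dataset, precision P and recall R can be combined into their harmonic mean, F1 = 2PR / (P + R). The sketch below derives F1 from the values in the table above; these F1 figures are computed here and are not part of the reported benchmark.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs copied from the benchmark table, in %
results = {
    ("v0.2.0", "FUNSD"): (64.8, 70.3),
    ("v0.2.0", "CORD"): (67.7, 78.4),
    ("v0.2.1", "FUNSD"): (70.08, 74.77),
    ("v0.2.1", "CORD"): (82.19, 79.67),
}

for (version, dataset), (recall, precision) in results.items():
    print(f"{version} {dataset}: F1 = {f1_score(precision, recall):.2f}")
```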

![OCR sample](https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png)

Sequence prediction confidence
Users might want to filter text recognition predictions, which was previously difficult without a confidence score for each prediction. We harmonized our recognition models to provide the sequence prediction probability.

Using the following image:
![reco_sample](https://user-images.githubusercontent.com/76527547/117133599-c073fa00-ada4-11eb-831b-412de4d28341.jpeg)

with this snippet:
```python
from doctr.documents import DocumentFile
from doctr.models import recognition_predictor

predictor = recognition_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/reco_sample.jpg")
print(predictor(doc))
```

will get you a list of tuples (word value, sequence confidence).
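As a minimal sketch of the filtering use case mentioned above, assuming predictions come as a list of `(word value, sequence confidence)` tuples, keeping only confident words takes a single threshold. The sample predictions and the 0.5 default threshold are made up for illustration.

```python
from typing import List, Tuple

Prediction = Tuple[str, float]  # (word value, sequence confidence)


def filter_predictions(preds: List[Prediction], min_conf: float = 0.5) -> List[Prediction]:
    """Keep only predictions whose sequence confidence reaches min_conf."""
    return [(value, conf) for value, conf in preds if conf >= min_conf]


# Hypothetical predictor output, for illustration only
preds = [("Hello", 0.98), ("wor1d", 0.41), ("!", 0.87)]
print(filter_predictions(preds, min_conf=0.8))  # [('Hello', 0.98), ('!', 0.87)]
```

The right threshold depends on your data; inspecting a few low-confidence words first is a reasonable way to pick it.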

0.2.0

This release improves model performance and considerably extends library features (including a minimal API template, new datasets, and newly trained models).

*Release handled by fg-mindee & charlesmindee*

**Note**: doctr 0.2.0 requires TensorFlow 2.4.0 or higher.

Highlights
New pretrained weights
Enjoy our newly trained detection and recognition models with improved robustness and performance!
Check our full benchmark in the [documentation](https://mindee.github.io/doctr/latest/models.html#end-to-end-ocr) for further details.

Improved Line & block detection
This release comes with a large improvement in line detection. While it is only done in post-processing for now, we considered many cases to make sure you get a consistent and helpful result:

Before | After
-- | --
![Before](https://user-images.githubusercontent.com/70526046/116271250-1979d780-a780-11eb-99cc-f4564fa4c3f0.png) | ![After](https://user-images.githubusercontent.com/70526046/116271231-15e65080-a780-11eb-965f-3636de849ae6.png)

File reading from any source
You can now read images or PDFs from files, binary streams, or even URLs. We completely revamped our document reading pipeline with the new `DocumentFile` class methods:

```python
from doctr.documents import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
# Web page
webpage_doc = DocumentFile.from_url("https://www.yoursite.com").as_images()
```

If your PDF is a source file (web pages are converted into such PDFs) rather than a scanned version, you will also be able to read the information it contains:
```python
from doctr.documents import DocumentFile
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Retrieve bounding box and text information
words = pdf_doc.get_words()
```

Reference scripts for training
By adding multithreaded data loaders and transformations to DocTR, we can now provide you with reference scripts to train models on your own!

Text detection script (additional details available in [README](https://github.com/mindee/doctr/blob/main/references/detection/README.md))
```shell
python references/detection/train.py /path/to/dataset db_resnet50 -b 8 --input-size 512 --epochs 20
```

Text recognition script (additional details available in [README](https://github.com/mindee/doctr/blob/main/references/recognition/README.md))
```shell
python references/recognition/train.py /path/to/dataset crnn_vgg16_bn -b 8 --epochs 20
```

Minimal API
If you enjoy DocTR, you might want to integrate it into your API. For your convenience, we added a minimal API template with routes for text detection, text recognition or plain OCR!

Run it as follows in a docker container:
```shell
PORT=8050 docker-compose up -d --build
```

Your API is now running locally on port 8050! Navigate to http://localhost:8050/redoc to check its documentation.
![API doc](https://user-images.githubusercontent.com/76527547/117133559-b225de00-ada4-11eb-96ba-bd56c1e8d3f3.png)

Or start making your first request!
```python
import io

import requests

with open('/path/to/your/image.jpeg', 'rb') as f:
    data = f.read()
# Send the image as a multipart file upload to the recognition route
response = requests.post("http://localhost:8050/recognition", files={'file': io.BytesIO(data)})
```

Breaking changes

0.1.1

This patch release fixes several bugs, introduces OCR datasets and improves model performance.

*Release handled by fg-mindee & charlesmindee*

**Note**: doctr 0.1.1 requires TensorFlow 2.3.0 or higher.

Highlights
Introduction of vision datasets
Whether this is for training or evaluation purposes, DocTR provides you with objects to easily download and manipulate datasets. Access OCR datasets within a few lines of code:

```python
from doctr.datasets import FUNSD
train_set = FUNSD(train=True, download=True)
img, target = train_set[0]
```
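The `img, target = train_set[0]` access pattern above relies on Python's sequence protocol. The hypothetical `ToyOCRDataset` below shows the minimum needed to support it; the target layout (a dict with `boxes` and `labels`) is an assumption for illustration, not necessarily the format DocTR uses for FUNSD.

```python
from typing import Any, Dict, List, Tuple

Sample = Tuple[Any, Dict[str, list]]  # (image, target)


class ToyOCRDataset:
    """Hypothetical in-memory dataset supporting `len(ds)` and `img, target = ds[i]`."""

    def __init__(self, samples: List[Sample]) -> None:
        self._samples = samples

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, index: int) -> Sample:
        # Indexing past the end raises IndexError, which also makes `for ... in ds` work
        img, target = self._samples[index]
        return img, target


dummy_img = [[0, 0], [0, 0]]  # placeholder for real pixel data
ds = ToyOCRDataset([(dummy_img, {"boxes": [(0, 0, 1, 1)], "labels": ["hi"]})])
img, target = ds[0]
print(len(ds), target["labels"])  # 1 ['hi']
```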

Model evaluation
While DocTR 0.1.0 gave you access to pretrained models, you had no way to know how well they performed short of evaluating them yourself. We have now added a performance benchmark in our documentation for all our models, and made the evaluation script available for seamless reproducibility:

```shell
python scripts/evaluate.py ocr_db_crnn_vgg
```
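For text detection, recall and precision are usually obtained by matching predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below uses greedy one-to-one matching with an assumed 0.5 IoU threshold; it illustrates the general idea and is not necessarily how `scripts/evaluate.py` or DocTR's metric module computes its numbers.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_recall_precision(gts: List[Box], preds: List[Box], thresh: float = 0.5) -> Tuple[float, float]:
    """Greedy matching: each prediction may claim at most one unmatched ground-truth box."""
    matched_gt = set()
    true_pos = 0
    for p in preds:
        # Find the best still-unmatched ground-truth box for this prediction
        best_idx, best_iou = -1, 0.0
        for i, g in enumerate(gts):
            if i in matched_gt:
                continue
            score = iou(p, g)
            if score > best_iou:
                best_idx, best_iou = i, score
        if best_idx >= 0 and best_iou >= thresh:
            matched_gt.add(best_idx)
            true_pos += 1
    recall = true_pos / len(gts) if gts else 0.0
    precision = true_pos / len(preds) if preds else 0.0
    return recall, precision


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0, 0, 10, 10), (100, 100, 110, 110)]
print(detection_recall_precision(gts, preds))  # (0.5, 0.5)
```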

Demo app
Since we want DocTR to make it convenient for you to build OCR-related applications and services, we made a minimal [Streamlit](https://streamlit.io/) demo app to showcase its text detection capabilities. You can run the demo with the following command:

```shell
streamlit run demo/app.py
```

Here is how it renders performing text detection on a sample document:
![doctr_demo](https://user-images.githubusercontent.com/76527547/111645201-c4ea5080-8800-11eb-9807-fd69459e1067.png)

Breaking changes
Metric update & summary
For improved clarity, the evaluation metrics' methods were renamed.

0.1.0

This first release adds pretrained models for end-to-end OCR and document manipulation utilities.

*Release handled by fg-mindee & charlesmindee*

**Note**: doctr 0.1.0 requires TensorFlow 2.3.0 or newer.

Highlights
Easy & high-performing document reading
Since document processing is at the core of this project, being able to read documents efficiently is a priority. In this release, we considered PDF and image-based files.

PDF reading is a wrapper around the [PyMuPDF](https://github.com/pymupdf/PyMuPDF) back-end for fast file reading:

```python
from doctr.documents import read_pdf
# from path
doc = read_pdf("path/to/your/doc.pdf")
# from stream
with open("path/to/your/doc.pdf", 'rb') as f:
    doc = read_pdf(f.read())
```

while image reading relies on the [OpenCV](https://github.com/opencv/opencv) backend:

```python
from doctr.documents import read_img
page = read_img("path/to/your/img.jpg")
```

Pretrained End-to-End OCR predictors
Whether you conduct text detection, text recognition or end-to-end OCR, this release brings you pretrained models and advanced predictors that take care of all preprocessing, model inference and post-processing for you, so you get easy-to-use, pythonic features.

Text detection
Currently, only [DBNet](https://arxiv.org/pdf/1911.08947.pdf)-based architectures are supported; more will come in the next releases!

```python
from doctr.documents import read_pdf
from doctr.models import db_resnet50_predictor
model = db_resnet50_predictor(pretrained=True)
doc = read_pdf("path/to/your/doc.pdf")
result = model(doc)
```
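DBNet predicts a per-pixel text probability map, which post-processing turns into boxes. Below is a heavily simplified sketch of that step: threshold the map, then take the bounding box of each 4-connected region. The 0.3 threshold and 4-connectivity are assumptions, and the actual DBNet post-processing (including DocTR's) also expands the shrunk text regions and scores the boxes, which this sketch omits.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), inclusive pixel indices


def prob_map_to_boxes(prob_map: List[List[float]], bin_thresh: float = 0.3) -> List[Box]:
    """Binarize a probability map and return one bounding box per connected region."""
    h = len(prob_map)
    w = len(prob_map[0]) if h else 0
    mask = [[p >= bin_thresh for p in row] for row in prob_map]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill to collect one connected text region
            queue = deque([(y, x)])
            seen[y][x] = True
            xmin = xmax = x
            ymin = ymax = y
            while queue:
                cy, cx = queue.popleft()
                xmin, xmax = min(xmin, cx), max(xmax, cx)
                ymin, ymax = min(ymin, cy), max(ymax, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((xmin, ymin, xmax, ymax))
    return boxes


prob_map = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.5],
]
print(prob_map_to_boxes(prob_map))  # [(1, 0, 2, 1), (4, 1, 4, 2)]
```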

Text recognition
There are two architectures implemented for recognition: [CRNN](https://arxiv.org/pdf/1602.05875.pdf) and [SAR](https://arxiv.org/pdf/1811.00751.pdf).

```python
from doctr.models import crnn_vgg16_bn_predictor
model = crnn_vgg16_bn_predictor(pretrained=True)
```
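CRNN is trained with a CTC loss, and its output is commonly decoded with best-path (greedy) CTC decoding: take the most likely class at each time step, merge consecutive repeats, then drop blank tokens. This minimal sketch works on plain Python lists; placing the blank at the last class index is an assumption, and DocTR's post-processor may differ in its details. SAR uses an attention-based decoder instead and does not need this step.

```python
from typing import Sequence


def ctc_greedy_decode(logits: Sequence[Sequence[float]], vocab: str) -> str:
    """Best-path CTC decoding.

    Assumes one score row per time step with len(vocab) + 1 classes,
    the last class being the CTC blank.
    """
    blank = len(vocab)
    # 1. Most likely class at every time step
    best_path = [max(range(len(row)), key=row.__getitem__) for row in logits]
    chars = []
    prev = None
    for idx in best_path:
        # 2. Collapse consecutive repeats and 3. drop blanks
        if idx != prev and idx != blank:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


# Classes: 0 = "a", 1 = "b", 2 = blank; the blank separates the two "a" runs
path = [0, 0, 2, 0, 1, 1, 2]
logits = [[1.0 if j == i else 0.0 for j in range(3)] for i in path]
print(ctc_greedy_decode(logits, "ab"))  # aab
```

The blank token is what lets CTC emit genuinely doubled letters: repeats are only merged when no blank sits between them.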

End-to-End OCR
By combining two models into a two-stage architecture, OCR predictors bring you the easiest way to analyze your documents:

```python
from doctr.documents import read_pdf
from doctr.models import ocr_db_crnn

model = ocr_db_crnn(pretrained=True)
doc = read_pdf("path/to/your/doc.pdf")
result = model([doc])
```
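The two-stage idea can be sketched independently of any framework: run detection on the whole image, crop each detected box, then run recognition on each crop. All names below (`crop`, `two_stage_ocr`, `fake_detect`, `fake_recognize`) are hypothetical stand-ins, and images are toy nested lists; DocTR's predictors return structured document objects rather than the flat tuples used here.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), max bounds exclusive
Image = List[List[int]]  # toy grayscale image: one list of pixel values per row


def crop(img: Image, box: Box) -> Image:
    """Cut out the region of img covered by box."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in img[ymin:ymax]]


def two_stage_ocr(
    img: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognize: Callable[[Image], Tuple[str, float]],
) -> List[Tuple[Box, str, float]]:
    """Stage 1: localize words on the full image. Stage 2: read each cropped word."""
    results = []
    for box in detect(img):
        value, confidence = recognize(crop(img, box))
        results.append((box, value, confidence))
    return results


# Hypothetical stand-ins for the real detection and recognition models
def fake_detect(img: Image) -> List[Box]:
    return [(0, 0, 2, 1), (2, 1, 4, 2)]


def fake_recognize(patch: Image) -> Tuple[str, float]:
    # "Reads" a patch by concatenating its first row of pixel values
    return "".join(str(p) for p in patch[0]), 0.9


image = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(two_stage_ocr(image, fake_detect, fake_recognize))
# [((0, 0, 2, 1), '12', 0.9), ((2, 1, 4, 2), '78', 0.9)]
```

Keeping detection and recognition behind two callables is what makes it possible to swap either stage (for instance a different recognition architecture) without touching the other.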

New features

Documents
Document reading and manipulation
- Added PDF (8, 18, 25, 83) and image (30, 79) reading utilities
- Added document structured elements for export (16, 26, 61, 102)

Models
Deep learning model building and inference
- Added model export methods (10)
- Added preprocessing module (20, 25, 36, 50, 55, 77)
- Added text detection model and post-processing (24, 32, 36, 43, 49, 51, 84): DBNet
- Added image cropping function (33, 44)
- Added model param loading function (49, 60)
- Added text recognition post-processing (35, 36, 37, 38, 43, 45, 49, 51, 63, 65, 74, 78, 84, 101, 107, 108, 111, 112): SAR & CRNN
- Added task-specific predictors (39, 52, 58, 62, 85, 98, 102)
- Added VGG16 (36), Resnet31 (70) backbones

Utils
Utility features relevant to the library use cases.
- Added page interactive prediction visualization (54, 82)
- Added custom types (87)
- Added abstract auto-repr object (102)
- Added metric module (110)

Test
Verification of the package's health before release
- Added pytest unittests (7, 59, 75, 76, 80, 92, 104)

Documentation
Online resources for potential users
- Updated README (9, 48, 67, 68, 95)
- Added CONTRIBUTING (7, 29, 48, 67)
- Added sphinx built documentation (12, 36, 55, 86, 90, 91, 93, 96, 99, 106)

Others
Other tools and implementations
- Added python package setup (7, 21, 67)
- Added CI verifications (7, 67, 69, 73)
- Added dockerized environment with library installed (17, 19)
- Added issue template (34)
- Added environment collection script (81)
- Added analysis script (85, 95, 103)

v0.1-models
This release is only a mirror for pretrained detection & recognition models.
