Python-doctr

Latest version: v0.11.0


0.6.0

<p align="center">
<img src="https://user-images.githubusercontent.com/76527547/135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif" width="50%">
</p>

Highlights of the release:

**Note**: doctr 0.6.0 requires either TensorFlow >= 2.9.0 or PyTorch >= 1.8.0.

Full integration with Huggingface Hub (docTR meets Huggingface)

![hf](https://assets.st-note.com/production/uploads/images/35450010/rectangle_large_type_2_7f287c8bb8ad90f69c4a537719b32ace.png?fit=bounds&quality=85&width=1280)

- Loading from hub:


from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub
image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from the Hugging Face Hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from the Hugging Face Hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# You can easily plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)


- Pushing to the hub:


from doctr.models import recognition, login_to_hub, push_to_hf_hub
login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')

Documentation: https://mindee.github.io/doctr/using_doctr/sharing_models.html

Predefined datasets can now also be used for the recognition task


from doctr.datasets import CORD
# Crop boxes as is (crops can be irregular)
train_set = CORD(train=True, download=True, recognition_task=True)
# Crop rotated boxes (crops are always regular)
train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
img, target = train_set[0]

Documentation: https://mindee.github.io/doctr/using_doctr/using_datasets.html

New models (both frameworks)

- classification: VisionTransformer (ViT)
- recognition: Vision Transformer for Scene Text Recognition (ViTSTR)

Bug fixes for recognition models

- MASTER and SAR architectures are now operational in both frameworks (TensorFlow and PyTorch)

ONNX support (experimental)

- All models can now be exported into ONNX format (only TF mobilenet left for 0.7.0)

NOTE: a full production pipeline with ONNX / build is planned for 0.7.0 (for now, the models can only be exported up to the logits, without any post-processing included)
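Since the exported models stop at the logits, the caller has to supply the post-processing step. As a rough illustration only, here is a minimal greedy CTC decoder in plain Python. It assumes a CRNN-style recognition head whose blank class is the last index; that layout, the function name `ctc_greedy_decode`, and the toy vocabulary are assumptions made for this sketch, and it does not apply to attention-based architectures such as SAR, MASTER, or ViTSTR.

```python
def ctc_greedy_decode(logits, vocab):
    """Greedy CTC decoding of one sequence.

    logits: list of per-timestep score lists, each of length len(vocab) + 1.
    Assumption: the blank class is the last index (len(vocab)).
    """
    blank = len(vocab)
    # Pick the best-scoring class at every timestep
    best = [max(range(len(step)), key=step.__getitem__) for step in logits]
    chars = []
    prev = blank
    for idx in best:
        # Collapse repeated classes and drop blanks
        if idx != prev and idx != blank:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


# Toy example: timesteps vote for a, a, blank, a, b
toy_logits = [[2, 0, 0], [2, 0, 0], [0, 0, 2], [2, 0, 0], [0, 2, 0]]
decoded = ctc_greedy_decode(toy_logits, "ab")
print(decoded)
```

The blank between the two runs of `a` is what lets the decoder keep a doubled letter instead of collapsing it.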

Further features

- our demo is now also PyTorch compatible, thanks to odulcy-mindee
- it is now possible to detect the language of the extracted text, thanks to aminemindee


What's Changed
Breaking Changes 🛠
* feat: :sparkles: allow beam width > 1 in the CRNN postprocessor by khalidMindee in https://github.com/mindee/doctr/pull/630
* [Fix] TensorFlow SAR_Resnet31 implementation by felixdittrich92 in https://github.com/mindee/doctr/pull/925
New Features
* [onnx] classification models export by felixdittrich92 in https://github.com/mindee/doctr/pull/830
* feat: Added Vietnamese entry in VOCAB by calibretaliation in https://github.com/mindee/doctr/pull/878
* feat: Added Czech to the set of vocabularies in datasets/vocabs.py by Xargonus in https://github.com/mindee/doctr/pull/885
* feat: Add ability to upload PT/TF models to Huggingface Hub by felixdittrich92 in https://github.com/mindee/doctr/pull/881
* [feature][tf/pt] integrate from_hub for all tasks by felixdittrich92 in https://github.com/mindee/doctr/pull/892
* [feature] Part 2 from use datasets for recognition by felixdittrich92 in https://github.com/mindee/doctr/pull/891
* [datasets] Add MJSynth (Synth90K) by felixdittrich92 in https://github.com/mindee/doctr/pull/827
* [docu]: add documentation for datasets by felixdittrich92 in https://github.com/mindee/doctr/pull/905
* add a Slack Community badge by fharper in https://github.com/mindee/doctr/pull/936
* Feat/add language detection by aminemindee in https://github.com/mindee/doctr/pull/1023
* add ViT as classification model TF and PT by felixdittrich92 in https://github.com/mindee/doctr/pull/1050
* [models] add ViTSTR TF and PT and update ViT to work as backbone by felixdittrich92 in https://github.com/mindee/doctr/pull/1055
Bug Fixes
* [PyTorch][references] fix pretrained with different vocabs by felixdittrich92 in https://github.com/mindee/doctr/pull/874
* [classification] Fix cfgs by felixdittrich92 in https://github.com/mindee/doctr/pull/883
* docs: Fixed typo in installation instructions by frgfm in https://github.com/mindee/doctr/pull/901
* [Fix] imgur5k test by felixdittrich92 in https://github.com/mindee/doctr/pull/903
* fix: Fixed load_pretrained_params in PyTorch when ignoring keys by frgfm in https://github.com/mindee/doctr/pull/902
* [Fix]: Documentation add missing in vocabs and correct tab in sharing models by felixdittrich92 in https://github.com/mindee/doctr/pull/904
* Fix links in readme by jsn5 in https://github.com/mindee/doctr/pull/937
* [Fix] PyTorch MASTER implementation by felixdittrich92 in https://github.com/mindee/doctr/pull/941
* [Fix] MJSynth dataset: filter corrupted or missing images by felixdittrich92 in https://github.com/mindee/doctr/pull/956
* [Fix] SVT dataset: clip box values and add shape and label check by felixdittrich92 in https://github.com/mindee/doctr/pull/955
* [Fix] Tensorflow MASTER implementation by felixdittrich92 in https://github.com/mindee/doctr/pull/949
* [FIX] MASTER AMP and onnxruntime issue with master PT by felixdittrich92 in https://github.com/mindee/doctr/pull/986
* pytest-api test: fix ping server step by odulcy-mindee in https://github.com/mindee/doctr/pull/997
* docs/index: fix two minor typos by mara004 in https://github.com/mindee/doctr/pull/1002
* Fix orientation details export by aminemindee in https://github.com/mindee/doctr/pull/1022
* Changed return type of multithread_exec to iterator by mtvch in https://github.com/mindee/doctr/pull/1019
* [datasets] Fix recognition parts of SynthText and IMGUR5K by felixdittrich92 in https://github.com/mindee/doctr/pull/1038
* [Fix] rotation classifier input move to model device by felixdittrich92 in https://github.com/mindee/doctr/pull/1039
* [models] Vit: fix intermediate size scale and unify TF to PT by felixdittrich92 in https://github.com/mindee/doctr/pull/1063
Improvements
* chore: Applied post release modifications v0.5.1 by felixdittrich92 in https://github.com/mindee/doctr/pull/870
* [refactor][fix]: Part1 from use datasets for recognition task by felixdittrich92 in https://github.com/mindee/doctr/pull/889
* ci: Add swagger ping in API CI job by frgfm in https://github.com/mindee/doctr/pull/906
* [docs] Add naming conventions for upload models to hf hub by felixdittrich92 in https://github.com/mindee/doctr/pull/921
* docs: Improved error message of encode_string by frgfm in https://github.com/mindee/doctr/pull/929
* [Refactor] PyTorch SAR_Resnet31 make it ONNX exportable (again) by felixdittrich92 in https://github.com/mindee/doctr/pull/930
* Add support page in README by jonathanMindee in https://github.com/mindee/doctr/pull/946
* [references] Add eval recognition and update eval detection scripts by felixdittrich92 in https://github.com/mindee/doctr/pull/933
* update pypdfium2 dep and improve code quality by felixdittrich92 in https://github.com/mindee/doctr/pull/953
* docs: Moved need help section after code snippet by frgfm in https://github.com/mindee/doctr/pull/959
* chore: Updated TF requirements to fix grouped convolutions on CPU by frgfm in https://github.com/mindee/doctr/pull/963
* style: Fixed mypy and moved tool configs to pyproject.toml by frgfm in https://github.com/mindee/doctr/pull/966
* Updating the readme by Atomme1 in https://github.com/mindee/doctr/pull/938
* Update docs in `using_doctr` by odulcy-mindee in https://github.com/mindee/doctr/pull/993
* feat: add a basic example of text detection by ianardee in https://github.com/mindee/doctr/pull/999
* Add pytorch demo by odulcy-mindee in https://github.com/mindee/doctr/pull/1008
* [build] move requirements to pyproject.toml by felixdittrich92 in https://github.com/mindee/doctr/pull/1031
* Migrate static data from github to monitoring middleware. by marvinmindee in https://github.com/mindee/doctr/pull/1033
* Changes needed to be able to use doctr on AWS Lambda by mtvch in https://github.com/mindee/doctr/pull/1017
* [Fix] unify recognition dataset parts return signature by felixdittrich92 in https://github.com/mindee/doctr/pull/1041
* Updated README.md for custom fonts by carl-krikorian in https://github.com/mindee/doctr/pull/1051
* [refactor] detection script by felixdittrich92 in https://github.com/mindee/doctr/pull/1060
* [models] ViT add checkpoints and some rework to use pretrained ViT backbone in ViTSTR by felixdittrich92 in https://github.com/mindee/doctr/pull/1072
* upgrade pypdfium2 by felixdittrich92 in https://github.com/mindee/doctr/pull/1075
* ViTSTR disable pretrained backbone by default by felixdittrich92 in https://github.com/mindee/doctr/pull/1080
Miscellaneous
* [Refactor] commit tags by felixdittrich92 in https://github.com/mindee/doctr/pull/871
* Update `io/pdf.py` to new pypdfium2 API by mara004 in https://github.com/mindee/doctr/pull/944
* docs: Documentation the reason for keras version specifier by frgfm in https://github.com/mindee/doctr/pull/958
* [datasets] update IC / SROIE / FUNSD / CORD by felixdittrich92 in https://github.com/mindee/doctr/pull/983
* [datasets] revert whitespace filtering and fix svhn reco by felixdittrich92 in https://github.com/mindee/doctr/pull/987
* fix: update tensorflow-addons to match tensorflow version by ianardee in https://github.com/mindee/doctr/pull/998
* move transformers implementation to modules by felixdittrich92 in https://github.com/mindee/doctr/pull/1013
* [FIX] revert dev deps mistake by felixdittrich92 in https://github.com/mindee/doctr/pull/1047
* [models] update vit and transformer layer norm by felixdittrich92 in https://github.com/mindee/doctr/pull/1059
* make pretrained backbone flexible in predictor by felixdittrich92 in https://github.com/mindee/doctr/pull/1061
* handle LocalizationConfusion memory consuption and upgrade min weasyprint version by felixdittrich92 in https://github.com/mindee/doctr/pull/1062
* Fixed small typo in references recognition by carl-krikorian in https://github.com/mindee/doctr/pull/1070
* [docs] install extras for MacBooks with M1 chip by felixdittrich92 in https://github.com/mindee/doctr/pull/1076
* update version for minor release by felixdittrich92 in https://github.com/mindee/doctr/pull/1073

New Contributors
* calibretaliation made their first contribution in https://github.com/mindee/doctr/pull/878
* Xargonus made their first contribution in https://github.com/mindee/doctr/pull/885
* khalidMindee made their first contribution in https://github.com/mindee/doctr/pull/630
* frgfm made their first contribution in https://github.com/mindee/doctr/pull/901
* jsn5 made their first contribution in https://github.com/mindee/doctr/pull/937
* fharper made their first contribution in https://github.com/mindee/doctr/pull/936
* jonathanMindee made their first contribution in https://github.com/mindee/doctr/pull/946
* Atomme1 made their first contribution in https://github.com/mindee/doctr/pull/938
* odulcy-mindee made their first contribution in https://github.com/mindee/doctr/pull/993
* ianardee made their first contribution in https://github.com/mindee/doctr/pull/998
* aminemindee made their first contribution in https://github.com/mindee/doctr/pull/1022
* mtvch made their first contribution in https://github.com/mindee/doctr/pull/1019
* marvinmindee made their first contribution in https://github.com/mindee/doctr/pull/1033
* carl-krikorian made their first contribution in https://github.com/mindee/doctr/pull/1051

**Full Changelog**: https://github.com/mindee/doctr/compare/v0.5.1...v0.6.0

0.5.1

<p align="center">
<img src="https://user-images.githubusercontent.com/76527547/135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif" width="50%">
</p>

This minor release includes documentation improvements thanks to felixdittrich92, bug fixes, rotation support extended to the TensorFlow backend, a switch from PyMuPDF to pypdfium2, and a nice integration with the Hugging Face Hub thanks to fg-mindee!

**Note**: doctr 0.5.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

Improvement of the documentation

The documentation has been improved with a new theme and illustrations, and the docstrings have been completed and expanded.
This is how it renders:

![doc](https://user-images.githubusercontent.com/70526046/159456296-48529ffd-9fd7-4517-bcd4-3d4de9368419.png)
![Capture d’écran de 2022-03-22 11-08-31](https://user-images.githubusercontent.com/70526046/159457048-abd970b9-436e-40dd-b940-ec16baadb53b.png)

Rotated text detection extended to TensorFlow backend

We provide weights for the `linknet_resnet18_rotation` model, which has been substantially modified: we implemented a new loss (based on Dice Loss and Focal Loss); we changed the target computation so that polygons are shrunk the same way as in DBNet, which greatly improves the precision of the segmenter; and we trained the model while preserving the aspect ratio of the images.
All these improvements led to much better results, and the pretrained model is now very robust.
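To give an intuition for a loss built from Dice Loss and Focal Loss, here is a minimal plain-Python sketch over flattened probability maps. It is not the loss implemented in docTR: the equal weighting of the two terms, the `gamma` and `alpha` values, and all function names are assumptions chosen for illustration.

```python
import math


def dice_loss(probs, targets, eps=1e-7):
    """Dice loss: 1 - overlap ratio between predictions and binary targets."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def focal_loss(probs, targets, gamma=2.0, alpha=0.5, eps=1e-7):
    """Mean binary focal loss: down-weights pixels that are already well classified."""
    losses = []
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        p_t = p if t == 1 else 1.0 - p
        alpha_t = alpha if t == 1 else 1.0 - alpha
        losses.append(-alpha_t * (1.0 - p_t) ** gamma * math.log(p_t))
    return sum(losses) / len(losses)


def segmentation_loss(probs, targets, dice_weight=1.0, focal_weight=1.0):
    """Hypothetical combination: weighted sum of the two terms."""
    return dice_weight * dice_loss(probs, targets) + focal_weight * focal_loss(probs, targets)


targets = [1, 0, 1, 0]
good = segmentation_loss([0.9, 0.1, 0.8, 0.2], targets)
bad = segmentation_loss([0.1, 0.9, 0.2, 0.8], targets)
print(good < bad)
```

The Dice term measures overlap over the whole map, which helps when text pixels are rare, while the focal term concentrates the per-pixel penalty on hard examples.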

Preserving the aspect ratio in the detection task

You can now choose to preserve the aspect ratio in the detection_predictor:


>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)

This option can also be activated in the high level end-to-end predictor:


>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
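As a rough picture of what preserving the aspect ratio means, the sketch below computes the resized shape that fits inside a target size without distortion, plus the padding needed to reach the target. The helper name `resize_keep_aspect` and the choice to pad at the bottom and right are assumptions for this illustration; the library may place the padding differently.

```python
def resize_keep_aspect(height, width, target_height, target_width):
    """Return the resized (h, w) that fits in the target, and the (bottom, right) padding."""
    # Use the smaller scale factor so that both sides fit inside the target
    scale = min(target_height / height, target_width / width)
    new_h = min(target_height, max(1, round(height * scale)))
    new_w = min(target_width, max(1, round(width * scale)))
    # Assumption: the remaining space is padded at the bottom and on the right
    return (new_h, new_w), (target_height - new_h, target_width - new_w)


size, padding = resize_keep_aspect(2000, 1000, 1024, 1024)
print(size, padding)
```

Without this option, a 2000x1000 page would be stretched to fill the square input, which distorts character shapes.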


Integration with the Hugging Face Hub

The artefact detection model is now available on the [Hugging Face Hub](https://huggingface.co/mindee/fasterrcnn_mobilenet_v3_large_fpn), which is amazing:

![Capture d’écran de 2022-03-22 11-33-14](https://user-images.githubusercontent.com/70526046/159462918-0ce6807b-4096-44f9-b238-60279ac9034b.png)

In docTR, you can now use the `from_hub()` function, so the following two snippets are equivalent:


# Pretrained
from doctr.models.obj_detection import fasterrcnn_mobilenet_v3_large_fpn
model = fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)

and:


# HF Hub
from doctr.models.obj_detection.factory import from_hub
model = from_hub("mindee/fasterrcnn_mobilenet_v3_large_fpn")



Breaking changes

Replacing the PyMuPDF dependency with pypdfium2, which is license compatible

We replaced the PyMuPDF dependency with pypdfium2 because of a license-compatibility issue, so we lose the extraction of words and objects from source PDFs that was done with PyMuPDF. It wasn't used by any model, so it is not a big issue, but we will work on re-integrating such a feature in the future.


Full changelog
What's Changed
Breaking Changes 🛠
* fix: polygon orientation + line aggregation by charlesmindee in https://github.com/mindee/doctr/pull/801
* refactor: Switched from PyMuPDF to pypdfium2 by fg-mindee in https://github.com/mindee/doctr/pull/829
New Features
* feat: Added RandomHorizontalFLip in TF by SiddhantBahuguna in https://github.com/mindee/doctr/pull/779
* Imgur5k dataset integration by felixdittrich92 in https://github.com/mindee/doctr/pull/785
* feat: Added support of GPU for predictors in PyTorch by fg-mindee in https://github.com/mindee/doctr/pull/808
* Add SynthWordGenerator to text reco training scripts by felixdittrich92 in https://github.com/mindee/doctr/pull/825
* fix: Fixed some ResNet architecture imprecisions by fg-mindee in https://github.com/mindee/doctr/pull/828
* feat: Added shadow augmentation for all backends by fg-mindee in https://github.com/mindee/doctr/pull/811
* feat: Added loading method for PyTorch artefact detection models from HF Hub by fg-mindee in https://github.com/mindee/doctr/pull/836
* feat: add rotated linknet_resnet18 tensorflow ckpts by charlesmindee in https://github.com/mindee/doctr/pull/817
Bug Fixes
* fix: Fixed rotation of img + target by fg-mindee in https://github.com/mindee/doctr/pull/784
* fix: show sample when batch size is 1 by charlesmindee in https://github.com/mindee/doctr/pull/787
* ci: Fixed PR label check job by fg-mindee in https://github.com/mindee/doctr/pull/792
* ci: Fixed typo in the script ref by fg-mindee in https://github.com/mindee/doctr/pull/794
* [datasets] fix description by felixdittrich92 in https://github.com/mindee/doctr/pull/795
* fix: linknet target computation by charlesmindee in https://github.com/mindee/doctr/pull/803
* ci: Fixed issue templates by fg-mindee in https://github.com/mindee/doctr/pull/806
* fix: Reverted mistake in demo by fg-mindee in https://github.com/mindee/doctr/pull/810
* Restore remap boxes by Rob192 in https://github.com/mindee/doctr/pull/812
* fix: Fixed SAR model for training and inference in PyTorch by fg-mindee in https://github.com/mindee/doctr/pull/831
* fix: Fixed expand_line for horizontal & vertical cases by fg-mindee in https://github.com/mindee/doctr/pull/842
* fix: Fixes inplace target modifications for AbstractDatasets by fg-mindee in https://github.com/mindee/doctr/pull/848
* fix: Fixed landing page and title underlines by fg-mindee in https://github.com/mindee/doctr/pull/860
* docs: Fixed HTML title by fg-mindee in https://github.com/mindee/doctr/pull/864
Improvements
* docs: Updated headers of python files by fg-mindee in https://github.com/mindee/doctr/pull/781
* [datasets] unify np_dtype and fix comments by felixdittrich92 in https://github.com/mindee/doctr/pull/782
* fix: Clip in rotation transform + eval_straight mode for training by charlesmindee in https://github.com/mindee/doctr/pull/786
* refactor: Avoids instantiating orientation predictor when unnecessary by fg-mindee in https://github.com/mindee/doctr/pull/809
* feat: add straight-eval arg in evaluate script by charlesmindee in https://github.com/mindee/doctr/pull/793
* feat: add dice loss in linknet by charlesmindee in https://github.com/mindee/doctr/pull/816
* feat: add shrinked target in linknet + dilation in postprocessing by charlesmindee in https://github.com/mindee/doctr/pull/822
* feat: replace bce by focal loss in linknet loss by charlesmindee in https://github.com/mindee/doctr/pull/824
* docs: add rotation in docs by charlesmindee in https://github.com/mindee/doctr/pull/846
* feat: add aspect ratio for ocr predictor by charlesmindee in https://github.com/mindee/doctr/pull/835
* feat: add target to resize transform for aspect ratio training (detection task) by charlesmindee in https://github.com/mindee/doctr/pull/823
* update bug report ticket with Active backend field by felixdittrich92 in https://github.com/mindee/doctr/pull/853
* Theme + css 1 by felixdittrich92 in https://github.com/mindee/doctr/pull/856
* docs: Adds illustration in the docstrings of doctr.datasets by felixdittrich92 in https://github.com/mindee/doctr/pull/857
* docs: Updated docstrings of io, transforms & utils by felixdittrich92 in https://github.com/mindee/doctr/pull/859
* docs: Updated folder hierarchy of doc source and nootbooks to rst file by felixdittrich92 in https://github.com/mindee/doctr/pull/862
* Doc models 5 by felixdittrich92 in https://github.com/mindee/doctr/pull/861
* fix: linknet hyperparameters postprocessing + demo for rotation model by charlesmindee in https://github.com/mindee/doctr/pull/865
Miscellaneous
* chore: Applied post release modifications by fg-mindee in https://github.com/mindee/doctr/pull/780
* Switch to new pypdfium2 API by mara004 in https://github.com/mindee/doctr/pull/845

New Contributors
* mara004 made their first contribution in https://github.com/mindee/doctr/pull/845

**Full Changelog**: https://github.com/mindee/doctr/compare/v0.5.0...v0.5.1

0.5.0

<p align="center">
<img src="https://user-images.githubusercontent.com/76527547/135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif" width="50%">
</p>

This release adds support of rotated documents, and extends both the model & dataset zoos.

**Note**: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

:upside_down_face: :smiley: Rotation-aware text detection :upside_down_face: :smiley:

It's no secret: the focus of this release was to bring the same level of performance to rotated documents!

![predictions](https://user-images.githubusercontent.com/70526046/147554907-93f403ba-686b-4029-9ef2-5adc821e7776.png)

docTR is meant to be your best tool for seamless document processing, and it couldn't do without supporting a very natural & common augmentation of input documents. This large project was subdivided into three parts:

Straightening pages before text detection
Developing a heuristic-based method to estimate the page skew, and rotate it before forwarding it to any deep learning model. Our thanks to Rob192 for his contribution on this part :pray:

_This behaviour can be enabled to avoid retraining the text detection models. However, the heuristics approach has its limits in terms of robustness._

Text detection training with rotated images

![doctr_sample](https://user-images.githubusercontent.com/76527547/147919531-74077940-ac3f-4a9a-acfb-22a0e7881c03.png)


The core of this project was to enable our text detection models to produce non-degraded heatmaps & localization candidates when processing a rotated page.

Crop orientation resolution

![rot2](https://user-images.githubusercontent.com/76527547/147919416-a4d8f9d0-b986-4886-aaba-42baf722876f.png)

Finally, once the localization candidates have been extracted, there is no saying that this localization candidate will read from left to right. In order to remove this doubt, a lightweight image orientation classifier was added to refine the crops that will be sent to text recognition!

:zebra: A wider pretrained classification model zoo :zebra:

The stability of trainings in deep learning for complex tasks has mostly been helped by leveraging transfer learning. As such, OCR tasks usually require a backbone as a feature extractor. For this reason, all checkpoints of classification models in both PyTorch & TensorFlow have been updated :rocket:
_Those were trained using our synthetic character classification dataset, for more details cf. [Character classification training](https://github.com/mindee/doctr/tree/main/references/classification)_

:framed_picture: New public datasets join the fray

Thanks to felixdittrich92, the list of supported datasets has considerably grown :partying_face:
This includes widely popular datasets used for benchmarks on OCR-related tasks; you can find the full list over here :point_right: #587

Synthetic text recognition dataset

Additionally, we followed up on the existing `CharGenerator` by introducing `WordGenerator`:
- generates an image of a word whose length is randomly sampled within a specified range, with characters randomly sampled from the specified vocab.
- you can even pass a list of fonts so that the font family of each word is randomly picked among them

Below are some samples using a `font_size=32`:
![wordgenerator_sample](https://user-images.githubusercontent.com/76527547/147415761-05a5346c-03ef-494a-a6ce-1138072b60fa.png)
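The sampling logic described for `WordGenerator` can be sketched in a few lines of plain Python. This only covers picking the text and the font, not rendering the image; the function name `sample_word`, its parameters, and the font file names are hypothetical and are not the docTR API.

```python
import random


def sample_word(vocab, min_chars, max_chars, fonts=None, rng=random):
    """Sample a random word and, optionally, a font for rendering it."""
    # Word length is drawn uniformly within the requested range (inclusive)
    length = rng.randint(min_chars, max_chars)
    # Each character is drawn independently from the vocab
    text = "".join(rng.choice(vocab) for _ in range(length))
    # One font family is picked per word when a list is provided
    font = rng.choice(fonts) if fonts else None
    return text, font


vocab = "abcdef0123"
fonts = ["FontA.ttf", "FontB.ttf"]  # placeholder font names
word, font = sample_word(vocab, 3, 8, fonts=fonts, rng=random.Random(0))
print(word, font)
```

Passing a seeded `random.Random` instance keeps the generated samples reproducible across runs.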

:bookmark_tabs: New notebooks

Two new notebooks have made their way into the [documentation](https://mindee.github.io/doctr/latest/notebooks.html):
- producing searchable PDFs from docTR analysis results
- introduction to document artefact detection (QR code, bar codes, ID pictures, etc.) with docTR

![image](https://user-images.githubusercontent.com/76527547/147834457-81e54fd7-5aa6-4e48-b6a4-dc103ba9845a.png)


Breaking changes

Revamp of classification models

With the retraining of all classification backbones, several changes have been introduced:
- Model naming: `linknet16` --> `linknet_resnet18`
- Architecture changes: all classification backbones are available with a classification head now.

Enforcing relative coordinates in datasets

In order to unify our data pipelines, we forced the conversion to relative coordinates on all datasets!
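For readers who stored targets in pixel units, the conversion to relative coordinates amounts to dividing by the image size. The sketch below assumes boxes laid out as `(xmin, ymin, xmax, ymax)`; that layout and the helper names are assumptions for illustration.

```python
def to_relative(boxes, height, width):
    """Convert absolute pixel boxes (xmin, ymin, xmax, ymax) to the [0, 1] range."""
    return [
        (xmin / width, ymin / height, xmax / width, ymax / height)
        for xmin, ymin, xmax, ymax in boxes
    ]


def to_absolute(boxes, height, width):
    """Inverse conversion, back to pixel units for a given image size."""
    return [
        (round(xmin * width), round(ymin * height), round(xmax * width), round(ymax * height))
        for xmin, ymin, xmax, ymax in boxes
    ]


rel = to_relative([(100, 50, 300, 150)], height=200, width=400)
print(rel)
```

Relative boxes stay valid when the image is resized, which is why they simplify a shared data pipeline.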

0.4.1

<p align="center">
<img src="https://user-images.githubusercontent.com/76527547/135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif" width="50%">
</p>

This patch release brings the support of AMP for PyTorch training to docTR along with artefact object detection.

**Note**: doctr 0.4.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

Automatic Mixed Precision (AMP) :zap:

Training scripts with the [PyTorch back-end](https://pytorch.org/docs/stable/notes/amp_examples.html) now benefit from AMP to reduce the RAM footprint and potentially increase the maximum batch size! This comes in especially handy for text detection, which requires high spatial resolution inputs!

Artefact detection :flying_saucer:

Document understanding goes beyond textual elements, as information can be encoded in other visual forms. For this reason, we have extended the range of supported tasks by adding object detection. This will be focused on non-textual elements in documents, including QR codes, barcodes, ID pictures, and logos.

Here are some early results:

![2x3_art(1)](https://user-images.githubusercontent.com/76527547/142852701-c220664a-8cd1-4a71-83b0-df8e6beb0485.jpg)

This release comes with a training & validation set [DocArtefacts](https://mindee.github.io/doctr/latest/datasets.html#doctr.datasets.DocArtefacts), and a reference [training script](https://github.com/mindee/doctr/blob/main/references/obj_detection/train_pytorch.py). Keep an eye out for the models we will publish in the next release!


Get more of docTR with Colab tutorials :book:

You've been waiting for it: from now on, we will regularly add new tutorials for docTR in the form of [jupyter notebooks](https://jupyter.org/) that you can open and run locally or on [Google Colab](https://research.google.com/colaboratory/), for instance!

Check the new page in the documentation to have an updated list of all our community notebooks: https://mindee.github.io/doctr/latest/notebooks.html

Breaking changes

Deprecated support of FP16 for datasets

Float-precision can be leveraged in deep learning to decrease the RAM footprint of trainings. The common data type `float32` has a lower resolution counterpart `float16` which is usually only supported on GPU for common deep learning operations. Initially, we were planning to make all our operations available in both to reduce memory footprint in the end.
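The trade-off between the two data types can be checked with the standard library alone. This small sketch compares the storage size and the rounding error of `float16` and `float32` for a single value, using the IEEE 754 half and single precision formats exposed by `struct`; the helper name `roundtrip` is just for this example.

```python
import struct


def roundtrip(value, fmt):
    """Encode a Python float in the given binary format and decode it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


value = 0.1
size16 = struct.calcsize("<e")  # bytes per float16 value
size32 = struct.calcsize("<f")  # bytes per float32 value
err16 = abs(roundtrip(value, "<e") - value)
err32 = abs(roundtrip(value, "<f") - value)
print(size16, size32, err16 > err32)
```

Half precision halves the memory per value at the cost of a much larger rounding error.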

However, with the latest development of Deep Learning frameworks, and their Automatic Mixed Precision mechanism, this isn't required anymore and only adds more constraints on the development side. We thus deprecated this feature from our datasets and predictors:

0.4.0

<p align="center">
<img src="https://user-images.githubusercontent.com/76527547/135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif" width="50%">
</p>


This release brings the support of PyTorch out of beta, makes text recognition more robust, and provides light architectures for complex tasks.

**Note**: doctr 0.4.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

No more width limitation for text recognition

Some documents, such as French ID cards, include very long strings that can be challenging to transcribe:

![fr_id_card_sample (copy)](https://user-images.githubusercontent.com/76527547/135622390-f7725f84-aa0d-40a9-b109-06555b45eed3.jpg)

This release enables a smart split/merge strategy for wide crops to avoid performance drops. Previously, the whole crop was analyzed at once; now it is split into reasonably sized crops, inference is performed in batch, and the predictions are then merged together.

The following snippet:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

doc = DocumentFile.from_images('path/to/img.png')
predictor = ocr_predictor(pretrained=True)
print(predictor(doc).pages[0])


used to yield:


Page(
dimensions=(447, 640)
(blocks): [Block(
(lines): [Line(
(words): [
Word(value='1XXXXXX', confidence=0.0023),
Word(value='1XXXX', confidence=0.0018),
]
)]
(artefacts): []
)]
)


and now yields:


Page(
dimensions=(447, 640)
(blocks): [Block(
(lines): [Line(
(words): [
Word(value='IDFRABERTHIER<<<<<<<<<<<<<<<<<<<<<<', confidence=0.49),
Word(value='8806923102858CORINNE<<<<<<<6512068F6', confidence=0.22),
]
)]
(artefacts): []
)]
)
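The split/merge idea described above can be sketched as follows. This is a deliberately simplified illustration and not the docTR implementation: the chunks do not overlap (so a character sitting on a boundary would be cut), the `max_ratio` threshold is arbitrary, and taking the minimum chunk confidence as the merged confidence is an assumption.

```python
import math


def split_wide_crop(width, height, max_ratio=8.0):
    """Return (start, end) column ranges that each respect the maximum aspect ratio."""
    ratio = width / height
    if ratio <= max_ratio:
        return [(0, width)]
    n_chunks = math.ceil(ratio / max_ratio)
    step = math.ceil(width / n_chunks)
    return [(start, min(start + step, width)) for start in range(0, width, step)]


def merge_predictions(chunk_predictions):
    """Concatenate chunk texts; keep the lowest confidence (assumed aggregation rule)."""
    text = "".join(text for text, _ in chunk_predictions)
    confidence = min(conf for _, conf in chunk_predictions)
    return text, confidence


ranges = split_wide_crop(width=640, height=32)
merged = merge_predictions([("IDFRA", 0.9), ("BERT", 0.8)])
print(ranges, merged)
```

Each range would be cropped and sent through the recognition model in one batch before the per-chunk results are merged.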


Framework specific predictors

PyTorch support is no longer in beta, so we made an effort to unify the experience of switching from one deep learning backend to another :raised_hands: Predictors are designed to be the recommended interface for inference with your models!

0.3.1

This release stabilizes the support for the PyTorch backend while extending the range of features (new task, superior pretrained models, speed-ups).

*Brought to you by fg-mindee & charlesmindee*

**Note**: doctr 0.3.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

Improved pretrained parameters for your favorite models :rocket:
With each release, we hope to bring you improved models and more comprehensive evaluation results. As part of the 0.3.1 release, we provide you with:
- improved params for `crnn_vgg16_bn` & `sar_resnet31`
- evaluation results on a new private dataset (US tax forms)

Lighter backbones for faster architectures :zap:
Without any surprise, just like many other libraries, DocTR's future will involve some balance between speed and pure performance. To make this choice available to you, we added support of [MobileNet V3](https://arxiv.org/pdf/1905.02244.pdf) and pretrained it for character classification for both PyTorch & TensorFlow.

Speeding up preprocessors & datasets :train2:
Whether you are a user looking for inference speed, or a dedicated model trainer looking for optimal data loading, you will be thrilled to know that we have greatly improved our data loading/processing by leveraging multi-threading!
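As a generic illustration of why multi-threading helps I/O-bound loading, the sketch below maps a loader over a list of paths with a thread pool from the standard library. The names `threaded_map` and `load_sample` are hypothetical, and the loader is a placeholder standing in for reading and decoding an image; this is not the docTR helper itself.

```python
from concurrent.futures import ThreadPoolExecutor


def load_sample(path):
    # Placeholder for reading and decoding an image from disk
    return path.upper()


def threaded_map(func, items, num_threads=4):
    """Apply func to every item using a pool of threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        return list(executor.map(func, items))


samples = threaded_map(load_sample, ["a.png", "b.png", "c.png"])
print(samples)
```

Because `executor.map` yields results in input order, samples and their targets stay aligned even though the work runs concurrently.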

Better demo app :art:
We value the accessibility of this project and thus commit to improving tools for entry-level users. Deploying a demo from a Python library is not the expertise of every developer, so this release improves the existing demo:

![new_demo](https://github.com/mindee/doctr/releases/download/v0.3.0/demo_update.png)

Page selection was added for multi-page documents, the predictions are used to produce a synthesized version of the initial document, and you get the JSON export! We're looking forward to your feedback :hugs:

[beta] Character classification
As DocTR continues to move forward with more complex tasks, paving the way for a consistent training procedure will become necessary. Pretraining has shown potential in many deep learning tasks, and we want to explore opportunities to make training for OCR even more accessible.
![char_classif](https://user-images.githubusercontent.com/76527547/131171140-8cb5846b-c976-4202-8ef1-031adef69deb.png)

So this release makes a big step forward by adding on-the-fly character generator and training scripts, which allows you to train a character classifier without any pre-existing data :hushed:


Breaking changes

Default dtype of TF datasets

In order to harmonize data processing between frameworks, the default data type of dataloaders has been switched to float32 for TensorFlow backend:
