Python-doctr

Latest version: v0.11.0



2.4.0

In order to ensure that all compression features are fully functional in DocTR, support for TensorFlow < 2.4.0 has been dropped.

Less confusing predictor inputs
`OCRPredictor` used to take a list of documents as input; it now takes only a list of pages.
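
Code that still groups pages per document can flatten them before calling the predictor. The following is a minimal sketch, assuming each document is simply a list of pages; `flatten_documents` is a hypothetical helper and not part of docTR, and strings stand in for page images.

```python
from itertools import chain
from typing import List, Sequence, TypeVar

Page = TypeVar("Page")


def flatten_documents(documents: Sequence[Sequence[Page]]) -> List[Page]:
    """Flatten a list of documents (each a list of pages) into one list of pages."""
    return list(chain.from_iterable(documents))


# Two "documents" with 2 and 1 pages; strings stand in for page images.
documents = [["doc1_page1", "doc1_page2"], ["doc2_page1"]]
pages = flatten_documents(documents)
print(pages)  # ['doc1_page1', 'doc1_page2', 'doc2_page1']
```

The flat list keeps the original page order, so results can be regrouped per document afterwards using each document's page count.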

2.0.0

What's Changed
Soft Breaking Changes (TensorFlow backend only) 🛠
* Changed the saving format from `/weights` to `.weights.h5`

**NOTE:** Please update your custom trained models and HuggingFace hub uploaded models, this will be the last release supporting manual loading from `/weights`.

New features
* Added numpy 2.0 support felixdittrich92
* New and updated notebooks were added felixdittrich92 --> [notebooks](https://mindee.github.io/doctr/latest/notebooks.html)
* Custom orientation model loading felixdittrich92
* Additional functionality to control the pipeline when dealing with rotated documents milosacimovic felixdittrich92
* Built-in datasets can now be loaded directly for detection with `detection_task=True`, comparable to the existing `recognition_task=True` felixdittrich92

Disable page orientation classification

* If you deal with documents that contain only small rotations (~ -45 to 45 degrees), you can disable the page orientation classification to speed up inference.
* This will only have an effect with `assume_straight_pages=False` and/or `straighten_pages=True` and/or `detect_orientation=True`.

```python
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_page_orientation=True)
```

Disable crop orientation classification

* If you deal with documents that contain only horizontal text, you can disable the crop orientation classification to speed up inference.
* This will only have an effect with `assume_straight_pages=False` and/or `straighten_pages=True`.

```python
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_crop_orientation=True)
```
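
The conditions under which the two flags matter can be written as small predicates. This is only a sketch of the rules stated in the notes above; both function names are hypothetical and none of this is docTR API.

```python
def page_orientation_flag_has_effect(
    assume_straight_pages: bool,
    straighten_pages: bool,
    detect_orientation: bool,
) -> bool:
    """True if `disable_page_orientation` can change the pipeline's behaviour."""
    return (not assume_straight_pages) or straighten_pages or detect_orientation


def crop_orientation_flag_has_effect(
    assume_straight_pages: bool,
    straighten_pages: bool,
) -> bool:
    """True if `disable_crop_orientation` can change the pipeline's behaviour."""
    return (not assume_straight_pages) or straighten_pages


# Straight pages, no straightening, no orientation detection: neither flag matters.
print(page_orientation_flag_has_effect(True, False, False))  # False
print(crop_orientation_flag_has_effect(True, False))  # False
# Rotated pages are expected: both flags matter.
print(page_orientation_flag_has_effect(False, False, False))  # True
print(crop_orientation_flag_has_effect(False, False))  # True
```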

Loading custom exported orientation classification models

You can now load your custom-trained orientation models. The following snippet demonstrates how:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, mobilenet_v3_small_page_orientation, mobilenet_v3_small_crop_orientation
from doctr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor

custom_page_orientation_model = mobilenet_v3_small_page_orientation("<PATH_TO_CUSTOM_EXPORTED_ONNX_MODEL>")
custom_crop_orientation_model = mobilenet_v3_small_crop_orientation("<PATH_TO_CUSTOM_EXPORTED_ONNX_MODEL>")

predictor = ocr_predictor(pretrained=True, assume_straight_pages=False, detect_orientation=True)

# Overwrite the default orientation models
predictor.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)
```



What's Changed
Breaking Changes 🛠
* [TF] First changes on the road to Keras v3 by felixdittrich92 in https://github.com/mindee/doctr/pull/1724
* [Build] update minor version & update torch to >= 2.0 by felixdittrich92 in https://github.com/mindee/doctr/pull/1747
New Features
* Disable page and crop orientation by milosacimovic in https://github.com/mindee/doctr/pull/1735
Bug Fixes
* [Bug] fix straighten pages by felixdittrich92 in https://github.com/mindee/doctr/pull/1697
* [Fix] Remove image padding after rotation correction with `straighten_pages=True` by felixdittrich92 in https://github.com/mindee/doctr/pull/1731
* [datasets] Allow detection task for built-in datasets by felixdittrich92 in https://github.com/mindee/doctr/pull/1717
* [Bug] Fix eval scripts + possible overflow in Resize by felixdittrich92 in https://github.com/mindee/doctr/pull/1715
* [demo] Add missing viz dep for demo by felixT2K in https://github.com/mindee/doctr/pull/1751
Improvements
* [Datasets] Add Vietnamese letters by MinhChien9 in https://github.com/mindee/doctr/pull/1693
* feat: added ukrainian vocab by holyCowMp3 in https://github.com/mindee/doctr/pull/1700
* [orientation] Enable usage of custom trained orientation models by felixdittrich92 in https://github.com/mindee/doctr/pull/1708
* [demo] Automate doctr demo update via CI job by felixdittrich92 in https://github.com/mindee/doctr/pull/1742
* [TF] Move model building & unify train scripts by felixdittrich92 in https://github.com/mindee/doctr/pull/1744
* [demo/docs] Update notebook docs & minor demo update / fix by felixT2K in https://github.com/mindee/doctr/pull/1755
* [Reconstitution] Improve reconstitution by felixdittrich92 in https://github.com/mindee/doctr/pull/1750
Miscellaneous
* [misc] post release 0.9.1 by felixT2K in https://github.com/mindee/doctr/pull/1689
* [build] NumPy 2.0 support by felixdittrich92 in https://github.com/mindee/doctr/pull/1709

New Contributors
* MinhChien9 made their first contribution in https://github.com/mindee/doctr/pull/1693
* holyCowMp3 made their first contribution in https://github.com/mindee/doctr/pull/1700
* milosacimovic made their first contribution in https://github.com/mindee/doctr/pull/1735

**Full Changelog**: https://github.com/mindee/doctr/compare/v0.9.0...v0.10.0

1.0

I/O module
Whether it is for exporting predictions or loading input data, the library lets you play around with inputs and outputs using minimal code. Since its usage is constantly expanding, the `doctr.documents` module was repurposed into `doctr.io`.
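
Code that has to run against releases from both before and after the rename can probe for the new module first. This is a minimal sketch using only the standard library; `first_importable` is a hypothetical helper, and only the two module names come from the note above.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def first_importable(candidates: Sequence[str]) -> Optional[ModuleType]:
    """Return the first module in `candidates` that imports successfully, else None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer the new location and fall back to the old one on older releases.
io_module = first_importable(["doctr.io", "doctr.documents"])
if io_module is None:
    print("docTR is not installed")
else:
    print(f"Using {io_module.__name__}")
```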

0.9302278757095337

More comprehensive representation of predictors
If you work with the predictor's components, it helps to see how they are composed. To give a cleaner interface, we improved the representation of all predictor components.

The following snippet:
```python
from doctr.models import ocr_predictor
print(ocr_predictor())
```

now yields a much cleaner representation of the predictor composition:

```
OCRPredictor(
  (det_predictor): DetectionPredictor(
    (pre_processor): PreProcessor(
      (resize): Resize(output_size=(1024, 1024), method='bilinear')
      (normalize): Compose(
        (transforms): [
          LambdaTransformation(),
          Normalize(mean=[0.7979999780654907, 0.7850000262260437, 0.7720000147819519], std=[0.2639999985694885, 0.27489998936653137, 0.28700000047683716]),
        ]
      )
    )
    (model): DBNet(
      (feat_extractor): IntermediateLayerGetter()
      (fpn): FeaturePyramidNetwork(channels=128)
      (probability_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f645f58e0>
      (threshold_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7ce15310>
      (postprocessor): DBPostProcessor(box_thresh=0.1, max_candidates=1000)
    )
  )
  (reco_predictor): RecognitionPredictor(
    (pre_processor): PreProcessor(
      (resize): Resize(output_size=(32, 128), method='bilinear', preserve_aspect_ratio=True, symmetric_pad=False)
      (normalize): Compose(
        (transforms): [
          LambdaTransformation(),
          Normalize(mean=[0.5, 0.5, 0.5], std=[1.0, 1.0, 1.0]),
        ]
      )
    )
    (model): CRNN(
      (feat_extractor): <doctr.models.backbones.vgg.VGG object at 0x7f6f7d866040>
      (decoder): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7cce2430>
      (postprocessor): CTCPostProcessor(vocab_size=118)
    )
  )
  (doc_builder): DocumentBuilder(resolve_lines=False, resolve_blocks=False, paragraph_break=0.035)
)
```
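
A nested representation of this kind can be produced by having each component list its children and indent their own `repr`. The sketch below illustrates that pattern only; `NestedRepr` and the two toy classes are hypothetical stand-ins, not the library's implementation.

```python
from typing import List


class NestedRepr:
    """Hypothetical base class: prints each child on its own indented line."""

    _children_names: List[str] = []

    def extra_repr(self) -> str:
        """Arguments to show inline for leaf components."""
        return ""

    def __repr__(self) -> str:
        header = f"{self.__class__.__name__}("
        if not self._children_names:
            return f"{header}{self.extra_repr()})"
        lines = [header]
        for name in self._children_names:
            # Indent every line of the child's repr by one extra level.
            child_repr = repr(getattr(self, name)).replace("\n", "\n  ")
            lines.append(f"  ({name}): {child_repr}")
        lines.append(")")
        return "\n".join(lines)


class Resize(NestedRepr):
    def __init__(self, output_size):
        self.output_size = output_size

    def extra_repr(self) -> str:
        return f"output_size={self.output_size}"


class PreProcessor(NestedRepr):
    _children_names = ["resize"]

    def __init__(self, resize):
        self.resize = resize


print(PreProcessor(Resize((1024, 1024))))
# PreProcessor(
#   (resize): Resize(output_size=(1024, 1024))
# )
```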

Breaking changes

Metrics' granularity

Renamed `ExactMatch` to `TextMatch` since the metric now produces different levels of flexibility for the evaluation. Additionally, the constructor flags have been deprecated since the summary will provide all different types of evaluation.

0.98341835

Full changelog
Breaking Changes 🛠
* refacto: :wrench: postprocessing with rotated boxes by charlesmindee in https://github.com/mindee/doctr/pull/641
* refactor: Refactored LinkNet by fg-mindee in https://github.com/mindee/doctr/pull/733
* refactor: Renamed DataLoader arg "workers" into "num_workers" by fg-mindee in https://github.com/mindee/doctr/pull/737
* refactor: Unified return_preds flags across all tasks by fg-mindee in https://github.com/mindee/doctr/pull/741
* refactor: Introduces img + target transforms in Datasets by fg-mindee in https://github.com/mindee/doctr/pull/750
* refactor: refactoring rotated boxes by charlesmindee in https://github.com/mindee/doctr/pull/731
* refactor: Enforced relative coordinates for all dataset geometries by fg-mindee in https://github.com/mindee/doctr/pull/775
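
Relative coordinates express each box as a fraction of the image size, so targets stay valid when the image is resized. The following sketch shows the conversion; the `(xmin, ymin, xmax, ymax)` layout and the `to_relative` helper are assumptions for illustration and are not taken from the library.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def to_relative(boxes: List[Box], img_height: int, img_width: int) -> List[Box]:
    """Convert absolute pixel boxes to coordinates relative to the image size."""
    return [
        (xmin / img_width, ymin / img_height, xmax / img_width, ymax / img_height)
        for xmin, ymin, xmax, ymax in boxes
    ]


print(to_relative([(100, 50, 300, 150)], img_height=200, img_width=400))
# [(0.25, 0.25, 0.75, 0.75)]
```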

New Features
* SynthText dataset integration by felixdittrich92 in https://github.com/mindee/doctr/pull/624
* [notebooks] add export_as_pdfa notebook by felixdittrich92 in https://github.com/mindee/doctr/pull/650
* ICDAR2003 dataset integration by felixdittrich92 in https://github.com/mindee/doctr/pull/653
* feat: Implements erosion & dilation in PyTorch & TF by fg-mindee in https://github.com/mindee/doctr/pull/669
* Rotate page by Rob192 in https://github.com/mindee/doctr/pull/488
* feat: Added option to use AMP with TF scripts by fg-mindee in https://github.com/mindee/doctr/pull/682
* feat: Added support of FasterRCNN for PyTorch by fg-mindee in https://github.com/mindee/doctr/pull/691
* ICDAR2013 dataset integration by felixdittrich92 in https://github.com/mindee/doctr/pull/662
* feat: Added LR finder option in PyTorch training scripts by fg-mindee in https://github.com/mindee/doctr/pull/703
* feat: Added line reading for source PDFs by fg-mindee in https://github.com/mindee/doctr/pull/707
* feat: Added plot_samples support to visualize the images along with the targets by SiddhantBahuguna in https://github.com/mindee/doctr/pull/704
* SVHN dataset integration by felixdittrich92 in https://github.com/mindee/doctr/pull/634
* feat: Added checkpoint for obj_detection by SiddhantBahuguna in https://github.com/mindee/doctr/pull/713
* feat: add classification module for crop orientation by charlesmindee in https://github.com/mindee/doctr/pull/721
* feat: Added inference+post processing script for artefact detection by SiddhantBahuguna in https://github.com/mindee/doctr/pull/728
* feat: Added latency evaluation scripts for all tasks by fg-mindee in https://github.com/mindee/doctr/pull/746
* docs: Added colab link in the Read me for artefact detection by SiddhantBahuguna in https://github.com/mindee/doctr/pull/755
* feat: Added LR Finder for TensorFlow scripts by fg-mindee in https://github.com/mindee/doctr/pull/747
* feat: Added latency evaluation & benchmark for image classification by fg-mindee in https://github.com/mindee/doctr/pull/757
* feat: Adds GaussianBlur, random font for CharGenerator and improves training scripts by fg-mindee in https://github.com/mindee/doctr/pull/758
* feat: Added WordGenerator dataset by fg-mindee in https://github.com/mindee/doctr/pull/760
* feat: Added dedicated evaluation scripts for text detection by fg-mindee in https://github.com/mindee/doctr/pull/761
* feat: Refactored & retrained all classification models by fg-mindee in https://github.com/mindee/doctr/pull/763
* feat: add rotated ckpts for pytorch DBNet + fix line resolution for rotated pages by charlesmindee in https://github.com/mindee/doctr/pull/743
* feat: Added torchvision photometric augmentations in artefact detection training by SiddhantBahuguna in https://github.com/mindee/doctr/pull/764
* feat: Added random noise augmentation to object detection by SiddhantBahuguna in https://github.com/mindee/doctr/pull/654
* feat: add rotation option to both detection training scripts by charlesmindee in https://github.com/mindee/doctr/pull/765
* feat: Added ChannelShuffle transformation and fixes RandomCrop by fg-mindee in https://github.com/mindee/doctr/pull/768
* feat: Added Gaussian Noise implementation in Tensorflow by SiddhantBahuguna in https://github.com/mindee/doctr/pull/771
* feat: Added Random Horizontal Flip augmentation by SiddhantBahuguna in https://github.com/mindee/doctr/pull/773
* ci: Added release helper actions by fg-mindee in https://github.com/mindee/doctr/pull/776

Bug Fixes
* docs: Fixed documentation build by fg-mindee in https://github.com/mindee/doctr/pull/644
* fix: :bug: bug canvas dtype for threshold target by charlesmindee in https://github.com/mindee/doctr/pull/645
* fix: :bug: assume_straight_pages in predictor by charlesmindee in https://github.com/mindee/doctr/pull/647
* ci: Fixed silent isort failure by fg-mindee in https://github.com/mindee/doctr/pull/655
* fix: Fixed W&B config log by fg-mindee in https://github.com/mindee/doctr/pull/656
* fix: Updates Makefile to match CI by fg-mindee in https://github.com/mindee/doctr/pull/661
* docs: Fixed typo in the docstrings of metrics by fg-mindee in https://github.com/mindee/doctr/pull/664
* fix: rotation arg in training scripts by charlesmindee in https://github.com/mindee/doctr/pull/657
* feat: Added missing output classes param in DBNet by fg-mindee in https://github.com/mindee/doctr/pull/666
* fix: Fixed LinkNet target & loss computation by fg-mindee in https://github.com/mindee/doctr/pull/670
* fix: box angle rectification according to the quadrant by charlesmindee in https://github.com/mindee/doctr/pull/667
* fix: rotate_boxes angle by charlesmindee in https://github.com/mindee/doctr/pull/678
* fix: Fixed param override of backbone by fg-mindee in https://github.com/mindee/doctr/pull/689
* fix: Added missing AMP flags in training scripts by fg-mindee in https://github.com/mindee/doctr/pull/690
* fix: Added a 0-sized crop safeguard in split_crops by fg-mindee in https://github.com/mindee/doctr/pull/693
* fix: Fixed MASTER recognition architecture by fg-mindee in https://github.com/mindee/doctr/pull/687
* fix: Added safeguard for extreme aspect ratio in Resize by fg-mindee in https://github.com/mindee/doctr/pull/695
* fix: Fixed W&B logger in object detection training script by fg-mindee in https://github.com/mindee/doctr/pull/697
* fix: Fixed geometry utils for polygon <--> rbox conversions by fg-mindee in https://github.com/mindee/doctr/pull/700
* fix: Fixed build_target for detection models with rotated targets by fg-mindee in https://github.com/mindee/doctr/pull/698
* fix: box computing when assume straight pages is false by charlesmindee in https://github.com/mindee/doctr/pull/720
* test: Fixed TF loss unittest by fg-mindee in https://github.com/mindee/doctr/pull/725
* fix: Fixed edge cases of DB loss in PyTorch by fg-mindee in https://github.com/mindee/doctr/pull/726
* fix: Fixed computation of Mean IoU by fg-mindee in https://github.com/mindee/doctr/pull/734
* fix: Fixed detection training script by fg-mindee in https://github.com/mindee/doctr/pull/742
* fix: Fixed the bin_thresh of LinkNet by fg-mindee in https://github.com/mindee/doctr/pull/745
* test: Increased flexibility of loss test by fg-mindee in https://github.com/mindee/doctr/pull/744
* fix: Fixed mask computation of DBNet by fg-mindee in https://github.com/mindee/doctr/pull/753
* test: Fixed TensorFlow predictor unittest by fg-mindee in https://github.com/mindee/doctr/pull/767
* fix: Fixed the box cropping from RandomCrop by fg-mindee in https://github.com/mindee/doctr/pull/772
* ci: Fixed CI training job for TF by fg-mindee in https://github.com/mindee/doctr/pull/770
* docs: Fixed README link & update documentation by fg-mindee in https://github.com/mindee/doctr/pull/774
* fix: target DB by charlesmindee in https://github.com/mindee/doctr/pull/777

Improvements
* style: Fixed isort and typing checks by fg-mindee in https://github.com/mindee/doctr/pull/643
* docs: Added TFJS demo ref in README by fg-mindee in https://github.com/mindee/doctr/pull/651
* fix: Added automatic worker resolution to remaining training scripts by fg-mindee in https://github.com/mindee/doctr/pull/649
* feat: Added rbox_iou function with a memory-savy option by fg-mindee in https://github.com/mindee/doctr/pull/659
* style: Cleaned codebase with Codacy hints by fg-mindee in https://github.com/mindee/doctr/pull/665
* feat: Added file existence check in DetectionDataset by fg-mindee in https://github.com/mindee/doctr/pull/672
* fix: pymupdf version by charlesmindee in https://github.com/mindee/doctr/pull/673
* [refactor] SROIE dataset by felixdittrich92 in https://github.com/mindee/doctr/pull/660
* fix: target_ar split crops by charlesmindee in https://github.com/mindee/doctr/pull/681
* feat: add line resolution for rotated boxes by charlesmindee in https://github.com/mindee/doctr/pull/677
* feat: add rboxes rectification in Linknet postprocessing by charlesmindee in https://github.com/mindee/doctr/pull/679
* docs: Added minimal docstring sanity check by fg-mindee in https://github.com/mindee/doctr/pull/686
* fix: Fixed deprecation warnings from numpy & PyMuPDF by fg-mindee in https://github.com/mindee/doctr/pull/692
* refactor: Removed postprocessor from high-level init by fg-mindee in https://github.com/mindee/doctr/pull/688
* feat: Added possibility to change the cache dir of datasets by fg-mindee in https://github.com/mindee/doctr/pull/694
* Mock Sroie / Funsd / Cord / Synthtext / DocArtefacts / IIIT5K / SVT / IC03 (all ^^) by felixdittrich92 in https://github.com/mindee/doctr/pull/722
* refactor: Refactored detection post-processing by fg-mindee in https://github.com/mindee/doctr/pull/724
* ci: Fixed CI job name and ignored .idea files by fg-mindee in https://github.com/mindee/doctr/pull/727
* feat: integration of the classifier in the ocr predictor by charlesmindee in https://github.com/mindee/doctr/pull/723
* test: Switch to a fully mocked PDF for unittests by fg-mindee in https://github.com/mindee/doctr/pull/735
* test: Silenced PyMuPDF warnings by fg-mindee in https://github.com/mindee/doctr/pull/740
* refactor: Removed contiguous param since it's included in torch>=1.7 by fg-mindee in https://github.com/mindee/doctr/pull/756
* feat: add preserve aspect ratio to predictor and vizualisation utils by charlesmindee in https://github.com/mindee/doctr/pull/766
* ci: Optimized CI jobs to speed up development process by fg-mindee in https://github.com/mindee/doctr/pull/759
* feat: Updated timing to more accurate one by fg-mindee in https://github.com/mindee/doctr/pull/769
Miscellaneous
* chore: Applied post release modifications by fg-mindee in https://github.com/mindee/doctr/pull/642

**Full Changelog**: https://github.com/mindee/doctr/compare/v0.4.1...v0.5.0

0.75

Raw is the exact match, caseless is the exact match of lower-case counterparts, unidecode is the exact match of unidecoded counterparts, and unicase is the exact match of unidecoded lower-case counterparts.
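
The four levels can be illustrated with a short comparison function. This is a sketch under stated assumptions: `match_levels` and `strip_accents` are hypothetical names, and accent removal via the standard-library `unicodedata` module only approximates a full unidecode transliteration.

```python
import unicodedata
from typing import Dict


def strip_accents(text: str) -> str:
    """Approximate unidecoding with the standard library: drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_levels(pred: str, target: str) -> Dict[str, bool]:
    """Compare two strings at the four granularity levels."""
    return {
        "raw": pred == target,
        "caseless": pred.lower() == target.lower(),
        "unidecode": strip_accents(pred) == strip_accents(target),
        "unicase": strip_accents(pred).lower() == strip_accents(target).lower(),
    }


print(match_levels("Caf\u00e9", "cafe"))
# {'raw': False, 'caseless': False, 'unidecode': False, 'unicase': True}
```

Here the prediction differs from the target in both case and accent, so only the most permissive level counts it as a match.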

New features

Models
Deep learning model building and inference
- Added detection features of faces (258), bar codes (260)
- Added new pretrained weights for `db_resnet50` (277)
- Added sequence probability in text recognition (284)

Utils
Utility features relevant to the library use cases.
- Added granularity on recognition metrics (274)
- Added visualization option to display artefacts (273)

Transforms
Data transformation operations
- Added option to switch padding between symmetric and left for resizing while preserving aspect ratio (277)
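
The difference between the two padding modes comes down to where the leftover space goes once the image has been scaled to fit. The sketch below computes only the geometry; `resize_and_pad_geometry` is a hypothetical helper, and it assumes the non-symmetric mode keeps the content anchored at the top-left with padding on the bottom and right.

```python
from typing import Tuple


def resize_and_pad_geometry(
    img_size: Tuple[int, int],
    output_size: Tuple[int, int],
    symmetric_pad: bool = False,
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Return the resized (height, width) and the padding (top, bottom, left, right)."""
    height, width = img_size
    out_h, out_w = output_size
    # Largest scale at which the image still fits inside the output size.
    scale = min(out_h / height, out_w / width)
    new_h = min(out_h, max(1, round(height * scale)))
    new_w = min(out_w, max(1, round(width * scale)))
    pad_h, pad_w = out_h - new_h, out_w - new_w
    if symmetric_pad:
        top, left = pad_h // 2, pad_w // 2
    else:
        # Assumption: content stays at the top-left corner.
        top, left = 0, 0
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)


print(resize_and_pad_geometry((16, 32), (32, 128)))
# ((32, 64), (0, 0, 0, 64))
print(resize_and_pad_geometry((16, 32), (32, 128), symmetric_pad=True))
# ((32, 64), (0, 0, 32, 32))
```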

Test
Verifications of the package well-being before release
- Added unittests for artefact detection (258, 260)
- Added detailed unittests for granular metrics (274)
- Extended unittests for resizing (277)

Documentation
Online resources for potential users
- Added installation instructions for Mac & Windows users (268)
- Added benchmark of models on private datasets (269)
- Added changelog to the documentation (279)
- Added BibTeX citation in README (279)
- Added parameter count in performance benchmarks (280)
- Added OCR illustration in README (283) and documentation (285)

References
Reference training scripts
- Added support for Weights & Biases logging in training scripts (286)
- Added option to start using pretrained models (286)

Others
Other tools and implementations
- Added CI job to build for MacOS & Windows (268)

Bug fixes
Datasets
- Fixed blank image handling in `OCRDataset` (270)

Documents
- Fixed channel order for PDF render into images (276)

Models
- Fixed normalization step in preprocessors (277)

Utils
- Fixed `OCRMetric` update edge case (267)

Transforms
- Fixed `Resize` when preserving aspect ratio (266)
- Fixed `RandomSaturation` (277)

Documentation
- Fixed documentation of `OCRDataset` (274)
- Improved documentation of `doctr.documents.elements` (274)

References
- Fixed resizing in recognition script (266)

Others
- Fixed demo for multi-page examples (276)
- Fixed image decoding in API routes (282)
- Fixed preprocessing in API routes (282)

Improvements

Datasets
- Added file existence check in dataset constructors (277)
- Refactored dataset methods (278)

Models
- Improved DBNet box computation (272)
- Refactored preprocessors using transforms (277)
- Improved repr of preprocessors and models (277)
- Removed `ignore_case` and `ignore_accents` from recognition postprocessors (284)

Documents
- Updated performance benchmarks (272, 277)

Documentation
- Updated badges in README & documentation versions (254)
- Updated landing page of documentation (279, 285)
- Updated repo folder description in CONTRIBUTING (282)
- Improved the README's instructions to run the API (282)

Tests
- Improved unittest of resizing transforms (266)
- Improved unittests of OCRMetric (267)
- Improved unittest of PDF rendering (276)
- Extended unittest of `OCRDataset` (278)
- Updated unittest of `DocumentBuilder` and recognition models (284)

References
- Updated training scripts (284)

Others
- Updated requirements (274)
- Updated evaluation script (277, 284)
