This first release adds pretrained models for end-to-end OCR and document manipulation utilities.
*Release handled by fg-mindee & charlesmindee*
**Note**: doctr 0.1.0 requires TensorFlow 2.3.0 or newer.
Highlights
Easy & high-performing document reading
Since document processing is at the core of this project, reading documents efficiently is a priority. This release covers PDF and image files.
PDF reading wraps the [PyMuPDF](https://github.com/pymupdf/PyMuPDF) backend for fast file reading:
from doctr.documents import read_pdf
# from path
doc = read_pdf("path/to/your/doc.pdf")
# from stream
with open("path/to/your/doc.pdf", 'rb') as f:
    doc = read_pdf(f.read())
Image reading uses the [OpenCV](https://github.com/opencv/opencv) backend:
from doctr.documents import read_img
page = read_img("path/to/your/img.jpg")
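The path-or-stream convenience shown for `read_pdf` can be pictured as a small dispatch on the argument type. The sketch below is only an illustration under that assumption: `load_bytes` is a hypothetical helper, not a doctr function, and it says nothing about how the library actually handles its inputs.

```python
from pathlib import Path
from typing import Union


def load_bytes(source: Union[str, Path, bytes]) -> bytes:
    """Hypothetical helper: return raw file content from a path or from in-memory bytes."""
    if isinstance(source, (bytes, bytearray)):
        # Already a stream payload: use it as-is
        return bytes(source)
    if isinstance(source, (str, Path)):
        path = Path(source)
        if not path.is_file():
            raise FileNotFoundError(f"unable to access {path}")
        # Read the whole file so that both branches return the same kind of object
        return path.read_bytes()
    raise TypeError(f"unsupported input type: {type(source).__name__}")
```

Normalizing both inputs to bytes early means the decoding step only has to deal with a single input type.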
Pretrained End-to-End OCR predictors
Whether you need text detection, text recognition or end-to-end OCR, this release brings you pretrained models and easy-to-use, pythonic predictors that take care of all preprocessing, model inference and post-processing for you.
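As a rough mental model of what a predictor does, the three stages can be chained in a single callable. This is a minimal sketch with toy stages: the `Predictor` class and its arguments are hypothetical names chosen for illustration, not the doctr implementation.

```python
from typing import Any, Callable, List


class Predictor:
    """Hypothetical wrapper chaining preprocessing, model inference and post-processing."""

    def __init__(
        self,
        pre: Callable[[Any], Any],
        model: Callable[[List[Any]], List[Any]],
        post: Callable[[Any], Any],
    ) -> None:
        self.pre = pre
        self.model = model
        self.post = post

    def __call__(self, inputs: List[Any]) -> List[Any]:
        # 1. Preprocess each input independently
        batch = [self.pre(x) for x in inputs]
        # 2. Run the model once on the whole batch
        raw = self.model(batch)
        # 3. Turn each raw output into a user-facing result
        return [self.post(r) for r in raw]


# Toy stages standing in for real image processing and a real network
predictor = Predictor(
    pre=str.strip,
    model=lambda batch: [len(s) for s in batch],
    post=lambda n: {"length": n},
)
result = predictor(["  ab ", "c"])
```

The caller only sees inputs and final results, which is the convenience the predictors in this release aim for.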
Text detection
Currently, only [DBNet](https://arxiv.org/pdf/1911.08947.pdf)-based architectures are supported; more will come in upcoming releases.
from doctr.documents import read_pdf
from doctr.models import db_resnet50_predictor
model = db_resnet50_predictor(pretrained=True)
doc = read_pdf("path/to/your/doc.pdf")
result = model(doc)
Text recognition
Two architectures are implemented for recognition: [CRNN](https://arxiv.org/pdf/1602.05875.pdf) and [SAR](https://arxiv.org/pdf/1811.00751.pdf).
from doctr.models import crnn_vgg16_bn_predictor
model = crnn_vgg16_bn_predictor(pretrained=True)
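CRNN-style recognizers are typically trained with a CTC loss, so their raw output is a sequence of per-timestep class scores that still has to be decoded into text. The sketch below shows greedy CTC decoding in plain Python. The function name, the vocabulary layout and the choice of the last class as the blank symbol are assumptions made for this example and are not taken from doctr.

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(scores: Sequence[Sequence[float]], vocab: str) -> str:
    """Greedy CTC decoding: best class per step, merge repeats, drop blanks."""
    blank = len(vocab)  # assumption: the blank symbol is the last class index
    # Pick the highest-scoring class at each timestep
    best = [max(range(len(step)), key=step.__getitem__) for step in scores]
    # Merge consecutive duplicates, then remove blanks
    collapsed = [idx for idx, _ in groupby(best)]
    return "".join(vocab[i] for i in collapsed if i != blank)
```

Because a blank separates repeated characters, a path such as `a a _ a b b _` decodes to `aab` rather than `ab`.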
End-to-End OCR
By combining a detection model and a recognition model into a two-stage architecture, OCR predictors offer the easiest way to analyze your documents:
from doctr.documents import read_pdf
from doctr.models import ocr_db_crnn
model = ocr_db_crnn(pretrained=True)
doc = read_pdf("path/to/your/doc.pdf")
result = model([doc])
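The two-stage idea can be sketched independently of any deep learning framework: a detector proposes boxes, each box is cropped from the page, and a recognizer reads each crop. Everything in the example below (`crop`, `two_stage_ocr`, the box format and the toy detector and recognizer) is hypothetical and only illustrates the control flow, not the actual doctr pipeline.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels, assumed format
Image = List[List[int]]  # rows of pixel values


def crop(page: Image, box: Box) -> Image:
    """Extract the region of the page covered by the box."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in page[ymin:ymax]]


def two_stage_ocr(
    page: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Stage 1 localizes text, stage 2 reads each localized crop."""
    boxes = detect(page)
    crops = [crop(page, box) for box in boxes]
    words = [recognize(c) for c in crops]
    return list(zip(boxes, words))


# Toy stand-ins: a fixed detector and a "recognizer" that sums pixel values
page = [[1, 2, 3, 4], [5, 6, 7, 8]]
output = two_stage_ocr(
    page,
    detect=lambda p: [(0, 0, 2, 2), (2, 0, 4, 2)],
    recognize=lambda c: str(sum(map(sum, c))),
)
```

Keeping the two stages behind plain callables is what makes it possible to swap one detection or recognition architecture for another.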
New features
Documents
Document reading and manipulation
- Added PDF (8, 18, 25, 83) and image (30, 79) reading utilities
- Added document structured elements for export (16, 26, 61, 102)
Models
Deep learning model building and inference
- Added model export methods (10)
- Added preprocessing module (20, 25, 36, 50, 55, 77)
- Added text detection model and post-processing (24, 32, 36, 43, 49, 51, 84): DBNet
- Added image cropping function (33, 44)
- Added model param loading function (49, 60)
- Added text recognition post-processing (35, 36, 37, 38, 43, 45, 49, 51, 63, 65, 74, 78, 84, 101, 107, 108, 111, 112): SAR & CRNN
- Added task-specific predictors (39, 52, 58, 62, 85, 98, 102)
- Added VGG16 (36), ResNet31 (70) backbones
Utils
Utility features relevant to the library use cases.
- Added page interactive prediction visualization (54, 82)
- Added custom types (87)
- Added abstract auto-repr object (102)
- Added metric module (110)
Test
Checks on the package's health before release
- Added pytest unit tests (7, 59, 75, 76, 80, 92, 104)
Documentation
Online resources for potential users
- Updated README (9, 48, 67, 68, 95)
- Added CONTRIBUTING (7, 29, 48, 67)
- Added Sphinx-built documentation (12, 36, 55, 86, 90, 91, 93, 96, 99, 106)
Others
Other tools and implementations
- Added python package setup (7, 21, 67)
- Added CI verifications (7, 67, 69, 73)
- Added dockerized environment with library installed (17, 19)
- Added issue template (34)
- Added environment collection script (81)
- Added analysis script (85, 95, 103)
v0.1-models
This release is only a mirror for pretrained detection & recognition models.