<p align="center">
<img src="https://user-images.githubusercontent.com/76527547/135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif" width="50%">
</p>
This release adds support for rotated documents, and extends both the model & dataset zoos.
**Note**: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
## Highlights
### :upside_down_face: :smiley: Rotation-aware text detection :upside_down_face: :smiley:
It's no secret: this release's focus was to bring the same level of performance to rotated documents!

docTR is meant to be your best tool for seamless document processing, and it couldn't do so without supporting a very natural & common augmentation of input documents. This large project was subdivided into three parts:
#### Straightening pages before text detection
Developing a heuristic-based method to estimate the page skew and rotate the page before forwarding it to any deep learning model. Our thanks to Rob192 for his contribution on this part :pray:
_This behaviour can be enabled to avoid retraining the text detection models. However, the heuristic approach has its limits in terms of robustness._
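If you want to try it out, here is a minimal sketch; it assumes the heuristic straightening is exposed through a `straighten_pages` flag on `ocr_predictor` (the flag name and its availability may differ in the version you are running):

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load the pages of a possibly skewed document as numpy arrays
pages = DocumentFile.from_images("path/to/rotated_scan.jpg")

# Assumption: the heuristic page straightening is toggled with `straighten_pages`
predictor = ocr_predictor(pretrained=True, straighten_pages=True)

result = predictor(pages)
print(result.render())
```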
#### Text detection training with rotated images

The core of this project was to enable our text detection models to produce non-degraded heatmaps & localization candidates when processing a rotated page.
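As a rough illustration of what this enables at inference time, here is a hedged sketch, assuming rotation-aware post-processing is selected by setting `assume_straight_pages=False` on the detection predictor:

```python
from doctr.io import DocumentFile
from doctr.models import detection_predictor

pages = DocumentFile.from_images("path/to/rotated_scan.jpg")

# Assumption: turning off the straight-page hypothesis switches the
# post-processing to rotated boxes, so localization is not degraded on skewed pages
det_predictor = detection_predictor("db_resnet50", pretrained=True, assume_straight_pages=False)

# One array of localization candidates per page
out = det_predictor(pages)
```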
#### Crop orientation resolution

Finally, once the localization candidates have been extracted, there is no guarantee that each corresponding crop will read from left to right. To remove this doubt, a lightweight image orientation classifier was added to refine the crops before they are sent to text recognition!
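Here is a hedged sketch of how such a classifier could be used on its own, assuming it is exposed as `crop_orientation_predictor` in `doctr.models`:

```python
import numpy as np
from doctr.models import crop_orientation_predictor

# Assumption: the lightweight orientation classifier is exposed this way
classifier = crop_orientation_predictor(pretrained=True)

# Dummy crop standing in for a word crop produced by the text detector
crop = np.random.randint(0, 255, (32, 128, 3), dtype=np.uint8)

# One orientation prediction per crop, used to re-orient it before recognition
print(classifier([crop]))
```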
### :zebra: A wider pretrained classification model zoo :zebra:
In deep learning, the stability of training on complex tasks has largely been helped by leveraging transfer learning. As such, OCR tasks usually require a backbone as a feature extractor. For this reason, all checkpoints of classification models in both PyTorch & TensorFlow have been updated :rocket:
_Those were trained using our synthetic character classification dataset, for more details cf. [Character classification training](https://github.com/mindee/doctr/tree/main/references/classification)_
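If you want to reuse one of those refreshed backbones, a minimal sketch (assuming `mobilenet_v3_small` is among the available classification architectures) would look like this:

```python
from doctr.models import classification

# Assumption: mobilenet_v3_small is one of the available classification architectures;
# pretrained=True loads the checkpoint trained on the synthetic character dataset
model = classification.mobilenet_v3_small(pretrained=True)

# The model ships with its classification head; drop or replace it if you only
# need the feature extractor for a downstream OCR task.
```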
### :framed_picture: New public datasets join the fray
Thanks to felixdittrich92, the list of supported datasets has considerably grown :partying_face:
This includes widely popular datasets used for benchmarks on OCR-related tasks; you can find the full list over here :point_right: #587
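As a hedged example, loading one of these datasets should be as simple as the following (the dataset name `IIIT5K` and its availability in your installed version are assumptions for illustration):

```python
from doctr.datasets import IIIT5K

# download=True fetches and caches the archive on first use
train_set = IIIT5K(train=True, download=True)
img, target = train_set[0]
```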
#### Synthetic text recognition dataset
Additionally, we followed up on the existing `CharGenerator` by introducing `WordGenerator`:
- generates an image of a word whose length is randomly sampled within a specified range, with characters randomly sampled from the specified vocab
- you can even pass a list of fonts so that each word's font family is randomly picked among them
Below are some samples using `font_size=32`:

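Here is a hedged usage sketch; the constructor arguments (`vocab`, `min_chars`, `max_chars`, `num_samples`, `font_family`) and the font file names are assumptions for illustration:

```python
from doctr.datasets import VOCABS, WordGenerator

# Assumed constructor: a vocab, a word-length range, a number of samples
# and an optional list of font families (the font files below are hypothetical)
ds = WordGenerator(
    vocab=VOCABS["french"],
    min_chars=3,
    max_chars=10,
    num_samples=1000,
    font_family=["FreeMono.ttf", "FreeSans.ttf"],
)

img, target = ds[0]  # a synthetic word image and the word that was rendered
```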
### :bookmark_tabs: New notebooks
Two new notebooks have made their way into the [documentation](https://mindee.github.io/doctr/latest/notebooks.html):
- producing searchable PDFs from docTR analysis results
- introduction to document artefact detection (QR codes, barcodes, ID pictures, etc.) with docTR

## Breaking changes
### Revamp of classification models
With the retraining of all classification backbones, several changes have been introduced:
- Model naming: `linknet16` --> `linknet_resnet18`
- Architecture changes: all classification backbones now come with a classification head (see the sketch after this list).
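Concretely, any code referencing the old architecture name needs updating; a minimal before/after sketch (assuming the architecture is loaded through `detection_predictor`):

```python
from doctr.models import detection_predictor

# Before (doctr 0.4.x):
# det_predictor = detection_predictor("linknet16", pretrained=True)

# After (doctr 0.5.0): the backbone now appears in the architecture name
det_predictor = detection_predictor("linknet_resnet18", pretrained=True)
```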
### Enforcing relative coordinates in datasets
In order to unify our data pipelines, we enforced the conversion to relative coordinates across all datasets!
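In practice, this means the coordinates returned in dataset targets are now floats within [0, 1] instead of absolute pixel values. A hedged sketch using FUNSD (assuming the target exposes a `boxes` array):

```python
from doctr.datasets import FUNSD

# download=True fetches the dataset on first use
ds = FUNSD(train=True, download=True)

img, target = ds[0]
# Assumption: the target is a dict exposing a "boxes" array; after this change,
# all coordinates are relative to the page size, i.e. within [0, 1]
print(target["boxes"].min(), target["boxes"].max())
```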