- Add HTML representation to extracted tables
- Call OCR only on pages/images containing tables
- Bump Pillow requirement to address vulnerabilities
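
The idea of calling OCR only on pages that contain tables can be sketched as follows. This is a minimal illustration, not the library's actual code: `ocr_pages_with_tables`, `has_table`, and `run_ocr` are hypothetical names, and the detector and OCR step are passed in as plain callables.

```python
from typing import Callable, Dict, Sequence, TypeVar

Page = TypeVar("Page")
Result = TypeVar("Result")


def ocr_pages_with_tables(
    pages: Sequence[Page],
    has_table: Callable[[Page], bool],
    run_ocr: Callable[[Page], Result],
) -> Dict[int, Result]:
    """Run the (expensive) OCR step only on pages where a table was detected.

    All names here are hypothetical; the function returns a mapping from
    page index to OCR result and skips pages without tables entirely.
    """
    results: Dict[int, Result] = {}
    for index, page in enumerate(pages):
        if has_table(page):  # cheap detection step first
            results[index] = run_ocr(page)  # costly OCR only when needed
    return results
```

Pages without tables never reach the OCR callable, which is where the time saving would come from.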
1.2.2
- Add option to pass keyword arguments for PaddleOCR/EasyOCR/docTR constructors
- Fix bug with PaddleOCR on blank images
- Line filtering for borderless table recognition
- Update deprecated polars code
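
Forwarding keyword arguments to an OCR engine constructor might look roughly like the sketch below. `FakeEngine`, `OCRWrapper`, and the `kw` parameter name are assumptions made for illustration; the stand-in engine replaces the real PaddleOCR/EasyOCR/docTR classes so the example stays self-contained.

```python
from typing import Any, Dict, Optional


class FakeEngine:
    """Hypothetical stand-in for a third-party OCR engine class."""

    def __init__(self, lang: str = "en", **options: Any) -> None:
        self.lang = lang
        self.options = options


class OCRWrapper:
    """Hypothetical wrapper that forwards extra options to the engine."""

    def __init__(self, lang: str = "en", kw: Optional[Dict[str, Any]] = None) -> None:
        # Merge the wrapper's own arguments with user-supplied ones;
        # entries in `kw` take precedence over the defaults set here.
        options: Dict[str, Any] = {"lang": lang, **(kw or {})}
        self.engine = FakeEngine(**options)


wrapper = OCRWrapper(lang="fr", kw={"use_gpu": False})
```

Accepting a single dictionary keeps the wrapper's signature stable while still exposing engine-specific options.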
1.2.1
- Fix issues related to latest polars release
- Improve detection of columns in document layout
- Fix rare bug leading to no detected lines
- Add coherency checks on borderless tables
- Wrap text when exporting to xlsx
1.2.0
- Improve document layout analysis to detect borderless table areas
- Change handling of optional dependencies
1.0.11
- Add support for docTR
- Fixes to line detection
1.0.10
- Drop Python 3.7 support
- Allow PaddleOCR on Python 3.11
- Improve detection of intersection between lines and words