- Supports every format supported by PyMuPDF: pdf, epub, xps, mobi, fb2, cbz and svg
- Bug fixes
- Some refactoring
0.1alpha
Features
- PDF is the only supported format
- Extracts text only
- Relies on PyMuPDF for the ordering of the lines
- Uses color rarity, font rarity and size to assign an importance to each block of text in the document
- Exports data to JSON
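
The importance scoring and JSON export listed above could look roughly like the sketch below. It is only an illustration and not the project's actual code. The block layout (`text`, `color`, `font`, `size`), the `rarity` and `score_blocks` helpers, and the equal-weight average are all assumptions made for this example. It uses only the standard library, so the blocks are hand-written here rather than extracted with PyMuPDF.

```python
import json
from collections import Counter


def rarity(value, counts, total):
    """Return a score in [0, 1): values that appear less often score higher."""
    return 1.0 - counts[value] / total


def score_blocks(blocks):
    """Attach a hypothetical 'importance' score to each text block.

    Assumed formula: the mean of color rarity, font rarity and
    size relative to the largest size found in the document.
    """
    if not blocks:
        return []
    total = len(blocks)
    color_counts = Counter(b["color"] for b in blocks)
    font_counts = Counter(b["font"] for b in blocks)
    max_size = max(b["size"] for b in blocks)

    scored = []
    for b in blocks:
        importance = (
            rarity(b["color"], color_counts, total)
            + rarity(b["font"], font_counts, total)
            + b["size"] / max_size
        ) / 3
        # Copy the block so the input list is left untouched.
        scored.append({**b, "importance": round(importance, 3)})
    return scored


# Hand-written sample blocks (illustrative data, not real extraction output).
blocks = [
    {"text": "Title", "color": "#000080", "font": "Helvetica-Bold", "size": 24.0},
    {"text": "First paragraph.", "color": "#000000", "font": "Times", "size": 11.0},
    {"text": "Second paragraph.", "color": "#000000", "font": "Times", "size": 11.0},
    {"text": "Third paragraph.", "color": "#000000", "font": "Times", "size": 11.0},
]

scored = score_blocks(blocks)
exported = json.dumps(scored, indent=2)
print(exported)
```

With this sample, the title block gets a higher score than the body paragraphs because its color and font each appear once and its size is the largest. A real implementation may weight the three signals differently.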