First - thanks everyone for bearing with us as we've made some notable architectural changes over the past several releases.
A big part of this work was orienting the package toward better long-term development and toward where DataFog is used today, and will likely be used in the future: within API services.
- Implemented Pytesseract: significant speed and accuracy improvements in OCR text extraction compared to Donut!
- Allows for better image and PDF extraction
- Enhanced test suite coverage
- Refactored definitions to support async (for API integration)
- Refactored classes/functions around ImageService, TextService, SparkService
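
To give a feel for what async-friendly OCR can look like inside an API service, here is a minimal sketch. It is not DataFog's actual API: `extract_text` and `fake_ocr` are hypothetical names used for illustration only, and `fake_ocr` stands in for a blocking OCR call such as `pytesseract.image_to_string` so the snippet runs without Tesseract installed.

```python
import asyncio
from typing import Callable, List, Sequence


def fake_ocr(image_path: str) -> str:
    """Hypothetical stand-in for a blocking OCR call.

    In a real service this could be replaced by something like
    pytesseract.image_to_string(image_path), which requires Tesseract.
    """
    return f"text from {image_path}"


async def extract_text(
    paths: Sequence[str],
    ocr: Callable[[str], str] = fake_ocr,
) -> List[str]:
    """Run a blocking OCR function over several images concurrently.

    Each call is pushed to a worker thread with asyncio.to_thread
    (Python 3.9+), so the event loop stays free to serve other requests.
    Results come back in the same order as the input paths.
    """
    tasks = [asyncio.to_thread(ocr, path) for path in paths]
    return list(await asyncio.gather(*tasks))


if __name__ == "__main__":
    print(asyncio.run(extract_text(["a.png", "b.png"])))
```

Passing the OCR function in as a parameter keeps the async wrapper independent of any particular OCR backend, which also makes it easy to test with a stub.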