- fuzzy-search (`search.py` `get_context` fn): now preserves the original text formatting (it was losing newlines)
- docling: get per-page markdown, with image-refs for figures
- enrichment: better enrichment delimiter
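A minimal sketch of the idea behind the `get_context` fix, assuming a simple word-window approach with `difflib` fuzzy scoring (this is not Langroid's actual implementation). It finds the best-matching run of words, then slices the original string by character offsets instead of re-joining words with spaces, so newlines survive. The signature and the `words_before`/`words_after` parameters are hypothetical.

```python
import difflib
import re


def get_context(query: str, text: str, words_before: int = 5, words_after: int = 5) -> str:
    """Hypothetical sketch: return the text around the best fuzzy match of
    `query`, sliced from the ORIGINAL string so newlines/indentation survive."""
    # Record each word together with its character span in the original text.
    words = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    if not words:
        return ""
    n = max(1, len(query.split()))
    # Score every window of n consecutive words against the query.
    best_i, best_score = 0, -1.0
    for i in range(len(words) - n + 1):
        window = " ".join(w for w, _, _ in words[i:i + n])
        score = difflib.SequenceMatcher(None, query.lower(), window.lower()).ratio()
        if score > best_score:
            best_i, best_score = i, score
    first = max(0, best_i - words_before)
    last = min(len(words), best_i + n + words_after) - 1
    # Slice by character offsets; " ".join(...) here would flatten newlines.
    return text[words[first][1]:words[last][2]]


print(repr(get_context("gamma delta", "alpha beta\ngamma delta\n\nepsilon zeta", 1, 1)))
```

The returned context keeps the single and double newlines from the source text, which a token re-join would have replaced with spaces.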
0.37.1
fix: docling pdf_parser: first split into pages, then convert each page to markdown
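A hypothetical sketch of the "split into pages first, then convert each" order, with `toy_convert` standing in for the real per-page docling conversion (docling's own API is not used here). Converting page by page keeps a page number attached to every markdown chunk, which is harder to recover from a single whole-document conversion.

```python
from typing import Callable, Dict, List


def toy_convert(page_text: str) -> str:
    """Hypothetical stand-in for a real page-to-markdown converter:
    turns the page's first line into a level-1 heading."""
    first, _, rest = page_text.partition("\n")
    return f"# {first}\n{rest}" if rest else f"# {first}"


def pages_to_markdown(pages: List[str], convert: Callable[[str], str]) -> Dict[int, str]:
    """Convert each page separately, keyed by 1-based page number."""
    return {page_no: convert(text) for page_no, text in enumerate(pages, start=1)}


md = pages_to_markdown(["Intro\nHello world", "Results"], toy_convert)
print(md)  # {1: '# Intro\nHello world', 2: '# Results'}
```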
0.37.0
feat: new pdf parsers: `docling` and `pymupdf4llm`
See full list of available `PdfParsingConfig` options here:
Also:
- remove `pdfplumber` due to outdated and conflicting dependency constraints.
- update to the newer OpenAI embedding models announced 25 Jan 2024 (`text-embedding-3-small`, `...-large`)
0.36.1
fix: `DocChatAgent` and related classes/fns: improve chunking to retain formatting, improve citation format.
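One way to chunk while retaining formatting, sketched under assumptions (hypothetical function names, word counts as a rough stand-in for tokens; not the actual `DocChatAgent` code): pack whole paragraphs into chunks and slice the original text, so newlines, blank lines and indentation inside each chunk are kept.

```python
import re
from typing import List, Optional, Tuple


def paragraph_spans(text: str) -> List[Tuple[int, int]]:
    """Character spans of paragraphs, i.e. runs separated by one or more blank lines."""
    spans, start = [], 0
    # A separator is a blank line plus any further blank lines (not the next line's indent).
    for sep in re.finditer(r"\n[ \t]*\n(?:[ \t]*\n)*", text):
        if text[start:sep.start()].strip():
            spans.append((start, sep.start()))
        start = sep.end()
    if text[start:].strip():
        spans.append((start, len(text)))
    return spans


def chunk_preserving_format(text: str, max_words: int = 100) -> List[str]:
    """Greedily pack paragraphs into chunks of at most `max_words` words;
    an oversized single paragraph becomes its own chunk."""
    chunks: List[str] = []
    cur_start: Optional[int] = None
    cur_end, cur_words = 0, 0
    for start, end in paragraph_spans(text):
        n = len(text[start:end].split())
        if cur_start is not None and cur_words + n > max_words:
            chunks.append(text[cur_start:cur_end])  # original separators kept inside
            cur_start, cur_words = None, 0
        if cur_start is None:
            cur_start = start
        cur_end = end
        cur_words += n
    if cur_start is not None:
        chunks.append(text[cur_start:cur_end])
    return chunks


doc = "Para one\nline two\n\nPara two here\n\n\n  - item a\n  - item b"
print(chunk_preserving_format(doc, max_words=10))
```

With `max_words=10` the first two paragraphs share a chunk (including the blank line between them), while the indented list lands in its own chunk with its leading spaces intact.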
0.36.0
feat: Weaviate vector-db support -- thanks abab-dev!