Initial Release of markitdown
The MarkItDown library is a utility for converting various file types to Markdown (e.g., for indexing or text analysis).
It presently supports:
* PDF (.pdf)
* PowerPoint (.pptx)
* Word (.docx)
* Excel (.xlsx)
* Images (EXIF metadata, and OCR)
* Audio (EXIF metadata, and speech transcription)
* HTML (special handling of Wikipedia, etc.)
* Various other text-based formats (csv, json, xml, etc.)
The API is simple:
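```python
from markitdown import MarkItDown

# Create a converter and convert a file to Markdown
markitdown = MarkItDown()
result = markitdown.convert("test.xlsx")

# The converted Markdown is available as plain text
print(result.text_content)
```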
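
For indexing several files at once, the same two calls can be wrapped in a small loop. The following is only a sketch: `convert_many` is a hypothetical helper, not part of the library, and it relies solely on the `convert()` method and `text_content` attribute shown above. The source does not say which exceptions a failed conversion raises, so the sketch catches broadly and records the path.

```python
from typing import Dict, Iterable, List, Tuple


def convert_many(converter, paths: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Convert each path and return (markdown_by_path, failed_paths).

    `converter` is any object with a `convert(path)` method whose result
    has a `text_content` attribute, such as a `MarkItDown()` instance.
    This helper is hypothetical and not provided by the library.
    """
    texts: Dict[str, str] = {}
    failed: List[str] = []
    for path in paths:
        try:
            result = converter.convert(path)
        except Exception:
            # Assumption: the exception types are not documented here,
            # so any failure is recorded and the loop continues.
            failed.append(path)
            continue
        texts[path] = result.text_content
    return texts, failed


# Example usage (requires markitdown to be installed):
#   from markitdown import MarkItDown
#   texts, failed = convert_many(MarkItDown(), ["test.xlsx"])
```

Passing the converter in as an argument keeps the helper independent of how `MarkItDown` is constructed and makes it easy to exercise with a stand-in object.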