Overview
Version 0.1.0 (previously pre-released as 0.1.0a6) is a large release, bringing many improvements over version 0.0.2.
High-level changes include:
* Organized dependencies into feature groups — install only the converters you need, or get everything with `pip install markitdown[all]`
* A new plugin-based architecture, allowing 3rd-party developers to add functionality to MarkItDown (see the [sample plugin](https://github.com/microsoft/markitdown/tree/main/packages/markitdown-sample-plugin))
* All conversions are performed in-memory — no more temporary files
* Support for new formats including EPUB
* Option to keep data URIs in converted Markdown
* Option to override MIME type, extension, and charset in the command-line interface (useful when reading input from a pipe or stdin)
Breaking changes
* As noted above, dependencies are now organized into optional feature groups. Use `pip install markitdown[all]` for backward-compatible behavior.
* `convert_stream()` now requires a binary file-like object (e.g., a file opened in binary mode, or an `io.BytesIO` object). This is a breaking change: previous versions also accepted text file-like objects, such as `io.StringIO`.
* The `DocumentConverter` class interface has changed to read from file-like streams rather than file paths. Temporary files are no longer created. If you maintain a plugin or a custom `DocumentConverter`, you likely need to update your code. Otherwise, if you're only using the `MarkItDown` class or the CLI (as in these examples), you should not need to change anything.
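
For callers that previously passed text streams to `convert_stream()`, one possible migration is to re-encode the text into a binary stream first. The sketch below is illustrative and not part of MarkItDown: `as_binary_stream` and `convert_text` are hypothetical helper names, UTF-8 is an assumed encoding, and the `text_content` attribute is assumed to be available on the result object as in earlier releases.

```python
import io
from typing import IO, BinaryIO


def as_binary_stream(stream: IO, encoding: str = "utf-8") -> BinaryIO:
    """Return a binary stream, re-encoding text streams such as io.StringIO.

    Hypothetical helper; the encoding default is an assumption.
    """
    if isinstance(stream, io.TextIOBase):
        # Text streams yield str, so encode the remaining content
        # and wrap the bytes in an in-memory binary stream.
        return io.BytesIO(stream.read().encode(encoding))
    # Already binary (e.g. io.BytesIO or a file opened with "rb").
    return stream


def convert_text(text: str) -> str:
    """Convert in-memory text with MarkItDown (needs the markitdown package)."""
    # Imported lazily so as_binary_stream stays usable without markitdown.
    from markitdown import MarkItDown

    binary = as_binary_stream(io.StringIO(text))
    result = MarkItDown().convert_stream(binary)
    # Assumption: the result exposes the converted Markdown as text_content.
    return result.text_content
```

Binary streams pass through `as_binary_stream` unchanged, so the same call site can handle files opened with `"rb"` as well as legacy `io.StringIO` inputs.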
Detailed list of contributions
* Cleanup and refactor, in preparation for plugin support. by afourney in https://github.com/microsoft/markitdown/pull/318
* Skip generating md links in 'pre' blocks by t-kalinowski in https://github.com/microsoft/markitdown/pull/322
* Fix a typo in sample RTF plugin by rickygao in https://github.com/microsoft/markitdown/pull/320
* Added priority argument to all converter constructors. by afourney in https://github.com/microsoft/markitdown/pull/324
* Doc Intelligence fixes for refactored code by KennyZhang1 in https://github.com/microsoft/markitdown/pull/325
* Added CLI tests. by afourney in https://github.com/microsoft/markitdown/pull/327
* Fix UnboundLocalError in MarkItDown._convert by menezesandre in https://github.com/microsoft/markitdown/pull/1038
* add necessary imports by tanreinama in https://github.com/microsoft/markitdown/pull/861
* fix: Implement retry logic for YouTube transcript fetching and fix URL decoding issue by iw4p in https://github.com/microsoft/markitdown/pull/1035
* Add Support For PPTX Shape Groups (Fix in code design to not miss out on slide content) by C0dingMast3r in https://github.com/microsoft/markitdown/pull/331
* Make sure extensions are unique in MarkItDown's convert methods. by afourney in https://github.com/microsoft/markitdown/pull/1076
* Don't have ZipConverter accept OOXML files. by afourney in https://github.com/microsoft/markitdown/pull/1078
* Print and log better exceptions when file conversions fail. by afourney in https://github.com/microsoft/markitdown/pull/1080
* Exceptions should subclass Exception not BaseException. by afourney in https://github.com/microsoft/markitdown/pull/1082
* [Draft] Exploring ways to allow Optional dependencies by afourney in https://github.com/microsoft/markitdown/pull/1079
* Fixed property name by afourney in https://github.com/microsoft/markitdown/pull/1085
* Update converter API, user streams rather than filepaths by afourney in https://github.com/microsoft/markitdown/pull/1088
* Bump version. by afourney in https://github.com/microsoft/markitdown/pull/1094
* Fixed loading of plugins. by afourney in https://github.com/microsoft/markitdown/pull/1096
* Fixed version. by afourney in https://github.com/microsoft/markitdown/pull/1097
* fix(README): correct pip install command formatting by Piero24 in https://github.com/microsoft/markitdown/pull/1090
* Fixed deepcopy failure when passing llm_client by scalabreseGD in https://github.com/microsoft/markitdown/pull/1089
* Fixed formatting. by afourney in https://github.com/microsoft/markitdown/pull/1098
* feat: sort pptx shapes to be parsed in top-to-bottom, left-to-right order by richardye101 in https://github.com/microsoft/markitdown/pull/1104
* feat(docker): improve dockerfile build by syaghoubi00 in https://github.com/microsoft/markitdown/pull/220
* Fix exiftool in well-known paths. by afourney in https://github.com/microsoft/markitdown/pull/1106
* fix typo in well-known path list by 0xmohit in https://github.com/microsoft/markitdown/pull/1109
* Switch from puremagic to magika. by afourney in https://github.com/microsoft/markitdown/pull/1108
* Minimize guesses when guesses are compatible. by afourney in https://github.com/microsoft/markitdown/pull/1114
* Added CLI options for extension, mime-types, and charset. by afourney in https://github.com/microsoft/markitdown/pull/1115
* Fix string formatting in FileConversionException error message by yushihang in https://github.com/microsoft/markitdown/pull/1121
* Handle not supported plot type in pptx by EmanueleMeazzo in https://github.com/microsoft/markitdown/pull/1122
* Small fixes for autogen integration. by afourney in https://github.com/microsoft/markitdown/pull/1124
* Added epub test file. by afourney in https://github.com/microsoft/markitdown/pull/1130
* Fix remaining mypy errors. by afourney in https://github.com/microsoft/markitdown/pull/1132
* Have magika read from the stream. by afourney in https://github.com/microsoft/markitdown/pull/1136
* EPub Support. Adapted #123 to not use epublib. by afourney in https://github.com/microsoft/markitdown/pull/1131
* Consider anything with a charset as plain text-convertible. by afourney in https://github.com/microsoft/markitdown/pull/1142
* Adjust warning filters and update dependencies by afourney in https://github.com/microsoft/markitdown/pull/1143
* Add support for preserving base64 encoded images by BetterAndBetterII in https://github.com/microsoft/markitdown/pull/1140
* Resolve a console encoding error. by afourney in https://github.com/microsoft/markitdown/pull/1149
* Bump version to 0.1.0 by afourney in https://github.com/microsoft/markitdown/pull/1150
New Contributors
* t-kalinowski made their first contribution in https://github.com/microsoft/markitdown/pull/322
* rickygao made their first contribution in https://github.com/microsoft/markitdown/pull/320
* menezesandre made their first contribution in https://github.com/microsoft/markitdown/pull/1038
* tanreinama made their first contribution in https://github.com/microsoft/markitdown/pull/861
* iw4p made their first contribution in https://github.com/microsoft/markitdown/pull/1035
* C0dingMast3r made their first contribution in https://github.com/microsoft/markitdown/pull/331
* Piero24 made their first contribution in https://github.com/microsoft/markitdown/pull/1090
* scalabreseGD made their first contribution in https://github.com/microsoft/markitdown/pull/1089
* richardye101 made their first contribution in https://github.com/microsoft/markitdown/pull/1104
* syaghoubi00 made their first contribution in https://github.com/microsoft/markitdown/pull/220
* 0xmohit made their first contribution in https://github.com/microsoft/markitdown/pull/1109
* yushihang made their first contribution in https://github.com/microsoft/markitdown/pull/1121
* EmanueleMeazzo made their first contribution in https://github.com/microsoft/markitdown/pull/1122
* BetterAndBetterII made their first contribution in https://github.com/microsoft/markitdown/pull/1140
**Full Changelog**: https://github.com/microsoft/markitdown/compare/v0.0.2...v0.1.0