Betterocr

Latest version: v1.2.0

1.2.0

| | |
|:---:|:---:|
| <img width="794" alt="Screenshot 2023-11-02 at 10 22 01 AM" src="https://github.com/junhoyeo/BetterOCR/assets/32605822/966895a3-787f-43fc-a653-d2b28ad5f0f7"> | <img width="791" alt="Screenshot 2023-11-02 at 10 21 45 AM" src="https://github.com/junhoyeo/BetterOCR/assets/32605822/ebe7aa69-6217-4d96-b7f5-828f3d01ae35"> |

- Improved English/Korean text recognition with new Pororo OCR support! 🎉
- Special thanks to black7375 for https://github.com/black7375/korean_ocr_using_pororo (where he suggested using EasyOCR for text detection and BrainOCR (Pororo's OCR module) for text recognition) and https://github.com/junhoyeo/BetterOCR/issues/2.
- Also kudos to the kakaobrain team and yunwoong7.

Notes
Pororo is used only if the specified language options (`lang`) include either 🇺🇸 English (`en`) or 🇰🇷 Korean (`ko`). The additional dependencies listed in <a href="https://github.com/junhoyeo/BetterOCR/blob/main/pyproject.toml#L22"><code>[tool.poetry.group.pororo.dependencies]</code></a> must also be available; if they are not, Pororo is automatically excluded from the enabled engines.
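
The selection rule in this note can be sketched roughly as follows. This is a minimal illustration, not BetterOCR's actual code: the engine labels, the `select_engines` helper, and the module names used for the dependency check are all assumptions made for the example.

```python
from importlib.util import find_spec

# Hypothetical names: the engine labels and the module list below are
# illustrative stand-ins, not BetterOCR's real identifiers or dependencies.
PORORO_LANGS = {"en", "ko"}
PORORO_MODULES = ("torch", "torchvision")


def pororo_deps_available(modules=PORORO_MODULES):
    """Return True only if every optional module can be located."""
    return all(find_spec(name) is not None for name in modules)


def select_engines(lang, deps_available=None):
    """Pick OCR engines for the requested language codes."""
    if deps_available is None:
        deps_available = pororo_deps_available()
    engines = ["easyocr", "tesseract"]
    # Pororo joins only when en/ko is requested AND its extras are installed.
    if PORORO_LANGS.intersection(lang) and deps_available:
        engines.append("easy_pororo")
    return engines
```

Passing `deps_available` explicitly keeps the rule easy to test without installing the optional packages.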

What's Changed
* [ImgBot] Optimize images by imgbot in https://github.com/junhoyeo/BetterOCR/pull/7
* Write parser tests by junhoyeo in https://github.com/junhoyeo/BetterOCR/pull/9
* EasyPororoOCR Integration by junhoyeo in https://github.com/junhoyeo/BetterOCR/pull/8

New Contributors
* imgbot made their first contribution in https://github.com/junhoyeo/BetterOCR/pull/7

**Full Changelog**: https://github.com/junhoyeo/BetterOCR/compare/v1.1.2...v1.2.0

1.1.2

BetterOCR
> 🔍 **BetterOCR** combines results from multiple OCR engines with an 🧠 LLM to correct & reconstruct the output.

What's Changed
* Fix Incorrect Inclusion of API_KEY in chat.completions.create Call by snacsnoc in https://github.com/junhoyeo/BetterOCR/pull/3
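
The pull request itself is not reproduced here. As a rough sketch of the general pattern behind this kind of fix, the credential can be separated from the per-request parameters before they are forwarded. The `split_openai_options` helper and the shape of the options dict are assumptions for illustration; only the `API_KEY` name comes from the entry above.

```python
def split_openai_options(options):
    """Separate the credential from per-request parameters.

    Hypothetical helper: assumes the caller passes one dict that may
    contain an "API_KEY" entry alongside request settings such as "model".
    """
    request_kwargs = dict(options)  # copy so the caller's dict is untouched
    api_key = request_kwargs.pop("API_KEY", None)
    return api_key, request_kwargs
```

Under this sketch, `api_key` would be used to configure the client, and only `request_kwargs` would be passed on to `chat.completions.create`.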

New Contributors
* snacsnoc made their first contribution in https://github.com/junhoyeo/BetterOCR/pull/3

**Full Changelog**: https://github.com/junhoyeo/BetterOCR/compare/v1.1.1...v1.1.2

1.1.1

What's Changed
- Improved the prompt used in Box Detection
- Fixed a bug in the fallback logic of `detect_boxes` (used when the LLM output format is invalid)
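
A fallback for invalid LLM output might look something like the sketch below. It assumes the model is asked to reply with a JSON list of objects that each carry `box` and `text` keys; that format and the `parse_boxes` helper are hypothetical and are not taken from the BetterOCR source.

```python
import json


def parse_boxes(llm_output, fallback_boxes):
    """Return boxes parsed from the LLM reply, or fallback_boxes if invalid."""
    try:
        parsed = json.loads(llm_output)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all (or not a string): keep the raw OCR boxes.
        return fallback_boxes
    if not isinstance(parsed, list):
        return fallback_boxes
    for item in parsed:
        # Every entry must be an object with both expected keys.
        if not (isinstance(item, dict) and "box" in item and "text" in item):
            return fallback_boxes
    return parsed
```

Returning the unmodified OCR boxes on any format error means a malformed reply degrades the result instead of raising.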

**Full Changelog**: https://github.com/junhoyeo/BetterOCR/compare/v1.1.0...v1.1.1

1.1.0

What's Changed
* Box Detection (`detect_boxes`) implemented by junhoyeo in https://github.com/junhoyeo/BetterOCR/pull/1
* Demo: https://github.com/junhoyeo/BetterOCR#-box-detection (Example Script: https://github.com/junhoyeo/BetterOCR/blob/main/examples/detect_boxes.py)

| Original | Detected |
|:---:|:---:|
| <img src="https://github.com/junhoyeo/BetterOCR/raw/main/.github/images/demo-1.png" width="500px" /> | <img src="https://github.com/junhoyeo/BetterOCR/raw/main/.github/images/boxes-0.png" width="500px" /> |

**Full Changelog**: https://github.com/junhoyeo/BetterOCR/commits/v1.1.0

1.0.1

<p align="center">
<a href="https://github.com/junhoyeo/BetterOCR">
<img src="https://github.com/junhoyeo/BetterOCR/raw/main/.github/images/logo.png" width="256px" />
</a>
</p>
<h1 align="center">BetterOCR</h1>

> 🔍 Better text detection by combining multiple OCR engines with 🧠 LLM.

OCR _still_ sucks! ... Especially when you're from the _other side_ of the world (and face a significant lack of training data in your language), or when you're just not thrilled with noisy results.

**BetterOCR** combines results from multiple OCR engines with an LLM to correct & reconstruct the output.

- **🔍 OCR Engines**: Currently supports [EasyOCR](https://github.com/JaidedAI/EasyOCR) and [Tesseract](https://github.com/tesseract-ocr/tesseract).
- **🧠 LLM**: Supports [Chat models](https://github.com/openai/openai-python#chat-completions) from OpenAI.
- **📒 Custom Context**: Allows users to provide an optional context to use specific keywords such as proper nouns and product names. This assists in spelling correction and noise identification, ensuring accuracy even with rare or unconventional words.
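
To make the idea concrete, here is a minimal sketch of how per-engine results and an optional context could be merged into a single prompt for a chat model. The `build_correction_prompt` helper, the prompt wording, and the bracketed labels are assumptions for illustration; BetterOCR's real prompt is not shown in this document.

```python
def build_correction_prompt(results, context=""):
    """Combine per-engine OCR text into one prompt for a chat model.

    Hypothetical helper: `results` maps an engine name to its raw text.
    """
    lines = ["Merge and correct the OCR results below into one clean text."]
    for engine, text in results.items():
        lines.append(f"[{engine}] {text}")
    if context:
        # Optional hints such as proper nouns or product names.
        lines.append(f"[context] {context}")
    return "\n".join(lines)
```

For example, `build_correction_prompt({"easyocr": "Helo wrld", "tesseract": "Hello w0rld"}, context="Hello world")` produces a prompt that lists both noisy readings followed by the context line.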

Head over to [💯 Examples](https://github.com/junhoyeo/BetterOCR#-examples) to view performance by language (🇺🇸, 🇰🇷, 🇮🇳).

Coming Soon: improved interface, async support, box detection, and more.
