New Features
* **Automatic population of metadata**: PDF metadata is automatically retrieved from a variety of providers, including BibTeX entries, citation counts, journal quality assessments, and retraction notices.
* **Full-text search**: A major difference between our published work and this repo is the ability to search over all of the scientific literature. We've brought the OSS version closer by adding full-text keyword search via [tantivy](https://github.com/quickwit-oss/tantivy). You can now index and search many papers before embedding, which makes it feasible to ingest large collections.
* **Unified settings management**: You can now save and load settings, which makes it easier for us to distribute settings tailored to various PaperQA2 tasks. Examples include writing Wikipedia articles, identifying contradictions, and obtaining structured data.
* **CLI**: We've made a CLI that uses persistent parsings and indexes, making it much easier to just ask questions of a folder of PDFs.
* **LiteLLM**: We've adopted [LiteLLM](https://github.com/BerriAI/litellm) as the LLM wrapper of choice. This means we now support many LLM APIs directly, with only the model string changing. It also means we now have "routers" that can handle fallbacks, API rate limiting, and retries.
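
To make the "router" idea concrete, here is a minimal sketch of fallback-plus-retry logic in plain Python. It is not LiteLLM's API and not how PaperQA2 wires it up: `call_with_fallbacks`, `flaky_primary`, and `stable_fallback` are hypothetical names invented for illustration, and the backends are stand-in functions rather than real LLM calls.

```python
from collections.abc import Callable, Sequence


def call_with_fallbacks(
    prompt: str,
    backends: Sequence[tuple[str, Callable[[str], str]]],
    retries_per_backend: int = 2,
) -> tuple[str, str]:
    """Try each backend in order, retrying before falling back to the next.

    Returns (backend_name, response); raises RuntimeError if every backend fails.
    """
    errors: list[str] = []
    for name, backend in backends:
        for attempt in range(1, retries_per_backend + 1):
            try:
                return name, backend(prompt)
            except Exception as exc:  # a real router would catch narrower errors
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical stand-ins for real LLM API calls.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model timed out")


def stable_fallback(prompt: str) -> str:
    return f"answer to: {prompt}"


name, response = call_with_fallbacks(
    "What is PaperQA2?",
    [("primary", flaky_primary), ("fallback", stable_fallback)],
)
print(name, "->", response)
```

In this sketch the primary backend is tried twice, both attempts fail, and the call is then served by the fallback. Rate limiting is left out to keep the example short.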
Improvements
* More modern agent frameworks
* Reduction in dependencies
* Removed code duplicated by litellm
* Many improvements on code style and best practices
Regressions/Deprecation
We've removed the following features to keep our library focused:
* `doc_match` - We do not have enough data to show that this method actually helps for very large corpora.
* `LangchainVectorStore` - We no longer support more complex vector stores, such as FAISS, via LangChain. Instead, we only support NumPy vector stores. We never found very large vector stores to work better than keyword search -> vector search -> LLM reranking, and thus removed the code.
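
The keyword search -> vector search -> LLM reranking flow mentioned above can be sketched as three narrowing stages. This is a toy illustration under stated assumptions, not PaperQA2's implementation: all function names, documents, embeddings, and scores are made up, plain Python lists stand in for NumPy arrays, and the "LLM" reranker is a lookup table.

```python
import math


def keyword_filter(query: str, docs: dict[str, str]) -> list[str]:
    """Stage 1: keep documents sharing at least one token with the query."""
    terms = set(query.lower().split())
    return [d for d, text in docs.items() if terms & set(text.lower().split())]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_rank(
    query_vec: list[float],
    candidates: list[str],
    embeddings: dict[str, list[float]],
    k: int,
) -> list[str]:
    """Stage 2: rank the keyword hits by cosine similarity, keep the top k."""
    ranked = sorted(
        candidates, key=lambda d: cosine(query_vec, embeddings[d]), reverse=True
    )
    return ranked[:k]


def rerank(candidates: list[str], score) -> list[str]:
    """Stage 3: reorder by a scoring callable (an LLM in practice)."""
    return sorted(candidates, key=score, reverse=True)


# Made-up corpus and embeddings, purely for illustration.
docs = {
    "a": "protein folding with deep learning",
    "b": "deep learning for weather forecasting",
    "c": "history of the printing press",
    "d": "protein structure databases",
}
embeddings = {"a": [0.9, 0.1], "b": [0.6, 0.4], "c": [0.0, 1.0], "d": [0.7, 0.3]}
llm_scores = {"a": 6, "d": 8}  # stand-in for LLM relevance judgments

hits = keyword_filter("deep learning protein", docs)
top = vector_rank([1.0, 0.0], hits, embeddings, k=2)
final = rerank(top, lambda d: llm_scores.get(d, 0))
print(hits, top, final)
```

Only the keyword hits are ever compared by embedding, so the in-memory vector store stays small even when the keyword index covers many papers.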
Detailed Changes:
* typo by oganm in https://github.com/Future-House/paper-qa/pull/303
* Updated readme and models by mskarlin in https://github.com/Future-House/paper-qa/pull/305
* Add Client (external API) Module For Enhanced Metadata by mskarlin in https://github.com/Future-House/paper-qa/pull/306
* Agentic workflows, locally indexed search, and CLI by mskarlin in https://github.com/Future-House/paper-qa/pull/309
* Add new unpaywall provider by mskarlin in https://github.com/Future-House/paper-qa/pull/310
* Rollback search fields to `list` and dynamically compute md5 hash in tests by mskarlin in https://github.com/Future-House/paper-qa/pull/311
* Refactor to breakout config from rest of code by whitead in https://github.com/Future-House/paper-qa/pull/289
* Changed to rely on litellm for computing cost by whitead in https://github.com/Future-House/paper-qa/pull/321
* Fixing `LLMModel.axyz_iter` type hints by jamesbraza in https://github.com/Future-House/paper-qa/pull/324
* CLI Fixes by whitead in https://github.com/Future-House/paper-qa/pull/322
* `black`ened code to prevent IDE scrolling by jamesbraza in https://github.com/Future-House/paper-qa/pull/330
* Optimized import paths by jamesbraza in https://github.com/Future-House/paper-qa/pull/331
* Removed `pytest-mock` plugin by jamesbraza in https://github.com/Future-House/paper-qa/pull/328
* Adding `pytest-xdist` plugin by jamesbraza in https://github.com/Future-House/paper-qa/pull/329
* Passing `mypy` by jamesbraza in https://github.com/Future-House/paper-qa/pull/332
* Removing `make_chain` in favor of `run_prompt` by jamesbraza in https://github.com/Future-House/paper-qa/pull/325
* Readme updates by mskarlin in https://github.com/Future-House/paper-qa/pull/323
* Adding `refurb` tool, and `lint` CI by jamesbraza in https://github.com/Future-House/paper-qa/pull/333
* Fixing arg ordering after 325 by jamesbraza in https://github.com/Future-House/paper-qa/pull/334
* Fixing `parse_text` after 332 by jamesbraza in https://github.com/Future-House/paper-qa/pull/335
* Fixing union attr error by jamesbraza in https://github.com/Future-House/paper-qa/pull/338
* Check if a journal name starts with `the` by geemi725 in https://github.com/Future-House/paper-qa/pull/320
* Fixing two more tests by jamesbraza in https://github.com/Future-House/paper-qa/pull/340
* All Ruff `ANN` autofixes by jamesbraza in https://github.com/Future-House/paper-qa/pull/341
* Adding in `.mailmap` by jamesbraza in https://github.com/Future-House/paper-qa/pull/342
* Remove cassettes which aren't needed by mskarlin in https://github.com/Future-House/paper-qa/pull/339
* Add configs for contracrow + wikicrow by mskarlin in https://github.com/Future-House/paper-qa/pull/336
* Removed `LangchainVectorStore`, `llms` extra, and fixing up `README` by jamesbraza in https://github.com/Future-House/paper-qa/pull/343
* Dropping `requests` dependency by jamesbraza in https://github.com/Future-House/paper-qa/pull/346
* Removed `html2text` requirement by jamesbraza in https://github.com/Future-House/paper-qa/pull/347
* Requiring Python 3.11+ by jamesbraza in https://github.com/Future-House/paper-qa/pull/348
* Did one revision at README by whitead in https://github.com/Future-House/paper-qa/pull/344
* Renaming fitz to pymupdf by mskarlin in https://github.com/Future-House/paper-qa/pull/350
* Better control flow in `litellm_get_search_query` by jamesbraza in https://github.com/Future-House/paper-qa/pull/351
* Recurse into directories; catch empty documents by sidnarayanan in https://github.com/Future-House/paper-qa/pull/352
* Move configure_cli_logging such that it's not called twice by mskarlin in https://github.com/Future-House/paper-qa/pull/353
* Cleaning up dependencies by jamesbraza in https://github.com/Future-House/paper-qa/pull/354
* Fixed code in README by whitead in https://github.com/Future-House/paper-qa/pull/355
* Added citation and paper URL by whitead in https://github.com/Future-House/paper-qa/pull/357
* `aviary` and `ldp` for agents over `langchain` by jamesbraza in https://github.com/Future-House/paper-qa/pull/358
* Adds retraction status by geemi725 in https://github.com/Future-House/paper-qa/pull/314
* Adding `pylint` by jamesbraza in https://github.com/Future-House/paper-qa/pull/349
* Added account for cost info by whitead in https://github.com/Future-House/paper-qa/pull/360
New Contributors
* oganm made their first contribution in https://github.com/Future-House/paper-qa/pull/303
* geemi725 made their first contribution in https://github.com/Future-House/paper-qa/pull/320
* sidnarayanan made their first contribution in https://github.com/Future-House/paper-qa/pull/352
**Full Changelog**: https://github.com/Future-House/paper-qa/compare/v4.9.0...vnew