Highlights
In all v5 releases before this one, the agent loop ended as soon as 1+ generated answers did not contain the substring `"cannot answer"`. This caused the agent loop to terminate early on partial answers like "Based on the sources provided, it appears no one has done x." We have resolved this issue by:
- No longer coupling our done condition with the substring `"cannot answer"` being absent from 1+ generated answers
- No longer implicitly depending on clients mentioning this `"cannot answer"` sentinel in the input `qa` prompt
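
To make the failure mode concrete, here is a minimal sketch of a substring-based done condition next to an explicit completion signal. The function names `old_is_done` and `new_is_done` are hypothetical and are not taken from the paper-qa codebase; the explicit signal is modeled loosely on the `complete` tool listed below, and the real implementation may differ.

```python
SENTINEL = "cannot answer"  # the literal described above


def old_is_done(answers: list[str]) -> bool:
    """Substring-style check: done once any answer lacks the sentinel."""
    return any(SENTINEL not in answer for answer in answers)


def new_is_done(tool_calls: list[str]) -> bool:
    """Hypothetical explicit check: done only once a completion tool is called."""
    return "complete" in tool_calls


partial = "Based on the sources provided, it appears no one has done x."

# The partial answer never mentions the sentinel, so the old check ends the loop.
print(old_is_done([partial]))

# With an explicit signal, generating an answer alone does not end the loop.
print(new_is_done(["gather_evidence", "gen_answer"]))
print(new_is_done(["gather_evidence", "gen_answer", "complete"]))
```

The explicit signal also means the done condition no longer depends on what the `qa` prompt tells the LLM to say.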
We also fixed several (bad) bugs:
- We support parallel tool calling (2+ `ToolCall`s in one `action: ToolRequestMessage`). However, our tools (notably `gather_evidence`) are not actually concurrency-safe. Our tool schemas instructed agents not to call certain tools in parallel, but we nonetheless observed agents requesting parallel `gather_evidence` calls. We now force our tools to execute non-concurrently to work around this race condition
- When using `LitQAEvaluation` and the same `GradablePaperQAEnvironment` 2+ times, we repeatedly added the "unsure" option to the target multiple choice question, degrading performance over time
- When using `PaperQAEnvironment` 2+ times, each `reset` was not properly wiping the `Docs` object
- The reward distribution of `LitQAEvaluation` mixed up the "unsure" reward of `0.1` with the "incorrect" reward of `-1.0`, so it did not properly incentivize learning
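
As an illustration of the first bug above, the sketch below shows how a tool that is not concurrency-safe can lose updates when several calls run at once, and why running the calls in order avoids it. `SharedState` and `unsafe_tool` are hypothetical stand-ins, not paper-qa code; the actual race in `gather_evidence` may look different.

```python
import asyncio


class SharedState:
    """Hypothetical stand-in for state shared across tool calls."""

    def __init__(self) -> None:
        self.evidence_count = 0


async def unsafe_tool(state: SharedState) -> None:
    """Read, await, then write: not safe to run concurrently."""
    current = state.evidence_count
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    state.evidence_count = current + 1


async def run_concurrently(n: int) -> int:
    """Run n tool calls at once, as a parallel tool request would."""
    state = SharedState()
    await asyncio.gather(*(unsafe_tool(state) for _ in range(n)))
    return state.evidence_count


async def run_in_order(n: int) -> int:
    """Run n tool calls one after another."""
    state = SharedState()
    for _ in range(n):
        await unsafe_tool(state)
    return state.evidence_count


concurrent_total = asyncio.run(run_concurrently(3))  # some updates are lost
ordered_total = asyncio.run(run_in_order(3))  # every update is kept
print(concurrent_total, ordered_total)
```

Forcing ordered execution trades some latency for correctness, which seems a reasonable default while the tools themselves are not concurrency-safe.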
There are a bunch of other minor features, cleanups, and bugfixes here too; see the full list below.
What's Changed
* Deprecation cycle for `AgentSettings.should_pre_search` by jamesbraza in https://github.com/Future-House/paper-qa/pull/679
* Moved agent prompts to `prompts.py` by jamesbraza in https://github.com/Future-House/paper-qa/pull/681
* Refactor to remove `skip_system` from `LLMModel.run_prompt` by jamesbraza in https://github.com/Future-House/paper-qa/pull/680
* Resolving `evidence_detailed_citations` and `Answer` deprecations by jamesbraza in https://github.com/Future-House/paper-qa/pull/682
* Fixed agent prompt names and contents after 681 mess up by jamesbraza in https://github.com/Future-House/paper-qa/pull/683
* Removed `tool_names` validation for `gen_answer` being present by jamesbraza in https://github.com/Future-House/paper-qa/pull/685
* Fixing `test_evaluation` logic bugs by jamesbraza in https://github.com/Future-House/paper-qa/pull/686
* Removed `GenerateAnswer.FAILED_TO_ANSWER` as it's unnecessary by jamesbraza in https://github.com/Future-House/paper-qa/pull/691
* Allowing serialized `Settings` in `get_settings` by jamesbraza in https://github.com/Future-House/paper-qa/pull/688
* Fixed LDP runner's `TRUNCATED` not calling `gen_answer`, and documented `AgentStatus` by jamesbraza in https://github.com/Future-House/paper-qa/pull/690
* Removed `gen_answer`'s dead argument `question` by jamesbraza in https://github.com/Future-House/paper-qa/pull/689
* Making sure we copy distractors by sidnarayanan in https://github.com/Future-House/paper-qa/pull/694
* Created `complete` tool to allow unsure answers by jamesbraza in https://github.com/Future-House/paper-qa/pull/684
* Added missing `test_from_question` cassette by jamesbraza in https://github.com/Future-House/paper-qa/pull/696
* Moved `fake` agent to LLM propose `complete` tool by jamesbraza in https://github.com/Future-House/paper-qa/pull/695
* Default to ordered tool calls, w env variable control by mskarlin in https://github.com/Future-House/paper-qa/pull/697
* Lock file maintenance by renovate in https://github.com/Future-House/paper-qa/pull/699
* Refactored `TestGradablePaperQAEnvironment` for DRY code by jamesbraza in https://github.com/Future-House/paper-qa/pull/702
* Fixing `PaperQAEnvironment.reset` respecting `mmr_lambda` and `text_hashes` by jamesbraza in https://github.com/Future-House/paper-qa/pull/703
* Removed `"cannot answer"` literals and added `reset` tool by jamesbraza in https://github.com/Future-House/paper-qa/pull/698
* Update all non-major dependencies by renovate in https://github.com/Future-House/paper-qa/pull/705
* Fixing `LitQAEvaluation` bugs: incorrect reward indices, not using LLM's native knowledge by jamesbraza in https://github.com/Future-House/paper-qa/pull/708
* Adding filters to paper-qa Docs by whitead in https://github.com/Future-House/paper-qa/pull/707
* Fixed mutably defaulted `NumpyVectorStore.texts` by jamesbraza in https://github.com/Future-House/paper-qa/pull/711
**Full Changelog**: https://github.com/Future-House/paper-qa/compare/v5.4.0...v5.5.0