What's Changed
:rocket: Features / Enhancements
- Added Hugging Face Agent support; see an [example](https://github.com/IBM/ibm-generative-ai/blob/main/examples/user/huggingface_agent.py).
- Drastically improved the speed of the `generate_async` method: the concurrency limit is now inferred automatically from the API (a custom `ConnectionManager.MAX_CONCURRENT_GENERATE` setting is ignored). To slow generation down, pass `max_concurrency_limit=1` (or any other value) to the method.
- Increased the default tokenize rate limit from 5 to 10 requests per second (this will be raised further in the future).
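The client-side concurrency cap behaves like a semaphore around in-flight requests. A minimal sketch of that idea in plain `asyncio` (this is an illustration, not the SDK itself; `fake_generate` is a hypothetical stand-in for a single API call):

```python
import asyncio

async def fake_generate(prompt: str, sem: asyncio.Semaphore, active: list, peak: list) -> str:
    # The semaphore plays the role of the client-side concurrency cap.
    async with sem:
        active[0] += 1
        peak[0] = max(peak[0], active[0])
        await asyncio.sleep(0.01)  # stand-in for the HTTP round trip
        active[0] -= 1
        return f"response for {prompt}"

async def generate_all(prompts, max_concurrency_limit: int):
    # Mirrors the effect of passing max_concurrency_limit to generate_async:
    # at most N requests are in flight at any moment.
    sem = asyncio.Semaphore(max_concurrency_limit)
    active, peak = [0], [0]
    results = await asyncio.gather(
        *(fake_generate(p, sem, active, peak) for p in prompts)
    )
    return results, peak[0]

results, peak = asyncio.run(generate_all([f"p{i}" for i in range(8)], max_concurrency_limit=2))
print(peak)  # never exceeds the cap
```

With `max_concurrency_limit=1` the requests are fully serialized, which is the "slow down" behavior mentioned above.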
:bug: Bug fixes
- Unhandled exceptions during `generate_async` calls are now raised instead of being silently swallowed.
- Correctly clean up the async HTTP clients when the task/calculation is cancelled (for instance, when you call `generate_async` in a Jupyter Notebook and then click the stop button). This should prevent the `Can't have two active async_generate_clients` error.
- Fix async support for newer LangChain versions (`>=0.0.300`)
- Fix LangChain PromptTemplate import warning in newer versions of LangChain
- Correctly handle server errors when streaming
- Fix `tune_methods` method
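Because unhandled exceptions now propagate out of async calls, callers should be prepared to catch them, and cleanup should run either way. A plain-`asyncio` sketch of that pattern (hypothetical `Client` and `fake_generate`, not the SDK's actual classes):

```python
import asyncio

class Client:
    """Stand-in for an async HTTP client that must be closed on failure."""
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

async def fake_generate(prompt: str, client: Client) -> str:
    if prompt == "bad":
        raise ValueError(f"server rejected {prompt!r}")
    await asyncio.sleep(0)
    return f"response for {prompt}"

async def run(prompts):
    client = Client()
    try:
        # return_exceptions=False (the default): the first failure
        # propagates to the caller instead of being swallowed.
        return await asyncio.gather(*(fake_generate(p, client) for p in prompts))
    finally:
        await client.close()  # cleanup runs even when a task raises

try:
    asyncio.run(run(["ok", "bad"]))
    caught = None
except ValueError as err:
    caught = str(err)
print(caught)
```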