# Support for concurrent predictions
This release introduces support for concurrent processing of predictions through the use of an async predict function.
To enable the feature, add a new `concurrency.max` entry to your cog.yaml file:

```yaml
concurrency:
  max: 32
```
And update your predictor to use the `async def predict` syntax:

```python
from cog import BasePredictor

class Predictor(BasePredictor):
    async def setup(self) -> None:
        print("async setup is also supported...")

    async def predict(self) -> str:
        print("async predict")
        return "hello world"
```
Cog will now process up to 32 predictions concurrently. Once at capacity, subsequent prediction requests will receive a 409 HTTP response.
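Clients can treat that 409 as a back-pressure signal and retry with a delay. Below is a minimal sketch of such retry logic; the `send_prediction` callable and the backoff parameters are illustrative assumptions, not part of Cog's API:

```python
import time

def post_with_retry(send_prediction, max_attempts=5, base_delay=0.5):
    """Retry a prediction request while the server is at capacity.

    `send_prediction` is any callable that issues the HTTP request and
    returns a status code. A 409 means the server has hit its
    concurrency.max limit, so we back off exponentially and try again.
    """
    for attempt in range(max_attempts):
        status = send_prediction()
        if status != 409:
            return status
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("server still at capacity after retries")
```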
## Iterators
If your model is currently using `Iterator` or `ConcatenateIterator` it will need to be updated to use `AsyncIterator` or `AsyncConcatenateIterator` respectively.
```python
from cog import AsyncConcatenateIterator, BasePredictor

class Predictor(BasePredictor):
    async def predict(self) -> AsyncConcatenateIterator[str]:
        for fruit in ["apple", "banana", "orange"]:
            yield fruit
```
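When testing a predictor like this outside of Cog, remember that an `async def` function containing `yield` is an async generator and must be consumed with `async for`. A quick sketch using a plain async generator with the same shape as `predict` (the harness below is an assumption, not Cog's own test tooling):

```python
import asyncio

async def predict():
    # Stands in for the async predict method above; yields tokens one at a time.
    for fruit in ["apple", "banana", "orange"]:
        yield fruit

async def collect():
    # Async generators are consumed with `async for`, not a plain for loop.
    return [token async for token in predict()]

print(asyncio.run(collect()))  # prints ['apple', 'banana', 'orange']
```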