- speech mining with [SpeechMatrix](
- [ALTI+](
- many tests for the mining pipeline and different modules of `stopes`
- the `Launcher` can now retry jobs when running on a flaky Slurm cluster
- different margin implementations in mining
- option to take the best neighbour when computing the margin instead of the first one (fast)
- mine large datasets by splitting them into sub-languages
- when mining, keep metadata about which pairs come from the forward pass and which from the backward pass
- when mining, choose whether to run only the forward pass, only the backward pass, or both
- embeddings for mining are now stored in real `.npy` files with headers
- `StopesModule` is no longer `async`; only the `Launcher` APIs are. Write your `run` function as
a normal, non-async function
- neighbour mining is now optimized to use less memory
- progress bar of pipelines is simplified to avoid overly busy logs
- the mining pipeline no longer relies on existing line count files and computes the line counts itself
- many improvements in the mining code
- many fixes in the NMT eval pipeline
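
To illustrate what "different margin implementations" can mean in margin-based mining, here is a minimal pure-Python sketch of three commonly used margin variants (absolute, distance, ratio). It is not the `stopes` implementation: the function names, the `k` default and the toy vectors are assumptions made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_topk(vec, pool, k):
    """Average cosine similarity of `vec` to its k nearest neighbours in `pool`."""
    sims = sorted((cosine(vec, p) for p in pool), reverse=True)[:k]
    return sum(sims) / len(sims)


# Hypothetical registry of margin variants: each one combines the pair
# similarity `sim` with the average neighbourhood similarity `avg`.
MARGINS = {
    "absolute": lambda sim, avg: sim,        # plain cosine, no normalisation
    "distance": lambda sim, avg: sim - avg,  # difference to the neighbourhood
    "ratio": lambda sim, avg: sim / avg,     # ratio to the neighbourhood (avg must be non-zero)
}


def margin_score(x, y, src_pool, tgt_pool, k=2, margin="ratio"):
    """Score a candidate pair (x, y) against the neighbourhoods of both sides."""
    sim = cosine(x, y)
    # x is compared to target-side neighbours, y to source-side neighbours.
    avg = (mean_topk(x, tgt_pool, k) + mean_topk(y, src_pool, k)) / 2
    return MARGINS[margin](sim, avg)


if __name__ == "__main__":
    src = [[1.0, 0.0], [0.0, 1.0]]
    tgt = [[1.0, 0.0], [0.0, 1.0]]
    for name in MARGINS:
        print(name, margin_score(src[0], tgt[0], src, tgt, k=2, margin=name))
```

Keeping the variants in a small lookup table makes it straightforward to switch between them from a configuration value without touching the scoring loop.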