1. --rerank/--no-rerank (enable/disable re-ranking): Re-ranking, which uses gpt-4o-mini to re-order and filter results after the initial retrieval stage, is now on by default.
2. --radius: The maximum L2 distance for semantic search. It defaults to 1, a value chosen fairly informally that seems to work well. Intuitively, the average L2 distance between two random unit vectors in a high-dimensional space is approximately the square root of 2, because such vectors are usually nearly orthogonal. In practice, if you randomly sample embedding vectors of objects from the same codebase, their pairwise distances are roughly normally distributed around a value slightly less than this, for a number of reasons that I'll leave as an exercise for the reader. Previously, semantic search used only "top k", which would include irrelevant results when there was no good match.
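To make the square-root-of-2 intuition concrete, here is a small standard-library sketch that samples random unit vectors and averages their pairwise L2 distance. It assumes unit-length vectors, and the helper names (`random_unit_vector`, `mean_pairwise_distance`) are made up for illustration; none of this is code from the tool itself.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly at random by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(dim, pairs=200, seed=0):
    """Average L2 distance over `pairs` independent pairs of random unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = random_unit_vector(dim, rng)
        b = random_unit_vector(dim, rng)
        total += l2_distance(a, b)
    return total / pairs
```

For unit vectors the squared distance is 2 - 2·cos(angle), so as the dimension grows and random pairs become nearly orthogonal, `mean_pairwise_distance(dim)` should settle close to `math.sqrt(2)`. A radius of 1 sits comfortably below that baseline.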
I also increased the default top k, which is now used only for full-text search, to 32. This number worked well for me during testing, especially after re-ranking, so I made it the default.
In the future, top k might be increased or removed entirely.
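As a rough illustration of why a radius cutoff behaves differently from "top k" for semantic search, here is a minimal sketch. The function names, the in-memory `corpus` dictionary, and the inclusive `<=` comparison are all assumptions made for the example, not the actual implementation.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def radius_search(query, corpus, radius=1.0):
    """Return (distance, key) pairs no farther than `radius`, nearest first.

    Hypothetical helper: `corpus` maps a key to its embedding vector.
    The result may be empty when nothing is a good match.
    """
    hits = [(l2_distance(query, vec), key) for key, vec in corpus.items()]
    return sorted((d, k) for d, k in hits if d <= radius)


def top_k_search(query, corpus, k):
    """Return the k nearest (distance, key) pairs, however far away they are."""
    hits = sorted((l2_distance(query, vec), key) for key, vec in corpus.items())
    return hits[:k]
```

With a query that has no close neighbors, `top_k_search` still returns k results, while `radius_search` returns only what falls within the cutoff, possibly nothing.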