Codebased

Latest version: v0.6.2

0.6.2

This release updates all tree-sitter grammars to 0.23, their most recent version at time of release.
This makes it possible to remove some code that did pointer arithmetic in Python using the ctypes library, which risked various memory-safety issues.
Although not part of the shipped code, this release also adds tests for all supported languages.
These are fairly basic and will be expanded upon in the future, but should prevent anything super silly from happening.

This release also reduces the concurrency of requests to the OpenAI API to avoid exceeding rate limits.

Essentially, making requests from multiple threads is great for performance on small projects, because for under a minute you can send requests faster than the sustained rate limit (1M, 5M, or 10M tokens per minute).
For example, if your rate limit is 1M tokens per minute, you could potentially send 500k tokens of requests in 5 seconds, which is 6x your sustained rate limit.
This was bad for larger projects, so I'm temporarily reducing request concurrency to 1. The most effective strategy for boosting performance (already in place) is to batch requests, which lets us sustain approximately 100% of the allowed throughput on the OpenAI API (5M tokens per minute, at least for tiers 3 and 4).
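
The 6x figure above can be checked with a few lines of arithmetic. This is only a sketch of the reasoning, not code from Codebased; the function names `burst_multiple` and `seconds_at_sustained_rate` are made up for illustration.

```python
def burst_multiple(tokens_sent: int, seconds: float, limit_per_minute: int) -> float:
    """Return how many times the sustained per-minute limit a burst corresponds to."""
    # Scale the burst up to a per-minute rate, then compare it to the limit.
    effective_per_minute = tokens_sent * 60 / seconds
    return effective_per_minute / limit_per_minute


def seconds_at_sustained_rate(tokens_sent: int, limit_per_minute: int) -> float:
    """Return how long the same tokens would take at exactly the sustained limit."""
    return tokens_sent / limit_per_minute * 60


# 500k tokens in 5 seconds against a 1M tokens-per-minute limit.
print(burst_multiple(500_000, 5, 1_000_000))          # 6.0
print(seconds_at_sustained_rate(500_000, 1_000_000))  # 30.0
```

A small project finishes before the minute is up, so the burst is harmless; a large one keeps sending at the burst rate and runs into the limit.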

0.6.1

Fixed an issue with parsing C++ header files after I noticed something was off with livegrep on the demo site: https://codebased.sh.
For better or for worse, C header files are parsed as C++ header files now.
This should work because C++ is a syntactic superset of C (say that five times fast).
In turn, our C++ parser *extends* the C parser now.
This uncovered a bug in how C-style constructs were parsed in C++.
All in all, this change fixes a few bugs and enhances the parser's test coverage.

0.6.0

- Improved the default radius argument to include more semantic search results and to account for differences between vector spaces.
- Respect top k AND radius in semantic search to avoid huge result sets that overload the reranker.
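
As a rough illustration of respecting both top k and radius, the sketch below keeps only results within the radius and then truncates to k. It is not the project's implementation: `semantic_search`, the toy 2-D vectors, and the names in `index` are all hypothetical.

```python
import math


def semantic_search(query, index, k, radius):
    """Return up to k names whose vectors lie within `radius` (L2) of the query."""
    # Score every candidate by L2 distance to the query vector.
    scored = [(math.dist(query, vector), name) for name, vector in index.items()]
    # The radius drops poor matches; top k caps what reaches the reranker.
    within = sorted(pair for pair in scored if pair[0] <= radius)
    return [name for _, name in within[:k]]


index = {
    "a": (1.0, 0.0),   # distance 0 from the query
    "b": (0.8, 0.6),   # distance about 0.63
    "c": (0.0, 1.0),   # distance about 1.41, outside radius 1
    "d": (-1.0, 0.0),  # distance 2, outside radius 1
}
print(semantic_search((1.0, 0.0), index, k=5, radius=1.0))  # ['a', 'b']
print(semantic_search((1.0, 0.0), index, k=1, radius=1.0))  # ['a']
```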
- Fix parsing of TypeScript (and JavaScript) constants. Thanks to sridatta for reporting this issue.

0.5.2

Fixed a bug where we would stop a thread before it was started, and edited the reranking system prompt to filter results less aggressively. Previously, it would unhelpfully stop at around 10 results.

0.5.1

This is a small bug-fix release where I fixed an issue with the preview not being removed when the result set became empty.

0.5.0

1. --rerank/--no-rerank (enable/disable reranking): Re-ranking, which uses gpt-4o-mini to re-order / filter results after the initial retrieval stage, is now on by default.
2. --radius: The maximum L2 distance for semantic search. It defaults to 1, which was chosen fairly informally but seems to work well. Intuitively, the average L2 distance between two random unit vectors in a high-dimensional space is approximately the square root of 2, because such vectors are usually close to orthogonal. In practice, if you randomly sample embedding vectors of objects from the same codebase, their distances are somewhat normally distributed around a value slightly less than this, for a number of reasons that I'll leave as an exercise for the reader. Previously, semantic search used only "top k", but this would include irrelevant results if there was not a good match.
Also, I increased the default top k, which is now used only for full-text search, to 32. This was the number that worked well for me during testing, especially after re-ranking, so I made it the default.
In the future, top k might be increased or removed entirely.
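
The square-root-of-2 intuition behind the --radius default can be checked empirically. The sketch below samples random unit vectors (an assumption: real embeddings are not uniformly random, which is why their typical distance is somewhat lower) and averages the pairwise L2 distance. The function names, dimension, and sample count are arbitrary choices for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_pairwise_distance(dim, pairs, seed=0):
    """Average L2 distance over `pairs` independent pairs of random unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = random_unit_vector(dim, rng)
        b = random_unit_vector(dim, rng)
        total += math.dist(a, b)
    return total / pairs


# For unit vectors, |a - b|^2 = 2 - 2 * (a . b), and a . b is near 0 in high dimensions.
print(mean_pairwise_distance(dim=512, pairs=200))  # close to sqrt(2)
print(math.sqrt(2))
```

With a default radius of 1, a result has to be noticeably closer to the query than a random pair of vectors would be.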
