This release includes many important enhancements. Here are the highlights:
- DataFrame transform:
  - Support a vector search tool. The current execution accuracy on the WikiSQL benchmark is 97%, which is higher than the top result on the WikiSQL leaderboard. (https://github.com/pyspark-ai/pyspark-ai/pull/119)
  - Use simple code generation when the LLM is not GPT-4 (https://github.com/pyspark-ai/pyspark-ai/pull/171)
- DataFrame plotting:
  - Implement Auto-Retry Mechanism for Python Code Generation (https://github.com/pyspark-ai/pyspark-ai/pull/159)
- Project infra:
  - Group dependencies for user-facing pip installation (https://github.com/pyspark-ai/pyspark-ai/pull/162)
![sql](https://github.com/pyspark-ai/pyspark-ai/assets/1097932/0ee1be56-b417-4d28-a757-f65fec247b0d)
![python](https://github.com/pyspark-ai/pyspark-ai/assets/1097932/c127b3f3-f7c2-4b9f-a6e4-9f563e924d28)
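For readers unfamiliar with the idea behind the auto-retry highlight, here is a minimal, self-contained sketch of a generate, execute, and retry loop. It is not the project's implementation: `run_with_retry` and `fake_llm` are hypothetical names, and the stand-in generator replaces a real LLM call. The assumption is only that a failed execution's error message is fed back into the next generation attempt.

```python
from typing import Callable, Optional


def run_with_retry(
    generate_code: Callable[[Optional[str]], str],
    max_attempts: int = 3,
) -> dict:
    """Generate code, execute it, and retry with the error message on failure."""
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # Pass the previous error (None on the first attempt) back to the generator.
        code = generate_code(last_error)
        namespace: dict = {}
        try:
            # exec is only acceptable for trusted code; a real tool should sandbox this.
            exec(code, namespace)
            return {"attempts": attempt, "namespace": namespace}
        except Exception as exc:
            last_error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(
        f"Code generation failed after {max_attempts} attempts: {last_error}"
    )


def fake_llm(error: Optional[str]) -> str:
    # Hypothetical stand-in for an LLM: the first draft has a bug, the retry fixes it.
    if error is None:
        return "result = undefined_name + 1"
    return "result = 41 + 1"


outcome = run_with_retry(fake_llm)
print(outcome["attempts"], outcome["namespace"]["result"])  # prints "2 42"
```

Capping the number of attempts keeps a persistently failing prompt from looping forever, and surfacing the last error in the final exception makes the failure easier to diagnose.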
What's Changed
* Update Notice by dennyglee in https://github.com/pyspark-ai/pyspark-ai/pull/130
* Add more details into Notice file by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/131
* Add SimilarValueTool by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/119
* Exclude SimilarValueTool By Default by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/133
* Introduce document website by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/134
* Add github action to publish doc site by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/135
* Add reverse svg logo by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/136
* Add a simple end-to-end test case by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/139
* Add tests for SimilarValueTool by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/140
* Added three end2end tests of UDF generation. by SemyonSinchenko in https://github.com/pyspark-ai/pyspark-ai/pull/141
* Example notebook: generation of UDFs by SemyonSinchenko in https://github.com/pyspark-ai/pyspark-ai/pull/145
* Add CNAME file for doc website by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/146
* Add github action to publish doc changes on doc website by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/147
* Use BAAI/bge-base-en-v1.5 model by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/144
* Update UDF generation docs by SemyonSinchenko in https://github.com/pyspark-ai/pyspark-ai/pull/148
* Fix syntax error in .github/workflows/website.yml by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/150
* Update API docs for plotting and UDF by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/151
* Return type of df.ai.verify from None to bool by SemyonSinchenko in https://github.com/pyspark-ai/pyspark-ai/pull/154
* Improve prompt to help English SDK query correct column by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/153
* Transform filter prompt improvements by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/155
* Add test data by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/156
* LRU Vector Store Policy by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/157
* Implement Auto-Retry Mechanism for Python Code Generation by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/159
* Fix regression on the plotting API when verbose is False by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/161
* Add example notebook for vector similarity search by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/165
* Check for existing files in vector_store_dir location by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/160
* Add explanation of when to use SUM by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/158
* Use consistent format for similarity search input by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/166
* Group dependencies for user-facing pip installation by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/162
* Modify transform prompt depending if vector store is enabled by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/163
* Enhance README by adding badges and update vector search example by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/169
* Update deprecated langchain imports by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/168
* DataFrame transform: use simple code generation when LLM is not GPT-4 by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/171
* Remove prompt about similar_value when tool not enabled by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/172
* Fix dependency groups for pip install by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/174
* Condense README examples by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/175
* Update schema and sample val input to improve query generation by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/176
* Skip test for transform without similar_value by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/178
* Ignore unstable end-to-end case test_laplace_random_udf by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/179
* Run wikisql end-to-end tests on GPT-4 only by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/180
* Ignore end-to-end test case test_array_udf_output by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/181
* Add pyspark-ai[all] group for pip install by asl3 in https://github.com/pyspark-ai/pyspark-ai/pull/182
* Bump version to v0.1.19 by gengliangwang in https://github.com/pyspark-ai/pyspark-ai/pull/183
New Contributors
* dennyglee made their first contribution in https://github.com/pyspark-ai/pyspark-ai/pull/130
* SemyonSinchenko made their first contribution in https://github.com/pyspark-ai/pyspark-ai/pull/141
**Full Changelog**: https://github.com/pyspark-ai/pyspark-ai/compare/v0.1.17...v0.1.19