- More fixes; performance improvements - Added classification features - infer using HF transformers, OpenAI (Chat), and train custom transformer models in 1 line - More demo datasets
0.1.7
- Default param changes in train model function - Added splitting string arrays before feeding to OpenAI embeddings API to account for token limits - Doc changes - bug fixes in the function evaluate_pairs - Added an option to train on a dataset of cluster text and labels
0.1.6
- Bug fixes in model uploads to the hub (missing training config)
0.1.5
- Guardrails added to preprocessing before training to group by keys in case column id is not specified instead of consdiering every pair unique
0.1.4
- Bug fixes in training pipeline - Added a function to do merge k nearest neighbours instead of just one - Bug fixes in model card generation and upload to hugging face hub - Preprocessing data for training now gracefully handles cases where an id is not specified for left or right columns. It now groups by the two columns to handle exact duplicates. Id is still recommended.