ChainForge


0.1.5.3

This is an emergency release to add basic support for the new OpenAI models and their 'function call' ability. It also includes support for Azure OpenAI endpoints, closing Issue 53.

OpenAI function calls
You can now specify the newest 0613 versions of the ChatGPT models:

<img width="545" alt="Screen Shot 2023-06-13 at 5 36 50 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/a47cbb12-2744-4566-9781-09fe4eeb5ce2">

In addition, you can set the value of `functions` by passing a valid JSON schema object. This will be passed as the `functions` parameter of the OpenAI chat completions call:

<img width="646" alt="Screen Shot 2023-06-13 at 5 36 45 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/6106aab6-88f6-488e-a6ad-eff6b4108870">

I've created a basic example flow to **detect when a given prompt triggers a function call**, using OpenAI's `get_current_weather` example from their announcement:

<img width="1432" alt="Screen Shot 2023-06-13 at 5 39 50 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/a4d9bd0f-0f51-4a48-9784-c5e3ac342f2b">

In the coming weeks, we will think about making this user experience more streamlined, but for now, enjoy being able to mess around!

Azure OpenAI API support

Thanks to community members chuanqisun, bhctest123, and levidehaan, we have now added Azure OpenAI support:

![245616817-23e0fcb3-5cee-4d76-8eeb-eb83f5b5fabc](https://github.com/ianarawjo/ChainForge/assets/5251713/df25a6c7-5a52-4274-8a46-d1c9c39a2cc2)

To use Azure OpenAI, you just need to set your keys in ChainForge Settings:

<img width="478" alt="Screen Shot 2023-06-13 at 5 57 24 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/34db6db5-a447-4745-a425-5cc9077e6af7">

And then make sure you set the right Deployment Name in the individual model settings. The settings also include OpenAI function calls (we're not sure if you can deploy 0613 models on Azure yet, but the option is there).
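
For context, these settings map onto the standard Azure configuration of the `openai` Python library, roughly like this (the resource name, API version, and deployment name below are placeholders, not values ChainForge ships with):

```python
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"  # your Azure endpoint
openai.api_version = "2023-05-15"   # example API version
openai.api_key = "YOUR-AZURE-OPENAI-KEY"

response = openai.ChatCompletion.create(
    engine="my-gpt35-deployment",  # the Deployment Name from model settings
    messages=[{"role": "user", "content": "Hello from ChainForge!"}],
)
```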

As always, let us know if you run into any issues.

Collapsing duplicate responses

As part of this release, duplicate LLM responses are now detected and automatically collapsed in Inspectors when the number of generations `n > 1`. The number of duplicates is indicated in the top-right corner:

<img width="386" alt="Screen Shot 2023-06-13 at 12 03 54 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/6f489ff0-6f33-438f-be25-102ce69deb15">

0.1.5

We've added Tabular Data to ChainForge, to help conduct ground truth evaluations. Full release notes below.

Tabular Data Nodes 🗂️

You can now input and import tabular data (spreadsheets) into ChainForge. Accepted formats are `jsonl`, `xlsx`, and `csv`. Excel and CSV files must have a header row with column names.

Tabular data provides an easy way to enter associated prompt parameters or import existing datasets and benchmarks. A typical use case is **ground truth evaluation**, where we have some inputs to a prompt, and an "ideal" or expected answer:

<img width="1377" alt="Screen Shot 2023-06-10 at 2 23 13 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/e3dd6941-47d4-4eee-b8b1-d9007f7aae15">

Here, we see **variables `{first}`, `{last}`, and `{invention}` "carry together" when filling the prompt template**: ChainForge knows they are all associated with one another, connected via the row. Thus, it constructs 4 prompts from the input parameters.
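
In plain Python terms, the behavior is equivalent to filling the template once per table row (the rows below are invented for illustration; they are not the values in the screenshot):

```python
# Hypothetical rows standing in for the imported table.
rows = [
    {"first": "Marie",  "last": "Curie",  "invention": "radium"},
    {"first": "Thomas", "last": "Edison", "invention": "the phonograph"},
    {"first": "Grace",  "last": "Hopper", "invention": "the compiler"},
    {"first": "Nikola", "last": "Tesla",  "invention": "the induction motor"},
]
template = "Did {first} {last} really invent {invention}?"
prompts = [template.format(**row) for row in rows]  # one prompt per row -> 4 prompts
```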

Accessing tabular data, even if it's not input into the prompt directly
Alongside tabular data comes a new property of `response` objects in Evaluation nodes: the `meta` dict. This gives you access to column data that is associated with the inputs to a prompt template, _but was not itself directly input into the prompt template_. For instance, in the new example flow for ground truth evaluation of math problems:

<img width="1770" alt="Screen Shot 2023-06-11 at 11 51 28 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/1611a9e4-c7d8-4f3f-92ff-a7c41bb230cf">

Notice the evaluator uses `meta` to get "Expected", which is _associated_ with the prompt input variable `question` by virtue of it being on the same row of the table.

```python
def evaluate(response):
    # Compare the first four characters of the LLM's answer against the
    # "Expected" column from the same table row (exposed via `meta`).
    return response.text[:4] == \
           response.meta['Expected']
```


Example flows

Tabular data allows us to run many more types of LLM evaluations. For instance, here is the ground truth evaluation `multistep-word-problems` from [OpenAI evals](https://github.com/openai/evals), loaded into ChainForge:

<img width="1465" alt="Screen Shot 2023-06-10 at 9 08 05 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/12609e9f-e23e-4028-9b7e-ee2dc7d31147">

We've added an Example Flow for ground truth evaluation that provides a good starting point.

--------------------------
Evaluation Node output 📟

Curious what the format of a `response` object looks like? You can now `print` inside `evaluate` functions to print output directly to the browser:

<img width="375" alt="Screen Shot 2023-06-10 at 8 26 48 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/264a5661-4ae9-4468-9fd6-607ab95aa1f5">

In addition, exceptions raised inside your evaluation function will also print to the node's output:

<img width="377" alt="Screen Shot 2023-06-10 at 8 29 38 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/70a96161-6fce-451b-9219-4a4aa31948bd">

--------------------------
Slight styling improvements in Response Inspectors

We removed the blue Badges used to display unselected prompt variables, replacing them with text that blends into the background:

<img width="327" alt="Screen Shot 2023-06-11 at 12 52 51 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/8f42ea57-7de0-4d1e-8a11-bb68e689baae">

The fullscreen inspector also displays a slightly larger font size for readability:

<img width="1461" alt="Screen Shot 2023-06-11 at 12 51 49 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/3c2ab466-09d8-4edd-a5a3-bcc6c32e34cd">

--------------------------
Final thoughts / comments
- Tabular Data was a major feature, as it enables many types of LLM evaluation. Our goal now is to illustrate what people can currently do in ChainForge through better documentation and by connecting to existing datasets (e.g. OpenAI evals). We will also focus on quality-of-life improvements to the UI and on adding more models and extensibility.
- We know there is a minor layout issue where the table does not autosize to best fit the width of cell content. This happens because some browsers do not appear to autofit column widths properly when a `<textarea>` is an element of a table cell. We are working on a fix so that columns are automatically sized based on their content.

Want to see a feature / have a comment? Start a [Discussion](https://github.com/ianarawjo/ChainForge/discussions) or submit an [Issue](https://github.com/ianarawjo/ChainForge/issues)!

0.1.4

This release includes the following features:

Selective Failure on API requests ♨️
ChainForge now has selective failure on `PromptNode`s: API calls that fail no longer stop the remaining requests; instead, failures collect as red error bars within the progress bars:

https://github.com/ianarawjo/ChainForge/assets/5251713/957b9909-359d-4e1e-8d7e-7dfd1f3fb6aa

An error message will display all errors once all API requests return (whether successfully or with errors). This saves $$ and time. (As always, ChainForge caches responses the moment it receives them, so you don't need to worry about re-running prompt nodes re-calling APIs.)

Inspector Pop-up 🔍
In addition, we've added an Inspector pop-up which you can access by clicking the response preview box on a `PromptNode`:

https://github.com/ianarawjo/ChainForge/assets/5251713/4036a3bf-561d-46b1-8de4-80a7eb22c890

This makes it much easier to inspect responses without needing to attach a dedicated Inspect Node. We're going to build this out (and add it to the `EvaluatorNode`) soon, but for now I hope you find this feature useful.

LLM Color Consistency 🌈
Now, each LLM you create has a dedicated color that remains consistent across `VisNode` plots and Inspector responses.

Firefox Support 🦊
Due to demand for more browsers, we've added support for Firefox. This involved a minor change to how model settings forms work.
Other browsers should now work too (though the formatting isn't exactly right), as we removed a dependency on regex lookaheads/lookbehinds that was causing some browsers, like Safari, to not load the app at all.

Website
As an aside, we've created a website at [chainforge.ai](http://chainforge.ai). It's not much yet, but it's a start. We will add tutorials in the near future for new users.

Upcoming features
Major priorities right now are:
* Tabular data nodes: Load tabular data and reference columns in `EvaluatorNode` code
* Ground truth example flows: An example flow that evaluates responses against a 'ground truth' which differs per prompt parameter value
* Azure support: Yes, we heard you! :) I am hoping to get this in very soon.

0.1.3.1

This release makes it very easy to import example flows:

https://github.com/ianarawjo/ChainForge/assets/5251713/df54594b-a704-4ef7-8103-23e0ed5b1477

Other additions:
- Added a "Compare System Prompts" example, with one-shot versus zero-shot versus "threaten a fictional kitten" examples. (See Twitter thread that inspired this use case: https://twitter.com/ShriramKMurthi/status/1664978520131477505?s=20 )
- Made it possible to switch between GPT-3.5 and GPT-4 after initially adding one of them (removed the separation between these types of models)
- Improved the 'add node' UI to look sleeker

0.1.3

Proud to announce we now have model settings in ChainForge. 🥳
You can now compare across different versions of the same model, in addition to nicknaming models and choosing more specific models.

To install, do `pip install chainforge --upgrade`. Full changelog below.

More supported models 🤖
Along with model settings, we now have support for all OpenAI, Anthropic, Google PaLM (chat and text), and Dalai-hosted models. For instance, you can now compare Llama.65B to PaLM text completions, if you were so inclined. For the full list, see [models.py](https://github.com/ianarawjo/ChainForge/blob/main/chainforge/promptengine/models.py).

Here is comparing Google PaLM's text-bison to chat-bison for the same prompt:

<img width="808" alt="Screen Shot 2023-06-01 at 2 27 54 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/eb581e22-1dfb-4c55-8a93-8e595ea5fbaf">

Customizable model settings (and emojis! 😑)
Once you add a model to a `PromptNode`, now you can tap the 'settings' icon on a `PromptNode` to bring up a form with all settings for that base model. You can adjust the exact model used (for instance, `text-bison-001` in PaLM, or Dalai-hosted `llama.30B`):

<img width="1298" alt="Screen Shot 2023-06-01 at 2 09 49 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/66215673-4615-4fe6-92cd-70daaa922f2e">

Temperature appears next to model names by default. For ease of reference, temperature is displayed on a sliding color scale from cyan (`00ffff`, coldest) through violet (`ff00ff`, lukewarm) to red (`ff0000`, hottest). The percentage respects the min and max temperature settings of individual models.
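
In effect, the mapping works something like the sketch below (our reconstruction of the described behavior, not ChainForge's actual code):

```python
def temp_color(temp, t_min, t_max):
    """Map a temperature to a hex color, normalized to the model's own range."""
    pct = (temp - t_min) / (t_max - t_min)  # 0.0 = coldest, 1.0 = hottest
    if pct <= 0.5:
        # Blend cyan (0, 255, 255) toward violet (255, 0, 255).
        f = pct / 0.5
        r, g, b = int(255 * f), int(255 * (1 - f)), 255
    else:
        # Blend violet (255, 0, 255) toward red (255, 0, 0).
        f = (pct - 0.5) / 0.5
        r, g, b = 255, 0, int(255 * (1 - f))
    return f"#{r:02x}{g:02x}{b:02x}"

temp_color(1.0, 0.0, 2.0)  # midpoint of a 0-2 range -> "#ff00ff" (violet)
```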

You can now also nickname models in `PromptNode`s. Names must be unique. Each nickname will appear elsewhere in ChainForge (e.g. in plots). You can also set the emoji used. For instance, here is a comparison between two ChatGPT models at different temperatures, which I've renamed `hotgpt` and `coldgpt` with the emojis 🔥 and 🥶:

<img width="333" alt="Screen Shot 2023-06-01 at 2 07 54 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/55ff7e24-5530-4f95-bbbd-0274a0b321a5">

Note about importing previous flows
Unfortunately, this code rewrite involved a **breaking change** to how flows are imported and exported (the `.cforge` file format). You may still be able to import old flows, but you will need to re-populate each model list and re-query LLMs. I had hoped to avoid this, but it was necessary in order to store model settings information and redo how the backend caches responses.

Note about Dalai-hosted models
Currently, you cannot query multiple Dalai models/settings at once, since a locally run model can only take one request at a time. We're working on fixing this for the next minor release; for now, just choose one model at a time, and if you want more than one, add it to the list and re-query the prompt node (it will use the previously cached responses from the first Dalai model).

Encounter any bugs?
There was a _lot_ to change for this release, and it's likely that at least one thing broke in the process that we haven't detected. If you encounter a bug or problem, open an Issue or respond to the Discussion about this release! 👍

0.1.2.4

This release contains two minor but important changes:

- Template variable hooks in `PromptNode` and `TextFieldNode` no longer auto-uppercase the template variable name by default:
<img width="336" alt="Screen Shot 2023-05-28 at 11 23 07 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/f9dcef9e-156c-4d10-a132-e16f6b341c7c">

- Selected variables in `InspectNode` now display up to 144 characters of their value (which is much more informative than the previous 12 characters in uppercase):
<img width="447" alt="Screen Shot 2023-05-28 at 11 19 21 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/e79731dd-ad60-4eb7-9054-6826565b634e">

We are working on custom model settings for the next major release (0.1.3), so you can change the temperature and other settings of individual models. We are structuring the code to be easily extensible, so we can add more (hopefully many more) models in the future, including user-specified models and settings forms (through `react-json-schema`).
