Chainforge

Latest version: v0.3.4.7

0.2.8.9

Dalai support has been replaced by [Ollama](https://github.com/jmorganca/ollama) 🦙. You can now add local [models hosted via Ollama](https://ollama.ai/library), including llama2, mistral, codellama, vicuna, etc.

Thanks to laurenthuberdeau! 🎉

0.2.8

Adds a purple, sparkly generative AI button to TextFields and Items Nodes, courtesy of shawseanyang!

This button gives you easy access to generating input values using LLM calls to OpenAI GPT-3.5 and GPT-4 models. You can access two features:
- **Replace**: Replace the existing fields with new fields, given the prompt entered.
- **Extend**: Given the existing fields, extrapolate the pattern and extend the list the best you can.

![image](https://github.com/ianarawjo/ChainForge/assets/5251713/fb92921f-2ef8-4e89-910b-575059f9ed13)

Try it out on [chainforge.ai/play](https://chainforge.ai/play/), or install locally via `pip install chainforge --upgrade` (BYO OpenAI API key!)

_Note: You must have input an OpenAI API key (either directly in Settings, or via environment variables) to use generative AI features. In the future, we might support letting users change the model used for this feature, if there's interest (or if you submit a PR! :)_

Yes, prompts generate too... sort of!

Replace also detects when you are asking for prompts, and can generate prompt templates. Extend likewise takes prompt templates into account and tries to keep to your existing template variables. If you use prompt generation, you are probably best off Extending after manually entering 2-3 prompts/prompt templates as examples.

Note that prompt template generation is very experimental atm, and we're working on improving this aspect.

<img width="581" alt="Screen Shot 2023-12-12 at 10 34 45 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/2874ae94-81c4-43b2-9c57-6f21b6810647">

This feature is in BETA. There may be rough edges, mistakes in generation, etc.
However, we've been enjoying it greatly and found it very helpful for speeding up input data generation. Play around and let us know what you think!

_If you don't want to see the buttons, you can toggle off AI support in the Settings Window. Also, you can toggle on the Autocomplete feature on TextFields nodes, if you're feeling experimental! :)_

This feature represents a semester's worth of work from shawseanyang! Thank you Sean! 🎉🥳

---------------------
Other small changes in this release:
- the max `Num of responses per prompt` counter on Prompt Nodes has been increased to 999
- In-browser autosaving is now disabled once you start working with lots of LLM responses (we're talking files upwards of 20MB). `localStorage` can only handle so much.
- Relatedly, when you tab away from the ChainForge browser tab, autosaving will not occur in the background. This is to save you performance and help with the check above.
- API keys are now loaded from environment variables only upon load of the application, rather than every call, for consistency.
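
As a rough illustration of the last point, here is a minimal sketch of reading API keys once at startup rather than on every call. It is not ChainForge's actual code: `load_api_keys`, `API_KEYS`, and the variable names are assumptions made up for this example.

```python
import os

def load_api_keys(key_names, environ=None):
    """Read the named keys from the environment once and return a snapshot dict."""
    # `environ` is injectable so the example can run without real keys set.
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in key_names if name in environ}

# Hypothetical environment, standing in for os.environ.
fake_env = {"OPENAI_API_KEY": "sk-example"}

# Taken once, when the application loads.
API_KEYS = load_api_keys(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"], environ=fake_env)

# A change made after startup is not reflected in the snapshot.
fake_env["OPENAI_API_KEY"] = "sk-changed"
print(API_KEYS)
```

The trade-off is consistency over freshness: every call sees the same keys, but changing an environment variable requires restarting the application.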

0.2.6.5

We added a 🔗**Join Node**, our first Processor node, which lets you concatenate responses and/or input data, within or across LLMs.

<img width="673" alt="Screen Shot 2023-10-23 at 3 10 49 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/ee91cb2f-2b86-4a6e-8506-c9fe17ae8d81">

For instance, consider:
<img width="1731" alt="Screen Shot 2023-10-23 at 3 29 26 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/bbaa40c8-b0f0-4e93-b2c1-a3efcce38a6e">

We translate words one-by-one in the first Prompt Node:

<img width="329" alt="Screen Shot 2023-10-23 at 3 29 41 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/a135a00b-2ad8-4055-ab97-78227109936e">

Then we can join the responses by category, fruit or dessert. Here I've opted for "double newline" formatting:

<img width="665" alt="Screen Shot 2023-10-23 at 3 29 46 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/72ee5a48-d287-4b4b-bb12-65f75aea2033">

Finally we chain these lists of items into another Prompt Node, to have an LLM tell us which one is the sweetest item of the list:

<img width="659" alt="Screen Shot 2023-10-23 at 3 30 06 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/503f885e-51ad-4ad4-a63d-80864cf53416">
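
To make the joining step concrete, here is a minimal sketch in plain Python of grouping responses by category and concatenating them with a double newline. This is not the Join Node's implementation; `join_by_category` and the sample words are hypothetical, chosen only to mirror the fruit/dessert walkthrough above.

```python
from collections import defaultdict

def join_by_category(responses, separator="\n\n"):
    """Group (category, text) pairs by category and join each group's texts."""
    groups = defaultdict(list)
    for category, text in responses:
        groups[category].append(text)
    # One joined string per category, preserving the original order within it.
    return {category: separator.join(texts) for category, texts in groups.items()}

# Hypothetical translated words, one response per input item.
responses = [
    ("fruit", "manzana"),
    ("dessert", "pastel"),
    ("fruit", "naranja"),
    ("dessert", "helado"),
]

joined = join_by_category(responses)
print(joined["fruit"])
```

Each joined string can then be fed as a single input value to the next prompt in the chain.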

Questions? Comments?

The Join Node is still a bit experimental. It covers a few use cases, but please let us know if it doesn't fit yours or is too limited. And, as always, you can just implement the changes you want and submit a Pull Request; this will be faster if the change is minor (e.g., adding another formatting option to the Join Node).

0.2.6

For weeks, many of you have asked for the ability to query custom models or providers in ChainForge. Given how fast this space is evolving --and also how idiosyncratic some of these APIs are --we decided it best to make ChainForge extensible.

You can now [add custom providers](https://chainforge.ai/docs/custom_providers/) by writing simple completion functions in Python. Custom providers will be added to the list of providers in Prompt, Chat Turn, and LLM Scorer nodes. Added provider scripts are automatically cached, and persist across runs of ChainForge.

Here's [an example script to add the Cohere API](https://github.com/ianarawjo/ChainForge/blob/main/chainforge/examples/custom_provider_cohere.py), complete with a JSON schema defining custom settings options. You add this script by simply dropping it into the new "Custom Providers" tab in the ChainForge Settings window:

![custom-providers](https://github.com/ianarawjo/ChainForge/assets/5251713/70f363d0-1a59-47aa-bea9-650738c4e3e0)

You can then query the custom provider like normal:

![custom-provider-query](https://github.com/ianarawjo/ChainForge/assets/5251713/0fc6e042-75e5-43c8-b7ac-6fd33b538217)

Note that only the local version of ChainForge (via `pip install`) supports custom providers.
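
To give a sense of what a "simple completion function" can look like, here is a toy sketch, assuming that a provider boils down to a function that takes a prompt string and returns a response string. The function name and the `shout` setting are made up for illustration, and the step that registers the function with ChainForge is deliberately omitted; the docs page linked above describes the real mechanism.

```python
def echo_completion(prompt: str, shout: bool = False, **kwargs) -> str:
    """Toy completion function: echoes the prompt instead of calling a real API.

    `shout` is a hypothetical custom setting; extra keyword arguments are
    accepted and ignored so that unknown settings do not raise errors.
    """
    response = f"You said: {prompt}"
    return response.upper() if shout else response

print(echo_completion("hello"))
print(echo_completion("hello", shout=True))
```

A real provider would replace the body with a call to the model's API and return the text of its response.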

[For extensive information, see the new "Adding a custom provider" page in the docs.](https://chainforge.ai/docs/custom_providers/)
As always, let us know if you encounter any problems! :)

docs
ChainForge now has documentation! Go here:

https://chainforge.ai/docs/

Let us know what you think!

<img width="1592" alt="Screen Shot 2023-08-05 at 11 30 59 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/bcf82540-e2c9-4c76-b22b-4f2157653f7a">

0.2.5

We're excited to release two new nodes: **Chat Turns** and **LLM Scorers**. These nodes came from feedback during user sessions:
- Some users wanted to first tell chat models 'how to act', and then wanted to put their real prompt in the second turn.
- Some users wanted a quicker, cheaper way to 'evaluate' responses and visualize results.

We describe these new nodes below, as well as a few quality-of-life improvements.

🗣️ Chat Turn nodes
Chat models are all the rage (in fact, they are so important that [OpenAI announced it would no longer support plain-old text generation models going forward](https://openai.com/blog/gpt-4-api-general-availability)). Yet strikingly, very few prompt engineering tools let you evaluate LLM outputs beyond a single prompt.

Now with Chat Turn nodes, you can continue conversations beyond a single prompt. In fact, you can:

Continue multiple conversations simultaneously across multiple LLMs

Just connect the Chat Turn to your initial Prompt Node, and voilà:

<img width="1421" alt="Screen Shot 2023-07-25 at 6 39 45 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/9039ce6b-a16d-4694-89fa-47a22636cd8a">

Here, I've first prompted four chat models (GPT-3.5, GPT-4, Claude-2, and PaLM) with the question: "What was the first {game} game?". Then I ask a follow-up question, "What was the second?" By default, Chat Turns continue the conversation with all LLMs that were used before, allowing you to follow up on LLM responses in parallel. (You can also toggle that off if you want to query different models; more details below.)

Template chat messages, just like prompts

You can do everything with Chat Turns that you can with Prompt Nodes, including prompt templating and adding input variables. For instance, here's a prompt template as a follow-up message:

<img width="1184" alt="Screen Shot 2023-07-25 at 1 22 15 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/497b5c6d-a830-4af6-b7fe-f9c5b5b6a132">

> **Note**
> In fact, Chat Turns are merely modified Prompt Nodes, and use the underlying `PromptNode` class.
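
As a rough picture of what templating does, the sketch below fills a `{variable}` template with every combination of input values. It uses Python's built-in `str.format` as a stand-in, so it is not ChainForge's templating code; `fill_template` and the sample values are assumptions for illustration.

```python
from itertools import product

def fill_template(template: str, variables: dict[str, list[str]]) -> list[str]:
    """Return one prompt per combination of the given variable values."""
    names = list(variables)
    prompts = []
    # Cross product over all variables, in the order they were declared.
    for combo in product(*(variables[name] for name in names)):
        prompts.append(template.format(**dict(zip(names, combo))))
    return prompts

# Hypothetical input values for the {game} variable.
prompts = fill_template("What was the first {game} game?", {"game": ["Zelda", "Mario"]})
for p in prompts:
    print(p)
```

With several variables, the number of prompts grows as the product of the number of values for each one.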

Start a conversation with one LLM, and continue it with a different LLM

Chat Turns include a toggle for whether you'd like to continue chatting with the same LLMs or query different ones, passing the chat context to the new models. With this, you can start a conversation with one LLM and continue it with another (or several):

<img width="1146" alt="Screen Shot 2023-07-25 at 12 46 52 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/17e96f80-3344-49ff-b236-5a2cea017efd">

Supported chat models

Simple in concept, chat turns were the result of 2 weeks' work, revising many parts of the ChainForge backend to store and carry chat context. Chat history is automatically translated to the appropriate format for a number of providers:
- OpenAI chat models
- Anthropic models (Claude)
- Google PaLM2 chat
- HuggingFace (you need to set 'Model Type' in Settings to 'chat', and choose a Conversation model or custom endpoint. Currently there's only one chat model listed in the ChainForge dropdown: `microsoft/DialoGPT`. Go to the HuggingFace site to find more!)

> **Warning**
> If you use a non-chat, text completions model like GPT-2, chat turns will still function, but the chat context won't be passed into the text completions model.
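
To illustrate what translating chat history to a provider's format involves, here is a minimal sketch for two common shapes: a list of role/content messages (as used by OpenAI chat models) and a single Human/Assistant text prompt (as used by Claude's text-completion interface). This is not ChainForge's backend code; the function names and the `(speaker, text)` history representation are assumptions.

```python
def to_role_messages(history):
    """Convert (speaker, text) pairs into a list of role/content message dicts."""
    return [{"role": speaker, "content": text} for speaker, text in history]

def to_human_assistant_prompt(history):
    """Flatten (speaker, text) pairs into one Human/Assistant text prompt."""
    parts = []
    for speaker, text in history:
        label = "Human" if speaker == "user" else "Assistant"
        parts.append(f"\n\n{label}: {text}")
    # End with an open Assistant turn so the model writes the next reply.
    # This assumes the last entry in `history` is the user's follow-up.
    parts.append("\n\nAssistant:")
    return "".join(parts)

# Speakers are 'user' or 'assistant'; the reply text is a placeholder.
history = [
    ("user", "What was the first Zelda game?"),
    ("assistant", "The Legend of Zelda."),
    ("user", "What was the second?"),
]

print(to_role_messages(history))
print(to_human_assistant_prompt(history))
```

Keeping one neutral history representation and converting at the last moment is what makes it possible to hand a conversation from one provider to another.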

Let us know what you think!

🤖 LLM Scorer nodes

More commonly called "LLM evaluators", LLM scorer nodes allow you to use an LLM to 'grade'/score outputs of other LLMs:

<img width="342" alt="Screen Shot 2023-07-25 at 6 44 01 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/a48d458c-9383-4040-888d-24a7c37a8f47">

Although ChainForge supported this functionality before via prompt chaining, it was not straightforward and required an additional chain to a code evaluator node for postprocessing. You can now connect the output of the scorer directly to a Vis Node to plot outputs. For instance, here's GPT-4 scoring whether different LLM responses apologized for a mistake:

<img width="1640" alt="Screen Shot 2023-07-25 at 12 31 52 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/da7acda9-1d26-4fbf-ad73-a3b422455876">

Note that LLM scores are finicky: if even one score isn't in the right format (true/false), visualization nodes won't work properly, because they'll think the outputs are categorical rather than boolean. We'll work on improving this, but, for now, enjoy LLM scorers!
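
The format problem can be sketched as follows, assuming a simple rule where a set of scores counts as boolean only if every score parses as true/false. This is an illustration of the failure mode, not ChainForge's actual type-detection logic; `parse_bool` and `infer_score_type` are hypothetical names.

```python
def parse_bool(score: str):
    """Return True/False for 'true'/'false' (any case, padded), else None."""
    normalized = score.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    return None

def infer_score_type(scores):
    """Treat scores as boolean only if every single one parses as a boolean."""
    if all(parse_bool(s) is not None for s in scores):
        return "boolean"
    return "categorical"

print(infer_score_type(["true", "False", " true "]))
print(infer_score_type(["true", "Yes, the response apologized."]))
```

One practical mitigation is to instruct the scoring LLM to answer with only the word "true" or "false", and to spot-check the raw scores before plotting.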

❗ Why we're not calling LLM scorers 'LLM evaluators'
We thought long and hard about what to call LLMs that score outputs of other LLMs. Ultimately, using LLMs to score outputs is helpful, and can save time when it's hard to write code to achieve the same effect. However, LLMs are imperfect. Although the AI community currently uses the term 'LLM evaluator,' we ultimately decided not to use that term, for a few reasons:
1. LLM scores should not be blindly trusted. They are helpful if you already have a sense of what you're looking for, want to grade hundreds of responses, and don't care about picture-perfect accuracy. We found this especially true after playing with LLM scorer nodes for a while: small tweaks to the scoring prompt can result in vast differences in results.
2. 'Evaluator', like 'grader' or 'annotator', is a term with human connotations (e.g., human evaluator). We want to avoid anthropomorphizing LLMs, which contributes to people's over-trust in them. 'Scorer' still has human connotations, but arguably fewer, and less authoritative ones, than 'evaluator'.
3. 'Evaluator' is a term in ChainForge that refers to programs that score responses. Calling LLM scorers 'evaluators' loosely equates them with programmatic evaluators, suggesting they carry the same authority. Although code can be wrong, the scoring process for code is inspectable and auditable --not so with LLMs.

Fundamentally, then, we disagree with the positions taken by projects like LangChain, which tend to emphasize LLM scorers as the go-to solution for evaluation. We believe this is a massive mistake that misleads people, including [ML researchers at MIT](https://news.ycombinator.com/item?id=36370685), into over-trusting AI outputs. In choosing the term Scorers, we aim, at the very least, to distance ourselves from such positions.

Other changes

* Inspecting true/false scored responses (in Evaluators or LLM scorers) will now show false in red, to easily eyeball failure cases:
<img width="1575" alt="Screen Shot 2023-07-25 at 6 33 00 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/29c6886b-7fab-4c8f-adf8-f5f76db4eede">

* In Response Inspectors, the term "Hierarchy" has been replaced with "Grouped List". Grouped Lists are again the default.
* In table view of the response inspector, you can now choose what variable to use for columns. With this method you can compare across prompt templates or indeed anything of interest:
<img width="1583" alt="Screen Shot 2023-07-25 at 6 48 45 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/023e2809-1b2b-4bdf-8848-cdd334e699f9">

Future Work

Chat Turns opened up a whole new can of worms, both for the UI, and for evaluation. Some open questions are:
* How can we display Chat History in response inspectors? Right now, you'll only see the latest response from the LLM. There's more design work to do such that you can view the chat context of specific responses.
* Should there be a Chat History node so you can predefine/preset chat histories to test on, without needing to query an LLM?

We hope to prioritize such features based on user feedback. If you use Chat Turns or LLM Scorers, let us know what you think --open an Issue or start a Discussion! 👍

0.2.1.3

I've added `--host` and `--port` flags when you're running ChainForge locally. You can specify what hostname and port to run it on like so:

`chainforge serve --host 0.0.0.0 --port 3400`

The front-end app also knows you're running it from Flask (locally), regardless of what the hostname and port are.
