ChainForge


0.2.1

We've made several quality-of-life improvements from 0.2 to this release.

Prompt previews

You can now inspect the generated prompts that will be sent off to LLMs. For a quick glance, simply hover over the 'list' icon on Prompt Nodes:

![hover-over-prompt-preview](https://github.com/ianarawjo/ChainForge/assets/5251713/32e47b32-38f0-4354-9c20-2f6f31c99806)

For full inspection, just click the button to bring up a popup inspector.

Thanks to Issue https://github.com/ianarawjo/ChainForge/issues/90, raised by profplum700!

Ability To Enable/Disable Prompt Variables in Text Fields Without Deleting Them

You can now enable/disable prompt variables selectively:

https://github.com/ianarawjo/ChainForge/assets/5251713/92f9c869-8201-43d0-a4a5-8aee7524319e

Thanks to Issue https://github.com/ianarawjo/ChainForge/issues/93, raised by profplum700!

Anthropic model Claude-2

We've also added the newest Claude model, Claude-2. All prior models remain supported; however, strangely, Claude-1 and the 100k-context models have disappeared from the Anthropic API documentation. So, if you are using earlier Claude models, just know that they may stop working at some point in the future.

Bug fixes

There have also been numerous bug fixes, including:
- braces { and } inside Tabular Data tables are now escaped by default when data is pulled from the nodes, so that they are never treated as prompt templates (see the sketch below)
- escaped template braces \{ and \} now have the escape slash removed when prompts are generated for models
- outputs of Prompt Nodes, when chained into other Prompt Nodes, now have the braces in LLM responses escaped by default. Whenever prompts are generated, the escaped braces are cleaned back up to plain { and }. In response inspectors, input variables will appear with escaped braces, since input variables in ChainForge may themselves be templates.
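Conceptually, the escaping rules behave like the following minimal sketch (illustrative Python only, not ChainForge's actual implementation):

```python
def escape_braces(text: str) -> str:
    """Escape literal braces so they are never mistaken for {template} variables."""
    return text.replace("{", "\\{").replace("}", "\\}")

def unescape_braces(template: str) -> str:
    """Strip the escape slashes when the final prompt is generated for a model."""
    return template.replace("\\{", "{").replace("\\}", "}")

cell = 'JSON example: {"key": 1}'
escaped = escape_braces(cell)            # braces now read \{ and \}, so they are never templated
assert unescape_braces(escaped) == cell  # the model sees the original braces again
```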

Future Goals

We've been running pilot studies internally at Harvard HCI and getting some informal feedback.
- One point that keeps coming up echoes Issue https://github.com/ianarawjo/ChainForge/issues/56, raised by jjordanbaird: the ability to keep chat context and evaluate multiple chatbot turns. We are thinking of implementing this as a `Chat Turn Node`, where one can optionally provide "past conversation" context as input. The overall structure will be similar to Prompt Nodes, except that only Chat Models will be available. See https://github.com/ianarawjo/ChainForge/issues/56 for more details.
- Another issue we're aware of is the need for better documentation on what you can do with ChainForge, particularly on the rather unique feature of chaining prompt templates together.

As always, if you have any feedback or comments, open an Issue or start a Discussion.

0.2

> **Note**
> This release includes a breaking change to how responses are cached. If you are working on a current flow, export your ChainForge flow to a `cforge` file before installing the new version.

We're closer than ever to hosting ChainForge on [chainforge.ai](http://chainforge.ai), so that no installation is required to try it out. Latest changes below.

The entire backend has been rewritten in TypeScript 🥷🧑‍💻

Thousands of lines of Python code, comprising nearly the entire backend, have been rewritten in TypeScript. The mechanism for generating prompt permutations, querying LLMs, and caching responses now runs in the front-end (entirely in the browser). Tests were added in `jest` to ensure the outputs of the TypeScript functions match their original Python versions. There are additional performance and maintainability benefits to static type checking. We've also added ample docstrings, which should help devs looking to get involved.

Functionally, you should not experience any difference (except maybe a slight speed boost).

JavaScript Evaluator Nodes 🧩

Because the application logic has moved to the browser, we've added JavaScript evaluator nodes. These let you write evaluation functions in JavaScript, and they work the same way as Python evaluators.

Here is a side-by-side comparison of JavaScript and Python evaluator nodes, showing semantically equivalent code and the in-node support for displaying console.log and print output:

<img width="678" alt="Screen Shot 2023-06-30 at 12 08 27 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/09da964e-fd07-4cf2-a4c7-04fc0080b722">

When you are running ChainForge on `localhost`, you can still use Python evaluator nodes, which will execute on your local Flask server (the Python backend) as before. JavaScript evaluators run entirely in the browser (specifically, `eval` sandboxed inside an `iframe`).
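For reference, here is a minimal Python evaluator sketch of the kind shown above. It assumes ChainForge's `evaluate(response)` convention, where `response.text` holds the LLM's reply; the word-count metric is just a placeholder, not taken from the flow in the screenshot:

```python
def evaluate(response):
    # Score a single LLM response; return a number, boolean, or string.
    # Here we simply count words in the reply as an illustrative metric.
    return len(response.text.split())
```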

HuggingFace Models 🤗

We added support for querying text generation models hosted on the [HuggingFace Inference API](https://huggingface.co/inference-api). For instance, here is [falcon.7b.instruct](https://huggingface.co/tiiuae/falcon-7b-instruct), an open-source model:

<img width="1107" alt="Screen Shot 2023-06-30 at 2 15 46 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/344fbc65-f4a4-4b9f-9496-3ddb427db34c">

For HF models, there is a 250-token limit. This can sometimes be rather limiting, so we've added a "number of continuations" setting to help. Setting it to > 0 feeds the response back into the API (for text completion models), generating longer completions of up to 1500 tokens.
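Conceptually, continuations work by appending each partial completion to the prompt and resubmitting. Here is a rough sketch of that loop using the public HF Inference API directly; the endpoint and parameters are standard Inference API usage (with the falcon-7b-instruct model from the example above), not ChainForge internals:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct"
HEADERS = {"Authorization": "Bearer <YOUR_HF_API_TOKEN>"}  # placeholder token

def generate_with_continuations(prompt: str, num_continuations: int = 2) -> str:
    """Repeatedly append the model's completion and resubmit, to get a longer output."""
    text = prompt
    for _ in range(num_continuations + 1):
        resp = requests.post(
            API_URL,
            headers=HEADERS,
            json={"inputs": text, "parameters": {"return_full_text": False}},
        )
        resp.raise_for_status()
        text += resp.json()[0]["generated_text"]  # append this round's completion
    return text[len(prompt):]  # return only the generated continuation
```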

We also support [HF Inference Endpoints](https://huggingface.co/inference-endpoints) for text generation models. Simply put the API call URL in the `custom_model` field of the settings window.

Comment Nodes ✏️

You can write comments about your evaluation using a comment node:

<img width="306" alt="Screen Shot 2023-06-30 at 2 18 03 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/e96df294-4b47-4575-9559-61883973d238">

'Browser unsupported' error 💢

If you load ChainForge on a mobile device or unsupported browser, it will now display an error message:

<img width="500" alt="Screen Shot 2023-06-30 at 2 28 32 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/ecfc0b79-9859-4612-8ad2-f8f9bc459469">

This helps with our public release. If you'd like ChainForge to support more browsers, open an Issue or (better yet) make a Pull Request.

Fun example

Finally, I wanted to share a fun practical example: an evaluation to **check if the LLM reveals a secret key**. This evaluation, including all API calls and JavaScript evaluation code, was run entirely in the browser:

<img width="1788" alt="Screen Shot 2023-06-30 at 2 47 39 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/36cab316-419b-4257-980b-f6f6a3c82571">

Questions, concerns?

Open an Issue or start a Discussion!

This was a major, serious change to ChainForge. Although we've written tests, it's possible we have missed something and there's a bug somewhere. **Note that unfortunately, Azure OpenAI 🔷 support is again untested following the rewrite, as we don't have access to it. Someone in the community, let me know if it works for you! (Also, if you work at Microsoft and can give us access, let us know!)**

A browser-based, hosted version of ChainForge will be publicly available July 5th (next Wednesday) on chainforge.ai 🌐🎉

0.1.7.2

This minor release includes two features:

Autosaving

ChainForge now autosaves your work to `localStorage` every 60 seconds.
This helps tremendously in case you accidentally close the window without exporting the flow, your system crashes, or you encounter a bug.

To create a new flow now, just click the New Flow button to get a new canvas.

Plots now have clear y-axis, x-axis, and group-by selectors on Vis Node

We've added a header bar to the Vis Node, clarifying what is plotted on each axis / dimension:

<img width="588" alt="Screen Shot 2023-06-23 at 9 32 58 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/74fc0f47-9390-4937-836d-77d24daad380">

In addition, as you can see above, y-labels can now span up to two lines (~40 chars), making them easier to read.

Finally, when the number of generations per prompt is 1, we now output bar charts by default:

<img width="729" alt="Screen Shot 2023-06-23 at 9 35 06 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/7a1266b2-622a-480a-938d-889950c6e90e">

Box-and-whisker plots are still used whenever the number of generations n > 1.

Note that improving the Vis Nodes is a work-in-progress, and functionally, everything is the same as before.

0.1.7

We've made a number of improvements to the inspector UI and beyond.

Side-by-side comparison across LLM responses

Responses now appear side-by-side for up to five LLMs queried:

<img width="1387" alt="Screen Shot 2023-06-21 at 9 27 45 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/e739845c-cee5-422a-8567-505a195331dc">

Collapsible response groups

You can also collapse LLM responses grouped by their prompt template variable, for easier selective inspection. Just click on a response group header to show/hide it:

https://github.com/ianarawjo/ChainForge/assets/5251713/452ab3ae-7a74-4b6c-a568-a6f14351b93d

Accuracy plots by default

Boolean (true/false) evaluation metrics now use accuracy plots by default. For instance, for ChainForge's prompt injection example:

<img width="602" alt="Screen Shot 2023-06-21 at 9 27 58 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/2509ca98-3b88-4b36-9e8c-8078b854871a">

This makes it extremely easy to see differences across models for the specified evaluation. Stacked bar charts are still used when a prompt variable is selected. For instance, here is a plot of a meta-variable, 'Domain', across two LLMs, testing whether or not the code outputs had an `import` statement (another new feature):

<img width="487" alt="Screen Shot 2023-06-21 at 10 22 51 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/41158437-ad54-4ba2-a989-a5fe071d6408">

Added 'Inspect results' footer to both Prompt and Eval nodes

The tiny response-previews footer in the Prompt Node has been changed to an 'Inspect responses' button that brings up a fullscreen response inspector. In addition, evaluation results can be easily inspected by clicking 'Inspect results':

<img width="1560" alt="Screen Shot 2023-06-21 at 10 12 34 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/a3b642a7-ca34-475b-b8e7-a42b3f51d03c">

Evaluation scores appear in bold at the top of each response block:

<img width="1392" alt="Screen Shot 2023-06-21 at 10 13 54 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/af4c9e00-576c-4dfd-9308-42f985d46471">

In addition, both Prompt and Eval nodes now load cached results upon initialization. Simply load an example flow and click the respective Inspect button.

Added `asMarkdownAST` to `response` object in Evaluator node

Given how often developers wish to parse markdown, we've added a function `asMarkdownAST()` to the `ResponseInfo` class that uses the [`mistune` library](https://mistune.lepture.com/en/latest/) to parse markdown as an abstract syntax tree (AST).

For instance, here's code which detects if an 'import' statement appeared anywhere in the codeblocks of a chat response:

<img width="510" alt="Screen Shot 2023-06-21 at 10 19 51 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/c12c46e3-3371-415b-8ae5-c5819a24fd6a">

0.1.6

Added 188 OpenAI Evals to Example Flows

We've added **188** example flows generated directly from [OpenAI evals](https://github.com/openai/evals) benchmarks.
In Example Flows, navigate to the "OpenAI Evals" tab, and click the benchmark you wish to load:

https://github.com/ianarawjo/ChainForge/assets/5251713/7a498255-3f44-411a-ae9c-dfdb4b789a7b

The code in each Evaluator is the appropriate code for each evaluation, as referenced from the [OpenAI eval-templates doc](https://github.com/openai/evals/blob/main/docs/eval-templates.md).

Example: Tetris problems

For example, I was able to compare GPT-4's performance on `tetris` problems with GPT-3.5, simply by loading the eval, adding GPT-4, and pressing run:

<img width="1691" alt="Screen Shot 2023-06-15 at 4 10 36 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/8ea3b4e9-8fbd-44e2-88b1-f9c930717916">

I was curious whether the custom system message had any effect on GPT-3.5's performance, so I added a version without it, and within 5 seconds found that the system message had no effect:

<img width="1684" alt="Screen Shot 2023-06-15 at 4 13 38 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/fbed5c5f-b0a9-4fe8-a910-8b1a1b01385b">

Supported OpenAI evals

A large subset of OpenAI evals are supported. We currently display OpenAI evals with:
- a common system message
- a single 'turn' (prompt)
- evaluation types of 'includes', 'match', and 'fuzzy match',
- and a reasonable number of prompts (e.g., spanish-lexicon, which is not included, has 53,000 prompts)

We hope to add those with model evaluations (e.g., Chain-of-thought prompting) in the near future.

The `cforge` flows were precompiled from the `oaievals` registry. To save space, the files are not included in the PyPI chainforge package, but are instead fetched from GitHub on an as-needed basis. We precompiled the evals to avoid forcing users to install OpenAI evals, which requires Git LFS, Python 3.9+, and a large number of dependencies.

Note finally that responses are not cached for these flows, unlike the other examples; you will need to query OpenAI models yourself to run them.

-----------------
Minor Notes
This release also:
- Changed `Textareas` to contenteditable `p` tags inside Tabular Data Nodes. Though this compromises usability _slightly_, there is a huge gain in performance when loading large tables (e.g., 1000 rows or more), which is required for some OpenAI evals in the examples package.
- Fixed a bug in `VisNode` where a plot was not displayed when a single LLM was present, the number of prompt variables was >= 1, and no variables were selected

If you run into any problems using OpenAI evals examples, or with any other part of CF, please let us know.
We could not manually test all of the new example flows, due to how many API calls would be required. Happy ChainForging!

0.1.5.3

This is an emergency release to add basic support for the new OpenAI models and the 'function call' ability. It also includes support for Azure OpenAI endpoints, closing Issue 53.

OpenAI function calls

You can now specify the newest 0613 models of ChatGPT:

<img width="545" alt="Screen Shot 2023-06-13 at 5 36 50 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/a47cbb12-2744-4566-9781-09fe4eeb5ce2">

In addition, you can set the value of `functions` by passing a valid JSON schema object. This will be passed to the `functions` parameter of the OpenAI chat completions call:

<img width="646" alt="Screen Shot 2023-06-13 at 5 36 45 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/6106aab6-88f6-488e-a6ad-eff6b4108870">

I've created a basic example flow to **detect when a given prompt triggers a function call**, using OpenAI's `get_current_weather` example from their announcement:

<img width="1432" alt="Screen Shot 2023-06-13 at 5 39 50 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/a4d9bd0f-0f51-4a48-9784-c5e3ac342f2b">

In the coming weeks, we will think about making this user experience more streamlined, but for now, enjoy being able to mess around!

Azure OpenAI API support

Thanks to community members chuanqisun, bhctest123, and levidehaan, we have now added Azure OpenAI support:

![245616817-23e0fcb3-5cee-4d76-8eeb-eb83f5b5fabc](https://github.com/ianarawjo/ChainForge/assets/5251713/df25a6c7-5a52-4274-8a46-d1c9c39a2cc2)

To use Azure OpenAI, you just need to set your keys in ChainForge Settings:

<img width="478" alt="Screen Shot 2023-06-13 at 5 57 24 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/34db6db5-a447-4745-a425-5cc9077e6af7">

Then make sure you set the right Deployment Name in the individual model settings. The settings also include OpenAI function calls (not sure if you can deploy 0613 models on Azure yet, but the option is there).

As always, let us know if you run into any issues.

Collapsing duplicate responses

As part of this release, duplicate LLM responses (when the number of generations `n > 1`) are now detected and automatically collapsed in Inspectors. The number of duplicates is indicated in the top-right corner:

<img width="386" alt="Screen Shot 2023-06-13 at 12 03 54 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/6f489ff0-6f33-438f-be25-102ce69deb15">
