ChainForge

Latest version: v0.3.1.8

0.3.1.5

This is the first release adding the `MultiEval` node to ChainForge proper, alongside:
- Improvements to the response inspector's table view to display multi-criteria scoring in columns
- Table view is now the default when multiple evaluators are detected

Voilà:

<img width="1321" alt="Screen Shot 2024-03-17 at 12 21 37 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/28dcd7e5-8214-4afc-8691-e7182f8ae2f0">

As you can see, `MultiEval` allows you to define multiple per-response evaluators inside the same node. You can use this to evaluate responses across multiple criteria. Evaluators can be a mix of code and LLM evaluators, as you see fit, and you can change the LLM scorer model on a per-evaluator basis.

This is a "beta" version of the `MultiEval` node, for two reasons:
- The output handle of MultiEval is disabled, since it doesn't yet work with VisNodes to plot data across multiple criteria. That is a separate issue that I didn't want to hold up this push. It is coming.
- There are no genAI features in MultiEval yet, like those in Code Evaluator nodes. I want to do this right (beyond EvalGen, which is another matter). The idea is that you can describe the criteria in a prompt and the AI will add whatever evaluator it thinks is best to the list, on a per-criterion basis. For now, as a workaround, you can use the genAI feature to generate code inside single Code Evaluators and port that code over.

The [`EvalGen` wizard](https://arxiv.org/abs/2404.12272) is also coming, to help users automatically generate evaluation metrics with human supervision. We have a version of this on the `multi-eval` branch (which, due to the TypeScript front-end rewrite, we cannot directly merge into `main`), but it doesn't integrate Shreya's fixes.

0.3.1

This change has been in the works for over a month. The most significant part is a rewrite of the entire front-end of ChainForge, tens of thousands of lines of code, into TypeScript. More details below.

Support for image models (with OpenAI's DALL-E models being the first)

<img width="714" alt="Screen Shot 2024-03-30 at 7 03 45 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/25d3d977-d3b4-4172-ac81-3094dd842e69">

Images are compressed by default using `compressorjs`, with no visible impact on quality and an average compressed size around 60% of the original. Users can turn off compression in the Advanced tab of the Settings window. It is recommended to keep it on, however.

Custom Providers for Image Models
Your custom providers can return image data instead of text. From your provider, return a JSON dict in the format `{ t: "img", d: <base64_str_png_encoded_image> }`. (Include only the base64 data, without the `data:image/png;base64,` prefix.)
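
For illustration, here is a minimal sketch of such a provider, assuming the `@provider` decorator API for custom providers; `generate_png_bytes` is a hypothetical stand-in for your own image-generation backend:

```python
import base64

from chainforge.providers import provider


@provider(name="My Image Provider", emoji="🖼️", models=["my-image-model"])
def my_image_provider(prompt: str, model: str, **kwargs) -> dict:
    # Hypothetical call: swap in your own backend that returns raw PNG bytes.
    png_bytes = generate_png_bytes(prompt)
    # Return only the base64 payload, without the `data:image/png;base64,` prefix:
    return {"t": "img", "d": base64.b64encode(png_bytes).decode("utf-8")}
```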

Note that we don't yet support:
* Exporting images into cells of an Excel sheet (Export to Excel will be disabled if it detects an image)

> [!WARNING]
> Be warned that images eat up browser storage space fast and will quickly disable autosaving.

Rewrite of the front-end into TypeScript

The entire front-end has been converted to `tsx` files, with the appropriate typing. Other nice refactorings were made along the way.

Along the way, we identified bugs and unreachable code, and worked to simplify and standardize how LLM responses are stored on the back-end and front-end. It is not perfect, but the `LLMResponse` and `LLMResponseData` types now make the format of stored responses transparent to developers.

This change makes it easier for developers to extend ChainForge with confidence, chiefly because TypeScript flags the side-effects of changes to core code. For instance, TypeScript enabled us to add image models in only 2 hours, since all that was required was changing the type `LLMResponseData` from a string to a union of a string or an object holding image data. This change would not have been easy to make with confidence without the visibility TypeScript provides into the downstream effects of changing a core datatype. In the future, it will help us add support for multi-modal vision models.

Custom Right-click Actions on Nodes

Right-clicking on a node can now present more options.
- TextFields nodes can be converted into Items nodes
- Items nodes can be converted into TextFields nodes
- Prompt and Chat Turn node LLM response **cache can be cleared**:

<img width="420" alt="Screen Shot 2024-03-30 at 7 41 37 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/5dae6ef0-c043-4d7f-9d00-66b5a67ebd9c">

Amazon Bedrock Support

Thanks to massi-ang, Amazon Bedrock-hosted models have been added to ChainForge. We've just added these endpoints, and I wasn't able to test them directly (I don't have access), so if you encounter problems, please open an Issue and poke massi-ang to let him know.

Nested Add Model Menu on Prompt nodes

<img width="277" alt="Screen Shot 2024-03-30 at 10 47 06 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/5987f4c6-8684-4ef0-a533-5300b38472bd">

Thanks also to massi-ang, clicking Add+ to add a model now brings up a nested menu. The list is still limited (use the Settings to access more specific models), but the flat list was becoming unwieldy as the number of providers grew.

Better Rate Limiting

Rate limiting has been improved to use [`Bottleneck`](https://www.npmjs.com/package/bottleneck). Rate limiting is now performed over a rolling time window rather than the naive "block and wait" approach we used before. Note that this doesn't take into account what "tier" of access you have for OpenAI and Anthropic models, as there's no way for us to know that, so it assumes Tier 2 access. If you encounter a rate limit, just re-run the node.

Problems? Let us know!

This change is comprehensive, and while I have tested it extensively, it is possible I have missed something. If you encounter an error, please open an [Issue](https://github.com/ianarawjo/ChainForge/issues).

0.3

Adds new Anthropic Claude 3 models.

* Backend now uses the `messages` API for Claude 2.1+ models.
* Adds the `system` message parameter in Claude settings.

Adds browser-sandboxed Python with [pyodide](https://pyodide.org/en/stable/)

You can now run Python in a safe sandbox entirely in the browser, provided you do not need to import third-party libraries.
The **web-hosted version** at [chainforge.ai/play](https://chainforge.ai/play/) now has Python evaluators unlocked:

<img width="1661" alt="Screen Shot 2024-03-05 at 11 08 46 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/a05ec44e-c99c-426e-a9e7-23b42017b359">

The **local version** of ChainForge includes a toggle to turn sandboxing on or off:

<img width="402" alt="Screen Shot 2024-03-03 at 9 23 18 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/1e2f6be3-2b63-4f57-9c0f-690a7fd62a4b">

If you turn sandboxing off, you go back to the previous Python evaluator, executed on your local machine through the Flask backend. In the non-sandboxed evaluator node, you can import any library available in your Python environment.

Why sandboxing?

The benefit of sandboxing is that ChainForge can now be used to execute Python code generated by LLMs, using eval() or exec() in your evaluation function. This was possible before, but dangerous. Benchmarks that do not rely on third-party libraries, like HumanEval at a pass@1 rate, could be run within ChainForge entirely in the web browser (if anyone wants to set this up, let me know!).
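
As a rough sketch of what that could look like: a Python Evaluator node defines an `evaluate(response)` function, and with sandboxing on, `exec`-ing LLM-generated code inside it runs entirely in the browser. The `add` unit test below is a hypothetical stand-in for a real benchmark assertion:

```python
def evaluate(response):
    # Run the LLM's code completion inside the pyodide sandbox.
    env = {}
    try:
        exec(response.text, env)
        # Hypothetical pass@1-style check: does the generated `add` pass its test?
        return 1 if env["add"](2, 3) == 5 else 0
    except Exception:
        return 0
```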

add-prettier
Hi folks,

Thanks to PRs https://github.com/ianarawjo/ChainForge/pull/223 and https://github.com/ianarawjo/ChainForge/pull/222 by massi-ang, we have added Prettier and ESLint to ChainForge's `main` branch.

`prettier` and `eslint` are now run upon `npm run build`, and you are encouraged to run them before submitting any PRs against the ChainForge `main` branch.

We know this is somewhat annoying to anyone building on top of ChainForge, because it may make rebasing on top of the latest `main` changes a chore. This includes myself: the changes in the `multi-eval` branch, which I have been working on for a while now, are even harder to merge. However, consistent formatting and linting provide better standards for developer contributions, beyond the ad-hoc approach to writing code we had before.

Recently, I have had less time for code hygiene tasks for this project. However, I think **converting the entire front-end code to TypeScript** is the next step. This would provide more guarantees on dev contributions, may catch existing bugs, and would allow us to have a standardized `ResponseObject` format across ChainForge that is enforced and extendable. The latter:
- would give people adding their own widgets guarantees about the format and type of responses
- would be easily extensible to additional data formats, like images as input for GPT-4 Vision or images as responses from DALL-E

Additionally, I envision:
- Better encapsulation of how responses are displayed in Inspectors, i.e., a dedicated React component like `ResponseBox` that can then be extended to handle image outputs, if present.
- Better storage for internal responses (i.e., the ones with “query”) that minimizes repeated info for LLM settings. Duplicated LLM settings info is inflating file sizes fast; an LLM at particular settings should be a UID into a lookup table.
- Better / updated example flows, e.g. comparing prompts, testing JSON format, multiple evaluations
- Dev docs for how to create a new node

It doesn't seem like LLMs are going anywhere, and evaluating their output quality still suffers from the same issues. If we work together, we can make ChainForge a go-to graphical interface for "testing stuff out": rapid prototyping of prompt and chain ideas and rapid testing of LLM behavior, beyond ad-hoc chatting, CLIs, or having to write code.

ChainForge is based on transparency and complete control. We always intend to [show the prompts](https://hamel.dev/blog/posts/prompt/) to developers. Developers should have access to the exact settings used for the model, too. If ChainForge adds, say, prompt optimization, it will be important to always show the prompts.

Let us know what you think of these changes, or what you'd like to see in the future. If you are a developer, **please consider contributing!** :)

0.2.9.5

This version includes many changes, including making it much easier to compare across system messages. The [docs have been updated](https://chainforge.ai/docs) to reflect these changes (see for instance the [FAQ](https://chainforge.ai/docs/faq/)).

Adds random sampler toggle to Tabular Data node

<img width="836" alt="random-sampling" src="https://github.com/ianarawjo/ChainForge/assets/5251713/00cd7aa0-9bee-4ee4-98d1-9b06ffe18017">

Adds [settings template variables](https://chainforge.ai/docs/prompt_templates/#settings-variables) of the form `{=setting_name}`, allowing users to parametrize model settings just like prompts.

For instance, here's comparing across system messages:

<img width="1539" alt="compare-sys-msgs" src="https://github.com/ianarawjo/ChainForge/assets/5251713/dd170cee-bd4b-4da7-9068-ab508f1a39ac">

Here's another example, comparing across temperatures:

<img width="698" alt="settings-vars" src="https://github.com/ianarawjo/ChainForge/assets/5251713/8bc185d9-de77-49d7-855b-ab49a5b8a000">

The [docs](https://chainforge.ai/docs/prompt_templates/#settings-variables) have also been amended to explain these new functionalities.

Smaller changes / bug fixes / QOL improvements

- Removes red notification dots, which could become annoying
- Fully clears the ReactFlow state before loading in a new flow
- Debounces updating the template hooks in Prompt Nodes when the user is editing the prompt
- Keeps track of the provenance of responses by adding a `uid` parameter. This is specifically to track which batch a response came from when the number of generations per prompt is `n` > 1. This corrects an issue in the Evaluator inspectors where `n` > 1 outputs were broken up.

0.2.8.9

Dalai support has been replaced by [Ollama](https://github.com/jmorganca/ollama) 🦙. You can now add local [models hosted via Ollama](https://ollama.ai/library), including llama2, mistral, codellama, vicuna, etc.

Thanks to laurenthuberdeau! 🎉

0.2.8

Adds a purple sparkly generative AI button to TextFields and Items nodes, courtesy of shawseanyang!

This button gives you easy access to generating input values using LLM calls to OpenAI GPT-3.5 and GPT-4 models. You have access to two features:
- **Replace**: Replace the existing fields with new fields, given the prompt entered.
- **Extend**: Given the existing fields, extrapolate the pattern and extend the list as best it can.

![image](https://github.com/ianarawjo/ChainForge/assets/5251713/fb92921f-2ef8-4e89-910b-575059f9ed13)

Try it out on [chainforge.ai/play](https://chainforge.ai/play/), or install locally via `pip install chainforge --upgrade` (BYO OpenAI API key!)

_Note: You must have input an OpenAI API key (either directly in Settings, or via environment variables) to use generative AI features. In the future, we might support letting users change the model used for this feature, if there's interest (or if you submit a PR! :)_

Yes, prompts generate too... sort of!

Replace will also detect if you ask for prompts, and can generate prompt templates. Extend will likewise consider prompt templates and try to keep to your existing template variables. If you use prompt generation, you are probably best off Extending after manually entering 2-3 prompts or prompt templates as examples.

Note that prompt template generation is very experimental at the moment, and we're working on improving this aspect.

<img width="581" alt="Screen Shot 2023-12-12 at 10 34 45 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/2874ae94-81c4-43b2-9c57-6f21b6810647">

This feature is in BETA. There may be rough edges, mistakes in generation, etc.
However, we've been enjoying it greatly and found it very helpful for speeding up input data generation. Play around and let us know what you think!

_If you don't want to see the buttons, you can toggle off AI support in the Settings Window. Also, you can toggle on the Autocomplete feature on TextFields nodes, if you're feeling experimental! :)_

This feature represents a semester's worth of work from shawseanyang! Thank you Sean! 🎉🥳

---------------------
Other small changes in this release:
- The max `Num of responses per prompt` counter on Prompt Nodes has been increased to 999
- In-browser autosaving now disables itself if you start working with lots of LLM responses (files upwards of 20 MB or more), since `localStorage` can only handle so much.
- Relatedly, autosaving will not occur in the background when you tab away from the ChainForge browser tab. This saves performance and helps with the check above.
- API keys are now loaded from environment variables only upon load of the application, rather than on every call, for consistency.
