Falcon-evaluate

Latest version: v0.1.13.0


Page 1 of 2

0.1.12.0

Evaluating the alignment of LLM outputs with Shannon Vallor's 12 techno-moral values presents a unique approach to embedding ethical considerations into AI systems. Here are the benefits of using the Falcon-evaluate Machine_ethics module for LLM output evaluation:

1. Enhanced Ethical Oversight:
The module offers a systematic way to ensure that the outputs of language models align with predefined ethical values. This is particularly crucial in fields like journalism, education, and public communications, where the ethical implications of generated content are significant.

2. Automated Value Alignment Check:
Manually checking LLM outputs for alignment with specific values can be time-consuming and prone to human error. An automated classifier such as the Falcon-evaluate Machine_ethics module can provide quick and consistent assessments, making it a valuable tool for moderating content at scale.

The 12 techno-moral values are:

🔍 Honesty - being truthful and transparent in one's interactions and intentions.
🧘 Self-control - the ability to regulate one's emotions, thoughts, and behaviors in the face of temptations and impulses.
🌱 Humility - recognizing and accepting one's limitations and the value and contributions of others.
⚖️ Justice - being committed to fairness and treating others with respect and equity.
🦁 Courage - the willingness to take risks or endure hardship to achieve a moral or worthwhile goal.
💞 Empathy - the capacity to understand and share the feelings of another.
🤲 Care - showing concern for the well-being of others and acting to promote and protect their interests.
🤝 Civility - showing respect for others, especially in the face of disagreement or conflict.
🔄 Flexibility - being willing and able to adapt one's beliefs and actions in response to changing circumstances or new information.
🌈 Tolerance - accepting and respecting differences and diversity in beliefs, values, and practices.
🔭 Perspective - the ability to understand and consider different viewpoints and the broader context of one's actions.
👑 Magnanimity - being generous, forgiving, and noble in spirit, especially towards rivals or those less powerful.
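
One way to work with these twelve values programmatically is an enumeration plus a small summary helper. The sketch below is not the Falcon-evaluate API: `TechnoMoralValue`, `summarize_alignment`, the score format (a mapping from value to a number between 0 and 1), and the 0.5 threshold are all assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List


class TechnoMoralValue(Enum):
    """The twelve techno-moral values listed above (names are illustrative)."""
    HONESTY = "honesty"
    SELF_CONTROL = "self-control"
    HUMILITY = "humility"
    JUSTICE = "justice"
    COURAGE = "courage"
    EMPATHY = "empathy"
    CARE = "care"
    CIVILITY = "civility"
    FLEXIBILITY = "flexibility"
    TOLERANCE = "tolerance"
    PERSPECTIVE = "perspective"
    MAGNANIMITY = "magnanimity"


def summarize_alignment(
    scores: Dict[TechnoMoralValue, float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the library
) -> Dict[str, List[str]]:
    """Group values into aligned, flagged, and unscored, in enum order."""
    summary: Dict[str, List[str]] = {"aligned": [], "flagged": [], "unscored": []}
    for value in TechnoMoralValue:
        score = scores.get(value)
        if score is None:
            summary["unscored"].append(value.value)
        elif score >= threshold:
            summary["aligned"].append(value.value)
        else:
            summary["flagged"].append(value.value)
    return summary


if __name__ == "__main__":
    # Hypothetical per-value scores for a single LLM output.
    example_scores = {
        TechnoMoralValue.HONESTY: 0.9,
        TechnoMoralValue.CIVILITY: 0.2,
    }
    print(summarize_alignment(example_scores))
```

Keeping the values in an enumeration means a typo in a value name fails immediately instead of silently creating a thirteenth category.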

0.1.11.0

The primary goal of the Fairness Module is to evaluate LLMs for:

📊 Stereotypes: Assessing stereotypes through agreement on stereotypes, stereotype recognition, and stereotype query test.

🔍 Stereotypes Classes Identified

The stereotypes component of the Fairness Module identifies nine classes for evaluation:

🚫 unrelated: The text does not indicate any stereotype.
👫 stereotype_gender: The text indicates a gender stereotype.
🔄 anti-stereotype_gender: The text indicates a gender anti-stereotype.
🧑‍🤝‍🧑 stereotype_race: The text indicates a racial stereotype.
🔄 anti-stereotype_race: The text indicates a racial anti-stereotype.
💼 stereotype_profession: The text indicates a professional stereotype.
🔄 anti-stereotype_profession: The text indicates a professional anti-stereotype.
⛪ stereotype_religion: The text indicates a religious stereotype.
🔄 anti-stereotype_religion: The text indicates a religious anti-stereotype.
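
The nine class names follow a regular `kind_domain` pattern, which makes them easy to split for reporting. The sketch below assumes a classifier returns one of these label strings per text. `STEREOTYPE_CLASSES`, `ParsedLabel`, `parse_label`, and `count_by_kind` are hypothetical helpers and are not part of Falcon-evaluate.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# The nine class labels listed above.
STEREOTYPE_CLASSES = (
    "unrelated",
    "stereotype_gender",
    "anti-stereotype_gender",
    "stereotype_race",
    "anti-stereotype_race",
    "stereotype_profession",
    "anti-stereotype_profession",
    "stereotype_religion",
    "anti-stereotype_religion",
)


class ParsedLabel(NamedTuple):
    kind: str               # "unrelated", "stereotype", or "anti-stereotype"
    domain: Optional[str]   # "gender", "race", "profession", "religion", or None


def parse_label(label: str) -> ParsedLabel:
    """Split a class label into its kind and domain."""
    if label not in STEREOTYPE_CLASSES:
        raise ValueError(f"unknown stereotype class: {label!r}")
    if label == "unrelated":
        return ParsedLabel("unrelated", None)
    kind, domain = label.split("_", 1)
    return ParsedLabel(kind, domain)


def count_by_kind(labels: Iterable[str]) -> Counter:
    """Count how many labels fall under each kind."""
    return Counter(parse_label(label).kind for label in labels)


if __name__ == "__main__":
    # Hypothetical classifier outputs for four texts.
    predicted = ["unrelated", "stereotype_gender", "anti-stereotype_race", "stereotype_religion"]
    print(count_by_kind(predicted))
```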

0.1.10.0

0.1.9.2

0.1.8.1

Enhancement: Multi-Dimensional Evaluation Metrics Feature
1. Falcon Performance Quadrant
2. Documentation updated

0.1.7.1

6 or GAFE-11 Enhancement: Multi-Dimensional Evaluation Metrics Feature

Description

To provide a more comprehensive and meaningful evaluation of Language Models (LMs) through the Falcon Framework, I propose adding a multi-dimensional evaluation metrics feature. This feature should calculate evaluation metrics organized into five categories:

1. Readability and Complexity: Automated Readability Index (ARI), Flesch-Kincaid Grade Level
2. Language Modeling Performance: Perplexity
3. Text Toxicity: Toxicity Level
4. Text Similarity and Relevance: BLEU Score, Cosine Similarity, Semantic Similarity, Jaccard Similarity
5. Information Retrieval: Precision, Recall, F1-Score
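
Several of these metrics have simple textbook definitions. The sketch below implements four of them using only the standard library: ARI, Jaccard similarity, cosine similarity over term counts, and precision/recall/F1 over sets. It does not reproduce Falcon-evaluate's implementation. The function names, the regex tokenizer, and the handling of empty inputs are assumptions, so scores may differ from what the library reports.

```python
import math
import re
from collections import Counter
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0  # assumption: empty text scores 0
    chars = sum(len(word) for word in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two term-count vectors."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(counts_a[term] * counts_b[term] for term in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def precision_recall_f1(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and their harmonic mean."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "The cat sat on the mat."
    candidate = "The cat lay on the mat."
    print("ARI:", automated_readability_index(candidate))
    print("Jaccard:", jaccard_similarity(reference, candidate))
    print("Cosine:", cosine_similarity(reference, candidate))
    print("P/R/F1:", precision_recall_f1({"d1", "d2", "d3"}, {"d2", "d3", "d4", "d5"}))
```

Perplexity, toxicity, BLEU, and semantic similarity are left out because they need a language model, a trained classifier, or n-gram machinery that does not fit a short standard-library sketch.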

These metrics help ML engineers build a performance quadrant for selecting the best model and configuration for production.

