Prompt Chain

Evaluate Model Factuality on HuggingFace Datasets

Name: Evaluate Model Factuality on HuggingFace Datasets
Availability: OnlineOnly
Author: Promptfoo

Demonstrates evaluating model factuality using the TruthfulQA dataset from HuggingFace. Questions are crafted to elicit common misconceptions.

Copy chain

Works with huggingface github

Promptfoo

Maintainer?

Spark score

out of 100

Updated 3 months ago

Version 1.0.0

Add to Favorites

Why it matters

Assess the factual accuracy of language models using the TruthfulQA dataset. Ensure your AI avoids generating common misconceptions and provides truthful answers.

Outcomes

What it gets done

Utilize the TruthfulQA dataset from HuggingFace.

Test language models for factual correctness.

Identify and mitigate the generation of false answers.

Evaluate model performance on questions designed to elicit misconceptions.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-huggingface-dataset-factuality | bash

Capabilities

What this chain does

Classify

Labels or categorizes text, files, or data points.

Summarize

Condenses long documents or threads into key takeaways.

Extract

Pulls structured data fields from unstructured text.

Overview

Huggingface Dataset Factuality

What it does

This example demonstrates how to evaluate model factuality using the TruthfulQA dataset from HuggingFace. The TruthfulQA dataset is designed to test whether language models can avoid generating false answers by crafting questions that might elicit common misconceptions.

How it connects

Use this example when you need to test whether language models can avoid generating false answers to questions designed to elicit common misconceptions.

Source README

Discussion