Prompt Chain

Moderate LLM Inputs and Outputs

Safeguard LLM inputs/outputs with Llama Guard. Classify content based on safety taxonomies.


Updated 3 months ago
Version 1.0.0

Why it matters

Safeguard your RAG pipeline by moderating LLM inputs and outputs using Llama Guard. Ensure content safety and compliance with customizable or default safety taxonomies.

Outcomes

What it gets done

01

Classify LLM inputs for safety.

02

Classify LLM outputs for safety.

03

Moderate content based on predefined or custom safety categories.

04

Integrate Llama Guard into RAG pipelines.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-pack-packs-llama-guard-moderator | bash

Steps

Steps in the chain

01
Download the Llama Guard Moderator Pack

Download the pack to the ./llamaguard_pack directory using the download_llama_pack function from llama_index.core.llama_pack. This will download and install dependencies for the LlamaGuardModeratorPack.

02
Set Hugging Face Access Token

Before constructing the pack, set your Hugging Face access token as an environment variable: os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "hf_###############". This token is required to access the LlamaGuard-7b model.

03
Construct the Pack

Construct the pack using either a blank constructor for the out-of-the-box safety taxonomy (LlamaGuardModeratorPack()) or by passing in a custom taxonomy for unsafe categories (LlamaGuardModeratorPack(custom_taxonomy)).

04
Run the Pack

Use the run() function to moderate input/output messages. Pass the message string to the function, which will moderate it through Llama Guard and return a response of 'safe' or 'unsafe'. When unsafe, it also outputs the unsafe category from the taxonomy.

Overview

Llama Guard Moderator Pack

What it does

This pack uses Llama Guard to classify LLM inputs and outputs based on safety taxonomies. It can use default taxonomies or allow for custom ones. The pack requires specific hardware and Hugging Face access.

How it connects

Use this pack for content moderation of LLM inputs and outputs based on safety taxonomies. Do not use if you lack the necessary hardware (GPU, high RAM) or Hugging Face access.

Source README

Llama Guard Moderator Pack

This pack uses Llama Guard to safeguard the LLM inputs and outputs of a RAG pipeline. Llama Guard is an input-output safeguard model that classifies content in both LLM inputs (prompt classification) and LLM responses (response classification). The pack can moderate inputs and outputs against Llama Guard's default out-of-the-box safety taxonomy of unsafe categories, listed below. You can also customize the taxonomy of unsafe categories to suit your particular requirements; see sample usage scenarios 3 and 4 below.

Llama Guard safety taxonomy:

  • Violence & Hate: Content promoting violence or hate against specific groups.
  • Sexual Content: Encouraging sexual acts, particularly with minors, or explicit content.
  • Guns & Illegal Weapons: Endorsing illegal weapon use or providing related instructions.
  • Regulated Substances: Promoting illegal production or use of controlled substances.
  • Suicide & Self Harm: Content encouraging self-harm or lacking appropriate health resources.
  • Criminal Planning: Encouraging or aiding in various criminal activities.

CLI Usage

You can download llamapacks directly using llamaindex-cli, which comes installed with the llama-index python package:

llamaindex-cli download-llamapack LlamaGuardModeratorPack --download-dir ./llamaguard_pack

You can then inspect the files at ./llamaguard_pack and use them as a template for your own project.

Code Usage

Prerequisites

Llama Guard's source code is located in a gated GitHub repository. This means you need to request access from both Meta and Hugging Face to use LlamaGuard-7b, and obtain a Hugging Face access token with write privileges for interactions with LlamaGuard-7b. The detailed instructions and the form to fill out are on the LlamaGuard-7b model card. It took me less than 24 hours to get access from both Meta and Hugging Face.

Please note that running LlamaGuard-7b requires both a GPU and high RAM. I tested in Google Colab and ran into an OutOfMemory error on a T4 with high RAM. Even a V100 with high RAM was borderline and may or may not run into memory issues depending on demand. An A100 worked well.

Download the pack

You can download the pack to the ./llamaguard_pack directory:

from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
LlamaGuardModeratorPack = download_llama_pack(
    "LlamaGuardModeratorPack", "./llamaguard_pack"
)

Construct the pack

Before constructing the pack, be sure to set your Hugging Face access token (see Prerequisites section above) as your environment variable.

import os

os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "hf_###############"
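
A token written directly into source code is easy to leak into version control. As a minimal sketch, assuming the token has already been exported in your shell, the hypothetical helper require_hf_token below (not part of the pack) checks that the variable is set before you construct the pack:

```python
import os


def require_hf_token(var_name: str = "HUGGINGFACE_ACCESS_TOKEN") -> str:
    """Return the token stored in `var_name`, or raise if it is missing.

    Hypothetical helper for illustration; it is not part of the pack.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; export your Hugging Face token first."
        )
    return token
```

Calling require_hf_token() once at startup surfaces a missing token as a clear error rather than a failure partway through the model download.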

You can then construct the pack with a blank constructor, which uses the out-of-the-box safety taxonomy:

llamaguard_pack = LlamaGuardModeratorPack()

Or you can construct the pack by passing in your custom taxonomy for unsafe categories (see sample custom taxonomy at the bottom of this page):

llamaguard_pack = LlamaGuardModeratorPack(custom_taxonomy)
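
As a rough sketch of what custom_taxonomy might contain: the helper below assembles a taxonomy string in the layout used by Llama Guard's published prompt (an `O<n>:` heading followed by `Should not` and `Can` bullet lists). The helper name build_taxonomy and the category text are invented for illustration, and the exact format the pack expects is an assumption to verify against the pack's own sample taxonomy.

```python
def build_taxonomy(categories):
    """Format (name, should_not, can) tuples as a Llama Guard-style taxonomy.

    Assumed layout: "O<n>: <name>." then "Should not" / "Can" bullet lists.
    Hypothetical helper; it is not part of the pack.
    """
    blocks = []
    for index, (name, should_not, can) in enumerate(categories, start=1):
        lines = [f"O{index}: {name}.", "Should not"]
        lines += [f"- {item}" for item in should_not]
        lines.append("Can")
        lines += [f"- {item}" for item in can]
        blocks.append("\n".join(lines))
    return "\n".join(blocks)


# Illustrative category only; write your own policy text.
custom_taxonomy = build_taxonomy(
    [
        (
            "Financial Advice",
            ["Recommend specific investments as guaranteed returns."],
            ["Explain general financial concepts."],
        ),
    ]
)
```

If the custom string replaces the default taxonomy rather than extending it (worth checking in the pack's source), include any default categories you still want enforced.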

Run the pack

From here, you can use the pack, or inspect and modify the pack in ./llamaguard_pack.

The run() function takes the input/output message string and moderates it through Llama Guard, returning a response of safe or unsafe. When the response is unsafe, it also outputs the unsafe category from the taxonomy.

moderator_response = llamaguard_pack.run("I love Christmas season!")
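
The returned string can be turned into a structured result before you act on it. A minimal sketch, assuming the response is either `safe` or `unsafe` followed on the next line by a category code such as `O3` (that layout is an assumption, and parse_moderation is a hypothetical helper, not part of the pack):

```python
from typing import NamedTuple, Optional


class Moderation(NamedTuple):
    is_safe: bool
    category: Optional[str]  # e.g. "O3" when unsafe, otherwise None


def parse_moderation(response: str) -> Moderation:
    """Split a Llama Guard-style verdict into a flag and an optional category."""
    lines = [line.strip() for line in response.strip().splitlines() if line.strip()]
    if not lines:
        # Treat an empty verdict as unsafe so failures do not pass silently.
        return Moderation(False, None)
    if lines[0].lower() == "safe":
        return Moderation(True, None)
    # Anything other than "safe" is treated as unsafe (fail closed).
    category = lines[1] if len(lines) > 1 else None
    return Moderation(False, category)
```

Treating every verdict other than `safe` as unsafe is a deliberate choice here: an unexpected response blocks the message instead of letting it through.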

Usage Pattern in RAG Pipeline

We recommend you first define a function such as the sample function moderate_and_query below, which takes the query string as the input, moderates it against Llama Guard's default or customized taxonomy, depending on how your pack i
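
A minimal sketch of such a function follows. It assumes run() returns exactly `safe` for acceptable content. The refusal messages, the stand-in classes, and passing the moderator and query engine as parameters are illustrative choices, not the pack's own code.

```python
UNSAFE_QUERY_MESSAGE = "This query is not safe. Please ask a different question."
UNSAFE_RESPONSE_MESSAGE = "The response is not safe. Please ask a different question."


def moderate_and_query(moderator, query_engine, query: str) -> str:
    """Moderate the query, run it, then moderate the answer.

    `moderator` needs a run(str) -> str method (such as the pack);
    `query_engine` needs a query(str) method. Both are passed in so the
    sketch can be exercised with stand-ins.
    """
    # Input moderation: refuse before the query reaches the LLM.
    if moderator.run(query).strip() != "safe":
        return UNSAFE_QUERY_MESSAGE
    response = str(query_engine.query(query))
    # Output moderation: check the generated answer before returning it.
    if moderator.run(response).strip() != "safe":
        return UNSAFE_RESPONSE_MESSAGE
    return response


# Stand-ins for demonstration only; they do not call Llama Guard or an LLM.
class KeywordModerator:
    def run(self, message: str) -> str:
        return "unsafe\nO3" if "steal" in message.lower() else "safe"


class EchoEngine:
    def query(self, query: str) -> str:
        return f"Answer to: {query}"
```

In a real pipeline you would pass llamaguard_pack as the moderator and your index's query engine in place of the stand-ins.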

