Moderate LLM Inputs and Outputs
Safeguard LLM inputs/outputs with Llama Guard. Classify content based on safety taxonomies.
Why it matters
Safeguard your RAG pipeline by moderating LLM inputs and outputs using Llama Guard. Ensure content safety and compliance with customizable or default safety taxonomies.
Outcomes
What it gets done
Classify LLM inputs for safety.
Classify LLM outputs for safety.
Moderate content based on predefined or custom safety categories.
Integrate Llama Guard into RAG pipelines.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/li-pack-packs-llama-guard-moderator | bash
Steps in the chain
Download the pack to the ./llamaguard_pack directory using the download_llama_pack function from llama_index.core.llama_pack. This will download and install dependencies for the LlamaGuardModeratorPack.
Before constructing the pack, set your Hugging Face access token as an environment variable: os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "hf_###############". This token is required to access the LlamaGuard-7b model.
Construct the pack using either a blank constructor for the out-of-the-box safety taxonomy (LlamaGuardModeratorPack()) or by passing in a custom taxonomy for unsafe categories (LlamaGuardModeratorPack(custom_taxonomy)).
Use the run() function to moderate input/output messages. Pass the message string to the function, which will moderate it through Llama Guard and return a response of 'safe' or 'unsafe'. When unsafe, it also outputs the unsafe category from the taxonomy.
Overview
Llama Guard Moderator Pack
What it does
This pack uses Llama Guard to classify LLM inputs and outputs against a safety taxonomy, either the default one or a custom one you supply. Running it requires a GPU with high RAM and Hugging Face access to LlamaGuard-7b.
How it connects
Use this pack for content moderation of LLM inputs and outputs based on safety taxonomies. Do not use it if you lack the necessary hardware (GPU, high RAM) or Hugging Face access.
Source README
Llama Guard Moderator Pack
This pack uses Llama Guard to safeguard the LLM inputs and outputs of a RAG pipeline. Llama Guard is an input-output safeguard model that classifies content in both LLM inputs (prompt classification) and LLM responses (response classification). The pack can moderate inputs and outputs against Llama Guard's default out-of-the-box taxonomy of unsafe categories, listed below. You can also customize the taxonomy of unsafe categories to suit your requirements; see sample usage scenarios 3 and 4 below.
Llama Guard safety taxonomy:
- Violence & Hate: Content promoting violence or hate against specific groups.
- Sexual Content: Encouraging sexual acts, particularly with minors, or explicit content.
- Guns & Illegal Weapons: Endorsing illegal weapon use or providing related instructions.
- Regulated Substances: Promoting illegal production or use of controlled substances.
- Suicide & Self Harm: Content encouraging self-harm or lacking appropriate health resources.
- Criminal Planning: Encouraging or aiding in various criminal activities.
CLI Usage
You can download LlamaPacks directly using llamaindex-cli, which is installed with the llama-index Python package:
llamaindex-cli download-llamapack LlamaGuardModeratorPack --download-dir ./llamaguard_pack
You can then inspect the files at ./llamaguard_pack and use them as a template for your own project.
Code Usage
Prerequisites
Llama Guard's source code is located in a gated GitHub repository. This means you need to request access from both Meta and Hugging Face to use LlamaGuard-7b, and obtain a Hugging Face access token with write privileges for interacting with LlamaGuard-7b. Detailed instructions and the request form are on the LlamaGuard-7b model card. It took me less than 24 hours to get access from both Meta and Hugging Face.
Please note that running LlamaGuard-7b requires both a GPU and high RAM. I tested in Google Colab and ran into an OutOfMemory error on a T4 with high RAM. Even a V100 with high RAM was borderline and may or may not run out of memory depending on the load. An A100 worked well.
Download the pack
You can download the pack to the ./llamaguard_pack directory:
from llama_index.core.llama_pack import download_llama_pack
# download and install dependencies
LlamaGuardModeratorPack = download_llama_pack(
    "LlamaGuardModeratorPack", "./llamaguard_pack"
)
Construct the pack
Before constructing the pack, be sure to set your Hugging Face access token (see the Prerequisites section above) as an environment variable.
import os

os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "hf_###############"
You can then construct the pack with a blank constructor, which uses the out-of-the-box safety taxonomy:
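As a minimal sketch, assuming you would rather export the token in your shell than hard-code it in the source, a small check can fail fast before the model download starts. The helper name require_hf_token is hypothetical and not part of the pack; only the HUGGINGFACE_ACCESS_TOKEN variable name comes from the instructions above.

```python
import os


def require_hf_token(var_name: str = "HUGGINGFACE_ACCESS_TOKEN") -> str:
    """Return the token stored in var_name, or raise if it is missing or empty.

    Hypothetical helper for illustration; it is not part of LlamaGuardModeratorPack.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; export it before constructing the pack."
        )
    return token
```

Calling require_hf_token() just before the constructor surfaces a missing token as a clear error rather than as a failed model download.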
llamaguard_pack = LlamaGuardModeratorPack()
Or you can construct the pack by passing in your custom taxonomy for unsafe categories (see sample custom taxonomy at the bottom of this page):
llamaguard_pack = LlamaGuardModeratorPack(custom_taxonomy)
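The sample custom taxonomy referred to above is not reproduced in this section. As a rough sketch, assuming the pack expects a plain string in the category layout used by Llama Guard's published prompt format (a numbered "O<n>: <name>." heading followed by "Should not" and "Can" bullet lists), a taxonomy could be assembled as follows. The helper build_taxonomy and the example category are hypothetical; check the model card for the exact layout before relying on it.

```python
from typing import Dict, List


def build_taxonomy(categories: Dict[str, Dict[str, List[str]]]) -> str:
    """Format categories into a Llama Guard style taxonomy string (assumed layout)."""
    blocks = []
    for index, (name, rules) in enumerate(categories.items(), start=1):
        lines = [f"O{index}: {name}."]
        lines.append("Should not")
        lines.extend(f"- {rule}" for rule in rules.get("should_not", []))
        lines.append("Can")
        lines.extend(f"- {rule}" for rule in rules.get("can", []))
        blocks.append("\n".join(lines))
    return "\n".join(blocks)


# Hypothetical category, for illustration only.
custom_taxonomy = build_taxonomy(
    {
        "Financial Advice": {
            "should_not": ["Recommend specific securities to buy or sell."],
            "can": ["Explain general financial concepts."],
        }
    }
)
```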
Run the pack
From here, you can use the pack, or inspect and modify the pack in ./llamaguard_pack.
The run() function takes the input/output message string and moderates it through Llama Guard, returning a response of safe or unsafe. When the response is unsafe, it also outputs the unsafe category from the taxonomy.
moderator_response = llamaguard_pack.run("I love Christmas season!")
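How the category is attached to an unsafe verdict is not shown here. A minimal sketch, assuming the response is a string whose first line is safe or unsafe and whose optional second line names the category (for example O3), could split it like this. parse_moderation is a hypothetical helper, not part of the pack.

```python
from typing import Optional, Tuple


def parse_moderation(response: str) -> Tuple[bool, Optional[str]]:
    """Split a moderator response into (is_safe, category).

    Assumes the first non-empty line is the verdict and the next line,
    if present, is the category code.
    """
    lines = [line.strip() for line in response.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty moderator response")
    # Anything other than an explicit "safe" verdict is treated as unsafe.
    is_safe = lines[0].lower() == "safe"
    category = lines[1] if not is_safe and len(lines) > 1 else None
    return is_safe, category
```

Treating every verdict other than safe as unsafe is a deliberately conservative choice: an unexpected response blocks the message instead of letting it through.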
Usage Pattern in RAG Pipeline
We recommend you first define a function such as the sample function moderate_and_query below, which takes the query string as the input, moderates it against Llama Guard's default or customized taxonomy, depending on how your pack i
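A sketch of such a function follows. It assumes the moderator returns exactly the string safe for acceptable content, and it takes the moderator and the query step as plain callables so the flow can be exercised without loading the model. The parameter names, the refusal messages, and the query_engine mentioned in the usage note are assumptions, not the pack's API.

```python
from typing import Callable

UNSAFE_QUERY_MESSAGE = "This query is not safe. Please ask a different question."
UNSAFE_RESPONSE_MESSAGE = "The response is not safe. Please ask a different question."


def moderate_and_query(
    query: str,
    moderate: Callable[[str], str],
    answer: Callable[[str], str],
) -> str:
    """Moderate the query, answer it if safe, then moderate the answer."""
    # Block unsafe input before it reaches the LLM.
    if moderate(query).strip() != "safe":
        return UNSAFE_QUERY_MESSAGE

    response = answer(query)

    # Block unsafe output before it reaches the user.
    if moderate(response).strip() != "safe":
        return UNSAFE_RESPONSE_MESSAGE
    return response
```

With a constructed pack and an existing query engine, the call might look like moderate_and_query(query, llamaguard_pack.run, lambda q: str(query_engine.query(q))).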