Prompt Chain

Generate Privacy-Safe Synthetic Data

Name: Generate Privacy-Safe Synthetic Data
Availability: OnlineOnly
Author: LlamaIndex

LlamaIndex pack that generates differentially private synthetic datasets from sensitive data, preserving the original attributes while minimizing performance loss.

Works with OpenAI

Updated 4 days ago

Version 0.14.22

Models

GPT-4o, Llama 3

Why it matters

Create differentially private synthetic data from sensitive datasets, enabling privacy-safe downstream processing and LLM consumption without additional privacy costs.

Outcomes

What it gets done

Generate synthetic data examples with differential privacy.

Obscure source data while preserving original attributes.

Prepare privacy-safe datasets for LLM prompt ingestion.

Integrate with LLMs that return token log probabilities (LogProbs) during generation.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-pack-packs-diff-private-simple-dataset | bash

Steps

Steps in the chain

Construct DiffPrivateSimpleDatasetPack object

Create a DiffPrivateSimpleDatasetPack object with the following parameters:
1) an LLM (it must return a CompletionResponse)
2) the tokenizer associated with that LLM
3) a PromptBundle object containing the parameters used to prompt the LLM
4) a LabelledSimpleDataset
5) [Optional] sephamore_counter_size, which reduces the chance of hitting a RateLimitError
6) [Optional] sleep_time_in_seconds, which also reduces the chance of hitting a RateLimitError
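A minimal sketch of the construction step, assuming the pack is installed and importable from llama_index.packs.diff_private_simple_dataset (that module path is an assumption, not stated on this page). The keyword names llm, tokenizer, prompt_bundle, and simple_dataset follow the names used in the steps here; pack_kwargs and build_pack are hypothetical helpers introduced only for illustration. The parameter name sephamore_counter_size is spelled exactly as it appears above.

```python
from typing import Any, Dict, Optional


def pack_kwargs(
    llm: Any,
    tokenizer: Any,
    prompt_bundle: Any,
    simple_dataset: Any,
    sephamore_counter_size: Optional[int] = None,
    sleep_time_in_seconds: Optional[float] = None,
) -> Dict[str, Any]:
    """Collect constructor arguments, leaving out unset optional ones."""
    kwargs: Dict[str, Any] = {
        "llm": llm,  # must return a CompletionResponse
        "tokenizer": tokenizer,  # tokenizer associated with the LLM
        "prompt_bundle": prompt_bundle,  # a PromptBundle object
        "simple_dataset": simple_dataset,  # a LabelledSimpleDataset
    }
    # Optional throttling knobs: only pass them when the caller set them,
    # so the pack's own defaults apply otherwise.
    if sephamore_counter_size is not None:
        kwargs["sephamore_counter_size"] = sephamore_counter_size
    if sleep_time_in_seconds is not None:
        kwargs["sleep_time_in_seconds"] = sleep_time_in_seconds
    return kwargs


def build_pack(**kwargs: Any) -> Any:
    """Instantiate the pack from the arguments collected by pack_kwargs()."""
    # Imported lazily so this module loads even where the pack is not installed.
    # ASSUMPTION: the import path below is not confirmed by this page.
    from llama_index.packs.diff_private_simple_dataset import (
        DiffPrivateSimpleDatasetPack,
    )

    return DiffPrivateSimpleDatasetPack(**kwargs)
```

With real objects in hand, the call would look like build_pack(**pack_kwargs(llm, tokenizer, prompt_bundle, simple_dataset, sleep_time_in_seconds=0.5)); the value 0.5 is only a placeholder.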

Download and customize the pack as template

Download the DiffPrivateSimpleDatasetPack as a template with the download_llama_pack() function. You can then customize the pack's code for your needs before instantiating it with your llm, tokenizer, prompt_bundle, and simple_dataset.
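The download step might look like the sketch below. It assumes download_llama_pack is imported from llama_index.core.llama_pack and takes the pack class name plus a target directory; the directory name and the download_pack_template helper are hypothetical and chosen only for this example.

```python
from typing import Any


def download_pack_template(
    download_dir: str = "./diff_private_simple_dataset_pack",
) -> Any:
    """Download the pack and return its class for local customization."""
    # Imported lazily so this module loads even without llama-index installed.
    # ASSUMPTION: download_llama_pack lives in llama_index.core.llama_pack.
    from llama_index.core.llama_pack import download_llama_pack

    # The call needs network access when it actually runs.
    return download_llama_pack("DiffPrivateSimpleDatasetPack", download_dir)
```

The returned class can then be instantiated with llm, tokenizer, prompt_bundle, and simple_dataset, the same way as a directly imported pack.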

Call the run() function

Call the run() function, which is a light wrapper around query_engine.query(). It requires several parameters, including t_max (the maximum number of tokens), and processes the dataset to generate privacy-safe synthetic examples.

Overview

LlamaIndex Packs: `DiffPrivateSimpleDatasetPack`

What it does

A LlamaIndex pack implementing differential privacy techniques to create synthetic datasets from sensitive source data, designed for safe use in LLM workflows.

How it connects

Use when you need to generate privacy-preserving synthetic examples from labeled datasets containing sensitive information that will be passed to LLMs for downstream processing.
