Query Redshift Data with Natural Language
GPT Action workflow that connects ChatGPT to AWS Redshift data warehouses via AWS Lambda middleware, enabling SQL query execution and data analysis.
Why it matters
Leverage ChatGPT's natural language capabilities to connect to Redshift's Data Warehouse. Enable data scientists and citizen data users to query transactional data and gain visibility into their data.
Outcomes
What it gets done
Connect to Redshift tables using natural language.
Perform data analysis directly within ChatGPT.
Ask basic questions of transactional data.
Retrieve data from Redshift as files for further analysis.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/oai-gptactionredshift | bash Steps
Steps in the chain
Install AWS CLI (required for AWS SAM), AWS SAM CLI, Python, and yq. See documentation links for installation instructions for each tool.
Follow the steps in the AWS Middleware Action cookbook to create a function. Clone the redshift-middleware repository: git clone https://github.com/pap-openai/redshift-middleware && cd redshift-middleware
Find the network used by Redshift through the AWS console under Amazon Redshift Serverless > Workgroup configuration > your_workgroup > Data access, or through the CLI. Note the Security Group ID and Subnet IDs.
Copy env.sample.yaml to env.yaml and fill in the Redshift connection details: RedshiftHost, RedshiftPort, RedshiftUser, RedshiftPassword, RedshiftDb, SecurityGroupId, and SubnetIds (1-6).
Deploy the function using SAM with: PARAM_FILE="env.yaml" && PARAMS=$(yq eval -o=json $PARAM_FILE | jq -r 'to_entries | map("\(.key)=\(.value|tostring)") | join(" ")') && sam deploy --template-file template.yaml --stack-name redshift-middleware --capabilities CAPABILITY_IAM --parameter-overrides $PARAMS
From the SAM deploy command output, retrieve the URL information for your deployed function. This URL will be used for cURL requests to execute queries.
Create a Custom GPT and copy the provided Custom GPT Instructions into the Instructions panel.
In the Custom GPT Actions panel, copy the provided OpenAPI Schema. Ensure it matches the file retrieval structure and passes the query parameter. Update the function app name based on your deployment.
Follow the steps in the AWS Middleware cookbook to set up authentication for the GPT Action, ensuring users in Cognito can access the service.
Overview
GPT Action Library: AWS RedShift
What it does
AWS Lambda middleware connecting ChatGPT to Redshift for SQL query execution
How it connects
When you need to query Redshift data warehouses through ChatGPT conversations
Source README
GPT Action Library: AWS RedShift
Introduction
This page provides an instruction & guide for developers building a GPT Action for a specific application. Before you proceed, make sure to first familiarize yourself with the following information:
- Introduction to GPT Actions
- Introduction to GPT Actions Library
- Example of Building a GPT Action from Scratch
This solution enables a GPT action to retrieve data from Redshift and perform data analysis.It uses AWS Functions, performing every action from AWS ecosystem and network. The middleware (AWS function) will perform the SQL query, wait for its completion and return the data as a file. The code is provided for information purpose only and should be modified to your needs.
This solution uses the ability to retrieve files in Actions and use them as if you had uploaded them directly to a conversation.
This solution highlight a connection to Redshift serverless, the integration with a provisioned Redshift might differ slighltly to retrieve networks and set-up connection, the overall code and (minimal) integration should be similar.
Value & Example Business Use Cases
Value: Leverage ChatGPT's natural language capabilities to connect to Redshift's DWH.
Example Use Cases:
- Data scientists can connect to tables and run data analyses using ChatGPT's Data Analysis
- Citizen data users can ask basic questions of their transactional data
- Users gain more visibility into their data & potential anomalies
Application Information
Application Prerequisites
Before you get started, make sure that:
- You have access to a Redshift environment
- You have the rights to deploy AWS function in the same VPC (Virtual Private Network)
- Your AWS CLI is authenticated
Middleware Information
Install required libraries
- Install AWS CLI, required for AWS SAM (docs)
- Install AWS SAM CLI (docs)
- Install Python
- Install yq docs
Middleware function
To create a function, follow the steps in the AWS Middleware Action cookbook.
To deploy specifically an application that connects to Redshift, use the following code instead of the "hello-world" GitHub repository referenced in the Middleware AWS Function cookbook. You can either clone the repository or take the code pasted below and modify it to your needs.
This code is meant to be directional - while it should work out of the box, it is designed to be customized to your needs (see examples towards the end of this document).
To get the code, you can clone openai-cookbook repository and navigate to the redshift-middleware directory
git clone https://github.com/pap-openai/redshift-middleware
cd redshift-middleware
Retrieve VPC information
We will need to connnect our function to Redshift, therefore we need to find the network used by Redshift. You can find this on your Redshift interface the AWS console, under Amazon Redshift Serverless > Workgroup configuration > your_workgroup > Data access, or through the CLI:
Set up AWS function
Copy env.sample.yaml to env.yaml and replace with the values obtained above. You will need a Redshift user with access to your DB/schema.
cp env.sample.yaml env.yaml
Fill in env.yaml with the values retrieved by the previous command as well as your credentials to Redshift.
Alternatively, you can create a file named env.yaml manually and fill the following variables:
RedshiftHost: default-workgroup.xxxxx.{region}.redshift-serverless.amazonaws.com
RedshiftPort: 5439
RedshiftUser: username
RedshiftPassword: password
RedshiftDb: my-db
SecurityGroupId: sg-xx
SubnetId1: subnet-xx
SubnetId2: subnet-xx
SubnetId3: subnet-xx
SubnetId4: subnet-xx
SubnetId5: subnet-xx
SubnetId6: subnet-xx
This file will be used to deploy your function with parameters, as shown below:
PARAM_FILE="env.yaml"
PARAMS=$(yq eval -o=json $PARAM_FILE | jq -r 'to_entries | map("\(.key)=\(.value|tostring)") | join(" ")')
sam deploy --template-file template.yaml --stack-name redshift-middleware --capabilities CAPABILITY_IAM --parameter-overrides $PARAMS
The template.yaml has the following content:
Retrieve the URL information from the previous command output, you can then run a cURL request, which should return data in a file format:
ChatGPT Steps
Custom GPT Instructions
Once you've created a Custom GPT, copy the text below in the Instructions panel.
OpenAPI Schema
Once you've created a Custom GPT, copy the text below in the Actions panel.
This expects a response that matches the file retrieval structure in our doc here and passes in a query as a parameter to execute.
Make sure to follow the steps in the AWS Middleware cookbook to set up authentication.
Make sure to switch the function app name based on your function deployment.
Conclusion
You now have deployed a GPT that uses a middleware in AWS, in an authenticated manner, that's able to connect to Redsfhit. Users with access (that are in Cognito) can now query your databases to perform data analysis task:
Step 1: Install required libraries
Install AWS CLI (required for AWS SAM), AWS SAM CLI, Python, and yq. See documentation links for installation instructions for each tool.
Step 2: Create AWS middleware function
Follow the steps in the AWS Middleware Action cookbook to create a function. Clone the redshift-middleware repository: git clone https://github.com/pap-openai/redshift-middleware && cd redshift-middleware
Step 3: Retrieve VPC information
Find the network used by Redshift through the AWS console under Amazon Redshift Serverless > Workgroup configuration > your_workgroup > Data access, or through the CLI. Note the Security Group ID and Subnet IDs.
Step 4: Set up AWS function configuration
Copy env.sample.yaml to env.yaml and fill in the Redshift connection details: RedshiftHost, RedshiftPort, RedshiftUser, RedshiftPassword, RedshiftDb, SecurityGroupId, and SubnetIds (1-6).
Step 5: Deploy AWS function
Deploy the function using SAM with: PARAM_FILE="env.yaml" && PARAMS=$(yq eval -o=json $PARAM_FILE | jq -r 'to_entries | map("\(.key)=\(.value|tostring)") | join(" ")') && sam deploy --template-file template.yaml --stack-name redshift-middleware --capabilities CAPABILITY_IAM --parameter-overrides $PARAMSStep 6: Retrieve function URL
From the SAM deploy command output, retrieve the URL information for your deployed function. This URL will be used for cURL requests to execute queries.
Step 7: Create Custom GPT
Create a Custom GPT and copy the provided Custom GPT Instructions into the Instructions panel.
Step 8: Configure OpenAPI Schema
In the Custom GPT Actions panel, copy the provided OpenAPI Schema. Ensure it matches the file retrieval structure and passes the query parameter. Update the function app name based on your deployment.
Step 9: Set up authentication
Follow the steps in the AWS Middleware cookbook to set up authentication for the GPT Action, ensuring users in Cognito can access the service.
Discussion
Questions & comments · 0
Sign In Sign in to leave a comment.