Context: Retrieval Augmented Generation (RAG) chatbots
Large Language Models (LLMs) are trained on vast amounts of data, which they learn from and can answer questions about. However, a pre-trained LLM cannot answer questions about data outside its training set. RAG is a technique that augments the user’s query with relevant contextual data, so the LLM sees additional data at query time and can respond to questions about new data sources (e.g. an organization’s internal Confluence pages).
Building a RAG chatbot that can query your internal company data is no longer a research topic; developers can build one today. Tools and libraries such as llama-index and langchain make it easy to get started with just a few lines of code, and LLM vendors such as OpenAI even let you upload documents to try RAG for yourself!
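As a rough illustration of how little code a first prototype needs, here is a minimal sketch using llama-index (assuming a 0.10+ release where the core API lives under llama_index.core, an OPENAI_API_KEY in the environment, and an illustrative ./data folder of documents):

    # Minimal RAG sketch with llama-index: index local documents, then query them.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # Load the source documents and index them into an in-memory vector store,
    # embedding each chunk so it can be retrieved by semantic similarity.
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)

    # At query time, relevant chunks are retrieved and passed to the LLM
    # alongside the user's question.
    query_engine = index.as_query_engine()
    print(query_engine.query("What is our expenses policy?"))

A langchain version is comparably short; both libraries hide the embedding, storage, and prompt-assembly steps behind a few calls.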
Challenge: Protecting Private Information
When building a prototype RAG chatbot for your organization, you will want to ask questions about internal documents that may contain sensitive personally identifiable information (PII). To get useful answers, you likely need to send this PII outside your company's governance boundary to LLM providers like OpenAI, whose external models are likely to give the best responses and therefore the best user experience.
Sending sensitive data outside your organization’s governance boundary requires appropriate approval from security and governance departments. OpenAI’s terms state that API data will not be used for training, but data is still logged and can be viewed by OpenAI engineers in some situations.
Our Solution: Encryption of PII
Our approach to building a prototype chatbot involved encrypting PII before sending it to OpenAI. When a response is generated, we decrypt the PII back into plain text, so the user experience is maintained and meaningful answers are given.
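As a rough illustration of the idea (a sketch, not the exact implementation described in the talk), the code below finds email addresses with a toy regex, substitutes each one with a Fernet-encrypted placeholder before the text leaves the governance boundary, and decrypts the placeholders in the model’s response. The placeholder format and the call_llm helper are illustrative assumptions:

    # Sketch of encrypting PII before an LLM call and decrypting it afterwards.
    # The detector here is a toy email regex; the talk covers richer
    # rule-based and ML-based detection.
    import re
    from cryptography.fernet import Fernet

    fernet = Fernet(Fernet.generate_key())  # key never leaves the organization
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
    TOKEN = re.compile(r"<PII:([A-Za-z0-9_\-=]+)>")

    def encrypt_pii(text: str) -> str:
        """Replace each detected PII span with an encrypted placeholder."""
        return EMAIL.sub(
            lambda m: f"<PII:{fernet.encrypt(m.group(0).encode()).decode()}>",
            text,
        )

    def decrypt_pii(text: str) -> str:
        """Restore plain-text PII in the LLM's response."""
        return TOKEN.sub(
            lambda m: fernet.decrypt(m.group(1).encode()).decode(),
            text,
        )

    prompt = encrypt_pii("Summarise the complaint raised by jane.doe@example.com")
    # response = call_llm(prompt)      # hypothetical request to the LLM provider
    # print(decrypt_pii(response))     # PII restored before it reaches the user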
The talk will describe:
- A background primer on RAG chatbots, including:
-- How source data is indexed into a semantic vector store using embeddings.
-- How the vector store is used at query time to pull in relevant context to answer the user’s question.
-- How prompt engineering guides LLMs to use the additional information in a useful manner (these indexing, retrieval, and prompting steps are sketched in code after this list).
- The architecture of the data pipeline that ingests documents into the vector store.
- Methods for detecting sensitive text in source data using both rule-based and ML-based techniques (see the detection sketch after this list).
- Encryption & decryption methods to hide PII from LLM queries.
- Methods to implement knowledge-base access controls, restricting users to querying only the documents they are allowed to read (see the access-control sketch below).
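To make the primer concrete, here is a dependency-free sketch of the indexing, query-time retrieval, and prompt-construction steps listed above; the embed callable stands in for whichever embedding model is used, and the prompt wording is only an example:

    # Sketch of indexing, query-time retrieval, and prompt construction for RAG.
    # `embed` is any function mapping text to a vector (e.g. an OpenAI or
    # sentence-transformers embedding) and is supplied by the caller.
    import math

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    def build_index(chunks, embed):
        # Indexing: store each text chunk alongside its embedding vector.
        return [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(index, question, embed, k=3):
        # Query time: embed the question and pull in the k most similar chunks.
        q = embed(question)
        ranked = sorted(index, key=lambda item: cosine(item[1], q), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

    def build_prompt(question, context_chunks):
        # Prompt engineering: tell the model to answer only from the given context.
        context = "\n\n".join(context_chunks)
        return (
            "Answer the question using only the context below. "
            "If the answer is not in the context, say you do not know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )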
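For detection, one common pattern (the talk is not tied to these particular tools) combines regexes for well-structured identifiers with a named-entity recognizer for names, organizations, and places; the sketch below uses spaCy’s small English model for the ML-based pass:

    # Sketch of combining rule-based and ML-based PII detection.
    # Requires: pip install spacy && python -m spacy download en_core_web_sm
    import re
    import spacy

    RULES = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "PHONE": re.compile(r"\+?\d[\d\s\-()]{7,}\d"),
    }
    nlp = spacy.load("en_core_web_sm")  # small English NER model

    def detect_pii(text):
        spans = []
        # Rule-based: regular expressions catch well-structured identifiers.
        for label, pattern in RULES.items():
            for m in pattern.finditer(text):
                spans.append((m.start(), m.end(), label))
        # ML-based: named-entity recognition catches names, organizations, places.
        for ent in nlp(text).ents:
            if ent.label_ in {"PERSON", "ORG", "GPE"}:
                spans.append((ent.start_char, ent.end_char, ent.label_))
        return spans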
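For access control, one simple approach (an illustration of the general idea rather than the exact mechanism from the talk) is to store the groups allowed to read each source document as chunk metadata and filter on it before similarity search:

    # Sketch of knowledge-base access control at retrieval time.
    # Each indexed chunk records which groups may read its source document;
    # retrieval only considers chunks the querying user is entitled to see.
    def allowed_chunks(index, user_groups):
        return [
            entry for entry in index
            if entry["allowed_groups"] & user_groups
        ]

    index = [
        {"text": "Q3 payroll summary...", "allowed_groups": {"hr", "finance"}},
        {"text": "Public holiday policy...", "allowed_groups": {"all-staff"}},
    ]

    # A user in engineering only retrieves from documents their groups can read.
    visible = allowed_chunks(index, user_groups={"engineering", "all-staff"})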