Protecting RAG Data and Apps Through Authorization (new security guide article)

LLM apps that use RAG can easily overshare data, that is, expose it to unauthorized parties. Two examples:

  1. The app may have access to a larger set of data than the current user should be able to see or know about, but some of that extra data gets passed into the LLM and shown to the user in the LLM response.
  2. Documents with sensitive information are fetched by the app ahead of time from a file store. They are chunked, embedded, and stored in a vector database for use with the user prompts to which they are relevant. However, the embedded chunks are not tied back to the source document, or the appropriate access control is not recorded, leading to sensitive data disclosure.

Oversharing can be triggered by a deliberate attack, but it can also happen to unsuspecting users who never meant to see the data.

Proper access control on RAG-retrieved data can prevent both kinds of exposure. It also helps prevent tampering with the data stored for RAG, which in turn guards against prompt injection and other attacks.
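
To make the idea concrete, here is a minimal sketch of one way to carry authorization through a RAG pipeline. It is not taken from the article: the names Chunk, ingest, and authorized_chunks are hypothetical, the fixed-size chunking is a placeholder, and a plain Python list stands in for the vector database. The point is only that each chunk keeps a link to its source document and the groups allowed to read it, and that the filter runs before anything reaches the LLM.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A stored chunk that keeps its provenance and access control (hypothetical type)."""
    text: str
    source_doc: str                 # ties the chunk back to its source document
    allowed_groups: frozenset[str]  # groups permitted to read the source document


def ingest(doc_id: str, text: str, allowed_groups: set[str], chunk_size: int = 40) -> list[Chunk]:
    """Split a document into chunks, copying the document's access list onto each one."""
    groups = frozenset(allowed_groups)
    return [
        Chunk(text[i:i + chunk_size], doc_id, groups)
        for i in range(0, len(text), chunk_size)
    ]


def authorized_chunks(candidates: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only the chunks the current user may see; call this before building the prompt."""
    user = frozenset(user_groups)
    return [c for c in candidates if c.allowed_groups & user]


# A list standing in for the vector database (assumption for illustration only).
store = (
    ingest("handbook.txt", "Office hours are 9 to 5.", {"staff", "hr"})
    + ingest("salaries.txt", "Salary bands are confidential.", {"hr"})
)

staff_view = authorized_chunks(store, {"staff"})
hr_view = authorized_chunks(store, {"hr"})
```

In a real deployment the same check would more likely be pushed into the vector database query as a metadata filter, so that unauthorized chunks are never retrieved in the first place.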

“Protecting RAG Data and Apps Through Authorization” sheds light on the security concerns here, explores what roles authorization can play with RAG, and discusses determining the correct authorization to have in place.

This article is on the Secure by Design Education Hub. If you think this is valuable, please consider passing it along. And if you have any feedback or suggestions, we’d love to hear from you.
