Retrieval Augmented Generation (RAG) has become a standard approach to building LLM interfaces on top of specific data, but it can struggle when the query doesn't overlap meaningfully with the stored embeddings.
In this post, I want to explore a solution to this by introducing a thought-generation step. The process works as follows:
- Text chunks are first passed through a prompt that generates thoughts about the passage.
- The generated thoughts are then embedded.
- The RAG interface uses the generated thoughts as retrieval context (see the sketch below).
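
Here is a minimal sketch of that pipeline, assuming the `openai` Python client (v1+). The prompt wording, the `text-embedding-ada-002` embedding model, and the helper names are illustrative assumptions, not taken from the notebook:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt; the actual thought-generation prompt may differ.
THOUGHT_PROMPT = (
    "Read the following passage and write a few short thoughts about it: "
    "what it covers, what questions it could answer, and key terms.\n\n"
    "Passage:\n{chunk}"
)

def generate_thoughts(chunk: str) -> str:
    """Ask the chat model to produce thoughts about a text chunk."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": THOUGHT_PROMPT.format(chunk=chunk)}],
    )
    return response.choices[0].message.content

def embed(texts: list[str]) -> list[list[float]]:
    """Embed a batch of strings; note it's the thoughts that get embedded,
    not the raw chunks."""
    response = client.embeddings.create(
        model="text-embedding-ada-002",  # assumed embedding model
        input=texts,
    )
    return [item.embedding for item in response.data]

chunks = ["...your source text, split into chunks..."]
thoughts = [generate_thoughts(c) for c in chunks]
vectors = embed(thoughts)
# At query time, embed the question the same way, retrieve the nearest
# thought vectors, and pass the associated text to the LLM as context.
```

The key difference from plain RAG is the index: retrieval runs over embeddings of the generated thoughts rather than embeddings of the raw chunks, so a query can match on what a passage implies, not just the words it contains.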
All tests use gpt-3.5-turbo and can be reproduced with the notebook.