RAG-Tutorial
Learn RAG from scratch!
Lecture notes for this YouTube class.[1]
Basic Knowledge
Retrieval-Augmented Generation (RAG) is a hybrid framework that combines the strengths of retrieval-based and generative models for natural language processing tasks. The basic principle of RAG involves two key components: a retriever and a generator.
The retriever is responsible for fetching relevant documents or information from a large corpus or knowledge base, typically using dense vector representations to find the most pertinent data for a given query. This ensures that the model has access to accurate, up-to-date external information beyond its training data.
The generator, usually a pre-trained language model, then uses this retrieved information to produce coherent and contextually appropriate responses. By grounding its outputs in retrieved facts, RAG reduces hallucinations and improves factual accuracy compared to purely generative models.
Basic Steps:
1. Load documents with a crawler built on `bs4` (BeautifulSoup).
2. Split the text using `RecursiveCharacterTextSplitter()` from `langchain.text_splitter`.
3. Embed the splits into vectors using `OpenAIEmbeddings`, store them in `Chroma` from `langchain_community.vectorstores`, and return the vector store retriever.
4. Define the RAG chain (including the prompt and the LLM):
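The split step can be illustrated without any external services. Below is a minimal, self-contained sketch of what chunking with overlap does; the real `RecursiveCharacterTextSplitter` is more sophisticated (it tries to break on separators such as `"\n\n"` first), so treat this as an approximation, not the library's implementation.

```python
# Minimal sketch of text splitting with a fixed chunk size and overlap,
# roughly what RecursiveCharacterTextSplitter does under the hood.
def split_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        # Step forward, keeping chunk_overlap characters of context
        start += chunk_size - chunk_overlap
    return chunks

doc = ("word " * 100).strip()   # stand-in for a crawled document
splits = split_text(doc, chunk_size=100, chunk_overlap=20)
```

Each chunk is at most 100 characters, and consecutive chunks share a 20-character overlap so that no sentence is cut off without context.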
```python
# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```

Use the `invoke` function to get the generated text!
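The `|` operator above is LangChain Expression Language (LCEL): each stage's output feeds the next stage's input, and `invoke` runs the whole pipeline. The following is a minimal pure-Python imitation of that pattern, not the real Runnable API; the stub stages are illustrative only.

```python
# Minimal imitation of the LCEL pipe style: a Step wraps a function,
# and `|` composes two Steps into one. This is a sketch, not LangChain.
class Step:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Feed this step's output into the next step
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

retrieve = Step(lambda q: {"context": f"docs about {q}", "question": q})
prompt = Step(lambda d: f"Answer {d['question']} using {d['context']}")
llm = Step(lambda p: f"LLM says: {p}")

chain = retrieve | prompt | llm
answer = chain.invoke("RAG")
# → "LLM says: Answer RAG using docs about RAG"
```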
Query-Translation
Multi Query
Intuition: for a complex question, a single phrasing may fail to capture the intended meaning ("words fail to convey meaning"), and if the answer is relatively involved, a simple RAG system may not achieve good results through one direct vector comparison. Therefore, we can decompose the question into several queries and retrieve for them in parallel.
How do we make the splitting work? We can use prompt engineering and add a template.
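The original template did not survive formatting; the wording below is an assumption in the spirit of the tutorial's multi-query prompt, shown as a plain string so the structure is clear.

```python
# Illustrative multi-query prompt template; the exact wording from the
# tutorial is assumed, not quoted.
MULTI_QUERY_TEMPLATE = """You are an AI language model assistant. Your task is to generate
{n} different versions of the given user question to retrieve relevant
documents from a vector database. Provide these alternative questions
separated by newlines. Original question: {question}"""

prompt_text = MULTI_QUERY_TEMPLATE.format(n=5, question="What is task decomposition?")
```

Filling in `{n}` and `{question}` yields the final prompt sent to the model.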
In this code, we define the chain `generate_queries`, which prompts the OpenAI model to rewrite the question into several alternative sub-questions.
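A sketch of `generate_queries` with a stub in place of the OpenAI call, so the data flow is visible without an API key; the stub's output and the helper names are illustrative assumptions.

```python
# Sketch of generate_queries: prompt an LLM (stubbed here) and split
# its newline-separated response into a list of sub-questions.
def stub_llm(prompt: str) -> str:
    # A real LLM would return alternative phrasings of the question.
    return "1. What does X mean?\n2. How is X defined?\n3. Explain X."

def generate_queries(question: str) -> list[str]:
    prompt = f"Generate 3 versions of this question: {question}"
    return [q.strip() for q in stub_llm(prompt).split("\n") if q.strip()]

queries = generate_queries("What is X?")
# → three sub-questions, one per line of the LLM response
```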
The core retrieval chain is `retrieval_chain = generate_queries | retriever.map() | get_unique_union`: the input question is first split into sub-questions by `generate_queries`, each sub-question is then mapped through the retriever (for every sub-question, search for the matching vectors), and finally `get_unique_union` deduplicates the retrieved documents.
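A sketch of `get_unique_union`: flatten the per-sub-question retrieval results and drop duplicates while preserving order. In the tutorial, LangChain documents are deduplicated by serializing them first; plain strings stand in for documents here.

```python
# get_unique_union sketch: merge the lists of documents retrieved for
# each sub-question into one deduplicated list, preserving order.
def get_unique_union(documents: list[list[str]]) -> list[str]:
    flattened = [doc for sublist in documents for doc in sublist]
    seen = set()
    unique = []
    for doc in flattened:
        if doc not in seen:
            seen.add(doc)
            unique.append(doc)
    return unique

results = get_unique_union([["doc A", "doc B"], ["doc B", "doc C"]])
# → ["doc A", "doc B", "doc C"]
```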
Finally, it is RAG time! The `final_rag_chain` combines the retrieval chain with the original question, the prompt, and the LLM.
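A self-contained sketch of the final step, with stub functions standing in for the retrieval chain, prompt, and LLM; the names and wiring follow the tutorial, while the stub behavior is illustrative. `itemgetter` (from the standard library's `operator` module) is what pulls the original question out of the input dict.

```python
from operator import itemgetter

# Stubs standing in for the real retrieval chain, prompt, and LLM.
def retrieval_chain(inputs: dict) -> str:
    return f"context for: {inputs['question']}"

def prompt(inputs: dict) -> str:
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"

def llm(text: str) -> str:
    return f"Answer based on [{text}]"

def final_rag_chain(inputs: dict) -> str:
    step = {
        "context": retrieval_chain(inputs),
        # itemgetter passes the original question through unchanged
        "question": itemgetter("question")(inputs),
    }
    return llm(prompt(step))

answer = final_rag_chain({"question": "What is RAG?"})
```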
We first use the `retrieval_chain` defined above to get the related texts from the documents, and use `itemgetter` to pass through the original question. Then we feed these into an LLM with the predefined prompt. Finally, we use `StrOutputParser()` to parse the LLM's final answer.