Retrieval Augmented Generation (RAG) Enhancement for LLM-based Prediction — RELP

Jericho Siahaya
AI monks.io
5 min read · Oct 22, 2023


Image by author

Introduction

Large Language Models (LLMs) are different from search engines like Google or Bing. LLMs work by spotting patterns and making educated guesses about what comes next in a sequence. They are a useful tool for analyzing and comprehending text using advanced language processing techniques.

While LLMs are commonly used to provide answers or act as chatbot assistants, they can also be employed for tasks such as organizing text into categories or classes. However, their performance may suffer when there isn’t enough context to work with.

Artificial Context-Dependent Intelligence

One of the limitations of LLMs is that they may not perform well when they don’t have enough context. This means that if we don’t give them enough information or background, they might give us incorrect or low-quality answers. Imagine trying to understand a story without knowing the beginning or the middle: it’s tough! Context helps the AI understand the task and give us better, more accurate results. So, giving the AI enough information or context is really important for getting the best outcomes.

Giving context to LLMs can be done with lots of techniques such as:

  1. External Knowledge Bases: Incorporating external sources of information, such as structured databases or reference documents, can provide the model with additional context. This can be useful for fact-checking and generating more accurate responses.
  2. Domain-Specific Fine-Tuning: Fine-tuning a pre-trained model on domain-specific data or tasks can make it more context-aware in a particular field, ensuring it understands and generates content relevant to that context.
  3. Knowledge Integration: Models can be integrated with knowledge graphs or ontologies, which provide structured information and relationships between entities. This additional knowledge enhances the context understanding of the AI system.
  4. Graph-Based Representations: Utilizing graph structures to represent relationships between entities and concepts can provide a more structured and interconnected context, improving the AI’s ability to navigate and understand complex information.
  5. Data Augmentation: By augmenting the training data with additional context, models can be exposed to a broader range of scenarios and information, helping them generalize better and adapt to diverse contexts.

These techniques have their own advantages and disadvantages, particularly in terms of efficiency. Data augmentation is a valuable method for enhancing the generalization and resilience of LLMs. However, the drawback of this technique is that it enlarges the training dataset, which can lead to increased computational costs. Training on larger datasets demands additional memory, processing capabilities, and time.

Instead of incorporating more data into the base model, we can make use of Retrieval Augmented Generation (RAG), a method that doesn’t necessitate training or fine-tuning.

Retrieval Augmented Generation (RAG)

RAG is a natural language processing technique that combines the strengths of information retrieval and language generation to improve the quality and relevance of text generation by AI systems. It is especially valuable in applications where generating content that depends on external knowledge or context is essential, such as question-answering, content generation, and dialogue systems.
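To make the retrieve-then-generate idea concrete, here is a minimal sketch, assuming the sentence-transformers library for embeddings and the OpenAI chat API for generation. The model names, the tiny in-memory document list, and the question are placeholders for illustration, not part of any real system.

```python
# Minimal retrieve-then-generate sketch (illustration only, not a production RAG stack).
# Assumes the sentence-transformers and openai packages; model names are placeholders.
from sentence_transformers import SentenceTransformer, util
from openai import OpenAI

documents = [
    "The refund policy allows returns within 30 days of delivery.",
    "Standard shipping takes 3-5 business days for domestic orders.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, convert_to_tensor=True)

def rag_answer(question: str, top_k: int = 1) -> str:
    # 1. Retrieve: embed the question and find the most similar stored documents.
    query_vec = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_vec, doc_vectors, top_k=top_k)[0]
    context = "\n".join(documents[hit["corpus_id"]] for hit in hits)

    # 2. Generate: ask the LLM to answer using the retrieved context.
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(rag_answer("How long do I have to return an item?"))
```

The retrieval step grounds the generation step, which is what distinguishes RAG from plain prompting.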

Image by Scriv.ai

Zero-shot LLM Prediction

While Large Language Models (LLMs) can make predictions even without prior context, their accuracy in zero-shot learning is typically not as high as when provided with context.
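For contrast, here is a minimal sketch of what zero-shot classification looks like: the prompt contains only a label set and the input, with no examples. The labels, input text, and model name are placeholders for illustration.

```python
# Zero-shot classification sketch: the prompt gives only the candidate labels and
# the input, with no examples. Labels and text are placeholders, not from the article.
from openai import OpenAI

client = OpenAI()
labels = ["positive", "negative", "neutral"]
text = "The delivery was late and the package arrived damaged."

prompt = (
    f"Classify the following text into one of these labels: {', '.join(labels)}.\n"
    f"Text: {text}\n"
    "Label:"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)  # e.g. "negative"
```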

Zero-shot confusion

When it comes to text classification, relying on the zero-shot approach with LLMs isn’t very reliable. In fact, even using the LLM itself may not be the best choice for this task. Therefore, it’s typically necessary to train an encoder-only model specifically for the downstream classification task. However, that process can be time-consuming and resource-intensive, which is where RELP comes into play.

Relp me!

The RELP Approach

In this article, I put forward the concept of merging RAG with an LLM for contextual prediction. The two methods mutually enhance each other: RAG supplies contextual similarity, which serves as a knowledge base for few-shot learning, while the LLM generates contextual predictions by leveraging the acquired knowledge.

Image by author

Here’s the breakdown of the process:

  1. Input vectorization: The new input is converted into a vector representation so it can be passed to the similarity search function.
  2. Vector similarity: This is a key component of the RAG method, used to retrieve documents similar to the input from our vector database or store.
  3. Intelligent Reranking: The results are aggregated and refined using hyper SVM ranking, ensuring more accurate outcomes.
  4. Few-shot construction: The top-k similar documents are constructed into a few-shot prompt within the LLM, serving as examples.
  5. Precise output: The LLM generates an output based on the context it has learned from the few-shot prompt.

These steps outline the process of combining vectorization, similarity search, reranking, few-shot learning, and precise output generation for improved contextual prediction.
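As an illustration only, the sketch below strings these five steps together, assuming a sentence-transformers embedder as the vector store and the OpenAI chat API as the LLM. The labelled examples are invented, and the reranking step is reduced to a plain cosine-score sort standing in for the article’s reranking stage.

```python
# Illustrative RELP-style pipeline sketch (not the author's code).
# The labelled examples are invented; the reranking step here is a simple cosine
# re-sort standing in for the article's reranking stage.
from sentence_transformers import SentenceTransformer, util
from openai import OpenAI

# Toy "vector store": labelled examples with precomputed embeddings.
examples = [
    ("The battery dies within an hour.", "negative"),
    ("Setup was quick and painless.", "positive"),
    ("It arrived on the promised date.", "neutral"),
    ("Customer support never replied to me.", "negative"),
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
example_vectors = embedder.encode([text for text, _ in examples], convert_to_tensor=True)

def relp_classify(new_input: str, top_k: int = 3) -> str:
    # 1. Input vectorization
    query_vec = embedder.encode(new_input, convert_to_tensor=True)

    # 2. Vector similarity search over the store
    hits = util.semantic_search(query_vec, example_vectors, top_k=top_k)[0]

    # 3. Reranking (simplified: sort by cosine score)
    hits = sorted(hits, key=lambda h: h["score"], reverse=True)

    # 4. Few-shot prompt construction from the top-k similar examples
    shots = "\n".join(
        f"Text: {examples[h['corpus_id']][0]}\nLabel: {examples[h['corpus_id']][1]}"
        for h in hits
    )
    prompt = f"{shots}\nText: {new_input}\nLabel:"

    # 5. LLM output grounded in the few-shot context
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(relp_classify("The screen cracked after one day of use."))
```

In practice the example store would live in a proper vector database and the reranker would be whatever reranking model the system uses; the sketch only shows how the pieces connect.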

Pros and Cons of RELP

Pros

  1. Low computational cost: RELP eliminates the need for resource-intensive training or fine-tuning, making it a cost-effective option.
  2. Dynamic learning: By feeding the LLM with the most similar and recent context from the input using RAG, RELP enables dynamic learning, which enhances adaptability.
  3. Precise output: RELP uses few-shot learning to provide context and grounding for the LLM, resulting in more accurate and controlled outputs with fewer hallucinations.

Cons

  1. Data bias: RELP relies on vector similarity search to retrieve context, which necessitates a knowledge base. Building an unbiased knowledge base can be challenging, and biased data can lead to biased output.
  2. Context window token limitation: Many LLMs have token limits, and using RELP means allocating tokens to both the input and the retrieved knowledge used as context. If the input is long, this can limit the amount of context available to the LLM, potentially reducing prediction accuracy; a small token-budget sketch follows below.
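As a rough illustration of that budgeting problem, here is a small sketch assuming the tiktoken tokenizer: it stops adding retrieved examples once a fixed (made-up) token budget for the prompt is exhausted.

```python
# Illustrative token-budget check for few-shot context (assumes tiktoken is installed).
# The budget value and prompt format are made up for illustration.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
TOKEN_BUDGET = 3000  # tokens reserved for the few-shot examples plus the input

def build_prompt(new_input: str, retrieved_examples: list[str]) -> str:
    used = len(encoding.encode(new_input))
    kept = []
    for example in retrieved_examples:
        cost = len(encoding.encode(example))
        if used + cost > TOKEN_BUDGET:
            break  # stop adding context once the budget is exhausted
        kept.append(example)
        used += cost
    return "\n".join(kept) + "\nText: " + new_input + "\nLabel:"
```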

Validation

RELP uses two contextual text embedding models, IndoBERT and OpenAI’s ADA-002, with 3-shot learning for contextual prediction using the GPT-3.5-Turbo model. Validation was conducted on 50 testing data points, resulting in:

RELP’s f1-score and accuracy
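For reference, here is a minimal sketch of how accuracy and F1 could be computed with scikit-learn, assuming lists of predicted and true labels; the label values below are placeholders, not the article’s validation data.

```python
# Minimal metric computation sketch (assumes scikit-learn; labels are placeholders,
# not the article's actual validation data).
from sklearn.metrics import accuracy_score, f1_score

y_true = ["positive", "negative", "neutral", "negative"]
y_pred = ["positive", "negative", "positive", "negative"]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1 (macro):", f1_score(y_true, y_pred, average="macro"))
```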

You’re welcome to dive into the GitHub repository, explore the code, and actively participate in this project.

Conclusion

  • LLMs work by spotting patterns and making educated guesses about what comes next in a sequence.
  • One of the limitations of LLMs is that they may not perform well when they don’t have enough context.
  • RELP proves to be a more efficient approach that demands less time and fewer resources.
  • Advantages of RELP include low computational cost, dynamic learning, and precise output. Disadvantages of RELP are data bias and context window token limitation.
  • The final validation results for the text classification task using RELP showed an average accuracy of 0.84 and an F1-score of 0.837.
