Motivation

Retrieval-Augmented Generation (RAG) systems often struggle to identify and prioritize the most relevant information in large document collections. As a result, irrelevant or low-quality content can end up in the generated responses. Two potential approaches to address this issue are:

...

Larger token inputs provide better context, resulting in more accurate RAG outcomes.
Ideal for customers who require detailed and extensive document analysis.

...

Improving RAG with Chunk Relevancy Sort

Chunk Relevancy Sort improves RAG results by re-sorting the retrieved document chunks according to their relevancy to the user query. This approach uses a language model to evaluate the relevancy and can also be configured as a filter so that only highly relevant chunks are considered. It is particularly useful when no additional infrastructure can be acquired and rate limits are not a concern. In addition, large language models support a wide array of languages. However, this additional step increases latency and incurs additional token costs per generated response.
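
The sort-and-filter step described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names, the `min_score` parameter, and the 0 to 1 score scale are assumptions, and `toy_score` is a word-overlap stand-in for the language model call that would evaluate relevancy in practice.

```python
from typing import Callable, List, Optional


def relevancy_sort(
    query: str,
    chunks: List[str],
    score_chunk: Callable[[str, str], float],
    min_score: Optional[float] = None,
) -> List[str]:
    """Re-sort chunks by relevancy to the query, most relevant first."""
    # One scoring call per chunk; in a real setup this would be an LLM call.
    scored = [(score_chunk(query, chunk), chunk) for chunk in chunks]
    # Optional filter mode: keep only chunks at or above the threshold.
    if min_score is not None:
        scored = [(s, c) for s, c in scored if s >= min_score]
    # Stable sort, so chunks with equal scores keep their retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored]


def toy_score(query: str, chunk: str) -> float:
    """Stand-in scorer (NOT an LLM): share of query words found in the chunk."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words)


chunks = [
    "Office opening hours",
    "Password policy overview",
    "How to reset your password",
]
ranked = relevancy_sort("reset password", chunks, toy_score)
filtered = relevancy_sort("reset password", chunks, toy_score, min_score=0.5)
```

Here `ranked` keeps all three chunks in a new order, while `filtered` drops the chunk that shares no words with the query. Because every retrieved chunk is scored by a model, token costs and latency grow with the number of chunks evaluated.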

...

Enhances accuracy by focusing on relevant data.
Suitable for scenarios where slightly increased latency is acceptable for better results.
Does not require additional infrastructure.

...

Supports a wide array of languages.

...

Improving RAG with Reranker

The Reranker improves the accuracy of the generated response by reranking the retrieved chunks based on a predicted relevancy score. This method uses a dedicated, pre-trained model to predict the relevancy score for each document chunk. It is particularly useful when rate limits are a concern, or when only a model with lower throughput is available. However, this additional step increases latency and incurs additional infrastructure costs. Also, the pre-trained models are language-specific and usually work with only a few languages, e.g., English and German.
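
As a rough sketch of the reranking step, assuming the dedicated model is exposed as a function that scores a batch of (query, chunk) pairs: `predict_scores`, `top_k`, and the fixed scores below are hypothetical placeholders, not the interface of any particular reranker.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    chunks: Sequence[str],
    predict_scores: Callable[[List[Pair]], List[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Return the top_k (chunk, score) pairs, highest predicted score first."""
    # The dedicated model scores all (query, chunk) pairs in one batch,
    # so ranking consumes no LLM requests or LLM tokens.
    pairs = [(query, chunk) for chunk in chunks]
    scores = predict_scores(pairs)
    ranked = sorted(zip(chunks, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Placeholder for a pre-trained reranker: fixed, made-up scores per chunk.
FAKE_SCORES: Dict[str, float] = {"chunk A": 0.12, "chunk B": 0.91, "chunk C": 0.47}


def fake_predict_scores(pairs: List[Pair]) -> List[float]:
    return [FAKE_SCORES[chunk] for _, chunk in pairs]


top = rerank("any query", ["chunk A", "chunk B", "chunk C"], fake_predict_scores, top_k=2)
```

Only the `top_k` best chunks are passed on to the answer-generating model, which is why this option helps when rate limits or throughput on the LLM side are the bottleneck.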

Performance Estimation:

Reranker with GPT-4 or GPT-4 Omni
- 7000 input tokens for context to create response
- Chat will start streaming after ~7 seconds.
  - 2 sec for search
  - 3-5 sec for reranking
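
The estimate above is simple addition of the two steps that run before streaming starts. A small worked sketch, assuming the steps run sequentially and using a hypothetical helper name:

```python
from typing import Tuple


def time_to_first_token(search_s: float, rerank_s: Tuple[float, float]) -> Tuple[float, float]:
    """Return the (best case, worst case) delay in seconds before streaming starts."""
    rerank_low, rerank_high = rerank_s
    # Search and reranking are assumed to run one after the other.
    return (search_s + rerank_low, search_s + rerank_high)


# 2 sec for search, 3-5 sec for reranking (figures from the list above).
estimate = time_to_first_token(2.0, (3.0, 5.0))
```

With these figures the delay ranges from 5 to 7 seconds, so the ~7 seconds quoted above corresponds to the upper end of the reranking time.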

...

Effective for managing rate limits in high-volume scenarios.
Provides the slowest response time but can significantly improve result accuracy.
Supports only a few languages depending on the chosen model.

...

Customer Scenarios

Processing allowed only in CH

...

More Input Tokens:
- Generate answer with GPT-4 Omni and at least 30k tokens context window
Chunk Relevancy Sort:
- Evaluate relevancy with GPT-4 Omni-Mini (potentially filter only for highly relevant chunks)
- Generate answer with GPT-4 Omni and at least 7k tokens context window
Reranker:
- Generate answer with GPT-4 Omni and at least 7k tokens context window

PTU Purchased

...

Available Models:

GPT-4 Omni
GPT-4 Omni-Mini

...

RAG improvement option	Model used / Tokens to process	Costs per 1k input tokens	Costs per month (21'120 requests)	Total cost estimation	Total cost per search
None	GPT-4 / 7k	$0.03	$4'435	$4'435	$0.21
GPT-4 32K More Input Tokens	GPT-4 32K / 30k	$0.06	$38'016	$38'016	$1.80
Chunk Relevancy Sort	GPT-4 / 7k GPT-3.5 / 100k	$0.03 $0.0005	$4'435 $1'056	$5'491	$0.26
Reranker	GPT-4 / 7k reranker / 100k	$0.03 $0.00	$4'435 $300 fixed	$4'735	$0.22
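
The token-based figures in the table can be reproduced with the arithmetic below. This is a sketch under the table's own assumptions (21'120 requests per month, input tokens only); the function and variable names are illustrative.

```python
REQUESTS_PER_MONTH = 21_120


def monthly_cost(
    tokens_per_request: int,
    price_per_1k_tokens: float,
    requests: int = REQUESTS_PER_MONTH,
) -> float:
    """Monthly input-token cost in USD for one model."""
    return requests * (tokens_per_request / 1000) * price_per_1k_tokens


# Row "None": GPT-4 with 7k input tokens at $0.03 per 1k tokens.
baseline = monthly_cost(7_000, 0.03)
# Row "More Input Tokens": GPT-4 32K with 30k input tokens at $0.06 per 1k tokens.
more_tokens = monthly_cost(30_000, 0.06)
# Row "Chunk Relevancy Sort": baseline plus GPT-3.5 scoring 100k tokens at $0.0005 per 1k.
relevancy_sort_total = baseline + monthly_cost(100_000, 0.0005)

for name, total in [
    ("None", baseline),
    ("More Input Tokens", more_tokens),
    ("Chunk Relevancy Sort", relevancy_sort_total),
]:
    per_search = total / REQUESTS_PER_MONTH
    print(f"{name}: ${total:,.0f} per month, ${per_search:.2f} per search")
```

The cost per search is the monthly total divided by the number of requests. Output tokens are not part of this estimate.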

...

RAG improvement option	Model used / Tokens to process	Costs per 1k input tokens	Costs per month (21'120 requests)	Total cost estimation	Total cost per search
None	GPT-4o / 7k	$0.005	$740	$740	$0.04
GPT-4 32K More Input Tokens	GPT-4o / 30k	$0.005	$3'168	$3'168	$0.15
Chunk Relevancy Sort	GPT-4o / 7k GPT-4o-mini / 100k	$0.005 $0.000165	$740 $350	$1'090	$0.05
Reranker	GPT-4o / 7k reranker / 100k	$0.005 $0.00	$740 $300 fixed	$1'040	$0.05

...

Conclusion

This documentation provides a comprehensive guide for improving RAG results based on different customer scenarios. Whether dealing with rate limits, latency concerns, or cost management, the options outlined above will help you choose the best configuration for your needs.

For any further questions or personalized recommendations, please contact the customer success team.

For more information: RAG Assessment and Improvement

...

Author	Pascal Hauri
