...

  • Larger token inputs provide better context, resulting in more accurate RAG outcomes.

  • Ideal for customers who require detailed and extensive document analysis.

...

Improving RAG with Chunk Relevancy Sort

Chunk Relevancy Sort improves RAG results by re-sorting the retrieved document chunks by their relevancy to the user query. This approach uses a language model to evaluate relevancy and can also be configured as a filter that keeps only highly relevant chunks. It is particularly useful when no additional infrastructure can be acquired and rate limits are of no concern. However, this additional step increases latency and adds token costs per generated response.
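A minimal sketch of the approach: in production, the scoring function would be an LLM call (e.g. to GPT-4 Omni-Mini); here a stand-in word-overlap scorer is injected so the sketch stays self-contained, and the chunk texts are illustrative only.

```python
def sort_chunks_by_relevancy(query, chunks, score_fn, min_score=None):
    """Re-sort retrieved chunks by relevancy to the query.

    In production, score_fn(query, chunk) -> float would be an LLM call
    (e.g. GPT-4 Omni-Mini); it is injected here to keep the sketch
    self-contained.
    """
    scored = [(score_fn(query, chunk), chunk) for chunk in chunks]
    if min_score is not None:
        # Filter mode: consider only highly relevant chunks.
        scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most relevant first
    return [chunk for _, chunk in scored]

# Illustrative stand-in scorer: fraction of query words found in the chunk.
def toy_score(query, chunk):
    q_words = set(query.lower().split())
    return len(q_words & set(chunk.lower().split())) / max(len(q_words), 1)

chunks = ["pricing of GPT-4", "unrelated release notes", "GPT-4 token pricing table"]
print(sort_chunks_by_relevancy("GPT-4 pricing", chunks, toy_score, min_score=0.5))
# ['pricing of GPT-4', 'GPT-4 token pricing table']
```

Running the sort as a filter (`min_score`) trades recall for precision; each scored chunk adds one model call per request, which is where the extra latency and token cost come from.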

...

  • Enhances accuracy by focusing on relevant data.

  • Suitable for scenarios where slightly increased latency is acceptable for better results.

  • Does not require additional infrastructure.

...

Improving RAG with Reranker

...

The Reranker improves the accuracy of the generated response by reranking the retrieved chunks based on a predicted relevancy score. This method uses a dedicated, pre-trained model to predict the relevancy score for each document chunk. It is particularly useful when rate limits are a concern, or when only a model with smaller throughput is available. However, this additional step increases latency and adds infrastructure costs.
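A minimal sketch of the reranking step, assuming a batch `predict` function of the shape a dedicated pre-trained cross-encoder exposes (e.g. sentence-transformers' `CrossEncoder.predict`); the stand-in predictor and documents below are illustrative only.

```python
def rerank(query, chunks, predict, top_k=3):
    """Rerank retrieved chunks by a predicted relevancy score.

    `predict` scores a batch of (query, chunk) pairs, as a dedicated
    pre-trained cross-encoder would. A stand-in is used below so the
    sketch runs without the model infrastructure.
    """
    scores = predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]

# Illustrative stand-in: counts query words that appear in each chunk.
def fake_predict(pairs):
    return [sum(word in chunk.lower() for word in query.lower().split())
            for query, chunk in pairs]

docs = ["reranker infrastructure cost", "token limits", "reranker latency and cost"]
print(rerank("reranker cost", docs, fake_predict, top_k=2))
# ['reranker infrastructure cost', 'reranker latency and cost']
```

Because the reranker is a small dedicated model rather than an extra LLM call, it does not consume LLM rate limits, but it does require hosting the model, which is the fixed infrastructure cost mentioned above.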

...

  1. More Input Tokens:

    • Generate the answer with GPT-4 Omni and a context window of at least 30k tokens

  2. Chunk Relevancy Sort:

    • Evaluate relevancy with GPT-4 Omni-Mini (optionally filter for highly relevant chunks only)

    • Generate the answer with GPT-4 Omni and a context window of at least 7k tokens

  3. Reranker:

    • Generate the answer with GPT-4 Omni and a context window of at least 7k tokens
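The three options above differ only in which extra model runs and how large the answer context must be; expressed as a configuration mapping (names and structure are illustrative assumptions, not an existing API):

```python
# Illustrative configuration of the three pipeline options;
# keys and model names are assumptions, not an existing API.
PIPELINES = {
    "more_input_tokens": {
        "answer_model": "GPT-4 Omni",
        "min_context_tokens": 30_000,
    },
    "chunk_relevancy_sort": {
        "relevancy_model": "GPT-4 Omni-Mini",  # scores (and optionally filters) chunks
        "answer_model": "GPT-4 Omni",
        "min_context_tokens": 7_000,
    },
    "reranker": {
        "rerank_model": "reranker",  # dedicated pre-trained model
        "answer_model": "GPT-4 Omni",
        "min_context_tokens": 7_000,
    },
}
```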

PTU Purchased

...

Available Models:

  • GPT-4 Omni

  • GPT-4 Omni-Mini

...

| RAG improvement option | Model used / Tokens to process | Costs per 1k input tokens | Costs per month (21'120 requests) | Total cost estimation | Total cost per search |
| --- | --- | --- | --- | --- | --- |
| None | GPT-4 / 7k | $0.03 | $4'435 | $4'435 | $0.21 |
| More Input Tokens | GPT-4 32K / 30k | $0.06 | $38'016 | $38'016 | $1.80 |
| Chunk Relevancy Sort | GPT-4 / 7k + GPT-3.5 / 100k | $0.03 + $0.0005 | $4'435 + $1'056 | $5'491 | $0.26 |
| Reranker | GPT-4 / 7k + reranker / 100k | $0.03 + $0.00 | $4'435 + $300 fixed | $4'735 | $0.24 |
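The monthly figures above follow from requests × context size (in thousands of tokens) × price per 1k input tokens; a quick arithmetic check against the table:

```python
# Cost model behind the table: requests x context size (k tokens) x price per 1k tokens.
def monthly_cost(requests, context_k, price_per_1k, fixed=0.0):
    return requests * context_k * price_per_1k + fixed

REQUESTS = 21_120  # requests per month, as in the table

baseline    = monthly_cost(REQUESTS, 7, 0.03)                 # None: GPT-4 / 7k
more_tokens = monthly_cost(REQUESTS, 30, 0.06)                # More Input Tokens: GPT-4 32K / 30k
crs         = baseline + monthly_cost(REQUESTS, 100, 0.0005)  # + Chunk Relevancy Sort: GPT-3.5 / 100k
reranker    = baseline + 300                                  # + Reranker: $300 fixed infrastructure

print(round(baseline), round(more_tokens), round(crs), round(reranker))
# 4435 38016 5491 4735 -- matching the table; per search: 4435.2 / 21'120 ~= $0.21
```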

...

| RAG improvement option | Model used / Tokens to process | Costs per 1k input tokens | Costs per month (21'120 requests) | Total cost estimation | Total cost per search |
| --- | --- | --- | --- | --- | --- |
| None | GPT-4o / 7k | $0.005 | $740 | $740 | $0.04 |
| More Input Tokens | GPT-4o / 30k | $0.005 | $3'168 | $3'168 | $0.15 |
| Chunk Relevancy Sort | GPT-4o / 7k + GPT-4o-mini / 100k | $0.005 + $0.000165 | $740 + $350 | $1'090 | $0.05 |
| Reranker | GPT-4o / 7k + reranker / 100k | $0.005 + $0.00 | $740 + $300 fixed | $1'040 | $0.05 |

...