Web Search Module

Functionality

This module answers a user query using web pages retrieved through a web search engine. It behaves like a standard GPT assistant but has access to a web search tool. When the user asks a question that the model cannot answer from its own knowledge, it executes the following workflow:

  1. Formulates a search query optimized for search engines (based on the entire discussion history).

  2. Executes a web search using the configured search engine.

  3. Scrapes the content of the top web pages returned by the search engine.

  4. Divides the scraped content into small chunks of text.

    1. [Optional] Embeds the chunks locally and performs a similarity search based on the generated query.

  5. Passes the relevant information to the context and answers the user’s question.
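The workflow above can be sketched as a small pipeline. This is a minimal illustration and not the module's actual code: the function name answer_with_web_search, the Page class, and all six callables are hypothetical placeholders for the query generator, search engine client, scraper, chunker, chunk selector, and language model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Page:
    url: str
    text: str


def answer_with_web_search(
    question: str,
    history: list[str],
    make_query: Callable[[str, list[str]], str],   # step 1 (hypothetical)
    search: Callable[[str], list[str]],            # step 2 (hypothetical)
    scrape: Callable[[str], str],                  # step 3 (hypothetical)
    chunk: Callable[[str], list[str]],             # step 4 (hypothetical)
    select: Callable[[str, list[str]], list[str]], # step 4.1 (hypothetical)
    generate: Callable[[str, list[str]], str],     # step 5 (hypothetical)
) -> str:
    # 1. Build a search query from the question and the discussion history.
    query = make_query(question, history)
    # 2. Ask the search engine for result URLs.
    urls = search(query)
    # 3. Fetch the text of each returned page.
    pages = [Page(url, scrape(url)) for url in urls]
    # 4. Split every page into chunks, keeping the source URL for referencing.
    chunks = [f"[{page.url}] {piece}" for page in pages for piece in chunk(page.text)]
    # 4.1 Keep only the chunks considered relevant to the query.
    relevant = select(query, chunks)
    # 5. Let the model answer using the selected chunks as context.
    return generate(question, relevant)


# Stub implementations, only to show how the steps connect.
demo_answer = answer_with_web_search(
    question="What is new?",
    history=["Tell me about Zurich."],
    make_query=lambda q, h: " ".join(h + [q]),
    search=lambda query: ["https://example.com/a"],
    scrape=lambda url: "alpha beta gamma",
    chunk=lambda text: text.split(),
    select=lambda query, chunks: chunks[:2],
    generate=lambda q, context: f"{q} -> {len(context)} sources",
)
```

Passing the steps in as callables keeps the sketch independent of any particular search engine or model client.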

Input

A user question

Example input:

  • “What are the recent trends and developments in Zurich’s financial and insurance markets?”

Output

An answer based on retrieved web pages, either referencing the appropriate links or stating that no information was found in the retrieved web pages.

Configuration settings (technical)

Web Search Config

Provided below is the general structure of the configuration for the Web Search module.

Field                     Type                       Description                          Default Value
model_name                LanguageModelName          Model name                           DEFAULT_MODEL_NAME
search_engine_name        Literal["google", "bing"]  Search engine: "google" or "bing"    "bing"
bing_config               BingConfig                 Configuration for Bing search        BingConfig()
google_config             GoogleConfig               Configuration for Google search      GoogleConfig()
data_sources_config       DataSourcesConfig          Configuration for data sources       DataSourcesConfig()
answer_generation_config  AnswerGenerationConfig     Configuration for answer generation  AnswerGenerationConfig()
query_generation_config   QueryGenerationConfig      Configuration for query generation   QueryGenerationConfig()
debugging                 bool                       Enable debugging mode                False
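To make the nesting of these settings easier to see, the structure can be mirrored with plain Python dataclasses. This is only a sketch under assumptions: the real classes are not shown in this document, LanguageModelName is replaced by a plain string, DEFAULT_MODEL_NAME is kept as a literal placeholder, and the unsupported Google custom search settings are left out. Field names and defaults are taken from the tables in this document.

```python
from dataclasses import asdict, dataclass, field
from typing import Literal, Optional


@dataclass
class BingCustomSearchConfig:
    id: Optional[str] = None
    boosted_search_domains: list[str] = field(default_factory=list)


@dataclass
class BingConfig:
    search_country_code: str = "CH"
    search_market: str = "de-CH"
    custom_search_config: BingCustomSearchConfig = field(default_factory=BingCustomSearchConfig)


@dataclass
class GoogleConfig:
    search_country_code: str = "ch"  # documented as "must be lowercase"


@dataclass
class DataSourcesConfig:
    fetch_size: int = 4
    max_workers: int = 10
    chunk_size: int = 600
    chunk_overlap: int = 50
    embedding_batch_size: int = 128


@dataclass
class AnswerGenerationConfig:
    limit_token_sources: int = 5000
    max_chunks_to_consider: int = 20
    number_history_interactions_included: int = 2
    embedding_reranking: bool = True


@dataclass
class WebSearchConfig:
    # Placeholder string: the real default model name is not given in this document.
    model_name: str = "DEFAULT_MODEL_NAME"
    search_engine_name: Literal["google", "bing"] = "bing"
    bing_config: BingConfig = field(default_factory=BingConfig)
    google_config: GoogleConfig = field(default_factory=GoogleConfig)
    data_sources_config: DataSourcesConfig = field(default_factory=DataSourcesConfig)
    answer_generation_config: AnswerGenerationConfig = field(default_factory=AnswerGenerationConfig)
    debugging: bool = False


# Override a single nested value and keep every other default.
config = WebSearchConfig(data_sources_config=DataSourcesConfig(fetch_size=8))
config_as_dict = asdict(config)
```

Using default_factory for the nested settings gives every WebSearchConfig instance its own copies, so changing one instance does not affect another.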

Bing Search API

Field                 Type                    Description                                   Default Value
search_country_code   str                     Search Country Code. Default is Switzerland   "CH"
search_market         str                     Search Market. Default is German Switzerland  "de-CH"
custom_search_config  BingCustomSearchConfig  Configuration for Bing Custom Search          BingCustomSearchConfig()

Bing Custom Search Config

Bing offers the possibility to customize the search results. For example:

  • Limit the search to a predefined set of domains

  • Promote a domain

  • Pin results from a specific website

  • Filter links

The Web Search module developed by Unique supports Bing Custom Search through a custom configuration ID, which can be obtained from https://www.customsearch.ai/. More details on the setup can be found at: https://learn.microsoft.com/en-us/bing/search-apis/bing-custom-search/how-to/quick-start

Once the ID is created, it can be passed to your assistant through the id field of custom_search_config.

It is also possible to boost specific search domains without creating a custom search ID, simply by passing the list of domains in the boosted_search_domains parameter. NOTE: unlike custom search, this method does not necessarily restrict the results to the domains listed in boosted_search_domains.
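The difference between restricting and boosting can be illustrated on a plain list of result URLs. This is a conceptual sketch only: how the module or Bing actually applies the boost is not described here, and the helper names domain_of, matches, restrict, and boost are hypothetical.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host name of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def matches(url: str, domains: list[str]) -> bool:
    """True if the URL belongs to one of the domains or to a subdomain of it."""
    host = domain_of(url)
    return any(host == d or host.endswith("." + d) for d in domains)


def restrict(urls: list[str], domains: list[str]) -> list[str]:
    """Restriction: results from other domains are dropped."""
    return [u for u in urls if matches(u, domains)]


def boost(urls: list[str], domains: list[str]) -> list[str]:
    """Boosting: matching results move to the front, nothing is dropped."""
    # sorted() is stable, so the original order is kept within each group.
    return sorted(urls, key=lambda u: not matches(u, domains))


results = [
    "https://example.com/a",
    "https://www.snb.ch/en",
    "https://news.example.org/x",
    "https://data.snb.ch/y",
]
boosted = boost(results, ["snb.ch"])
restricted = restrict(results, ["snb.ch"])
```

In this sketch boosted still contains all four URLs with the two snb.ch results first, while restricted contains only those two.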

Field                   Type        Description                                                      Default Value
id                      str | None  Custom config ID. Can be set up at https://www.customsearch.ai/  None
boosted_search_domains  list[str]   List of domains to boost in search results                       []

Google Search API

Field                 Type                      Description                                                     Default Value
search_country_code   str                       Search Country Code. Default is Switzerland. Must be lowercase  "ch"
custom_search_config  GoogleCustomSearchConfig  Configuration for Google Custom Search                          GoogleCustomSearchConfig()

Google Custom Search Config

Google Custom Search is not supported yet but can be implemented if requested by users. The configuration currently contains only a placeholder field.

Field         Type  Description   Default Value
dummy_config  str   Dummy Config  "dummy"

Data Source Config

This configuration defines how the module deals with the data retrieved from the internet (e.g. number of sources, chunk_size, …)

Field                 Type  Description                                           Default Value
fetch_size            int   Number of search results to fetch                     4
max_workers           int   Number of workers to embed search results             10
chunk_size            int   Size of each chunk the search results are split into  600
chunk_overlap         int   Overlap between consecutive chunks                    50
embedding_batch_size  int   Embedding Batch Size                                  128
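The interplay of chunk_size and chunk_overlap can be shown with a short splitter. This is a sketch under assumptions: the document does not state the unit of these settings, so whitespace-separated words are used here, and split_into_chunks is a hypothetical helper rather than the module's own function.

```python
def split_into_chunks(text: str, chunk_size: int = 600, chunk_overlap: int = 50) -> list[str]:
    """Split text into chunks of chunk_size words, each sharing
    chunk_overlap words with the previous chunk (assumed unit: words)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Small numbers make the overlap visible: each chunk repeats the last
# word of the previous one.
example_chunks = split_into_chunks("0 1 2 3 4 5 6 7 8 9", chunk_size=4, chunk_overlap=1)
```

With the defaults, the window advances by 550 words at a time, so a 1300-word page would yield three chunks in this sketch.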

Answer Generation Config

This configuration defines how the module uses the pages retrieved from the internet to generate its answer.

Field                                 Type  Description                                    Default Value
limit_token_sources                   int   Token Source Limit                             5000
max_chunks_to_consider                int   Maximum Chunks to Consider                     20
number_history_interactions_included  int   Number of history interactions included        2
embedding_reranking                   bool  Apply reranking of chunks based on embeddings  True
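One plausible way these settings could combine is sketched below. Everything here is an assumption made for illustration: the order in which the limits are applied, the bag-of-words vectors standing in for real embeddings, and the word count standing in for a real token count. The names embed, cosine, and select_chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_chunks(
    query: str,
    chunks: list[str],
    max_chunks_to_consider: int = 20,
    limit_token_sources: int = 5000,
    embedding_reranking: bool = True,
) -> list[str]:
    candidates = list(chunks)
    if embedding_reranking:
        # Most similar chunks first; ties keep their original order.
        query_vector = embed(query)
        candidates.sort(key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    candidates = candidates[:max_chunks_to_consider]

    selected, used_tokens = [], 0
    for chunk in candidates:
        cost = len(chunk.split())  # assumption: one word counts as one token
        if used_tokens + cost > limit_token_sources:
            break
        selected.append(chunk)
        used_tokens += cost
    return selected


chunks = [
    "the weather is sunny today",
    "zurich insurance market trends",
    "zurich football results",
]
top_two = select_chunks("zurich insurance trends", chunks, max_chunks_to_consider=2)
```

Here top_two holds the insurance chunk followed by the football chunk, because they share the most words with the query.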

Query Generation Config

The module automatically optimizes the user's question to work better with search engines. The query_instruction parameter controls how this is done: for example, the user may want to always search the web in English or, as in the default value, use the language of the user's message.

Field              Type  Description                         Default Value
query_instruction  str   Instruction of the query parameter  "The user's search query, optimized for search engines, incorporates relevant details from the conversation, especially if it is a follow-up question. Always use the user's language message."
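As an illustration of the English-only case mentioned above, the snippet below builds a configuration fragment that overrides query_instruction. The wording of the English instruction is a hypothetical example, not a value shipped with the module, and the nesting under "WebSearchConfig" is assumed from the configuration layout used in this document.

```python
import json

# Hypothetical instruction text: only the last sentence differs from the default.
ENGLISH_QUERY_INSTRUCTION = (
    "The user's search query, optimized for search engines, incorporates "
    "relevant details from the conversation, especially if it is a follow-up "
    "question. Always write the search query in English."
)

config_override = {
    "WebSearchConfig": {
        "query_generation_config": {
            "query_instruction": ENGLISH_QUERY_INSTRUCTION,
        }
    }
}

# Serialized form, ready to be merged into the assistant configuration.
config_override_json = json.dumps(config_override, indent=2)
print(config_override_json)
```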

Example of configuration

{
  "WebSearchConfig": {
    "model_name": "AZURE_GPT_4_0613",
    "search_engine_name": "bing",
    "bing_config": {
      "search_country_code": "CH",
      "search_market": "de-CH",
      "custom_search_config": {
        "id": "your_custom_config_id",
        "boosted_search_domains": []
      }
    },
    "data_sources_config": {
      "fetch_size": 4,
      "max_workers": 10,
      "chunk_size": 600,
      "chunk_overlap": 50,
      "embedding_batch_size": 128
    },
    "answer_generation_config": {
      "limit_token_sources": 5000,
      "max_chunks_to_consider": 20,
      "number_history_interactions_included": 2,
      "embedding_reranking": true
    },
    "query_generation_config": {
      "query_instruction": "The user's search query, optimized for search engines, incorporates relevant details from the conversation, especially if it is a follow-up question. Always use the user's language message."
    },
    "debugging": false
  }
}
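Before handing such a configuration to an assistant, a few basic checks can catch common mistakes. The sketch below is not part of the module: check_web_search_config is a hypothetical helper, and the rule that chunk_overlap must be smaller than chunk_size is an assumption. The allowed search engines and the lowercase rule for the Google country code come from the tables in this document.

```python
import json

ALLOWED_ENGINES = {"google", "bing"}


def check_web_search_config(raw_json: str) -> list[str]:
    """Return a list of human-readable problems found in the configuration."""
    config = json.loads(raw_json).get("WebSearchConfig", {})
    problems = []

    engine = config.get("search_engine_name", "bing")
    if engine not in ALLOWED_ENGINES:
        problems.append(f"search_engine_name must be one of {sorted(ALLOWED_ENGINES)}, got {engine!r}")

    country = config.get("google_config", {}).get("search_country_code", "ch")
    if country != country.lower():
        problems.append(f"google_config.search_country_code must be lowercase, got {country!r}")

    sources = config.get("data_sources_config", {})
    # Assumption: an overlap as large as the chunk itself makes no sense.
    if sources.get("chunk_overlap", 50) >= sources.get("chunk_size", 600):
        problems.append("chunk_overlap must be smaller than chunk_size")

    return problems


valid_config = json.dumps({
    "WebSearchConfig": {
        "search_engine_name": "bing",
        "data_sources_config": {"chunk_size": 600, "chunk_overlap": 50},
    }
})
valid_problems = check_web_search_config(valid_config)
```

Returning a list of problems instead of raising on the first one lets all issues be reported in a single pass.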

Author

@Fabian Schläpfer, @Rami Azouz

 

© 2024 Unique AG. All rights reserved.