Web Search Module
- 1 Functionality
- 2 Input
- 3 Output
- 4 Configuration settings (technical)
- 4.1 Web Search Config
- 4.1.1 Bing Search API
- 4.1.1.1 Bing Custom Search Config
- 4.1.2 Google Search API
- 4.1.2.1 Google Custom Search Config
- 4.1.3 Data Source Config
- 4.1.4 Answer Generation Config
- 4.1.5 Query Generation Config
- 4.1.6 Example of configuration
Functionality
This module answers a user query based on web pages retrieved with a web search engine. The module behaves as a standard GPT but has access to a web search tool. When the user asks a question whose answer the model does not know, it executes the following workflow:
Formulates a search query optimized for search engines (based on the entire discussion history).
Executes a web search using the configured search engine.
Scrapes the content of the top web pages returned by the search engine.
Divides the scraped content into small chunks of text.
[Optional] Embeds the chunks locally and performs a similarity search based on the generated query.
Passes the relevant information to the context and answers the user’s question/query.
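The optional similarity-search step of the workflow above can be illustrated with a small sketch. It is not the module's implementation: `embed`, `cosine`, and `rank_chunks` are hypothetical names, and a bag-of-words count vector stands in for a real embedding model so that the example runs without external dependencies.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_chunks(query: str, chunks: list[str], top_k: int) -> list[str]:
    """Return the top_k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


# Illustrative chunks (not real scraped content).
chunks = [
    "Zurich insurance market trends in 2024",
    "Recipe for a classic cheese fondue",
    "Swiss banks report quarterly earnings",
]
best = rank_chunks("insurance market trends", chunks, top_k=1)
```

Only the chunk that shares words with the query scores above zero here; with a real embedding model the ranking would reflect semantic similarity rather than exact word overlap.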
Input
A user question
Example input:
“What are the recent trends and developments in Zurich’s financial and insurance markets?”
Output
An answer based on retrieved web pages, either referencing the appropriate links or stating that no information was found in the retrieved web pages.
Configuration settings (technical)
Web Search Config
Provided below is the general structure of the configuration for the Web Search module.
Field | Type | Description | Default Value |
---|---|---|---|
model_name | | Model name | |
search_engine_name | Literal["google", "bing"] | Search Engine ['google', 'bing'] | "bing" |
bing_config | | Configuration for Bing search | |
google_config | | Configuration for Google search | |
data_sources_config | | Configuration for data sources | |
answer_generation_config | | Configuration for answer generation | |
query_generation_config | | Configuration for query generation | |
debugging | bool | Enable debugging mode | False |
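To make the relationship between search_engine_name and the two engine sections concrete, here is a minimal sketch that picks the matching sub-configuration from a plain dictionary. The helper `active_engine_config` is hypothetical, and the assumption that only the section of the selected engine is used is not confirmed by this page.

```python
from typing import Any

SUPPORTED_ENGINES = ("google", "bing")


def active_engine_config(config: dict[str, Any]) -> dict[str, Any]:
    """Return the sub-configuration that matches search_engine_name."""
    engine = config.get("search_engine_name", "bing")  # documented default
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"Unsupported search engine: {engine!r}")
    # Assumption: the section is named "<engine>_config", as in the table above.
    return config.get(f"{engine}_config", {})


config = {
    "search_engine_name": "bing",
    "bing_config": {"search_country_code": "CH", "search_market": "de-CH"},
    "google_config": {"search_country_code": "ch"},
}
selected = active_engine_config(config)
```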
Bing Search API
Field | Type | Description | Default Value |
---|---|---|---|
search_country_code | str | Search Country Code. Default is Switzerland | "CH" |
search_market | str | Search Market. Default is German Switzerland | "de-CH" |
custom_search_config | | Configuration for Bing Custom Search | |
Bing Custom Search Config
Bing offers the possibility to customize the search results. For example:
Limit the search to a predefined set of domains
Promote a domain
Pin results from a specific website
Filter links
…
This Web Search module developed by Unique supports custom search through the custom_config_id parameter, which can be obtained from https://www.customsearch.ai/. More details on the setup can be found at: https://learn.microsoft.com/en-us/bing/search-apis/bing-custom-search/how-to/quick-start
Once the id is created, it can be passed to your assistant through the config.
Specific search domains can also be boosted without creating a custom search id, simply by passing the list of domains in the boosted_search_domains parameter. NOTE: unlike custom search, this method does not necessarily restrict the search to the domains listed in boosted_search_domains.
Field | Type | Description | Default Value |
---|---|---|---|
id | str \| None | Custom Config ID. Can be set up at https://www.customsearch.ai/ | None |
boosted_search_domains | list[str] | List of domains to boost in search results | [] |
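The Bing fields and defaults documented above can be mirrored in a small sketch. The class names `BingConfig` and `BingCustomSearchConfig` are assumptions for illustration (the module's actual classes are not shown on this page), and the domains are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BingCustomSearchConfig:
    # Custom Config ID obtained from https://www.customsearch.ai/ (None = not used).
    id: Optional[str] = None
    # Domains to boost; boosting does not necessarily restrict results to them.
    boosted_search_domains: list[str] = field(default_factory=list)


@dataclass
class BingConfig:
    search_country_code: str = "CH"  # default: Switzerland
    search_market: str = "de-CH"     # default: German Switzerland
    custom_search_config: BingCustomSearchConfig = field(
        default_factory=BingCustomSearchConfig
    )


# Boost two placeholder domains without creating a custom search id.
boosted = BingConfig(
    custom_search_config=BingCustomSearchConfig(
        boosted_search_domains=["example.com", "example.org"]
    )
)
```

Using `default_factory` gives every instance its own list, so boosting domains on one configuration does not leak into another.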
Google Search API
Field | Type | Description | Default Value |
---|---|---|---|
search_country_code | str | Search Country Code. Default is Switzerland. Must be lowercase | "ch" |
custom_search_config | | Configuration for Google Custom Search | |
Google Custom Search Config
Google Custom Search is not supported yet, but it can be implemented if requested by users.
Field | Type | Description | Default Value |
---|---|---|---|
dummy_config | str | Dummy Config | "dummy" |
Data Source Config
This configuration defines how the module deals with the data retrieved from the internet (e.g. number of sources, chunk_size, …)
Field | Type | Description | Default Value |
---|---|---|---|
fetch_size | int | Number of search results to fetch | 4 |
max_workers | int | Number of workers to embed search results | 10 |
chunk_size | int | Size of each chunk the search results are split into | 600 |
chunk_overlap | int | Overlap between consecutive chunks | 50 |
embedding_batch_size | int | Embedding Batch Size | 128 |
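The interplay of chunk_size and chunk_overlap can be sketched as a sliding window. This is an illustration, not the module's splitter: the function name `split_into_chunks` is hypothetical, and the unit is assumed to be whitespace-separated words because this page does not state whether sizes are counted in characters, words, or tokens.

```python
def split_into_chunks(
    words: list[str], chunk_size: int = 600, chunk_overlap: int = 50
) -> list[list[str]]:
    """Split words into windows of chunk_size that share chunk_overlap items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Small numbers make the overlap visible: each chunk repeats the last
# item of the previous one.
demo = split_into_chunks([str(i) for i in range(10)], chunk_size=4, chunk_overlap=1)
```

With the documented defaults (600 and 50), the window advances by 550 units, so consecutive chunks share 50 units of context.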
Answer Generation Config
This configuration defines how the module makes use of the pages retrieved from the internet to generate its answer.
Field | Type | Description | Default Value |
---|---|---|---|
limit_token_sources | int | Token Source Limit | 5000 |
max_chunks_to_consider | int | Maximum Chunks to Consider | 20 |
number_history_interactions_included | int | Number of history interactions included | 2 |
embedding_reranking | bool | Apply reranking of chunks based on embeddings | True |
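One plausible reading of limit_token_sources and max_chunks_to_consider is a budgeted selection over already ranked chunks, sketched below. The helper `select_chunks` is hypothetical, the word count is only a crude stand-in for a real tokenizer, and the module's actual selection logic may differ.

```python
def select_chunks(
    ranked_chunks: list[str],
    limit_token_sources: int = 5000,
    max_chunks_to_consider: int = 20,
) -> list[str]:
    """Keep the best-ranked chunks until the token budget is exhausted."""
    selected: list[str] = []
    used = 0
    # Only the first max_chunks_to_consider chunks are candidates at all.
    for chunk in ranked_chunks[:max_chunks_to_consider]:
        cost = len(chunk.split())  # assumption: one word counts as one token
        if used + cost > limit_token_sources:
            break  # adding this chunk would exceed the budget
        selected.append(chunk)
        used += cost
    return selected


kept = select_chunks(
    ["a b c", "d e", "f"], limit_token_sources=5, max_chunks_to_consider=2
)
```

In this example the third chunk is never considered because max_chunks_to_consider is 2, even though it would still fit the budget.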
Query Generation Config
The module automatically optimizes the user's question to work better with search engines. For example, the user may want to always search the web in English or, as in the default value, use the language of the user's message.
Field | Type | Description | Default Value |
---|---|---|---|
query_instruction | str | Instruction of the query parameter | "The user's search query, optimized for search engines, incorporates relevant details from the conversation, especially if it is a follow-up question. Always use the user's language message." |
Example of configuration
{
  "WebSearchConfig": {
    "debugging": false,
    "model_name": "AZURE_GPT_4o_2024_0513",
    "bing_config": {
      "search_market": "de-CH",
      "search_country_code": "CH",
      "custom_search_config": {
        "id": null,
        "boosted_search_domains": []
      }
    },
    "search_engine_name": "bing",
    "data_sources_config": {
      "chunk_size": 600,
      "fetch_size": 10,
      "max_workers": 10,
      "chunk_overlap": 50,
      "embedding_batch_size": 128
    },
    "query_generation_config": {
      "query_instruction": "The user's search query, optimized for search engines, incorporates relevant details from the conversation, especially if it is a follow-up question. Always use the user's language message."
    },
    "answer_generation_config": {
      "embedding_reranking": false,
      "limit_token_sources": 10000,
      "max_chunks_to_consider": 500,
      "number_history_interactions_included": 4
    }
  }
}
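Several values in this example differ from the defaults listed in the tables. The following sketch parses an excerpt of the example with the standard `json` module and lists the overridden settings. `DOCUMENTED_DEFAULTS` is transcribed from the tables on this page, and `overridden_values` is a hypothetical helper, not part of the module.

```python
import json

# Defaults as documented in the Data Source and Answer Generation tables.
DOCUMENTED_DEFAULTS = {
    "data_sources_config": {
        "fetch_size": 4, "max_workers": 10, "chunk_size": 600,
        "chunk_overlap": 50, "embedding_batch_size": 128,
    },
    "answer_generation_config": {
        "limit_token_sources": 5000, "max_chunks_to_consider": 20,
        "number_history_interactions_included": 2, "embedding_reranking": True,
    },
}

# Excerpt of the example configuration (only the two sections compared here).
EXAMPLE = """
{
  "WebSearchConfig": {
    "data_sources_config": {
      "chunk_size": 600, "fetch_size": 10, "max_workers": 10,
      "chunk_overlap": 50, "embedding_batch_size": 128
    },
    "answer_generation_config": {
      "embedding_reranking": false, "limit_token_sources": 10000,
      "max_chunks_to_consider": 500, "number_history_interactions_included": 4
    }
  }
}
"""


def overridden_values(config: dict, defaults: dict) -> dict:
    """Map 'section.key' to each configured value that differs from its default."""
    overrides = {}
    for section, fields in defaults.items():
        for key, default in fields.items():
            value = config.get(section, {}).get(key, default)
            if value != default:
                overrides[f"{section}.{key}"] = value
    return overrides


overrides = overridden_values(json.loads(EXAMPLE)["WebSearchConfig"], DOCUMENTED_DEFAULTS)
```

For this example, `overrides` contains fetch_size and all four answer generation settings; the remaining data source values match their defaults.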
Author: @Rami Azouz
© 2024 Unique AG. All rights reserved.