Extended Search with UniqueQL on Metadata
Overview
UniqueQL is an advanced query language designed to enhance search capabilities within various search modes such as Vector, Full-Text Search (FTS), and Combined. This query language enables users to perform detailed searches by filtering through metadata attributes like filenames, URLs, dates, and more. UniqueQL is versatile and can be translated into different query formats for various database systems, including PostgreSQL and Qdrant.
The documentation on how PostgreSQL works with filtering JSON can be found here:
9.16. JSON Functions and Operators
The documentation on how Qdrant works with filtering data can be found here:
https://qdrant.tech/documentation/concepts/filtering/
The filtering examples are taken from the Qdrant documentation so that the correspondence between the query languages is easier to follow.
UniqueQL Query Structure
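A basic UniqueQL statement consists of a path, an operator, and a value. The path specifies the metadata attribute to be filtered, the operator defines the type of comparison, and the value provides the criteria for the filter. Statements can be combined into AndStatement and OrStatement groups, whose and/or arrays hold further UniqueQL queries, so filters can be nested.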
Language definition expressed in TypeScript:
```typescript
export enum Operator {
  EQUALS = 'equals',
  NOT_EQUALS = 'notEquals',
  GREATER_THAN = 'greaterThan',
  GREATER_THAN_OR_EQUAL = 'greaterThanOrEqual',
  LESS_THAN = 'lessThan',
  LESS_THAN_OR_EQUAL = 'lessThanOrEqual',
  IN = 'in',
  NOT_IN = 'notIn',
  CONTAINS = 'contains',
  NOT_CONTAINS = 'notContains',
  IS_NULL = 'isNull',
  IS_NOT_NULL = 'isNotNull',
  IS_EMPTY = 'isEmpty',
  IS_NOT_EMPTY = 'isNotEmpty',
  NESTED = 'nested',
}

export type UniqueQL = Statement | AndStatement | OrStatement;
export type UniqueQLValue = string | number | boolean | Date | UniqueQL | string[] | number[];

// A single condition: metadata path, comparison operator, and value.
export class Statement {
  public path!: string[];
  public operator!: Operator;
  public value!: UniqueQLValue;
}

// All sub-queries must match.
export class AndStatement {
  public and: UniqueQL[] = [];
}

// At least one sub-query must match.
export class OrStatement {
  public or: UniqueQL[] = [];
}

// The `in` operator throws a TypeError on primitives, so check for a non-null object first.
export function isAndStatement(x: unknown): x is AndStatement {
  return typeof x === 'object' && x !== null && 'and' in x;
}

export function isOrStatement(x: unknown): x is OrStatement {
  return typeof x === 'object' && x !== null && 'or' in x;
}
```
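Because UniqueQL is a recursive union, code that consumes a query usually dispatches on the type guards and recurses into the and/or branches. The sketch below renders a query as a readable string. The describe function and the attribute names year, country, and language are hypothetical, and the types are repeated in trimmed form (operators written as their string values, as in the search API JSON) so the block compiles on its own.

```typescript
// Trimmed, self-contained copy of the UniqueQL shapes. Operators are written as
// their string values (as in the search API JSON) so no enum is needed here.
type Operator = 'equals' | 'greaterThan' | 'in';
type UniqueQLValue = string | number | boolean | Date | string[] | number[];

interface Statement {
  path: string[];
  operator: Operator;
  value: UniqueQLValue;
}
interface AndStatement { and: UniqueQL[] }
interface OrStatement { or: UniqueQL[] }
type UniqueQL = Statement | AndStatement | OrStatement;

function isAndStatement(x: unknown): x is AndStatement {
  return typeof x === 'object' && x !== null && 'and' in x;
}
function isOrStatement(x: unknown): x is OrStatement {
  return typeof x === 'object' && x !== null && 'or' in x;
}

// Hypothetical helper: render a query tree as a parenthesized, readable string.
function describe(q: UniqueQL): string {
  if (isAndStatement(q)) return `(${q.and.map(describe).join(' AND ')})`;
  if (isOrStatement(q)) return `(${q.or.map(describe).join(' OR ')})`;
  // Only a leaf Statement is left at this point.
  return `${q.path.join('.')} ${q.operator} ${JSON.stringify(q.value)}`;
}

// year > 2020 AND (country = "CH" OR language in ["de", "fr"]); keys are hypothetical.
const query: UniqueQL = {
  and: [
    { path: ['year'], operator: 'greaterThan', value: 2020 },
    {
      or: [
        { path: ['country'], operator: 'equals', value: 'CH' },
        { path: ['language'], operator: 'in', value: ['de', 'fr'] },
      ],
    },
  ],
};

console.log(describe(query));
// (year greaterThan 2020 AND (country equals "CH" OR language in ["de","fr"]))
```

The type guards narrow the union step by step, so after both checks the compiler knows q is a Statement and allows access to path, operator, and value.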
Supported Operators
EQUALS: Checks if the attribute equals a specified value.
NOT_EQUALS: Checks if the attribute does not equal a specified value.
GREATER_THAN: Checks if the attribute is greater than a specified value.
GREATER_THAN_OR_EQUAL: Checks if the attribute is greater than or equal to a specified value.
LESS_THAN: Checks if the attribute is less than a specified value.
LESS_THAN_OR_EQUAL: Checks if the attribute is less than or equal to a specified value.
IN: Checks if the attribute is among an array of values.
NOT_IN: Checks if the attribute is not among an array of values.
CONTAINS: Checks if the attribute contains a specified substring.
NOT_CONTAINS: Checks if the attribute does not contain a specified substring.
IS_NULL: Checks if the attribute is null.
IS_NOT_NULL: Checks if the attribute is not null.
IS_EMPTY: Checks if the attribute is empty.
IS_NOT_EMPTY: Checks if the attribute is not empty.
NESTED: Allows nested queries with logical operators like AND and OR.
Usage in the search API
A search request passes the UniqueQL filter in the metaDataFilter field:

```json
{
  "chatId": "someID",
  "searchString": "the thirty-six, 10-day \"weeks\" in the [Egyptian year]f",
  "searchType": "COMBINED",
  "limit": 10,
  "page": 1,
  "metaDataFilter": {
    "path": ["key"],
    "operator": "equals",
    "value": "other information.pdf"
  }
}
```
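The same request body can be assembled in TypeScript. The SearchRequest and Filter types below are hypothetical and only mirror the JSON fields shown above; they are not an official client. The example also assumes metaDataFilter accepts and/or groups, as the UniqueQL type allows, and uses a hypothetical second file name and language attribute.

```typescript
// Hypothetical types mirroring the JSON body above; not an official client.
interface FilterStatement { path: string[]; operator: string; value: unknown }
type Filter = FilterStatement | { and: Filter[] } | { or: Filter[] };

interface SearchRequest {
  chatId: string;
  searchString: string;
  searchType: string; // "COMBINED" in the example above
  limit: number;
  page: number;
  metaDataFilter: Filter;
}

const body: SearchRequest = {
  chatId: 'someID',
  searchString: 'Egyptian calendar weeks',
  searchType: 'COMBINED',
  limit: 10,
  page: 1,
  // Assumes metaDataFilter accepts and/or groups, as the UniqueQL type allows.
  metaDataFilter: {
    and: [
      // 'calendar notes.pdf' is a hypothetical second file.
      { path: ['key'], operator: 'in', value: ['other information.pdf', 'calendar notes.pdf'] },
      { path: ['language'], operator: 'equals', value: 'en' }, // hypothetical key
    ],
  },
};

// The serialized body is what would be sent as the request payload.
const json = JSON.stringify(body, null, 2);
console.log(json);
```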
Translating UniqueQL to PostgreSQL & Qdrant Queries
UniqueQL queries can be translated into PostgreSQL JSONB path expressions to search within JSONB columns. UniqueQL can also be translated into the Qdrant search engine's filter format.
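As an illustration of the PostgreSQL side, the sketch below emits a jsonpath predicate intended for the jsonb @@ operator, for example metadata @@ '<expression>', where the metadata column is hypothetical. It is not Unique's translator: it covers only the comparison operators, IN, and CONTAINS (mapped to like_regex with the value escaped), and it quotes every key so names containing spaces or dots stay a single member accessor. When used from application code, the expression is best passed as a bound parameter rather than spliced into SQL.

```typescript
type Operator =
  | 'equals' | 'notEquals' | 'greaterThan' | 'greaterThanOrEqual'
  | 'lessThan' | 'lessThanOrEqual' | 'in' | 'contains';
type Scalar = string | number | boolean;
interface Statement { path: string[]; operator: Operator; value: Scalar | Scalar[] }
type UniqueQL = Statement | { and: UniqueQL[] } | { or: UniqueQL[] };

// UniqueQL comparison operators and their jsonpath counterparts.
const COMPARATORS: Partial<Record<Operator, string>> = {
  equals: '==',
  notEquals: '!=',
  greaterThan: '>',
  greaterThanOrEqual: '>=',
  lessThan: '<',
  lessThanOrEqual: '<=',
};

// Build $."a"."b": one quoted member accessor per path segment.
const jsonPath = (path: string[]): string =>
  '$' + path.map((k) => '.' + JSON.stringify(k)).join('');

// JSON string, number, and boolean literals double as jsonpath literals.
const literal = (v: Scalar): string => JSON.stringify(v);

function toJsonPathPredicate(q: UniqueQL): string {
  if ('and' in q) return `(${q.and.map(toJsonPathPredicate).join(' && ')})`;
  if ('or' in q) return `(${q.or.map(toJsonPathPredicate).join(' || ')})`;

  const p = jsonPath(q.path);
  const value = q.value;
  const cmp = COMPARATORS[q.operator];
  if (cmp !== undefined) {
    if (Array.isArray(value)) throw new Error(`${q.operator} expects a single value`);
    return `${p} ${cmp} ${literal(value)}`;
  }
  if (q.operator === 'in') {
    // IN becomes a disjunction of equality checks.
    if (!Array.isArray(value)) throw new Error('in expects an array');
    return `(${value.map((v) => `${p} == ${literal(v)}`).join(' || ')})`;
  }
  if (q.operator !== 'contains') throw new Error(`unsupported operator: ${q.operator}`);
  // Substring match via like_regex, with regex metacharacters escaped.
  const escaped = String(value).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return `${p} like_regex ${JSON.stringify(escaped)}`;
}

// year and lang are hypothetical metadata keys.
const pgQuery: UniqueQL = {
  and: [
    { path: ['year'], operator: 'greaterThanOrEqual', value: 2020 },
    { path: ['lang'], operator: 'in', value: ['de', 'fr'] },
  ],
};
console.log(toJsonPathPredicate(pgQuery));
// ($."year" >= 2020 && ($."lang" == "de" || $."lang" == "fr"))
```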
Below is an example of how a UniqueQL query is translated:

| UniqueQL | Postgres | Qdrant |
|---|---|---|
| `{ path: ['id'], operator: Operator.EQUALS, value: 1 }` | | |
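For the Qdrant side, here is a sketch that maps a UniqueQL query onto Qdrant's filter JSON using the clauses described in the Qdrant filtering documentation linked above (must, should, must_not, match, range, is_null, is_empty). It is not Unique's translator: the mapping choices, in particular expressing CONTAINS as Qdrant's full-text match.text condition and joining the path with dots, are assumptions, and Date and NESTED values are left out (dates need the format required under Working with Dates).

```typescript
type Operator =
  | 'equals' | 'notEquals'
  | 'greaterThan' | 'greaterThanOrEqual' | 'lessThan' | 'lessThanOrEqual'
  | 'in' | 'notIn' | 'contains' | 'notContains'
  | 'isNull' | 'isNotNull' | 'isEmpty' | 'isNotEmpty';

// Date and NESTED values are left out of this sketch.
type Value = string | number | boolean | string[] | number[];
interface Statement { path: string[]; operator: Operator; value: Value }
type UniqueQL = Statement | { and: UniqueQL[] } | { or: UniqueQL[] };

// Qdrant conditions and filters are plain JSON objects.
type Json = Record<string, unknown>;

const RANGE_KEYS: Partial<Record<Operator, string>> = {
  greaterThan: 'gt',
  greaterThanOrEqual: 'gte',
  lessThan: 'lt',
  lessThanOrEqual: 'lte',
};

function toCondition(q: UniqueQL): Json {
  if ('and' in q) return { must: q.and.map(toCondition) };
  if ('or' in q) return { should: q.or.map(toCondition) };

  const key = q.path.join('.'); // Qdrant addresses nested payload fields with dots
  const rangeKey = RANGE_KEYS[q.operator];
  if (rangeKey !== undefined) return { key, range: { [rangeKey]: q.value } };

  switch (q.operator) {
    case 'equals': return { key, match: { value: q.value } };
    case 'notEquals': return { must_not: [{ key, match: { value: q.value } }] };
    case 'in': return { key, match: { any: q.value } };
    case 'notIn': return { key, match: { except: q.value } };
    case 'contains': return { key, match: { text: q.value } };
    case 'notContains': return { must_not: [{ key, match: { text: q.value } }] };
    case 'isNull': return { is_null: { key } };
    case 'isNotNull': return { must_not: [{ is_null: { key } }] };
    case 'isEmpty': return { is_empty: { key } };
    case 'isNotEmpty': return { must_not: [{ is_empty: { key } }] };
    default: throw new Error(`unsupported operator: ${q.operator}`);
  }
}

// A single condition is wrapped in `must`; and/or groups are already filters.
function toQdrantFilter(q: UniqueQL): Json {
  const cond = toCondition(q);
  const isFilter = 'must' in cond || 'should' in cond || 'must_not' in cond;
  return isFilter ? cond : { must: [cond] };
}

const byId: UniqueQL = { path: ['id'], operator: 'equals', value: 1 };
console.log(JSON.stringify(toQdrantFilter(byId)));
// {"must":[{"key":"id","match":{"value":1}}]}
```

Negated operators are expressed as a nested filter with must_not, which Qdrant accepts as a condition inside must or should.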
Working with Dates
Dates must be formatted as follows, both in the data and in the query:
This matters because Qdrant only supports this format; any other format may cause an error.
Examples
When designing a search or query system for specific use cases, UniqueQL can be applied to create precise queries that match the requirements. Below are examples of how UniqueQL queries can be structured for common use cases:
Writing a News Module with a cutoff date
To retrieve news articles published before a certain cutoff date, you would use the LESS_THAN or LESS_THAN_OR_EQUAL operator, both of which are defined in the Operator enum.
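A cutoff filter for the news module might look like the sketch below. The publishedAt attribute and the newsCutoffFilter helper are hypothetical; the cutoff is passed as a Date, which the UniqueQLValue type permits, and still has to be serialized in the format required under Working with Dates.

```typescript
// Hypothetical metadata key holding the publication date of a news article.
const PUBLISHED_AT = ['publishedAt'];

// Articles before the cutoff; `inclusive` also keeps articles dated exactly at the cutoff.
function newsCutoffFilter(cutoff: Date, inclusive = false) {
  return {
    path: PUBLISHED_AT,
    operator: inclusive ? 'lessThanOrEqual' : 'lessThan',
    value: cutoff, // serialize in the format required under "Working with Dates"
  };
}

const newsFilter = newsCutoffFilter(new Date(Date.UTC(2024, 0, 1)));
console.log(newsFilter.operator); // lessThan
```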
Querying for one file in the system
You would use the EQUALS operator to find a specific file by its unique identifier (e.g., filename or ID).
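Two minimal variants are sketched below: one reuses the key attribute from the search API example, the other mirrors the id example from the translation table. Whether a given deployment stores a unique id in the metadata is an assumption.

```typescript
// Exact match on the file name stored under `key`, the attribute used in the
// search API example.
const byFileName = { path: ['key'], operator: 'equals', value: 'other information.pdf' };

// The same pattern with a numeric ID (assumes an `id` attribute exists).
const byNumericId = { path: ['id'], operator: 'equals', value: 1 };

console.log(JSON.stringify(byFileName));
// {"path":["key"],"operator":"equals","value":"other information.pdf"}
```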
Asking only for files from a certain folder
To filter files by their folder path, you would use the EQUALS or CONTAINS operator, depending on whether you need an exact match or a match within a hierarchy.
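Both variants are sketched below, assuming a hypothetical folder attribute that stores the folder path as a string. The local matchesWithin helper only mirrors the substring semantics described for CONTAINS, for illustration.

```typescript
// `folder` is a hypothetical metadata key holding the file's folder path.
const exactFolder = { path: ['folder'], operator: 'equals', value: '/legal/contracts' };
const withinLegal = { path: ['folder'], operator: 'contains', value: '/legal' };

// CONTAINS is a substring check, mirrored locally here for illustration.
const matchesWithin = (folder: string): boolean => folder.includes(withinLegal.value);

console.log(matchesWithin('/legal/contracts/2024')); // true
console.log(matchesWithin('/finance')); // false
console.log(matchesWithin('/legal-archive')); // true
```

Because CONTAINS matches substrings, a value such as /legal also matches a sibling folder like /legal-archive, so the value should be chosen with that in mind.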
Asking only for files from a specific country or language
To filter files by country or language metadata, you would use the EQUALS operator for the country or language attribute.
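A sketch with hypothetical country and language attributes: an and group combines both constraints, and IN covers several accepted languages at once, equivalent to EQUALS statements joined with or. The country and language codes are examples only.

```typescript
// `country` and `language` are hypothetical metadata keys; the codes are examples.
const swissGerman = {
  and: [
    { path: ['country'], operator: 'equals', value: 'CH' },
    { path: ['language'], operator: 'equals', value: 'de' },
  ],
};

// Several accepted languages: IN instead of EQUALS statements joined with `or`.
const germanOrFrench = { path: ['language'], operator: 'in', value: ['de', 'fr'] };

console.log(JSON.stringify(swissGerman));
console.log(JSON.stringify(germanOrFrench));
```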
Asking for files valid in the future
To find files that will become valid in the future, possibly due to upcoming directives or laws, you would use the GREATER_THAN operator to compare against the current date.
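A minimal sketch, assuming a hypothetical validFrom attribute that records when a document takes effect. The helper takes the current date as a parameter so callers can pass a fixed date; the value must still be serialized in the format required under Working with Dates.

```typescript
// `validFrom` is a hypothetical metadata key holding the date a document takes effect.
function validInFutureFilter(now: Date = new Date()) {
  return {
    path: ['validFrom'],
    operator: 'greaterThan',
    value: now, // serialize in the format required under "Working with Dates"
  };
}

// Passing an explicit date keeps the filter reproducible, e.g. in tests.
const upcoming = validInFutureFilter(new Date(Date.UTC(2024, 5, 1)));
console.log(upcoming.operator); // greaterThan
```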
In each of these cases, the path refers to the attribute within the metadata you want to search against, and the value is the criterion you're matching. The operator defines how to compare the value to the path attribute.
Conclusion
UniqueQL provides a compact, flexible way to express metadata filters: statements built from a path, an operator, and a value, combined into AND and OR groups. Because the same query can be translated into PostgreSQL JSONB path expressions or Qdrant filters, one filter definition can be used across the Vector, Full-Text Search, and Combined search modes.
Author: @Andreas Hauri
© 2024 Unique AG. All rights reserved.