Extended Search with UniqueQL on Metadata

 

Overview

UniqueQL is a query language designed to extend search in the Vector, Full-Text Search (FTS), and Combined search modes. It lets users narrow a search by filtering on metadata attributes such as filenames, URLs, and dates. A UniqueQL query can be translated into the query formats of different database systems, including PostgreSQL and Qdrant.

The PostgreSQL documentation on filtering JSON data can be found here:
https://www.postgresql.org/docs/current/functions-json.html
The Qdrant documentation on filtering data can be found here:
https://qdrant.tech/documentation/concepts/filtering/

The filtering examples are taken from the Qdrant documentation so that the correspondence between the query languages is easier to follow.

UniqueQL Query Structure

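A basic UniqueQL query (a Statement) is composed of a path, an operator, and a value. The path is an array of keys that identifies the metadata attribute to be filtered, the operator defines the type of comparison, and the value provides the criterion for the filter. Statements can be combined with AndStatement and OrStatement, which themselves contain UniqueQL queries, so conditions can be nested.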


Language definition expressed in TypeScript:

export enum Operator {
  EQUALS = 'equals',
  NOT_EQUALS = 'notEquals',
  GREATER_THAN = 'greaterThan',
  GREATER_THAN_OR_EQUAL = 'greaterThanOrEqual',
  LESS_THAN = 'lessThan',
  LESS_THAN_OR_EQUAL = 'lessThanOrEqual',
  IN = 'in',
  NOT_IN = 'notIn',
  CONTAINS = 'contains',
  NOT_CONTAINS = 'notContains',
  IS_NULL = 'isNull',
  IS_NOT_NULL = 'isNotNull',
  IS_EMPTY = 'isEmpty',
  IS_NOT_EMPTY = 'isNotEmpty',
  NESTED = 'nested',
}

// A query is either a single statement or a logical combination of queries.
export type UniqueQL = Statement | AndStatement | OrStatement;

export type UniqueQLValue = string | number | boolean | Date | UniqueQL | string[] | number[];

// A single condition: the path to the metadata attribute (as an array of keys),
// the comparison to perform, and the value to compare against.
export class Statement {
  public path!: string[];
  public operator!: Operator;
  public value!: UniqueQLValue;
}

// Matches when all contained queries match.
export class AndStatement {
  public and: UniqueQL[] = [];
}

// Matches when at least one contained query matches.
export class OrStatement {
  public or: UniqueQL[] = [];
}

// Type guards. The object check prevents `in` from throwing on null or primitive input.
export function isAndStatement(x: unknown): x is AndStatement {
  return typeof x === 'object' && x !== null && 'and' in x;
}

export function isOrStatement(x: unknown): x is OrStatement {
  return typeof x === 'object' && x !== null && 'or' in x;
}
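Because AndStatement and OrStatement hold lists of UniqueQL queries, conditions can be nested to any depth. The sketch below builds such a tree as plain objects and walks it with type guards equivalent to isAndStatement and isOrStatement. The country and language attributes and the collectPaths helper are hypothetical, used only for illustration.

```typescript
// Minimal structural types matching the definitions above (plain objects instead of classes).
type Statement = { path: string[]; operator: string; value: unknown };
type AndStatement = { and: UniqueQL[] };
type OrStatement = { or: UniqueQL[] };
type UniqueQL = Statement | AndStatement | OrStatement;

// Same checks as isAndStatement / isOrStatement, narrowed to the UniqueQL union.
const isAnd = (q: UniqueQL): q is AndStatement => 'and' in q;
const isOr = (q: UniqueQL): q is OrStatement => 'or' in q;

// Hypothetical helper: collects the dotted path of every statement, depth first.
function collectPaths(query: UniqueQL, out: string[] = []): string[] {
  if (isAnd(query)) query.and.forEach((q) => collectPaths(q, out));
  else if (isOr(query)) query.or.forEach((q) => collectPaths(q, out));
  else out.push(query.path.join('.'));
  return out;
}

// country = "CH" AND (language = "de" OR language = "fr")
const query: UniqueQL = {
  and: [
    { path: ['country'], operator: 'equals', value: 'CH' },
    {
      or: [
        { path: ['language'], operator: 'equals', value: 'de' },
        { path: ['language'], operator: 'equals', value: 'fr' },
      ],
    },
  ],
};

const paths = collectPaths(query); // ["country", "language", "language"]
```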

Supported Operators

  • EQUALS: Checks if the attribute equals a specified value.

  • NOT_EQUALS: Checks if the attribute does not equal a specified value.

  • IS_NULL: Checks if the attribute is null.

  • IS_NOT_NULL: Checks if the attribute is not null.

  • IS_EMPTY: Checks if the attribute is empty.

  • IS_NOT_EMPTY: Checks if the attribute is not empty.

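  • GREATER_THAN: Checks if the attribute is greater than a specified value.

  • GREATER_THAN_OR_EQUAL: Checks if the attribute is greater than or equal to a specified value.

  • LESS_THAN: Checks if the attribute is less than a specified value.

  • LESS_THAN_OR_EQUAL: Checks if the attribute is less than or equal to a specified value.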

  • CONTAINS: Checks if the attribute contains a specified substring.

  • NOT_CONTAINS: Checks if the attribute does not contain a specified substring.

  • IN: Checks if the attribute is among an array of values.

  • NOT_IN: Checks if the attribute is not among an array of values.

  • NESTED: Allows nested queries with logical operators like AND and OR.
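The operator descriptions above can be made concrete with a small in-memory evaluator that checks a query against one document's metadata. This is a hypothetical sketch, not the implementation behind UniqueQL. Its assumptions: the path is resolved key by key, dates are compared as ISO 8601 strings, a missing value, null, "" and [] all count as empty, CONTAINS is a plain substring match, and NESTED is left out.

```typescript
type Statement = { path: string[]; operator: string; value: unknown };
type UniqueQL = Statement | { and: UniqueQL[] } | { or: UniqueQL[] };
type Metadata = Record<string, unknown>;

// Follows the path one key at a time; a missing key yields undefined.
function resolve(metadata: Metadata, path: string[]): unknown {
  let current: unknown = metadata;
  for (const key of path) {
    if (typeof current !== 'object' || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Dates are compared as ISO 8601 strings (assumed to be the stored representation).
const normalize = (v: unknown): unknown => (v instanceof Date ? v.toISOString() : v);

// Orders two numbers or two strings; any other combination is incomparable.
function compare(a: unknown, b: unknown): number | undefined {
  if (typeof a === 'number' && typeof b === 'number') return a - b;
  if (typeof a === 'string' && typeof b === 'string') return a < b ? -1 : a > b ? 1 : 0;
  return undefined;
}

// Assumption: missing, null, "" and [] all count as empty.
const isEmpty = (v: unknown): boolean =>
  v === undefined || v === null || v === '' || (Array.isArray(v) && v.length === 0);

// Substring match, as described for CONTAINS.
const contains = (a: unknown, b: unknown): boolean =>
  typeof a === 'string' && typeof b === 'string' && a.includes(b);

function evaluate(query: UniqueQL, metadata: Metadata): boolean {
  if ('and' in query) return query.and.every((q) => evaluate(q, metadata));
  if ('or' in query) return query.or.some((q) => evaluate(q, metadata));
  const actual = normalize(resolve(metadata, query.path));
  const expected = normalize(query.value);
  const order = compare(actual, expected);
  switch (query.operator) {
    case 'equals': return actual === expected;
    case 'notEquals': return actual !== expected;
    case 'greaterThan': return order !== undefined && order > 0;
    case 'greaterThanOrEqual': return order !== undefined && order >= 0;
    case 'lessThan': return order !== undefined && order < 0;
    case 'lessThanOrEqual': return order !== undefined && order <= 0;
    case 'in': return Array.isArray(expected) && expected.includes(actual);
    case 'notIn': return Array.isArray(expected) && !expected.includes(actual);
    case 'contains': return contains(actual, expected);
    case 'notContains': return !contains(actual, expected);
    case 'isNull': return actual === null || actual === undefined;
    case 'isNotNull': return actual !== null && actual !== undefined;
    case 'isEmpty': return isEmpty(actual);
    case 'isNotEmpty': return !isEmpty(actual);
    default: throw new Error(`Operator not covered by this sketch: ${query.operator}`);
  }
}

const metadata: Metadata = { key: 'other information.pdf', country: 'CH', validFrom: '2025-01-01T00:00:00.000Z' };
const query: UniqueQL = {
  and: [
    { path: ['country'], operator: 'in', value: ['CH', 'DE'] },
    { path: ['validFrom'], operator: 'greaterThan', value: new Date('2024-06-01T00:00:00Z') },
  ],
};
const matches = evaluate(query, metadata); // true
```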

 

Usage in the search API

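A UniqueQL query is passed to the search API in the metaDataFilter field of the request body. The following request runs a combined search restricted to documents whose key attribute equals "other information.pdf":

{
  "chatId": "someID",
  "searchString": "the thirty-six, 10-day \"weeks\" in the [Egyptian year]f",
  "searchType": "COMBINED",
  "limit": 10,
  "page": 1,
  "metaDataFilter": {
    "path": ["key"],
    "operator": "equals",
    "value": "other information.pdf"
  }
}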
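When the request is built in code, typing the body helps keep the filter well-formed. A minimal sketch: the SearchRequest interface and the buildFileSearch helper are hypothetical and simply mirror the fields of the example request; the endpoint and authentication are outside its scope.

```typescript
type Statement = { path: string[]; operator: string; value: unknown };
type UniqueQL = Statement | { and: UniqueQL[] } | { or: UniqueQL[] };

// Hypothetical request shape, mirroring the fields of the example request.
interface SearchRequest {
  chatId: string;
  searchString: string;
  searchType: string; // e.g. "COMBINED"
  limit: number;
  page: number;
  metaDataFilter?: UniqueQL;
}

// Builds a combined search restricted to one file, identified by its "key" metadata.
function buildFileSearch(chatId: string, searchString: string, fileKey: string): SearchRequest {
  return {
    chatId,
    searchString,
    searchType: 'COMBINED',
    limit: 10,
    page: 1,
    metaDataFilter: { path: ['key'], operator: 'equals', value: fileKey },
  };
}

// Serialized request body, ready to be sent as JSON.
const body = JSON.stringify(buildFileSearch('someID', 'Egyptian calendar', 'other information.pdf'));
```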

Translating UniqueQL to PostgreSQL & Qdrant Queries

UniqueQL queries can be translated into PostgreSQL JSONB path expressions to search within JSONB columns, and into the filter format of the Qdrant search engine.

Below are examples of how UniqueQL queries are translated:

 

UniqueQL:

const query = { path: ['id'], operator: Operator.EQUALS, value: 1 };
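For illustration, a minimal sketch of how a translator might map this query, and a few other operators, onto both targets. Assumptions not taken from this document: the metadata sits in a JSONB column queried with PostgreSQL's @@ jsonpath operator through a bind parameter (for example metadata @@ $1::jsonpath, where the column name metadata is hypothetical), nested paths become quoted jsonpath members or dot-separated Qdrant keys, and toJsonPath and toQdrantFilter are hypothetical names. The Qdrant output uses Qdrant's must, should, must_not, match, range, is_null and is_empty filter conditions.

```typescript
type Statement = { path: string[]; operator: string; value: unknown };
type UniqueQL = Statement | { and: UniqueQL[] } | { or: UniqueQL[] };

// Dates are sent as ISO 8601 strings; other values pass through unchanged.
const norm = (v: unknown): unknown => (v instanceof Date ? v.toISOString() : v);

// --- PostgreSQL: a jsonpath predicate for use with `metadata @@ $1::jsonpath` ---
const jsonPathOps = new Map([
  ['equals', '=='], ['notEquals', '!='], ['greaterThan', '>'],
  ['greaterThanOrEqual', '>='], ['lessThan', '<'], ['lessThanOrEqual', '<='],
]);

function joinJsonPath(queries: UniqueQL[], op: string): string {
  if (queries.length === 0) throw new Error('Empty and/or is not supported in this sketch');
  return queries.map((q) => `(${toJsonPath(q)})`).join(` ${op} `);
}

function toJsonPath(query: UniqueQL): string {
  if ('and' in query) return joinJsonPath(query.and, '&&');
  if ('or' in query) return joinJsonPath(query.or, '||');
  // Quoted member accessors keep keys with spaces or dashes valid, e.g. $."id"
  const path = '$' + query.path.map((k) => '.' + JSON.stringify(k)).join('');
  const value = norm(query.value);
  const op = jsonPathOps.get(query.operator);
  if (op) return `${path} ${op} ${JSON.stringify(value)}`;
  if (query.operator === 'in' && Array.isArray(value) && value.length > 0) {
    return value.map((v) => `${path} == ${JSON.stringify(v)}`).join(' || ');
  }
  throw new Error(`Operator not covered by this sketch: ${query.operator}`);
}

// --- Qdrant: filter JSON built from must / should / must_not clauses ---
type QdrantFilter = { must?: unknown[]; should?: unknown[]; must_not?: unknown[] };

const qdrantRange = new Map([
  ['greaterThan', 'gt'], ['greaterThanOrEqual', 'gte'], ['lessThan', 'lt'], ['lessThanOrEqual', 'lte'],
]);

function toQdrantFilter(query: UniqueQL): QdrantFilter {
  if ('and' in query) return { must: query.and.map(toQdrantFilter) }; // nested filters as conditions
  if ('or' in query) return { should: query.or.map(toQdrantFilter) };
  const key = query.path.join('.'); // Qdrant addresses nested payload fields with dots
  const value = norm(query.value);
  const range = qdrantRange.get(query.operator);
  if (range) return { must: [{ key, range: { [range]: value } }] };
  switch (query.operator) {
    case 'equals': return { must: [{ key, match: { value } }] };
    case 'notEquals': return { must_not: [{ key, match: { value } }] };
    case 'in': return { must: [{ key, match: { any: value } }] };
    case 'notIn': return { must: [{ key, match: { except: value } }] };
    case 'isNull': return { must: [{ is_null: { key } }] };
    case 'isEmpty': return { must: [{ is_empty: { key } }] };
    default: throw new Error(`Operator not covered by this sketch: ${query.operator}`);
  }
}

// The equality example: path ['id'], operator equals, value 1.
const idQuery: UniqueQL = { path: ['id'], operator: 'equals', value: 1 };
const pgPredicate = toJsonPath(idQuery); // $."id" == 1
const qdrantFilter = toQdrantFilter(idQuery); // { must: [ { key: 'id', match: { value: 1 } } ] }
```

Passing the jsonpath as a bind parameter, rather than splicing it into the SQL string, avoids quoting problems with values that contain single quotes.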

 

 

Working with Dates

Dates must be formatted the same way in the data (the stored metadata) and in the query, like so:

This is important because Qdrant only supports this format; otherwise, you may experience an error.
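A minimal sketch, assuming the required format is an RFC 3339 / ISO 8601 UTC timestamp such as 2024-01-31T00:00:00.000Z (the kind of timestamp Qdrant's datetime range conditions accept). Normalizing every date through one helper keeps the stored metadata and the query values consistent; toQueryDate and the publishedAt attribute are hypothetical names.

```typescript
// Hypothetical helper: normalizes a Date to an ISO 8601 / RFC 3339 UTC timestamp.
function toQueryDate(date: Date): string {
  if (isNaN(date.getTime())) throw new Error('Invalid date');
  return date.toISOString();
}

// Use the same representation in the stored metadata and in the filter value.
const storedMetadata = { publishedAt: toQueryDate(new Date(Date.UTC(2024, 0, 31))) };
const dateFilter = {
  path: ['publishedAt'],
  operator: 'lessThanOrEqual',
  value: toQueryDate(new Date(Date.UTC(2024, 1, 1))),
};

// Date objects serialize to the same string through Date.prototype.toJSON.
const serialized = JSON.stringify({ value: new Date(Date.UTC(2024, 0, 31)) });
```

Since UniqueQLValue also accepts Date objects, a Date placed directly in a query sent as JSON ends up in this same string form.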

Examples

When designing a search or query system for a specific use case, UniqueQL can be used to build precise queries that match its requirements. Below are examples of how you might structure UniqueQL queries for common use cases:

Writing a News Module with a cutoff date

To retrieve news articles published before a certain cutoff date, you would use the LESS_THAN or LESS_THAN_OR_EQUAL operator on the article's date attribute.
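A news module could build its cutoff filter as follows. The publishedAt attribute and the newsBefore helper are hypothetical; the cutoff is serialized with toISOString() on the assumption that dates are stored in that format.

```typescript
type Statement = { path: string[]; operator: string; value: unknown };

// Hypothetical helper: articles published before (or, if inclusive, on) the cutoff.
function newsBefore(cutoff: Date, inclusive = true): Statement {
  return {
    path: ['publishedAt'], // hypothetical metadata attribute
    operator: inclusive ? 'lessThanOrEqual' : 'lessThan',
    value: cutoff.toISOString(),
  };
}

const newsFilter = newsBefore(new Date(Date.UTC(2024, 5, 30)));
```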

 

Querying for one file in the system

You would use the EQUALS operator to find a specific file by its unique identifier (e.g., filename or ID).
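Using the key attribute from the search API example, a single-file filter could look like this. The helper names are hypothetical, and whether key holds a filename or another identifier depends on how the documents were ingested. The IN variant is included for the case of a few specific files.

```typescript
type Statement = { path: string[]; operator: string; value: unknown };

// Exactly one file, matched on the "key" attribute.
function singleFile(fileKey: string): Statement {
  return { path: ['key'], operator: 'equals', value: fileKey };
}

// A handful of specific files: IN instead of EQUALS.
function someFiles(fileKeys: string[]): Statement {
  return { path: ['key'], operator: 'in', value: fileKeys };
}

const oneFile = singleFile('other information.pdf');
const twoFiles = someFiles(['other information.pdf', 'annual report.pdf']);
```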

Asking only for files from a certain folder

To filter files by their folder path, you would use the EQUALS or CONTAINS operator, depending on whether you need an exact match or a match within a hierarchy.
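A sketch of both variants, assuming a hypothetical folderPath attribute that stores the folder as a slash-separated string. EQUALS matches only files directly in that folder; CONTAINS also matches subfolders, but as a substring match it would equally match a path such as /legal/contracts-archive.

```typescript
type Statement = { path: string[]; operator: string; value: unknown };

// Hypothetical helper: exact folder match, or substring match to include subfolders.
function inFolder(folderPath: string, includeSubfolders = false): Statement {
  return {
    path: ['folderPath'], // hypothetical attribute, e.g. "/legal/contracts/2024"
    operator: includeSubfolders ? 'contains' : 'equals',
    value: folderPath,
  };
}

const exactFolder = inFolder('/legal/contracts');
const folderTree = inFolder('/legal/contracts', true);
```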

Asking only for files from a specific country or language

To filter files by country or language metadata, you would use the EQUALS operator for the country or language attribute.
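Country and language filters combine naturally with an AndStatement. A sketch assuming hypothetical country and language attributes holding codes such as CH or de: the country must match exactly, while the language may be any of several values (IN).

```typescript
type Statement = { path: string[]; operator: string; value: unknown };
type UniqueQL = Statement | { and: UniqueQL[] } | { or: UniqueQL[] };

// Hypothetical helper: one country, any of the given languages.
function countryAndLanguages(country: string, languages: string[]): UniqueQL {
  return {
    and: [
      { path: ['country'], operator: 'equals', value: country }, // hypothetical attribute
      { path: ['language'], operator: 'in', value: languages }, // hypothetical attribute
    ],
  };
}

const regionFilter = countryAndLanguages('CH', ['de', 'fr', 'it']);
```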

Asking for files valid in the future

To find files that will become valid in the future, possibly due to upcoming directives or laws, you would use the GREATER_THAN operator to compare against the current date.
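A sketch assuming a hypothetical validFrom attribute stored in the same timestamp format as the query value. The current date is a parameter with a default, so the filter is reproducible in tests.

```typescript
type Statement = { path: string[]; operator: string; value: unknown };

// Hypothetical helper: files whose validity starts after `now` (defaults to the current time).
function validInFuture(now: Date = new Date()): Statement {
  return { path: ['validFrom'], operator: 'greaterThan', value: now.toISOString() };
}

const futureFilter = validInFuture(new Date(Date.UTC(2024, 0, 1)));
```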

In each of these cases, the path refers to the attribute within the metadata you want to search against, and the value is the criterion you're matching. The operator defines how to compare the value to the path attribute.

Conclusion

UniqueQL provides a flexible way to perform complex metadata searches across different search modes and databases. Users construct detailed queries from a variety of operators, and these queries are translated into the appropriate query language for the target search engine, so the same filter can be applied whether the data is searched through PostgreSQL or Qdrant.

 


Author

@Andreas Hauri

 

 

© 2024 Unique AG. All rights reserved.