Full-text SearchAdded in v4.0​
The fts built-in module contains various tools that enable full-text
search functionality in Gel.
Since full-text search is a natural language search, it may not be ideal for your use case, particularly if you want to find partial matches. In that case, you may want to look instead at ext::pg_trgm (added in 4.0).
|
fts::Language (added in 4.0) |
Common languages enum |
|
fts::PGLanguage (added in 4.0) |
Postgres languages enum |
|
fts::Weight (added in 4.0) |
Weight category enum |
|
fts::document (added in 4.0) |
Opaque document type |
|
fts::search() (added in 4.0) |
Perform full-text search on a target object. |
|
fts::with_options() (added in 4.0) |
Assign language and weight category to a document portion. |
When considering FTS functionality our goal was to come up with an interface that could support different backend FTS providers. To achieve that we've identified the following components to the FTS functionality:
-
Valid FTS targets must be indexed.
-
The expected language should be specified at the time of creating an index.
-
It should be possible to mark document parts as having different relevance.
-
It should be possible to assign custom weights at runtime so as to make searching more flexible.
-
The search query should be close to what people are already used to.
To address (1) we introduce a special fts::index. The presence of this
index in a type declaration indicates that the type in question can be subject
to full-text search. This is an unusual index as it actually affects the
results of fts::search() (added in 4.0) function. This is unlike most indexes which
only affect the performance and not the actual results. Another special
feature of fts::index is that at most one such index can be declared per
type. If a type inherits this index from a parent and also declares its own,
only the new index applies and fully overrides the fts::index inherited
from the parent type. This means that when dealing with a hierarchy of
full-text-searchable types, each type can customize what gets searched as
needed.
The language (2) is defined as part of the fts::index on expression. A
special function fts::with_options() (added in 4.0) is used for that purpose:
type Item {
required available: bool {
default := false;
};
required name: str;
description: str;
index fts::index on (
fts::with_options(
.name,
language := fts::Language.eng
)
);
}The above declaration specifies that Item is full-text-searchable,
specifically by examining the name property (and ignoring description)
and assuming that the contents of that property are in English.
Marking different parts of the document as having different relevance (3) can
also be done by the fts::with_options() (added in 4.0) function:
type Item {
required available: bool {
default := false;
};
required name: str;
description: str;
index fts::index on ((
fts::with_options(
.name,
language := fts::Language.eng,
weight_category := fts::Weight.A,
),
fts::with_options(
.description,
language := fts::Language.eng,
weight_category := fts::Weight.B,
)
));
}The schema now indicates that both name and description properties of
Item are full-text-searchable. Additionally, the name and
description have potentially different relevance.
By default fts::search() (added in 4.0) assumes that the weight categories A,
B, C, and D have the following weights: [1, 0.5, 0.25, 0.125].
This makes each successive category relevance score halved.
Consider the following:
gel>
select Item{name, description};{
default::Item {name: 'Canned corn', description: {}},
default::Item {
name: 'Candy corn',
description: 'A great Halloween treat',
},
default::Item {
name: 'Sweet',
description: 'Treat made with corn sugar',
},
}
gel> .... .... .... ....
with res := (
select fts::search(Item, 'corn treat', language := 'eng')
)
select res.object {name, description, score := res.score}
order by res.score desc;{
default::Item {
name: 'Candy corn',
description: 'A great Halloween treat',
score: 0.4559453,
},
default::Item {
name: 'Canned corn',
description: {},
score: 0.30396354,
},
default::Item {
name: 'Sweet',
description: 'Treat made with corn sugar',
score: 0.30396354,
},
}As you can see, the highest scoring match came from an Item that had the
search terms appear in both name and description. It is also apparent
that matching a single term from the search query in the name property
scores the same as matching two terms in description as we would expect
based on their weight categories. We can, however, customize the weights (4)
to change this trend:
gel> .... .... .... .... .... .... .... ....
with res := (
select fts::search(
Item, 'corn treat',
language := 'eng',
weights := [0.2, 1],
)
)
select res.object {name, description, score := res.score}
order by res.score desc;{
default::Item {
name: 'Sweet',
description: 'Treat made with corn sugar',
score: 0.6079271,
},
default::Item {
name: 'Candy corn',
description: 'A great Halloween treat',
score: 0.36475626,
},
default::Item {
name: 'Canned corn',
description: {},
score: 0.06079271,
},
}We can even use custom weights to completely ignore one of the properties
(e.g. name) in our search, although we also need to add a filter based on
the score to make this work properly:
gel> .... .... .... .... .... .... .... .... ....
with res := (
select fts::search(
Item, 'corn treat',
language := 'eng',
weights := [0, 1],
)
)
select res.object {name, description, score := res.score}
filter res.score > 0
order by res.score desc;{
default::Item {
name: 'Sweet',
description: 'Treat made with corn sugar',
score: 0.6079271,
},
default::Item {
name: 'Candy corn',
description: 'A great Halloween treat',
score: 0.30396354,
},
}Finally, the search query supports features for fine-tuning (5). By default,
all search terms are desirable, but ultimately optional. You can enclose a
term or even a phrase in "..." to indicate that this specific term is of
increased importance and should appear in all matches:
gel> .... .... .... .... .... .... ....
with res := (
select fts::search(
Item, '"corn sugar"',
language := 'eng',
)
)
select res.object {name, description, score := res.score}
order by res.score desc;{
default::Item {
name: 'Sweet',
description: 'Treat made with corn sugar',
score: 0.4955161,
},
}Only one Item contains the phrase "corn sugar" and incomplete matches are
omitted.
The search query can also use AND (using upper-case to indicate that it is
a query modifier and not part of the query) to indicate whether terms are
required or optional:
gel> .... .... .... .... .... .... ....
with res := (
select fts::search(
Item, 'sweet AND treat',
language := 'eng',
)
)
select res.object {name, description, score := res.score}
order by res.score desc;{
default::Item {
name: 'Sweet',
description: 'Treat made with corn sugar',
score: 0.70076555,
},
}Adding a ! in front of a search term marks it as something that
the matching object must not contain:
gel> .... .... .... .... .... .... ....
with res := (
select fts::search(
Item, '!treat',
language := 'eng',
)
)
select res.object {name, description, score := res.score}
order by res.score desc;{
default::Item {
name: 'Canned corn',
description: {},
score: 0,
},
}An enum for representing commonly supported languages.
When indexing an object for full-text search it is important to specify
the expected language by fts::with_options() (added in 4.0) function. This
particular enum represents languages that are common across
several possible [future] backend implementations and thus are "safe" even
if the backend implementation switches from one of the options to another.
This generic enum is the recommended way of specifying the language.
The following ISO 639-3 language identifiers are available:
ara, hye, eus, cat, dan, nld, eng, fin,
fra, deu, ell, hin, hun, ind, gle, ita,
nor, por, ron, rus, spa, swe, tur.
An enum for representing languages supported by PostgreSQL.
When indexing an object for full-text search it is important to specify
the expected language by fts::with_options() (added in 4.0) function. This
particular enum represents languages that are available in
PostgreSQL implementation of full-text search.
The following ISO 639-3 language identifiers are available:
ara, hye, eus, cat, dan, nld, eng, fin,
fra, deu, ell, hin, hun, ind, gle, ita,
lit, npi, nor, por, ron, rus, srp, spa,
swe, tam, tur, yid.
Additionally, the xxx_simple identifier is also available to represent
the pg_catalog.simple language setting.
Unless you need some particular language setting that is not available in
the fts::Language (added in 4.0), it is recommended that you use the more
general lanuguage enum instead.
An enum for representing weight categories.
When indexing an object for full-text search different properties of this
object may have different significance. To account for that, they can be
assigned different weight categories by using
fts::with_options() (added in 4.0) function. There are four available weight
categories: A, B, C, or D.
An opaque transient type used in fts::index.
This type is technically what the fts::index expects as a valid on
expression. It cannot be directly instantiated and can only be produced as
the result of applying the special fts::with_options() (added in 4.0) function.
Thus this type only appears in full-text search index definitions and
cannot appear as either a property type or anywhere in regular queries.
Perform full-text search on a target object.
This function applies the search query to the specified object. If a
match is found, the result will consist of a tuple with the matched
object and the corresponding score. A higher score indicates a
better match. In case no match is found, the function will return an empty
set {}. Likewise, {} is returned if the object has no
fts::index defined for it.
The language parameter can be specified in order to match the expected
indexed language. In case of mismatch there is a big chance that the query
will not produce the expected results.
The optional weights parameter can be passed in order to provide
custom weights to the different weight groups. By default, the weights are
[1, 0.5, 0.25, 0.125] representing groups of diminishing significance.
Assign language and weight category to a document portion.
This is a special function that can only appear inside fts::index
expressions.
The text expression specifies the portion of the document that will be
indexed and available for full-text search.
The language parameter specifies the expected language of the text
expression. This affects how the index accounts for grammatical variants
of a given word (e.g. how plural and singular forms are determined, etc.).
The weight_category parameter assigns one of four available weight
categories to the text expression: A, B, C, or D. By
themselves, the categories simply group together portions of the document
so that these groups can be ascribed different significance by the
fts::search() (added in 4.0) function. By default it is assumed that each
successive category is half as significant as the previous, starting with
A as the most significant. However, these default weights can be
overridden when making a call to fts::search() (added in 4.0).