Search Architecture
This page documents internal Medplum implementation details, and refers to point-in-time code snapshots that may be different from the current application code, and are subject to change at any time without prior notice.
Resource Table Schema
In addition to id
and content
columns containing the resource's UUID and raw JSON, each FHIR resource type table in the Medplum DB (e.g. "Observation"
) contains columns corresponding to the search parameters for that resource type (e.g. "Observation".subject
). The data types of these columns are mapped from the search parameter type:
FHIR SearchParameter type | Postgres data type |
---|---|
number | double precision |
date | date /timestamp with time zone |
string | text * |
token | text */boolean |
reference | text |
composite | - (unsupported) |
quantity | double precision |
uri | text |
special | - (e.g. id uuid and compartments uuid[] ) |
Many of these columns can contain multiple values of the same type in an array, e.g. text[]
.
Parsing Search Queries
FHIR search queries use the application/x-fhir-query
MIME type, and generally follow the format ResourceType?param1=value1¶m2=value2
, e.g. Observation?status=final&subject:missing=true
. These query strings are parsed into our own internal representation using helper functions in @medplum/core
to facilitate common handling of search between the server and client-side applications. The resulting SearchRequest
object contains all information necessary to process the search operation:
interface SearchRequest {
readonly resourceType: string;
filters?: {
code: string;
operator: Operator;
value: string;
}[];
sortRules?: {
code: string;
descending?: boolean;
}[];
offset?: number;
count?: number;
fields?: string[];
name?: string;
total?: 'none' | 'estimate' | 'accurate';
include?: IncludeTarget[];
revInclude?: IncludeTarget[];
summary?: 'true' | 'text' | 'data';
}
Each search parameter from the query string corresponds to an element in SearchRequest.filters
, and the example query above (Observation?status=final&subject:missing=true
) would be parsed as:
{
resourceType: 'Observation',
filters: [
{
code: 'status',
operator: Operator.EQUALS,
value: 'final',
},
{
code: 'subject',
operator: Operator.MISSING,
value: 'true',
}
]
}
The Operator
enum represents a union over FHIR search modifiers and prefixes; combined, these define all logical search relations between parameter and value. Currently, there is zero overlap between parameters that use prefixes and those that allow modifiers; however, if this ever changes, modifications to this abstraction will be required.
SQL Builder
Medplum's core search logic translates FHIR search requests into a composable set of SQL expressions, from which the DB query is built. The SqlBuilder
class provides basic functionality for constructing the SQL string, but most code uses it only indirectly through classes representing logical parts of the SQL expression, e.g. a Condition
. Queries constructed this way are modular and flexible, allowing the server to easily manipulate the query without resorting to tricky string manipulation.
The main search parameter processing logic takes places in buildSearchFilterExpression()
, which performs:
- Checking if the search parameter is a special one that needs specific handling (e.g.
_filter
) - Identifying whether a lookup table is in use for the search parameter and constructing the JOIN
- The
:identifier
modifier is handled as a special case, using the token lookup table
- The
- Constructing the correct SQL
WHERE
expression for the search parameter based on its type and modifiers
Lookup Tables
Some search parameters require specific matching logic, and a separate lookup table to map parameter values to resources with acceptable performance. These lookup tables are modeled in subclasses of LookupTable
, and define custom logic for adding JOIN statements and conditions against the lookup tables to the SQL query. Each lookup table contains a resourceId
column used for joining to the main resource tables, alongside other column used for matching. Some lookup tables are global across all resource types, where others are resource type-specific. For example, the schema of the HumanName
lookup table is given below:
Column | Type | Nullable |
---|---|---|
id | uuid | not null |
resourceId | uuid | not null |
index | integer | not null |
content | text | not null |
name | text | |
given | text | |
family | text | |
name_tsv | tsvector | |
given_tsv | tsvector | |
family_tsv | tsvector |
Examples
Find Patient by ID (special search parameter for all resource types)
Patient?_id=c5a1e9bc-b627-4520-8a1c-cb6907a6e6c4
-- Only retrieve necessary data from the associated DB table
SELECT "Patient"."id", "Patient"."content" FROM "Patient"
-- `deleted=false` filter added to all queries by default
WHERE ("Patient"."deleted"=false AND "Patient"."id"='c5a1e9bc-b627-4520-8a1c-cb6907a6e6c4')
Array column and boolean filters
Organization?_profile=http://hl7.org/fhir/us/core/StructureDefinition/us-core-organization
&active=true
SELECT "Organization"."id", "Organization"."content" FROM "Organization"
WHERE ("Organization"."deleted"=false AND (
-- && operator checks for overlap between arrays
("Organization"."_profile" IS NOT NULL AND "Organization"."_profile" && ARRAY[
'http://hl7.org/fhir/us/core/StructureDefinition/us-core-organization'
]::TEXT[])
-- Multiple search parameters are ANDed together in the SQL query
AND "Organization"."active"=true
))
Multiple values for single search parameter
Observation?_profile=
http://hl7.org/fhir/us/core/StructureDefinition/us-core-observation-lab,http://hl7.org/fhir/us/core/StructureDefinition/us-core-observation-social-history
SELECT "Observation"."id", "Observation"."content" FROM "Observation"
WHERE ("Observation"."deleted"=false AND (
-- && matches if arrays have any element in common
"Observation"."_profile" IS NOT NULL AND "Observation"."_profile" && ARRAY[
'http://hl7.org/fhir/us/core/StructureDefinition/us-core-observation-lab',
'http://hl7.org/fhir/us/core/StructureDefinition/us-core-observation-social-history'
]::TEXT[]
))
String matching in array column
Organization?name=GeneCo
SELECT "Organization"."id", "Organization"."content" FROM "Organization"
WHERE ("Organization"."deleted"=false AND
-- Subquery on array column is nearly (±5%) as performant as &&,
-- provided that subquery is cheap
EXISTS(SELECT 1 FROM unnest("name") AS "name"
-- Strings use case-insensitive prefix matching
WHERE LOWER("name") LIKE 'geneco%'
)
)
Name lookup table
Practitioner?name=Dub
SELECT "Practitioner"."id", "Practitioner"."content" FROM "Practitioner"
-- LEFT JOIN enables selecting matches or not
-- JOINed tables are assigned numbered aliases
LEFT JOIN "HumanName" AS "T1" ON (
"Practitioner"."id"="T1"."resourceId" AND
-- Matching criteria are inserted into the JOIN ON
-- tsv column is computed when resource is inserted into DB
"T1"."name_tsv" @@ to_tsquery('english','Dub:*')
)
WHERE ("Practitioner"."deleted"=false AND
-- Find rows that match the code
"T1"."resourceId" IS NOT NULL
-- GROUP BY is required with JOIN
) GROUP BY "Practitioner"."id"
Token lookup table with modifier
DiagnosticReport?code:not=http://loinc.org|69737-5
SELECT "DiagnosticReport"."id", "DiagnosticReport"."content" FROM "DiagnosticReport"
LEFT JOIN "DiagnosticReport_Token" AS "T1" ON (
"DiagnosticReport"."id"="T1"."resourceId" AND
-- Tokens are stored essentially as (searchParam, system, code) tuples
"T1"."code"='code' AND (
"T1"."system"='http://loinc.org' AND
"T1"."value"='69737-5'
)
)
WHERE ("DiagnosticReport"."deleted"=false AND
-- Negate the search and find rows that do NOT match
"T1"."resourceId" IS NULL
) GROUP BY "DiagnosticReport"."id"
Permissions and Access Control
Search results must obey the same permissions and access model as the rest of the server, so any resources the user does not have access to must not appear in the search results. Maintaining accurate paging through the results set requires that we do not filter anything out of the rows returned by the database: all security filters must apply directly to the DB SQL query. These filters come in two kinds: one restricting the user to their own Project, and others related to associated AccessPolicy restrictions.
The Project filter is added for all users other than Super Admins, and takes the form:
-- Restrict to the Project compartment by its ID
compartments && ARRAY['4410089e-6a88-4cc8-9cb6-3592ee18191c']
AccessPolicy restrictions are specified using application/x-fhir-query
expressions (e.g. Observation?subject=Patient/c9d98310-d47d-4265-a377-4ec9317ceee6
), which are parsed into SQL expressions the same way as the main search query being performed. Resources matching these criteria are allowed to be shown to the user with that AccessPolicy. These filter expressions are added to the SQL query, wrapped in a Disjunction
so any matching policy will grant access.
Search Indexing
Search parameter values are extracted from resources via FHIRPath expressions defined for each parameter. Most search parameter expressions are simple paths or variations thereof, but some parameters make use of more complex expressions to derive the search value: e.g. Patient?deceased
(Patient.deceased.exists() and Patient.deceased != false
). The results of evaluating the expression on the resource is indexed into the database transactionally with the resource content, including any associated lookup tables.
Appendix A: FHIR Search Concerns
FHIR aims to provide a very broad and powerful search API, which places a significant burden on implementers to handle many different cases and cross-cutting concerns. A high-level summary of the entire scope of these concerns is given below:
- Context: resource type, compartment, or system-level (with list of types or for all types)
- Parameter type: number, date, string, token, reference, composite, quantity, uri, special
- Operators
- Modifiers: e.g. exact, missing, contains, identifier
- Prefixes: eq, ne, gt, lt, ge, le, sa, eb, ap
- Chaining: JOIN and filter on linked table columns, e.g.
Observation?subject.name=Fred&_has:DiagnosticReport:result:status=partial
- Value collection: Single value, comma-separated (OR), repeated parameter (AND)
- FHIR search queries are logically in CNF
- Parameters are ANDed together, and multiple values in one parameter are ORed
- Composite parameters: used to incorporate DNF filters e.g. (paramA=x AND paramB=y) OR (paramA=u AND paramB=v)
- Sorting: by search parameter, in forward or reverse order
- Pagination:
_count
and_offset
params, plus adding [paging links][page-links] to the response - Included resources:
_include
and_revinclude
params, include linked resources in the search response - Subsetting:
_summary
and_elements
params, return only subsets of each resource for performance