Overview

Each dataset can declare a metadata_schema: a list of the fields available for filtering. Call list_datasets() to see which fields each dataset exposes:
datasets = await client.list_datasets()
for ds in datasets:
    if ds.metadata_schema:
        for field in ds.metadata_schema:
            print(f"  {field['name']} ({field['type']}): {field.get('description', '')}")

Filter Syntax

Filters map each dataset ID to a filter object. Datasets without an entry are queried unfiltered.
filters={
    "ds-aaa": {"year": {"$gte": 2023}},          # filter ds-aaa
    "ds-bbb": {"category": {"$eq": "research"}},  # filter ds-bbb
    # ds-ccc has no entry — queried unfiltered
}

Operators

Comparison

| Operator | Description | Example |
| --- | --- | --- |
| $eq | Equals | {"status": {"$eq": "published"}} |
| $ne | Not equals | {"status": {"$ne": "draft"}} |
| $gt | Greater than | {"score": {"$gt": 0.5}} |
| $gte | Greater than or equal | {"year": {"$gte": 2020}} |
| $lt | Less than | {"price": {"$lt": 100}} |
| $lte | Less than or equal | {"year": {"$lte": 2024}} |
| $in | In array | {"status": {"$in": ["published", "reviewed"]}} |

Logical

| Operator | Description | Example |
| --- | --- | --- |
| $and | All conditions must match | {"$and": [{...}, {...}]} |
| $or | Any condition must match | {"$or": [{...}, {...}]} |
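
To make the operator semantics concrete, here is a minimal local evaluator that checks a filter object against a single metadata dictionary. It is a sketch for reasoning about filters, not the service's implementation. The function name matches is hypothetical, and the handling of missing fields (they never match) is an assumption this page does not confirm.

```python
from __future__ import annotations

from typing import Any

# Comparison operators from the table above, as plain Python predicates.
_COMPARISONS = {
    "$eq": lambda actual, expected: actual == expected,
    "$ne": lambda actual, expected: actual != expected,
    "$gt": lambda actual, expected: actual > expected,
    "$gte": lambda actual, expected: actual >= expected,
    "$lt": lambda actual, expected: actual < expected,
    "$lte": lambda actual, expected: actual <= expected,
    "$in": lambda actual, expected: actual in expected,
}


def matches(filter_obj: dict[str, Any], metadata: dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies `filter_obj` (illustrative only)."""
    for key, value in filter_obj.items():
        if key == "$and":
            # Every sub-filter must match.
            if not all(matches(sub, metadata) for sub in value):
                return False
        elif key == "$or":
            # At least one sub-filter must match.
            if not any(matches(sub, metadata) for sub in value):
                return False
        else:
            # Field condition of the form {"field": {"$op": operand}}.
            op, operand = next(iter(value.items()))
            if key not in metadata:
                return False  # assumption: a missing field never matches
            if not _COMPARISONS[op](metadata[key], operand):
                return False
    # Top-level keys are implicitly ANDed, so reaching here means all passed.
    return True
```

For example, matches({"year": {"$gte": 2023}}, {"year": 2024}) evaluates the $gte row of the comparison table against one record.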

Rules

  • Each field has exactly one operator: {"field": {"$op": value}}
  • Top-level fields are implicitly ANDed
  • Max nesting depth: 3 levels
  • Max conditions: 20 per dataset
  • Field names must not start with _ or $
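
The rules above can be checked on the client before sending a query. The sketch below is one possible pre-flight validator. The name validate_filter is hypothetical, and two counting conventions are assumptions not stated on this page: a flat filter counts as depth 1 with each $and/$or adding a level, and each field/operator pair counts as one condition.

```python
from __future__ import annotations

from typing import Any

MAX_DEPTH = 3        # "Max nesting depth: 3 levels"
MAX_CONDITIONS = 20  # "Max conditions: 20 per dataset"
LOGICAL = {"$and", "$or"}
COMPARISON = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in"}


def validate_filter(filter_obj: dict[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    errors: list[str] = []
    count = 0  # number of field conditions seen so far

    def walk(node: dict[str, Any], depth: int) -> None:
        nonlocal count
        if depth > MAX_DEPTH:
            errors.append(f"nesting depth {depth} exceeds {MAX_DEPTH}")
            return
        for key, value in node.items():
            if key in LOGICAL:
                # Assumption: each $and/$or adds one nesting level.
                for sub in value:
                    walk(sub, depth + 1)
            elif key.startswith(("_", "$")):
                errors.append(f"invalid field name: {key!r}")
            elif not isinstance(value, dict) or len(value) != 1:
                errors.append(f"field {key!r} needs exactly one operator")
            else:
                op = next(iter(value))
                if op not in COMPARISON:
                    errors.append(f"unknown operator {op!r} on field {key!r}")
                count += 1

    walk(filter_obj, 1)
    if count > MAX_CONDITIONS:
        errors.append(f"{count} conditions exceed {MAX_CONDITIONS}")
    return errors
```

Running it on each per-dataset filter object (the values of the filters dictionary, not the dictionary itself) gives a quick local check; the service remains the authority on what it accepts.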

Examples

Simple filter

filters={"ds-1": {"year": {"$gte": 2023}}}

Multiple fields (implicit AND)

filters={
    "ds-1": {
        "year": {"$gte": 2023},
        "category": {"$eq": "research"},
    }
}

Explicit AND

filters={
    "ds-1": {
        "$and": [
            {"year": {"$gte": 2023}},
            {"category": {"$eq": "research"}},
        ]
    }
}

OR

filters={
    "ds-1": {
        "$or": [
            {"category": {"$eq": "research"}},
            {"category": {"$eq": "review"}},
        ]
    }
}

Cross-dataset (different filters per dataset)

response = await client.query(
    text="neural networks",
    dataset_ids=["papers", "patents", "news"],
    filters={
        "papers": {"year": {"$gte": 2022}, "peer_reviewed": {"$eq": True}},
        "patents": {"filed_after": {"$gte": "2023-01-01"}},
        # "news" has no entry — queried unfiltered
    },
)

Schema Field Types

| Type | Description | Filter operators |
| --- | --- | --- |
| string | Text value | $eq, $ne, $in |
| number | Numeric value | $eq, $ne, $gt, $gte, $lt, $lte, $in |
| boolean | True/false | $eq, $ne |
| string[] | Array of strings | $in |
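
The type table can be combined with a dataset's metadata_schema to catch unsupported operators before a query is sent. The sketch below assumes each schema entry is a dictionary with 'name' and 'type' keys, as in the list_datasets() example in the Overview. The function name check_operators and the treatment of fields missing from the schema as problems are assumptions for illustration.

```python
from __future__ import annotations

from typing import Any

# Operators allowed per schema field type, taken from the table above.
ALLOWED_OPERATORS = {
    "string": {"$eq", "$ne", "$in"},
    "number": {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in"},
    "boolean": {"$eq", "$ne"},
    "string[]": {"$in"},
}


def check_operators(
    filter_obj: dict[str, Any], schema: list[dict[str, Any]]
) -> list[str]:
    """Return operator/type mismatches between a filter and a metadata schema."""
    types = {field["name"]: field["type"] for field in schema}
    problems: list[str] = []

    def walk(node: dict[str, Any]) -> None:
        for key, value in node.items():
            if key in ("$and", "$or"):
                # Recurse into each sub-filter of a logical operator.
                for sub in value:
                    walk(sub)
                continue
            if key not in types:
                problems.append(f"{key!r} is not in the schema")
                continue
            for op in value:
                if op not in ALLOWED_OPERATORS.get(types[key], set()):
                    problems.append(
                        f"{op} is not supported for {types[key]} field {key!r}"
                    )

    walk(filter_obj)
    return problems
```

A typical use would be to pass ds.metadata_schema for the dataset together with that dataset's entry from the filters dictionary.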