Version: nightly 🚧

连接JOINs

kumosearch 支持根据一个或多个索引表之间的相关列来连接文档。

一对一关系

创建索引表时，可以通过 reference 属性创建一个字段，将文档与另一个索引表中的字段连接起来。

例如，我们可以使用 authors 的 id 字段将 books 索引表连接到 authors索引表中：

{
  "name":  "books",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "author_id", "type": "string", "reference": "authors.id"}
  ]
}

当我们搜索 books 索引表时，可以通过 include_fields 从 authors 索引表中获取作者字段：

curl "http://localhost:8868/multi_search" -X POST \
        -H "X-KUMOSEARCH-API-KEY: ${KUMOSEARCH_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "books",
              "include_fields": "$authors(first_name,last_name)",
              "q": "famous"
            }
          ]
        }'

通过请求 $authors(first_name,last_name)，可以获取 first_name 和 last_name，响应将包含 authors 对象以及相应的作者信息：

{
  "document": {
    "id": "0",
    "title": "Famous Five",
    "author_id": "0",
    "authors": {
      "first_name": "Enid",
      "last_name": "Blyton"
    }
  }
}

要包含索引表中的所有字段，可以使用星号 *：

{
  "collection": "books",
  "include_fields": "$authors(*)",
  "q": "famous"
}

我们还可以使用 filter_by 子句来连接索引表，如下所示：

{
  "filter_by": "$authors(id: *)" 
}

int32、int64 和 string 类型的字段可以用于一对一关联的情况，即一个文档与一份引用文档相关联。int32[]、int64[] 和 string[] 类型的字段可用于多重关联的情况，即一个文档与另一个索引表中的零个或多个文档相关联。

一对多关系

通用示例

举一个简单的例子，假设有一个 orders 索引表，我们想要跟踪每个客户的多个不同订单。

customers索引表的schema如下所示：

{
  "name":  "customers",
  "fields": [
    {"name": "forename", "type": "string"},
    {"name": "surname", "type": "string"},
    {"name": "email", "type": "string"}
  ]
}

让我们创建一个存储订单详细信息的 orders 索引表，并关联下单的 customers 索引表中的相应用户：

{
  "name":  "orders",
  "fields": [
    {"name": "total_price", "type": "float"},
    {"name": "initial_date", "type": "int64"},
    {"name": "accepted_date", "type": "int64", "optional": true},
    {"name": "completed_date", "type": "int64", "optional": true},
    {"name": "customer_id", "type": "string", "reference": "customers.id"}
  ] 
}

我们现在可以搜索 customers 索引表，并通过 filter_by子句筛选特定客户的订单：

{
    "q":"*",
    "collection":"customers",
    "filter_by":"$orders(customer_id:=customer_a)"
}

我们也可以按其他字段进行过滤，例如获取订单总价低于100的客户：

{
    "q": "*",
    "collection": "customers",
    "filter_by": "$orders(total_price:<100)"
}

默认情况下，上述查询将包含引用的 orders 索引表中的所有字段。要仅包含引用索引表中的 total_price 字段，可以执行以下操作：

{
  "include_fields": "$orders(total_price)" 
}

特定示例

对于更特定的示例，假设我们有一个 products 索引表，并且希望为客户提供个性化定价，即每种产品对于每个客户都有不同的价格。join 连接功能在这里也很方便。

products 索引表的schema如下所示：

{
  "name":  "products",
  "fields": [
    {"name": "product_id", "type": "string"},
    {"name": "product_name", "type": "string"},
    {"name": "product_description", "type": "string"}
  ]
}

让我们创建一个 customer_product_prices 索引表来存储每个客户的自定义价格，并引用products 索引表中的相应文档。

{
  "name":  "customer_product_prices",
  "fields": [
    {"name": "customer_id", "type": "string"},
    {"name": "custom_price", "type": "float"},
    {"name": "product_id", "type": "string", "reference": "products.product_id"}
  ] 
}

我们现在可以搜索 products 索引表，并通过 filter_by 字句筛选特定客户的价格：

{
    "q":"*",
    "collection":"products",
    "filter_by":"$customer_product_prices(customer_id:=customer_a)"
}

要获取价格低于100的产品也很容易实现：

{
    "q": "*",
    "collection": "products",
    "filter_by": "$customer_product_prices(customer_id:=customer_a && custom_price:<100)"
}

与通用示例类似，可以仅包含引用索引表中的 custom_price 字段：

{
  "include_fields": "$customer_product_prices(custom_price)" 
}

多对多关系

考虑一个包含文档的索引表，我们希望向用户提供访问权限，并且一个用户可以访问许多文档。

为此，我们可以创建三个索引表：documents、users 和 user_doc_access，其schema如下：

{
  "name":  "documents",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "title", "type": "string"}
    {"name": "content", "type": "string"}
  ]
}

{
  "name":  "users",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "username", "type": "string"}
  ]
}

{
  "name":  "user_doc_access",
  "fields": [
    {"name": "user_id", "type": "string", "reference": "users.id"},
    {"name": "document_id", "type": "string", "reference": "documents.id"},
  ]
}

要获取 user_a 可访问的所有文档，可以这样查询：

{
    "q": "*",
    "collection": "documents",
    "filter_by": "$user_doc_access(user_id:=user_a)"
}

要获取可以访问特定文档的用户 ID，可以这样查询：

{
    "q": "*",
    "collection": "documents",
    "query_by": "title",
    "filter_by": "$user_doc_access(id: *)",
    "include_fields": "$users(id) as user_identifier"
}

按连接索引表字段排序

我们可以通过这种方式对连接索引表中的字段进行排序：

{
  "sort_by": "$JoinedCollectionName(field_name:asc)"
}

合并/嵌套连接字段

默认情况下，当我们连接索引表时，索引表的字段将作为嵌套文档返回。

例如，当我们将上面的 books 索引表与 authors 索引表连接起来时，authors 索引表在响应文档中显示为一个对象：

{
  "document": {
    "id": "0",
    "title": "Famous Five",
    "author_id": "0",
    "authors": {
      "first_name": "Enid",
      "last_name": "Blyton"
    }
  }
}

我们可以将 authors 索引表的字段与 books 索引表的字段合并，通过使用 merge 合并策略：

{
  "collection": "books",
  "include_fields": "$authors(*, strategy: merge)",
  "q": "famous"
}

默认行为是嵌套 strategy: nest。

强制连接字段嵌套数组

在一对多连接查询中，您可能希望连接索引表的字段始终表示为数组对象，即使只有一个匹配。

例如，给定以下作者和书籍：

{"id": "0", ",first_name": "Enid", "last_name": "Blyton"}
{"id": "1", ",first_name": "JK", "last_name": "Rowling"}

{"title": "Famous Five", "author_id": "0"}
{"title": "Secret Seven", "author_id": "0"}
{"title": "Harry Potter", "author_id": "1"}

当我们查询 authors 索引表并加入 books 索引表时，如下所示：

{
  "collection": "authors",
  "q": "*",
  "filter_by": "$books(id:*)",
  "include_fields": "$books(*)"
}

最终可能会将书籍作为嵌套对象或嵌套对象数组，具体取决于每位作者有一本或多本匹配的书籍。

[
  {
    "document": {
      "id": "1",
      "first_name": "JK",
      "last_name": "Rowling",
      "books": {
        "author_id": "1",
        "id": "2",
        "title": "Harry Potter"
      }
    }
  },
  {
    "document": {
      "id": "0",
      "first_name": "Enid",
      "last_name": "Blyton",
      "books": [
        {
          "author_id": "0",
          "id": "0",
          "title": "Famous Five"
        },
        {
          "author_id": "0",
          "id": "1",
          "title": "Secret Seven"
        }
      ]
    }
  }
]

要始终将连接的书籍索引表的字段表示为对象数组，可以使用 nest_array 策略。

{
  "collection": "authors",
  "q": "*",
  "filter_by": "$books(id:*)",
  "include_fields": "$books(*, strategy: nest_array)"
}

这将始终返回 books 索引表字段的对象数组。

[
  {
    "document": {
      "id": "1",
      "first_name": "JK",
      "last_name": "Rowling",
      "books": [
        {
          "author_id": "1",
          "id": "2",
          "title": "Harry Potter"
        }
      ]
    }
  },
  {
    "document": {
      "id": "0",
      "first_name": "Enid",
      "last_name": "Blyton",
      "books": [
        {
          "author_id": "0",
          "id": "0",
          "title": "Famous Five"
        },
        {
          "author_id": "0",
          "id": "1",
          "title": "Secret Seven"
        }
      ]
    }
  }
]

对象类型中的引用

假设 orders 索引表中有一个名为 order 的 object 字段。我们可以在订单对象字段中设置一个引用，指向 products 索引表中的一个产品，如下所示：

{
  "name": "orders",
  "fields": [
    {"name": "order", "type": "object"},
    {"name": "order.product_id", "type": "string", "reference": "products.id"}
  ]
}

或者，如果我们有一个 order 对象的数组，每个 order 对象都包含一个引用，那么引用字段的类型也必须是数组。

{
  "name": "orders",
  "fields": [
    {"name": "orders", "type": "object[]"},
    {"name": "orders.product_id", "type": "string[]", "reference": "products.id"}
  ]
}

在连接中使用别名

可以在引用字段定义中使用索引表别名。在下面的示例中，products 可以是别名：

{"name": "product_id", "type": "string", "reference": "products.product_id"}

请注意，索引表的文档中存储了引用索引表文档的内部 ID，这些内部 ID 是连续的，并根据索引顺序分配给文档。

因此，当使用别名更新某个索引表时，请务必同时重新索引所有相关索引表，以确保在连接操作中涉及的所有索引表之间的内部 ID 保持一致。

左连接

默认情况下，kumosearch 执行内连接（即只返回两个表中满足连接条件的匹配文档）。要执行左连接，可以指定 id:*，这会匹配正在搜索的索引表中的所有文档。如果存在引用文档，结果将包括左表中的所有文档以及右表中的引用文档，否则，左表中的文档将按原样返回。

{
  "filter_by": "id:* || $join_collection_name( <join_condition> )" 
}

一对一关系​

一对多关系​

通用示例​

特定示例​

多对多关系​

按连接索引表字段排序​

合并/嵌套连接字段​

强制连接字段嵌套数组​

对象类型中的引用​

在连接中使用别名​

左连接​