
Qdrant

This page guides you through the process of setting up the Qdrant destination connector.

Features

| Feature | Supported? (Yes/No) | Notes |
| --- | --- | --- |
| Full Refresh Sync | Yes | |
| Incremental - Append Sync | Yes | |
| Incremental - Append + Deduped | Yes | |

Output Schema

All source streams are written to a single destination stream that holds the payload and, optionally, the vectors. This stream is stored in a Qdrant collection whose name you define. If the collection does not already exist in the Qdrant instance, a new collection with that name is created.

For each point in the collection, a UUID string is generated and used as the point ID. The embeddings, either generated by the configured embedding method or extracted from the source stream, are stored as the point vectors. The point payload primarily contains the record metadata. The embedded text is also stored in the payload, in a field whose name is defined in the config.
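As a rough illustration of the point layout described above, the following sketch assembles one point as a plain dictionary. The function name `build_point`, the default field name `text`, and the sample values are assumptions made for this example. It is not the connector's actual code.

```python
import uuid


def build_point(vector, metadata, text, text_field="text"):
    """Assemble one Qdrant-style point as a plain dict (illustrative only)."""
    payload = dict(metadata)      # record metadata makes up most of the payload
    payload[text_field] = text    # embedded text goes under the configured field name
    return {
        "id": str(uuid.uuid4()),  # a generated UUID string is used as the point ID
        "vector": vector,         # embedding computed or taken from the source record
        "payload": payload,
    }


point = build_point(
    vector=[0.1, 0.2, 0.3],
    metadata={"source_stream": "articles", "author": "jane"},
    text="Qdrant is a vector database.",
)
print(point["payload"])
```

Copying the metadata into a new dictionary keeps the source record untouched when the text field is added.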

Getting Started

You can connect to a Qdrant instance either in local mode or cloud mode.

  • For local mode, set up Qdrant using Docker by following the official guide in the Qdrant docs. After setup, you will need your host, port and, if applicable, your gRPC port.
  • For cloud mode, follow the official Qdrant Cloud guide to create an instance. After setup, you will need the instance URL and an API key to connect.

Note that this connector does not support local persistent mode. To test, use the Docker option.
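To make the difference between the two modes concrete, here is a minimal sketch that collects the connection details for each mode as keyword arguments. The helper `qdrant_client_kwargs` is hypothetical. The keys it returns (`host`, `port`, `grpc_port`, `url`, `api_key`, `prefer_grpc`) match parameters of `QdrantClient` in the `qdrant-client` Python package, and the ports 6333 and 6334 in the usage lines are Qdrant's default HTTP and gRPC ports. Your values may differ.

```python
def qdrant_client_kwargs(mode, *, host=None, port=None, grpc_port=None,
                         url=None, api_key=None, prefer_grpc=False):
    """Return keyword arguments for qdrant_client.QdrantClient (illustrative)."""
    if mode == "local":
        if not host or port is None:
            raise ValueError("local mode needs a host and a port")
        kwargs = {"host": host, "port": port, "prefer_grpc": prefer_grpc}
        if grpc_port is not None:
            kwargs["grpc_port"] = grpc_port  # only passed when one is configured
        return kwargs
    if mode == "cloud":
        if not url or not api_key:
            raise ValueError("cloud mode needs an instance URL and an API key")
        return {"url": url, "api_key": api_key, "prefer_grpc": prefer_grpc}
    raise ValueError(f"unknown mode: {mode!r}")


local = qdrant_client_kwargs("local", host="localhost", port=6333, grpc_port=6334)
print(local)

# With the qdrant-client package installed, the result can be passed on:
# from qdrant_client import QdrantClient
# client = QdrantClient(**local)
```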

Requirements

To use the Qdrant destination, you'll need:

  • An account with API access for OpenAI or Cohere, depending on which embedding method you want to use, or neither if you want to extract the vectors from the source stream
  • A Qdrant db instance (local mode or cloud mode)
  • Qdrant API Credentials (for cloud mode)
  • Host and Port (for local mode)
  • gRPC port (if applicable in local mode)

Configure Network Access

Make sure your Qdrant database can be accessed by Airbyte. If your database is within a VPC, you may need to allow access from the IP you're using to expose Airbyte.
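One quick way to check reachability from the machine running Airbyte is a plain TCP connection test. The sketch below uses only the Python standard library. The helper name `can_reach` is made up for this example, and `localhost` with port 6333 (Qdrant's default HTTP port) stands in for your own host and port. A successful TCP connection shows that the network path is open. It does not validate credentials.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


if __name__ == "__main__":
    print(can_reach("localhost", 6333))
```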

Set up the Qdrant Destination in Airbyte

You should now have all the requirements needed to configure Qdrant as a destination in the UI. You'll need the following information to configure the Qdrant destination:

  • (Required) Text fields to embed
  • (Optional) Text splitter: options for configuring the chunking process provided by the Langchain Python library.
  • (Required) Fields to store as metadata
  • (Required) Collection: the name of the collection in Qdrant in which to store your data.
  • (Required) The field in the payload that contains the embedded text
  • (Required) Prefer gRPC: whether to prefer gRPC over HTTP.
  • (Required) Distance Metric: the distance metric used to measure similarity between vectors. Select from the options offered in the connector settings.
  • (Required) Authentication method
    • For local mode
      • Host, for example localhost
      • Port, for example 8000
      • gRPC Port (Optional)
    • For cloud mode
      • Url: the URL of the cloud Qdrant instance.
      • API Key: the API key for the cloud Qdrant instance.
  • (Optional) Embedding
    • OpenAI API key, if using OpenAI for embedding
    • Cohere API key, if using Cohere for embedding
    • Embedding Field name and Embedding dimensions, if getting the embeddings from stream records
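The list above can be summarized as a small checklist. The sketch below gathers the settings into a dictionary and reports which required entries are missing. Every key name (`text_fields`, `auth_method`, `mode`, and so on) and every sample value is hypothetical, chosen for readability only. They are not the connector's actual specification keys, and the distance metric is left as a placeholder.

```python
# Hypothetical key names mirroring the required settings listed above.
REQUIRED_KEYS = [
    "text_fields", "metadata_fields", "collection", "text_field",
    "prefer_grpc", "distance_metric", "auth_method",
]
# Details needed per authentication mode (again, illustrative names).
AUTH_KEYS = {"local": ["host", "port"], "cloud": ["url", "api_key"]}


def missing_required(config):
    """Return the names of required settings absent from config."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    auth = config.get("auth_method") or {}
    mode = auth.get("mode")
    if mode in AUTH_KEYS:
        missing += [f"auth_method.{k}" for k in AUTH_KEYS[mode] if k not in auth]
    elif "auth_method" in config:
        missing.append("auth_method.mode")  # mode must be local or cloud
    return missing


example = {
    "text_fields": ["title", "body"],
    "metadata_fields": ["author"],
    "collection": "articles",
    "text_field": "text",
    "prefer_grpc": False,
    "distance_metric": "<metric chosen in the UI>",  # placeholder, not a real value
    "auth_method": {"mode": "local", "host": "localhost", "port": 6333},
}
print(missing_required(example))
```

Optional settings such as the text splitter and the embedding credentials are deliberately left out of the check.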

Changelog

| Version | Date | Pull Request | Subject |
| --- | --- | --- | --- |
| 0.0.11 | 2024-04-15 | #37333 | Updated CDK and pytest versions to fix security vulnerabilities |
| 0.0.10 | 2023-12-11 | #33303 | Fix bug with embedding special tokens |
| 0.0.9 | 2023-12-01 | #32697 | Allow omitting raw text |
| 0.0.8 | 2023-11-29 | #32608 | Support deleting records for CDC sources and fix spec schema |
| 0.0.7 | 2023-11-13 | #32357 | Improve spec schema |
| 0.0.6 | 2023-10-23 | #31563 | Add field mapping option |
| 0.0.5 | 2023-10-15 | #31329 | Add OpenAI-compatible embedder option |
| 0.0.4 | 2023-10-04 | #31075 | Fix OpenAI embedder batch size |
| 0.0.3 | 2023-09-29 | #30820 | Update CDK |
| 0.0.2 | 2023-09-25 | #30689 | Update CDK to support Azure OpenAI embeddings and text splitting options |
| 0.0.1 | 2023-09-22 | #30332 | 🎉 New Destination: Qdrant (Vector Database) |