What is Qdrant?

Qdrant is a vector similarity engine and vector database written in Rust 🦀. It deploys as an API service that searches for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more! Qdrant implements a custom modification of the HNSW algorithm for approximate nearest neighbor search, which gives it state-of-the-art speed and lets it apply search filters without compromising on results. The filterable vector payload supports various data types and query conditions, including string matching, numerical ranges, geo-locations, and more. Unlike post-filtering in Elasticsearch, filtering in Qdrant guarantees that all relevant vectors are retrieved.

Qdrant also develops the open-source framework Quaterion, built on top of PyTorch and PyTorch Lightning. Quaterion is a framework for fine-tuning similarity learning models. It addresses the “last mile” problem in training models for semantic search, recommendations, anomaly detection, extreme classification, matching engines, etc. It is designed to combine the performance of pre-trained models with specialization for a custom task while avoiding slow and costly training.

Additionally, it is:

How to apply

To apply, submit an application with a proposal on the GSoC website. Please do not ask for individual GitHub issues to be assigned to you. Instead, make sure that you correctly understand the project scope and requirements, craft a proposal detailing your approach to the problem, and file your application on the GSoC website between March 20 and April 4. You may want to read our guide to learn how to apply for a GSoC internship at Qdrant.

Project ideas

Project: Geo Filtering by Polygon or Multi-Polygon 🦀

Users of Qdrant would benefit from the ability to filter vector data by a more complex geometry, such as a polygon or multi-polygon, in addition to the current radius and bounding box options. A standard format for this type of query is the GeoJSON geometry format, as specified in RFC 7946.
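To make the idea concrete, here is a minimal sketch of the containment test such a filter needs, using the classic ray-casting method. The types `Position` and `Polygon` and the functions `polygon_contains` and `multipolygon_contains` are hypothetical names chosen for illustration; they are not Qdrant's API. The sketch follows two GeoJSON conventions from RFC 7946: positions are ordered longitude first, and a polygon is one exterior ring followed by optional interior rings (holes). It assumes planar coordinates, so it ignores the antimeridian and poles, and it leaves the result for points exactly on an edge unspecified.

```rust
/// A position in GeoJSON order: longitude first, then latitude.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Position {
    pub lon: f64,
    pub lat: f64,
}

/// A GeoJSON-style polygon: one exterior ring plus zero or more holes.
/// Hypothetical type for illustration only.
pub struct Polygon {
    pub exterior: Vec<Position>,
    pub interiors: Vec<Vec<Position>>,
}

/// Ray casting: cast a ray from `p` towards increasing longitude and
/// count the ring edges it crosses. An odd count means `p` is inside.
/// Works for rings with or without a repeated closing position.
fn ring_contains(ring: &[Position], p: Position) -> bool {
    let n = ring.len();
    if n < 3 {
        return false;
    }
    let mut inside = false;
    let mut j = n - 1;
    for i in 0..n {
        let (a, b) = (ring[i], ring[j]);
        // The edge can only be crossed if its endpoints lie on
        // opposite sides of the ray's latitude.
        if (a.lat > p.lat) != (b.lat > p.lat) {
            // Longitude at which the edge meets the ray's latitude.
            let lon_at_lat = a.lon + (p.lat - a.lat) / (b.lat - a.lat) * (b.lon - a.lon);
            if p.lon < lon_at_lat {
                inside = !inside;
            }
        }
        j = i;
    }
    inside
}

/// A point matches a polygon if it is inside the exterior ring
/// and outside every hole.
pub fn polygon_contains(polygon: &Polygon, p: Position) -> bool {
    ring_contains(&polygon.exterior, p)
        && !polygon.interiors.iter().any(|hole| ring_contains(hole, p))
}

/// A point matches a multi-polygon if it matches any member polygon.
pub fn multipolygon_contains(polygons: &[Polygon], p: Position) -> bool {
    polygons.iter().any(|polygon| polygon_contains(polygon, p))
}
```

A production implementation would likely also need an index-friendly pre-selection step and a decision on boundary and antimeridian handling; those design choices are part of the project and are not settled by this sketch.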

Motivation

Currently, it is possible to filter by a specific region of interest by choosing a radius or bounding box that covers the area and then post-filtering the retrieved points. However, this approach can be slow for large regions and requires managing location data in a separate database. Native support would make complex geo-filters faster and open the door to new use cases. It would be particularly valuable for users working with large datasets or complex geometries.
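The workaround described above can be sketched as a two-stage filter: derive a bounding box that covers the region, keep the points inside that box, then run an exact region test on the survivors. All names here (`Position`, `BoundingBox`, `two_stage_filter`) are hypothetical and only illustrate the client-side logic; in practice the first stage would be a bounding-box filter in the query and the second stage would run in the application. The exact test is passed in as a closure so the sketch does not assume any particular geometry routine.

```rust
/// A position with longitude and latitude in degrees.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Position {
    pub lon: f64,
    pub lat: f64,
}

/// An axis-aligned box in longitude/latitude space.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BoundingBox {
    pub min_lon: f64,
    pub max_lon: f64,
    pub min_lat: f64,
    pub max_lat: f64,
}

impl BoundingBox {
    /// Smallest box covering every vertex of `region`,
    /// or `None` if the region has no vertices.
    pub fn around(region: &[Position]) -> Option<BoundingBox> {
        let first = region.first()?;
        let mut bbox = BoundingBox {
            min_lon: first.lon,
            max_lon: first.lon,
            min_lat: first.lat,
            max_lat: first.lat,
        };
        for p in &region[1..] {
            bbox.min_lon = bbox.min_lon.min(p.lon);
            bbox.max_lon = bbox.max_lon.max(p.lon);
            bbox.min_lat = bbox.min_lat.min(p.lat);
            bbox.max_lat = bbox.max_lat.max(p.lat);
        }
        Some(bbox)
    }

    /// Inclusive containment check.
    pub fn contains(&self, p: Position) -> bool {
        p.lon >= self.min_lon
            && p.lon <= self.max_lon
            && p.lat >= self.min_lat
            && p.lat <= self.max_lat
    }
}

/// Stage 1: cheap bounding-box check. Stage 2: exact region test,
/// supplied by the caller, applied only to points that passed stage 1.
pub fn two_stage_filter<F>(points: &[Position], region: &[Position], exact: F) -> Vec<Position>
where
    F: Fn(Position) -> bool,
{
    let bbox = match BoundingBox::around(region) {
        Some(bbox) => bbox,
        None => return Vec::new(),
    };
    points
        .iter()
        .copied()
        .filter(|p| bbox.contains(*p))
        .filter(|p| exact(*p))
        .collect()
}
```

The cost of this pattern is that every point inside the box but outside the region is still retrieved and then discarded, which is the overhead the proposed feature aims to remove.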

Requirements

Implementing this feature requires a moderate-sized contribution and good knowledge of Rust, as it involves extending the current geo-filtering functionality to support polygon and multi-polygon geometries.

Size: 350h

Level: Hard

Mentor: Arnaud Gourlay