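Introduction to Milvus

Milvus is an open-source vector similarity search engine. It integrates widely used vector index libraries such as Faiss, NMSLIB, and Annoy, and provides a simple, intuitive set of APIs. Milvus is highly flexible, stable and reliable, and fast at search, and it has been adopted by hundreds of organizations and institutions worldwide. They combine Milvus with AI models in scenarios such as the following: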

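Image, audio, and video search
Text applications such as text search, recommendation, and interactive question answering systems
Biomedical applications such as new drug discovery and gene screening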

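Introduction to Google Colab

Google Colab is a cloud service from Google aimed mainly at machine learning development and research. It provides a free Jupyter cloud environment along with GPU resources, and it supports many commonly used machine learning libraries, with PyTorch, TensorFlow, Keras, and OpenCV built in. This article describes how to run Milvus in Google Colab and perform some basic operations through the Python SDK. Let's get familiar with Milvus together.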

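Running Milvus in Google Colab

The official Milvus documentation recommends starting the service with Docker. However, the Google Colab cloud environment does not currently support installing Docker, and some readers may not be familiar with Docker, so this article starts the service by compiling from source instead.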

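Environment preparation

We will start the service by compiling the Milvus source code. GCC, CMake, and Git, which the build requires, are already installed in Colab. In addition, CUDA and the NVIDIA driver needed for the GPU build are installed by default in the Colab GPU environment, which simplifies installing and starting Milvus.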

1. Download Milvus_tutorial.ipynb

wget https://raw.githubusercontent.com/milvus-io/bootcamp/0.10.0/getting_started/basics/milvus_tutorial/Milvus_tutorial.ipynb

2. Load the notebook in Google Colab

Installation and startup

Download and compile the source code

Note: to build the GPU version, change the notebook runtime to GPU (Edit -> Notebook settings, then select GPU), and in Step 3 Build Milvus source code run ./build.sh -t Release -g.

Start the Milvus service

Note: if you chose to build the GPU version earlier, you will see "GPU resources ENABLED!" in the startup output.

Basic operations

We have now started the Milvus service in Google Colab. Milvus provides APIs for Python, Java, Go, RESTful, and C++. Next, we use the Python API in Colab to perform basic Milvus operations:

Install pymilvus

! pip install pymilvus==0.2.14

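Connect to the server

from milvus import Milvus, IndexType, MetricType

# Connect to the Milvus server
# (_HOST and _PORT hold the server address and port)
milvus = Milvus(_HOST, _PORT)
# Return the status of the Milvus server
server_status = milvus.server_status(timeout=10)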
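
Because the server here is started from a source build inside the notebook, it may take a moment before it accepts connections. The following is a minimal sketch, not part of the original tutorial, of a helper that polls a TCP port using only the Python standard library. The function name `wait_for_port` is hypothetical, and the address 127.0.0.1 and port 19530 mentioned in the usage note are assumptions based on the usual Milvus defaults; substitute whatever `_HOST` and `_PORT` you actually connect with.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port('127.0.0.1', 19530)` before `Milvus(_HOST, _PORT)` would give the server up to 30 seconds to come up. A successful TCP connection only shows that something is listening on the port, so `server_status()` remains the real health check.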

Create a collection / partition / index

# Information needed to create a collection
param = {
    'collection_name': collection_name,
    'dimension': _DIM,
    'index_file_size': _INDEX_FILE_SIZE,
    'metric_type': MetricType.L2,
}
# Create a collection
milvus.create_collection(param, timeout=10)
# Create a partition for a collection
milvus.create_partition(collection_name=collection_name, partition_tag=partition_tag, timeout=10)
ivf_param = {'nlist': 16384}
# Create an index for a collection
milvus.create_index(collection_name=collection_name, index_type=IndexType.IVF_FLAT, params=ivf_param)
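
To give some intuition for the `nlist` parameter above and the `nprobe` parameter used at search time, here is a toy, pure-Python sketch of the general inverted-file (IVF) idea: vectors are clustered into `nlist` buckets, and a query scans only the `nprobe` buckets whose centroids are nearest. This is an illustration under simplifying assumptions, not the code Milvus runs. The names `l2_squared`, `assign`, `build_ivf`, and `search_ivf` are hypothetical, the data is random, and the tiny sizes are chosen only so the example runs quickly.

```python
import random


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Put each vector index into the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: l2_squared(vec, centroids[c]))
        buckets[nearest].append(idx)
    return buckets


def build_ivf(vectors, nlist, iterations=5, seed=0):
    """Cluster vectors into nlist buckets with a few rounds of k-means."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iterations):
        buckets = assign(vectors, centroids)
        for c, members in enumerate(buckets):
            if members:  # an empty bucket keeps its previous centroid
                columns = zip(*(vectors[i] for i in members))
                centroids[c] = [sum(col) / len(members) for col in columns]
    return centroids, assign(vectors, centroids)


def search_ivf(vectors, centroids, buckets, query, top_k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: l2_squared(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    candidates.sort(key=lambda i: l2_squared(query, vectors[i]))
    return candidates[:top_k]


rng = random.Random(1)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
centroids, buckets = build_ivf(data, nlist=8)

# nprobe=2 scans two buckets (fast, approximate);
# nprobe=8 scans every bucket (same result as brute force).
approx = search_ivf(data, centroids, buckets, data[0], top_k=5, nprobe=2)
exact = search_ivf(data, centroids, buckets, data[0], top_k=5, nprobe=8)
```

In this sketch, a larger `nlist` means smaller buckets, and a larger `nprobe` trades speed for recall. When `nprobe` equals `nlist`, every bucket is scanned and the result matches an exhaustive search.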

Insert data and flush it to disk

# Insert vectors into a collection
milvus.insert(collection_name=collection_name, records=vectors, ids=ids)
# Flush vector data in one or more collections to disk
milvus.flush(collection_name_array=[collection_name], timeout=None)
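
The insert call takes `vectors` and `ids`, which this section does not define. As a minimal sketch, assuming random test data is acceptable, they could be generated with the standard library as shown here. The helper name `make_vectors`, the dimension of 128, and the count of 10 are all assumptions made for illustration; the dimension must match the `dimension` value used when the collection was created.

```python
import random

_DIM = 128          # assumed dimension; must match the collection's 'dimension'
NUM_VECTORS = 10    # assumed number of test vectors


def make_vectors(count, dim, seed=None):
    """Return `count` random vectors of length `dim` as lists of floats."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(count)]


vectors = make_vectors(NUM_VECTORS, _DIM, seed=0)
ids = list(range(NUM_VECTORS))  # one integer id per vector
```

Supplying explicit integer ids like this makes later calls that refer to an id, such as looking up or deleting id 0, easy to follow.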

Load data and search

# Load a collection for caching
milvus.load_collection(collection_name=collection_name, timeout=None)
# Search vectors in a collection
search_param = {"nprobe": 16}
milvus.search(collection_name=collection_name, query_records=[vectors[0]], partition_tags=None, top_k=10, params=search_param)
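
Since the query here is `vectors[0]`, a vector that was itself inserted, an exact L2 search would return that same vector as the nearest hit at distance 0. The sketch below is a plain-Python brute-force search that can serve as a reference when sanity-checking results on small data. The function name `brute_force_search` and the random sample data are hypothetical. Note as well that Milvus may report L2 distances on a different scale (for instance, squared), so only the ranking should be compared, not the raw distance values.

```python
import heapq
import math
import random


def brute_force_search(vectors, ids, query, top_k):
    """Return the top_k (id, distance) pairs by exact L2 distance, nearest first."""
    scored = ((vec_id, math.dist(query, vec)) for vec_id, vec in zip(ids, vectors))
    return heapq.nsmallest(top_k, scored, key=lambda pair: pair[1])


rng = random.Random(0)
sample_vectors = [[rng.random() for _ in range(16)] for _ in range(100)]
sample_ids = list(range(100))

# Query with a vector that is already in the data set
hits = brute_force_search(sample_vectors, sample_ids, sample_vectors[0], top_k=10)
```

This exhaustive scan is linear in the number of vectors, so it is only practical for small test sets. An index such as IVF_FLAT exists precisely to avoid it on large collections.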

Get collection / index information

# Return information about a collection
milvus.get_collection_info(collection_name=collection_name, timeout=10)
# Show index information for a collection
milvus.get_index_info(collection_name=collection_name, timeout=10)

Get vectors by ID

# List the ids in a segment
# (you can get the list of segment names from get_collection_stats())
milvus.list_id_in_segment(collection_name=collection_name, segment_name='1600328539015368000', timeout=None)
# Return raw vectors by id
# (you can get the list of ids from list_id_in_segment())
milvus.get_entity_by_id(collection_name=collection_name, ids=[0], timeout=None)

Get / set configuration parameters

# Get Milvus configurations
milvus.get_config(parent_key='cache', child_key='cache_size')
# Set Milvus configurations
milvus.set_config(parent_key='cache', child_key='cache_size', value='5G')

Drop an index / vectors / partition / collection

# Remove an index
milvus.drop_index(collection_name=collection_name, timeout=None)
# Delete vectors in a collection by vector ID
# id_array (list[int]) -- list of vector ids
milvus.delete_entity_by_id(collection_name=collection_name, id_array=[0], timeout=None)
# Delete a partition in a collection
milvus.drop_partition(collection_name=collection_name, partition_tag=partition_tag, timeout=None)
# Delete a collection by name
milvus.drop_collection(collection_name=collection_name, timeout=10)

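Closing remarks

Thanks to the free cloud service provided by Google Colab, compiling Milvus from source is simpler and the basic Python operations are easy to try out. Milvus, an open-source incubation project of the LF AI & Data Foundation, also offers machine learning enthusiasts resources on vector similarity search engines. We believe that combining Milvus with Colab will speed up the adoption of AI in your machine learning projects.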

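This article covers only the basic operations of Milvus. For Milvus use cases, see Milvus-Bootcamp (https://github.com/milvus-io/bootcamp/tree/master/EN_solutions/graph_based_recommend), which includes several open-source projects such as reverse image search, a question answering bot, and chemical formula search.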