RAG with Milvus¶

Step	Tech	Execution
Embedding	OpenAI (text-embedding-3-small)	🌐 Remote
Vector store	Milvus	💻 Local
Gen AI	OpenAI (gpt-4o)	🌐 Remote

A recipe 🧑‍🍳 🐥 💚¶

This is a code recipe that uses Milvus, the world's most advanced open-source vector database, to perform RAG over documents parsed by Docling.

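In this notebook, we accomplish the following:

- Parse documents using Docling's document conversion capabilities
- Perform hierarchical chunking of the documents using Docling
- Generate text embeddings with OpenAI
- Perform RAG using Milvus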

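Note: For best results, run this notebook with GPU acceleration. There are two options for running it:

- Locally on a MacBook with an Apple Silicon chip. Converting all documents in the notebook takes about 2 minutes on a MacBook M2, because Docling uses the MPS accelerator.
- On Google Colab. Converting all documents in the notebook takes about 8 minutes on a Google Colab T4 GPU.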

Preparation¶

Dependencies and environment¶

To start, install the required dependencies by running the following command:

! pip install --upgrade pymilvus docling openai torch

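If you are using Google Colab, you may need to restart the runtime to enable the dependencies you just installed (click the "Runtime" menu at the top of the screen and select "Restart session" from the dropdown menu).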

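GPU checking¶

Part of what makes Docling so remarkable is that it can run on commodity hardware. This means the notebook can run on a local machine with GPU acceleration. If you are using a MacBook with an Apple Silicon chip, Docling integrates seamlessly with Metal Performance Shaders (MPS). MPS provides out-of-the-box GPU acceleration on macOS, integrates with PyTorch and TensorFlow, delivers energy-efficient performance on Apple Silicon, and is broadly compatible with all Metal-supported GPUs.

The code below checks whether a GPU is available, either via CUDA or MPS.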

import torch

# Check if GPU or MPS is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"CUDA GPU is enabled: {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print("MPS GPU is enabled.")
else:
    raise OSError(
        "No GPU or MPS device found. Please check your environment and ensure GPU or MPS support is configured."
    )

MPS GPU is enabled.

Setting up API keys¶

This example uses OpenAI as the LLM. You should have OPENAI_API_KEY set as an environment variable.

import os

os.environ["OPENAI_API_KEY"] = "sk-***********"
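
Writing the key directly into a notebook cell makes it easy to leak. As a minimal sketch of an alternative, assuming the key may already be exported in the shell, the snippet below reads it from the environment and prompts only when it is missing. The helper name `ensure_api_key` is hypothetical and not part of the OpenAI SDK.

```python
import os
from getpass import getpass


def ensure_api_key(name="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the value of environment variable `name`, prompting once if unset.

    Illustrative helper only; it is not an OpenAI SDK function.
    """
    key = os.environ.get(name)
    if not key:
        # Prompt without echoing the value, then cache it for this process.
        key = prompt_fn(f"Enter {name}: ")
        os.environ[name] = key
    return key
```

Calling `ensure_api_key()` before creating the client leaves an already exported key untouched.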

Prepare the LLM and embedding model¶

We initialize the OpenAI client to prepare the embedding model.

from openai import OpenAI

openai_client = OpenAI()

Define a function to generate text embeddings using the OpenAI client. We use the text-embedding-3-small model as an example.

def emb_text(text):
    return (
        openai_client.embeddings.create(input=text, model="text-embedding-3-small")
        .data[0]
        .embedding
    )

Generate a test embedding and print its dimension and first few elements.

test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])

1536
[0.009889289736747742, -0.005578675772994757, 0.00683477520942688, -0.03805781528353691, -0.01824733428657055, -0.04121600463986397, -0.007636285852640867, 0.03225184231996536, 0.018949154764413834, 9.352207416668534e-05]

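Process data using Docling¶

Docling can parse various document formats into a unified representation (Docling Document), which can then be exported to different output formats. For the full list of supported input and output formats, please refer to the official documentation.

In this tutorial, we use a Markdown file (source) as the input. We process the document with the HierarchicalChunker provided by Docling to generate structured, hierarchical chunks suitable for downstream RAG tasks.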

from docling_core.transforms.chunker import HierarchicalChunker

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
chunker = HierarchicalChunker()

# Convert the input file to Docling Document
source = "https://milvus.org.cn/docs/overview.md"
doc = converter.convert(source).document

# Perform hierarchical chunking
texts = [chunk.text for chunk in chunker.chunk(doc)]
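
Every chunk in `texts` is embedded later, so empty or repeated strings would cost a request without adding information. As a minimal sketch, assuming `texts` is a plain list of strings as produced above, a cleanup pass could look like this. `clean_chunks` is a hypothetical helper and not part of Docling.

```python
def clean_chunks(texts):
    """Strip whitespace, drop empty chunks, and remove exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for text in texts:
        stripped = text.strip()
        if stripped and stripped not in seen:
            seen.add(stripped)
            cleaned.append(stripped)
    return cleaned
```

It could be applied as `texts = clean_chunks(texts)` before the embedding step.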

Load data into Milvus¶

Create the collection¶

With the data in hand, we can create a MilvusClient instance and insert the data into a Milvus collection.

from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./milvus_demo.db")
collection_name = "my_rag_collection"

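As for the arguments of MilvusClient:

- Setting the uri to a local file, e.g. ./milvus.db, is the most convenient method, as it automatically uses Milvus Lite to store all data in this file.
- If you have a large amount of data, you can set up a more performant Milvus server on Docker or Kubernetes. In this setup, use the server URI, e.g. https://:19530, as your uri.
- If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the uri and token, which correspond to the Public Endpoint and API key in Zilliz Cloud.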
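
The three options differ only in the `uri` and the optional `token` passed to `MilvusClient`. As a minimal sketch, the connection settings can be read from the environment so the same notebook works against any of them. The variable names `MILVUS_URI` and `MILVUS_TOKEN` are assumptions made for this example, not names that pymilvus reads on its own.

```python
import os


def milvus_client_kwargs(env=os.environ):
    """Build keyword arguments for MilvusClient from environment variables.

    MILVUS_URI and MILVUS_TOKEN are hypothetical names chosen for this sketch.
    """
    # Fall back to the local Milvus Lite file used in this notebook.
    kwargs = {"uri": env.get("MILVUS_URI", "./milvus_demo.db")}
    token = env.get("MILVUS_TOKEN")
    if token:
        # A token is only needed for Zilliz Cloud or an authenticated server.
        kwargs["token"] = token
    return kwargs
```

The client would then be created with `MilvusClient(**milvus_client_kwargs())`.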

Check whether the collection already exists and drop it if it does.

if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

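Create a new collection with the specified parameters.

If we don't specify any field information, Milvus automatically creates a default id field as the primary key and a vector field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.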

milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Supported values are (`"Strong"`, `"Session"`, `"Bounded"`, `"Eventually"`). See https://milvus.org.cn/docs/consistency.md#Consistency-Level for more details.
)
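
With the IP metric, a larger score means a closer match. For unit-length vectors the inner product equals cosine similarity, and OpenAI documents its embeddings as normalized to length 1, so both metrics rank results the same way here. The pure-Python sketch below illustrates the identity with made-up 3-dimensional vectors, not real embeddings.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity, computed without assuming unit length."""
    return inner_product(a, b) / math.sqrt(inner_product(a, a) * inner_product(b, b))


# Made-up vectors, scaled to unit length for the comparison.
query = normalize([1.0, 2.0, 2.0])
doc = normalize([2.0, 1.0, 2.0])

ip_score = inner_product(query, doc)
cos_score = cosine(query, doc)
print(math.isclose(ip_score, cos_score))
```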

Insert data¶

from tqdm import tqdm

data = []

for i, chunk in enumerate(tqdm(texts, desc="Processing chunks")):
    embedding = emb_text(chunk)
    data.append({"id": i, "vector": embedding, "text": chunk})

milvus_client.insert(collection_name=collection_name, data=data)

Processing chunks: 100%|██████████| 38/38 [00:14<00:00,  2.59it/s]

{'insert_count': 38, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37], 'cost': 0}
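
The loop above sends one embedding request per chunk. The OpenAI embeddings endpoint also accepts a list of strings as `input`, so the requests can be batched. The following is a minimal sketch rather than the notebook's own method: `emb_texts` and the batch size of 64 are illustrative choices, and the client is passed in explicitly so the function can be tried with a stand-in object.

```python
def emb_texts(client, texts, model="text-embedding-3-small", batch_size=64):
    """Embed `texts` in batches and return one vector per input, in input order."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        response = client.embeddings.create(input=batch, model=model)
        # Sort by the `index` field so vectors line up with the inputs.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors
```

With this helper, the rows could be built as `[{"id": i, "vector": v, "text": t} for i, (t, v) in enumerate(zip(texts, emb_texts(openai_client, texts)))]`.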

Build RAG¶

Retrieve data for a query¶

Let's specify a question to ask about the website we just scraped.

question = (
    "What are the three deployment modes of Milvus, and what are their differences?"
)

Search for the question in the collection and retrieve the top 3 semantic matches.

search_res = milvus_client.search(
    collection_name=collection_name,
    data=[emb_text(question)],
    limit=3,
    search_params={"metric_type": "IP", "params": {}},
    output_fields=["text"],
)

Let's take a look at the search results for the query.

import json

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))

[
    [
        "Milvus offers three deployment modes, covering a wide range of data scales\u2014from local prototyping in Jupyter Notebooks to massive Kubernetes clusters managing tens of billions of vectors:",
        0.6503315567970276
    ],
    [
        "Milvus Lite is a Python library that can be easily integrated into your applications. As a lightweight version of Milvus, it\u2019s ideal for quick prototyping in Jupyter Notebooks or running on edge devices with limited resources. Learn more.\nMilvus Standalone is a single-machine server deployment, with all components bundled into a single Docker image for convenient deployment. Learn more.\nMilvus Distributed can be deployed on Kubernetes clusters, featuring a cloud-native architecture designed for billion-scale or even larger scenarios. This architecture ensures redundancy in critical components. Learn more.",
        0.6281915903091431
    ],
    [
        "What is Milvus?\nUnstructured Data, Embeddings, and Milvus\nWhat Makes Milvus so Fast\uff1f\nWhat Makes Milvus so Scalable\nTypes of Searches Supported by Milvus\nComprehensive Feature Set",
        0.6117826700210571
    ]
]
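
The scores printed above are inner products, so higher is better. One optional refinement is to discard weak matches before building the prompt. The sketch below uses the result shape shown above (hits carrying `entity` and `distance`). The 0.62 cutoff and the shortened sample hits are made up for illustration and would need tuning on real data.

```python
def filter_hits(hits, min_score):
    """Keep (text, score) pairs whose inner-product score is at least `min_score`."""
    return [
        (hit["entity"]["text"], hit["distance"])
        for hit in hits
        if hit["distance"] >= min_score
    ]


# Shortened stand-ins for the hits in search_res[0].
sample_hits = [
    {"entity": {"text": "Milvus offers three deployment modes"}, "distance": 0.650},
    {"entity": {"text": "Milvus Lite is a Python library"}, "distance": 0.628},
    {"entity": {"text": "What is Milvus?"}, "distance": 0.612},
]
kept = filter_hits(sample_hits, min_score=0.62)
print(len(kept))  # 2
```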

Use LLM to get a RAG response¶

Convert the retrieved documents into a string format.

context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)
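
Joining every retrieved passage works for three short chunks, but longer passages can exceed the model's context window. As a minimal sketch, the helper below adds passages in rank order until a character budget is reached. `build_context` and the default budget are hypothetical, and a character count is only a rough proxy for tokens.

```python
def build_context(passages, max_chars=4000, separator="\n"):
    """Concatenate passages in order, stopping before `max_chars` is exceeded."""
    selected = []
    used = 0
    for passage in passages:
        # The separator is only counted between passages, not before the first.
        extra = len(passage) + (len(separator) if selected else 0)
        if used + extra > max_chars:
            break
        selected.append(passage)
        used += extra
    return separator.join(selected)
```

It could replace the plain join with `context = build_context([text for text, _ in retrieved_lines_with_distances])`.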

Define the system and user prompts for the language model. The user prompt is assembled from the documents retrieved from Milvus.

SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

Use the OpenAI gpt-4o model to generate a response based on the prompts.

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)

The three deployment modes of Milvus are:

1. **Milvus Lite**: This is a Python library that integrates easily into your applications. It's a lightweight version ideal for quick prototyping in Jupyter Notebooks or for running on edge devices with limited resources.

2. **Milvus Standalone**: This mode is a single-machine server deployment where all components are bundled into a single Docker image, making it convenient to deploy.

3. **Milvus Distributed**: This mode is designed for deployment on Kubernetes clusters. It features a cloud-native architecture suited for managing scenarios at a billion-scale or larger, ensuring redundancy in critical components.
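
The retrieval and generation steps above can be wrapped in one function for reuse. This is a minimal sketch that assumes the same client objects, collection, and prompt wording as in this notebook. `rag_answer` is a hypothetical helper, and the clients are passed in as arguments so it does not depend on notebook globals.

```python
def rag_answer(question, milvus_client, openai_client, collection_name,
               embed_fn, system_prompt, top_k=3, model="gpt-4o"):
    """Retrieve the top_k chunks for `question` and ask the chat model to answer from them."""
    # Step 1: vector search with the same parameters as the search cell above.
    hits = milvus_client.search(
        collection_name=collection_name,
        data=[embed_fn(question)],
        limit=top_k,
        search_params={"metric_type": "IP", "params": {}},
        output_fields=["text"],
    )[0]
    # Step 2: assemble the context and the user prompt.
    context = "\n".join(hit["entity"]["text"] for hit in hits)
    user_prompt = (
        "Use the following pieces of information enclosed in <context> tags "
        "to provide an answer to the question enclosed in <question> tags.\n"
        f"<context>\n{context}\n</context>\n"
        f"<question>\n{question}\n</question>\n"
    )
    # Step 3: generate the answer.
    response = openai_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content
```

In this notebook it would be called as `rag_answer(question, milvus_client, openai_client, collection_name, emb_text, SYSTEM_PROMPT)`.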