RAG检索增强系统实战指南：从向量数据库到语义搜索

Walson 收录于 AI工程化

2026-01-10 约 4649 字预计阅读 10 分钟

前言

在AI智能化服务平台中，我们开发了一个智能工具Agent平台，需要支持基于海量数据的语义搜索。传统的关键词搜索已经无法满足需求，我们需要的是理解用户意图的语义搜索。

这就是RAG（Retrieval-Augmented Generation，检索增强生成）技术的应用场景区。本文将详细记录我们构建RAG系统的全过程。

RAG系统完整流程

混合检索策略

一、什么是RAG？

1.1 RAG架构概述

RAG结合了检索系统和生成模型：

        
用户Query -> 向量化 -> 向量检索 -> TopK相关文档 -> Prompt增强 -> LLM生成回答

核心优势：

✅ 减少LLM幻觉（Hallucination）
✅ 支持实时数据（无需微调模型）
✅ 可解释性（知道答案来自哪篇文档）
✅ 成本更低（不需要训练大模型）

1.2 典型应用场景

企业知识库问答：基于内部文档回答员工问题
智能客服：基于产品手册回答用户咨询
论文检索：基于语义理解找到相关论文
代码搜索：基于功能描述找到相关代码

二、系统架构设计

2.1 整体架构

        
        
        
    
┌─────────────────────────────────────────────────────────┐
│                     应用层                               │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐        │
│  │  知识库管理  │ │  语义搜索   │ │  AI问答     │        │
│  └──────┬──────┘ └──────┬──────┘ └──────┬──────┘        │
└─────────┼───────────────┼───────────────┼───────────────┘
          │               │               │
          └───────────────┴───────────────┘
                          │
          ┌───────────────▼───────────────┐
│         │      RAG Core Service         │
│  ┌──────┴───────────────────────┬──────┐ │
│  │      Embedding Service       │      │ │
│  │  ┌────────────────────────┐  │      │ │
│  │  │  Text Splitter         │  │      │ │
│  │  │  Embedding Model       │  │      │ │
│  │  └────────────────────────┘  │      │ │
│  └──────────────────────────────┘      │ │
│  ┌──────────────────────────────────┐  │ │
│  │      Vector Database             │  │ │
│  │  ┌──────────┐    ┌──────────┐    │  │ │
│  │  │ Collection│    │  Index   │    │  │ │
│  │  └──────────┘    └──────────┘    │  │ │
│  └──────────────────────────────────┘  │ │
│  ┌──────────────────────────────────┐  │ │
│  │      Retrieval & Ranking         │  │ │
│  │  ┌──────────┐    ┌──────────┐    │  │ │
│  │  │ Similarity│    │ Rerank   │    │  │ │
│  │  └──────────┘    └──────────┘    │  │ │
│  └──────────────────────────────────┘  │ │
└────────────────────────────────────────┘

2.2 核心模块

Embedding Service：文本向量化
Vector Database：向量存储与检索
Retrieval Engine：相似度计算与排序
Prompt Builder：构建增强Prompt

三、向量数据库选型

3.1 主流向量数据库对比

特性	Milvus	Pinecone	Weaviate	Qdrant
开源	✅	❌	✅	✅
云原生	✅	✅	✅	✅
分布式	✅	✅	✅	❌
性能	高	高	中	高
社区活跃度	高	中	中	中

我们的选择：Milvus

开源，无厂商锁定
支持百亿级向量
分布式架构，易于扩展
丰富的SDK（Java、Python、Go）

3.2 Milvus核心概念

        
Collection（集合）-> Partition（分区）-> Segment（段）
     ↓
Field（字段）: id（主键）, vector（向量）, metadata（元数据）
     ↓
Index（索引）: IVF_FLAT, HNSW, etc.

3.3 Collection设计

        
        
        
    
@Component
public class MilvusSchemaBuilder {
    
    public CollectionSchemaParam buildDocumentSchema() {
        return CollectionSchemaParam.newBuilder()
            // 主键字段 - 使用业务ID（fixId）
            .addFieldType(FieldType.newBuilder()
                .withName("id")
                .withDataType(DataType.VarChar)
                .withPrimaryKey(true)
                .withMaxLength(64)
                .withAutoID(false)
                .build())
            
            // 向量字段（1536维，对应Gemini embedding-001模型）
            .addFieldType(FieldType.newBuilder()
                .withName("embedding")
                .withDataType(DataType.FloatVector)
                .withDimension(1536)
                .build())
            
            // 设备类型（用于分区优化）
            .addFieldType(FieldType.newBuilder()
                .withName("device_type")
                .withDataType(DataType.VarChar)
                .withMaxLength(50)
                .build())
            
            // 设备名称（标题）
            .addFieldType(FieldType.newBuilder()
                .withName("title")
                .withDataType(DataType.VarChar)
                .withMaxLength(500)
                .build())
            
            // 故障描述
            .addFieldType(FieldType.newBuilder()
                .withName("troubleshooting")
                .withDataType(DataType.VarChar)
                .withMaxLength(65535)
                .build())
            
            // 来源URL
            .addFieldType(FieldType.newBuilder()
                .withName("url")
                .withDataType(DataType.VarChar)
                .withMaxLength(500)
                .build())
            
            // 最后更新时间
            .addFieldType(FieldType.newBuilder()
                .withName("last_update_time")
                .withDataType(DataType.Int64)
                .build())
            
            .build();
    }
}

四、Embedding模型选择

4.1 模型对比

模型	维度	语言支持	性能	适用场景
Gemini embedding-001	768/1536	多语言	高	成本敏感
OpenAI text-embedding-3-small	512/1536	多语言	高	通用场景
OpenAI text-embedding-3-large	256-3072	多语言	极高	高精度需求
BGE-M3	1024	多语言	高	长文本
BAAI/bge-large-zh	1024	中文	高	中文场景

我们的选择：Gemini embedding-001（1536维）

与Chat模型统一（Gemini系列）
免费额度充足，成本可控
1536维是业界主流，平衡精度和性能
与OpenAI ada-002维度一致，便于迁移

4.2 文本分块策略

为什么需要分块？

LLM有上下文长度限制（通常4K-128K tokens）
细粒度分块提高检索精度
避免无关信息干扰

分块策略对比：

        
        
        
    
public interface TextSplitter {
    List<String> split(String text);
}

/**
 * 固定长度分块
 */
@Component
public class FixedLengthSplitter implements TextSplitter {
    
    private final int chunkSize = 500;  // 每块500字符
    private final int overlap = 50;     // 重叠50字符
    
    @Override
    public List<String> split(String text) {
        List<String> chunks = new ArrayList<>();
        
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            start = end - overlap;  // 重叠部分
        }
        
        return chunks;
    }
}

/**
 * 递归字符分块（推荐）
 * 优先按段落、句子分割，保持语义完整
 */
@Component
public class RecursiveCharacterSplitter implements TextSplitter {
    
    private final List<String> separators = Arrays.asList(
        "\n\n",  // 段落
        "\n",    // 换行
        "。",    // 句号
        "！",    // 感叹号
        "？",    // 问号
        " "      // 空格
    );
    
    @Override
    public List<String> split(String text) {
        return splitRecursive(text, 0);
    }
    
    private List<String> splitRecursive(String text, int separatorIndex) {
        List<String> chunks = new ArrayList<>();
        
        if (separatorIndex >= separators.size()) {
            chunks.add(text);
            return chunks;
        }
        
        String separator = separators.get(separatorIndex);
        String[] parts = text.split(separator);
        
        for (String part : parts) {
            if (part.length() > 500) {
                // 仍太长，继续细分
                chunks.addAll(splitRecursive(part, separatorIndex + 1));
            } else {
                chunks.add(part);
            }
        }
        
        return chunks;
    }
}

我们的实践：

普通文档：递归字符分块，chunk_size=500，overlap=50
代码：按函数/类分块
表格：整行作为一个chunk

五、向量化与入库流程

5.1 整体流程

        
        
        
    
@Service
public class DocumentIngestionService {
    
    @Autowired
    private TextSplitter textSplitter;
    
    @Autowired
    private AIService aiService;
    
    @Autowired
    private MilvusClient milvusClient;
    
    /**
     * 文档入库流程
     */
    public void ingestDocument(Document document) {
        // 1. 文本预处理
        String cleanedText = preprocess(document.getContent());
        
        // 2. 分块
        List<String> chunks = textSplitter.split(cleanedText);
        
        // 3. 批量向量化（优化：并行处理）
        List<List<Float>> embeddings = chunks.parallelStream()
            .map(chunk -> aiService.embed(EmbeddingRequest.builder()
                .text(chunk)
            .model("gemini-embedding-001")
                .build()))
            .map(response -> response.getEmbedding())
            .collect(Collectors.toList());
        
        // 4. 构建插入数据
        List<InsertParam.Field> fields = Arrays.asList(
            new InsertParam.Field("id", 
                chunks.stream().map(c -> document.getId() + "_" + c.hashCode()).collect(Collectors.toList())),
            new InsertParam.Field("embedding", embeddings),
            new InsertParam.Field("device_type", 
                chunks.stream().map(c -> document.getDeviceType()).collect(Collectors.toList())),
            new InsertParam.Field("title", 
                chunks.stream().map(c -> document.getTitle()).collect(Collectors.toList())),
            new InsertParam.Field("troubleshooting", chunks),
            new InsertParam.Field("url", 
                chunks.stream().map(c -> document.getUrl()).collect(Collectors.toList())),
            new InsertParam.Field("last_update_time", 
                chunks.stream().map(c -> System.currentTimeMillis()).collect(Collectors.toList()))
        );
        
        // 5. 批量插入Milvus
        InsertParam insertParam = InsertParam.newBuilder()
            .withCollectionName("ifixit_vectors")
            .withFields(fields)
            .build();
        
        R<MutationResult> result = milvusClient.insert(insertParam);
        
        if (result.getStatus() != R.Status.Success.getCode()) {
            throw new RuntimeException("插入失败: " + result.getMessage());
        }
        
        log.info("文档入库成功，docId={}, chunks={}", document.getId(), chunks.size());
    }
    
    /**
     * 批量入库（大数据量优化）
     */
    public void batchIngest(List<Document> documents) {
        // 分批处理，每批1000条
        List<List<Document>> batches = Lists.partition(documents, 1000);
        
        for (List<Document> batch : batches) {
            batch.parallelStream().forEach(this::ingestDocument);
            
            // 防止内存溢出，主动GC
            System.gc();
        }
    }
}

5.2 大数据量处理优化

场景： 10万+条YouTube视频资源向量化

优化策略：

        
        
        
    
@Service
public class LargeScaleVectorizationService {
    
    @Autowired
    private ThreadPoolExecutor taskExecutor;
    
    /**
     * 流式处理，避免OOM
     */
    public void streamIngest(Stream<Document> documentStream) {
        documentStream
            .parallel()
            .map(this::processDocument)
            .filter(Objects::nonNull)
            .collect(Collectors.groupingByConcurrent(
                chunk -> chunk.hashCode() % 100,  // 分100组
                Collectors.toList()
            ))
            .values()
            .forEach(this::batchInsert);
    }
    
    /**
     * 异步批量向量化
     */
    @Async("taskExecutor")
    public CompletableFuture<Void> asyncVectorize(List<Document> documents) {
        return CompletableFuture.runAsync(() -> {
            documents.forEach(doc -> {
                try {
                    ingestDocument(doc);
                } catch (Exception e) {
                    log.error("文档向量化失败，docId={}", doc.getId(), e);
                    // 记录失败，后续补偿
                    saveFailedDoc(doc);
                }
            });
        }, taskExecutor);
    }
    
    /**
     * 定时补偿失败任务
     */
    @Scheduled(fixedDelay = 300000) // 每5分钟
    public void compensateFailedDocs() {
        List<Document> failedDocs = getFailedDocs();
        if (!failedDocs.isEmpty()) {
            log.info("补偿处理{}条失败文档", failedDocs.size());
            failedDocs.forEach(this::ingestDocument);
        }
    }
}

六、检索与排序算法

6.1 向量相似度计算

        
        
        
    
@Service
public class VectorSearchService {
    
    @Autowired
    private AIService aiService;
    
    @Autowired
    private MilvusClient milvusClient;
    
    /**
     * 语义搜索
     */
    public List<SearchResult> search(String query, int topK) {
        // 1. Query向量化
        EmbeddingResponse embedding = aiService.embed(EmbeddingRequest.builder()
            .text(query)
            .model("gemini-embedding-001")
            .build());
        
        // 2. 向量检索
        SearchParam searchParam = SearchParam.newBuilder()
            .withCollectionName("ifixit_vectors")
            .withVectors(Collections.singletonList(embedding.getEmbedding()))
            .withVectorFieldName("embedding")
            .withTopK(topK)
            .withMetricType(MetricType.COSINE)  // 余弦相似度
            .withParams("{\"nprobe\": 10}")      // IVF_FLAT搜索参数，探测10个簇
            .build();
        
        R<SearchResults> result = milvusClient.search(searchParam);
        
        if (result.getStatus() != R.Status.Success.getCode()) {
            throw new RuntimeException("搜索失败: " + result.getMessage());
        }
        
        // 3. 解析结果
        return parseSearchResults(result.getData());
    }
}

6.2 相似度算法选择

算法	公式	适用场景
余弦相似度	cos(θ) = A·B / (\|\|A\|\| * \|\|B\|\|)	方向更重要（推荐）
欧氏距离	\|\|A - B\|\|	绝对距离重要
点积	A·B	考虑向量长度

我们的选择：余弦相似度

对向量长度不敏感
适合语义相似度计算

6.3 混合检索策略

向量检索 + 关键词检索（Hybrid Search）

        
        
        
    
@Service
public class HybridSearchService {
    
    /**
     * 混合检索
     */
    public List<SearchResult> hybridSearch(String query, int topK) {
        // 1. 向量检索
        List<SearchResult> vectorResults = vectorSearchService.search(query, topK * 2);
        
        // 2. 关键词检索（BM25）
        List<SearchResult> keywordResults = keywordSearchService.search(query, topK * 2);
        
        // 3. 结果融合（RRF - Reciprocal Rank Fusion）
        return reciprocalRankFusion(vectorResults, keywordResults, topK);
    }
    
    /**
     * RRF算法融合结果
     */
    private List<SearchResult> reciprocalRankFusion(
            List<SearchResult> vectorResults, 
            List<SearchResult> keywordResults, 
            int topK) {
        
        Map<String, Double> scores = new HashMap<>();
        int k = 60; // RRF常数
        
        // 向量检索结果打分
        for (int i = 0; i < vectorResults.size(); i++) {
            String docId = vectorResults.get(i).getDocId();
            double score = 1.0 / (k + i + 1);
            scores.merge(docId, score, Double::sum);
        }
        
        // 关键词检索结果打分
        for (int i = 0; i < keywordResults.size(); i++) {
            String docId = keywordResults.get(i).getDocId();
            double score = 1.0 / (k + i + 1);
            scores.merge(docId, score, Double::sum);
        }
        
        // 排序取TopK
        return scores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(topK)
            .map(entry -> {
                SearchResult result = new SearchResult();
                result.setDocId(entry.getKey());
                result.setScore(entry.getValue());
                return result;
            })
            .collect(Collectors.toList());
    }
}

6.4 重排序（Reranking）

为什么需要重排序？

向量检索召回率高但精度有限
引入交叉编码器（Cross-Encoder）精排

        
        
        
    
@Service
public class RerankService {
    
    @Autowired
    private AIService aiService;
    
    /**
     * 重排序
     */
    public List<SearchResult> rerank(String query, List<SearchResult> candidates) {
        // 构建交叉编码器输入
        List<RerankRequest> requests = candidates.stream()
            .map(result -> RerankRequest.builder()
                .query(query)
                .document(result.getContent())
                .build())
            .collect(Collectors.toList());
        
        // 调用重排序模型（如BGE-Reranker）
        List<RerankResponse> responses = aiService.rerankBatch(requests);
        
        // 按重排序分数排序
        return IntStream.range(0, candidates.size())
            .mapToObj(i -> {
                SearchResult result = candidates.get(i);
                result.setRerankScore(responses.get(i).getScore());
                return result;
            })
            .sorted(Comparator.comparing(SearchResult::getRerankScore).reversed())
            .collect(Collectors.toList());
    }
}

七、Prompt构建与答案生成

7.1 Prompt模板设计

        
        
        
    
@Component
public class RAGPromptBuilder {
    
    private final String template = """
        你是一个专业的AI助手。请基于以下参考资料回答用户问题。
        
        参考资料：
        {% for doc in documents %}
        [{{ loop.index }}] {{ doc.content }}
        来源：{{ doc.source }}
        {% endfor %}
        
        用户问题：{{ query }}
        
        要求：
        1. 基于参考资料回答，不要编造信息
        2. 如果参考资料不足以回答问题，请明确说明
        3. 回答要简洁明了，突出重点
        4. 如果可能，请指出信息来源的编号
        
        请回答：
        """;
    
    public String buildPrompt(String query, List<Document> documents) {
        Map<String, Object> variables = new HashMap<>();
        variables.put("query", query);
        variables.put("documents", documents);
        
        return renderTemplate(template, variables);
    }
}

7.2 完整RAG流程

        
        
        
    
@Service
public class RAGService {
    
    @Autowired
    private HybridSearchService searchService;
    
    @Autowired
    private RerankService rerankService;
    
    @Autowired
    private RAGPromptBuilder promptBuilder;
    
    @Autowired
    private AIService aiService;
    
    /**
     * 完整的RAG流程
     */
    public RAGResponse answer(String query) {
        // 1. 混合检索（召回）
        List<SearchResult> candidates = searchService.hybridSearch(query, 20);
        
        // 2. 重排序（精排）
        List<SearchResult> rankedResults = rerankService.rerank(query, candidates);
        
        // 3. 取Top5作为上下文
        List<Document> contextDocs = rankedResults.stream()
            .limit(5)
            .map(this::getDocumentById)
            .collect(Collectors.toList());
        
        // 4. 构建Prompt
        String prompt = promptBuilder.buildPrompt(query, contextDocs);
        
        // 5. 调用LLM生成回答
        ChatResponse response = aiService.chat(ChatRequest.builder()
            .message(ChatMessage.user(prompt))
            .temperature(0.3)  // 降低随机性
            .maxTokens(2000)
            .build());
        
        // 6. 返回结果
        return RAGResponse.builder()
            .answer(response.getContent())
            .sources(contextDocs)
            .confidence(calculateConfidence(rankedResults))
            .build();
    }
}

八、性能优化与最佳实践

8.1 索引优化

        
        
        
    
/**
 * 创建IVF_FLAT索引（当前使用）
 * 适合有明显聚类特征的数据，内存友好
 */
public void createIVFIndex() {
    CreateIndexParam indexParam = CreateIndexParam.newBuilder()
        .withCollectionName("ifixit_vectors")
        .withFieldName("embedding")
        .withIndexType(IndexType.IVF_FLAT)      // 倒排文件索引
        .withMetricType(MetricType.COSINE)      // 余弦相似度
        .withExtraParam("{\"nlist\": 2816}")     // 聚类中心数，根据数据量调整
        .build();
    
    milvusClient.createIndex(indexParam);
}

/**
 * 创建HNSW索引（未来升级选项）
 * 适合高维向量，查询速度更快但内存占用更高
 */
public void createHNSWIndex() {
    CreateIndexParam indexParam = CreateIndexParam.newBuilder()
        .withCollectionName("ifixit_vectors")
        .withFieldName("embedding")
        .withIndexType(IndexType.HNSW)
        .withMetricType(MetricType.COSINE)
        .withExtraParam(Map.of(
            "M", 16,        // 每个节点的邻居数
            "efConstruction", 200  // 构建时的搜索深度
        ))
        .build();
    
    milvusClient.createIndex(indexParam);
}

8.2 分区策略

        
        
        
    
/**
 * 按文档类型分区（可选优化）
 * 实际项目中可以根据设备类型创建分区，如手机、电脑、家电等
 */
public void createPartitions() {
    String[] deviceTypes = {"smartphone", "laptop", "appliance", "wearable"};
    
    for (String type : deviceTypes) {
        CreatePartitionParam partitionParam = CreatePartitionParam.newBuilder()
            .withCollectionName("ifixit_vectors")
            .withPartitionName("type_" + type)
            .build();
        
        milvusClient.createPartition(partitionParam);
    }
}

/**
 * 搜索指定分区
 */
public List<SearchResult> searchInPartition(String query, String deviceType) {
    SearchParam searchParam = SearchParam.newBuilder()
        .withCollectionName("ifixit_vectors")
        .withPartitionNames(Collections.singletonList("type_" + deviceType))
        // ... 其他参数
        .build();
    
    return milvusClient.search(searchParam);
}

九、监控与评估

9.1 检索质量评估

        
        
        
    
@Component
public class RAGEvaluator {
    
    /**
     * 评估指标
     */
    public EvaluationMetrics evaluate(List<TestCase> testCases) {
        int hitCount = 0;
        double totalPrecision = 0;
        double totalRecall = 0;
        
        for (TestCase testCase : testCases) {
            // 执行检索
            List<SearchResult> results = ragService.search(testCase.getQuery(), 10);
            Set<String> retrievedIds = results.stream()
                .map(SearchResult::getDocId)
                .collect(Collectors.toSet());
            
            Set<String> relevantIds = testCase.getRelevantDocIds();
            
            // 计算Precision@K
            int relevantRetrieved = (int) retrievedIds.stream()
                .filter(relevantIds::contains)
                .count();
            double precision = (double) relevantRetrieved / retrievedIds.size();
            
            // 计算Recall
            double recall = (double) relevantRetrieved / relevantIds.size();
            
            // Hit@1
            if (!results.isEmpty() && relevantIds.contains(results.get(0).getDocId())) {
                hitCount++;
            }
            
            totalPrecision += precision;
            totalRecall += recall;
        }
        
        return EvaluationMetrics.builder()
            .averagePrecision(totalPrecision / testCases.size())
            .averageRecall(totalRecall / testCases.size())
            .hitAt1((double) hitCount / testCases.size())
            .build();
    }
}

9.2 业务指标监控

        
        
        
    
@Component
public class RAGMetrics {
    
    private final MeterRegistry meterRegistry;
    
    public void recordSearch(String query, int resultCount, long latency) {
        // 记录检索延迟
        meterRegistry.timer("rag.search.latency")
            .record(latency, TimeUnit.MILLISECONDS);
        
        // 记录结果数量
        meterRegistry.distributionSummary("rag.search.result.count")
            .record(resultCount);
        
        // 记录Query长度分布
        meterRegistry.distributionSummary("rag.query.length")
            .record(query.length());
    }
    
    public void recordAnswerQuality(String query, double score) {
        meterRegistry.distributionSummary("rag.answer.quality.score")
            .record(score);
    }
}

十、总结

构建RAG系统的关键要点：

Embedding模型选择：根据语言、场景选择合适的模型
文本分块策略：平衡粒度与语义完整性
向量数据库：Milvus适合大规模数据，支持分布式
检索策略：向量检索+关键词检索混合，效果更好
重排序：引入Cross-Encoder提升精度
Prompt工程：清晰指示模型基于上下文回答

通过RAG系统，我们实现了：

语义搜索准确率：从60%提升到85%
向量检索延迟：<100ms（百万级数据）
支持数据：10万+文档，100万+向量

希望这篇指南对你有帮助！

相关文章：

目录