LLamaSharp Quick Start Guide

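LLamaSharp is a high-performance C# library for working with large language models (LLMs). It calls into a native library written in C++ (the backend) and supports several backends, including CPU, CUDA, Metal, and OpenCL. This guide shows how to install LLamaSharp and walks through a complete chat example.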

Installing LLamaSharp

To start using LLamaSharp, you need to install the LLamaSharp package and a matching backend. The steps are as follows:

1. Install the LLamaSharp package:

Run the following command in the NuGet Package Manager Console:

PowerShell
PM> Install-Package LLamaSharp

2. Install a backend:

Choose and install one or more backend packages according to your device and needs:

- [`LLamaSharp.Backend.Cpu`](https://www.nuget.org/packages/LLamaSharp.Backend.Cpu): pure CPU backend for Windows, Linux, and macOS.
- [`LLamaSharp.Backend.Cuda11`](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda11): CUDA 11 backend for Windows and Linux.
- [`LLamaSharp.Backend.Cuda12`](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda12): CUDA 12 backend for Windows and Linux.
- [`LLamaSharp.Backend.OpenCL`](https://www.nuget.org/packages/LLamaSharp.Backend.OpenCL): OpenCL backend for Windows and Linux.

3. Optional packages:

- To integrate with Microsoft Semantic Kernel, install the LLamaSharp.semantic-kernel package.
- To enable RAG (Retrieval-Augmented Generation) support, install the LLamaSharp.kernel-memory package (this package supports only net6.0 or later).

Preparing the Model

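LLamaSharp uses model files in the GGUF format. You can obtain a GGUF file in either of two ways:

1. Search Hugging Face (https://huggingface.co/) for the model name plus 'gguf' to find a model file that has already been converted to GGUF.

2. Convert a model file in PyTorch or Hugging Face format to GGUF yourself. Follow the instructions in the llama.cpp README, which uses a Python script for the conversion.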
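Whichever route you take, a quick check that the file really is GGUF can catch a wrong file (for example, an HTML error page saved in place of the model) before you try to load it. The following is a minimal sketch, assuming only that GGUF files begin with the four ASCII bytes GGUF. The helper name LooksLikeGguf and the fallback path model.gguf are hypothetical and are not part of LLamaSharp.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not a LLamaSharp API): returns true when the file
// starts with the four ASCII bytes "GGUF".
static bool LooksLikeGguf(string path)
{
    using var stream = File.OpenRead(path);
    var magic = new byte[4];
    int read = stream.Read(magic, 0, magic.Length);
    return read == 4 && Encoding.ASCII.GetString(magic) == "GGUF";
}

// Hypothetical default path; pass your own model file as the first argument.
string modelPath = args.Length > 0 ? args[0] : "model.gguf";

if (!File.Exists(modelPath))
{
    Console.WriteLine($"File not found: {modelPath}");
}
else if (LooksLikeGguf(modelPath))
{
    Console.WriteLine("The file starts with the GGUF magic bytes.");
}
else
{
    Console.WriteLine("The file does not look like a GGUF model.");
}
```

The check only inspects the header, so it does not prove that the download is complete or that the model is supported by your backend.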

Prefer a quantized model over an fp16 model: quantization greatly reduces the memory required while having only a small effect on generation quality.
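To get a feel for the difference, the weight size can be approximated as parameter count × bits per weight ÷ 8. The sketch below is back-of-the-envelope arithmetic only. The parameter count (about 3 billion) and the figure of roughly 5.5 bits per weight for a 5-bit quantization are illustrative assumptions, and real files and runtime memory use will differ because of metadata, mixed-precision tensors, and the context cache.

```csharp
using System;

// Rough estimate of weight storage in GiB: parameters × bits per weight ÷ 8 bytes.
static double EstimateGiB(double parameterCount, double bitsPerWeight) =>
    parameterCount * bitsPerWeight / 8.0 / (1024.0 * 1024.0 * 1024.0);

const double parameterCount = 3e9;                  // assumed: about 3 billion weights

double fp16 = EstimateGiB(parameterCount, 16.0);    // fp16 stores 16 bits per weight
double quantized = EstimateGiB(parameterCount, 5.5); // assumed: about 5.5 bits per weight

Console.WriteLine($"fp16 estimate:            {fp16:F1} GiB");
Console.WriteLine($"5-bit quantized estimate: {quantized:F1} GiB");
Console.WriteLine($"Ratio:                    {fp16 / quantized:F1}x");
```

Under these assumptions the quantized weights take roughly a third of the space of the fp16 weights, which is often the difference between fitting in memory and not.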

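Example: Chatting with an LLM

The following complete example shows how to chat with an LLM using LLamaSharp. Change the model path to match the location of your own model file.

The model I downloaded for this example is Qwen2.5-3B-Instruct-Q5_K_M-GGUF.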

C#
using LLama.Common;
using LLama;

namespace AppLLamaSharp
{
    internal class Program
    {
        static async Task Main(string[] args)
        {
            string modelPath = "D:\\myproject\\11Test\\AppLLamaSharp\\qwen2.5-3b-instruct-q5_k_m.gguf";

            // Configure the model parameters
            var parameters = new ModelParams(modelPath)
            {
                ContextSize = 1024, // maximum context length for the chat, in tokens
                GpuLayerCount = 5 // number of layers to offload to the GPU; adjust to your GPU memory
            };

            // Load the model and create an executor
            using var model = LLamaWeights.LoadFromFile(parameters);
            using var context = model.CreateContext(parameters);
            var executor = new InteractiveExecutor(context);

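            // Seed the chat history
            var chatHistory = new ChatHistory();
            chatHistory.AddMessage(AuthorRole.System, "Transcript of a dialog in which the User interacts with an Assistant named RICK. RICK is helpful, kind, honest, good at writing, and always answers the User's requests immediately and precisely.");
            chatHistory.AddMessage(AuthorRole.User, "Hello, RICK.");
            chatHistory.AddMessage(AuthorRole.Assistant, "Hello. How may I help you today?");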

            // Create the chat session
            ChatSession session = new(executor, chatHistory);

            // Configure the inference parameters
            InferenceParams inferenceParams = new InferenceParams()
            {
                MaxTokens = 1024, // generate at most 1024 tokens per reply
                AntiPrompts = new List<string> { "User:" } // stop generating as soon as an anti-prompt appears
            };

            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.Write("The chat session has started.\nUser: ");
            Console.ForegroundColor = ConsoleColor.Green;

            string userInput = Console.ReadLine() ?? "";

            // Chat loop: type "exit" to quit
            while (userInput != "exit")
            {
                await foreach (var text in session.ChatAsync(new ChatHistory.Message(AuthorRole.User, userInput), inferenceParams))
                {
                    Console.ForegroundColor = ConsoleColor.White;
                    Console.Write(text);
                }

                Console.ForegroundColor = ConsoleColor.Green;
                userInput = Console.ReadLine() ?? "";
            }
        }
    }
}

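If your GPU is not supported, the program reports an error. In that case, install the CPU backend (LLamaSharp.Backend.Cpu) instead.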

After Running

Code Explanation

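Model path: replace modelPath with the path to your own model file.
Model parameters: ContextSize sets the context length of the chat, and GpuLayerCount sets the number of layers offloaded to the GPU.
Chat history: the ChatHistory class records the conversation with the assistant.
Inference parameters: MaxTokens limits the number of tokens generated, and AntiPrompts defines the strings that stop generation.
Chat loop: user input is read from the console, and the assistant's response is printed as it is generated.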
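The idea behind AntiPrompts can be illustrated without a model. The sketch below is not LLamaSharp's implementation. It is a hypothetical helper, CollectUntilAntiPrompt, that accumulates streamed text pieces and stops once the output ends with an anti-prompt or a piece limit is reached. The simulated stream and all names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not a LLamaSharp API): appends streamed pieces and
// returns as soon as the accumulated text ends with any anti-prompt,
// or once maxPieces pieces have been consumed.
static string CollectUntilAntiPrompt(
    IEnumerable<string> pieces, IReadOnlyList<string> antiPrompts, int maxPieces)
{
    var output = new StringBuilder();
    int count = 0;
    foreach (var piece in pieces)
    {
        output.Append(piece);
        count++;

        string soFar = output.ToString();
        foreach (var antiPrompt in antiPrompts)
        {
            if (soFar.EndsWith(antiPrompt, StringComparison.Ordinal))
                return soFar; // the anti-prompt marks the end of the reply
        }

        if (count >= maxPieces)
            break; // plays the role of a token limit such as MaxTokens
    }
    return output.ToString();
}

// Simulated model output; the piece after "User:" is never consumed.
var fakeStream = new[] { "Hi", " there", ".\n", "User", ":", " ignored" };
string reply = CollectUntilAntiPrompt(fakeStream, new[] { "User:" }, maxPieces: 1024);
Console.WriteLine(reply);
```

Because the anti-prompt can be split across several pieces, the check runs against the accumulated text rather than against each piece on its own.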

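Conclusion

With the steps above, you can get started with LLamaSharp quickly and interact with large language models. I hope this guide helps you begin developing with LLamaSharp. For more examples, see LLamaSharp.Examples.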
