Cache AI models in the browser

Thomas Steiner

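Most AI models have at least one thing in common: they are fairly large for a resource that is transferred over the internet. The smallest MediaPipe object detection model (SSD MobileNetV2 float16) weighs 5.6 MB, and the largest is about 25 MB.

The open source LLM gemma-2b-it-gpu-int4.bin weighs 1.35 GB, and this is considered very small for an LLM. Generative AI models can be enormous, which is why a lot of AI usage today happens in the cloud. Increasingly, though, apps run highly optimized models directly on-device. While demos of LLMs running in the browser exist, here are some production-grade examples of other models running in the browser: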

Adobe Photoshop on the web with the AI-powered object selection tool open, with three objects selected: two giraffes and a moon.

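Adobe Photoshop runs a variant of the Conv2D model on-device for its intelligent object selection tool.
Google Meet runs an optimized version of the MobileNetV3-small model for person segmentation in its background blur feature.
Tokopedia runs the MediaPipeFaceDetector-TFJS model for real-time face detection to prevent invalid signups to its service.
Google Colab lets users use models from their hard disk in Colab notebooks.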

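To make future launches of your application faster, you should explicitly cache the model data on-device, rather than relying on the implicit HTTP browser cache.

While this guide uses the gemma-2b-it-gpu-int4.bin model to create a chatbot, the approach can be generalized to other models and other on-device use cases. The most common way to connect an app to a model is to serve the model alongside the rest of the app's resources. It's crucial to optimize the delivery.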

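Configure the right cache headers

If you serve AI models from your server, it's important to configure the correct Cache-Control header. The following example shows a solid default setting, which you can build on for your app's needs.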

Cache-Control: public, max-age=31536000, immutable

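Each released version of an AI model is a static resource. Content that never changes should be given a long max-age, combined with cache busting in the request URL. If you do need to update the model, you must give it a new URL.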
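
One way to give each model release its own URL is to put a version identifier into the request URL. The following is a minimal sketch, not a prescribed scheme: the `versionedModelUrl` helper, the `v` query parameter, and the example.com URL are all assumptions made for illustration.

```js
// Hypothetical helper: builds a cache-busting URL for one model release.
// The `v` query parameter name is an assumption, not part of any standard.
const versionedModelUrl = (baseUrl, version) => {
  const url = new URL(baseUrl);
  url.searchParams.set('v', version);
  return url.href;
};

// Each release gets a distinct URL, so a long max-age stays safe.
const modelUrlV1 = versionedModelUrl('https://example.com/models/model.bin', '1');
const modelUrlV2 = versionedModelUrl('https://example.com/models/model.bin', '2');
console.log(modelUrlV1); // https://example.com/models/model.bin?v=1
console.log(modelUrlV2); // https://example.com/models/model.bin?v=2
```

Embedding a content hash in the file name is an equally valid choice; what matters is that a changed model never reuses an old URL.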

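When the user reloads the page, the client sends a revalidation request, even though the server knows that the content is stable. The immutable directive explicitly indicates that revalidation is unnecessary, because the content won't change. The immutable directive is not widely supported by browsers and intermediary cache or proxy servers, but by combining it with the universally understood max-age directive, you can ensure maximum compatibility. The public response directive indicates that the response can be stored in a shared cache.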

Chrome DevTools displays the production `Cache-Control` headers sent by Hugging Face when requesting an AI model. (Source)

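Cache AI models client-side

When you serve an AI model, it's important to explicitly cache the model in the browser. This ensures the model data is readily available after a user reloads the app.

There are a number of techniques you can use to achieve this. For the following code samples, assume each model file is stored in memory in a Blob object named blob.

To give a sense of performance, each code sample is annotated with the performance.mark() and performance.measure() methods. The resulting measurements are device-dependent and don't generalize.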

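In Chrome DevTools **Application** > **Storage**, review the usage diagram with segments for IndexedDB, Cache storage, and File System. Each segment is shown to consume 1354 megabytes of data, which totals 4063 megabytes.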

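You can use any one of the following APIs to cache AI models in the browser: the Cache API, the Origin Private File System API, and the IndexedDB API. The general recommendation is to use the Cache API, but this guide discusses the advantages and disadvantages of all options.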
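
Whichever API you pick, the surrounding flow tends to look the same: try to restore the model from storage, and only download and store it on a miss. The following is a minimal sketch of that flow. The `loadModel` function and its `restore`, `download`, and `store` callbacks are hypothetical names introduced for illustration; they stand in for the API-specific functions shown in the rest of this guide.

```js
// Generic "storage first, then network" flow. The three callbacks are
// hypothetical placeholders for whichever storage API you choose.
const loadModel = async ({ restore, download, store }) => {
  try {
    // Fast path: the model was cached on an earlier visit.
    return await restore();
  } catch (err) {
    // Cache miss (or read failure): fall back to the network.
    const blob = await download();
    try {
      // Cache for next time; a failed write shouldn't block this session.
      await store(blob);
    } catch (storeErr) {
      console.warn('Could not cache model:', storeErr.message);
    }
    return blob;
  }
};
```

With the Cache API functions from this guide, a call could look like `loadModel({ restore: restoreFileFromSWCache, download: () => fetch(url).then((r) => r.blob()), store: storeFileInSWCache })`, where `url` is your model URL.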

Cache API

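The Cache API provides persistent storage for Request and Response object pairs that are cached in long-lived memory. Although it's defined in the Service Workers specification, you can use this API from the main thread or a regular worker. To use it outside of a service worker context, call the Cache.put() method with a synthetic Response object, paired with a synthetic URL instead of a Request object.

This guide assumes an in-memory blob. Use a fake URL as the cache key and a synthetic Response based on the blob. If you download the model directly, use the Response you get from making a fetch() request instead.

For example, here's how to store and restore a model file with the Cache API.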

const storeFileInSWCache = async (blob) => {
  try {
    performance.mark('start-sw-cache-cache');
    const modelCache = await caches.open('models');
    await modelCache.put('model.bin', new Response(blob));
    performance.mark('end-sw-cache-cache');

    const mark = performance.measure(
      'sw-cache-cache',
      'start-sw-cache-cache',
      'end-sw-cache-cache'
    );
    console.log('Model file cached in sw-cache.', mark.name, mark.duration.toFixed(2));
  } catch (err) {
    console.error(err.name, err.message);
  }
};

const restoreFileFromSWCache = async () => {
  try {
    performance.mark('start-sw-cache-restore');
    const modelCache = await caches.open('models');
    const response = await modelCache.match('model.bin');
    if (!response) {
      throw new Error(`File model.bin not found in sw-cache.`);
    }
    const file = await response.blob();
    performance.mark('end-sw-cache-restore');
    const mark = performance.measure(
      'sw-cache-restore',
      'start-sw-cache-restore',
      'end-sw-cache-restore'
    );
    console.log(mark.name, mark.duration.toFixed(2));
    console.log('Cached model file found in sw-cache.');
    return file;
  } catch (err) {    
    throw err;
  }
};
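
If you download the model directly rather than starting from an in-memory blob, you can hand the network Response to the cache as-is. The following is a minimal sketch under stated assumptions: `fetchAndCacheModel` is a hypothetical helper, and its `cacheStorage` and `fetchFn` parameters exist only so the function can be exercised outside a browser. In a page they default to the global `caches` and `fetch`.

```js
// Sketch: download a model and cache the network Response directly,
// using the model URL itself as the cache key.
const fetchAndCacheModel = async (
  url,
  cacheStorage = globalThis.caches,
  fetchFn = globalThis.fetch
) => {
  const modelCache = await cacheStorage.open('models');
  const cached = await modelCache.match(url);
  if (cached) {
    // Cache hit: no network request needed.
    return cached.blob();
  }
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Failed to download ${url}: ${response.status}`);
  }
  // put() consumes the Response body, so read the blob back from the cache.
  await modelCache.put(url, response);
  const stored = await modelCache.match(url);
  return stored.blob();
};
```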

Origin Private File System API

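The Origin Private File System (OPFS) is a comparatively young standard for a storage endpoint. Unlike the regular file system, it's private to the origin of the page, and therefore invisible to the user. It provides access to a special file that is highly optimized for performance and offers write access to its content.

For example, here's how to store and restore a model file in the OPFS.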

const storeFileInOPFS = async (blob) => {
  try {
    performance.mark('start-opfs-cache');
    const root = await navigator.storage.getDirectory();
    const handle = await root.getFileHandle('model.bin', { create: true });
    const writable = await handle.createWritable();
    await blob.stream().pipeTo(writable);
    performance.mark('end-opfs-cache');
    const mark = performance.measure(
      'opfs-cache',
      'start-opfs-cache',
      'end-opfs-cache'
    );
    console.log('Model file cached in OPFS.', mark.name, mark.duration.toFixed(2));
  } catch (err) {
    console.error(err.name, err.message);
  }
};

const restoreFileFromOPFS = async () => {
  try {
    performance.mark('start-opfs-restore');
    const root = await navigator.storage.getDirectory();
    const handle = await root.getFileHandle('model.bin');
    const file = await handle.getFile();
    performance.mark('end-opfs-restore');
    const mark = performance.measure(
      'opfs-restore',
      'start-opfs-restore',
      'end-opfs-restore'
    );
    console.log('Cached model file found in OPFS.', mark.name, mark.duration.toFixed(2));
    return file;
  } catch (err) {    
    throw err;
  }
};

IndexedDB API

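IndexedDB is a well-established standard for storing arbitrary data persistently in the browser. It's infamous for its somewhat complex API, but with a wrapper library such as idb-keyval you can treat IndexedDB like a classic key-value store.

For example: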

import { get, set } from 'https://cdn.jsdelivr.net/npm/idb-keyval@latest/+esm';

const storeFileInIDB = async (blob) => {
  try {
    performance.mark('start-idb-cache');
    await set('model.bin', blob);
    performance.mark('end-idb-cache');
    const mark = performance.measure(
      'idb-cache',
      'start-idb-cache',
      'end-idb-cache'
    );
    console.log('Model file cached in IDB.', mark.name, mark.duration.toFixed(2));
  } catch (err) {
    console.error(err.name, err.message);
  }
};

const restoreFileFromIDB = async () => {
  try {
    performance.mark('start-idb-restore');
    const file = await get('model.bin');
    if (!file) {
      throw new Error('File model.bin not found in IDB.');
    }
    performance.mark('end-idb-restore');
    const mark = performance.measure(
      'idb-restore',
      'start-idb-restore',
      'end-idb-restore'
    );
    console.log('Cached model file found in IDB.', mark.name, mark.duration.toFixed(2));
    return file;
  } catch (err) {    
    throw err;
  }
};

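Mark storage as persisted

Call navigator.storage.persist() at the end of any of these caching methods to request permission to use persistent storage. This method returns a promise that resolves to true if permission is granted, and false otherwise. The browser may or may not honor the request, depending on browser-specific rules.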

if ('storage' in navigator && 'persist' in navigator.storage) {
  try {
    const persistent = await navigator.storage.persist();
    if (persistent) {
      console.log("Storage will not be cleared except by explicit user action.");
    } else {
      console.log("Storage may be cleared under storage pressure.");
    }
  } catch (err) {
    console.error(err.name, err.message);
  }
}

Tip: Use the Storage Buckets API in supporting browsers to mark models as persisted:

// Create a persisted storage bucket specifically for models.
const modelsBucket = await navigator.storageBuckets.open('models', {
  durability: 'strict',
  persisted: true,
});
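
Because storage can be cleared under pressure, it can help to check whether the origin is likely to have room before writing a multi-gigabyte model. The following is a rough sketch using navigator.storage.estimate(), which reports approximate `usage` and `quota` values in bytes. The `hasRoomFor` helper is a hypothetical name, and its `storageManager` parameter exists only so the check can run outside a browser.

```js
// Sketch: check whether the origin likely has room for `bytes` more data.
// estimate() returns approximations, so treat the result as a hint only.
const hasRoomFor = async (
  bytes,
  storageManager = globalThis.navigator?.storage
) => {
  if (!storageManager || !('estimate' in storageManager)) {
    // Quota unknown: let the caller attempt the write and handle errors.
    return true;
  }
  const { usage = 0, quota = 0 } = await storageManager.estimate();
  return quota - usage >= bytes;
};
```

A caller could run `await hasRoomFor(blob.size)` before storing and skip caching, or warn the user, when it resolves to false.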

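Special case: Use a model on a hard disk

As an alternative to browser storage, you can reference AI models directly from a user's hard disk. This technique can help research-focused apps showcase the feasibility of running given models in the browser, or allow artists to use self-trained models in expert creativity apps.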

File System Access API

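With the File System Access API, you can open files from the hard disk and obtain a FileSystemFileHandle that you can persist to IndexedDB.

With this pattern, the user only needs to grant access to the model file once. Thanks to persisted permissions, the user can choose to permanently grant access to the file. After the app reloads and the user performs a required gesture, such as a mouse click, the FileSystemFileHandle can be restored from IndexedDB, with access to the file on the hard disk.

The file access permissions are queried and requested if necessary, which makes future reloads seamless. The following example shows how to get a handle for a file from the hard disk, and then store and restore the handle.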

import { fileOpen } from 'https://cdn.jsdelivr.net/npm/browser-fs-access@latest/dist/index.modern.js';
import { get, set } from 'https://cdn.jsdelivr.net/npm/idb-keyval@latest/+esm';

button.addEventListener('click', async () => {
  try {
    const file = await fileOpen({
      extensions: ['.bin'],
      mimeTypes: ['application/octet-stream'],
      description: 'AI model files',
    });
    if (file.handle) {
      // It's an asynchronous method, but no need to await it.
      storeFileHandleInIDB(file.handle);
    }
    return file;
  } catch (err) {
    if (err.name !== 'AbortError') {
      console.error(err.name, err.message);
    }
  }
});

const storeFileHandleInIDB = async (handle) => {
  try {
    performance.mark('start-file-handle-cache');
    await set('model.bin.handle', handle);
    performance.mark('end-file-handle-cache');
    const mark = performance.measure(
      'file-handle-cache',
      'start-file-handle-cache',
      'end-file-handle-cache'
    );
    console.log('Model file handle cached in IDB.', mark.name, mark.duration.toFixed(2));
  } catch (err) {
    console.error(err.name, err.message);
  }
};

const restoreFileFromFileHandle = async () => {
  try {
    performance.mark('start-file-handle-restore');
    const handle = await get('model.bin.handle');
    if (!handle) {
      throw new Error('File handle model.bin.handle not found in IDB.');
    }
    if ((await handle.queryPermission()) !== 'granted') {
      const decision = await handle.requestPermission();
      if (decision === 'denied' || decision === 'prompt') {
        throw new Error('Access to file model.bin.handle not granted.');
      }
    }
    const file = await handle.getFile();
    performance.mark('end-file-handle-restore');
    const mark = performance.measure(
      'file-handle-restore',
      'start-file-handle-restore',
      'end-file-handle-restore'
    );
    console.log('Cached model file handle found in IDB.', mark.name, mark.duration.toFixed(2));
    return file;
  } catch (err) {    
    throw err;
  }
};

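These methods aren't mutually exclusive. There may be cases where you both explicitly cache a model in the browser and use a model from a user's hard disk.

Demo

You can see all three regular storage methods and the hard disk method implemented in the MediaPipe LLM demo.

Bonus: Download a large file in chunks

If you need to download a large AI model from the internet, parallelize the download into separate chunks, then stitch them back together on the client.

Here's a helper function that you can use in your code. You only need to pass it the url. The chunkSize (default: 5 MB), the maxParallelRequests (default: 6), the progressCallback function (which reports the downloadedBytes and the total fileSize), and the signal (an AbortSignal) are all optional.

You can copy the following function into your project or install the fetch-in-chunks package from npm.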

async function fetchInChunks(
  url,
  chunkSize = 5 * 1024 * 1024,
  maxParallelRequests = 6,
  progressCallback = null,
  signal = null
) {
  // Helper function to get the size of the remote file using a HEAD request
  async function getFileSize(url, signal) {
    const response = await fetch(url, { method: 'HEAD', signal });
    if (!response.ok) {
      throw new Error('Failed to fetch the file size');
    }
    const contentLength = response.headers.get('content-length');
    if (!contentLength) {
      throw new Error('Content-Length header is missing');
    }
    return parseInt(contentLength, 10);
  }

  // Helper function to fetch a chunk of the file
  async function fetchChunk(url, start, end, signal) {
    const response = await fetch(url, {
      headers: { Range: `bytes=${start}-${end}` },
      signal,
    });
    if (!response.ok && response.status !== 206) {
      throw new Error('Failed to fetch chunk');
    }
    return await response.arrayBuffer();
  }

  // Helper function to download chunks with parallelism
  async function downloadChunks(
    url,
    fileSize,
    chunkSize,
    maxParallelRequests,
    progressCallback,
    signal
  ) {
    let chunks = [];
    let queue = [];
    let start = 0;
    let downloadedBytes = 0;

    // Function to process the queue
    async function processQueue() {
      while (start < fileSize) {
        if (queue.length < maxParallelRequests) {
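          // Capture this chunk's offset: `start` advances before the fetch resolves.
          const chunkStart = start;
          let end = Math.min(chunkStart + chunkSize - 1, fileSize - 1);
          let promise = fetchChunk(url, chunkStart, end, signal)
            .then((chunk) => {
              chunks.push({ start: chunkStart, chunk });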
              downloadedBytes += chunk.byteLength;

              // Update progress if callback is provided
              if (progressCallback) {
                progressCallback(downloadedBytes, fileSize);
              }

              // Remove this promise from the queue when it resolves
              queue = queue.filter((p) => p !== promise);
            })
            .catch((err) => {              
              throw err;              
            });
          queue.push(promise);
          start += chunkSize;
        }
        // Wait for at least one promise to resolve before continuing
        if (queue.length >= maxParallelRequests) {
          await Promise.race(queue);
        }
      }

      // Wait for all remaining promises to resolve
      await Promise.all(queue);
    }

    await processQueue();

    return chunks.sort((a, b) => a.start - b.start).map((chunk) => chunk.chunk);
  }

  // Get the file size
  const fileSize = await getFileSize(url, signal);

  // Download the file in chunks
  const chunks = await downloadChunks(
    url,
    fileSize,
    chunkSize,
    maxParallelRequests,
    progressCallback,
    signal
  );

  // Stitch the chunks together
  const blob = new Blob(chunks);

  return blob;
}

export default fetchInChunks;
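
To make the chunking concrete, the following sketch lists the Range header values that a download of a given size is split into. `computeByteRanges` is a hypothetical helper written for illustration. It mirrors the start and end arithmetic used in fetchInChunks, but it is not part of the fetch-in-chunks package.

```js
// Sketch: split a file of `fileSize` bytes into inclusive byte ranges of
// at most `chunkSize` bytes, as used in HTTP Range request headers.
const computeByteRanges = (fileSize, chunkSize) => {
  const ranges = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    // Range ends are inclusive, and the last chunk may be shorter.
    const end = Math.min(start + chunkSize - 1, fileSize - 1);
    ranges.push({ start, end, header: `bytes=${start}-${end}` });
  }
  return ranges;
};

// A 12-byte file in 5-byte chunks needs three requests.
const exampleRanges = computeByteRanges(12, 5).map((range) => range.header);
console.log(exampleRanges); // [ 'bytes=0-4', 'bytes=5-9', 'bytes=10-11' ]
```

With the defaults of 5 MB chunks and six parallel requests, a 1.35 GB model is split into a few hundred such ranges, of which at most six are in flight at any time.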

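Choose the right method for you

This guide has explored various methods for effectively caching AI models in the browser, a task that is crucial to the user experience and performance of your app. The Chrome storage team recommends the Cache API for optimal performance: it ensures quick access to AI models, which reduces load times and improves responsiveness.

The OPFS and IndexedDB are less usable options. Both the OPFS and the IndexedDB APIs need to serialize the data before it can be stored. IndexedDB also needs to deserialize the data when it's retrieved, which makes it the worst place to store large models.

For niche applications, the File System Access API offers direct access to files on a user's device, which is ideal for users who manage their own AI models.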

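If you need to secure your AI model, keep it on the server. Once the data is stored on the client, it's trivial to extract it from both the Cache and IndexedDB with DevTools or the OPFS DevTools extension. These storage APIs are inherently equal in security. You might be tempted to store an encrypted version of the model, but you then need to get the decryption key to the client, where it could be intercepted. This means a bad actor's attempt to steal your model is slightly harder, but not impossible.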

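We suggest that you choose a caching strategy that aligns with your app's requirements, your target audience's behavior, and the characteristics of the AI models you use. This ensures that your applications are responsive and robust under various network conditions and system constraints.

Acknowledgements

This was reviewed by Joshua Bell, Reilly Grant, Evan Stade, Nathan Memmott, Austin Sullivan, Etienne Noël, André Bandarra, Alexandra Klepper, François Beaufort, Paul Kinlan, and Rachel Andrew.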