Cache AI models in the browser

Thomas Steiner

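Most AI models have at least one thing in common: they are fairly large for a resource that is transferred over the internet. The smallest MediaPipe object detection model (SSD MobileNetV2 float16) weighs 5.6 MB and the largest is around 25 MB.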

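The open source LLM gemma-2b-it-gpu-int4.bin clocks in at 1.35 GB, and this is considered very small for an LLM. Generative AI models can be enormous, which is why a lot of AI usage today happens in the cloud. Increasingly, apps run highly optimized models directly on-device. While demos of LLMs running in the browser exist, here are some production-grade examples of other models running in the browser: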

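Adobe Photoshop on the web with the AI-powered object selection tool open, with three objects selected: two giraffes and a moon.

Adobe Photoshop runs a variant of the Conv2D model on-device for its intelligent object selection tool.
Google Meet runs an optimized version of the MobileNetV3-small model for person segmentation in its background blurring feature.
Tokopedia runs the MediaPipeFaceDetector-TFJS model for real-time face detection to prevent invalid signups to its service.
Google Colab allows users to use models from their hard disk in Colab notebooks.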

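To make future launches of your application faster, explicitly cache the model data on-device rather than relying on the implicit HTTP browser cache.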

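While this guide uses the gemma-2b-it-gpu-int4.bin model to create a chatbot, the approach can be generalized to suit other models and other on-device use cases. The most common way to connect an app to a model is to serve the model alongside the rest of the app's resources, so it is crucial to optimize the delivery.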

Configure the right cache headers

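If you serve AI models from your own server, it is important to configure the correct Cache-Control header. The following example shows a solid default setting, which you can build on for your app's needs.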

Cache-Control: public, max-age=31536000, immutable

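Each released version of an AI model is a static resource. Content that never changes should be given a long max-age combined with cache busting in the request URL. If you do need to update the model, you must give it a new URL.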

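When the user reloads the page, the client sends a revalidation request, even though the server knows that the content is stable. The immutable directive explicitly indicates that revalidation is unnecessary, because the content won't change. The immutable directive is not widely supported by browsers and intermediary cache or proxy servers, but combining it with the universally understood max-age directive ensures maximum compatibility. The public response directive indicates that the response can be stored in a shared cache.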
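As an illustration of pairing a long max-age with cache busting, the following minimal sketch picks a Cache-Control value based on the request path. The helper name, the file-naming scheme (a hexadecimal content hash before the .bin extension), and the no-cache fallback for everything else are assumptions made for this example, not something a particular server requires.

```javascript
// Hypothetical helper: choose a Cache-Control value for a request path.
// Assumes versioned model files carry a hex content hash in their name,
// for example /models/gemma-2b-it-gpu-int4.3f9a1c.bin (made-up hash).
const ONE_YEAR_IN_SECONDS = 31536000;
const VERSIONED_MODEL = /\.[0-9a-f]{6,}\.bin$/i;

function cacheControlFor(pathname) {
  if (VERSIONED_MODEL.test(pathname)) {
    // A versioned file never changes, so it can be cached for a year.
    return `public, max-age=${ONE_YEAR_IN_SECONDS}, immutable`;
  }
  // Assumption: anything without a hash is revalidated on each use.
  return 'no-cache';
}

console.log(cacheControlFor('/models/gemma-2b-it-gpu-int4.3f9a1c.bin'));
```

With a scheme like this, publishing an updated model means publishing a file with a new hash, and therefore a new URL.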

Chrome DevTools displays the production `Cache-Control` headers sent by Hugging Face when requesting an AI model. (Source)

Cache AI models client-side

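When you serve an AI model, it is important to explicitly cache the model in the browser. This ensures the model data is readily available after a user reloads the app.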

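There are a number of techniques you can use to achieve this. For the following code samples, assume each model file is stored in memory in a Blob object named blob.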

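To understand performance, each code sample is annotated with the performance.mark() and performance.measure() methods. These measurements are device-dependent and not generalizable.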
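The mark-and-measure pattern used throughout the samples can be factored into one small wrapper. This is only a sketch; measureAsync is a hypothetical name and is not part of the original samples.

```javascript
// Hypothetical helper: time any async step with performance marks.
async function measureAsync(label, task) {
  performance.mark(`start-${label}`);
  const result = await task();
  performance.mark(`end-${label}`);
  // measure() returns an entry whose duration is in milliseconds.
  const measure = performance.measure(label, `start-${label}`, `end-${label}`);
  console.log(measure.name, measure.duration.toFixed(2));
  return { result, duration: measure.duration };
}
```

A storing step could then be timed with a call such as measureAsync('sw-cache-cache', () => modelCache.put('model.bin', new Response(blob))).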

In Chrome DevTools under Application > **Storage**, review the usage diagram with segments for IndexedDB, Cache storage, and File System. Each segment is shown to consume 1354 MB of data, which totals 4063 MB.

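You can choose one of the following APIs to cache AI models in the browser: the Cache API, the Origin Private File System API, and the IndexedDB API. The general recommendation is to use the Cache API, but this guide discusses all of the options.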

Cache API

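The Cache API provides persistent storage for Request and Response object pairs that are cached in long-lived memory. Although it is defined in the Service Workers spec, you can use this API from the main thread or a regular worker. To use it outside of a service worker context, call the Cache.put() method with a synthetic Response object, paired with a synthetic URL instead of a Request object.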

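This guide assumes an in-memory blob. Use a fake URL as the cache key and a synthetic Response based on the blob. If you download the model directly, use the Response you get from making a fetch() request instead.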

For example, here's how to store and restore a model file with the Cache API.

const storeFileInSWCache = async (blob) => {
  try {
    performance.mark('start-sw-cache-cache');
    const modelCache = await caches.open('models');
    await modelCache.put('model.bin', new Response(blob));
    performance.mark('end-sw-cache-cache');

    const mark = performance.measure(
      'sw-cache-cache',
      'start-sw-cache-cache',
      'end-sw-cache-cache'
    );
    console.log('Model file cached in sw-cache.', mark.name, mark.duration.toFixed(2));
  } catch (err) {
    console.error(err.name, err.message);
  }
};

const restoreFileFromSWCache = async () => {
  try {
    performance.mark('start-sw-cache-restore');
    const modelCache = await caches.open('models');
    const response = await modelCache.match('model.bin');
    if (!response) {
      throw new Error(`File model.bin not found in sw-cache.`);
    }
    const file = await response.blob();
    performance.mark('end-sw-cache-restore');
    const mark = performance.measure(
      'sw-cache-restore',
      'start-sw-cache-restore',
      'end-sw-cache-restore'
    );
    console.log(mark.name, mark.duration.toFixed(2));
    console.log('Cached model file found in sw-cache.');
    return file;
  } catch (err) {    
    throw err;
  }
};
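When the model is downloaded directly rather than held in an in-memory blob, the Response from fetch() can be stored as is. The following sketch returns the model from the cache and only goes to the network on a cache miss. The function name getModelBlob and its options are hypothetical; cacheStorage and fetchImpl are parameters only so that the logic can be exercised outside a browser, and they default to the real globals.

```javascript
// Sketch: return the model as a Blob, downloading it only on a cache miss.
async function getModelBlob(
  url,
  { cacheName = 'models', cacheStorage = globalThis.caches, fetchImpl = globalThis.fetch } = {}
) {
  const modelCache = await cacheStorage.open(cacheName);
  let response = await modelCache.match(url);
  if (!response) {
    const networkResponse = await fetchImpl(url);
    if (!networkResponse.ok) {
      throw new Error(`Download failed with status ${networkResponse.status}`);
    }
    // Store the network response itself; no synthetic Response is needed.
    await modelCache.put(url, networkResponse);
    // Read the entry back, because put() consumes the response body.
    response = await modelCache.match(url);
  }
  return response.blob();
}
```

Here the real model URL doubles as the cache key, so a new model version published under a new URL naturally results in a cache miss.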

Origin Private File System API

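The Origin Private File System (OPFS) is a comparatively young standard for a storage endpoint. It is private to the origin of the page and is therefore invisible to the user, unlike the regular file system. It provides access to a special file that is highly optimized for performance and offers write access to its content.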

For example, here's how to store and restore a model file in the OPFS.

const storeFileInOPFS = async (blob) => {
  try {
    performance.mark('start-opfs-cache');
    const root = await navigator.storage.getDirectory();
    const handle = await root.getFileHandle('model.bin', { create: true });
    const writable = await handle.createWritable();
    await blob.stream().pipeTo(writable);
    performance.mark('end-opfs-cache');
    const mark = performance.measure(
      'opfs-cache',
      'start-opfs-cache',
      'end-opfs-cache'
    );
    console.log('Model file cached in OPFS.', mark.name, mark.duration.toFixed(2));
  } catch (err) {
    console.error(err.name, err.message);
  }
};

const restoreFileFromOPFS = async () => {
  try {
    performance.mark('start-opfs-restore');
    const root = await navigator.storage.getDirectory();
    const handle = await root.getFileHandle('model.bin');
    const file = await handle.getFile();
    performance.mark('end-opfs-restore');
    const mark = performance.measure(
      'opfs-restore',
      'start-opfs-restore',
      'end-opfs-restore'
    );
    console.log('Cached model file found in OPFS.', mark.name, mark.duration.toFixed(2));
    return file;
  } catch (err) {    
    throw err;
  }
};
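The restore function above rethrows when the file does not exist yet, which is the normal state on a first visit. A caller may prefer to treat that case as a cache miss. The following is a minimal sketch under that assumption; readModelFromOPFS is a hypothetical name, and the root parameter exists only so the function can be tested with a stand-in directory.

```javascript
// Sketch: like restoreFileFromOPFS, but a missing file yields null.
async function readModelFromOPFS(fileName = 'model.bin', root = null) {
  // Fall back to the real OPFS root when no directory is injected.
  const directory = root ?? (await navigator.storage.getDirectory());
  try {
    const handle = await directory.getFileHandle(fileName);
    return await handle.getFile();
  } catch (err) {
    if (err.name === 'NotFoundError') {
      return null; // Cache miss: the caller can download the model instead.
    }
    throw err; // Anything else is unexpected and is passed on.
  }
}
```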

IndexedDB API

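IndexedDB is a well-established standard for storing arbitrary data persistently in the browser. It is known for its somewhat complex API, but with a wrapper library such as idb-keyval you can treat IndexedDB like a classic key-value store.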

For example:

import { get, set } from 'https://cdn.jsdelivr.net/npm/idb-keyval@latest/+esm';

const storeFileInIDB = async (blob) => {
  try {
    performance.mark('start-idb-cache');
    await set('model.bin', blob);
    performance.mark('end-idb-cache');
    const mark = performance.measure(
      'idb-cache',
      'start-idb-cache',
      'end-idb-cache'
    );
    console.log('Model file cached in IDB.', mark.name, mark.duration.toFixed(2));
  } catch (err) {
    console.error(err.name, err.message);
  }
};

const restoreFileFromIDB = async () => {
  try {
    performance.mark('start-idb-restore');
    const file = await get('model.bin');
    if (!file) {
      throw new Error('File model.bin not found in IDB.');
    }
    performance.mark('end-idb-restore');
    const mark = performance.measure(
      'idb-restore',
      'start-idb-restore',
      'end-idb-restore'
    );
    console.log('Cached model file found in IDB.', mark.name, mark.duration.toFixed(2));
    return file;
  } catch (err) {    
    throw err;
  }
};

Mark storage as persisted

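Call navigator.storage.persist() at the end of any of these caching methods to request permission to use persistent storage. This method returns a promise that resolves to true if permission is granted, and false otherwise. The browser may or may not honor the request, depending on browser-specific rules.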

if ('storage' in navigator && 'persist' in navigator.storage) {
  try {
    const persistent = await navigator.storage.persist();
    if (persistent) {
      console.log("Storage will not be cleared except by explicit user action.");
      return;
    }
    console.log("Storage may be cleared under storage pressure.");  
  } catch (err) {
    console.error(err.name, err.message);
  }
}
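Because the snippet above uses return, it is meant to run inside a function. One way to package it is shown in this sketch, which also checks navigator.storage.persisted() first so that persistence is not requested again when it has already been granted. The name ensurePersistentStorage is hypothetical, and the storage parameter exists only for testing.

```javascript
// Sketch: request persistent storage once and report the outcome.
async function ensurePersistentStorage(storage = globalThis.navigator?.storage) {
  if (!storage || typeof storage.persist !== 'function') {
    return false; // The API is unavailable in this browser or context.
  }
  // persisted() reports whether the origin's storage is already persistent.
  if (typeof storage.persisted === 'function' && (await storage.persisted())) {
    return true;
  }
  // Resolves to true if the browser grants the request, false otherwise.
  return storage.persist();
}
```

A caching function could call this after a successful write and log a warning when it resolves to false.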

Special case: Use a model on a hard disk

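You can reference AI models directly from a user's hard disk as an alternative to browser storage. This technique can help research-focused apps showcase the feasibility of running given models in the browser, or allow artists to use self-trained models in expert creativity apps.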

File System Access API

With the File System Access API, you can open files from the hard disk and obtain a FileSystemFileHandle that you can persist to IndexedDB.

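With this pattern, the user only needs to grant access to the model file once. Thanks to persisted permissions, the user can choose to permanently grant access to the file. After reloading the app and a required user gesture, such as a mouse click, the FileSystemFileHandle can be restored from IndexedDB with access to the file on the hard disk.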

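The file access permissions are queried and requested if necessary, which makes this seamless for future reloads. The following example shows how to get a handle for a file from the hard disk, and then store and restore the handle.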

import { fileOpen } from 'https://cdn.jsdelivr.net/npm/browser-fs-access@latest/dist/index.modern.js';
import { get, set } from 'https://cdn.jsdelivr.net/npm/idb-keyval@latest/+esm';

button.addEventListener('click', async () => {
  try {
    const file = await fileOpen({
      extensions: ['.bin'],
      mimeTypes: ['application/octet-stream'],
      description: 'AI model files',
    });
    if (file.handle) {
      // It's an asynchronous method, but no need to await it.
      storeFileHandleInIDB(file.handle);
    }
    return file;
  } catch (err) {
    if (err.name !== 'AbortError') {
      console.error(err.name, err.message);
    }
  }
});

const storeFileHandleInIDB = async (handle) => {
  try {
    performance.mark('start-file-handle-cache');
    await set('model.bin.handle', handle);
    performance.mark('end-file-handle-cache');
    const mark = performance.measure(
      'file-handle-cache',
      'start-file-handle-cache',
      'end-file-handle-cache'
    );
    console.log('Model file handle cached in IDB.', mark.name, mark.duration.toFixed(2));
  } catch (err) {
    console.error(err.name, err.message);
  }
};

const restoreFileFromFileHandle = async () => {
  try {
    performance.mark('start-file-handle-restore');
    const handle = await get('model.bin.handle');
    if (!handle) {
      throw new Error('File handle model.bin.handle not found in IDB.');
    }
    if ((await handle.queryPermission()) !== 'granted') {
      const decision = await handle.requestPermission();
      if (decision === 'denied' || decision === 'prompt') {
        throw new Error('Access to file model.bin.handle not granted.');
      }
    }
    const file = await handle.getFile();
    performance.mark('end-file-handle-restore');
    const mark = performance.measure(
      'file-handle-restore',
      'start-file-handle-restore',
      'end-file-handle-restore'
    );
    console.log('Cached model file handle found in IDB.', mark.name, mark.duration.toFixed(2));
    return file;
  } catch (err) {    
    throw err;
  }
};

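These methods aren't mutually exclusive. There may be cases where you both explicitly cache a model in the browser and use a model from a user's hard disk.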
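One way to combine several methods is to try the restore functions in a preferred order and use the first one that yields a file. The following is a sketch of that idea; restoreFromFirstAvailable and the labels are hypothetical, and the order is an application decision.

```javascript
// Sketch: try each restore function in order and use the first that succeeds.
// `sources` is an array of [label, restoreFunction] pairs.
async function restoreFromFirstAvailable(sources) {
  const errors = [];
  for (const [label, restore] of sources) {
    try {
      const file = await restore();
      if (file) {
        return { label, file };
      }
      errors.push(`${label}: empty result`);
    } catch (err) {
      // Remember why this source failed, then move on to the next one.
      errors.push(`${label}: ${err.message}`);
    }
  }
  throw new Error(`No stored model found. ${errors.join(' | ')}`);
}
```

With the functions from this guide, a call could look like restoreFromFirstAvailable([['cache', restoreFileFromSWCache], ['disk', restoreFileFromFileHandle]]). Restoring from a file handle may require a user gesture, so such a call belongs in an event handler.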

Demo

You can see all three regular storage methods and the hard disk method implemented in the MediaPipe LLM demo.

Bonus: Download a large file in chunks

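If you need to download a large AI model from the internet, parallelize the download into separate chunks, then stitch them together again on the client.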

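Here's a helper function that you can use in your code. You only need to pass it the url. The chunkSize (default: 5 MB), maxParallelRequests (default: 6), the progressCallback function (which reports the downloadedBytes and the total fileSize), and the signal for an AbortSignal are all optional.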

You can copy the following function into your project or install the fetch-in-chunks package from npm.

async function fetchInChunks(
  url,
  chunkSize = 5 * 1024 * 1024,
  maxParallelRequests = 6,
  progressCallback = null,
  signal = null
) {
  // Helper function to get the size of the remote file using a HEAD request
  async function getFileSize(url, signal) {
    const response = await fetch(url, { method: 'HEAD', signal });
    if (!response.ok) {
      throw new Error('Failed to fetch the file size');
    }
    const contentLength = response.headers.get('content-length');
    if (!contentLength) {
      throw new Error('Content-Length header is missing');
    }
    return parseInt(contentLength, 10);
  }

  // Helper function to fetch a chunk of the file
  async function fetchChunk(url, start, end, signal) {
    const response = await fetch(url, {
      headers: { Range: `bytes=${start}-${end}` },
      signal,
    });
    if (!response.ok && response.status !== 206) {
      throw new Error('Failed to fetch chunk');
    }
    return await response.arrayBuffer();
  }

  // Helper function to download chunks with parallelism
  async function downloadChunks(
    url,
    fileSize,
    chunkSize,
    maxParallelRequests,
    progressCallback,
    signal
  ) {
    let chunks = [];
    let queue = [];
    let start = 0;
    let downloadedBytes = 0;

    // Function to process the queue
    async function processQueue() {
      while (start < fileSize) {
        if (queue.length < maxParallelRequests) {
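          let end = Math.min(start + chunkSize - 1, fileSize - 1);
          // Capture this chunk's offset now: the shared `start` variable
          // keeps advancing before the promise below settles.
          const chunkStart = start;
          let promise = fetchChunk(url, start, end, signal)
            .then((chunk) => {
              chunks.push({ start: chunkStart, chunk });
              downloadedBytes += chunk.byteLength;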

              // Update progress if callback is provided
              if (progressCallback) {
                progressCallback(downloadedBytes, fileSize);
              }

              // Remove this promise from the queue when it resolves
              queue = queue.filter((p) => p !== promise);
            })
            .catch((err) => {              
              throw err;              
            });
          queue.push(promise);
          start += chunkSize;
        }
        // Wait for at least one promise to resolve before continuing
        if (queue.length >= maxParallelRequests) {
          await Promise.race(queue);
        }
      }

      // Wait for all remaining promises to resolve
      await Promise.all(queue);
    }

    await processQueue();

    return chunks.sort((a, b) => a.start - b.start).map((chunk) => chunk.chunk);
  }

  // Get the file size
  const fileSize = await getFileSize(url, signal);

  // Download the file in chunks
  const chunks = await downloadChunks(
    url,
    fileSize,
    chunkSize,
    maxParallelRequests,
    progressCallback,
    signal
  );

  // Stitch the chunks together
  const blob = new Blob(chunks);

  return blob;
}

export default fetchInChunks;
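To make the chunking in the helper above easier to follow, this sketch isolates how the byte ranges for the Range request headers are derived from the file size and the chunk size. computeChunkRanges is a hypothetical function written for illustration; it is not part of the helper or of the fetch-in-chunks package.

```javascript
// Sketch: list the inclusive byte ranges that cover a file of `fileSize` bytes.
function computeChunkRanges(fileSize, chunkSize = 5 * 1024 * 1024) {
  const ranges = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    // The last chunk may be shorter than chunkSize.
    const end = Math.min(start + chunkSize - 1, fileSize - 1);
    ranges.push({ start, end, header: `bytes=${start}-${end}` });
  }
  return ranges;
}

// A 12-byte file split into 5-byte chunks needs three requests.
console.log(computeChunkRanges(12, 5).map((range) => range.header));
// → [ 'bytes=0-4', 'bytes=5-9', 'bytes=10-11' ]
```

Both bounds of an HTTP byte range are inclusive, which is why each end is one less than the next start.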

Choose the right method for you

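This guide has explored various methods for effectively caching AI models in the browser, a task that is crucial for improving the user's experience with your app and its performance. The Chrome storage team recommends the Cache API for optimum performance, to ensure quick access to AI models, reduce load times, and improve responsiveness.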

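The OPFS and IndexedDB are less usable options. The OPFS and IndexedDB APIs need to serialize the data before it can be stored. IndexedDB also needs to deserialize the data when it is retrieved, making it the worst place to store large models.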

For niche applications, the File System Access API offers direct access to files on a user's device, which is ideal for users who manage their own AI models.

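If you need to secure your AI model, keep it on the server. Once a model is stored on the client, it is trivial to extract the data from both the Cache and IndexedDB with DevTools, or from the OPFS with the OPFS DevTools extension. These storage APIs are inherently equal in security. You might be tempted to store an encrypted version of the model, but you then need to get the decryption key to the client, where it could be intercepted. This makes a bad actor's attempt to steal your model slightly harder, but not impossible.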

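We encourage you to choose a caching strategy that aligns with your app's requirements, your target audience's behavior, and the characteristics of the AI models used. This ensures your applications are responsive and robust under various network conditions and system constraints.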

Acknowledgements

This guide was reviewed by Joshua Bell, Reilly Grant, Evan Stade, Nathan Memmott, Austin Sullivan, Etienne Noël, André Bandarra, Alexandra Klepper, François Beaufort, Paul Kinlan, and Rachel Andrew.