Published: September 24, 2024
Before translating text from one language to another, you must first determine what language the text is written in. Previously, this required uploading the text to a cloud service. With on-device inference, the text doesn't have to leave the user's device, which improves privacy. While you could ship a library that does this, users would have to download additional resources.
The Language Detector and Translator API proposal aims to solve this challenge with a model fine-tuned for this task, exposed through an API built into the browser.
Example use cases
The Language Detector API is primarily useful in the following scenarios:
- Determine the language of input text, so it can be translated.
- Determine the language of input text, so the correct model can be loaded for language-specific tasks, such as toxicity detection.
- Determine the language of input text, so it can be labeled correctly, for example, in online social networking sites.
- Determine the language of input text, so an app's interface can be adjusted accordingly. For example, a Belgian site could show only the interface relevant to users who speak French.
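To make the last use case more concrete, here is a minimal sketch that picks an interface language from detection results. It assumes the results arrive as a ranked list of {detectedLanguage, confidence} objects, most likely first. The chooseUiLanguage name, the list of supported languages, and the 0.5 cutoff are hypothetical choices for illustration, not part of the API.

```javascript
// Hypothetical helper (not part of the API): picks the UI language for a
// multilingual site, for example a Belgian site offering French, Dutch, and German.
// `results` is assumed to be a ranked list of { detectedLanguage, confidence }
// objects, most likely first. The 0.5 default cutoff is an arbitrary example.
function chooseUiLanguage(results, supportedLanguages, fallback, minConfidence = 0.5) {
  for (const { detectedLanguage, confidence } of results) {
    // Results are ranked, so stop as soon as confidence drops below the cutoff.
    if (confidence < minConfidence) break;
    // Return the highest-ranked language the site actually supports.
    if (supportedLanguages.includes(detectedLanguage)) return detectedLanguage;
  }
  // Nothing confident and supported: use the site's default language.
  return fallback;
}

// Example with made-up detection results.
const uiLanguage = chooseUiLanguage(
  [
    { detectedLanguage: 'fr', confidence: 0.97 },
    { detectedLanguage: 'en', confidence: 0.02 },
  ],
  ['fr', 'nl', 'de'],
  'en',
);
console.log(uiLanguage); // fr
```

Stopping at the first low-confidence candidate keeps the site from switching languages on a weak guess; the fallback covers text in languages the site doesn't offer.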
Use the Language Detector API
The Language Detector API is part of the larger Translator API family. First, run feature detection to see if the browser supports the Language Detector API.
if ('translation' in self && 'canDetect' in self.translation) {
  // The Language Detector API is available.
}
Model download
Language detection depends on a model that is fine-tuned for the specific task of detecting languages. While the API is built into the browser, the model is downloaded on demand the first time a site tries to use the API. In Chrome, this model is very small compared with other models. In fact, it might already be present, because Chrome browser features also use this model.
To see if the model is ready to use, call the asynchronous translation.canDetect() function. There are three possible responses:
- 'no': The current browser supports the Language Detector API, but it can't be used at the moment, for example, because there isn't enough free disk space available to download the model.
- 'readily': The current browser supports the Language Detector API, and it can be used right away.
- 'after-download': The current browser supports the Language Detector API, but it needs to download the model first.
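An app typically branches on these three responses. The following sketch turns a canDetect() result into two flags; the planForAvailability name and the shape of the returned object are hypothetical, and the only values taken from the API are the three response strings.

```javascript
// Hypothetical helper (not part of the API): turns a canDetect() result into
// flags an app can branch on.
function planForAvailability(availability) {
  switch (availability) {
    case 'readily':
      // Usable right away.
      return { usable: true, needsDownload: false };
    case 'after-download':
      // Usable once the model is downloaded; show progress in the meantime.
      return { usable: true, needsDownload: true };
    default:
      // 'no', or any value this sketch doesn't recognize: not usable,
      // so the app should fall back to another approach.
      return { usable: false, needsDownload: false };
  }
}

const plan = planForAvailability('after-download');
console.log(plan.usable, plan.needsDownload); // true true
```

Treating unrecognized values as unusable keeps the app on its fallback path if the set of responses changes while the API is still a proposal.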
To trigger the download and instantiate the language detector, call the asynchronous translation.createDetector() function. If the response to canDetect() was 'after-download', it's best practice to listen for download progress, so you can inform the user in case the download takes time.
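To inform the user, the loaded and total values carried by each downloadprogress event can be turned into a percentage. This is a minimal sketch; the formatDownloadProgress function and the statusElement in the commented usage are hypothetical names, and the guard for a missing or zero total is a defensive assumption rather than documented API behavior.

```javascript
// Hypothetical helper (not part of the API): formats the loaded/total values
// of a downloadprogress event as a percentage string for a progress indicator.
function formatDownloadProgress(loaded, total) {
  // Defensive: if the total isn't a usable number yet, show a generic message.
  if (!Number.isFinite(total) || total <= 0) return 'Downloading...';
  // Clamp to 100 so rounding or late events never show more than 100%.
  const percent = Math.min(100, Math.round((loaded / total) * 100));
  return `${percent}%`;
}

console.log(formatDownloadProgress(50, 200)); // 25%

// Possible usage inside the listener (statusElement is a hypothetical DOM node):
// detector.addEventListener('downloadprogress', (e) => {
//   statusElement.textContent = formatDownloadProgress(e.loaded, e.total);
// });
```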
The following example demonstrates how to initialize the language detector.
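// Returns a detector that's ready to use, or null if language detection isn't usable.
async function initLanguageDetector() {
  const canDetect = await translation.canDetect();
  if (canDetect === 'no') {
    // The language detector isn't usable.
    return null;
  }
  let detector;
  if (canDetect === 'readily') {
    // The language detector can immediately be used.
    detector = await translation.createDetector();
  } else {
    // The language detector can be used after model download.
    detector = await translation.createDetector();
    // Log download progress so the user can be kept informed.
    detector.addEventListener('downloadprogress', (e) => {
      console.log(e.loaded, e.total);
    });
    await detector.ready;
  }
  return detector;
}

const detector = await initLanguageDetector();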
Run the language detector
The Language Detector API uses a ranking model to determine which language is most likely used in a given piece of text. Ranking is a type of machine learning task where the objective is to order a list of items. In this case, the Language Detector API ranks languages from highest to lowest probability.
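To illustrate only the ordering step, the following sketch sorts a few per-language scores from highest to lowest. The scores are made up and the rankLanguages function is hypothetical; the real model computes its scores internally, and this code does not reproduce how it does so.

```javascript
// Made-up scores for illustration; these are not real model output.
const scores = { en: 0.13, de: 0.62, nl: 0.25 };

// Orders candidate languages from highest to lowest score, producing
// objects shaped like the API's { detectedLanguage, confidence } results.
function rankLanguages(scoreMap) {
  return Object.entries(scoreMap)
    .map(([detectedLanguage, confidence]) => ({ detectedLanguage, confidence }))
    .sort((a, b) => b.confidence - a.confidence);
}

const ranked = rankLanguages(scores);
console.log(ranked.map((r) => r.detectedLanguage).join(', ')); // de, nl, en
```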
The detect() function returns the ranked candidates as a list of {detectedLanguage, confidence} objects, ordered from most likely to least likely. You can either use the first result, which is the likeliest answer, or iterate over all candidates along with their confidence levels. The confidence level is expressed as a value between 0.0 (lowest confidence) and 1.0 (highest confidence).
const someUserText = 'Hallo und herzlich willkommen!';
const results = await detector.detect(someUserText);
for (const result of results) {
  // Show the full list of potential languages with their likelihood, ranked
  // from most likely to least likely. In practice, one would pick the top
  // language(s) that cross a high enough threshold.
  console.log(result.detectedLanguage, result.confidence);
}
// (Output truncated):
// de 0.9993835687637329
// en 0.00038279531872831285
// nl 0.00010798392031574622
// ...
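The comment in the loop suggests keeping only the top language(s) that cross a high enough threshold. Here is a minimal sketch of that idea; the pickLanguages name and the 0.5 default threshold are assumptions for illustration, not values recommended by the API. The sample data is copied from the truncated output shown above.

```javascript
// Hypothetical helper (not part of the API): keeps the languages whose
// confidence meets the threshold, preserving the ranked order.
// The 0.5 default is an arbitrary example value.
function pickLanguages(results, threshold = 0.5) {
  return results
    .filter((result) => result.confidence >= threshold)
    .map((result) => result.detectedLanguage);
}

// Sample results taken from the (truncated) output above.
const sampleResults = [
  { detectedLanguage: 'de', confidence: 0.9993835687637329 },
  { detectedLanguage: 'en', confidence: 0.00038279531872831285 },
  { detectedLanguage: 'nl', confidence: 0.00010798392031574622 },
];

// Only German clears the 0.5 threshold here.
const topLanguages = pickLanguages(sampleResults);
console.log(topLanguages.join(', ')); // de
```

If no candidate clears the threshold, the function returns an empty array, which an app can treat as "language unknown" instead of acting on a weak guess.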
Demo
Preview the Language Detector API in our demo. Enter text written in different languages in the textarea.
Sign up for the origin trial
Register for the Language Detector API trial to start testing this API with your users. This origin trial runs from Chrome 130 to 135.
Learn more about how origin trials work.
Standardization effort
The Language Detector API was moved to the W3C Web Incubator Community Group after the corresponding proposal received enough support. The API is part of a larger Translation API proposal. The Chrome team requested feedback from the W3C Technical Architecture Group and asked Mozilla and WebKit for their respective standards positions.
Share your feedback
If you have feedback on Chrome's implementation, file a Chromium bug. Share your feedback on the API shape of the Language Detector API by commenting on an existing issue or opening a new one in the Translation API GitHub repository.