Breaking Language Barriers: Apple's New Approach to Enhancing AI's Multilingual Capabilities
Summary
Issue: Non-native English speakers often find that Large Language Models (LLMs) perform much better in English than in their native languages. This disparity can produce output that ranges from subtly unidiomatic to outright wrong, and in some cases it even enables dangerous bypasses of safety filters, as highlighted by a 2023 Carnegie Mellon study.
Apple's Study: Apple has co-authored a study with researchers from Inria Paris, École Polytechnique, and Sapienza University of Rome to address the English-centric bias in LLMs. The study introduces two new metrics—Lexical Naturalness and Syntactic Naturalness—to evaluate how well models generate text that matches native speaker vocabulary and grammar.
Findings: Even Chinese-developed models like Qwen underperform in both English and non-English output, while Meta's Llama 3.1 produces the most natural text overall but still lags behind human-level writing.
Apple's Solution: Apple proposes training models to prefer natural-sounding outputs, using back-translation to generate negative examples that exhibit unnatural patterns. Training the model to distinguish natural from unnatural responses significantly improved vocabulary choice and grammar without degrading overall performance.
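The approach above can be sketched in code. This is a minimal illustration, not the paper's implementation: `translate` and `back_translate` are hypothetical translation callables, and a DPO-style (direct preference optimization) loss is assumed as the preference objective, which the study may or may not use in this exact form.

```python
import math

def make_preference_pair(original, translate, back_translate):
    """Build a (preferred, rejected) pair for preference training.

    The original native-speaker sentence is the natural, preferred
    example; its round-trip translation (e.g. zh -> en -> zh) tends to
    carry translation artifacts and serves as the unnatural negative.
    `translate` and `back_translate` are hypothetical stand-ins for
    real MT systems.
    """
    return original, back_translate(translate(original))

def preference_loss(logp_natural, logp_unnatural,
                    ref_logp_natural, ref_logp_unnatural, beta=0.1):
    """DPO-style loss on one pair (an assumed objective, not
    necessarily the paper's).

    Inputs are sequence log-probabilities under the model being
    trained and under a frozen reference model. The loss is small
    when the trained model favors the natural response more strongly
    than the reference does, and large in the opposite case.
    """
    margin = beta * ((logp_natural - ref_logp_natural)
                     - (logp_unnatural - ref_logp_unnatural))
    # -log(sigmoid(margin)): standard logistic preference loss
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this loss over many such pairs pushes probability mass toward native-like phrasing while the reference-model terms keep the model from drifting far from its original behavior.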