Apple AI study exposes reasoning flaws in LLMs
A new study from Apple's AI researchers finds that systems built on large language models (LLMs), such as those from Meta and OpenAI, lack basic reasoning skills. The researchers propose a new benchmark, GSM-Symbolic, to measure the reasoning capabilities of various LLMs. Initial testing shows that the models' answers become significantly inconsistent when the wording of a question is changed only slightly. The study also found that adding a single sentence of irrelevant information can reduce answer accuracy by up to 65%. According to the researchers, reliable AI agents cannot be built on a foundation where changing a word or two, or adding irrelevant information, produces a different answer. They conclude that LLMs show no evidence of formal reasoning and that their behavior is best explained as sophisticated pattern matching.
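To make the idea of this kind of test concrete, here is a minimal Python sketch of how question variants might be generated and scored. It is an illustration under stated assumptions, not the benchmark's actual code: the template, the names, the distractor sentence, and the functions `make_variant`, `accuracy`, and `naive_solver` are all hypothetical. The toy solver simply adds every number it finds in the text, standing in for a system that matches surface patterns rather than reasoning about which quantities matter. It is not a claim about how any real LLM behaves.

```python
import random
import re

# Hypothetical template: the wording, names, and numbers are illustrative
# and are not taken from the GSM-Symbolic benchmark itself.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{distractor}How many apples does {name} have in total?"
)
NAMES = ["Ava", "Liam", "Noor", "Kenji"]
# An irrelevant sentence: it contains a number but does not change the answer.
DISTRACTOR = "Note that 5 of the apples are slightly smaller than average. "


def make_variant(rng, with_distractor=False):
    """Return (question, correct_answer) for one randomized variant."""
    name = rng.choice(NAMES)
    a, b = rng.randint(2, 50), rng.randint(2, 50)
    question = TEMPLATE.format(
        name=name,
        a=a,
        b=b,
        distractor=DISTRACTOR if with_distractor else "",
    )
    return question, a + b


def accuracy(solver, variants):
    """Fraction of variants for which solver(question) equals the answer."""
    correct = sum(1 for question, answer in variants if solver(question) == answer)
    return correct / len(variants)


def naive_solver(question):
    """Toy stand-in for a model: sums every integer that appears in the text."""
    return sum(int(token) for token in re.findall(r"\d+", question))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the variants are reproducible
    plain = [make_variant(rng) for _ in range(20)]
    noisy = [make_variant(rng, with_distractor=True) for _ in range(20)]
    print("accuracy without distractor:", accuracy(naive_solver, plain))
    print("accuracy with distractor:   ", accuracy(naive_solver, noisy))
```

In practice, `naive_solver` would be replaced by a call to the model under test. Comparing the two accuracy figures then shows how much a single irrelevant sentence changes the results.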