Teaching machines to understand Arabic - assalamu alaikum
Assalamu alaikum - sharing some thoughts on getting AI to work well with Arabic.
While developers across the Arab world work to standardize Arabic for AI - dealing with its many dialects, limited datasets, and cultural nuance - English-first AI systems have kept pulling ahead. Experts now say it’s time for Arabic-language AI to catch up so Arabic speakers get the same benefits from the technology.
The biggest gap shows up in speech recognition, where pronunciation, rhythm, and vocabulary change a lot between dialects. That makes it hard for one model to reliably understand spoken Arabic everywhere.
Still, progress is picking up. With more investment and government-backed projects, especially from Saudi Arabia and neighboring states, Arabic AI is getting closer to English in both sophistication and accessibility.
Amsal Kapetanovic, head of KSA at Infobip, pointed out that while written tasks like simple chatbots can be handled with extra work, speech recognition really highlights where today’s models fall short. It needs more fine-tuning and region-specific adaptation to handle the diversity of spoken Arabic well.
Infobip’s work with telecom and private partners across the Gulf shows a common story: Arabic virtual assistants often need more hands-on training at first than English ones. But once models are retrained with local conversational data and Gulf dialects, accuracy and customer satisfaction improve a lot.
Arabic remains one of AI’s toughest language challenges. Unlike English, it’s not a single unified language but a family of dialects spoken from Western Asia to North Africa. Its complex word forms, gender and number agreement, and short-vowel diacritics that are usually left out of everyday writing make tokenization and model training harder.
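To make the diacritics point concrete, here is a minimal Python sketch (my own illustration, not something from the article or from Infobip). It strips the short-vowel marks from three differently vocalized words and shows they all collapse to the same bare spelling, which is the ambiguity a model has to resolve from context. The function name and the exact set of marks removed are assumptions chosen for the example.

```python
import re

# Assumption for this sketch: treat these code points as removable marks.
# U+064B-U+0652 are the tanween, short vowels, shadda and sukun;
# U+0670 is the superscript alef; U+0640 is the tatweel (stretching character).
ARABIC_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")

def strip_diacritics(text: str) -> str:
    """Return text with Arabic short-vowel marks and tatweel removed."""
    return ARABIC_MARKS.sub("", text)

# Written with escapes to avoid right-to-left rendering issues in editors.
KATABA = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, "he wrote"
KUTIBA = "\u0643\u064f\u062a\u0650\u0628\u064e"  # kutiba, "it was written"
KUTUB = "\u0643\u064f\u062a\u064f\u0628"         # kutub, "books"

# All three reduce to the same three bare letters (kaf, ta, ba).
bare_forms = {strip_diacritics(word) for word in (KATABA, KUTIBA, KUTUB)}
```

After this runs, bare_forms holds a single string, so three different words look identical once the vowels are gone. Everyday Arabic text is mostly written that way already, which is part of why tokenization is harder than in English.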
Kapetanovic cited a 2025 study that found Arabic models still lag behind English by about 10–20% on complex tasks. He attributed the gap mostly to smaller Arabic training datasets and greater dialect diversity. Still, he’s optimistic because of growing regional investments and initiatives like Vision 2030 that push localization for Arabic speakers.
Speech recognition is the most visible gap: a Lebanese speaker and a Saudi speaker might use different words and speak at different speeds, so one model can struggle to handle both accurately. Localization, he added, goes beyond translation - it’s about adapting features, workflows, and channels commonly used in the region.
Real examples are already emerging. For instance, some companies have launched chat services that support right-to-left text and Arabic stop-word recognition and are retrained on Gulf expressions, which improves understanding and makes services feel more natural for users here. Partnerships with local tech firms and support for regional payment methods and business processes help too.
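For anyone wondering what "stop-word recognition" looks like in practice, here is a rough Python sketch under my own assumptions. The stop-word list is hypothetical and deliberately tiny, and the function name is made up for the example; it is not how any particular company's chat service works.

```python
# Hypothetical, deliberately tiny stop-word list (in, from, on, to).
# Real lists are far longer and would need dialect-specific entries.
STOP_WORDS = {"في", "من", "على", "إلى"}

def remove_stop_words(text: str) -> list[str]:
    """Split on whitespace and drop tokens that appear in STOP_WORDS."""
    return [token for token in text.split() if token not in STOP_WORDS]

# "I want to transfer money from my account to another account"
query = "أريد تحويل المال من حسابي إلى حساب آخر"
keywords = remove_stop_words(query)
```

The two prepositions are dropped and the content words stay in keywords. One caveat: many Arabic function words, such as the conjunction "and", are attached to the following word as a prefix, so plain whitespace splitting like this misses them. A real pipeline would need morphological segmentation first.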
Kapetanovic warned about the ethical side: if AI neglects Arabic, it risks being biased and exclusionary. If systems don’t cover certain dialects or lack regional data, they can leave out parts of the story or reinforce disparities in services and access.
The takeaway: with cultural understanding, targeted datasets, and continued investment, Arabic AI can close the gap. May we see tools that serve our communities well and inclusively - in sha Allah.
https://www.arabnews.com/node/