
When it comes to AI models, the spotlight is mostly on the US and China. India, despite its scale and deep talent pool, has rarely been seen as a source of core AI development. But Bengaluru-based startup Sarvam AI is changing that perception with what it calls “sovereign AI”: foundational AI models built from scratch in India. This week two of its tools, Sarvam Vision and Bulbul, are generating a lot of buzz, and for the right reasons.
Sarvam Vision is reportedly beating bigger and more talked-about AI models such as ChatGPT, Google Gemini and Anthropic Claude on certain benchmarks in optical character recognition (OCR), which is its area of expertise. Its performance is winning praise from users and experts alike.
Sarvam AI co-founder Pratyush Kumar recently shared details of the latest achievements from the company’s in-house AI models in a series of posts on X. According to the company, Sarvam Vision has achieved an accuracy score of 84.3 percent on the olmOCR-Bench. That score is higher than those of Gemini 3 Pro and recent OCR models such as DeepSeek OCR v2, while ChatGPT ranked significantly lower.
Sarvam Vision has also scored well on OmniDocBench v1.5, a benchmark that tests how AI systems read and understand real-world documents. It scored 93.28 percent overall, with especially strong results on complex layouts, technical tables and mathematical formulas. These are areas where traditional OCR systems often struggle because of messy formatting and dense content.
The performance of the AI tool has attracted global attention. Sarvam, which was earlier questioned for focusing on Indic-language models, is now seeing that scepticism turn into approval.
Tech commentator Deedy Das, who earlier questioned the value of building smaller Indic-language models, recently admitted that he had underestimated the company. In a post on X, Das said Sarvam’s OCR and speech models for Indian languages are strong and fill a gap that large global AI labs have largely ignored.
“I was wrong about Sarvam. When I wrote about them a year ago, I felt like the direction to train small Indic language models was wrong. But boy, have they turned it around,” he wrote. “They have the best text-to-speech, speech-to text, and OCR models for Indic languages, and that’s actually really valuable. The pricing is very reasonable.”
Praise has come from users as well. One user talked about their experience with Sarvam’s models and wrote, “I used this a couple of days ago! Oh man wow.”
Bulbul brings AI voices to Indic languages
In addition to its OCR tool, Sarvam has launched a new AI voice model called Bulbul V3, a text-to-speech model that generates spoken audio from text. In a way, it is similar to the AI tools offered by ElevenLabs, a company considered the best in this space.
“Today we’re releasing Bulbul V3, our most capable text-to-speech model designed to deliver natural, expressive and production-ready voices for Indian languages,” Sarvam noted in a blog post. “Bulbul V3 minimizes failure modes, delivering content-accurate, stable speech across the inputs that matter for India-specific use cases.”
Currently, the tool supports more than 35 voices across 11 Indian languages. The company says it plans to expand language support to a total of 22 languages.
Bulbul, too, is winning praise. Pratik Desai, founder of KissanAI, wrote on X, “We use Bulbul as our go-to tts model for our Indic use cases, and they have just gotten better with each release. Meanwhile, ElevenLabs cost never made sense for Indic or any other languages.”