India badly needs its own ElevenLabs, and it's a billion-dollar opportunity
Cultural nuances in building Indian voice AI
There is a fundamental difference in how Indians express emotions compared to the West. In certain Tamil communities, grief isn't a private affair. It's a form of communal healing performed through a ritual called 'Oppari.' Professional mourners, traditionally women, are invited to cry and sing in a chest-thumping performance. In contrast, many Western cultures, particularly those influenced by a Protestant work ethic, place immense pressure on people to "be strong" and internalize grief.
Western cultures operate on what we might call "transactional efficiency." The individual is the atomic unit. You can walk into a bank, buy a coffee, or close a business deal with minimal personal context. Trust flows through institutions - contracts, processes, brands. The system works because it's designed to be impersonal. Efficiency emerges from keeping the personal separate from the professional.
India operates differently. Here, the individual isn't an atom; they're a node in an interconnected web of relationships - family, community, school, hometown, caste. Trust doesn't flow through institutions alone; it flows through people. Therefore, before any meaningful interaction or transaction can occur, one must first establish context.
The conversation is the due diligence.
The questions that seem "intrusive" from a Western perspective are the data points for plotting a social map:
"Where are you from?" (Geography, potential common ground)
"What does your father do?" (Family background, social standing)
"Are you married? Do you have children?" (Life stage, stability, responsibilities)
"Which company do you work for? Which college did you attend?" (Professional/educational network)
The goal is to find a point of connection, a shared context, or a trusted third party that can anchor the relationship. The "extended conversation" is not inefficient; it is the work of building the necessary foundation of trust upon which everything else can be built.
Building Voice Agents for India
We often confuse efficiency with effectiveness. To build a voice agent that succeeds in India, we must abandon the Western model of transactional efficiency. Our guiding principle is this: the AI must be designed not as a tool, but as a participant in a relationship. The problem isn't technical - it's anthropological. Current AI systems fall into what I call the Cultural Uncanny Valley: they're human enough to trigger our social expectations, but alien enough to violate them.
Three types of conversation
To navigate this, I divide an AI's conversational capabilities into a three-tiered framework:
Utility Conversations: These cover the simple stuff: reminders, updates, confirmations. Here, brevity is king. Users want information, not intimacy. Get in, deliver value, get out. No small talk required.
Free Flowing Conversations: These operate in the realm of pure relationship. They serve no transactional purpose but fulfill a deeper human need: the digital equivalent of sitting on a porch, sharing silence with someone you trust. This is where AI companions, therapy bots, and digital astrologers live. The goal isn't efficiency; it's presence.
Persuasive Conversations: These sit in the profitable middle ground between Utility and Free Flowing. This is where the real money lives - every conversation that requires changing minds rather than simply providing information. Loan collections, insurance sales, complex purchases, understanding why someone abandoned their shopping cart. They require something more sophisticated: the ability to build just enough trust to accomplish a specific goal, then gracefully exit.
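One way to make the three tiers concrete is as a small routing table that tells a voice agent how much relationship-building each kind of call allows. The sketch below is only an illustration under stated assumptions: the names Tier, ConversationPolicy, USE_CASE_TIERS and should_build_rapport are hypothetical, the use-case labels are lifted from the examples above, and the cap of three rapport turns for persuasive calls is an arbitrary placeholder, not a measured value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    UTILITY = "utility"
    PERSUASIVE = "persuasive"
    FREE_FLOWING = "free_flowing"


@dataclass(frozen=True)
class ConversationPolicy:
    # Maximum turns spent on small talk / context-building; None means no cap.
    max_rapport_turns: Optional[int]
    # Whether the call exists to accomplish a specific business goal.
    has_business_goal: bool


# Assumed policy values, chosen only to mirror the prose description.
POLICIES = {
    Tier.UTILITY: ConversationPolicy(max_rapport_turns=0, has_business_goal=True),
    Tier.PERSUASIVE: ConversationPolicy(max_rapport_turns=3, has_business_goal=True),
    Tier.FREE_FLOWING: ConversationPolicy(max_rapport_turns=None, has_business_goal=False),
}

# Hypothetical use-case labels taken from the examples in the text.
USE_CASE_TIERS = {
    "reminder": Tier.UTILITY,
    "confirmation": Tier.UTILITY,
    "loan_collection": Tier.PERSUASIVE,
    "insurance_sale": Tier.PERSUASIVE,
    "cart_abandonment": Tier.PERSUASIVE,
    "companion": Tier.FREE_FLOWING,
    "astrologer": Tier.FREE_FLOWING,
}


def should_build_rapport(use_case: str, rapport_turns_so_far: int) -> bool:
    """Return True if the agent may spend another turn on rapport."""
    policy = POLICIES[USE_CASE_TIERS[use_case]]
    if policy.max_rapport_turns is None:
        return True  # free-flowing: presence is the point, never cut it short
    return rapport_turns_so_far < policy.max_rapport_turns
```

In this sketch a payment reminder never spends a turn on small talk, a loan-collection call gets a few turns to establish context before moving to the ask, and a companion conversation is never cut short.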
Most companies optimize for speed in simple transactions while fumbling the conversations that actually drive revenue. It's like perfecting your small talk while failing every job interview.
The companies that crack this code won't just build better voice agents - they'll unlock an entirely new category of digital relationship, one that respects both human psychology and business reality.
The Vocabulary of Persuasion
Most voice AI systems operating in India today are functionally illiterate in the language of human motivation.
They can handle the easy stuff - reminders, confirmations, basic transactions. But the moment you need to change someone's mind, to persuade rather than simply inform, the wheels fall off.
The problem starts with language itself. Current AI systems speak what I call "textbook Hindi" - technically correct but culturally hollow. They're trained on formal datasets and synthetic conversations that bear little resemblance to how real people actually talk. It's like learning English from a grammar book and then trying to charm someone at a bar.
But language is just the surface layer. The deeper challenge is cultural fluency - understanding not just what people say, but what their circumstances mean.
When a Maharashtrian farmer explains he can't make his EMI payment because of drought, he's not just conveying information. He's invoking a complex web of agricultural reality, seasonal dependency, and rural economics that requires genuine comprehension. An AI that responds with generic payment options instead of empathising and offering alternative solutions reveals itself as an outsider and loses all credibility.
Then there's the art of influence itself, which operates in realms that would make most programmers uncomfortable. Walk through any BPO center and you'll witness a masterclass in psychological calibration.
You might observe someone selling insurance using slightly flirtatious language with men, saying things like "all good men should take care of their family" to boost their self-esteem and close the sale. Meanwhile, for loan collections, the opposite approach might be used - emphasizing shame and the repercussions of not paying. It's about understanding which emotional buttons to push, when to push them, and how hard. It requires an AI that can read between the lines of what people say to understand what they actually mean, then respond in the cultural dialect of persuasion that resonates with that specific person in that specific moment. The challenge isn't just to teach an AI to use shame as a lever, but to build the sophisticated ethical guardrails that know when such an approach is helpful and when it becomes predatory.
Most AI systems today have some of the tools, but none of the nuanced understanding required for the delicate work of changing minds.
Patient Capital for an Impatient Nation
There's a pattern that repeats throughout history: the most transformative technologies don't just solve problems - they change how humans relate to each other. The telephone didn't just speed up communication; it rewired how families stayed connected. Television didn't just bring entertainment; it created shared cultural experiences across vast distances.
Voice AI in India sits at a similar inflection point, but with a twist that Silicon Valley hasn't fully grasped yet.
The technical challenges are solvable - more data, better models, smarter algorithms. But the real obstacle isn't computational; it's cultural. Most companies guard their customer conversation data with their lives, sitting on treasure troves of linguistic gold while the ecosystem starves for the raw materials needed to build truly fluent AI.
This creates a paradox: the very data needed to solve the problem is locked away by companies afraid of losing competitive advantage. It's like trying to build roads while every landowner refuses to allow access to their property.
The solution requires thinking like infrastructure builders rather than product makers. Consider electricity: it took massive upfront investment to string power lines across entire continents before anyone could flip a switch and pay a small monthly fee. The early investors didn't see immediate returns, but they built the foundation upon which entire economies would eventually run.
Voice AI for India needs the same kind of patient capital and systems thinking. We're not just building chatbots; we're constructing the conversational infrastructure for 1.4 billion people across hundreds of dialects, thousands of cultural contexts, and millions of different ways of being human.
The companies that succeed won't be those trying to transplant Western efficiency onto Indian soil. They'll be the ones who understand that in a relationship-driven culture, AI isn't just about processing language - it's about earning the right to participate in the most intimate form of human connection: conversation itself.
An "ElevenLabs for India" is inevitable. But it won't be built by applying a Silicon Valley playbook to Indian problems. It will emerge from patient capital and cultural curiosity.
What we need to capitalize on is the hours of call recordings held by call centres across states. Data that was collected for calling and training purposes can be used to train our own models, fueling the effort to build our own ElevenLabs.
We were building generative AI voice for different Indian regional languages, but running a startup like this needs capital, and Indian investors didn't want to invest much. So after our seed round and two years of operation, we had to shut down.