The Voice Stack is improving rapidly. Systems that interact with users via speaking and listening will drive many new applications. Over the past year, I’ve been working closely with DeepLearning.AI, AI Fund, and several collaborators on voice-based applications, and I will share best practices I’ve learned in this and future posts. Foundation models that are trained to directly input, and often also directly generate, audio have contributed to this growth, but they are only part of the story. OpenAI’s Realtime API makes it easy for developers to write prompts to develop systems that deliver voice-in, voice-out experiences. This is great for building quick-and-dirty prototypes, and it also works well for low-stakes conversations where making an occasional mistake is okay. I encourage you to try it! However, compared to text-based generation, it is still hard to control the output of voice-in, voice-out models. In contrast to directly generating audio, when we use an LLM to generate text, we have many tools for building guardrails, and we can double-check the output before showing it to users. We can also use sophisticated agentic reasoning workflows to compute high-quality outputs. Before a customer-service agent shows a user the message, “Sure, I’m happy to issue a refund,” we can make sure that (i) issuing the refund is consistent with our business policy and (ii) we will call the API to issue the refund (and not just promise a refund without issuing it). In contrast, the tools to prevent a voice-in, voice-out model from making such mistakes are much less mature. In my experience, the reasoning capability of voice models also seems inferior to text-based models, and they give less sophisticated answers. (Perhaps this is because voice responses have to be more brief, leaving less room for chain-of-thought reasoning to get to a more thoughtful answer.)
When building applications where I need more control over the output, I use agentic workflows to reason at length about the user’s input. In voice applications, this means I end up using a pipeline that includes speech-to-text (STT) to transcribe the user’s words, then processes the text using one or more LLM calls, and finally returns an audio response to the user via text-to-speech (TTS). Because the reasoning is done in text, this pipeline allows for more accurate responses. However, it introduces latency, and users of voice applications are very sensitive to latency. When DeepLearning.AI worked with RealAvatar (an AI Fund portfolio company led by Jeff Daniel) to build an avatar of me, we found that getting TTS to generate a voice that sounded like me was not very hard, but getting it to respond to questions using words similar to those I would choose was. Even after much tuning, it remains a work in progress. You can play with it at https://lnkd.in/gcZ66yGM [At length limit. Full text, including latency reduction technique: https://lnkd.in/gjzjiVwx ]
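The STT → LLM → TTS pipeline and the refund guardrail described in this post can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: `Draft`, `respond`, and every callable passed in (`stt`, `llm`, `tts`, `refund_allowed`, `issue_refund`) are hypothetical names standing in for whatever speech, model, and business-policy services an application actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Draft:
    text: str           # what the agent intends to say
    wants_refund: bool  # True if the reply promises a refund


def respond(
    audio_in: bytes,
    stt: Callable[[bytes], str],             # hypothetical speech-to-text call
    llm: Callable[[str], Draft],             # hypothetical LLM / agentic step
    tts: Callable[[str], bytes],             # hypothetical text-to-speech call
    refund_allowed: Callable[[str], bool],   # hypothetical business-policy check
    issue_refund: Callable[[str], None],     # hypothetical refund API call
) -> bytes:
    """Cascaded voice pipeline with a text-level guardrail before speaking."""
    transcript = stt(audio_in)
    draft = llm(transcript)

    if draft.wants_refund:
        if refund_allowed(transcript):
            # (ii) actually issue the refund rather than only promising it
            issue_refund(transcript)
        else:
            # (i) the promise violates policy, so replace it before it is spoken
            draft = Draft("I'm sorry, I can't issue a refund for this order.", False)

    return tts(draft.text)
```

Because the check runs on text before any audio is generated, the guardrail adds latency in exchange for control, which is the trade-off the post describes.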
Voice Search Optimization
-
Voice is the next frontier for AI Agents, but most builders struggle to navigate this rapidly evolving ecosystem. After seeing the challenges firsthand, I've created a comprehensive guide to building voice agents in 2024. Three key developments are accelerating this revolution: -> Speech-native models - OpenAI's 60% price cut on their Realtime API last week and Google's Gemini 2.0 Realtime release mark a shift from clunky cascading architectures to fluid, natural interactions -> Reduced complexity - small teams are now building specialized voice agents reaching substantial ARR - from restaurant order-taking to sales qualification -> Mature infrastructure - new developer platforms handle the hard parts (latency, error handling, conversation management), letting builders focus on unique experiences For the first time, we have god-like AI systems that truly converse like humans. For builders, this moment is huge. Unlike web or mobile development, voice AI is still being defined—offering fertile ground for those who understand both the technical stack and real-world use cases. With voice agents that can be interrupted and can handle emotional context, we’re leaving behind the era of rule-based, rigid experiences and ushering in a future where AI feels truly conversational. This toolkit breaks down: -> Foundation layers (speech-to-text, text-to-speech) -> Voice AI middleware (speech-to-speech models, agent frameworks) -> End-to-end platforms -> Evaluation tools and best practices Plus, a detailed framework for choosing between full-stack platforms vs. custom builds based on your latency, cost, and control requirements. Post with the full list of packages and tools as well as my framework for choosing your voice agent architecture https://lnkd.in/g9ebbfX3 Also available as a NotebookLM-powered podcast episode. Go build. P.S. I plan to publish concrete guides so follow here and subscribe to my newsletter.
-
🚨 Attention marketers: ChatGPT will eventually introduce ads. Not an if, but a when. ChatGPT has changed how people search, research and buy. *When* ads are introduced, it’ll reshape discovery for everything from consultants to coaching to candles. The most likely formats? Conversational slots that suggest products or services based on user intent. So what should marketers do? Here’s how this shifts things: → Ad slots will likely live *inside* answers, not beside them. This means your copy must blend into a conversational, helpful tone. Traditional CTA-heavy or benefit-stacked messaging won’t slot in naturally. You're writing for both a reader and an AI parser. → Discovery is intent-led, not keyword-led. Unlike search ads, where you bid on keywords, ChatGPT may serve ads based on user intent patterns (“I’m overwhelmed by budgeting”) rather than exact phrasing. Your content needs to map to user needs more than search terms. → Ad targeting may lean on inferred personas. I could see OpenAI introducing persona-based targeting (e.g. "founder planning a launch"). Your content strategy needs to speak to life moments, not just product features. → Start now by treating your content like conversation. If someone asked, “What’s the best [your product category] for someone like me?”, would your answer sound natural, trustworthy and brief? Your brand voice has to work *through* AI, not around it. If you're preparing for ChatGPT ads the same way you prepare for Google or Meta, you're not preparing enough. That’s the future ad slot: not a banner, but a helpful sentence in the right moment. Write like you belong there.
-
🤖 Your dashboard tracks keywords and backlinks—but not the answers that buyers read first. ChatGPT now summarises categories, recommends vendors, and shapes buying criteria before anyone sees a results page. Ignore that, and you’re invisible where decisions begin. Think of each AI answer as a micro-PR hit: “cited” = discovered; “uncited” = non-existent. Enterprise deals tilt when an LLM’s first paragraph crowns one vendor a “leader” and relegates the rest to footnotes. Try this quick audit on your flagship product: - Ask ChatGPT, Gemini, and Perplexity how they describe it. - Note which competitors show up above, beside, or not at all. - Compare their narrative to yours. If the gap makes you cringe—good. That discomfort is your next growth roadmap. Integrate AI-answer visibility into your weekly scorecard and treat it like any pipeline KPI. The moment you can see it, you can optimise it; until then, you’re driving with one eye closed. https://lnkd.in/eV-hTzin #AI #Search #Marketing
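The quick audit above can be turned into a repeatable score once the answers have been collected. The sketch below is only an illustration: it assumes you have already pasted each assistant's answer into a string, and the function name `visibility` and the brand names are hypothetical. It ranks brands by where they are first mentioned and marks uncited brands as `None`.

```python
import re
from typing import Dict, List, Optional


def visibility(answer: str, brands: List[str]) -> Dict[str, Optional[int]]:
    """Rank brands by first mention in an AI answer (1 = first), None if uncited."""
    first_seen = {}
    for brand in brands:
        # Case-insensitive whole-word match on the brand name
        match = re.search(r"\b" + re.escape(brand) + r"\b", answer, flags=re.IGNORECASE)
        if match:
            first_seen[brand] = match.start()

    ranks: Dict[str, Optional[int]] = {brand: None for brand in brands}
    ordered = sorted(first_seen, key=first_seen.get)
    for rank, brand in enumerate(ordered, start=1):
        ranks[brand] = rank
    return ranks


# Hypothetical answer text and brand names, for illustration only
answer = "For mid-size teams, Acme and Globex are the usual leaders; Initech is sometimes mentioned."
print(visibility(answer, ["Globex", "Acme", "Umbrella"]))
# {'Globex': 2, 'Acme': 1, 'Umbrella': None}
```

Logging these ranks per assistant each week gives a number that can sit on a scorecard next to other pipeline KPIs. First-mention order is a crude proxy, so the narrative comparison in the audit still has to be read by a person.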
-
OpenAI just confirmed ads are coming to ChatGPT. Not surprising, but here's what caught my attention. The format: sponsored listings at the bottom of answers, contextual to the conversation you're having. Think about what that means. The ad unit is the dialogue. Someone isn't casually browsing or typing keywords into a search bar. They're deep in conversation with an AI, synthesizing information, weighing options. The consideration phase is already happening. So here's the problem: you're going to send that person to a static landing page built for Google keyword traffic? That's like someone asking you a detailed, thoughtful question and you responding by handing them a brochure. It doesn't just fall flat, it breaks trust. This isn't a creative problem or a targeting problem. It's an infrastructure problem. The brands that figure this out will build experiences that continue the conversation, not restart it from scratch. And the time to build that infrastructure isn't when the traffic shows up. It's now.
-
Cartesia Sonic-3 is the first AI voice model I’ve seen that nails Hindi perfectly. For years, even the best text-to-speech (TTS) models struggled with Hindi. The rhythm, tonality, and emotional micro-expressions just didn’t sound human and the accent was inaccurate. This model doesn’t just translate Hindi. It is specially trained for it, with precise control over pacing, expressions and tonality, all rendered in real time. Under the hood, Sonic-3 is engineered for low-latency voice generation optimized for conversational AI agents, clocking in 3–5x faster than OpenAI’s TTS while maintaining superior transcript fidelity. What makes it stand out technically: → 𝗚𝗿𝗮𝗻𝘂𝗹𝗮𝗿 𝗰𝗼𝗻𝘁𝗿𝗼𝗹 𝘁𝗮𝗴𝘀 let developers dynamically modulate speed, volume, and emotion inside the transcript itself. ("Can you repeat that slower?" now works in production.) → 𝟰𝟮-𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗺𝘂𝗹𝘁𝗶𝗹𝗶𝗻𝗴𝘂𝗮𝗹 𝗺𝗼𝗱𝗲𝗹 built on a single unified speaker embedding, so one voice can switch between languages like Hindi, Tamil, and English natively while maintaining accent continuity. → 𝟯-𝘀𝗲𝗰𝗼𝗻𝗱 𝘃𝗼𝗶𝗰𝗲 𝗰𝗹𝗼𝗻𝗶𝗻𝗴 powered by a low-sample adaptive cloning pipeline that enables instant personalization at scale. → 𝗥𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝘀𝘁𝗮𝗰𝗸 achieving sub-300 ms end-to-end latency at p90, tuned for live interactions like support agents, NPCs, and healthcare assistants. → 𝗙𝗶𝗻𝗲-𝗴𝗿𝗮𝗶𝗻𝗲𝗱 𝘁𝗿𝗮𝗻𝘀𝗰𝗿𝗶𝗽𝘁 𝗮𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 that handles heteronyms, acronyms, and structured text (emails, IDs, phone numbers) which usually break realism in production systems. 🎧 Here is an example of me trying Sonic-3’s Hindi. You have to hear it to believe it. If you’re building voice agents, conversational AI, or multimodal assistants, keep an eye on Cartesia. They’ve raised $100M to build the most human-sounding voice models in the world, and Sonic-3 just set a new benchmark for multilingual voice AI. #CartesiaPartner
-
The most interesting shift in advertising right now is the shift from “Here’s an ad” to “Here’s help”, thanks to agentic AI. Microsoft Advertising’s latest updates shared by CVP Kya Sainsbury-Carter point to a future where ads behave more like helpful nudges than interruptions. Think: - A travel ad going beyond showing you a beach to booking your flight. - A B2B campaign going beyond pitching a product to answering your procurement questions. - A retail banner going beyond promoting a sale to remembering your size, your style, and your last return. Microsoft’s pivot away from traditional DSPs signals a belief that the next era of advertising will be built with AI agents that act, adapt, and assist. Forget about being “personalized.” That's so yesterday. Now, it’s about being useful in the moment. Conversational. Context-aware. Capable of doing something, not just saying something. That’s a big leap. And it’s going to change how we brief, how we measure, and how we build. If you’re in marketing or media, here’s one small way to start preparing: - Pick one of your current campaigns. - Now reframe the creative brief from a message to deliver to a task to help someone complete. - What changes? Make note. This can become a reality. If you want to go deeper, read Kya's post here: https://lnkd.in/dveqDneU and follow leaders actually building this such as Paul Longo, Tim Frank, and Pedro Bojikian. #hicm #AI #AIinAdvertising #AgenticAI
-
The search engine is dead ☠️ Blue links? Game over 👾 Welcome to the age of answer engines. Just finished two inspiring days at #BrightonSEO in Brighton. Great talks by the sea, solid conversations with fellow marketers, and a lot to unpack about where digital discovery (how people find your business or brand) is headed. Jon Earnshaw’s session on “The Age of Conversation” was a standout for me. His clear, engaging take on AI turning searches into ongoing chats really opened my eyes. It paired well with Ray Grieselhuber’s excellent and more technical “It’s ALL Ai Search” and Michael King’s forward-looking close, leaving everyone thinking about how to adapt. More on Mike’s keynote later. Key takeaways? We’re moving from quick Google searches to deeper, back-and-forth conversations with ChatGPT and co. As Jon put it, it’s like judging a marathon by the first 100 meters, so don’t get hung up on that initial result. Instead, focus on staying relevant through the whole discussion, which often lasts minutes and involves follow-up questions. For CEOs, founders, and marketers, here’s what this shift means for your marketing, and simple ways to make sure your brand shows up (and sticks around) as people turn away from traditional search: 1) Focus on People’s Real Questions, Not Exact Words: We aren’t typing short keywords anymore; we’re asking longer, personal questions like “What are the best winter jackets under $300 that work for both office and weekends?” Build helpful guides or videos that cover the natural next steps in those chats. This keeps your brand in the flow, building trust and guiding people toward your products without heavy ad spending. 2) Plan for the Whole Story, Not Just the Start: What you see in an AI summary isn’t always what your customers get: the tool remembers earlier parts of the conversation and tailors deeper answers. Create content that’s easy to build on, like short videos or step-by-step stories showing real experiences.
Track how long your brand stays in these discussions to spot where you might fade out, then fill those gaps to become the reliable voice. 3) Mix Human Touch with AI Tools Smartly: AI can help generate ideas, but add your unique stories and customer insights to stand out from generic answers. Shift some budget from search ads to creating ongoing connections, like emails or community posts (e.g., Reddit). Brands doing this early are seeing stronger customer loyalty, even when clicks are harder to come by. This change is BIG, like past shifts from desktop to mobile, but it’s an opportunity to connect more meaningfully. As a digital marketer with years helping growing companies navigate these turns, I love building straightforward plans to keep your brand front and centre. If you’re a CEO, founder or marketer mapping out next year and want to talk through it (no jargon, just clear steps), drop me a DM or let’s grab a coffee ☕️ #DigitalMarketing #AI #MarketingStrategy #BrightonSEO
-
Building AI is easy. Running it when OpenAI goes down is the real test.

That line perfectly framed a tech meetup at Meesho, Bangalore, where the conversation wasn’t about demos, but about operating AI in the real world. Three talks. One clear theme: production reality beats lab intelligence.

1) AI at Billion Scale – Portkey | Ayush Garg (Co-founder)
At scale, AI stops being a model problem and becomes a systems problem.
Key lessons:
- Resilience > intelligence
- Stability > sophistication
- Abstraction layers are unavoidable at scale
- Security belongs at the gateway, not in apps
Hard truths:
- Blind retries = 2x cost, 3x latency
- FinOps asks: Which team consumed these tokens?
- CISOs ask: Can you audit every request?
- Business asks: Why did AI costs spike 4x?
Observability. Accountability. Audit trails. Guardrails. At billion #scale, AI is infra first, models second.

2) Agents for #Agents – Anubhav Singh (AI Engineer @ Weights & Biases)
Agents are powerful. Agents without evals are dangerous.
What mattered:
- Agent evals using role assignment + tools
- LLM-based tracing with Weave from Weights & Biases
Metrics that actually reflect reality:
• Tokens consumed
• TTFT + TPOT (time to first token, time per output token)
• Call-level traces
Standout demo: An optimizer agent improving response quality. Polite refusal trait improved from 20% → 60%. Not prompt engineering. Measured, observable improvement.

3) #Voice AI at Scale – Anuj Goel (Tech Leader at Meesho & VoiceBot)
Voice AI is where all constraints collide: latency, language, and real users.
The stack:
- Telephony layer
- VoiceBot pipeline: ASR → VCA (intent detection) → TTS
- Fine-tuned LLMs to bridge human–agent conversation gaps
- A mix of internal and external models
Scale & performance:
- 2 lakh (200,000) calls/day on average, peaks at 5–10 lakh
- p90 latency ~1 second
- 3 Indic languages supported (Tamil, Telugu, Malayalam) + English & Hindi
Key engineering choices:
- Cross-region fallbacks & dynamic routing
- Circuit breakers & vendor localisation
- Real-time escalation using parameters like empathy & coherence
- LLM streaming architecture to cut latency
- Warmed-up WebSockets for faster starts
- Metadata (order IDs, details) cached deliberately
- Conversation phrasing intentionally not cached to mimic the natural flow of human agent behavior

This wasn’t a chatbot story. This was distributed systems engineering with humans in the loop.

Big takeaway from the morning: If you can’t trace it, audit it, explain its latency, and justify its cost, it’s not production AI. AI isn’t scaling because models got smarter. It’s scaling because engineering discipline finally caught up.

Respect to Meesho and Nihal Kashinath & #DeepTechStars for hosting AI conversations that actually matter. Thank you for having me! #AbhiWritesAI #AIatScale
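The circuit breakers and fallbacks mentioned in this post, as an alternative to blind retries, can be sketched as below. This is a generic illustration of the pattern, not the speakers' implementation: `CircuitBreaker`, `call_with_fallback`, the thresholds, and the provider tuples are all hypothetical.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


class CircuitBreaker:
    """Stops calling a provider after repeated failures until a cooldown passes."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, allow trial requests again (half-open)
        return self.clock() - self.opened_at >= self.cooldown_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


Provider = Tuple[str, Callable[[str], str], CircuitBreaker]


def call_with_fallback(providers: Sequence[Provider], request: str) -> Tuple[str, str]:
    """Try providers in priority order, skipping any whose circuit is open."""
    for name, fn, breaker in providers:
        if not breaker.allow():
            continue  # do not spend latency or money on a known-bad provider
        try:
            result = fn(request)
        except Exception:
            breaker.record(False)
            continue  # fall through to the next provider or region
        breaker.record(True)
        return name, result
    raise RuntimeError("all providers unavailable")
```

Each request tries a given provider at most once, and a provider that keeps failing is skipped entirely until its cooldown ends, which avoids the doubled cost and added latency of retrying blindly against an outage.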
-
Mark Zuckerberg just outlined a future where Meta's AI handles everything from creative generation to campaign optimization to purchase decisions. His vision: businesses connect their bank accounts, state their objectives, and "read the results we spit out." The technical architecture he's describing would fundamentally reshape how advertising technology works. But there's a critical flaw in this approach that creates an opportunity for the next generation of advertising infrastructure. The trust problem isn't just about measurement transparency—though agency executives are rightfully skeptical of platforms "checking their own homework." The deeper issue is institutional knowledge transfer and real-time brand governance. Enterprise brands have decades of learned context about what works, what doesn't, and what could damage their reputation. This isn't just about brand safety filters. It's about nuanced understanding of seasonal messaging, competitive positioning, cultural sensitivities, and customer journey orchestration that can't be reverse-engineered from campaign performance data alone. If AI truly automates the entire advertising stack, brands will need their own AI agents—not just dashboards or approval workflows, but intelligent systems that can negotiate with vendor AI in real-time. Think of it as API-level conversation between two AI systems where the brand's AI has veto power over creative decisions, placement choices, and budget allocation. This creates fascinating technical challenges: How do you architect AI-to-AI communication protocols that maintain brand governance while enabling real-time optimization? How do you build systems that can incorporate institutional knowledge without exposing competitive advantages to vendor platforms? We're talking about building advertising technology that functions more like autonomous diplomatic negotiation than traditional campaign management. 
For platform companies pushing toward full automation, the question becomes whether they're building systems that enterprise clients can actually trust with their brands and budgets. For independent technology builders, there's an opportunity to create the middleware that makes AI-powered advertising actually viable for sophisticated marketers. The future of advertising isn't just about better algorithms—it's about building trust architectures that let those algorithms work together.
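One way to picture the brand-side veto described in this post is a small agent that reviews each vendor proposal against private rules and returns only an approve-or-veto verdict. This is a toy sketch under that assumption: `Proposal`, `BrandPolicy`, `negotiate`, and all the rule fields are hypothetical, and a real protocol would need far richer policies.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Proposal:
    creative: str    # ad copy suggested by the vendor's AI
    placement: str   # where the vendor wants to run it
    budget: float    # spend the vendor wants to allocate


@dataclass(frozen=True)
class BrandPolicy:
    banned_phrases: Tuple[str, ...]      # institutional knowledge, kept private
    blocked_placements: Tuple[str, ...]
    max_budget: float

    def approves(self, p: Proposal) -> bool:
        text = p.creative.lower()
        if any(phrase.lower() in text for phrase in self.banned_phrases):
            return False
        if p.placement in self.blocked_placements:
            return False
        return p.budget <= self.max_budget


def negotiate(proposals: Iterable[Proposal],
              policy: BrandPolicy) -> Tuple[Optional[Proposal], List[bool]]:
    """Return the first proposal the brand agent accepts, plus the verdicts sent back."""
    verdicts: List[bool] = []
    for p in proposals:
        ok = policy.approves(p)
        verdicts.append(ok)  # the vendor sees only approve/veto, not the rules
        if ok:
            return p, verdicts
    return None, verdicts
```

Returning a bare verdict instead of reasons is one possible answer to the question of incorporating institutional knowledge without exposing it to the vendor platform, at the cost of slower convergence because the vendor has to guess what went wrong.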