We taught LSTMs to run in parallel. Now they've grown to 7B parameters and are ready to challenge Transformers. For years, we've assumed RNNs were doomed—inherently sequential, too slow to train, impossible to scale—and looked to Transformers as the go-to choice for large language modelling. Turns out we just needed better math.

Introducing 𝗣𝗮𝗿𝗮𝗥𝗡𝗡: 𝗨𝗻𝗹𝗼𝗰𝗸𝗶𝗻𝗴 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗼𝗳 𝗡𝗼𝗻𝗹𝗶𝗻𝗲𝗮𝗿 𝗥𝗡𝗡𝘀 𝗳𝗼𝗿 𝗟𝗟𝗠𝘀

👉 [TL;DR] We can now train nonlinear RNNs at unprecedented scale by parallelising what was previously considered inherently sequential—the unrolling of recurrent computations. If you care about fast inference for LLMs, or are into time-series analysis, we've got good news for you: RNNs are back on the menu.

🐍 But wait, doesn't Mamba parallelise this too? Sure, but here's the catch: Mamba requires state-space updates to be linear, which fundamentally limits expressivity. We want the freedom to apply nonlinearities sequence-wise.

💡 Our approach: recast the sequence of nonlinear recurrences as a system of equations, then solve it in parallel using Newton's method. As a bonus, make everything blazingly fast with custom CUDA kernels.

⚡ The result? Up to 665x speedup over naive sequential processing, and training times comparable to Mamba, even with the extra overhead of Newton's iterations.

📈 So we took LSTM and GRU architectures—remember those from the pre-Transformer era?—scaled them to 7B parameters, and achieved perplexity comparable to similarly-sized Transformers. No architectural tricks. Just pure scale, finally unlocked.

🔥 Why this matters: Mamba challenged the Transformer's monopoly. ParaRNN expands the search space of available architectures. It's time to get back to the drawing board and use these tools to design the next generation of inference-efficient models.

💻 To aid with this, we're releasing open-source code to parallelise RNN applications out of the box. No need to implement your own parallel scan, nor to remember how Newton's method works: just prescribe the recurrence relation, flag any structure in your hidden-state update, and watch GPUs go 𝘣𝘳𝘳𝘳𝘳𝘳𝘳𝘳.

Paper: https://lnkd.in/dTEGh5Jp
Code: https://lnkd.in/d_Ven9Y2
Collaborators: Pau Rodriguez Lopez, Miguel Sarabia, Xavier Suau, Luca Zappella

---------------------------------------------

💼 And if you're a PhD student interested in working on these topics, we have a fresh internship position just for you: https://lnkd.in/dDVSsfJj

𝗧𝗶𝗺𝗲 𝘁𝗼 𝗲𝘅𝗽𝗹𝗼𝗿𝗲 𝘄𝗵𝗮𝘁 𝘁𝗿𝘂𝗹𝘆 𝗻𝗼𝗻𝗹𝗶𝗻𝗲𝗮𝗿 𝗥𝗡𝗡𝘀 𝗰𝗮𝗻 𝗱𝗼 𝗮𝘁 𝘀𝗰𝗮𝗹𝗲
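The core trick — treating the whole unrolled recurrence as one system of equations and applying Newton's method to it — can be sketched on a toy scalar RNN. This is a minimal illustration, not the paper's CUDA implementation: the tanh cell, the weights `W`/`U`, and the iteration count are all invented for the example. The point to notice is that each Newton step reduces to a *linear* recurrence in the updates `dh_t`, which is exactly the part a parallel scan can evaluate across the sequence on GPU.

```python
import math

# Toy scalar RNN: h_t = tanh(W * h_{t-1} + U * x_t), with h_0 = 0.
W, U = 0.5, 1.0

def f(h_prev, x):
    return math.tanh(W * h_prev + U * x)

def df_dh(h_prev, x):
    # Derivative of the cell w.r.t. the previous hidden state.
    return W * (1.0 - math.tanh(W * h_prev + U * x) ** 2)

def sequential_unroll(xs):
    """Classic step-by-step recurrence: the baseline being parallelized."""
    h, hs = 0.0, []
    for x in xs:
        h = f(h, x)
        hs.append(h)
    return hs

def newton_unroll(xs, iters=12):
    """Solve all T equations h_t - f(h_{t-1}, x_t) = 0 jointly with Newton."""
    T = len(xs)
    h = [0.0] * T  # initial guess for the entire trajectory at once
    for _ in range(iters):
        h_prev = [0.0] + h[:-1]
        r = [h[t] - f(h_prev[t], xs[t]) for t in range(T)]  # residuals
        a = [df_dh(h_prev[t], xs[t]) for t in range(T)]     # Jacobian sub-diagonal
        # Newton system is bidiagonal: dh_t = a_t * dh_{t-1} - r_t.
        # Written sequentially here for clarity; it is a linear recurrence,
        # so a parallel scan can evaluate it across all t simultaneously.
        dh = [0.0] * T
        dh[0] = -r[0]
        for t in range(1, T):
            dh[t] = a[t] * dh[t - 1] - r[t]
        h = [h[t] + dh[t] for t in range(T)]
    return h
```

On this toy system a handful of Newton iterations reproduces the sequential unroll to machine precision, which is why the extra overhead per step can still come out far ahead of T sequential steps.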
Machine Learning Algorithms
-
Fascinating Research Alert: Cross-Encoders Rediscover BM25 in a Semantic Way!

I just read an incredible paper that reveals how neural ranking models actually work under the hood. Researchers from Brown University and the University of Tuebingen have discovered that cross-encoder models (specifically a MiniLM variant) essentially implement a semantic version of the classic BM25 algorithm!

>> The Key Discovery
The researchers used mechanistic interpretability techniques to reverse-engineer how the cross-encoder computes relevance scores. They found that the model:
- Uses "Matching Heads" in early transformer layers that compute soft term frequency while accounting for term saturation and document length
- Stores inverse document frequency (IDF) information in a dominant low-rank vector of its embedding matrix
- Employs "Contextual Query Representation Heads" in middle layers to distribute soft-TF information
- Finally uses "Relevance Scoring Heads" in later layers to combine all these signals in a BM25-like computation

>> Why This Matters
This research bridges the gap between traditional IR and neural methods, showing that transformer-based models aren't just black boxes but actually rediscover fundamental IR principles in a more semantic way.

The researchers validated their findings by creating a linear approximation of the cross-encoder's relevance computation that achieved an impressive 0.84 Pearson correlation with the model's actual scores. The paper also demonstrates how we can edit the model's IDF values to control term importance, opening possibilities for model editing, personalization, and bias mitigation.

This work gives us a deeper understanding of neural IR models and could lead to more interpretable, controllable, and efficient ranking systems. Truly groundbreaking work at the intersection of information retrieval and mechanistic interpretability!
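For readers who haven't touched classic IR in a while, here is the BM25 computation the cross-encoder was found to approximate — IDF weighting, saturating term frequency, and document-length normalization. The tokenized toy documents and the default k1/b values are illustrative only.

```python
import math
from collections import Counter

def bm25(query, doc, doc_freq, n_docs, avg_doc_len, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query with BM25."""
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        if tf[term] == 0:
            continue
        df = doc_freq.get(term, 0)
        # Inverse document frequency: rare terms contribute more.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Saturating term frequency, normalized by document length.
        sat_tf = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc) / avg_doc_len)
        )
        score += idf * sat_tf
    return score

# Tiny two-document corpus for illustration:
d1 = ["fast", "neural", "ranking", "models"]
d2 = ["cooking", "pasta", "recipes", "guide"]
doc_freq = {t: 1 for t in d1 + d2}
query = ["neural", "ranking"]
s1 = bm25(query, d1, doc_freq, n_docs=2, avg_doc_len=4)
s2 = bm25(query, d2, doc_freq, n_docs=2, avg_doc_len=4)
```

The paper's finding, roughly, is that the model's "Matching Heads" play the role of `sat_tf` and its embedding matrix stores something like `idf` — but computed over soft semantic matches rather than exact string matches.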
-
Your model is trained. But is it actually good?

Most ML engineers default to accuracy. Then wonder why their model fails in production. Here are 20 evaluation metrics — and when to actually use each one:

Classification:
- Accuracy → Balanced datasets only.
- Precision → When false positives are costly.
- Recall → When false negatives matter more.
- F1 Score → Imbalanced datasets. Balances both.
- ROC-AUC → Binary classification evaluation.
- Log Loss → Probabilistic models. Penalizes confident wrong predictions.
- Confusion Matrix → Error analysis. See exactly where it breaks.
- Specificity → When detecting negatives correctly matters.
- Balanced Accuracy → Uneven datasets. Don't trust plain accuracy here.

Regression:
- MAE → Simple, interpretable error measurement.
- MSE → Penalizes larger errors more heavily.
- RMSE → Error in original scale. Most interpretable.
- R² Score → How much variance your model explains.
- Adjusted R² → Feature-heavy models. Adjusts for complexity.
- MAPE → Business forecasting. Error as a percentage.
- Explained Variance → Model consistency evaluation.

Clustering:
- Silhouette Score → Cluster cohesion and separation. Cluster validation.
- Davies-Bouldin Index → Lower is better clustering.

NLP:
- BLEU Score → Machine translation quality.
- ROUGE Score → Text summarization quality.

Accuracy is not a strategy. Picking the right metric for the right problem is. A model that looks great on accuracy can destroy real-world outcomes when the wrong metric guided its evaluation.

Save this. 📌 Which metric do most engineers misuse? 👇
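The classification metrics above all fall out of the same four confusion-matrix counts. A minimal sketch (the 95/5 class split is an invented example) showing why accuracy alone misleads on imbalanced data:

```python
def binary_metrics(y_true, y_pred):
    """Core classification metrics from raw 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# 95 negatives, 5 positives; a "model" that always predicts negative:
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
m = binary_metrics(y_true, y_pred)
# Accuracy is 0.95, yet recall and F1 are 0 — the model finds nothing.
```

This is exactly the case where F1 or balanced accuracy should replace plain accuracy.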
-
A Few Lessons from Deploying and Using LLMs in Production

Deploying LLMs can feel like hiring a hyperactive genius intern—they dazzle users while potentially draining your API budget. Here are some insights I've gathered:

1. "Cheap" is a Lie You Tell Yourself: Cloud costs per call may seem low, but the overall expense of an LLM-based system can skyrocket. Fixes:
- Cache repetitive queries: users ask the same thing at least 100x/day.
- Gatekeep: use cheap classifiers (BERT) to filter "easy" requests. Let LLMs handle only the complex 10% and your current systems handle the remaining 90%.
- Quantize your models: shrink LLMs to run on cheaper hardware without massive accuracy drops.
- Asynchronously build your caches: pre-generate common responses before they're requested, or gracefully fail the first time a query arrives and cache the result for next time.

2. Guard Against Model Hallucinations: Sometimes models express answers with such confidence that distinguishing fact from fiction becomes challenging, even for human reviewers. Fixes:
- Use RAG: just a fancy way of saying you give the model the knowledge it needs in the prompt itself, by querying a database for semantic matches with the query.
- Guardrails: validate outputs using regex or cross-encoders to establish a clear decision boundary between the query and the LLM's response.

3. The Best LLM Is Often a Discriminative Model: You don't always need a full LLM. Consider knowledge distillation: use a large LLM to label your data, then train a smaller, discriminative model that performs similarly at a much lower cost.

4. It's Not About the Model, It's About the Data It Was Trained On: A smaller LLM might struggle with specialized domain data—that's normal. Fine-tune on your specific dataset, starting with parameter-efficient methods (like LoRA or Adapters) and using synthetic data generation to bootstrap training.

5. Prompts Are the New Features: Prompts are features of your system. Version them, run A/B tests, and continuously refine using online experiments. Consider bandit algorithms to automatically promote the best-performing variants.

What do you think? Have I missed anything? I'd love to hear your "I survived LLM prod" stories in the comments!
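The caching advice in point 1 can be sketched in a few lines. This is an illustrative pattern, not a production cache — the `NormalizedCache` class and its whitespace/case normalization are invented for the example; real systems would add TTLs, eviction, and semantic (embedding-based) matching for paraphrases.

```python
import hashlib

class NormalizedCache:
    """Cache LLM responses keyed on a normalized form of the query,
    so trivially rephrased repeats never hit the paid API twice."""

    def __init__(self, llm_fn):
        self.llm_fn = llm_fn  # the expensive call being guarded
        self.store = {}
        self.misses = 0

    def _key(self, query):
        # Lowercase and collapse whitespace before hashing.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def ask(self, query):
        key = self._key(query)
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.llm_fn(query)  # only on a true miss
        return self.store[key]

# Stand-in for a real model call, to show the behavior:
calls = []
def fake_llm(q):
    calls.append(q)
    return "some expensive answer"

cache = NormalizedCache(fake_llm)
a = cache.ask("What is RAG?")
b = cache.ask("  what   is RAG? ")  # rephrased repeat: served from cache
```

The same wrapper shape works for the "gatekeeper" idea too: run the cheap classifier inside `ask` and route easy queries away from `llm_fn` entirely.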
-
If you're building with LLMs, these are 10 toolkits I highly recommend getting familiar with 👇

Whether you're an engineer, researcher, PM, or infra lead, these tools are shaping how GenAI systems get built, debugged, fine-tuned, and scaled today. They form the core of production-grade AI, across RAG, agents, multimodal, evaluation, and more.

→ AI-Native IDEs (Cursor, JetBrains Junie, Copilot X) Modern IDEs now embed LLMs to accelerate coding, testing, and debugging. They go beyond autocomplete, understanding repo structure, generating unit tests, and optimizing workflows.

→ Multi-Agent Frameworks (CrewAI, AutoGen, LangGraph) Useful when one model isn't enough. These frameworks let you build role-based agents (e.g. planner, retriever, coder) that collaborate and coordinate across complex tasks.

→ Inference Engines (Fireworks AI, vLLM, TGI) Designed for high-throughput, low-latency LLM serving. They handle open models, fine-tuned variants, and multimodal inputs, essential for scaling to production.

→ Data Frameworks for RAG (LlamaIndex, Haystack, RAGflow) Build the bridge between your data and the LLM. These frameworks handle parsing, chunking, retrieval, and indexing to ground model outputs in enterprise knowledge.

→ Vector Databases (Pinecone, Weaviate, Qdrant, Chroma) The backbone of semantic search. They store embeddings and power retrieval in RAG, recommendations, and memory systems using fast nearest-neighbor algorithms.

→ Evaluation & Benchmarking (Fireworks AI Eval Protocol, Ragas, TruLens) Let you test for accuracy, hallucinations, regressions, and preference alignment. Core to validating model behavior across prompts, versions, or fine-tuning runs.

→ Memory Systems (MEM-0, LangChain Memory, Milvus Hybrid) Enable agents to retain past interactions. Useful for building persistent assistants, session-aware tools, and long-term personalized workflows.

→ Agent Observability (LangSmith, HoneyHive, Arize AI Phoenix) Debugging LLM chains is non-trivial. These tools surface traces, logs, and step-by-step reasoning so you can inspect and iterate with confidence.

→ Fine-Tuning & Reward Stacks (PEFT, LoRA, Fireworks AI RLHF/RLVR) Support adapting base models efficiently or aligning behavior using reward models. Great for domain tuning, personalization, and safety alignment.

→ Multimodal Toolkits (CLIP, BLIP-2, Florence-2, GPT-4o APIs) Text is just one modality. These toolkits let you build agents that understand images, audio, and video, enabling richer input/output capabilities.

If you're deep in AI infra or systems, print this out, build a test project around each, and experiment with how they fit together. You'll learn more in a weekend with these tools than from hours of reading docs.

What's one tool you'd add to this list? 👇

〰️〰️〰️
Follow me (Aishwarya Srinivasan) for more AI infrastructure insights, and subscribe to my newsletter for deeper technical breakdowns: 🔗 https://lnkd.in/dpBNr6Jg
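To make the "fast nearest-neighbor" role of vector databases concrete, here is the operation they optimize, written out as brute force. Everything below (the three-dimensional toy vectors, the document IDs) is invented for illustration; production systems replace the `sorted` call with approximate indexes like HNSW or IVF.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query_vec, index, k=2):
    """Exact nearest neighbors by exhaustive scan over the index."""
    ranked = sorted(index.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy embedding index: doc id -> embedding vector.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.9, 0.1, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], index, k=2)
```

This linear scan is O(N) per query; the whole value proposition of a vector DB is doing it approximately in sub-linear time over millions of embeddings.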
-
This is how Aimpoint Digital built an AI agent system to generate personalised travel itineraries in under 30 seconds, saving hours of planning time.

- Aimpoint Digital's system uses a multi-RAG architecture: three parallel RAG systems gather info quickly. Each system focuses on a different aspect, such as places, restaurants, and events, to give detailed itinerary options.
- They utilised Databricks' Vector Search service to help the system scale. The architecture currently supports data for hundreds of cities, with an existing DB of ~500 restaurants in Paris, ready to expand.
- To stay up to date, the system uses Delta tables with Change Data Feed, which updates the vector search indices automatically whenever the source data changes, keeping recommendations fresh and accurate.
- The AI agent system runs on standalone Databricks Vector Search Endpoints for querying, with provisioned-throughput endpoints to serve LLM requests.
- Evaluation metrics like precision, recall, and NDCG quantify retrieval quality. The system also uses an LLM-as-judge to check output quality on aspects like professionalism, based on examples.

Link to the article: https://lnkd.in/gFGvyTT9

#AI #RAG #GenAI
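The fan-out pattern behind the three parallel RAG systems can be sketched with a thread pool. This is a generic illustration, not Aimpoint Digital's code — the retriever names and the stand-in lambdas are invented; the point is that total latency tracks the slowest retriever instead of the sum of all three.

```python
from concurrent.futures import ThreadPoolExecutor

def gather_itinerary_context(query, retrievers):
    """Send one query to several domain-specific retrievers in parallel
    and collect the results keyed by domain."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = {name: pool.submit(fn, query)
                   for name, fn in retrievers.items()}
        # .result() blocks until each retriever finishes.
        return {name: fut.result() for name, fut in futures.items()}

# Stand-in retrievers; real ones would hit three vector search indices.
retrievers = {
    "places":      lambda q: [f"{q}:louvre"],
    "restaurants": lambda q: [f"{q}:bistro"],
    "events":      lambda q: [f"{q}:expo"],
}
ctx = gather_itinerary_context("paris", retrievers)
```

The merged `ctx` dict is what would then be stitched into the itinerary-generation prompt.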
-
76% of AI teams pick embedding methods randomly! 🤯 (HostingAdvice, Aug 2025)

Here's what that costs you: mismatched methodologies lead to poor semantic understanding, wasted compute, and retrieval systems that can't find what users actually need.

The 6 embedding methodologies aren't interchangeable:
✳️ Prediction-Based: predicts context from words; best for NLP tasks requiring deep language understanding.
✳️ Matrix Factorisation: decomposes statistical matrices into vectors; ideal for word similarity and clustering.
✳️ Subword Composition: breaks words into n-grams; handles out-of-vocabulary words that break other methods.
✳️ Contextual Encoding: turns entire paragraphs into vectors; powers semantic search, QA systems, and compliance tools.
✳️ Convolution & Attention: extracts dense features; dominates image and text recognition tasks.
✳️ Contrastive Learning: aligns cross-modal data; enables reasoning across completely different domains.

Same goal, totally different mechanics. Select based on data structure and use case, rather than popularity.

Now you tell me: how many of these did you know already?

P.S. Follow me, Bhavishya Pandit, for weekly AI architecture insights that cut through the noise 🔥

#VectorEmbeddings #MachineLearning #AIEngineering #NLP #ComputerVision #GenAI
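"Subword Composition" is the easiest of the six to show concretely. The sketch below is the FastText-style character n-gram decomposition (with `<` and `>` as word-boundary markers); the default n-gram range is illustrative. A word's vector is then composed by summing the vectors of its n-grams, which is why an unseen word still gets a sensible embedding.

```python
def char_ngrams(word, n_min=3, n_max=5):
    """FastText-style subword decomposition with boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams

grams = char_ngrams("where")
# An out-of-vocabulary word like "wherever" shares many of these n-grams
# with "where", so its vector can be composed from known n-gram vectors.
overlap = set(grams) & set(char_ngrams("wherever"))
```

None of the other five methodologies handle out-of-vocabulary words this gracefully, which is the trade-off the post is pointing at.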
-
"Can language models (LMs) learn to faithfully describe their internal computations? Are they better able to describe themselves than other models? We study the extent to which LMs’ privileged access to their own internals can be leveraged to produce new techniques for explaining their behavior. Using existing interpretability techniques as a source of ground truth, we fine-tune LMs to generate natural language descriptions of (1) the information encoded by LM features, (2) the causal structure of LMs’ internal activations, and (3) the influence of specific input tokens on LM outputs. When trained with only tens of thousands of example explanations, explainer models exhibit non-trivial generalization to new queries. This generalization appears partly attributable to explainer models’ privileged access to their own internals: using a model to explain its own computations generally works better than using a different model to explain its computations (even if the other model is significantly more capable). Our results suggest not only that LMs can learn to reliably explain their internal computations, but that such explanations offer a scalable complement to existing interpretability methods." Belinda Lin, Carl Guo, Vincent Huang, Jacob Steinhardt, Jacob Andreas, and Transluce
-
“99% Accuracy” from an ML model is a lie. 🚨

If you are building a fraud detection model and 99% of your transactions are legitimate, a model that simply guesses "Legit" every single time will have 99% accuracy. But it captures 0% of the fraud.

In the real world, "Accuracy" is rarely the best metric. If you want to move from junior developer to Senior ML Engineer, you need to understand the nuances of how we measure success. In this visual story, let's go on an ML evaluation journey.

🛑 Stop 1: Classification (The "Is it X or Y?" problems)
• Precision: When you predict "Spam," how often are you right? (Crucial when false alarms are annoying.)
• Recall: Out of all the actual "Spam," how much did you find? (Crucial when missing a positive is dangerous.)
• F1 Score: The harmonic mean. It's the peace treaty between Precision and Recall.

🛑 Stop 2: Regression (The "How much?" problems)
• MAE (Mean Absolute Error): The average "oops." Great for generic error tracking (e.g., house prices off by $5k).
• MSE (Mean Squared Error): Penalizes large errors heavily. Use this if being very wrong is much worse than being slightly wrong.
• RMSE: Puts the error back into the same units as the target so you can actually explain it to your boss.

🛑 Stop 3: Clustering & Ranking
• Silhouette Score: Are your customer segments actually distinct, or just a messy blob?
• ROC-AUC: How well does the model separate classes? (e.g., distinguishing Fraud vs. Not Fraud.)

Don't just optimize for the high score. Optimize for the business problem. Save this roadmap for your next model deployment! 💾

Like this? Share it and follow me, Priyanka, for more cloud and AI concepts.

#MachineLearning #DataScience #AI #DeepLearning
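The MAE-vs-MSE distinction from Stop 2 is easiest to see with numbers. A minimal sketch (the two four-point prediction sets are invented): both "models" have the same MAE, but RMSE exposes the one that makes a single large miss.

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, MSE, and RMSE from paired true/predicted values."""
    errs = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / len(errs)
    mse = sum(e * e for e in errs) / len(errs)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}

# Two models with identical MAE but different error profiles:
steady = regression_metrics([10, 10, 10, 10], [12, 12, 12, 12])  # always off by 2
spiky  = regression_metrics([10, 10, 10, 10], [10, 10, 10, 18])  # one miss of 8
# Both have MAE 2.0, but the spiky model's RMSE is double the steady one's:
# the squaring step punishes the outlier.
```

If one catastrophic miss is worse than many small ones (pricing, dosage, capacity planning), optimize MSE/RMSE; if all errors cost roughly the same per unit, MAE is the honest choice.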
-
Explainable AI strengthens accountability and integrity in automation by making algorithmic reasoning transparent, ensuring fair governance, detecting bias, supporting compliance, and nurturing trust that sustains responsible innovation. Organizations that aim to integrate AI responsibly face a common challenge: understanding how decisions are made by their systems. Without clarity, compliance becomes fragile and ethics remain theoretical. Explainable AI brings visibility into this process, translating complex model logic into a language that regulators, auditors, and executives can actually understand. Transparency is not a luxury. It is a structural requirement for building trust in automated decision-making. When models are explainable, teams can trace outcomes, identify hidden biases, and take timely corrective action before risk escalates. This level of insight also helps align technology with existing regulatory frameworks, from GDPR principles to sector-specific governance standards. Embedding explainability within AI governance frameworks creates a bridge between innovation and responsibility. It helps organizations evolve without compromising accountability, ensuring that progress remains both human-centered and sustainable. #ExplainableAI #EthicalAI #AIGovernance #Compliance #Trust