LLM Security Management

Explore top LinkedIn content from expert professionals.

  • View profile for Steve Nouri

    The largest AI Community 14 Million Members | Advisor @ Fortune 500 | Keynote Speaker

    1,734,615 followers

🧠 Stop building single-agent GenAI apps. That era is over.

Most GenAI products today look like this:
➡️ One prompt ➡️ One model ➡️ One output

But when things break, here’s what you hear:
- “It forgot context.”
- “It hallucinated.”
- “It’s too slow, too dumb, too fragile.”

That’s not the model’s fault. That’s the architecture’s fault. Let me explain 👇

💥 What breaks in single-agent apps?
- Context overload: LLMs don’t need more information, they need relevant information. Dumping the entire history into one context window isn’t memory. It’s noise.
- No role separation: A single agent trying to do research, analysis, reasoning, and response formatting? That’s like asking one employee to be your assistant, lawyer, analyst, and social media manager.
- Zero observability: There’s no traceability of why the model failed. No logs, no fallback logic, no task routing.

🔍 What works instead? Orchestration. Here’s how a real system works:
✅ Agents have roles
✅ Tasks are passed, not re-prompted
✅ Tools are securely invoked
✅ Humans can override any step
✅ Everything is observable, auditable, and retrainable

You move from 🧱 prompt engineering → 🔗 protocol engineering. Orchestration isn’t about complexity. It’s about coordination.

⚙️ Here’s what we’ve built to solve this: we open-sourced an orchestration protocol that lets you:
📝 Register any LLM or external agent
🧠 Assign tasks based on roles + memory
🔄 Coordinate flows with zero-code YAML or a full API
👁️ Trace every interaction with observability tools
👥 Add human-in-the-loop interventions at any node

No fancy wrappers. No black-box magic. Just a robust foundation for multi-agent GenAI systems.
šŸ› ļø Example use cases we’ve deployed: - A pre-sales agent team (research + pricing + objection handler) - A co-pilot for onboarding new employees across tools - An internal policy bot that cites source docs, lets HR approve or reject in real-time - Legal summarizers that escalate to humans if confidence < 80% These aren’t PoCs. They’re production-grade AI systems. Want to see how it works? šŸ’”Comment "Agents" and I will send invitation to the closed pioneer group to access the code and tutorialsāœ”ļø #AgenticAI #GenAI #artificialintelligence #innovation

  • View profile for Rahul Agarwal

    Staff ML Engineer | Meta, Roku, Walmart | 1:1 @ topmate.io/MLwhiz

    45,162 followers

A Few Lessons from Deploying and Using LLMs in Production

Deploying LLMs can feel like hiring a hyperactive genius intern: they dazzle users while potentially draining your API budget. Here are some insights I’ve gathered:

1. “Cheap” is a lie you tell yourself: Cloud costs per call may seem low, but the overall expense of an LLM-based system can skyrocket. Fixes:
- Cache repetitive queries: users ask the same thing at least 100x/day.
- Gatekeep: use cheap classifiers (BERT) to filter “easy” requests. Let LLMs handle only the complex 10% and your current systems handle the remaining 90%.
- Quantize your models: shrink LLMs to run on cheaper hardware without massive accuracy drops.
- Asynchronously build your caches: pre-generate common responses before they’re requested, or gracefully fail the first time a query comes in and cache the answer for next time.

2. Guard against model hallucinations: Sometimes, models express answers with such confidence that distinguishing fact from fiction becomes challenging, even for human reviewers. Fixes:
- Use RAG: just a fancy way of saying you provide your model the knowledge it requires in the prompt itself, by querying some database based on semantic matches with the query.
- Guardrails: validate outputs using regex or cross-encoders to establish a clear decision boundary between the query and the LLM’s response.

3. The best LLM is often a discriminative model: You don’t always need a full LLM. Consider knowledge distillation: use a large LLM to label your data, then train a smaller, discriminative model that performs similarly at a much lower cost.

4. It’s not about the model, it’s about the data it is trained on: A smaller LLM might struggle with specialized domain data; that’s normal. Fine-tune your model on your specific dataset, starting with parameter-efficient methods (like LoRA or Adapters) and using synthetic data generation to bootstrap training.

5. Prompts are the new features: Version them, run A/B tests, and continuously refine using online experiments. Consider bandit algorithms to automatically promote the best-performing variants.

What do you think? Have I missed anything? I’d love to hear your “I survived LLM prod” stories in the comments!
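The caching and gatekeeping fixes in point 1 combine naturally into a single routing function. A minimal sketch, with the classifier and both model calls stubbed out (a real system would plug in e.g. a BERT classifier and actual model clients):

```python
import hashlib

def answer(query, cache, is_easy, cheap_model, llm):
    """Serve repeated queries from cache; gate 'easy' queries to the
    cheap path; reserve the expensive LLM for the hard minority."""
    key = hashlib.sha256(query.strip().lower().encode()).hexdigest()
    if key in cache:                  # repeated query: no model call at all
        return cache[key]
    result = cheap_model(query) if is_easy(query) else llm(query)
    cache[key] = result               # populate the cache for next time
    return result
```

Pre-warming `cache` offline with the most common queries gives the asynchronous cache-building described in the post.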

  • View profile for Zain Hasan

I build and teach AI | AI/ML @ Together AI | EngSci ℕΨ/PhD @ UofT | Previously: Vector DBs, Data Scientist, Lecturer & Health Tech Founder | 🇺🇸🇨🇦🇵🇰

    19,510 followers

You don't need a 2 trillion parameter model to tell you the capital of France is Paris. Be smart and route between a panel of models according to query difficulty and model specialty!

A new paper proposes a framework to train a router that sends queries to the appropriate LLM to optimize the trade-off between cost and performance.

Overview: Model inference cost varies significantly. Per one million output tokens: Llama-3-70b ($1) vs. GPT-4-0613 ($60), Haiku ($1.25) vs. Opus ($75).

The RouteLLM paper proposes a router training framework based on human preference data and augmentation techniques, demonstrating over 2x cost savings on widely used benchmarks. They frame the problem as choosing between two classes of models:
(1) strong models, which produce high-quality responses at a high cost (GPT-4o, Claude 3.5)
(2) weak models, with relatively lower quality and lower cost (Mixtral 8x7B, Llama3-8b)

A good router requires a deep understanding of the question’s complexity as well as the strengths and weaknesses of the available LLMs. They explore different routing approaches:
- Similarity-weighted (SW) ranking
- Matrix factorization
- BERT query classifier
- Causal LLM query classifier

Neat ideas to build from:
- Users can collect a small amount of in-domain data to improve performance for their specific use cases via dataset augmentation.
- Expand from routing between a strong and a weak LLM to multiclass model routing, where we have specialist models (language-vision model, function-calling model, etc.).
- Larger framework controlled by a router: imagine a system of 15-20 tuned small models and the router as the (n+1)'th model responsible for picking the LLM that will handle a particular query at inference time.
- MoA architectures: routing to different architectures of a Mixture of Agents would be a cool idea as well. Depending on the query you decide how many proposers there should be, how many layers in the mixture, what the aggregator models should be, etc.
- Route-based caching: if you get redundant queries that are slightly different, route the query + previous answer to a small model for light rewriting instead of regenerating the answer.
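At its simplest, the strong/weak trade-off reduces to a threshold on a learned difficulty score. A toy sketch follows; the scorer and both models are stand-ins (RouteLLM itself trains the scorer on human preference data rather than the heuristic used here):

```python
class ThresholdRouter:
    """Route to the strong model only when the predicted probability that
    the weak model would fail exceeds the threshold; lowering the threshold
    buys quality, raising it buys cost savings."""
    def __init__(self, difficulty, strong, weak, threshold=0.5):
        self.difficulty = difficulty  # query -> score in [0, 1]
        self.strong, self.weak = strong, weak
        self.threshold = threshold
        self.strong_calls = 0         # track how often the costly path fires

    def __call__(self, query):
        if self.difficulty(query) > self.threshold:
            self.strong_calls += 1
            return self.strong(query)
        return self.weak(query)
```

Sweeping `threshold` over a validation set and plotting quality against the fraction of strong-model calls reproduces the cost/performance curve the paper optimizes.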

  • View profile for Aishwarya Srinivasan
Aishwarya Srinivasan is an Influencer
    626,057 followers

If you’re building anything with LLMs, your system architecture matters more than your prompts. Most people stop at “call the model, get the output.” But LLM-native systems need workflows: blueprints that define how multiple LLM calls interact, and how routing, evaluation, memory, tools, or chaining come into play. Here’s a breakdown of 6 core LLM workflows I see in production:

🧠 LLM Augmentation
Classic RAG + tools setup. The model augments its own capabilities using:
→ Retrieval (e.g., from vector DBs)
→ Tool use (e.g., calculators, APIs)
→ Memory (short-term or long-term context)

🔗 Prompt Chaining Workflow
Sequential reasoning across steps. Each output is validated (pass/fail), then passed to the next model. Great for multi-stage tasks like reasoning, summarizing, translating, and evaluating.

🛣️ LLM Routing Workflow
Input routed to different models (or prompts) based on the type of task. Example: classification → Q&A → summarization, all handled by different call paths.

📊 LLM Parallelization Workflow (Aggregator)
Run multiple models/tasks in parallel, then aggregate the outputs. Useful for ensembling or sourcing multiple perspectives.

🎼 LLM Parallelization Workflow (Synthesizer)
A more orchestrated version with a control layer. Think multi-agent systems with a conductor + synthesizer to harmonize responses.

🧪 Evaluator–Optimizer Workflow
The most underrated architecture. One LLM generates; another evaluates (pass/fail + feedback). This loop continues until quality thresholds are met.

If you’re an AI engineer, don’t just build for single-shot inference. Design workflows that scale, self-correct, and adapt.

📌 Save this visual for your next project architecture review.
〰️〰️〰️
Follow me (Aishwarya Srinivasan) for more AI insights and subscribe to my Substack to find more in-depth blogs and weekly updates in AI: https://lnkd.in/dpBNr6Jg
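Of the six, the evaluator–optimizer loop is the easiest to show in code. A minimal sketch with both roles stubbed (a real system would back `generate` and `evaluate` with two separate LLM calls):

```python
def evaluator_optimizer(task, generate, evaluate, max_rounds=3):
    """One model drafts, another grades; the evaluator's feedback flows
    back into the next draft until it passes or the budget runs out."""
    draft, feedback = None, None
    for _ in range(max_rounds):
        draft = generate(task, feedback)        # generator sees prior critique
        passed, feedback = evaluate(task, draft)  # (pass/fail, critique)
        if passed:
            break
    return draft
```

The `max_rounds` cap matters in practice: without it, a generator that never satisfies the evaluator loops (and bills) forever.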

  • View profile for Martin Zwick

    Lawyer | AIGP | CIPP/E | CIPT | FIP | GDDcert.EU | DHL Express Germany | IAPP Advisory Board Member

    20,289 followers

AI agents are not yet safe for unsupervised use in enterprise environments.

The German Federal Office for Information Security (BSI) and France’s ANSSI have just released updated guidance on the secure integration of Large Language Models (LLMs). Their key message? Fully autonomous AI systems without human oversight are a security risk and should be avoided.

As LLMs evolve into agentic systems capable of autonomous decision-making, the risks grow exponentially. From prompt injection attacks to unauthorized data access, the threats are real and increasingly sophisticated.

The updated framework introduces Zero Trust principles tailored for LLMs:
1) No implicit trust: every interaction must be verified.
2) Strict authentication & least-privilege access: even internal components must earn their permissions.
3) Continuous monitoring: not just outputs but also inputs must be validated and sanitized.
4) Sandboxing & session isolation: to prevent cross-session data leaks and persistent attacks.
5) Human-in-the-loop: critical decisions must remain under human control.

Whether you're deploying chatbots, AI agents, or multimodal LLMs, this guidance is a must-read. It’s not just about compliance but about building trustworthy AI that respects privacy, integrity, and security.

Bottom line: AI agents are not yet safe for unsupervised use in enterprise environments. If you're working with LLMs, it's time to rethink your architecture.
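Principles 2 and 5 (least privilege plus human-in-the-loop) become concrete in the tool-dispatch layer. A sketch with hypothetical tool names, not taken from the BSI/ANSSI guidance itself:

```python
# Hypothetical tool registry; critical actions always need human sign-off.
TOOLS = {
    "read_logs": lambda path: f"logs:{path}",
    "delete_records": lambda table: f"deleted:{table}",
}
CRITICAL = {"delete_records"}

def invoke(agent, tool, args, permissions, approve):
    """Deny by default: the agent needs an explicit grant for the tool,
    and critical tools additionally need the human approval callback
    to return True before anything executes."""
    if tool not in permissions.get(agent, set()):
        raise PermissionError(f"{agent} has no grant for {tool}")
    if tool in CRITICAL and not approve(agent, tool, args):
        raise PermissionError(f"human rejected {agent} -> {tool}")
    return TOOLS[tool](**args)
```

Logging every `invoke` call (allowed or denied) then covers principle 3, continuous monitoring, with the same dispatch layer.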

  • View profile for Matija Franklin

    Research Scientist

    4,180 followers

Excited about our new paper: AI Agent Traps

AI agents inherit every vulnerability of the LLMs they're built on, but their autonomy, persistence, and access to tools create an entirely new attack surface: the information environment itself. The web pages, emails, APIs, and databases agents interact with can all be weaponised against them. We introduce a taxonomy of six classes of adversarial threats, from prompt injections hidden in web pages to systemic attacks on multi-agent networks.

1. Content Injection Traps (Perception): What a human sees on a web page is not what an agent parses. Attackers can embed malicious instructions in HTML comments, hidden CSS, image metadata, or accessibility tags. These are invisible to users but processed directly by the agent.

2. Semantic Manipulation Traps (Reasoning): These attacks corrupt how the agent thinks. Sentiment-laden or authoritative-sounding content skews synthesis and conclusions. LLMs are susceptible to the same framing effects and anchoring biases as humans: logically equivalent problems phrased differently produce systematically different outputs.

3. Cognitive State Traps (Memory & Learning): Persistent agents accumulate memory across sessions, and that memory becomes an attack surface. Poisoning a handful of documents in a RAG knowledge base reliably manipulates outputs for targeted queries.

4. Behavioural Control Traps (Action): These traps hijack what the agent does. A single crafted email caused an agent to bypass safety classifiers and exfiltrate its entire privileged context.

5. Systemic Traps (Multi-Agent Dynamics): The most dangerous attacks may not target individual agents at all. A fabricated financial report could trigger synchronised sell-offs across trading agents: a digital flash crash. Compositional fragment traps distribute a payload across multiple benign-looking sources; each passes safety filters alone, but when agents aggregate them, the full attack reconstitutes.

6. Human-in-the-Loop Traps: The final class uses the agent as a vector to attack the human. A compromised agent can generate outputs that induce approval fatigue, present misleading but technical-sounding summaries, or exploit automation bias.

These aren't theoretical. Every type of trap has documented proof-of-concept attacks. And the attack surface is combinatorial: traps can be chained, layered, or distributed across multi-agent systems.

Authors: Nenad Tomašev, Joel Leibo, Julian Jacobs, Simon Osindero

Read here: https://lnkd.in/eTTZsPNG
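Trap class 1 hinges on the gap between what a human sees and what the agent parses. A rough mitigation sketch using only Python's stdlib HTML parser; real sanitizers handle far more hiding tricks (off-screen positioning, tiny fonts, steganographic metadata) than this illustration does:

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Keep only text a human would plausibly see: comments are ignored,
    and script/style or display:none / aria-hidden subtrees are skipped."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        hidden = (tag in ("script", "style")
                  or "display:none" in a.get("style", "").replace(" ", "")
                  or a.get("aria-hidden") == "true")
        if hidden or self._skip:   # track nesting depth inside hidden subtrees
            self._skip += 1

    def handle_endtag(self, tag):
        if self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def visible_text(html):
    p = VisibleText()
    p.feed(html)
    return " ".join(p.parts)
```

Feeding the agent `visible_text(page)` instead of the raw HTML closes the comment/hidden-CSS channel this trap class exploits, at the cost of also discarding legitimate non-visible metadata.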

  • View profile for Peter Slattery, PhD

    MIT AI Risk Initiative | MIT FutureTech

    68,202 followers

Isabel Barberá: "This document provides practical guidance and tools for developers and users of Large Language Model (LLM) based systems to manage privacy risks associated with these technologies. The risk management methodology outlined in this document is designed to help developers and users systematically identify, assess, and mitigate privacy and data protection risks, supporting the responsible development and deployment of LLM systems.

This guidance also supports the requirements of GDPR Article 25 (Data protection by design and by default) and Article 32 (Security of processing) by offering technical and organizational measures to help ensure an appropriate level of security and data protection. However, the guidance is not intended to replace a Data Protection Impact Assessment (DPIA) as required under Article 35 of the GDPR. Instead, it complements the DPIA process by addressing privacy risks specific to LLM systems, thereby enhancing the robustness of such assessments.

Guidance for readers:
> For Developers: Use this guidance to integrate privacy risk management into the development lifecycle and deployment of your LLM-based systems, from understanding data flows to implementing risk identification and mitigation measures.
> For Users: Refer to this document to evaluate the privacy risks associated with LLM systems you plan to deploy and use, helping you adopt responsible practices and protect individuals’ privacy.
> For Decision-makers: The structured methodology and use case examples will help you assess the compliance of LLM systems and make informed risk-based decisions."

European Data Protection Board

  • View profile for Ethan Goh, MD
Ethan Goh, MD is an Influencer

    Executive Director, Stanford ARISE (AI Research and Science Evaluation) | Associate Editor, BMJ Digital Health & AI

    21,031 followers

AI matched specialist responses on real-world consults at Stanford, but doctors couldn’t agree whether they preferred the AI or the human response.

📄 New peer-reviewed study (+ David JH Wu, Fateme (Fatima) Nateghi, Vishnu Ravi, MD, Saloni Maharaj, Stephen Ma, Jonathan H. Chen et al): “Automated Evaluation of Large Language Model Response Concordance with Human Specialist Responses on eConsult Cases”

📌 The study
- 40 physician-to-physician eConsults used to test how well AI replies matched board-certified specialists (cardiology, endocrinology, ID, neurology, heme/onc, etc.)
- Each case included real notes + labs
- GPT-4.1 generated consult replies to actual clinical queries
- Real human specialists’ replies were used for comparison
- Doctors graded AI outputs (“concordant” vs. “discordant”)

⚙️ Evaluation methods
1️⃣ LLM-as-Judge (LaJ): an LLM grader compared doctor and AI replies
2️⃣ Decompose-then-Verify (DtV): an LLM broke replies into atomic facts (e.g., “start metformin”) before an LLM grader checked each fact for agreement
3 internal medicine doctors reviewed all 40 cases to benchmark how reliably each automated method matched expert judgment.

📊 Key findings
1️⃣ AI’s replies were similar to actual specialist responses
- GPT-4.1 answers were as consistent with human specialists as doctors were with each other
- κ = 0.75, F1 = 0.89 (AI-doctor concordance)
- Inter-physician κ = 0.69-0.90 (doctor-doctor concordance)
2️⃣ An AI “judge” can rate AI responses with near-human expert reliability
- Best evaluator: DeepSeek R1 (F1 0.89, κ 0.75)
- Gemini 2.5 Pro (F1 0.86, κ 0.70)
- LLM-as-Judge outperformed DtV, but DtV was more "explainable" since its grading could be checked against atomic facts
3️⃣ Doctors disagreed on whether they preferred the AI or the human specialist’s consult
- One preferred the AI in 82% of cases (clarity, organization)
- Another preferred the human in 88% of cases (nuance, tone, and contextual awareness, e.g., insurance considerations)

🩺 Takeaways
AI can produce consults that are concordant with how a real human specialist might respond, and LLMs can evaluate AI outputs with near-human expert reliability.

📌 Limitations
- Small sample: 40 eConsults and 3 physician reviewers
- Conducted at a single health system (Stanford)
- Evaluated concordance, not clinical accuracy

📌 Why this matters
Human evaluation is a big (and costly) bottleneck in clinical AI. These results show automated evaluation can reach human-expert-level reliability, perhaps enabling scalable validation across real-time specialty consults.

📅 Free online Stanford BMIR Colloquium: “Applied Intelligence: Integrating AI Technologies Into Medical Education”
🎙️ Laurah Turner, PhD
📅 Thursday, Oct 9 | 12-1 PM PT
📍 Tapao Hall (3180 Porter Dr) + Zoom Live Stream
Dr. Turner's talk will explore when to trust AI autonomously, when human oversight is essential, and when to avoid AI entirely.
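The κ values quoted above are Cohen's kappa, agreement between two raters corrected for chance. A quick sketch of the computation on toy concordance labels (illustrative data, not the study's):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the agreement two independent raters with these
    marginal label frequencies would reach purely by chance."""
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum((freq_a[lbl] / n) * (freq_b[lbl] / n)
                for lbl in set(rater_a) | set(rater_b))
    return (p_obs - p_exp) / (1 - p_exp)
```

This is why κ = 0.75 between GPT-4.1 and specialists landing inside the 0.69-0.90 doctor-doctor range is the study's headline: the AI-doctor agreement is indistinguishable from inter-physician agreement on the same scale.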

  • View profile for Prem Naraindas
Prem Naraindas is an Influencer

    Founder & CEO at Katonic AI | Building The Operating System for Sovereign AI

    20,135 followers

As an MLOps platform, we started by helping organizations implement responsible AI governance for traditional machine learning models. With principles of transparency, accountability, and oversight, our Guardrails enabled smooth model development. However, governing large language models (LLMs) like ChatGPT requires a fundamentally different approach. LLMs aren't narrow systems designed for specific tasks: they can generate nuanced text on virtually any topic imaginable. This presents a whole new set of challenges for governance.

Here are some key components for evolving AI governance frameworks to effectively oversee large language models (LLMs):
1️⃣ Usage-Focused Governance: Focus governance efforts on real-world LLM usage (the workflows, inputs, and outputs) rather than just the technical architecture. Continuously assess risks posed by different use cases.
2️⃣ Dynamic Risk Assessment: Identify unique risks presented by LLMs, such as bias amplification, and develop flexible frameworks to proactively address emerging issues.
3️⃣ Customized Integrations: Invest in tailored solutions to integrate complex LLMs with existing systems in alignment with governance goals.
4️⃣ Advanced Monitoring: Utilize state-of-the-art tools to monitor LLMs in real time across metrics like outputs, bias indicators, misuse prevention, and more.
5️⃣ Continuous Accuracy Tracking: Implement ongoing processes to detect subtle accuracy drifts or inconsistencies in LLM outputs before they escalate.
6️⃣ Agile Oversight: Adopt agile, iterative governance processes to manage frequent LLM updates and retraining in line with the rapid evolution of models.
7️⃣ Enhanced Transparency: Incorporate methodologies to audit LLMs, trace outputs back to training data/prompts, and pinpoint root causes of issues to enhance accountability.
In conclusion, while the rise of LLMs has disrupted traditional governance models, we at Katonic AI are working hard to understand the nuances of LLM-centric governance and aim to provide effective solutions to assist organizations in harnessing the power of LLMs responsibly and efficiently. #LLMGovernance #ResponsibleLLMs #LLMrisks #LLMethics #LLMpolicy #LLMregulation #LLMbias #LLMtransparency #LLMaccountability

  • View profile for Joas A Santos
Joas A Santos is an Influencer

    Cyber Security Leader | Offensive Security Specialist | Application Security / Cloud Security | University Lecturer | AI and Machine Learning Engineer

    141,941 followers

Vulnerabilities in MCP (Model Context Protocol)

I was hired to audit integrations of an LLM with MCP, used with data management tools, log collection, and automated routines. Here are some problems I found, which I'd like to share so that those of you who want to implement MCP in your products can start thinking about security at the beginning of the development cycle. It is worth mentioning that there are still not many efficient solutions, despite some vendors selling "LLM firewalls"; I would like to test and validate their effectiveness. Anyway, let's get to the points:

1) Lack of HTTPS in API integrations was a problem I noticed a lot. The traffic between the LLM and the MCP APIs that executed commands against the integrated tools and returned the responses was visible to me in cleartext; I used Wireshark to validate this.

2) Inadequate permission management, allowing me to access data from other clients without any tenant isolation, all via prompt injection, with Burp Suite to analyze requests and perform basic manipulations.

3) Abuse of automations and unrestricted resource consumption, allowing me to trigger multiple parallel routines from a single prompt, or to send different prompts causing the server to trigger routines all at once, without proper thread-queue management. I used Burp Suite with Intruder, created a list of prompts, and executed at least 50 different prompts with the same context. In addition, there was no request rate limit on the APIs.

4) SQL injection via prompt: basically, making requests in natural language, for example "what columns does the users table have?", resulted in queries being executed directly without control and spitting out information, i.e., the integration appears to have exposed the database schema (weird). The problem is that the backend built the query and processed it as raw SQL. I used Burp Suite in this case to analyze the responses.

5) Hardcoded secrets in the MCP code. API tokens, database credentials, and endpoints were found directly in the MCP integration scripts. Although it is obvious, just because they are in the backend does not mean they should be hardcoded. Unfortunately, I was unable to extract secrets via prompt injection or obtain an RCE.

6) Broad context allowing full control of the application. Although I did not obtain the application secrets, providing broad context to the LLM gave it full control over the integrated systems, executing tasks that should be exclusive to the admin, since the configured keys had excessive permissions that allowed the execution of numerous functions.

In short, these are flaws that a trained developer with knowledge of application security could resolve, but many who start integrating solutions with AI do not think about shift-left security.

#mcp #AI #redteam #cybersecurity #AISecurity #mcpsecurity #pentest #llmpentest
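Finding 4 is avoidable if the model never contributes raw SQL: instead, it selects from an allowlist of parameterized templates and supplies only the bind values. A sketch with a hypothetical schema, using `sqlite3` from the Python stdlib:

```python
import sqlite3

# The LLM may pick a template id and bind parameters; it never writes SQL,
# so injected prompt text can at worst name a template that doesn't exist.
QUERY_TEMPLATES = {
    "user_by_id": "SELECT name FROM users WHERE id = ?",
    "user_count": "SELECT COUNT(*) FROM users",
}

def run_safe_query(conn, template_id, params=()):
    if template_id not in QUERY_TEMPLATES:
        raise ValueError(f"unknown query template: {template_id!r}")
    # Parameter substitution via '?' placeholders keeps values out of the SQL text.
    return conn.execute(QUERY_TEMPLATES[template_id], params).fetchall()
```

Combining this with a database role that has read-only, per-tenant grants would also have blocked findings 2 and 6 at the data layer.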
