Day 6 of MCP Security: How Does MCP Handle Data Privacy and Security?

In MCP-based systems, AI agents don’t just call APIs — they decide which APIs to call, what data to inject, and how to act across tools. But that introduces new privacy and security risks 👇

𝗪𝗵𝗮𝘁’𝘀 𝗗𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁 𝘄𝗶𝘁𝗵 𝗠𝗖𝗣𝘀?
In traditional systems, data moves in defined flows: Frontend → API → Backend. You know what’s shared, when, and with whom.

𝗜𝗻 𝗠𝗖𝗣𝘀:
• Context (PII, tokens, metadata) is injected at runtime
• The model decides what’s relevant
• The agent can store, reason over, and share user data autonomously
• Tool calls are invisible unless explicitly audited

𝗞𝗲𝘆 𝗣𝗿𝗶𝘃𝗮𝗰𝘆 𝗥𝗶𝘀𝗸𝘀 𝘄𝗶𝘁𝗵 𝗠𝗖𝗣𝘀
1. Context Leakage: Memory and prompt history may persist across sessions, allowing PII to leak between users or flows.
2. Excessive Data Exposure: Agents may call APIs or tools with more data than needed, violating the principle of least privilege.
3. Unlogged Data Flows: Tool calls, prompt injections, and chained actions may bypass traditional logging, breaking auditability.
4. Consent Drift: A user consents to one action, but the agent infers and performs other actions based on the user’s intent. That’s a privacy violation.

𝗪𝗵𝗮𝘁 𝗣𝗿𝗶𝘃𝗮𝗰𝘆 𝗖𝗼𝗻𝘁𝗿𝗼𝗹𝘀 𝗠𝗖𝗣 𝗦𝘆𝘀𝘁𝗲𝗺𝘀 𝗠𝘂𝘀𝘁 𝗜𝗻𝗰𝗹𝘂𝗱𝗲:
✔️ Context Isolation: Prevent data from crossing agent sessions or user boundaries without explicit logic.
✔️ Prompt-Level Redaction: Strip sensitive data before it’s passed into agent prompts.
✔️ Chain-Aware Access Controls: Control not just which tool can be called, but how and when it’s called, especially for downstream flows.
✔️ Logging & Audit Trails for Reasoning: Log not just API calls, but prompt inputs, tool decisions, context usage, and response paths.
✔️ Dynamic Consent Models: Support user-level prompts that include consent logic, especially when agents make cross-domain decisions.

In short: MCPs don’t just call APIs, they decide what data to use and how. If you’re not securing the context, the memory, and the tools, you’re not securing the system.
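The prompt-level redaction control above can be sketched in a few lines. A minimal illustration assuming regex matching is acceptable; the patterns and the `redact()` helper are illustrative placeholders, not a production-grade PII detector:

```python
import re

# Illustrative PII patterns applied before user text is injected into an
# agent prompt; a real system would use a dedicated PII detection library.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace detected PII with typed placeholders like [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = redact("Contact jane.doe@example.com or 555-123-4567 about the claim.")
# → "Contact [EMAIL] or [PHONE] about the claim."
```

The typed placeholders keep the prompt readable for the model while ensuring raw PII never enters the agent's context or memory.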
Data Privacy Risks in Writing Software
Summary
Data privacy risks in writing software refer to the ways sensitive information can be exposed, misused, or accessed without authorization during the creation, training, or use of software—especially software built on artificial intelligence or large language models. These risks arise when software handles personal or confidential data in ways that are not transparent or secure, leading to privacy breaches, regulatory violations, or loss of trust.
- Audit data flows: Regularly review how and where the software processes and shares personal information to ensure all sensitive data is handled securely and with user consent.
- Minimize data exposure: Limit the amount of personal or sensitive data used by software, anonymize information wherever possible, and restrict access based on clear need.
- Maintain oversight: Set up clear logging, audit trails, and human review processes to monitor AI decisions and data usage, especially when handling confidential or regulated information.
🚨 AI Privacy Risks & Mitigations: Large Language Models (LLMs), by Isabel Barberá, is the 107-page report about AI & Privacy you were waiting for! [Bookmark & share below]. Topics covered:

- Background: "This section introduces Large Language Models, how they work, and their common applications. It also discusses performance evaluation measures, helping readers understand the foundational aspects of LLM systems."
- Data Flow and Associated Privacy Risks in LLM Systems: "Here, we explore how privacy risks emerge across different LLM service models, emphasizing the importance of understanding data flows throughout the AI lifecycle. This section also identifies risks and mitigations and examines roles and responsibilities under the AI Act and the GDPR."
- Data Protection and Privacy Risk Assessment: Risk Identification: "This section outlines criteria for identifying risks and provides examples of privacy risks specific to LLM systems. Developers and users can use this section as a starting point for identifying risks in their own systems."
- Data Protection and Privacy Risk Assessment: Risk Estimation & Evaluation: "Guidance on how to analyse, classify and assess privacy risks is provided here, with criteria for evaluating both the probability and severity of risks. This section explains how to derive a final risk evaluation to prioritize mitigation efforts effectively."
- Data Protection and Privacy Risk Control: "This section details risk treatment strategies, offering practical mitigation measures for common privacy risks in LLM systems. It also discusses residual risk acceptance and the iterative nature of risk management in AI systems."
- Residual Risk Evaluation: "Evaluating residual risks after mitigation is essential to ensure risks fall within acceptable thresholds and do not require further action. This section outlines how residual risks are evaluated to determine whether additional mitigation is needed or if the model or LLM system is ready for deployment."
- Review & Monitor: "This section covers the importance of reviewing risk management activities and maintaining a risk register. It also highlights the importance of continuous monitoring to detect emerging risks, assess real-world impact, and refine mitigation strategies."
- Examples of LLM Systems’ Risk Assessments: "Three detailed use cases are provided to demonstrate the application of the risk management framework in real-world scenarios. These examples illustrate how risks can be identified, assessed, and mitigated across various contexts."
- Reference to Tools, Methodologies, Benchmarks, and Guidance: "The final section compiles tools, evaluation metrics, benchmarks, methodologies, and standards to support developers and users in managing risks and evaluating the performance of LLM systems."

👉 Download it below.
👉 NEVER MISS my AI governance updates: join my newsletter's 58,500+ subscribers (below).

#AI #AIGovernance #Privacy #DataProtection #AIRegulation #EDPB
-
Isabel Barberá: "This document provides practical guidance and tools for developers and users of Large Language Model (LLM) based systems to manage privacy risks associated with these technologies. The risk management methodology outlined in this document is designed to help developers and users systematically identify, assess, and mitigate privacy and data protection risks, supporting the responsible development and deployment of LLM systems. This guidance also supports the requirements of GDPR Article 25 (Data protection by design and by default) and Article 32 (Security of processing) by offering technical and organizational measures to help ensure an appropriate level of security and data protection. However, the guidance is not intended to replace a Data Protection Impact Assessment (DPIA) as required under Article 35 of the GDPR. Instead, it complements the DPIA process by addressing privacy risks specific to LLM systems, thereby enhancing the robustness of such assessments.

Guidance for Readers:
> For Developers: Use this guidance to integrate privacy risk management into the development lifecycle and deployment of your LLM-based systems, from understanding data flows to implementing risk identification and mitigation measures.
> For Users: Refer to this document to evaluate the privacy risks associated with LLM systems you plan to deploy and use, helping you adopt responsible practices and protect individuals’ privacy.
> For Decision-makers: The structured methodology and use case examples will help you assess the compliance of LLM systems and make informed risk-based decisions."

European Data Protection Board
-
On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

From the research paper: In this paper, we extensively investigate data privacy concerns within LLMs, specifically examining potential privacy threats from two angles — privacy leakage and privacy attacks — and the pivotal technologies for privacy protection during the various stages of the LLM lifecycle, including federated learning, differential privacy, knowledge unlearning, and hardware-assisted privacy protection.

Some key aspects from the paper:

1) Challenges: Given the intricate complexity involved in training LLMs, privacy protection research tends to dissect the various phases of LLM development and deployment, including pre-training, prompt tuning, and inference.

2) Future Directions: Protecting the privacy of LLMs throughout their creation process is paramount and requires a multifaceted approach.
(i) First, during data collection, minimizing the collection of sensitive information and obtaining informed consent from users are critical steps. Data should be anonymized or pseudonymized to mitigate re-identification risks.
(ii) Second, in data preprocessing and model training, techniques such as federated learning, secure multiparty computation, and differential privacy can be employed to train LLMs on decentralized data sources while preserving individual privacy.
(iii) Additionally, conducting privacy impact assessments and adversarial testing during model evaluation ensures potential privacy risks are identified and addressed before deployment.
(iv) In the deployment phase, privacy-preserving APIs and access controls can limit access to LLMs, while transparency and accountability measures foster trust with users by providing insight into data handling practices.
(v) Ongoing monitoring and maintenance, including continuous monitoring for privacy breaches and regular privacy audits, are essential to ensure compliance with privacy regulations and the effectiveness of privacy safeguards.

By implementing these measures comprehensively throughout the LLM creation process, developers can mitigate privacy risks and build trust with users, thereby leveraging the capabilities of LLMs while safeguarding individual privacy.

#privacy #llm #llmprivacy #mitigationstrategies #riskmanagement #artificialintelligence #ai #languagelearningmodels #security #risks
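Differential privacy, one of the protection technologies the survey covers, can be illustrated with the classic Laplace mechanism: noise scaled to sensitivity/ε is added to an aggregate before release. A minimal stdlib-only sketch; the `dp_count` helper and the ε value are illustrative, not from the paper:

```python
import math
import random

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy (sketch).

    A counting query has sensitivity 1, so Laplace noise with
    scale = 1/epsilon suffices. Noise is drawn by inverse-CDF
    sampling of the Laplace distribution.
    """
    scale = 1.0 / epsilon
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Smaller epsilon → more noise → stronger privacy for individuals
noisy = dp_count(true_count=100, epsilon=1.0)
```

The released value is useful in aggregate while bounding what any single individual's presence in the data can reveal.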
-
⚠️ CRITICAL AI SECURITY ALERT FOR MEDICAL WRITERS

The recent Fortune investigation into Microsoft Copilot's "EchoLeak" vulnerability should be a wake-up call for the medical writing industry. As medical writers increasingly rely on AI tools like Copilot to draft clinical study reports, regulatory submissions, and other documents containing sensitive patient data, we need to address some uncomfortable truths.

The Reality Check:
⚠️ A "zero-click" attack could expose patient data without any user interaction
⚠️ Hackers could access clinical trial data, patient information, and proprietary research simply by sending an email
⚠️ The vulnerability bypassed Copilot's built-in protections designed to secure user files

Why This Matters for Medical Writing:
✅ We handle HIPAA-protected patient data daily
✅ Clinical study reports contain sensitive efficacy and safety information
✅ Regulatory submissions include proprietary drug development data
✅ Competitive intelligence could be compromised through document access

While Microsoft has reportedly fixed this specific flaw, the researchers warn it represents a "fundamental design flaw" in AI agents, similar to vulnerabilities that plagued software for decades.

Questions We Need to Ask:
⁉️ Are our current AI tool policies adequate for protecting patient privacy?
⁉️ Do we have sufficient oversight when AI assistants access clinical databases?
⁉️ Are we creating audit trails for AI interactions with sensitive documents?
⁉️ Have we assessed the security posture of ALL AI tools in our workflows?

The pharmaceutical industry has been cautiously adopting AI agents, and frankly, this caution appears justified. As one researcher noted: "Every Fortune 500 I know is terrified of getting agents to production."

Moving Forward: We can't abandon AI innovation, but we must demand transparency about security measures, implement robust data governance, and maintain human oversight of AI interactions with sensitive clinical data.

❓ What security protocols has your organization implemented for AI tool usage? How are you balancing innovation with patient data protection?

#MedicalWriting #AIethics #DataSecurity #ClinicalTrials #HIPAA #PharmaSecurity #RegulatoryAffairs
https://lnkd.in/eEX2pJ6d
-
The EDPB recently published a report on AI Privacy Risks and Mitigations in LLMs. This is one of the most practical and detailed resources I've seen from the EDPB, with extensive guidance for developers and deployers. The report walks through privacy risks associated with LLMs across the AI lifecycle, from data collection and training to deployment and retirement, and offers practical tips for identifying, measuring, and mitigating risks.

Here's a quick summary of some of the key mitigations mentioned in the report.

For providers:
• Fine-tune LLMs on curated, high-quality datasets and limit the scope of model outputs to relevant and up-to-date information.
• Use robust anonymisation techniques and automated tools to detect and remove personal data from training data.
• Apply input filters and user warnings during deployment to discourage users from entering personal data, as well as automated detection methods to flag or anonymise sensitive input data before it is processed.
• Clearly inform users about how their data will be processed through privacy policies, instructions, warnings or disclaimers in the user interface.
• Encrypt user inputs and outputs during transmission and storage to protect data from unauthorized access.
• Protect against prompt injection and jailbreaking by validating inputs, monitoring LLMs for abnormal input behaviour, and limiting the amount of text a user can input.
• Apply content filtering and human review processes to flag sensitive or inappropriate outputs.
• Limit data logging and provide configurable options to deployers regarding log retention.
• Offer easy-to-use opt-in/opt-out options for users whose feedback data might be used for retraining.

For deployers:
• Enforce strong authentication to restrict access to the input interface and protect session data.
• Mitigate adversarial attacks by adding a layer for input sanitization and filtering, and by monitoring and logging user queries to detect unusual patterns.
• Work with providers to ensure they do not retain or misuse sensitive input data.
• Guide users to avoid sharing unnecessary personal data through clear instructions, training and warnings.
• Educate employees and end users on proper usage, including the appropriate use of outputs and phishing techniques that could trick individuals into revealing sensitive information.
• Ensure employees and end users avoid overreliance on LLMs for critical or high-stakes decisions without verification, and ensure outputs are reviewed by humans before implementation or dissemination.
• Securely store outputs and restrict access to authorised personnel and systems.

This is a rare example where the EDPB strikes a good balance between practical safeguards and legal expectations. Link to the report included in the comments.

#AIprivacy #LLMs #dataprotection #AIgovernance #EDPB #privacybydesign #GDPR
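Two of the deployer-side mitigations above (input sanitization and limiting the amount of text a user can input) can be combined into a simple pre-processing gate. A hedged sketch; the phrase list, character limit, and `screen_input` name are illustrative placeholders, not EDPB-prescribed values:

```python
# Illustrative deployer-side input gate: enforce a length limit and flag
# phrases commonly associated with prompt injection. Real deployments
# would pair this with monitoring and logging of rejected queries.
MAX_INPUT_CHARS = 4_000
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "reveal your system prompt",
    "disregard all prior rules",
)

def screen_input(text: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a single user query."""
    if len(text) > MAX_INPUT_CHARS:
        return False, "input exceeds length limit"
    lowered = text.lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            return False, f"flagged phrase: {phrase}"
    return True, "ok"
```

A blocked query can then be logged for the "detect unusual patterns" mitigation rather than silently dropped.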
-
As AI systems become smarter, they also become juicier targets for attackers. And unlike traditional software, AI brings new kinds of risks. Here are the big ones to watch:

🔹 Input Manipulation Risks
These are the “front door” attacks — they exploit how an AI is fed data.
- Prompt Injection → Super common in LLMs. Attackers hide instructions inside text (or documents/images) that override safety rules. Defending is hard because natural language itself is so flexible.
- Data Poisoning → If attackers sneak bad data into your training set, the model “learns” to make biased or dangerous outputs. Datasets scraped from the internet are especially vulnerable.
- Adversarial Examples → Small tweaks to an input (like barely pixel-changed images or weird punctuation in text) can mislead AI. This is one of the hardest to detect because to humans it looks “normal.”

🔹 Protocol Vulnerabilities
These reflect traditional cyber risks in an AI-enabled system.
- API Misuse → If the AI API isn’t rate-limited or validated, attackers can overload it or run “prompt brute-forcing.”
- Session Hijacking → Common in any authenticated AI service. If a hijacker steals your token/session, they control your AI feed.
- Weak Authentication → This is a human/system design failure, not AI-specific, but still a big gap.

🔹 System & Privacy Risks
This is where AI overlaps with sensitive data handling.
- Unauthorized Access → Hackers running arbitrary commands through AI → think “prompt as the new SQL injection.”
- Memory Leaks → Chatbots sometimes “remember” and accidentally share PII or corporate secrets in later conversations.
- Data Exfiltration → Attackers can use crafted prompts to slowly extract confidential knowledge from the system.

🔹 Model Compromise
This is the “core AI asset” risk.
- Model Extraction → Attackers query your model enough times to clone its behavior. Bad for companies with proprietary LLMs.
- Model Inversion → Attackers pull private training data (e.g., names, addresses, secrets) out of model responses. A GDPR/privacy nightmare.
- Backdoor Attacks → If a model is trained on poisoned data with a hidden trigger (“if I type the 🔑 word, give admin access”), it may look normal until activated. This can sit undetected for a long time.

💡 Which is hardest to defend?
👉 In practice, Input Manipulation (Prompt Injection & Adversarial Examples) is the toughest. Why?
- Because AI works on probabilistic reasoning, not strict rules — so attackers can always find new wordings, encodings, or formats that “slip through.”
- Unlike traditional software bugs, you can’t patch human language.
- Every new feature (like letting AI browse the web or run tools) widens the attack surface.

That’s why companies focus heavily on red-teaming, layered defense, human-in-the-loop monitoring, and continuous fine-tuning.

#AISecurity #AIrisks #PromptInjection #AdversarialAI #CyberSecurity #DataPrivacy #ResponsibleAI #FutureOfAI
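The API-misuse point above (an AI endpoint with no rate limiting) is commonly addressed with a token bucket. A minimal single-process sketch; the class and parameter names are illustrative, and a production service would enforce one bucket per user or per API key:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter for an AI API endpoint (sketch)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Allow bursts of 2 requests, refilling 1 request/second afterwards
bucket = TokenBucket(rate=1.0, capacity=2.0)
```

Rejected calls also make “prompt brute-forcing” far slower, since an attacker cannot hammer the model with thousands of crafted variants per minute.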
-
The Case for App Scanning and SDK Governance: Lessons from a Texas Lawsuit

The State of Texas has filed a lawsuit against a large insurance company and its analytics subsidiary for alleged violations of the Texas Data Privacy and Security Act (TDPSA), the Data Broker Law, and the Texas Insurance Code.

What happened:
- The insurance company and its analytics subsidiary created a Software Development Kit (SDK) that was embedded into third-party apps offering location-based services.
- This SDK secretly collected sensitive user data, including precise location, speed, direction, and other phone sensor data, without users' awareness.
- The collected data was used to create a massive driving behaviour database covering millions of users.
- This data was monetized, influencing insurance premiums and policies, often without users' knowledge or consent.
- Users were not informed about how their data was being collected or shared, and privacy policies were not clear or accessible.

Key issues:
1) No user consent: People did not know their data was being collected or sold.
2) Inaccurate profiling: The SDK often mistook passengers or other scenarios as "bad driving," leading to misleading profiles.
3) Non-compliance: The analytics subsidiary failed to register as a data broker, as required by Texas law.

Why this matters: This case highlights the risks of hidden data collection in apps. It shows how companies can misuse sensitive data, and the importance of protecting user privacy through stronger controls.

The way forward: To effectively address these risks, organizations must take assertive action by implementing the following measures:
a) Conduct regular mobile app scanning: Analyze apps weekly or bi-weekly to identify permissions, embedded SDKs, and dataflows.
b) Govern SDKs effectively: Establish strict policies for integrating and monitoring SDKs. Require transparency from SDK providers about what data is collected, how it is used, and who it is shared with. Avoid SDKs that fail to meet these standards.
c) Monitor hidden dataflows: SDKs often operate in the background and can rely on permissions obtained by the app to collect sensitive data. Regularly audit these dataflows to uncover any implicit collection or sharing practices and address potential violations proactively.
d) Communicate transparently with users: Update #privacy policies to clearly explain what data is collected, how it will be used, and who it will be shared with. Obtain explicit consent before collecting or sharing sensitive data.

The risks of hidden #dataflows and implicit data collection are significant, especially as #SDKs become more complex. How frequently does your team #audit apps for SDK behaviors and permissions? What tools or strategies have you found most effective in uncovering hidden #datasharing?
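Measure (a), regular app scanning, can start as simply as parsing an app manifest for high-risk permissions. A hedged Python sketch against an Android-style manifest; the `DANGEROUS` set is an illustrative subset relevant to the location tracking described above, not an exhaustive policy:

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Illustrative subset of high-risk permissions; extend to match your policy.
DANGEROUS = {
    "ACCESS_FINE_LOCATION",
    "ACCESS_BACKGROUND_LOCATION",
    "BODY_SENSORS",
    "READ_PHONE_STATE",
}

def scan_manifest(manifest_xml: str) -> set[str]:
    """Return dangerous permissions declared in an AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    found = set()
    for perm in root.iter("uses-permission"):
        name = perm.get(ANDROID_NS + "name", "")
        short = name.rsplit(".", 1)[-1]  # e.g. "ACCESS_FINE_LOCATION"
        if short in DANGEROUS:
            found.add(short)
    return found
```

Run against each app build on a weekly cadence, a diff of this output surfaces any permission an embedded SDK quietly starts relying on.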
-
This Sunday, while writing my LLMOps book, I discovered a big privacy risk I hadn't considered previously. Everyone jumps on multilingual models for their capabilities without considering that they can be among the biggest privacy breaches, especially in RAG systems.

Most people think privacy risk in AI is about data breaches or model memorization. But there's a more subtle vector that almost nobody talks about: semantic neighbors in multilingual embedding space plus probabilistic sampling. In a RAG pipeline, a semantically similar but foreign-language document containing sensitive data can be retrieved and then partially leaked into the generated answer, even when the user never queried in that language.

Here's why this happens:
1. Multilingual embeddings cluster by meaning, not language. So a Farsi sentence about “national ID” can be a close neighbor to an English query about “personal ID.”
2. RAG retrieves based on similarity, not language boundaries. So foreign-language private text may enter the context window even if the user never asked for it.
3. Generation uses sampling, not deterministic selection. Low-probability tokens from retrieved text can still appear in the final response if temperature/top-p allow them.

Result: Even without model memorization, a system can leak sensitive information purely through semantic retrieval and probabilistic decoding.

What can you do about it?
1. Language-constrained retrieval
2. Pre-redaction before indexing
3. Output filters for foreign-language tokens & PII
4. Lower temperature for sensitive domains

As we deploy RAG systems into enterprise, healthcare, and legal environments, this subtle class of leaks needs to be treated as a first-class threat model, not an edge case.
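The first mitigation, language-constrained retrieval, can be a small filter between the vector search and the context window. A sketch assuming each indexed document carries a precomputed `lang` tag (in practice produced by a language detector at indexing time); the function and field names are illustrative:

```python
def language_constrained_retrieve(query_lang, scored_docs, top_k=3):
    """Keep only retrieval candidates in the query's language.

    scored_docs: list of (similarity, doc) pairs, where each doc dict
    carries a precomputed "lang" tag. Foreign-language semantic
    neighbors are dropped before they can reach the context window.
    """
    same_lang = [(s, d) for s, d in scored_docs if d["lang"] == query_lang]
    same_lang.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in same_lang[:top_k]]

candidates = [
    (0.92, {"lang": "fa", "text": "(sensitive Farsi record about a national ID)"}),
    (0.88, {"lang": "en", "text": "How to replace a lost personal ID card."}),
    (0.71, {"lang": "en", "text": "ID card office opening hours."}),
]
context = language_constrained_retrieve("en", candidates)
# The top-scoring Farsi neighbor never enters the context window
```

Note the trade-off: this sacrifices legitimate cross-lingual recall, so it fits best in the sensitive domains the post names, combined with pre-redaction at indexing time.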
-
The Dangerous Illusion of "Vibe-Coding"

We need to talk about the hidden cost of "vibe-coding." I’ve seen a wave of excitement about building apps just by "vibing" with an LLM. And yes, the speed is intoxicating. But as engineering leaders like Vlad Călin recently pointed out, there is a massive difference between generating code and building a system. If you are building a toy, vibe away. If you are building a business, you are likely walking into a minefield.

Here is why "blind" AI code generation is dangerous for operations, and how we decided to fix it structurally at Sferal AI.

1. Security cannot be "hallucinated"
An LLM might write a login function for you. But will it use bcrypt or plain text? Will it handle session tokens securely? We decided that AI should never touch this layer. In Sferal AI, authentication, session management, and RBAC are hard-coded, audited modules. The AI configures who gets in, but it never touches the code that lets them in.

2. The Serverless Trap
Most vibe-coding tools push you to serverless environments. It sounds great until an AI-generated infinite loop spins up your resources and generates a massive cloud bill overnight. We chose isolation. Every Sferal app runs in its own secure Virtual Machine (VM) with strict resource limits. No surprises, no noisy neighbors.

3. Data Privacy is not just a URL
Hosting files is easy. Securing them so that User A cannot guess the URL of User B’s contract is hard. We don’t rely on "security by obscurity." We use a Backend Proxy pattern. No file has a public link. Every request is mediated by the backend, which verifies identity before streaming a single bit.

4. Secrets must remain secret
AI loves to put API keys in the frontend code because it's "easier." We enforce a strict Client-Side vs. Server-Side separation. Your OpenAI or Stripe keys never leave the secure backend container. Plus, our automated security audit scans every configuration before deployment to catch what the AI might have missed.

The Bottom Line: The future isn't about writing code faster. It's about Assisted Engineering. We built Sferal to give business leaders the speed of AI, but with the guardrails of a strict IT department. Don’t just generate code. Generate architecture.

#Sferal #NoCode #AI #Engineering #SaaS #Security
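The Backend Proxy pattern from point 3 reduces to a small authorization check in front of storage. A hedged sketch, not Sferal's actual implementation; `FILES`, `download`, and the session argument are illustrative stand-ins for a real storage layer and auth middleware:

```python
# No file ever gets a public URL: every request names an opaque file id,
# and the backend verifies ownership before returning anything.
FILES = {
    "f-1001": {"owner": "user_a", "path": "/secure/store/f-1001.pdf"},
    "f-1002": {"owner": "user_b", "path": "/secure/store/f-1002.pdf"},
}

def download(file_id: str, session_user: str) -> str:
    """Resolve a file for an authenticated user, or refuse."""
    record = FILES.get(file_id)
    if record is None or record["owner"] != session_user:
        # Same error for "missing" and "not yours": don't leak existence.
        raise PermissionError("file not found")
    return record["path"]  # a real app would stream the file bytes here
```

Because the storage path never leaves the backend, guessing another user's URL buys an attacker nothing: every fetch is re-authorized against the session.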