This paper examines how LLMs, which are AI models trained on huge amounts of text, can help doctors predict medical conditions by analyzing EHRs (patient health records).
1️⃣ The study tests two AI models, GTE-Qwen2-7B and LLM2Vec-Llama3.1-8B, on 15 medical prediction tasks, like predicting whether a patient will need a prolonged hospital stay or develop a certain condition.
2️⃣ These AI models often work as well as, or even better than, specialized EHR foundation models (like CLMBR-T-Base) and traditional prediction methods, especially when there is little training data.
3️⃣ Performance improves with larger LLM models and longer context windows; GTE-Qwen2-7B performs best at a 4,096-token context length.
4️⃣ The researchers turned complex medical records into simple, organized text (like a structured note), making it easier for the AI to understand and predict health outcomes.
5️⃣ Combining LLM-based embeddings with the EHR-specific model further improves predictive accuracy, suggesting complementary strengths.
6️⃣ LLM-based EHR encoding offers a scalable alternative to dedicated EHR foundation models, sidestepping challenges around dataset availability and coding inconsistencies.
✍🏻 Stefan Hegselmann, Georg von Arnim, Tillmann Rheude, Noel Kronenberg, David Sontag, Gerhard Hindricks, Roland Eils, Benjamin Wild. Large Language Models are Powerful EHR Encoders. arXiv. 2025. DOI: 10.48550/arXiv.2502.17403
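The serialization idea in point 4️⃣ can be sketched as follows. This is a minimal illustration only: the field names and record layout are hypothetical, not the paper's actual serialization format.

```python
# Minimal sketch of serializing a structured EHR record into text for an
# LLM encoder. Field names and record layout are hypothetical, not the
# paper's actual serialization format.

def serialize_ehr(record: dict) -> str:
    """Flatten an EHR record into a structured note an LLM can embed."""
    lines = [f"Patient: {record['age']}-year-old {record['sex']}"]
    if record.get("conditions"):
        lines.append("Conditions: " + "; ".join(record["conditions"]))
    if record.get("medications"):
        lines.append("Medications: " + "; ".join(record["medications"]))
    for name, (value, unit) in record.get("labs", {}).items():
        lines.append(f"Lab {name}: {value} {unit}")
    return "\n".join(lines)

note = serialize_ehr({
    "age": 67, "sex": "male",
    "conditions": ["type 2 diabetes", "hypertension"],
    "medications": ["metformin"],
    "labs": {"HbA1c": (8.1, "%")},
})
# The resulting note would then be fed to an embedding model such as
# GTE-Qwen2-7B, and the embedding used as input to a downstream classifier.
```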
Machine Learning Models For Healthcare Predictive Analytics
Summary
Machine learning models for healthcare predictive analytics use artificial intelligence to analyze patient data and forecast medical events, such as disease progression or treatment responses. These models are helping doctors and healthcare organizations make more informed decisions by turning complex information into practical insights.
- Combine different data: Integrating sources like medical images, electronic health records, and genetic information can help models paint a fuller picture of a patient’s health for more accurate predictions.
- Explain predictions: Using tools that ground AI forecasts in evidence, such as summarizing why a model makes certain recommendations, helps build trust and supports adoption in clinical settings.
- Segment patient groups: Identifying which patients are most likely to benefit from specific treatments can guide personalized care and manage costs without sacrificing quality.
How can we decrease pharmacy spend on high-cost drugs by double digits without worse outcomes?

Uplift modeling is a common tactic in marketing for targeting the specific people who wouldn't otherwise buy the product. While marketing in general can lead to overconsumption, in healthcare/#pharmacy the same mathematical techniques can be repurposed to support #PrecisionMedicine or personalized medicine, where the goal is to identify which patients are most likely to benefit from a specific treatment while avoiding unnecessary treatment for patients who might not respond well. The cohort that gets most of the benefit varies by drug, but for some drugs only a fraction of the total population drives a larger share of clinical results.

Here's the basic process for #UpliftModeling (you can find more details in my Milliman white paper in the comments):
1. Treatment: Identify the treatment for which you want to predict response (e.g., a high-cost brand/specialty drug like GLP-1s). This could also be done for a medical device or any other intervention.
2. Data collection: Gather comprehensive data and studies about patients, including their medical history, genetic information, and any other relevant attributes. This is often the limiting factor in building a good model.
3. Control group: Assemble a control group of patients who are similar to those receiving the treatment but are not receiving it themselves. This establishes a baseline for comparison.
4. Outcome measurement: Measure the effectiveness of the treatment for both the treatment group and the control group. This could involve monitoring health improvements, cardiac events, or other relevant medical outcomes. For FDA-approved drugs, this could come from published research on the "absolute risk reduction" or "number needed to treat."
5. Model building: Develop predictive models using machine learning algorithms that estimate the likelihood of a positive response to the treatment for each individual.
6. Uplift calculation: Calculate the difference in response rates between the treatment group and the control group to determine the net impact of the treatment.
7. Segment: Divide patients into segments based on their predicted response probabilities.
8. Action: Use the insights from uplift modeling to guide treatment, coverage, or other decisions.

A payer or employer can use this information however they'd like, but I imagine it will be used to adjust formularies or utilization management strategies. It could also be used when setting up contracts for how a drug should be used, or when carving out certain drugs or disease states (e.g., oncology drugs at a center of excellence). There are more potential use cases in the white paper in the comments.

Would you use this strategy for #PharmacyBenefits or #ValueBasedCare models that take on risk for cost of care?
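Steps 5-7 above can be sketched with the common two-model ("T-learner") approach on synthetic data. The features, effect sizes, and quartile cutoff below are illustrative assumptions, not from the white paper:

```python
# Two-model ("T-learner") uplift sketch on synthetic data. The features,
# effect sizes, and quartile cutoff are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))              # patient features
treated = rng.integers(0, 2, size=n)     # 1 = received drug, 0 = control
# Synthetic outcome: feature 0 drives the treatment benefit
p = 1 / (1 + np.exp(-(0.3 * X[:, 1] + treated * X[:, 0])))
y = rng.binomial(1, p)

# Steps 5-6: fit a response model per arm; uplift = difference in predictions
m_t = LogisticRegression().fit(X[treated == 1], y[treated == 1])
m_c = LogisticRegression().fit(X[treated == 0], y[treated == 0])
uplift = m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]

# Step 7: segment patients by predicted uplift (top quartile = best candidates)
likely_responders = uplift >= np.quantile(uplift, 0.75)
```

In practice the response models would be fit on held-out data and validated against observed outcomes before driving any formulary or utilization decision.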
-
Multimodal Machine Learning (MML) is a field of AI that focuses on integrating and understanding information from multiple modalities, such as text, images, audio, video, and sensor data. The concept has been around for decades; however, modern MML began gaining prominence in the early 2000s, driven by advances in machine and deep learning over the last decade or so and the availability of large multimodal datasets.

For readers familiar with Multi-View Learning (MVL), MML is conceptually similar. Both fields:
• work with multiple types of inputs to improve learning. In MVL these inputs are called "views," while in MML they are "modalities."
• focus on integrating or aligning information from different data sources to enhance performance on tasks like classification, regression, or clustering.
• leverage complementary information from different data sources to improve predictive accuracy, robustness, and generalization.

MML has significantly enhanced medical informatics by integrating diverse data types (e.g., clinical notes, imaging, genomic data, and sensor readings), offering benefits over traditional unimodal machine learning, which typically analyzes a single data type. By combining diverse data (e.g., MRI scans with patient history), MML provides a comprehensive view of patient health that aids diagnosis and treatment planning, and leveraging complementary information from multiple modalities improves predictive accuracy for tasks like disease detection and prognosis. MML-powered tools also streamline clinical workflows, such as radiology reporting or real-time decision-making during surgeries.

Combining electronic health records (EHR) with medical images enhances diagnosis and prognosis by improving the accuracy of machine learning models in clinical prediction. The asynchronous and complementary nature of EHR and medical images presents unique challenges: missing modalities due to clinical and administrative factors are inevitable in practice, and the importance of each data modality varies depending on the patient and the prediction target, leading to inconsistent predictions and suboptimal model performance.

To address these challenges, the authors of [1] propose an MML workflow, DrFuse, to enable effective clinical multimodal fusion. It addresses missing modalities by disentangling features shared across modalities from those unique to each modality. Additionally, DrFuse tackles modal inconsistency through a disease-wise attention layer that generates patient- and disease-specific weightings for each modality to make the final prediction. They validate DrFuse using real-world large-scale datasets, MIMIC-IV and MIMIC-CXR; experimental results show that it significantly outperforms state-of-the-art models. The links to the preprint [1] and #Python GitHub repo [2] are shared in the first comment. #MedicalInformatics
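The disease-wise attention idea can be illustrated with a toy masking-and-softmax step. This is a simplified sketch, not the authors' DrFuse implementation; the shapes and names are invented:

```python
# Toy sketch of a disease-wise attention step with missing-modality masking.
# Simplified illustration, not the authors' DrFuse implementation; shapes
# and variable names are invented.
import numpy as np

def disease_wise_fuse(mod_feats, attn_logits, available):
    """mod_feats: (n_modalities, d) per-modality features.
    attn_logits: (n_diseases, n_modalities) learned attention scores.
    available: (n_modalities,) bool mask; missing modalities get zero weight."""
    logits = np.where(available, attn_logits, -np.inf)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over modalities
    return weights @ mod_feats                      # (n_diseases, d) fused

feats = np.array([[1.0, 2.0],      # e.g. EHR embedding
                  [3.0, 4.0]])     # e.g. chest X-ray embedding
logits = np.array([[0.5, 1.0]])    # attention scores for one prediction target
# If the X-ray is missing, the fused feature falls back to the EHR alone
fused = disease_wise_fuse(feats, logits, available=np.array([True, False]))
```

The masking step is what keeps predictions consistent when a modality is absent: unavailable modalities receive zero attention weight rather than contributing garbage features.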
-
A friend of mine suggested that to get a job in the current market, I need to level up my skills and learn new AI/LLM/RAG pipelines. So I started upskilling and initiated projects to build my portfolio. Here is the first portfolio project:

Oncology Drug Response Prediction System
==================================

This project tackles a critical challenge in Healthcare AI: explaining complex drug response predictions. It's not enough to predict; we must ground the decision in evidence.

How the hybrid architecture works: this solution combines two powerful technologies into one integrated system.
- Specialized Deep Learning (the Predictor): I trained a Convolutional Neural Network (CNN) on TCGA genomic (RNA-Seq/gene expression) data to predict patient response (responder/non-responder) to a specific oncology therapy with an AUC of X.XX.
- LLM and RAG (the Interpreter): I implemented Retrieval-Augmented Generation (RAG) to connect an LLM (e.g., Llama 3 or GPT) to a curated vector database of clinical guidelines and biomedical abstracts. The LLM uses function calling/tool use to execute the deep learning model and then generates a cited, evidence-based explanation for the prediction. This ensures knowledge grounding and provides the Explainable AI (XAI) essential for adoption in clinical applications.

Key skills learned:
- Generative AI: orchestrating complex workflows using LLMs, RAG, and vector databases (e.g., ChromaDB).
- Machine Learning Engineering: end-to-end system design, MLOps principles (using MLflow for tracking), and robust Python development.
- Deep Learning: training and deploying Transformer-based or CNN models on high-dimensional sequential/genomic data.
- Cloud & Deployment: deployed the solution on Streamlit (or FastAPI/Gradio) for public access, showcasing API integration and scalability.
💻 Code & Full Documentation: https://lnkd.in/g-H3PYWj #AI #MachineLearning #LLM #RAG #DeepLearning #Bioinformatics #HealthcareAI #DataScience #MLOps #GenAI #PredictiveModeling #ClinicalInformatics #Google
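The retrieval step of such a RAG pipeline can be sketched like this. TF-IDF stands in here for a real embedding model, and the guideline snippets are invented for illustration; they are not from the project's actual database:

```python
# Sketch of the retrieval step in a RAG pipeline. TF-IDF stands in for a
# real embedding model, and the guideline snippets are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

guidelines = [
    "EGFR-mutant NSCLC patients often respond to tyrosine kinase inhibitors.",
    "High tumor mutational burden is associated with immunotherapy response.",
    "Anthracyclines carry cardiotoxicity risk in patients with heart failure.",
]
vectorizer = TfidfVectorizer().fit(guidelines)
index = vectorizer.transform(guidelines)

def retrieve(query: str, k: int = 1):
    """Return the k guideline snippets most similar to the query."""
    sims = cosine_similarity(vectorizer.transform([query]), index)[0]
    return [guidelines[i] for i in sims.argsort()[::-1][:k]]

# The retrieved snippets are passed to the LLM as grounding evidence so its
# explanation of the CNN's prediction can cite them.
evidence = retrieve("Why is this EGFR-mutant patient predicted to respond?")
```

In a full pipeline a vector database (e.g., ChromaDB) and a learned embedding model replace the TF-IDF index, and the LLM's tool-use step decides when to call the predictor and when to retrieve.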
-
The promise of large language models is to allow patients and physicians to interact with AI through human-like discussion in text. The promise of machine learning models is to elevate how we deal with repetitive, data-based medical tasks. But what if we combine the two? Authors of a new study developed Digital Twin GPT (DT-GPT), an LLM-based system that extends LLM forecasting to clinical trajectory prediction. "Benchmarking on non-small cell lung cancer, intensive care unit, and Alzheimer's disease datasets, DT-GPT outperformed state-of-the-art machine learning models, reducing the scaled mean absolute error by 3.4%, 1.3% and 1.8%, respectively." Essentially, it creates virtual patient "digital twins" from electronic health records to forecast disease progression and treatment outcomes in real time. Source: https://lnkd.in/e2tuu8A5
-
Can machine learning predict heart attacks before they happen? Early identification of vulnerable coronary plaques is essential for preventing major heart events. A new study combines advanced imaging and AI to predict which coronary arteries are most likely to cause these events with high accuracy. The novel approach integrated radiomic features (texture and energy measures) from CT imaging with biomechanical markers (stress and strain) derived from finite element analysis.
• Radiomics alone: 86% balanced accuracy for artery-level predictions.
• Biomechanics alone: 89% balanced accuracy.
• The combined model: 94% accuracy for artery predictions and 92% for patient stratification.
The combined machine learning model was effective despite small datasets, showing room for improvement with more data. Great work by Anna Corti, Gabriele Dubini, and co! 🔗 Read the full study: https://lnkd.in/gyaDYgNN Would you trust the results of an AI model if it told you that you were at risk for heart disease? I post the latest developments in health AI & tips for research. Connect with me to stay updated! #AI #Cardiology #Radiomics #Biomechanics #PrecisionMedicine
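The combine-the-feature-blocks idea behind the study can be sketched on synthetic data. Feature counts, labels, and classifier choice below are illustrative, not the study's actual setup:

```python
# Sketch of the combined-model idea on synthetic data: concatenate radiomic
# and biomechanical feature blocks before classification. Feature counts,
# labels, and classifier are illustrative, not the study's setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 300
radiomic = rng.normal(size=(n, 10))   # e.g. texture/energy measures from CT
biomech = rng.normal(size=(n, 4))     # e.g. wall stress/strain from FEA
# Synthetic event label depending on one feature from each block
y = (radiomic[:, 0] + biomech[:, 0] > 0).astype(int)

X_combined = np.hstack([radiomic, biomech])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X_combined, y, cv=5, scoring="balanced_accuracy")
```

Because the synthetic label depends on a feature from each block, the concatenated model can pick up signal that either block alone would partially miss, mirroring the study's radiomics-plus-biomechanics gain.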
-
Would you like to learn how multimodal healthcare AI works? Welcome to the kitchen! As part of our ongoing series on foundational model patterns for healthcare AI, together with Alberto Santamaria-Pang, PhD and Peter Lee we have published a new blog post that explores a fascinating problem: combining two distinct embedding models to create a powerful system capable of reasoning across multiple imaging modalities. In this post, we explore how the radiology embedding model, MedImageInsight, and the digital pathology embedding model, Gigapath, can be integrated seamlessly. While each model encodes its input data into its own embedding space, we demonstrate how transforming these vectors into a unified latent space is not only easily achievable but also computationally efficient. This unified space then serves as the foundation for building an effective regression model to predict cancer hazard scores based on imaging data. For those eager to see it in action, our accompanying Jupyter notebook offers a hands-on sample implementation. Both models are open-weight models available in our #microsoft #aifoundry Model Catalog (https://lnkd.in/gs-KmGuC). Read the blog post here: https://lnkd.in/gRh-dFcy Explore and download the Jupyter Notebook here: https://lnkd.in/gkhuVxjJ
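One simple way to realize such a unified latent space is a linear projection fit on paired examples. The sketch below uses synthetic vectors and least squares; the blog post's actual MedImageInsight/Gigapath alignment may differ:

```python
# Sketch of aligning two embedding spaces with a linear projection fit on
# paired examples (synthetic vectors; the blog post's actual alignment
# method may differ).
import numpy as np

rng = np.random.default_rng(0)
n, d_a, d_b = 500, 64, 32
A = rng.normal(size=(n, d_a))                       # e.g. radiology embeddings
W_true = rng.normal(size=(d_a, d_b))
B = A @ W_true + 0.01 * rng.normal(size=(n, d_b))   # paired pathology embeddings

# Least-squares fit of a projection A -> B defines the shared latent space
W, *_ = np.linalg.lstsq(A, B, rcond=None)
B_hat = A @ W   # new radiology embeddings can be projected the same way
```

Once the map is fit, vectors from both models live in one space, and a downstream regressor (e.g., for cancer hazard scores) can be trained on the unified representation.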
-
📉 False positives down. Predictive power up. This is what happens with lung cancer screening (LCS) when AI learns across data types. Researchers trained a multimodal, multitask foundation model on:
• 160,000+ CT scans
• Clinical and demographic data
• 17 LCS-related tasks and 49 data elements
📈 The results:
↳ 20% improvement in lung cancer risk prediction
↳ 10% improvement in cardiovascular risk prediction
💡 Why it matters: In real-world practice, radiologists often interpret scans without access to the full clinical data. AI models like this bridge that gap by pulling data from multiple formats (images, text, and more) into one integrated assessment. AI isn't just about automating what we already do. It's about helping us see the whole picture faster, clearer, and more reliably. The future of healthcare isn't siloed. It's collaborative between clinicians and intelligent systems. 👥 Curious how you are seeing AI reshape clinical workflows? Dominique, Jaspal Singh, Hasnain, Roshen, Husam, Tajalli, Vishisht, Bradley, Sanjay Pic source: Nature publication. Full study in comments. #Medicine #Healthcare #Innovation #AI #Technology #ArtificialIntelligence #Collaboration #Data