Half a million genomes. 1.5 billion variants. One breakthrough: we are all truly unique. Twenty years ago, the Human Genome Project took 13 years and $2.7B to sequence a single genome. Today? We can sequence a genome in less than 24 hours for under $1,000. Last week, UK Biobank released 490,640 whole genomes — the largest genetic dataset ever (Nature, 2025). What did we learn? • Each person carries 4–5 million variants • 76% appear in fewer than 10 people — your genome is almost entirely yours • 1 in 10 carries clinically actionable mutations where doctors can intervene today (e.g., BRCA1/2 for cancer, LDLR for heart disease) Why it matters: • Previous genetic tests captured ~6% of human variation. This dataset reveals 40× more • In non-coding regions — the biological switches controlling genes — researchers found 63 new disease associations • Adding 31,785 non-European genomes uncovered 82 disease links invisible in Eurocentric studies From genetics to health impact This transforms medicine today: • Prevention - Polygenic risk scores flag disease decades before symptoms • Diagnosis - Rare disease patients waiting years for answers finally find them • Treatment - Pharmacogenomics matches the right drug, right dose, to your genome The next frontier: genetics + everything else Genetics is the hardware. Health is the software running in real time. Your DNA is fixed, but biology is dynamic, shaped by: • Epigenetics: how environment and lifestyle switch genes on/off • Proteomics & metabolomics: molecular signals revealing your current health state • Digital biomarkers: continuous data from stress, sleep, glucose, heart rate • Stress biology & neuroendocrine signaling: how cortisol and brain-body responses reshape your health trajectory Layer these dynamic signals onto genetic foundations, power them with AI, and you create living health models, not just predicting disease, but understanding when, why, and how it manifests in YOU. The critical question? 
We've spent decades treating the "average patient" — who doesn't exist. Now we can better see each person as they truly are: biologically unique, dynamically changing, infinitely complex. The healthcare winners of the next decade won't just collect data: they'll integrate genetics, epigenetics, molecular and phenotypic tests, lifestyle, stress biology, and digital signals to deliver truly personalized, preventive care at scale. There is no "normal" genome, only 8 billion unique experiments in being human. And we just decoded the first half million. 👉 Which excites you more: knowing your genetic blueprint, or understanding how your daily choices rewrite it?
Bioinformatics for Drug Discovery
Explore top LinkedIn content from expert professionals.
-
This is the best article I've read on #23andMe, #privacy, and #bankruptcy. By Keith Porcaro. Here's the essence: "Why should anyone be able to buy the genetic data of millions of Americans in a bankruptcy proceeding? The answer is simple: Lawmakers allow them to." They shouldn't. And judges can stop it. "A bankruptcy court could require that users individually opt in before their genetic data can be transferred to 23andMe’s new owners, regardless of who those new owners are. Anyone who didn’t respond or who opted out would have the data deleted." "Bankruptcy proceedings involving personal data don’t have to end badly. In 2000, the Federal Trade Commission settled with the bankrupt retailer ToySmart to ensure that its customer data could not be sold as a stand-alone asset, and that customers would have to affirmatively consent to unexpected new uses of their data. And in 2015, the FTC intervened in the bankruptcy of RadioShack to ensure that it would keep its promises never to sell the personal data of its customers. (RadioShack eventually agreed to destroy it.)" "The U.S. Trustee has requested the appointment of an ombuds in this case. While scholars have called for the role to have more teeth and for the FTC and states to intervene more often, a framework for protecting personal data in bankruptcy is available." "Here, 23andMe has a more permissive privacy policy than ToySmart or RadioShack. But the risks incurred if genetic data falls into the wrong hands or is misused are severe and irreversible." https://lnkd.in/edBQkj_h
-
GENERATIVE BIOLOGY AI just wrote genetic instructions that cells actually followed – a breakthrough that turns biology into a programming language. For the first time ever, researchers at the Center for Genomic Regulation created AI-generated DNA sequences that successfully controlled gene expression in healthy mammalian cells. Think of it as writing software, but for living organisms. Why this matters: → The AI can design custom 250-letter DNA fragments with specific instructions like "activate this gene in stem cells becoming red blood cells but not platelets" → These synthetic enhancers worked EXACTLY as predicted when tested in mouse blood cells → Unlike previous efforts focused on cancer cells, this team worked with healthy cells, uncovering subtle mechanisms that shape our immune system → The researchers built a library of 64,000+ synthetic enhancers tested across seven stages of blood cell development Most fascinating was discovering "negative synergy" - where two factors that individually activate genes can completely shut them down when combined. This unlocks precision we never had before. The implications are enormous for gene therapy. Instead of being limited to DNA sequences evolution produced, we can now design ultra-selective gene switches customized to specific cells and tissues - potentially making treatments more effective with fewer side effects. Full paper: https://lnkd.in/en3bGZP9 Follow-up with @EricTopol's post about curing rare diseases with the existing genomic technology stack https://lnkd.in/eGCYMjGJ
-
Synthetic biology is, quite literally, our future. A groundbreaking new biological foundation model, Evo 2, achieves state-of-the-art prediction of genetic variation impacts and generates coherent genome sequences, spanning all domains of life. A diverse team from leading research institutions including Arc Institute, Stanford University, NVIDIA, and University of California, Berkeley, trained the model on 9.3 trillion DNA base pairs and has fully shared all code, parameters, and data. A few highlights from the paper (link in comments): 🔬 Zero-shot prediction achieves state-of-the-art accuracy in genetic variant interpretation. Evo 2 can predict the functional consequences of genetic mutations across all domains of life without specialized training. It surpasses existing models in assessing the pathogenicity of both coding and noncoding variants, including BRCA1 cancer-linked mutations. This generalist capability suggests Evo 2 could revolutionize genetic disease research, reducing reliance on expensive, manually curated datasets. 🛠 Genome-scale generation paves the way for synthetic life design. Evo 2 can generate full-length genome sequences with realistic structure and function, including mitochondrial genomes, bacterial chromosomes, and yeast DNA. Unlike prior models, Evo 2 ensures natural sequence coherence, improving synthetic biology applications like engineered microbes or artificial organelles. This sets the stage for programmable biology at an unprecedented scale. 🧬 Unprecedented long-context understanding revolutionizes genomic analysis. Evo 2 operates with a context window of up to 1 million nucleotides, far beyond the capabilities of previous models, allowing it to analyze genomic features across vast distances. This ability enables it to accurately identify regulatory elements, exon-intron boundaries, and structural components critical for understanding genome function. Its long-context recall is a major breakthrough for interpreting complex biological sequences.
🎛 Inference-time search enables controllable epigenomic design. Evo 2’s generative abilities extend beyond raw DNA sequence to epigenomic features, allowing researchers to design sequences with specific chromatin accessibility patterns. This approach successfully encoded Morse code messages into synthetic epigenomes, demonstrating a new method for controlling gene regulation via AI. This could lead to breakthroughs in gene therapy and epigenetic engineering. 🔮 Future potential: Toward AI-driven biological design and virtual cell modeling. Evo 2 represents a major leap toward AI-powered genomic engineering. Future iterations could integrate additional biological layers—such as transcriptomics and proteomics—to create virtual cell models that simulate complex cellular behaviors. This could revolutionize drug discovery, genetic therapy, and even synthetic life creation.
-
Amazon launches AI drug discovery platform to accelerate antibody design and testing
🔘 Amazon has launched Amazon Bio Discovery, a platform designed to help scientists generate and test antibody drug candidates faster by combining biological foundation models with AI agents in a single environment.
🔘 The key shift is accessibility: AI agents guide researchers through model selection, optimisation, and experiment design, meaning scientists without deep computational expertise can run advanced drug discovery workflows.
🔘 The platform acts as a marketplace of models, bringing together Amazon, open source, and partner algorithms such as Apheris and Profluent, while also allowing organisations to train and deploy their own proprietary models.
🔘 A major innovation is the “lab in the loop” system, where AI-generated candidates are physically synthesised and tested by partners like Twist Bioscience and Ginkgo Bioworks, with results fed back to continuously improve the models.
🔘 Early results suggest significant acceleration: work with Memorial Sloan Kettering Cancer Center generated hundreds of thousands of antibody designs and moved from design to wet lab testing in weeks rather than up to a year.
💬 Drug discovery is shifting from isolated AI tools to integrated systems that connect models, data, and lab testing into a continuous learning loop, making research faster and more accessible. #digitalhealth #ai #pharma
-
Identifying cancer-related mutations accurately is a critical step in precision medicine. Today, we’ve published new research in Nature Biotechnology on 🧬DeepSomatic🧬, an AI-powered tool that uses machine learning to identify genetic variants, or mutations, in cancer cells more accurately than current methods. This work is aimed at helping researchers pinpoint what's driving a cancer and informing more effective treatment plans. Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies offer potential advantages to discover variants in the hardest to sequence parts of the genome. 🧬 About the model: DeepSomatic was rigorously trained on high-confidence data, a feat made possible by working with our partners at UC Santa Cruz. The model is capable of accurately differentiating actual genetic cancer variants from the technical artifacts introduced during sample preservation, addressing a critical hurdle in early detection. 🧬 Superior Accuracy and Clinical Impact: DeepSomatic consistently outperformed other tools across all major sequencing platforms. It shows major improvements in identifying complex insertions and deletions (Indels). Furthermore, in a new study with partners at Children's Mercy, DeepSomatic successfully found ten small variants in pediatric leukemia cells that were missed by other tools. 🧬 Flexible and Broad Use: The model is flexible, working across all major sequencing platforms, and can be applied to both tumor-normal and challenging tumor-only samples, extending its utility for complex cancer types. 🧬 Open Access: We are making DeepSomatic and the CASTLE dataset openly available to the research community. DeepSomatic is the most recent addition to our 10-year journey developing open source methods for geneticists to study the genomes of humans, plants, and animals. 
We are excited to see how researchers and drug manufacturers will use these resources to develop more effective, personalized treatments for cancer patients. The ability to accurately identify these subtle genetic drivers is key to unlocking new therapies. More in our blog authored by Kishwar Shafin and Andrew Carroll: https://goo.gle/4n23gIB Read the full article in Nature Biotechnology: https://lnkd.in/drxii8fz
-
𝟗𝟖% 𝐨𝐟 𝐘𝐨𝐮𝐫 𝐆𝐞𝐧𝐨𝐦𝐞 𝐖𝐚𝐬 𝐈𝐠𝐧𝐨𝐫𝐞𝐝. 𝐔𝐧𝐭𝐢𝐥 𝐍𝐨𝐰 What if we could read the genome like a story — not just in fragments, but as a whole, with rhythm, meaning, and twist endings? Google just brought us one step closer. Introducing AlphaGenome — DeepMind and Google’s newest AI model that predicts how your DNA is read, regulated, and sometimes... misinterpreted. But first — a quick decode: DNA is made of 4 letters — A, T, C, G — the alphabet of life. These 3 billion letters tell your cells what to do. But they’re not read linearly like a book — they fold and loop in 3D, bringing distant parts together to turn genes on or off. This folding is everything. A mutation buried 100,000 letters away could still influence gene activity — just because folding brought it into the “wrong neighborhood.” That’s where AlphaGenome shines. It reads up to 1 million DNA letters at a time — enough to catch complex folding patterns and regulatory cues in one shot. “But wait — we have 3 billion letters!” Yes — and like reading a novel one chapter at a time, AlphaGenome moves across the genome in overlapping tiles, decoding each “functional neighborhood” in high detail. And here’s what it does inside each tile: - Predicts where genes start, stop, splice, and fold - Estimates RNA expression and protein-binding activity - Identifies how a tiny mutation might ripple across the system - Models splicing errors that cause rare diseases - Spots cancer-driving mutations in “non-coding” DNA — the 98% we used to overlook Previous models (like Enformer) had to trade off resolution vs. context. AlphaGenome offers both. It’s like watching your genome in high-def, panoramic, slow-mo… at the same time. Now available via API for non-commercial research. Not for clinical use — yet. But a massive step toward understanding how life truly runs under the hood. Sometimes, it’s not about what the DNA says — but where it folds… where it pauses… and what happens in the quiet. 
AlphaGenome teaches us something deeper: That meaning doesn’t always lie in the loudest signals. Often, it’s in the 98% we ignore. The background. The regulation. The timing. Life, too, is like that. It’s not just what you do. It’s when, how, and in what context you show up. The pauses between the notes matter. The unseen structure holds the story. And maybe… that’s where the real change begins Amit Saxena Ajay Nandgaonkar Suchitaa Paatil Sanju S Anju Goel Taruna Anand #AlphaGenome #GoogleDeepMind #GenomicsExplained #FutureOfHealth #VariantEffectPrediction #SyntheticBiology #PrecisionMedicine #DNAMagic #AIinHealthcare #AI #AccessForAll
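The "overlapping tiles" idea in the post above can be sketched in a few lines of Python. This is only an illustration of windowing a long sequence, not AlphaGenome's actual API or its real tiling scheme: the function name `overlapping_tiles` and the overlap value are hypothetical, and only the 1-million-letter window size comes from the post.

```python
def overlapping_tiles(genome_length, tile_size, overlap):
    """Return (start, end) half-open windows that cover [0, genome_length).

    Consecutive windows share `overlap` positions, so a feature near the
    edge of one tile also appears with more context in the next tile.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be non-negative and smaller than tile_size")

    step = tile_size - overlap
    tiles = []
    start = 0
    while True:
        end = min(start + tile_size, genome_length)
        tiles.append((start, end))
        if end == genome_length:
            break  # the last tile reaches the end of the sequence
        start += step
    return tiles


# Hypothetical example: a 2.5 Mb region, 1 Mb tiles, 250 kb overlap (assumed value).
for start, end in overlapping_tiles(2_500_000, 1_000_000, 250_000):
    print(start, end)
```

The last tile is simply clipped at the end of the sequence here; a real pipeline might instead shift it left to keep every window the same length.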
-
We’re getting better at reading genes. Now we’re learning how to read them in 3D. A new study introduces a method to resolve signal overlap in spatial transcriptomics data, one of the biggest technical bottlenecks in mapping gene expression inside intact tissue. In dense biological samples, transcripts from neighboring cells often overlap, making it difficult to accurately assign signals to the correct cellular source. This blurring limits how precisely we can reconstruct tissue architecture. By improving how overlapping signals are separated computationally in three-dimensional space, researchers can generate far more accurate maps of how cells are organized in situ. This doesn’t just refine the data, it changes the reliability of downstream biological interpretation. For neuroscience, this is particularly significant. The brain is a tightly packed 3D network of gradients, microenvironments and dynamic cellular interactions. Circuit function, disease progression and developmental processes all depend on spatial context. If our spatial resolution is compromised, our models of brain function are incomplete. As biology moves from bulk averages toward high-resolution spatial systems, segmentation accuracy becomes foundational infrastructure, not a minor technical upgrade. Precision in three dimensions is what enables precision in understanding. Source: Nature Biotechnology, 2026 — “Identifying 3D signal overlaps in spatial transcriptomics data with ovrlpy.” #Neuroscience #SpatialTranscriptomics #SystemsBiology #Genomics #BrainResearch #Biotechnology #Innovation #Research
-
Machine learning (ML) is revolutionizing genomics, but common pitfalls can lead to misleading results. Here's a thread on how to avoid them 🧵

1/ Pitfall 1: Distributional Differences. Genomic data often exhibits inherent biological structure, leading to distributional differences. This can hurt model performance when the training and test sets have a different distribution from the data the model will ultimately predict on. Example: Models trained on in vitro data often perform poorly on in vivo data, as seen in transcription factor binding site prediction. This highlights the need to carefully consider the context in which a model will be applied.

2/ Pitfall 2: Dependent Examples. Genomic data is often interconnected, violating the independence assumption of many ML models. This can inflate performance estimates during cross-validation. Example: When predicting protein-protein interactions, pairs sharing a protein are correlated. This can be mitigated by employing group k-fold cross-validation, ensuring dependent examples don't cross the train-test divide.

3/ Pitfall 3: Confounding. Unmeasured variables can create or mask associations, leading to misinterpretations. Example: In GWAS, population structure can confound genotype-phenotype relationships. The infamous autism spectrum disorder prediction model initially seemed successful but failed to replicate after accounting for population structure.

4/ Pitfall 4: Leaky Preprocessing. Data processing can inadvertently leak information from the test set into the training set, resulting in over-optimistic performance estimates. Example: Feature selection based on the entire dataset before cross-validation, common in DNA methylation analysis, introduces leakage. (This is probably one of the most common mistakes I see.) Hold out a test dataset that you never touch until the final step.

5/ Pitfall 5: Unbalanced Classes. Datasets with uneven class distribution can lead to models overfitting the majority class. Example: Predicting enhancers is challenging due to the small proportion of positive examples. Resampling techniques and choosing appropriate performance metrics like auPR can help address this. Also, use PR rather than ROC curves to evaluate your model: https://lnkd.in/eh9JHmxc

Key Takeaways:
• Genomic data has unique characteristics that require careful consideration when applying ML.
• Thoroughly inspect your data, considering potential dependencies, confounders, & class imbalance.
• Employ appropriate techniques like group k-fold cross-validation, balancing methods, & robust performance metrics.

By understanding these pitfalls and taking steps to mitigate them, we can ensure that ML applications in genomics yield reliable and insightful results. 💪 Dive deep into the paper: https://lnkd.in/eDFWnW9U

I hope you've found this post helpful. Follow me for more. Subscribe to my FREE newsletter: https://lnkd.in/erw83Svn
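To make the group k-fold idea from Pitfall 2 concrete, here is a minimal standard-library sketch. It assumes each example already carries a single group label (a hypothetical chromosome ID below); how to define groups for protein pairs, where each pair touches two proteins, is left open. Libraries such as scikit-learn ship a `GroupKFold` splitter for real work; the function `group_k_fold` here is an illustrative name, not that implementation.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs where no group spans both sides.

    `groups` holds one label per example; examples sharing a label are
    treated as dependent and always land in the same fold.
    """
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)
    if n_splits < 2 or n_splits > len(members):
        raise ValueError("n_splits must be between 2 and the number of groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest examples.
    folds = [[] for _ in range(n_splits)]
    for label in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[label])

    for i in range(n_splits):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for k in range(n_splits) if k != i for j in folds[k])
        yield train_idx, test_idx


# Hypothetical grouping: one chromosome label per example.
example_groups = ["chr1", "chr1", "chr2", "chr2", "chr2", "chr3", "chr4", "chr4"]
for train_idx, test_idx in group_k_fold(example_groups, n_splits=3):
    held_out = sorted({example_groups[i] for i in test_idx})
    print("held-out groups:", held_out, "test size:", len(test_idx))
```

The same split also helps with Pitfall 4: any feature selection or scaling should be fitted on `train_idx` only, inside the loop, and then applied to `test_idx`.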