Weekly recap (Oct 3, 2025)
R updates (from rOpenSci, Blaze, R Works, RWeekly, Posit), Slidecrafting, AI highs & lows, structural variation, biotech, RAG, tinytable, "vibe work," and a few compbio and related papers
Happy Friday, colleagues. It’s the end of the week and once again I’m going through my long list of idle browser tabs trying to catch up where I can. Lots of R and AI-related news this week.
Emil Hvitfeldt: Slidecrafting (slidecrafting-book.com). This is a really wonderful one-stop shop for tips on making beautiful slides with reveal.js and Quarto.
An AI model trained on healthcare records uses a person’s medical history to estimate whether and when any of more than 1,200 diseases might arise. Original paper: Learning the natural history of human disease with generative transformers. And the accompanying News & Views synopsis: AI uses medical records to accurately predict onset of disease 20 years into the future.
Building Biotechs Podcast: Putting the “Tech” in Biotech: An Unconventional Path to Building a Therapeutics Company, with Federico Paoletti.
Posit’s weekly AI newsletter: Anthropic model degradation incident, Codex updates (which you can use in Positron), some AI updates from posit::conf(2025), terms and definitions (agents, tool calls).
Linked from that newsletter were the materials from Garrick Aden-Buie and Joe Cheng’s Programming with LLM APIs workshop.
CZI: Accelerating AI in Biology With Community-Driven Benchmarks.
The NSF’s Graduate Research Fellowship Program solicitation was published, but notably excludes second-year Ph.D. students. See the article in Science published after the announcement: ‘Completely shattered.’ Changes to NSF’s graduate student fellowship spur outcry.
Claus Wilke: How to write an NSF GRFP personal statement.
Claus Wilke again: How to write an NSF GRFP research plan.
Decoding Bio - BioByte 134: ToolUniverse Democratizes AI Scientists, mBER Tackles Nanobody-Based Binder Design, Nephrobase Cell+ Comprehensively Models Kidney Biology, and the Release of PRISM for Neuron Tagging.
UVA Data Points podcast: Trustworthy AI. Featuring my colleague here Farhana Faruqe (data scientist, researcher, entrepreneur), along with Larry Medsker (leading expert in AI ethics and policy).
rOpenSci News Digest, September 2025: Tips for making your software outlive your job, new packages, software peer review, package development corner. I learned of a new standalone function in the pkgcheck package that you can use during package development to quickly check whether your function names are unique among CRAN packages, e.g.: pkgcheck::fn_names_on_cran(c("min", "max")).
Joe Rickert on R Works: August 2025 Top 40 New CRAN Packages, organized in 18 categories: Causal Inference, Data, Differential Privacy, Ecology, Environmental Studies, Epidemiology, Geology, Genetics, Genomics, Health Technology Assessment, Machine Learning, Medical Statistics, Statistics, Surveys, Time Series, Toxicology, Utilities, and Visualization.
R Weekly 2025-W40 (Ducklake, Slidecrafting, Shiny & LLMs): Ducklake; Slidecrafting with reveal.js and Quarto; Shiny & LLMs; testing with testthat; new and updated R packages and CRANberries.
Data Scientist with R weekly newsletter: R community, conferences & roundups, R packages: releases, testing & maintenance, Quarto, R Markdown, ggplot2 visualizations, charts & mapping in R, Applied analyses & workflows with R (health, bio & maps), Statistical inference, simulations & Bayesian thinking, Academic Research.
Anthropic published the Claude Sonnet 4.5 system card. It’s extensive (148 pages!) and includes a detailed section on biosecurity done in collaboration with Signature Science, where I worked for many years. The CBRN evals start on page 124. Claude Sonnet 4.5 matched or exceeded predecessors in virology molecular sequence design, laboratory protocol design, and designing sequences that assembled plasmids or evaded synthesis screening protocols.
From Heng Li and Alvin Qin: ~60% of human SVs fall in ~1% of GRCh38. Read the paper: Challenges in structural variant calling in low-complexity regions and the accompanying blog post, Our journey through low-complexity regions.
The real (economic) AI apocalypse is nigh. There’s no shortage of articles in the AI-is-a-bubble genre, but this one from Cory Doctorow is really good, well-researched, with plenty of references and links to other interesting essays, papers, etc.
I think I first came across the tinytable R package last year, but never gave it much thought. I’m taking a look through the vignettes again and realizing I need to take a much closer look. Works with HTML, LaTeX, Word, PDF, PNG, Markdown, & Typst.
Lada Nuzhna: Where are all the trillion dollar biotechs? (editorial note from Stephen: pharma≠biotech).
Letters to the Editor (NYT): A.I. in School: What It Can and Can’t Do.
Towards Data Science: RAG Explained: Reranking for Better Answers. How reranking improves retrieval-augmented generation by surfacing the most relevant results.
GitHub Blog: GitHub Copilot gets smarter at finding your code: Inside our new embedding model. Learn about a new Copilot embedding model that makes code search in VS Code faster, lighter on memory, and far more accurate.
Point, counterpoint. Futurism: AI Coding Is Massively Overhyped. Simon Willison: Claude Sonnet 4.5 is probably the “best coding model in the world”.
Microsoft introduces Vibe Work. September 29, 2025 will be recorded as the zenith of the Peak of Inflated Expectations on the Gartner Hype Cycle.
Finally, a few other papers and preprints that caught my attention this week:
Enhancing genome recovery across metagenomic samples using MAGmax
ProteinDJ: a high-performance and modular protein design pipeline
LLMs4All: A Review on Large Language Models for Research and Applications in Academic Disciplines (304 pages!)
High-accuracy SNV calling for bacterial isolates using deep learning with AccuSNV
Limited overlap between genetic effects on disease susceptibility and disease survival