The Rendeiro Lab Manual serves as a comprehensive guide to the lab’s
operations, culture, and best practices.
It encompasses essential protocols for maintaining an inclusive and
respectful work environment, while fostering innovation and
collaboration.
This document provides detailed instructions for onboarding new
members, structuring research projects, keeping records, and utilizing
modern computational tools and resources.
Key sections include guidelines for source code management, lab
notebook maintenance, data organization, manuscript preparation, and
project planning, emphasizing a shallow-pass strategy for iterative
research execution.
With a strong focus on reproducibility and efficiency, the manual
also outlines standards for effective communication, collaborative
efforts, and the use of cutting-edge technologies in data analysis and
visualization.
Designed to be a living resource, the manual ensures alignment with
CeMM’s standards and the lab’s mission of advancing molecular medicine
through robust and innovative science.
Lab manual
Welcome to the Rendeiro Lab Manual. This manual provides
comprehensive information about the lab’s culture, procedures, and
workflows to ensure a collaborative and efficient research
environment.
The manual is hosted in the lab-manual
repository on GitHub. It is written in Markdown and
can be converted to HTML and PDF using Pandoc.
This manual is open source and maintained collaboratively. Anyone on
GitHub can propose changes.
Building the manual
The project includes a Makefile
to streamline the development process.
Key targets include:
format: Formats Markdown files
consistently using mdformat.
build: Converts the manual into a
single HTML file using pandoc and generates a PDF file
using wkhtmltopdf.
clean: Removes generated files to
ensure a fresh build.
Styling for the manual is controlled by a custom CSS file, which ensures a nice appearance in
both HTML and PDF formats.
Editing content
To contribute:
1. Edit or create files directly on GitHub or locally on your system.
2. Submit a pull request with a clear, one-line description of the changes made.
3. Follow best practices by adding reviewers and referencing related issues, if applicable.
To add a table of contents to any document, use mdformat-toc. Insert
<!-- mdformat-toc start --> where the table of
contents should appear, then run mdformat <file.md> on
the edited file, or make format to format all files.
Acknowledgements
We thank the following labs for sharing their open-source lab
manuals, which inspired this project:
The default email addresses have the form
<user>@cemm.oeaw.ac.at due to our relationship with
the Austrian Academy of Sciences. The short form
<user>@cemm.at can also receive emails and is
mandatory to access/register Microsoft Office 365 cloud resources:
OneDrive, Teams, Word, Excel, PowerPoint, etc.
Web portal: portal.office.com
Webmail: outlook.com/cemm.at
Forwarding CeMM emails to external services is not allowed.
Shared drives and
directories
Windows shares:
Groupdata: smb://int.cemm.at/files/groupdata/
Labs: smb://int.cemm.at/files/labs/
Home Folder: smb://int.cemm.at/files/home/<username>
Cluster shares: smb://hpctransfer
Log in with your CeMM credentials; the domain name (if asked) is
“cemmint”.
At the Rendeiro Lab, we are committed to unraveling the complexities
of human aging and pathology through the integration of computational
innovation and molecular precision. As part of CeMM and the Ludwig
Boltzmann Institute for Network Medicine, our mission is to decode the
architectural patterns of the human body—from cellular microenvironments
to organ-wide structures—and to understand how these patterns influence
health, aging, and the onset of disease. We aim to uncover the
mechanisms through which cellular and tissue-level changes contribute to
the progressive decline in physiological function, and to leverage this
knowledge to predict disease risk, facilitate early diagnosis, and
inspire transformative therapeutic strategies.
Our work is grounded in the development and application of
computational methods that analyze spatial data, including digital
pathology, spatial transcriptomics, and highly multiplexed imaging. By
integrating these high-dimensional data streams with molecular and
clinical insights across the human lifespan, we strive to answer key
questions: How do cellular alterations scale to tissue dysfunction? What
are the molecular underpinnings of age-related diseases? How can we
differentiate between normal aging and pathology? These questions guide
our efforts to generate actionable insights that bridge the gap between
fundamental research and practical applications in healthcare.
Through close collaboration with clinicians, pathologists, and
researchers across disciplines, we ensure that our findings are
validated in real-world contexts. We believe that understanding the
intricate connections between cellular dynamics and organ-level
architecture will lead not only to better interventions for
age-associated diseases but also to a deeper appreciation of the
processes that sustain human life. With a commitment to innovation,
collaboration, and real-world impact, the Rendeiro Lab aims to shape the
future of aging and pathology research for the benefit of society.
Philosophy
Our lab operates at the intersection of biomedicine, computational
biology, and systems biology research, guided by these core
principles:
Interdisciplinary collaboration: We believe that
tackling complex biological questions requires an interdisciplinary
approach. Our team fosters collaboration across molecular medicine,
computational biology, and clinical expertise to drive
innovation.
Curiosity-driven science: Fundamental curiosity
about the human body fuels our research. We pursue questions that
challenge traditional paradigms and push the boundaries of
knowledge.
Innovation and impact: From pioneering
tissue-specific aging clocks to developing AI-driven spatial methods, we
strive to create tools and insights with transformative potential for
science and medicine.
Inclusion and mentorship: We are committed to
creating an inclusive, supportive, and intellectually stimulating
environment. Mentorship and professional growth for all team members are
central to our philosophy.
Reproducibility and open science: We emphasize
rigorous, reproducible research and actively contribute to open science.
We aim to benefit the broader scientific community by sharing data and
tools.
Human-centric focus: While our methods are
primarily computationally driven, our ultimate goal is to improve human
health. We aim to bridge the gap between molecular and physiological
insights, and real-world diagnostic, therapeutic, and clinical
applications.
Code of conduct
Our lab aims to be a safe, professional, and encouraging place where
everybody feels respected, appreciated, and free to contribute equally.
All lab members must abide by the code of conduct:
Be respectful
No forms of harassment or discrimination are tolerated. All
individuals, regardless of their age, gender identity, racial or
ethnic background, sexual orientation, religion, culture, academic
record, personal background, disability status, economic status, or
mental health status, shall be treated with equal respect and
recognition.
Please follow the guidelines below:
treat others professionally and respectfully at all times;
exclusionary comments or jokes, threats or violent language are not
acceptable;
do not address others in an angry, intimidating, or demeaning
manner;
use the individual’s preferred pronouns, if known. Gender-neutral
language is welcome, particularly when the identity of the person is not
known or is irrelevant to the context (e.g. use ‘they’ for a manuscript
reviewer);
be kind and sensitive with respect to personal, family and health
issues of others, and provide moral support whenever possible;
be mindful of the privacy, personal space and belongings of
others;
ask if unsure how someone wants to be referred to or treated.
Be professional
All members are expected to conduct themselves in a professional
manner. This involves acting with honesty, integrity, accountability,
and respect for others.
Please follow the behaviours below:
conduct research work with integrity (i.e., responsibly, honestly
and respectfully);
provide and receive positive feedback and encouragement on research
ideas and results;
directly address issues as soon as possible;
acknowledge the contributions of others;
acknowledge your mistakes;
be prepared for meetings and other duties;
be punctual for all events;
do not gossip;
kindly point out any potential unsafe or incorrect work procedures
to others;
report incidents or concerns to Andre or other senior personnel at
CeMM;
abide by the rules of CeMM.
Be proactive and inclusive
We can all be better if we help each other and do things for the good
of the lab.
The following is encouraged:
welcome, integrate, and include everyone;
ask questions - we’re all here to learn;
collaboration in research with other lab members;
equal participation by all other group members in group
discussions;
report issues and accept responsibility for them to prevent similar
issues in the future;
suggestions of group activities to foster the lab community,
particularly those that encourage cultural exchange.
This code of conduct was inspired by and adapts parts from the
following:
Mondays, every week, scheduled via Outlook calendar
Prepare in advance
What are upcoming events (2-3 months; e.g. committee meeting,
conference, meetings with collaborators)
Progress and results
Other activities (including non-scientific like courses attended,
tutorials, infrastructure)
Next goals (list next steps in research and what tasks are needed to
accomplish them)
Roadblocks/needs (list critical needs that are, or that you anticipate
will be, blocking progress, especially things you need Andre to do, e.g.
set up a VM, learn about a framework, download lots of data, get more
space on the cluster)
Literature (indicate a paper you found interesting/relevant that
Andre should read or that should be discussed in a journal club)
The above can be shared with Andre in advance (e.g. in a markdown
file)
The meta layer / check-in:
Talk about how you are doing, progress, frustration, goals beyond
research
Once per month, more often if needed
The scheduled meetings should not prevent you from reaching out to
Andre at any time if needed (in person, via email, chat, phone)!
Lab meetings
Tuesdays, early week (12:30)
Schedule:
The schedule for the presentations is set by Andre and available here.
Format:
Housekeeping (announcements, scheduling)
Any update relevant to the lab by anyone
Quick round reviewing week progress (20 min max, 3-5 min per
person):
What did you try to achieve? (broadly and specifically)
What were the challenges?
One or more scheduled presentations (20-30 min) including:
Scientific question/hypothesis
Background (what has been done/can we build on)
Approach/methodology
Results
Outlook/next steps
Code review
Every 2 weeks (to begin, then maybe more sparse)
1h30 to 2h00 duration
The goal is to improve everyone’s programming skills/process - not
shaming!
Typical workflow:
Nominate one script/function to study (e.g. one that caused problems or
is inefficient, not Pythonic, etc.) OR ask Andre beforehand to look at a
repository and select one piece of code to focus on.
Ideally only 200-400 lines of code at one time
Everyone studies the goals of the code and the implementation
Together try to assess:
Readability and documentation
Architecture
Re-usability
Additional: packaging, speed, security, etc…
End by highlighting one new GitHub repository that you find
cool/useful for the group; everyone quickly installs and explores it
Occasionally/alternatively we can talk about the meta part: cluster
issues, installation, text editors, .bashrc setup, etc…
Strategic collaborative
projects (SCPs)
Everyone is encouraged to participate in more than one depending on
interest/relevance
SCP2: Aging & longevity:
One meeting per month
Two organizers (generally from De Rooij and Rendeiro labs)
Mailing list: scp2@cemmat.onmicrosoft.com
Send papers with the usual format/tag in subject:
[paper] <title>, and body:
<title> \n <url> \n <comment>
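The subject and body format above can be produced by a small helper. This is only a sketch: the function name `format_paper_email` is hypothetical, and it only builds the text, it does not send anything.

```python
def format_paper_email(title: str, url: str, comment: str = "") -> tuple[str, str]:
    """Return (subject, body) following the '[paper] <title>' convention."""
    title = title.strip()
    subject = f"[paper] {title}"
    # Body is title, URL and comment, one per line.
    body = "\n".join([title, url.strip(), comment.strip()])
    return subject, body


subject, body = format_paper_email(
    "The importance of stupidity in scientific research",
    "https://doi.org/10.1242/jcs.033340",
    "Short essay worth discussing.",
)
print(subject)
print(body)
```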
Hackathons
one afternoon every 2-3 months
ideas:
CeMMome (involve more people?):
Literature mining: papers, research reports, press releases
Social gatherings, for example to celebrate personal and
professional achievements, are welcome and encouraged. They should,
however, follow the guidelines and rules of CeMM.
Lab infrastructure
This document describes the infrastructure used in the lab, including
CeMM-provided or our own infrastructure. So far it details only
computational infrastructure.
This guide can serve as a standalone reference for
project planning and fellowship writing. Each section builds on
conceptual, strategic, and practical aspects separately, but they should
be taken into consideration together for best results.
Background and concepts
Ideal vs
reality of designing a project for public funding
Basic vs applied science: Funding often requires
navigating the balance between fundamental discovery and real-world
application.
Funding agencies as…
Investors (ideal): They take on risk to enable
novel, groundbreaking ideas.
Consumers (reality): They seek ‘safe bets’ by
prioritizing projects with a track record of feasibility and
evidence.
Triangle of knowledge
Hierarchy: Data → Information → Knowledge →
Wisdom
Flow between layers:
Data is raw input (e.g., measurements, numbers).
Information is data contextualized with meaning.
Knowledge is synthesized information with clear insight.
Wisdom incorporates experience, values, and context beyond natural
science (includes personal experience, philosophy, and
metaphysics).
Strategies
Project design
Balancing feasibility and
novelty
Aim 1: Incremental research (low risk, highly
feasible).
Aim 2: Novel, higher-risk research (potential for
greater impact but riskier).
Profiling aim: Descriptive/observational, e.g.,
understanding how mutation X is linked to pathway A, or observing
population-level traits.
Perturbing aim: Functional/causal, e.g., testing if
gene X causes phenotype Y, or clinical trials.
The two strategies can be linked. For instance: Aim 1 is slightly
incremental with profiling only; Aim 2: more ambitious/novel based on
perturbations.
Shallow-pass
strategy for project execution
The Shallow-pass strategy is a systematic approach
to project execution that emphasizes early, rapid, and minimally viable
progress through a project’s full timeline, followed by iterative
deepening (both conceptually and technically) as needed. It
draws on established principles from Agile development,
Rapid prototyping, Incremental
research, and Lean startup approaches.
Core concept
The strategy envisions a 2D space where:
Horizontal axis: Timeline from start to finish
(milestones, objectives, deliverables).
Vertical axis: Depth of conceptual understanding,
technical precision, or experimental thoroughness.
Instead of moving deeply through each individual
task in sequence (e.g., perfecting each aim before moving to the next),
the Shallow-pass strategy encourages a shallow, complete
pass over all key components of the project first.
This approach creates a “minimum viable version” of the entire
project, akin to a minimum viable product (MVP), and
allows for early identification of bottlenecks, feasibility issues, and
unknowns.
Once this shallow layer is complete, depth is added
step-by-step to specific areas where gains are most needed or
most valuable. This “depth-first” progression occurs only when justified
by clear metrics or emerging insights.
Execution
Initial pass (shallow path): The goal is to
complete a simple, end-to-end version of the project with minimal depth.
This quick, rough pass identifies critical barriers, risks, and
time-sinks. For example, instead of training a deep learning model on a
large dataset, you might start with a simple logistic regression on a
small subset, or run a pilot experiment to validate feasibility.
Iterative deepening (vertical progression): Revisit
earlier steps and selectively add depth to areas that yield the most
benefit. If an aspect works well at shallow depth, further refinement
may be unnecessary. For example, after a successful logistic regression,
you might increase depth by training a neural network, scaling up the
dataset, or adding multi-modal inputs.
When to stop (completion): Avoid the “infinite
perfection” trap by setting success metrics (e.g., 90%
model accuracy) at the outset. Once these criteria are met, stop. If
your classifier achieves 95% accuracy when the goal was 90%, additional
work may have little value. Completion is defined by sufficiency, not
perfection.
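The deepening loop and its stopping rule can be sketched in a few lines of Python. Everything here is illustrative: the function `shallow_pass`, the stage names, and the scores are made up, and each stage stands in for a full end-to-end run at a given depth.

```python
from typing import Callable, Iterable, Optional, Tuple

Stage = Tuple[str, Callable[[], float]]


def shallow_pass(stages: Iterable[Stage], target: float) -> Tuple[Optional[str], float, list]:
    """Run stages from shallowest to deepest; stop once `target` is met.

    Returns (name of the stage that met the target or None, best score, history).
    """
    history = []
    best = float("-inf")
    for name, run in stages:
        score = run()
        history.append((name, score))
        best = max(best, score)
        if score >= target:
            return name, best, history  # sufficiency reached: stop deepening
    return None, best, history


# Hypothetical stages with made-up scores, ordered by increasing depth.
stages = [
    ("logistic regression, small subset", lambda: 0.84),
    ("logistic regression, full dataset", lambda: 0.91),
    ("neural network, full dataset", lambda: 0.95),
]
stopped_at, best, history = shallow_pass(stages, target=0.90)
print(stopped_at, best)
```

With a target of 0.90 set at the outset, the loop stops at the second stage and the neural network is never trained, which is the point of defining completion by sufficiency.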
Common pitfalls and how
to avoid them
| Pitfall | How to avoid |
| --- | --- |
| Going too deep, too soon | Focus on the shallow pass first. Move on even if the result isn’t “perfect.” |
| Perfectionism | Set “success metrics” early. If you meet them, stop! |
| Sunk cost fallacy | Shallow-pass shows which paths are dead ends. Pivot early. |
| Failure to prioritize | Invest time in paths that provide the most “return on depth”. |
| Getting stuck at step 1 | Even if step 1 is imperfect, keep moving to step 2. |
Writing process
1. Planning and supervisor
coordination
Agree on timeline: Clarify deadlines for draft and
final versions.
Clarify expectations: What level of polish should
the draft have?
Define the review process: Involve other
stakeholders if needed (e.g., senior lab members).
Co-design the project: Discuss aims and approaches
early with the supervisor.
Accept constraints: Some projects are pre-defined
due to funding calls, and flexibility is limited.
Maintain perspective: The fellowship is often an
“academic exercise” — flexibility in practice is often possible once
funding is secured.
2. Content development
Start with the big picture
Identify the central question or goal.
“Walk back” from the goal to define specific aims,
objectives, and the path to achieving them.
Clarity is
everything
Be simple, clear, and precise. Over-complication weakens the
proposal.
Make abstract concepts tangible by showing concrete
outputs, metrics, and practical outcomes.
Is a hypothesis necessary?
Hypotheses are not always required, but without one, the proposal
may require stronger justification for the work.
3. Proposal structure
Top-down approach
Hypothesis/Objectives/Goals
Impact:
Why should this be addressed?
What impact will success have (on science, society, policy,
etc.)?
What will we learn even if the project fails?
Aims: Typically 2-3 aims.
Tasks for each aim (1-2 per aim):
Goal of the task
Why it matters
Required resources, data, or inputs
Methods, experiments, and tools required
Introduction (3-4 points):
The problem being addressed.
What has been tried previously.
Key technologies, datasets, or resources now available to address
the problem.
A hint of the specific approach you will take.
Challenges and mitigation
Think of the top challenges in the proposal design.
Separate them by conceptual and technical
Try to preemptively address them at design stage
4. Proposal writing process
Gradual, hierarchical writing
Expand each bullet point into 1-2 sentences in the order that you
designed them:
Hypothesis/objectives/goals
Impact
Aims/tasks (very shallowly)
Introduction
Build each section gradually.
Figures & visual aids
Sketch first: Draw rough concepts on paper or
whiteboard.
Collaborate: Co-sketch ideas with a supervisor or
team member.
Repurpose existing visuals: Use (but adapt) figures
from previous grants/papers.
Preliminary data: Include early evidence or
preliminary results where relevant.
5. Use of AI/LLMs (Large
Language Models)
Dos and Don’ts
Don’t outsource the thinking: AI works best after
you’ve defined a clear structure with bullet points and key ideas.
Provide context: Add as much detail as possible for
the AI (including PDFs of previous work).
Emphasize priorities: Explicitly state which points
are most important.
LLMs are tools, not authors: You are in control of
the ideas, while the AI is a writing assistant.
Why use LLMs?
Overcome writer’s block: It’s easier to edit text
than to create it from scratch.
Improve language quality: English polish can match
the top-tier writing of competitive grant applications.
Use parallel sessions: Write in one session and
criticize/review in another.
6. Feedback process
Be open to feedback: Supervisors may have different
perspectives, and their feedback often reflects reviewers’ likely
reactions.
Ask for clarification: If something feels unfair or
unclear, discuss it with the supervisor.
Learn from changes: Understand why edits were made
— this is where the learning happens.
Summary of key takeaways
Designing a project: Balance feasibility and
novelty. Balance profiling (description) and perturbation
(causation).
Executing a project: Use the Shallow-pass strategy:
shallow, fast progress first; deeper, slower progress
later.
Writing a proposal: Work step-by-step, define key
objectives first, and address the big picture.
Clarity and structure are essential for
success.
Involve supervisors early in project design, figure
development, and editing.
Leverage LLMs for support, but control the
process.
Learn from feedback — it’s one of the most valuable
aspects of the process.
Additional Resources
[!CAUTION] TODO
Research
Project life cycle
Initializing a project
Register your project in the lab’s project register:
Go to https://cemmat.sharepoint.com/sites/rendeirolab and find the
‘Lab project register.xlsx’ or directly at
https://cemmat.sharepoint.com/:x:/r/sites/rendeirolab/_layouts/15/Doc.aspx?sourcedoc=%7B4c72f84b-f33b-4162-a5e8-f05556fdf66b%7D&action=editnew
Start a new row for your project and increment the
Project ID by one.
Choose an intuitive name for the project, avoiding adding personal
information e.g. collaborator names.
Create a project directory structure from the lab’s template:
cookiecutter gh:rendeirolab/_project_template
Create a git repository on GitHub (https://github.com/rendeirolab)
with the same name as the project, make a first commit and push to
GitHub.
Create a directory for the project in the CeMM cluster at:
/research/groups/lab_rendeiro/projects/
Create a directory inside called data to store raw
data.
Create a directory for the project in the CeMM cluster at:
/nobackup/groups/lab_rendeiro/projects/
Create a soft link between /research/.../data and
/nobackup/.../data
Create a cemm_metadata.json file in
/research/groups/lab_rendeiro/projects/$PROJECT/
Create a cemm_metadata.json file in
/nobackup/groups/lab_rendeiro/projects/$PROJECT/
You can find various metadata JSON templates at
/research/groups/lab_rendeiro/projects/_templates/.
Make sure to maintain your metadata JSON files, in line with the
existing data on disk.
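The directory, link, and metadata steps above could be scripted roughly as follows. This is a sketch under several assumptions: the helper `init_project_dirs` is hypothetical, the link is assumed to live under /nobackup and point at the /research data directory (the text does not state the direction), and the metadata content is a placeholder, not one of the real templates. The example runs in a temporary directory rather than on the cluster paths.

```python
import json
import tempfile
from pathlib import Path


def init_project_dirs(research_root: Path, nobackup_root: Path, project: str) -> dict:
    """Create the two project directories, the data link and metadata stubs."""
    research = research_root / project
    nobackup = nobackup_root / project
    (research / "data").mkdir(parents=True, exist_ok=True)  # raw data lives here
    nobackup.mkdir(parents=True, exist_ok=True)

    # Assumption: the link lives under /nobackup and points at /research.
    link = nobackup / "data"
    if not link.is_symlink() and not link.exists():
        link.symlink_to(research / "data", target_is_directory=True)

    # Placeholder metadata only; copy a real template from _templates/ instead.
    for directory in (research, nobackup):
        meta = directory / "cemm_metadata.json"
        if not meta.exists():
            meta.write_text(json.dumps({"project": project}, indent=2) + "\n")
    return {"research": research, "nobackup": nobackup, "data_link": link}


# Dry run in a temporary directory instead of the real cluster paths.
with tempfile.TemporaryDirectory() as tmp:
    paths = init_project_dirs(
        Path(tmp) / "research" / "projects",
        Path(tmp) / "nobackup" / "projects",
        "example_project",
    )
    print(paths["data_link"].is_symlink())
```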
Reporting research
No PowerPoint!
Publications
[!CAUTION] TODO
Authorship
[!CAUTION] TODO
Tools and
technologies of standard use within the lab
File types of digital data
Below are the preferred types of technology to be used in the
lab:
documentation: Markdown
programming language: Python
image files: OME-TIFF
tabular form files: CSV (or csv.gz), Parquet
plots: SVG, PDF
Engaging in new projects
[!CAUTION] TODO
Meta-science
The Night Science Series
(https://night-science.org/genome-biology-miniseries/)
The Night Science Podcast
(https://night-science.org/the-night-science-podcast/)
The importance of stupidity in scientific research
(https://doi.org/10.1242/jcs.033340)
Parachute use to prevent death and major trauma when jumping from
aircraft: randomized controlled trial
(http://dx.doi.org/10.1136/bmj.k5094)
Record keeping guidelines
This document serves as a comprehensive guide for maintaining and
organizing records within the lab. It covers the management of source
code, lab notebooks, shared files, and other key resources to ensure
efficient and consistent documentation of lab activities. Effective
record keeping is not only essential for scientific research by
facilitating reproducibility, collaboration, and project management, but
also to comply with requirements from funders and CeMM.
Source Code
The creation, updating, and maintenance of source code are conducted
on GitHub: Rendeiro Lab
GitHub. Each project is assigned its own GitHub repository, with its
name registered in the Lab
Project Registry.
Structuring Projects
Follow the guidelines outlined in the Lab Project
Template to structure and maintain repositories effectively.
Ensure that source code is kept up-to-date in the respective
repository for each project.
Lab Notebooks
It is essential to maintain detailed, up-to-date records of daily
activities. This includes:
Notes and sketches on ongoing projects.
Progress on analyses.
Logs of seminars attended, papers read, or other relevant
activities.
Digital and Physical Notes
Use markdown-based notes in Obsidian for
digital documentation.
Optionally, concatenate markdown notes and print them on sticky
paper for inclusion in physical lab notebooks.
Shared OneDrive Folder
The official platform for file sharing is the CeMM-provided OneDrive,
accessible here: Shared
Documents.
Best Practices
Update the folder regularly, especially when reaching milestones
such as paper submissions, talk presentations, or poster creations.
Organize files in a compartmentalized manner by subject (e.g.,
figures, presentations, documents, notes).
Prefix all file names with the format YYYY-MM-DD- to
ensure chronological order.
Create subfolders per lab member if necessary.
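As an illustration of the YYYY-MM-DD- prefix, a small helper can add it to a file name and leave already-prefixed names alone. The function name `with_date_prefix` and the example file names are hypothetical.

```python
import re
from datetime import date
from typing import Optional

DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")


def with_date_prefix(filename: str, day: Optional[date] = None) -> str:
    """Prefix `filename` with YYYY-MM-DD- unless it already has such a prefix."""
    if DATE_PREFIX.match(filename):
        return filename
    day = day or date.today()
    return f"{day.isoformat()}-{filename}"


names = [
    with_date_prefix("poster.pdf", date(2024, 11, 5)),
    with_date_prefix("figures.zip", date(2024, 3, 18)),
]
print(sorted(names))  # plain alphabetical sorting gives chronological order
```

Because the date is written year first with zero-padded month and day, an ordinary alphabetical sort of the folder is also a chronological one.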
Presentations
Save all internal and external presentations in both
.pptx and .pdf formats.
CeMM has specific guidelines for data management, in particular for
projects that exist on the HPC cluster. Read more about it on the
intranet.
In particular, each project should have a
cemm_metadata.json file.
The research page also provides information
on this.
Learning
Asking questions
It is important to have independent learning skills, for example
finding information on your own. However, it is also important to engage
others and ask questions when you are stuck. While not knowing something
is okay, being unwilling to learn or asking questions without minimally
trying to find the answer yourself is not acceptable.
When asking questions, be sure to:
Show you’ve tried: Before asking, try to find the
answer yourself. This can be by searching the internet, reading the
documentation, or trying different things. If you’ve tried and failed,
then ask for help.
Be clear and specific: The more specific you are,
the more likely you are to get a specific answer. For example, if you
are asking about a specific error message, include the error message in
your question.
Be polite and patient: People are more likely to
help you if you are polite and patient. Remember that people are
volunteering their time to help you.
Be ready to give details: If someone asks you a
question to understand your problem better, be ready to answer it with
details and don’t second guess the reason for the questions asked.
Be ready to help: Be ready to help someone else in
the future. No matter how much you (think you) know, there is always
someone who knows less than you and who could benefit from your
help.
Topics to learn
As a member of the lab you are responsible for being aware of,
understanding, and eventually mastering the following topics:
Data Science / statistics:
Mean / variance relationship
Homo/heteroskedastic variables
Common data transformations: log, normalize by total, centering and
scaling
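The transformations named above can be tried out on a toy vector using only the Python standard library. This is a minimal sketch with made-up counts; the function names are illustrative, and real analyses would typically use array libraries instead.

```python
import math
import statistics


def normalize_by_total(counts, scale=1.0):
    """Divide each value by the sample total, then multiply by `scale`."""
    total = sum(counts)
    return [scale * c / total for c in counts]


def log1p_transform(values):
    """log(1 + x), which is defined at zero."""
    return [math.log1p(v) for v in values]


def center_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


counts = [0, 5, 20, 75]  # made-up counts for one sample
fractions = normalize_by_total(counts)
z = center_and_scale(log1p_transform(counts))
print(fractions)
print([round(v, 3) for v in z])
```

After normalizing by total the values sum to one; after centering and scaling they have mean zero and unit sample standard deviation.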
Note that learning is an iterative and continuous process. You will
need to revisit and revise your knowledge of each topic over time,
each time gaining a deeper understanding or a new perspective on it.
Literature to know
Reviews
“Deep learning in histopathology: the path to the clinic”, Jeroen
van der Laak, Geert Litjens & Francesco Ciompi, Nature Medicine
(2021), https://doi.org/10.1038/s41591-021-01343-4
“The emerging landscape of spatial profiling technologies”, Jeffrey
R. Moffitt, Emma Lundberg & Holger Heyn, Nature Reviews Genetics
(2022), https://doi.org/10.1038/s41576-022-00515-3
“Graph representation learning in biomedicine and healthcare”, Li,
M.M., Huang, K. & Zitnik, M. Nature Biomedical Engineering (2022),
https://doi.org/10.1038/s41551-022-00942-x
Data types
H&E whole slide images (WSI):
Coudray et al. Classification and mutation prediction from non–small
cell lung cancer histopathology images using deep learning, Nat.
Medicine (2018), https://doi.org/10.1038/s41591-018-0177-5
Amgad et al. Structured crowdsourcing enables convolutional
segmentation of histology images, Bioinformatics (2019),
https://doi.org/10.1093/bioinformatics/btz083
IF (immunofluorescence)
IHC/ISH (immunohistochemistry)
Multiplex imaging methods:
mIF: Gerdes, M. J. et al. Highly multiplexed
single-cell analysis of formalin-fixed, paraffin-embedded cancer tissue.
Proc. Natl. Acad. Sci. U. S. A. 110, 11982–11987 (2013).
https://doi.org/10.1073/pnas.1300136110
4i: Multiplexed protein maps link subcellular
organization to cellular states Gut et al. Science (2018),
https://doi.org/10.1126/science.aar7042
IMC: Giesen, C. et al. Highly multiplexed imaging
of tumor tissues with subcellular resolution by mass cytometry. Nat.
Methods 11, 417–422 (2014), https://doi.org/10.1038/nmeth.2869
MIBI: Angelo, M. et al. Multiplexed ion beam
imaging of human breast tumors. Nat. Med. 20, 436–442 (2014),
https://doi.org/10.1038/nm.3488
CyCIF: Lin, J.-R., Fallahi-Sichani, M. &
Sorger, P. K. Highly multiplexed imaging of single cells using a
high-throughput cyclic immunofluorescence method. Nat. Commun. 6, 8390
(2015), https://doi.org/10.1038/ncomms9390
CODEX: Goltsev, Y. et al. Deep profiling of mouse
splenic architecture with CODEX multiplexed imaging. Cell (2018)
https://doi.org/10.1016/j.cell.2018.07.010
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. &
Zhuang, X. RNA imaging. Spatially resolved, highly multiplexed RNA
profiling in single cells. Science 348, aaa6090 (2015),
https://doi.org/10.1126/science.aaa6090
Eng, C.-H. L. et al. Transcriptome-scale super-resolved imaging in
tissues by RNA seqFISH. Nature 568, 235–239 (2019),
https://doi.org/10.1038/s41586-019-1049-y
Spatial transcriptomics (umbrella term):
Stickels, R. R. et al. Highly sensitive spatial transcriptomics at
near-cellular resolution with SlideseqV2. Nat. Biotechnol. (2020),
https://doi.org/10.1038/s41587-020-0739-1
Merritt, C. R. et al. Multiplex digital spatial profiling of
proteins and RNA in fixed tissue. Nat. Biotechnol. 38, 586–599 (2020),
https://doi.org/10.1038/s41587-020-0472-9
Salmén, F. et al. Barcoded solid-phase RNA capture for Spatial
Transcriptomics profiling in mammalian tissue sections. Nat. Protoc. 13,
2501–2534 (2018), https://doi.org/10.1038/s41596-018-0045-2
Digital pathology
Towards a general-purpose foundation model for computational
pathology (2024),
https://www.nature.com/articles/s41591-024-02857-3
A whole-slide foundation model for digital pathology from real-world
data (2024), https://www.nature.com/articles/s41586-024-07441-w
A Multimodal Generative AI Copilot for Human Pathology (2024),
https://www.nature.com/articles/s41586-024-07618-3
A visual-language foundation model for computational pathology
(2024), https://www.nature.com/articles/s41591-024-02856-4
A visual–language foundation model for pathology image analysis
using medical Twitter (2023),
https://www.nature.com/articles/s41591-023-02504-3
Chen et al. Pan-cancer integrative histology-genomic analysis via
multimodal deep learning Cancer Cell (2022),
https://doi.org/10.1016/j.ccell.2022.07.004
Chen et al. Scaling Vision Transformers to Gigapixel Images via
Hierarchical Self-Supervised Learning (2022),
https://arxiv.org/abs/2206.02647
Lu et al. CLAM: Data-efficient and weakly supervised computational
pathology on whole-slide images (2021),
https://doi.org/10.1038/s41551-020-00682-w
Pati et al. HACT-Net: A Hierarchical Cell-to-Tissue Graph Neural
Network for Histopathological Image Classification (2020),
https://arxiv.org/abs/2007.00584
Specific literature
Genetics of tissue and organ
shape
Genetic and functional insights into the fractal structure of the
heart (2020), https://doi.org/10.1038/s41586-020-2635-8
Self-supervised learning for characterising histomorphological
diversity and spatial RNA expression prediction across 23 human tissue
types (2023), https://doi.org/10.1101/2023.08.22.554251
The sex of organ geometry (2024),
https://www.nature.com/articles/s41586-024-07463-4
Target discovery,
prioritization and drugs
“A guide to drug discovery”, collection at Nature Reviews Drug
Discovery: https://www.nature.com/collections/hkgvrspwtn
“Target selection in drug discovery”,
https://www.nature.com/articles/nrd986
“Multi-parameter phenotypic profiling: using cellular effects to
characterize small-molecule compounds”,
https://www.nature.com/articles/nrd2876
“Applications of single-cell RNA sequencing in drug discovery and
development”, https://www.nature.com/articles/s41573-023-00688-4
“Computational approaches in target identification and drug
discovery”,
https://www.sciencedirect.com/science/article/pii/S2001037016300058
“Moving targets in drug discovery”,
https://www.nature.com/articles/s41598-020-77033-x
“Drug target prediction through deep learning functional
representation of gene signatures”,
https://www.nature.com/articles/s41467-024-46089-y
Integrative multi-omics and drug response profiling of childhood
acute lymphoblastic leukemia cell lines,
https://www.nature.com/articles/s41467-022-29224-5 <- prototypical
study of “let’s throw all the omics at cell lines and pray”
Two-step multi-omics modelling of drug sensitivity in cancer cell
lines to identify driving mechanisms,
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0238961
<- prototypical study of “let’s take a bunch of data, train
classifiers and find anecdotal evidence”
“Network medicine for disease module identification and drug
repurposing with the NeDRex platform”,
https://www.nature.com/articles/s41467-021-27138-2 <- prototypical
study of “network of disease module will eventually find something”
“Genome-wide investigation of gene-cancer associations for the
prediction of novel therapeutic targets in oncology”,
https://www.nature.com/articles/s41598-020-67846-1 <- prototypical
study of “let’s put all the data together and pray”
“Multi-omics reveals clinically relevant proliferative drive
associated with mTOR-MYC-OXPHOS activity in chronic lymphocytic
leukemia”, https://www.nature.com/articles/s43018-021-00216-6 <-
example of great study on primary human samples (leukemia) combining
omics with functional readouts (drug sensitivity). Even if it is a great
study, it may miss the mark by focusing on a disease for which there are
already very good treatment options
“Systems immunology-based drug repurposing framework to target
inflammation in atherosclerosis”,
https://www.nature.com/articles/s44161-023-00278-y <- nice example of
a study with great readouts focused on atherosclerosis that goes
straight to drug discovery
Deep learning for image analysis,
https://www.embl.org/about/info/course-and-conference-office/events/mac23-01/
Practical deep learning for coders (Fast.ai),
https://course.fast.ai/,
https://www.youtube.com/watch?v=_QUEXsHfsA0&list=PLfYUBJiXbdtRL3FMB3GoWHRI8ieU6FhfM
Deep learning in life sciences (MIT),
https://www.youtube.com/playlist?list=PLypiXJdtIca5sxV7aE3-PS9fYX3vUdIOX
Introduction to Deep Learning - 170 Video Lectures from Adaptive
Linear Neurons to Zero-shot Classification with Transformers (Sebastian
Raschka), https://sebastianraschka.com/blog/2021/dl-course.html
Introduction to Coding Neural Networks with PyTorch and Lightning:
https://www.youtube.com/watch?app=desktop&v=khMzi6xPbuM&feature=youtu.be&s=09
Deep learning course (François Fleuret, University of Geneva)
(https://fleuret.org/dlc/?s=09)
The Ancient Secrets of Computer Vision (Univ. Washington)
(https://pjreddie.com/courses/computer-vision/,
https://www.youtube.com/playlist?list=PLjMXczUzEYcHvw5YYSU92WrY8IwhTuq7p)
Software development: many paths to learning:
https://www.youtube.com/watch?v=66tfvFeALBQ
Software packages
These are some of the software packages we use often in the lab. To
keep track of the direction of their development and of new versions,
subscribe to their releases on GitHub (bell sign -> custom ->
releases).
This document provides practical guidelines for writing a manuscript.
For insights on planning a project please see the project planning guide instead.
Manuscript writing should really be called “manuscript crafting”: it
involves much more than writing and formatting text or making figures,
and it takes a lot of time, effort and skill to craft a good
manuscript.
Figures
A crucial part of crafting a great manuscript is good communication
of ideas through visual elements (figures), and their alignment with the
text.
Here are Andre’s tips for figure making, based on practice. They have
changed a little over time, but have mostly coalesced into a fairly
simple system.
Making plots (Python,
matplotlib, seaborn):
Try to compartmentalize data processing from visualization (not just
the early ‘pipeline’-like processing, but also the analytical analysis)
- but that does not mean that
Each script produces a set of plots into a results
directory, with subdirectories matching the script or analysis name
(more hierarchical if needed, e.g. different datasets).
Each plot file name should be self-explanatory and contain enough
information to track down its origin.
Except for obvious groupings or interrelations, avoid making figures
with many subplots.
For the most part, aim for each subplot to be square, about 3 by 3
inches. If creating a figure with multiple subplots, scale the figure
size accordingly (width follows the number of columns, height the number
of rows):
fig, axes = plt.subplots(4, 2, figsize=(2 * 3, 4 * 3))
Always label the axes, and make use of a single statement to set
many properties at once:
ax.set(xlabel="Time", ylabel="Expression (log)", yscale="log")
For plot elements with many objects (e.g. the points of a scatter plot
or the cells of a large heatmap), rasterize only that specific element
in order to reduce the size of the figure file:
ax.scatter(x, y, rasterized=True) or sns.clustermap(..., rasterized=True).
Do not rasterize the whole axes, as that also rasterizes everything
drawn inside them and makes e.g. text elements uneditable.
Choose continuous colormaps for continuous variables (magma,
viridis). Use divergent colormaps (coolwarm, RdBu_r, PuOr_r) only when a
central value has meaning (e.g. a Z-score).
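The tips above can be combined in one short script. The following is a minimal sketch rather than the lab's actual code: the data are random, and the analysis name, directory and file name (`example_analysis`, `results/`) are hypothetical placeholders. The `Agg` backend is chosen only so that the sketch also runs without a display.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # assumption: headless run, no interactive window needed
import matplotlib.pyplot as plt
import numpy as np

PANEL = 3  # inches per subplot side

# Hypothetical analysis name; the subdirectory mirrors the script name
output_dir = Path("results") / "example_analysis"
output_dir.mkdir(parents=True, exist_ok=True)

# Random data standing in for real measurements
rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 5000))
z_score = (y - y.mean()) / y.std()

n_rows, n_cols = 1, 2
# figsize is (width, height): columns scale the width, rows scale the height
fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * PANEL, n_rows * PANEL))

# Continuous variable without a meaningful midpoint: sequential colormap.
# rasterized=True applies only to the points; axes and text stay vector.
points = axes[0].scatter(x, y, c=np.hypot(x, y), cmap="viridis", s=2, rasterized=True)
axes[0].set(xlabel="x", ylabel="y", title="Distance from origin")
fig.colorbar(points, ax=axes[0])

# Z-score: zero is meaningful, so use a divergent colormap centered on it
limit = np.abs(z_score).max()
points = axes[1].scatter(
    x, y, c=z_score, cmap="RdBu_r", vmin=-limit, vmax=limit, s=2, rasterized=True
)
axes[1].set(xlabel="x", ylabel="y", title="Z-score of y")
fig.colorbar(points, ax=axes[1])

# The file name records the analysis, the plot type and what is shown
output_file = output_dir / "example_analysis.scatter.distance_and_zscore.svg"
fig.savefig(output_file, dpi=300, bbox_inches="tight")
plt.close(fig)
```

Setting `vmin` and `vmax` symmetrically keeps the neutral color of the divergent colormap at zero, which is what gives the central value its meaning in the plot.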
Recommended settings for
matplotlib
Inkscape: Export to SVG
import matplotlib.pyplot as plt

plt.rcParams['savefig.bbox'] = 'tight'  # To ensure that legends and elements outside the axes are included
plt.rcParams['savefig.dpi'] = 300  # To make sure any rasterized elements have good quality
plt.rcParams['savefig.transparent'] = True  # To remove the white background
plt.rcParams['svg.fonttype'] = 'none'  # To allow font to be editable in Inkscape
plt.rcParams['font.family'] = 'Arial'  # Use Arial as the font
Adobe Illustrator: Export to
PDF
This is the same as above, except for the font type
settings:
import matplotlib.pyplot as plt

plt.rcParams['savefig.bbox'] = 'tight'
plt.rcParams['savefig.dpi'] = 300
plt.rcParams['savefig.transparent'] = True
plt.rcParams['pdf.fonttype'] = 42
plt.rcParams['ps.fonttype'] = 42
plt.rcParams['font.family'] = 'Arial'
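The two sets of settings differ only in the font type keys, so one way to avoid copying them into every script is a small helper. This is a sketch under assumptions: `configure_matplotlib` and the target names `"inkscape"` and `"illustrator"` are hypothetical and not part of matplotlib or of any lab package. The rcParams keys and values are the ones listed above.

```python
import matplotlib.pyplot as plt

# Settings shared by both export targets (values taken from the manual)
COMMON = {
    "savefig.bbox": "tight",
    "savefig.dpi": 300,
    "savefig.transparent": True,
    "font.family": "Arial",
}

# Settings specific to the editor the figure will be opened in
TARGET = {
    "inkscape": {"svg.fonttype": "none"},  # keep text as editable text in SVG
    "illustrator": {"pdf.fonttype": 42, "ps.fonttype": 42},  # embed TrueType fonts
}


def configure_matplotlib(target):
    """Apply the recommended rcParams for a given editor (hypothetical helper)."""
    if target not in TARGET:
        raise ValueError(f"Unknown target {target!r}; expected one of {sorted(TARGET)}")
    plt.rcParams.update({**COMMON, **TARGET[target]})


configure_matplotlib("inkscape")
```

Calling the helper once at the top of a plotting script, before any figure is saved, is enough for the settings to apply to every `savefig` call in that script.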
Assemble plots manually
into a figure:
Inkscape
Check the journal requirements and limitations for figures and their
dimensions. Use A4 by default, not letter.
Add a plot to the canvas, resize it to an approximate desired size,
remove all groupings, remove redundant objects, possibly despine the
plot (i.e. remove the top and right axis lines). Make the whole plot
into a group (or layer).
Use a consistent font family (Arial) for all text elements and only
one or two font sizes (12 and 10). You can use the Find/Replace tool
with Object types = 'text' to do this.
Add a lowercase letter label to each panel (font 16).
Add a label to the top of the figure (e.g. Figure 1).
Name the figure files consistently, e.g. Figure1.svg,
SupplementaryFigure1.svg.
Post assembly, automatic assembly and conversion of file types (bash,
inkscape, minify, pdfunite):
Use generated individual PDFs to embed in a manuscript tex
file.
Use generated individual PNGs to embed in a manuscript file like
docx.
Use the joint PDFs to print, share or submit to journal.
Only in very rare cases is it worth having a full final figure
generated as a whole.
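The conversion step can be scripted. The manual names bash for this, so the following Python version is only an illustrative sketch. It assumes the Inkscape 1.x command-line flags (`--export-type`, `--export-filename`, `--export-dpi`) and `pdfunite` from poppler-utils. The function names and the joint file name `Figures.pdf` are hypothetical. The commands are built as lists and printed first, so they can be inspected before anything is run.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_commands(svg_files, joint_pdf="Figures.pdf", dpi=300):
    """Return commands that export each SVG to PDF and PNG, then merge the PDFs."""
    commands, pdfs = [], []
    for svg in map(Path, svg_files):
        pdf, png = svg.with_suffix(".pdf"), svg.with_suffix(".png")
        # Inkscape 1.x export flags (assumption: older 0.9x releases use other flags)
        commands.append(
            ["inkscape", str(svg), "--export-type=pdf", f"--export-filename={pdf}"]
        )
        commands.append(
            [
                "inkscape",
                str(svg),
                "--export-type=png",
                f"--export-dpi={dpi}",
                f"--export-filename={png}",
            ]
        )
        pdfs.append(str(pdf))
    # pdfunite takes the input PDFs first and the output file last
    commands.append(["pdfunite", *pdfs, joint_pdf])
    return commands


def run_all(commands):
    """Run the commands in order, stopping at the first failure."""
    for command in commands:
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(f"{command[0]} is not installed or not on PATH")
        subprocess.run(command, check=True)


commands = conversion_commands(["Figure1.svg", "SupplementaryFigure1.svg"])
for command in commands:
    print(" ".join(command))
```

Calling `run_all(commands)` would then execute the exports. The individual PDFs and PNGs are the files to embed in a tex or docx manuscript, and the joint PDF is the one to print, share or submit.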
Adobe Illustrator:
Open PDF in Adobe Illustrator
Select everything:
Mac: ⌘ Command + A
Windows: Ctrl + A
Go to Object -> Clipping Mask ->
Release, and repeat until the option can no longer be selected:
Mac: ⌥ Option + ⌘ Command + 7
Windows: Alt + Ctrl + 7
Congratulations! Now you have a clean plot ready for editing.
[!CAUTION] Removing clipping masks may affect elements, such as bars
that extend offscreen.
Text
Manuscript text is usually written in Microsoft Word (docx) or
LibreOffice Writer (odt). LaTeX is also supported; talk with Andre about
starting with a LaTeX template.
Formatting
Use styles to format your text. Do not use whitespace (e.g. newlines
and spaces) in the text to format the document.