Interview Cheatsheet
Default Baseline
- one process per GPU
- DDP before FSDP unless memory forces otherwise
- deterministic distributed sampler
- rank-0 concise logs, all-rank structured errors
- full-job restart from latest good checkpoint
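The deterministic-sampler default above can be sketched with PyTorch's `DistributedSampler`; the dataset size, replica count, and seed here are illustrative, and `num_replicas`/`rank` are passed explicitly only so the sketch runs without a process group (in a real DDP job they default to the group's world size and rank):

```python
import torch
from torch.utils.data import DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(100))

def shard(rank: int, epoch: int) -> list[int]:
    # Explicit num_replicas/rank for illustration; a live DDP job omits them.
    sampler = DistributedSampler(
        dataset, num_replicas=4, rank=rank, shuffle=True, seed=1234
    )
    sampler.set_epoch(epoch)  # reshuffles deterministically per epoch
    return list(sampler)

# Same rank + epoch -> identical order, so resume replays the same shard;
# different ranks partition the epoch without overlap.
assert shard(0, 0) == shard(0, 0)
assert set(shard(0, 0)).isdisjoint(shard(1, 0))
```

Calling `set_epoch` every epoch is what makes "deterministic" compatible with "shuffled": the permutation is a pure function of seed and epoch.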
Must-Save Checkpoint State
- model
- optimizer
- scheduler
- AMP scaler
- sampler progress
- RNG state
- step / epoch / config metadata
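The full list above can be bundled into a single checkpoint dict; this is a minimal PyTorch sketch where the model, key names like `sampler_epoch` and `global_step`, and the config payload are all illustrative, not a standard API:

```python
import os
import tempfile
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
scaler = torch.cuda.amp.GradScaler(enabled=False)  # enable on CUDA + fp16

state = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
    "scaler": scaler.state_dict(),
    "sampler_epoch": 3,  # feeds sampler.set_epoch() on resume
    "rng": {
        "torch": torch.get_rng_state(),
        "cuda": torch.cuda.get_rng_state_all(),  # empty list on CPU-only
    },
    "global_step": 1200,
    "config": {"lr": 1e-3},  # enough metadata to detect config drift on resume
}

path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(state, path)
restored = torch.load(path, weights_only=False)
model.load_state_dict(restored["model"])
```

Saving everything in one file keeps the restore atomic: either the whole training state comes back, or the job restarts from the previous good checkpoint.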
Metrics To Mention Fast
- step time by phase
- samples/sec
- data-loader wait
- all-reduce time
- checkpoint latency
- restart count
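Per-phase step time and samples/sec fall out of a small timing context manager; this is a stdlib-only sketch where the phase names, loop, and batch size of 32 are all illustrative stand-ins:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

phase_totals = defaultdict(float)

@contextmanager
def phase(name: str):
    # Accumulate wall time per named phase across the whole run.
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_totals[name] += time.perf_counter() - start

for _ in range(3):  # stand-in training loop
    with phase("data_wait"):
        time.sleep(0.001)  # stand-in for DataLoader blocking
    with phase("forward_backward"):
        time.sleep(0.002)  # stand-in for compute + all-reduce

# 32 = assumed batch size; throughput over all phases, not just compute.
samples_per_sec = 3 * 32 / sum(phase_totals.values())
```

Keeping data-loader wait as its own phase is what lets you say whether you are input-bound or compute-bound before reaching for profilers.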
High-Signal Sentences
- “I’m optimizing for the smallest correct distributed baseline first.”
- “Data partitioning and resume semantics are correctness problems, not just performance details.”
- “I would only introduce a more complex parallelism axis when the current bottleneck is explicit.”
- “Checkpoint frequency is an RPO and cost decision.”
- “In a notebook I’ll preserve production boundaries even if I mock the backing systems.”
Biotech Addendum
Additional defaults for roles involving protein, molecular, or genomic data.
Biotech data defaults
- scaffold split over random split; time split for prospective validation
- missing bioassay labels are not negatives—mask them out of the loss
- per-task AUROC, BEDROC, and EF@1% over accuracy and raw loss
- sparse matrix row-slicing in `__getitem__` for genomic datasets
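The last two data defaults combine naturally in the dataset class; this is a sketch with NumPy/SciPy where the class name, fields, and NaN-means-missing convention are illustrative assumptions (the class is duck-typed so a torch `DataLoader` can consume it without subclassing):

```python
import numpy as np
import scipy.sparse as sp

class SparseAssayDataset:
    """Map-style dataset: sparse row slicing + missing-label masks."""

    def __init__(self, features: sp.csr_matrix, labels: np.ndarray):
        self.features = features  # (n_samples, n_features), kept sparse
        self.labels = labels      # (n_samples, n_tasks), NaN = not measured

    def __len__(self) -> int:
        return self.features.shape[0]

    def __getitem__(self, idx: int):
        # Densify exactly one row; the full matrix never materializes.
        x = self.features[idx].toarray().ravel()
        y = self.labels[idx]
        mask = ~np.isnan(y)  # observed labels only
        return x, np.nan_to_num(y), mask

features = sp.random(5, 1000, density=0.01, format="csr")
labels = np.array([[1.0, np.nan], [0.0, 1.0], [np.nan, np.nan],
                   [1.0, 0.0], [np.nan, 1.0]])
ds = SparseAssayDataset(features, labels)
x, y, mask = ds[0]
```

Downstream, the per-task loss is multiplied by `mask` and averaged over `mask.sum()` observed entries, so an unmeasured assay contributes zero gradient instead of acting as a false negative.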
Biotech model defaults
- bfloat16 over float16 for long-sequence stability
- `use_reentrant=False` in activation checkpointing for transformer blocks
- layer normalization only in equivariant networks; batch norm breaks equivariance
- sequence length curriculum from short to long to avoid early OOM events
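The bfloat16 and checkpointing defaults above can be sketched together on a transformer-style block; this runs on CPU purely for portability, and the layer dimensions and input shape are illustrative:

```python
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
x = torch.randn(2, 16, 32, requires_grad=True)  # (batch, seq, d_model)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # use_reentrant=False is the recommended non-reentrant path: it supports
    # keyword arguments and gives clearer errors than the legacy default.
    out = checkpoint(block, x, use_reentrant=False)

out.float().sum().backward()  # recomputes the block's activations here
assert x.grad is not None
```

bfloat16 keeps float32's exponent range, which is why it tolerates long-sequence attention logits that overflow float16's much narrower range.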
Biotech high-signal sentences
- “Scaffold split is the correctness bar; random split is the optimism bar.”
- “Missing labels in a bioassay database are not negatives—treating them as negatives is a silent training error.”
- “Equivariant network training imposes preprocessing contracts: frame canonicalization must precede data sharding.”
- “Wet lab hit rate is the only metric the business actually cares about; computational metrics exist to let us move faster while we wait for it.”
- “Active learning turns the dataset contract from a precondition into an ongoing invariant managed by the orchestrator.”