Interview Cheatsheet
Default Baseline
- one process per GPU
- DDP before FSDP unless memory forces otherwise
- deterministic distributed sampler
- rank-0 concise logs, all-rank structured errors
- full-job restart from latest good checkpoint
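The deterministic-sampler default above can be sketched with PyTorch's `DistributedSampler`; the dataset size, replica count, and seed here are illustrative, and `num_replicas`/`rank` are passed explicitly only so the sketch runs without a process group (in a real DDP job they default to the group's world size and rank):

```python
import torch
from torch.utils.data import DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(100))

def shard(rank: int, epoch: int) -> list[int]:
    # Explicit num_replicas/rank for illustration; a live DDP job omits them.
    sampler = DistributedSampler(
        dataset, num_replicas=4, rank=rank, shuffle=True, seed=1234
    )
    sampler.set_epoch(epoch)  # reshuffles deterministically per epoch
    return list(sampler)

# Same rank + epoch -> identical order, so resume replays the same shard;
# different ranks partition the epoch without overlap.
assert shard(0, 0) == shard(0, 0)
assert set(shard(0, 0)).isdisjoint(shard(1, 0))
```

Calling `set_epoch` every epoch is what makes "deterministic" compatible with "shuffled": the permutation is a pure function of seed and epoch.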
Must-Save Checkpoint State
- model
- optimizer
- scheduler
- AMP scaler
- sampler progress
- RNG state
- step / epoch / config metadata
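The full list above can be bundled into a single checkpoint dict; this is a minimal PyTorch sketch where the model, key names like `sampler_epoch` and `global_step`, and the config payload are all illustrative, not a standard API:

```python
import os
import tempfile
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
scaler = torch.cuda.amp.GradScaler(enabled=False)  # enable on CUDA + fp16

state = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
    "scaler": scaler.state_dict(),
    "sampler_epoch": 3,  # feeds sampler.set_epoch() on resume
    "rng": {
        "torch": torch.get_rng_state(),
        "cuda": torch.cuda.get_rng_state_all(),  # empty list on CPU-only
    },
    "global_step": 1200,
    "config": {"lr": 1e-3},  # enough metadata to detect config drift on resume
}

path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(state, path)
restored = torch.load(path, weights_only=False)
model.load_state_dict(restored["model"])
```

Saving everything in one file keeps the restore atomic: either the whole training state comes back, or the job restarts from the previous good checkpoint.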
Metrics To Mention Fast
- step time by phase
- samples/sec
- data-loader wait
- all-reduce time
- checkpoint latency
- restart count
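Per-phase step time and samples/sec fall out of a small timing context manager; this is a stdlib-only sketch where the phase names, loop, and batch size of 32 are all illustrative stand-ins:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

phase_totals = defaultdict(float)

@contextmanager
def phase(name: str):
    # Accumulate wall time per named phase across the whole run.
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_totals[name] += time.perf_counter() - start

for _ in range(3):  # stand-in training loop
    with phase("data_wait"):
        time.sleep(0.001)  # stand-in for DataLoader blocking
    with phase("forward_backward"):
        time.sleep(0.002)  # stand-in for compute + all-reduce

# 32 = assumed batch size; throughput over all phases, not just compute.
samples_per_sec = 3 * 32 / sum(phase_totals.values())
```

Keeping data-loader wait as its own phase is what lets you say whether you are input-bound or compute-bound before reaching for profilers.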
High-Signal Sentences
- “I’m optimizing for the smallest correct distributed baseline first.”
- “Data partitioning and resume semantics are correctness problems, not just performance details.”
- “I would only introduce a more complex parallelism axis when the current bottleneck is explicit.”
- “Checkpoint frequency is an RPO and cost decision.”
- “In a notebook I’ll preserve production boundaries even if I mock the backing systems.”
Biotech Addendum
Additional defaults for roles involving protein, molecular, or genomic data.
Biotech data defaults
- scaffold split over random split; time split for prospective validation
- missing bioassay labels are not negatives—mask them out of the loss
- per-task AUROC, BEDROC, and EF@1% over accuracy and raw loss
- sparse matrix row-slicing in `__getitem__` for genomic datasets
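The last two data defaults combine naturally in the dataset class; this is a sketch with NumPy/SciPy where the class name, fields, and NaN-means-missing convention are illustrative assumptions (the class is duck-typed so a torch `DataLoader` can consume it without subclassing):

```python
import numpy as np
import scipy.sparse as sp

class SparseAssayDataset:
    """Map-style dataset: sparse row slicing + missing-label masks."""

    def __init__(self, features: sp.csr_matrix, labels: np.ndarray):
        self.features = features  # (n_samples, n_features), kept sparse
        self.labels = labels      # (n_samples, n_tasks), NaN = not measured

    def __len__(self) -> int:
        return self.features.shape[0]

    def __getitem__(self, idx: int):
        # Densify exactly one row; the full matrix never materializes.
        x = self.features[idx].toarray().ravel()
        y = self.labels[idx]
        mask = ~np.isnan(y)  # observed labels only
        return x, np.nan_to_num(y), mask

features = sp.random(5, 1000, density=0.01, format="csr")
labels = np.array([[1.0, np.nan], [0.0, 1.0], [np.nan, np.nan],
                   [1.0, 0.0], [np.nan, 1.0]])
ds = SparseAssayDataset(features, labels)
x, y, mask = ds[0]
```

Downstream, the per-task loss is multiplied by `mask` and averaged over `mask.sum()` observed entries, so an unmeasured assay contributes zero gradient instead of acting as a false negative.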
Biotech model defaults
- bfloat16 over float16 for long-sequence stability
- `use_reentrant=False` in activation checkpointing for transformer blocks
- layer normalization only in equivariant networks; batch norm breaks equivariance
- sequence length curriculum from short to long to avoid early OOM events
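The bfloat16 and checkpointing defaults above can be sketched together on a transformer-style block; this runs on CPU purely for portability, and the layer dimensions and input shape are illustrative:

```python
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
x = torch.randn(2, 16, 32, requires_grad=True)  # (batch, seq, d_model)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # use_reentrant=False is the recommended non-reentrant path: it supports
    # keyword arguments and gives clearer errors than the legacy default.
    out = checkpoint(block, x, use_reentrant=False)

out.float().sum().backward()  # recomputes the block's activations here
assert x.grad is not None
```

bfloat16 keeps float32's exponent range, which is why it tolerates long-sequence attention logits that overflow float16's much narrower range.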
Biotech high-signal sentences
- “Scaffold split is the correctness bar; random split is the optimism bar.”
- “Missing labels in a bioassay database are not negatives—treating them as negatives is a silent training error.”
- “Equivariant network training imposes preprocessing contracts: frame canonicalization must precede data sharding.”
- “Wet lab hit rate is the only metric the business actually cares about; computational metrics exist to let us move faster while we wait for it.”
- “Active learning turns the dataset contract from a precondition into an ongoing invariant managed by the orchestrator.”