A hierarchical cross-entropy loss is presented, which incorporates ontology structure into training and improves the out-of-distribution performance of large-scale single-cell annotation models ...
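To make the idea of an ontology-aware loss concrete, here is a minimal sketch of one common way to build a hierarchical cross-entropy. It is not necessarily the formulation used in this work. It assumes the model outputs a softmax over leaf cell types, that the probability of an internal ontology node is the sum of the probabilities of the leaves beneath it, and that the loss averages the negative log-probability over the target node and all of its ancestors. The toy ontology, the node names, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy ontology (child -> parent); names are illustrative only.
PARENT = {
    "T cell": "lymphocyte",
    "B cell": "lymphocyte",
    "CD4 T cell": "T cell",
    "CD8 T cell": "T cell",
}
# Assumption: the classifier emits one logit per leaf, in this order.
LEAVES = ["CD4 T cell", "CD8 T cell", "B cell"]


def ancestors_inclusive(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def softmax(logits):
    """Numerically stable softmax over a 1-D sequence of logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def node_probability(leaf_probs, node):
    """Probability of a node: summed probability of the leaves beneath it."""
    return sum(
        p for leaf, p in zip(LEAVES, leaf_probs)
        if node in ancestors_inclusive(leaf)
    )


def hierarchical_cross_entropy(logits, target):
    """Mean negative log-probability over the target and its ancestors.

    Assumed formulation for illustration; the target may be a leaf or an
    internal node. The root term is close to zero because its aggregated
    probability is 1.
    """
    probs = softmax(logits)
    path = ancestors_inclusive(target)
    return float(np.mean([-np.log(node_probability(probs, n)) for n in path]))


# A near miss (CD8 T cell) and a far miss (B cell) for a CD4 T cell target.
near = hierarchical_cross_entropy([0.0, 5.0, 0.0], "CD4 T cell")
far = hierarchical_cross_entropy([0.0, 0.0, 5.0], "CD4 T cell")
print(f"near miss: {near:.3f}, far miss: {far:.3f}")
```

Under these assumptions, a prediction that lands on a sibling cell type is penalised less than one that lands in a distant branch, whereas a flat cross-entropy over the leaves would score both mistakes identically. This is the sense in which ontology structure enters training in this sketch.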