Tag: safety

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping (20 Jul 2023)
How can Interpretability Help Alignment? (28 May 2020)
What is Interpretability? (17 Mar 2020)