I am a fourth-year Ph.D. student at the School of Mathematical Sciences, Shanghai Jiao Tong University (SJTU). Before that, I received my Bachelor's degree from Zhiyuan College of SJTU in 2021.
I am currently advised by Prof. Zhi-Qin John Xu. My research focuses on understanding deep learning through its training dynamics, loss landscape, generalization, and applications, as well as on the interpretability of large language models. If you are interested in my research, please feel free to contact me (WeChat).
🔥 News
- 2024.09: 🎉🎉 I won the 2024 China National Scholarship!
- 2024.09: 🎉🎉 One paper accepted to NeurIPS 2024!
- 2024.01: 🎉🎉 One paper accepted to ICLR 2024!
- 2024.01: 🎉🎉 One paper accepted to TPAMI!
📝 Publications
* denotes equal contribution, † denotes corresponding author; see the full list on Google Scholar.
Implicit Regularization of Dropout
Zhongwang Zhang, Zhi-Qin John Xu†
- This paper derives an implicit regularization of dropout, validates the derivation through experiments, and studies it numerically to explain how dropout improves generalization during neural network training by promoting weight condensation and finding flatter solutions.
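For readers unfamiliar with the object being analyzed, the sketch below illustrates standard (inverted) dropout as multiplicative Bernoulli noise on hidden activations; it is only a generic PyTorch illustration for context, not the regularizer derived in the paper.

```python
import torch
import torch.nn as nn

p = 0.5                                   # drop probability
x = torch.randn(4, 8)                     # a batch of hidden activations

# Manual inverted dropout: keep each unit with probability 1 - p,
# then rescale so the expected activation is unchanged.
mask = (torch.rand_like(x) > p).float()
x_dropped = x * mask / (1 - p)

# Equivalent built-in layer (active only in training mode).
dropout = nn.Dropout(p=p)
dropout.train()
y = dropout(x)
```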
Stochastic Modified Equations and Dynamics of Dropout Algorithm
Zhongwang Zhang, Yuqing Li†, Tao Luo†, Zhi-Qin John Xu†
- This paper rigorously derives stochastic modified equations that approximate the discrete iterative process of dropout, and empirically investigates how dropout facilitates finding flatter minima through intuitive approximations that exploit the structural similarity between the Hessian of the loss landscape and the covariance of the dropout noise.
Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu†
- This paper investigates how transformers behave on unseen compositional tasks using anchor functions, revealing that the parameter initialization scale determines whether the model learns inferential solutions that capture the underlying compositional primitives or symmetric solutions that simply memorize mappings, thereby clarifying how initialization scale shapes the learned solution and its ability to generalize to compositional functions.
Anchor function: a type of benchmark functions for studying language models
Zhongwang Zhang*, Zhiwei Wang*, Junjie Yao, Zhangchen Zhou, Xiaolong Li, Weinan E, Zhi-Qin John Xu†
- This paper introduces the anchor function, a type of benchmark function for studying how language models learn tasks that follow an "anchor-key" pattern. It provides an accessible framework for exploring a variety of tasks and demonstrates its utility by revealing two basic operations performed by attention structures in language models: shifting tokens and broadcasting a token from one position to many positions.
Embedding principle of loss landscape of deep neural networks
Yaoyu Zhang†, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu†
- This paper proves an embedding principle: the loss landscape of a deep neural network (DNN) contains all the critical points of narrower DNNs. It constructs critical embeddings such that any critical point of a narrower DNN can be embedded into a critical point or affine subspace of the target DNN with higher degeneracy while preserving the DNN's output function, offering a new perspective on why wide DNNs are generally easy to optimize and revealing a potential implicit low-complexity regularization during training.
📖 Education
- 2021.09 - now, Ph.D., School of Mathematical Sciences, Shanghai Jiao Tong University.
- 2017.09 - 2021.06, Undergraduate, Zhiyuan College, Shanghai Jiao Tong University.