I am a third-year Ph.D. student at the School of Mathematical Sciences, Shanghai Jiao Tong University (SJTU). Before that, I received my Bachelor's degree from Zhiyuan College of SJTU in 2021.

I am currently advised by Prof. Zhiqin Xu. My research interests lie in understanding deep learning through its training process, loss landscape, generalization, and applications, as well as in the interpretability of large language models.

📝 Publications

* denotes equal contribution and † denotes corresponding author; see the full list on Google Scholar.

TPAMI

Implicit Regularization of Dropout

Zhongwang Zhang, Zhi-Qin John Xu†

  • This paper derives the implicit regularization induced by dropout, validates it experimentally, and studies numerically how dropout improves generalization during neural network training by promoting weight condensation and finding flatter solutions.
ICLR 2024

Stochastic Modified Equations and Dynamics of Dropout Algorithm

Zhongwang Zhang, Yuqing Li†, Tao Luo†, Zhi-Qin John Xu†

  • This paper rigorously derives stochastic modified equations that approximate the discrete iterative process of dropout. It also empirically investigates how dropout helps find flatter minima, using intuitive approximations that exploit the structural analogies between the Hessian of the loss landscape and the covariance of dropout.
arXiv

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu†

  • This paper uses anchor functions to investigate how transformers behave on unseen compositional tasks. It shows that the parameter initialization scale determines whether the model learns inferential solutions that capture the underlying compositional primitives or symmetric solutions that simply memorize mappings, and thus shapes the model's ability to generalize to compositional functions.
arXiv

Anchor function: a type of benchmark functions for studying language models

Zhongwang Zhang*, Zhiwei Wang*, Junjie Yao, Zhangchen Zhou, Xiaolong Li, Weinan E, Zhi-Qin John Xu†

  • This paper introduces the anchor function, a type of benchmark function for studying how language models learn tasks that follow an “anchor-key” pattern. It offers an accessible framework for exploring a variety of tasks, and demonstrates its utility by revealing two basic operations performed by attention structures in language models: shifting tokens and broadcasting one token from one position to many positions.
NeurIPS 2021 Spotlight

Embedding principle of loss landscape of deep neural networks

Yaoyu Zhang†, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu†

  • This paper proves an embedding principle: the loss landscape of a deep neural network (DNN) contains all the critical points of narrower DNNs. It proposes a critical embedding that maps any critical point of a narrower DNN to a critical point/affine subspace of the target DNN with higher degeneracy while preserving the DNN output function. This provides a new perspective for studying why wide DNNs are generally easy to optimize and points to a potential implicit low-complexity regularization during training.

📖 Education

  • 2021.09 - now, Ph.D., School of Mathematical Sciences, Shanghai Jiao Tong University.
  • 2017.09 - 2021.06, Undergraduate, Zhiyuan College, Shanghai Jiao Tong University.