Hi! I’m Chenxi Wang (Aurora), a second-year M.Sc. student in NLP at MBZUAI, supervised by Prof. Xiuying Chen. Before joining MBZUAI, I received my B.Eng. in Computer Science from Xi’an Jiaotong University.
My research focuses on human-centric AI. Integrating insights from cognitive science, psychology, and philosophy, I use interpretability methods to uncover the cognitive and reasoning mechanisms within large language models (LLMs). I believe the knowledge and behaviors exhibited during default inference represent only a fraction of what these models truly contain. To unlock this hidden potential, I take an interpretability-first post-training approach that awakens latent capabilities which do not spontaneously emerge, enabling models to adaptively express specific skills or reasoning patterns at inference time without additional large-scale pretraining. Ultimately, my goal is to pave the way for personalized and human-aligned AI, building a virtuous cycle in which human intelligence inspires AI, AI evolves through this understanding, and the progress of AI, in turn, better serves human well-being.
🌟 My current work explores interpretable post-training methods that make LLMs more adaptable, transparent, and cognitively grounded, enabling them to reason, communicate, and align more effectively with diverse human goals.
I welcome thoughtful discussions on dependency risk, and I’m always happy to chat about philosophy of mind and what makes AI feel more human :)
I am actively seeking PhD positions for Fall 2026. Please see my CV and feel free to get in touch!
