Yuxuan Lou

National University of Singapore, Singapore

I am Yuxuan Lou, a Ph.D. student in Computer Science at NUS. My advisor is Prof. Yang You. I received my B.Sc. in Applied Mathematics from Fudan University and my M.Sc. in Statistics from NUS. My research interests lie in Artificial Intelligence, Deep Learning, and High Performance Computing. Specifically, I am currently working on scalable machine learning algorithms and large-scale pretrained models. You can also check my CV here or visit my Google Scholar page for further information.


Education

National University of Singapore, Singapore

Ph.D. in Computer Science
School of Computing
January 2023 - Present

National University of Singapore, Singapore

M.Sc. in Statistics
School of Statistics and Probability
August 2020 - March 2022

Fudan University, Shanghai, China

B.Sc. in Applied Mathematics
School of Data Science
August 2018 - July 2020
School of Mathematical Science
August 2016 - July 2018


Publications

Cross-token Modeling with Conditional Computation

Yuxuan Lou, Fuzhao Xue, Zangwei Zheng, Yang You, 2021.
NeurIPS 2022 In Submission

[arXiv Preprint]

One Student Knows All Experts Know: From Sparse to Dense

Fuzhao Xue, Xiaoxin He, Xiaozhe Ren, Yuxuan Lou, Yang You, 2022.
NeurIPS 2022 In Submission

[arXiv Preprint]

Go Wider Instead of Deeper

Fuzhao Xue, Ziji Shi, Yuxuan Lou, Yong Liu, Yang You, 2021.
AAAI 2022 Accepted

[arXiv Preprint]

Research Experience

Neural Network Model Scaling with Mixture of Experts

NUS HPC-AI Lab, Advised by Prof. Yang You

Revisited and reproduced modern ViT and MLP-like models; designed and built large-scale models with conditional computation based on Mixture of Experts (MoE). Introduced a fully-MLP architecture with conditional computation along two dimensions, extending MoE to the spatial dimension of image representations. Distributed model training across TPU clusters. Conducted detailed ablation studies to investigate the contribution of different model components. Introduced parameter sharing to the ViT-MoE model and proposed an illustration of the specific LayerNorm parameters. Two papers submitted to AAAI 2022.

Key Words: Mixture of Experts, Vision Transformer, MLP-like models, Conditional computation, Large-scale model design, Parameter sharing

March 2021 - Present

Neural Network based Image Compression and Image Query System

DAS LAB, Harvard University, Advised by Prof. Stratos Idreos

Built neural network models for image compression, including an auto-encoder, adaptive arithmetic coding, and adaptive code length regularization. Built models based on Pyramid CNN and Generative Adversarial Networks for different query tasks on compressed image representations. Introduced SPP-net and inverse SPP-net, designed to better capture and summarize multi-scale image information. Performed model testing and hyper-parameter tuning.

Key Words: Image compression, Auto-encoder, Adaptive arithmetic coding, Adaptive code length regularization, Pyramid CNN, GAN, SPP-net

July 2019 - January 2020

Scoring System for Figure Skating Based on LSTM

CV LAB, School of Data Science, Fudan University, Advised by Prof. Yanwei Fu

Reviewed video analysis methods including SVR, CNN, 3D convolution, and LSTM. Built a dataset by downloading figure skating videos (NHK, TEB, COC, 4CC, etc.) and filtered out videos that were not fluent or coherent. Helped propose a neural network architecture with two complementary components: a Self-Attentive LSTM and a Multi-scale Convolutional Skip LSTM. Compared different pooling and regression methods.

Key Words: Figure skating score system, Dataset construction, Self-attentive LSTM, Multi-scale Convolutional Skip LSTM

May 2018 - January 2019

Design of Toolkit (fastNLP) for Natural Language Processing

School of Data Science, Fudan University, Advised by Prof. Xipeng Qiu

Studied the SQuAD dataset and its baselines. Reviewed pre-trained language models and methods including OpenAI GPT, ELMo, etc. Learned and built the BERT model, and analyzed it on tasks such as masked LM and next sentence prediction using SQuAD, GLUE, etc. Participated in designing fastNLP, a modularized and extensible toolkit for Natural Language Processing.

Key Words: fastNLP, BERT, SQuAD, Masked LM, Next sentence prediction

September 2018 - December 2018

Professional Experience


Deep Learning Engineer
Co-developer of Colossal-AI and Colossal-AI Examples. Built an open-domain dialog system with internet knowledge augmentation.

Key Words: Colossal-AI, Open-domain dialog, Internet knowledge augmentation

February 2022 - December 2022

Interactive Entertainment Group, Tencent

Machine Learning Engineer Intern
Performed data mining and cleaning on user comments about specific games. Built machine learning models to classify the emotional level of comments, and deep learning abstractive text summarization models to summarize comment content.

Key Words: Data mining, Emotion classification, Abstractive text summarization

March 2020 - June 2020


Skills

  • Software: MATLAB, LaTeX, MS Office
  • Programming Languages: Python, R, C++, Pascal
  • Deep Learning: TensorFlow, PyTorch, Keras
  • Database: SQL, Spark

Activities & Interests

During my undergraduate years, I was a member of the School of Mathematical Sciences debate team. I really enjoyed presenting evidence, defending and questioning arguments, and developing persuasion techniques. We won the 2017 Fudan Debating Championship.

I also served as a volunteer mathematics teacher for kindergarten children in Yangpu District. I had a lot of fun introducing basic mathematical concepts to the children.

In my free time, I enjoy reading science fiction; The Three-Body Problem and the Foundation series are my favourites.