Yuxuan Lou

National University of Singapore, Singapore
yuxuanlou@u.nus.edu | +65 82600153
yuxuanlou.info | Google Scholar | Github

Here is Yuxuan Lou, a Ph.D student majoring in Computer Science at NUS. My advisor is Prof. Yang You. I received my B.Sc. in Applied Mathematics from Fudan University and M.Sc. in Statistics from NUS. My research interests lie in efficient large language model scaling with Mixture of Experts, multimodal foundation model adaptation, and multimodal embodied LLM agents. I am currently working on scalable machine learning algorithms and large-scale pretrained models. You can also check my CV here or visit my Google Scholar Page for further information.

Research Interest

Efficient large language model scaling with Mixture of Experts
Multimodal Foundation Model Adaptation from large language models
Multimodal Embodied LLM Agent

Education

National University of Singapore, Singapore

Ph.D. in Computer Science

School of Computing, HPC-AI Lab

Advised by Prof. Yang You

2023 – Present

National University of Singapore, Singapore

M.Sc. in Statistics

School of Statistics and Probability

2020 – 2022

Fudan University, Shanghai, China

B.Sc. in Applied Mathematics

School of Mathematical Science

2016 – 2020

Selected Publications

MoST: Modality-Aware Mixture of Experts for Efficient Speech-Text Foundation Model

Yuxuan Lou, Kai Yang, Yang You

2025

[Project Page]

MoRS: Distill Large Language Model into Mixture of Reasoning Students

Yuxuan Lou, Yang You

2025, In Submission

EnvBridge: Bridging Diverse Environments with Cross-Environment Knowledge Transfer for Embodied AI

Tomoyuki Kagaya*, Yuxuan Lou*, Thong Jing Yuan*, Subramanian Lakshmi*, Jayashree Karlekar, Sugiri Pranata, Natsuki Murakami, Akira Kinose, Koki Oguri, Felix Wick, Yang You

2024

[arXiv]

RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents

Tomoyuki Kagaya*, Yuxuan Lou*, Thong Jing Yuan*, Subramanian Lakshmi*, Jayashree Karlekar, Sugiri Pranata, Natsuki Murakami, Akira Kinose, Koki Oguri, Felix Wick, Yang You

2024

[arXiv]

Cross-token Modeling with Conditional Computation

Yuxuan Lou, Fuzhao Xue, Zangwei Zheng, Yang You

2022

[arXiv]

Research Experiences

Speech-Text Foundation Model with Mixture of Experts

NUS - Apple

Developed MoST, a novel speech-text foundation model featuring a Modality-Aware Mixture of Experts (MAMOE) architecture which directs tokens to specialized pathways for enhanced cross-modal understanding. Engineered an efficient, three-stage transformation pipeline to adapt a pre-trained Mixture of Experts (MoE) language model for speech-text tasks. Achieved competitive performance across multiple speech-text benchmarks using exclusively open-source data, contributing to reproducible AI research through full code and model release.

Key Words: Multimodal Foundation Models, Mixture of Experts, Speech-Text Processing, Cross-modal Understanding

Jan 2025 - May 2025

Mixture of Reasoning Students Distilled from Dense Model

NUS - Apple

Developed MoRS (Mixture of Reasoning Students), a four-stage distillation method that compresses large language models (70B parameters) into efficient mixture-of-experts architectures (12B parameters, 3B activated parameters) while preserving specialized reasoning capabilities across multiple domains. Achieved better or comparable results compared with comparable models by significant margins - up to +14.5% on reasoning benchmarks (ARC Challenge: 78.0%, MMLU: 62.2%, HumanEval: 40.4%) while requiring fewer training tokens than competitors. Created the first framework to distill dense language models into MoE architectures without relying on pre-existing small models.

Key Words: Knowledge Distillation, Mixture of Experts, Model Compression, Reasoning Capabilities

Sep 2024 – Jan 2025

Multimodal LLM Agent with Retrieval Augmented Planning

NUS - Panasonic

Developed RAP, a Multimodal planning agent which leverages past successful experiences to enhance decision-making process. Developed EnvBridge, a Multimodal embodied agent which can transfer knowledge from diverse embodied environments and enhance planning ability. SOTA results on text-only environments (ALFWorld, Webshop), Significant improvements on multimodal robotics benchmarks (Franka Kitchen, Meta-World, RLBench).

Key Words: Multimodal LLM Agents, Retrieval-Augmented Planning, Embodied AI, Cross-Environment Transfer

Oct 2023 - May 2024

Vision Model Scaling with Mixture of Experts

HPC-AI Lab

Developed large-scale vision models: Sparse-MLP, Widenet based on Mixture of Experts. Proposed a fully-MLP architecture with conditional computation in two directions and extended MoE to spatial dimension of image representation.

Key Words: Mixture of Experts, Vision Models, Conditional Computation, Large-scale Architecture Design

Mar 2021 – Jan 2022

Open Source Projects

Colossal-AI: Making large AI models cheaper, faster, and more accessible

41k stars

A collection of parallel components for distributed training of large deep learning models. Managed and contributed to Colossal-AI examples.

awesome mixture-of-experts

1.2k stars

A collection of awesome Mixture of Experts papers and projects.

MoST: Modality-Aware Mixture of Experts for Efficient Speech-Text Foundation Model

Official implementation of MoST, a novel speech-text foundation model featuring a Modality-Aware Mixture of Experts (MAMOE) architecture which directs tokens to specialized pathways for enhanced cross-modal understanding.

RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents

Official implementation of RAP, a Multimodal planning agent which leverage past successful experiences to enhance decision-making process.

Professional Experience

HPC-AI Tech

Deep Learning Engineer

Colossal-AI, Colossal-AI Examples co-developer. Building the open domain dialog system with internet knowledge augmentation.

Key Words: Colossal-AI, Open domain dialog, Internet knowledge agumentation

February 2022 - December 2022

Interactive Entertainment Group, Tencent

Machine Learning Engineer Intern

Data mining and data cleaning of users' comments on specific games. Built machine learning models to classify the emotional levels of comments. Built deep learning abstractive text summarization models to extract the summary of comment contexts.

Key Words: Data mining, Emotion classification, Abstractive text summarization

March 2020 - June 2020

Skills & Technologies

Deep learning libraries: PyTorch, TensorFlow, Keras, DeepSpeed, Colossal-AI
Parallel Training & Optimization: Model parallel, sequence parallel, data parallel training on GPU/TPU clusters
Programming Languages: Python, R, C++, Pascal
Software: Matlab, LaTeX, MS Office
Database: SQL, Spark

Activities & Interests

During my undergraduate years, I was a member of the debate team of Mathematical Science school. I really enjoyed presenting evidence, defending and questioning arguments and developing persuasion techniques. And we won 2017 Fudan Debating Championship.

Besides, I served as a volunteer mathematical teacher for kindergarten children in Yangpu district. I had a lot of fun introducing basic mathematical concepts and knowledge to children.

In my free time, I enjoy reading science fictions. The three-body problem and Foundation series are my favourites