I am a Ph.D. student in Computer Science at the National University of Singapore, advised by Prof. Yang You. I received my B.Sc. in Applied Mathematics from Fudan University and M.Sc. in Statistics from NUS. I was also a research intern at Harvard University. I am currently working on speech-text diffusion models, scalable MoE architectures, and large-scale pretrained models.
Diffusion Large Language Models · Efficient LLM Scaling with Mixture of Experts · Multimodal Foundation Model Adaptation
News
May 2026: Released three new papers: OrScale (code) on orthogonalized optimization with layer-wise trust-ratio scaling, AsyncLane on diffusion language model decoding, and ECUpcycle on slice-structured expert-choice diffusion LM upcycling.
Apr 2026: Started research internship at InclusionAI (Ant Group), working on diffusion large language models.
Apr 2026: Project lead for the 2026 Google Research Award (PI: Prof. Yang You), "Pioneering the Systems Foundations for Diffusion-Based Large Language Models on TPUs", with up to $100K in Google Cloud credits.
Apr 2026: Released ClawFinetune, a portable fine-tuning skill package for agent-driven workflows on Tinker and the HPC-AI SDK.
Feb 2026: Released ClawTrade, secure multi-broker trading middleware for AI agents.
@article{ren2026asynclane,
title = {AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding},
author = {Ren, Yingxuan and Lou, Yuxuan and Liu, Yong and Fang, Pengcheng and Wang, Ziming and Zhou, Pengfei and You, Yang},
journal = {arXiv preprint arXiv:2606.08411},
year = {2026}
}
ECUpcycle: Slice-Structured Expert-Choice Diffusion Language Model Upcycling
Yong Liu*, Yuxuan Lou*, Hailun Xu, Yingxuan Ren, Haoying Li, Kanchan Sarkar, Kun Xu, Yingyan Celine Lin, Cho-Jui Hsieh, Yang You
2026
DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion
Yuxuan Lou*, Ziming Wu*, Yaochen Wang, Yong Liu, Yingxuan Ren, Fuming Lai, Shaobing Lian, Jie Tang, Yang You
@article{lou2026diffuspeech,
title = {DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion},
author = {Lou, Yuxuan and Wu, Ziming and Wang, Yaochen and Liu, Yong and Ren, Yingxuan and Lai, Fuming and Lian, Shaobing and Tang, Jie and You, Yang},
journal = {arXiv preprint arXiv:2601.22889},
year = {2026}
}
MoST: Modality-Aware Mixture of Experts for Efficient Speech-Text Foundation Model
@inproceedings{lou2026most,
title = {MoST: Modality-Aware Mixture of Experts for Efficient Speech-Text Foundation Model},
author = {Lou, Yuxuan and Yang, Kai and You, Yang},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2026}
}
MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
Geng Zhang, Yuxuan Han, Yuxuan Lou, Yiqi Zhang, Wangbo Zhao, Yang You
@inproceedings{zhang2026mone,
title = {MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE},
author = {Zhang, Geng and Han, Yuxuan and Lou, Yuxuan and Zhang, Yiqi and Zhao, Wangbo and You, Yang},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2026}
}
EnvBridge: Bridging Diverse Environments with Cross-Environment Knowledge Transfer for Embodied AI
Tomoyuki Kagaya*, Yuxuan Lou*, Thong Jing Yuan*, Subramanian Lakshmi*, Jayashree Karlekar, Sugiri Pranata, Natsuki Murakami, Akira Kinose, Koki Oguri, Felix Wick, Yang You
@article{kagaya2024envbridge,
title = {EnvBridge: Bridging Diverse Environments with Cross-Environment Knowledge Transfer for Embodied AI},
author = {Kagaya, Tomoyuki and Lou, Yuxuan and Yuan, Thong Jing and Lakshmi, Subramanian and Karlekar, Jayashree and Pranata, Sugiri and Murakami, Natsuki and Kinose, Akira and Oguri, Koki and Wick, Felix and You, Yang},
journal = {arXiv preprint arXiv:2410.16919},
year = {2024}
}
RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents
Tomoyuki Kagaya*, Yuxuan Lou*, Thong Jing Yuan*, Subramanian Lakshmi*, Jayashree Karlekar, Sugiri Pranata, Natsuki Murakami, Akira Kinose, Koki Oguri, Felix Wick, Yang You
@article{kagaya2024rap,
title = {RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents},
author = {Kagaya, Tomoyuki and Lou, Yuxuan and Yuan, Thong Jing and Lakshmi, Subramanian and Karlekar, Jayashree and Pranata, Sugiri and Murakami, Natsuki and Kinose, Akira and Oguri, Koki and Wick, Felix and You, Yang},
journal = {arXiv preprint arXiv:2402.03610},
year = {2024}
}
@article{lou2022crosstoken,
title = {Cross-token Modeling with Conditional Computation},
author = {Lou, Yuxuan and Xue, Fuzhao and Zheng, Zangwei and You, Yang},
journal = {arXiv preprint arXiv:2109.02008},
year = {2022}
}
Funded Research
2026 Google Research Award
Project Lead — PI: Prof. Yang You
Apr 2026
"Pioneering the Systems Foundations for Diffusion-Based Large Language Models on TPUs", up to $100K in Google Cloud credits. Project lead under PI Prof. Yang You; resulting publications: DiffuSpeech, AsyncLane, ECUpcycle.
Research Experiences
Diffusion-based Speech-Text Language Model
NUS - Tencent
Sep 2025 – Mar 2026
Developed DiffuSpeech, the first diffusion-based speech-text language model supporting both understanding and generation, introducing a "Silent Thought, Spoken Answer" paradigm in which internal text reasoning informs spoken responses. Unified discrete text and tokenized speech under a single masked diffusion framework with modality-specific masking schedules. Constructed ThinkingTalk, the first speech QA dataset with paired text reasoning traces (26K samples, 319 hours). DiffuSpeech achieves state-of-the-art speech-to-speech QA accuracy (+9 points over the best baseline) and the best TTS quality among generative models (6.2% WER).
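As an illustration of what "modality-specific masking schedules" can mean in a masked diffusion setup, here is a minimal Python sketch. It is not DiffuSpeech's implementation: the linear schedule for text, the quadratic schedule for speech, the `<mask>` token, and all function names are assumptions chosen only to show the idea that each modality maps the diffusion time t to its own masking probability.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def text_schedule(t: float) -> float:
    """Assumed linear schedule: mask probability equals diffusion time t."""
    return t


def speech_schedule(t: float) -> float:
    """Assumed quadratic schedule: speech tokens are masked more slowly at small t."""
    return t * t


SCHEDULES = {"text": text_schedule, "speech": speech_schedule}


def mask_sequence(tokens, modalities, t, rng):
    """Mask each token independently with its modality's probability at time t."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(tokens) != len(modalities):
        raise ValueError("tokens and modalities must have the same length")
    masked = []
    for token, modality in zip(tokens, modalities):
        p = SCHEDULES[modality](t)
        masked.append(MASK if rng.random() < p else token)
    return masked


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["the", "answer", "is", "s17", "s902", "s44"]
    modalities = ["text", "text", "text", "speech", "speech", "speech"]
    print(mask_sequence(tokens, modalities, t=0.5, rng=rng))
```

At t = 0 nothing is masked and at t = 1 everything is masked for both modalities; in between, the two schedules corrupt text and speech tokens at different rates within the same sequence.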
Efficient Foundation Models with Mixture of Experts
NUS - Apple
Sep 2024 – May 2025
Developed MoST, a novel speech-text foundation model featuring a Modality-Aware Mixture of Experts (MAMOE) architecture that achieves competitive performance across multiple speech-text benchmarks using exclusively open-source data. Designed modality-specific expert routing strategies and shared-expert configurations that improve cross-modal knowledge integration while maintaining inference efficiency. MoST was accepted to ICML 2026.
Mixture of Experts · Foundation Models · Speech-Text
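The combination of modality-specific routing and shared experts described above can be sketched roughly as follows. This is a toy illustration, not MoST's actual architecture: the expert layout (two text experts, two speech experts, one shared expert), the softmax gate restricted to the token's modality group, and every name in the code are assumptions.

```python
import math

# Hypothetical layout: experts 0-1 serve text, 2-3 serve speech, expert 4 is shared.
EXPERT_GROUPS = {"text": [0, 1], "speech": [2, 3]}
SHARED_EXPERTS = [4]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, modality, top_k=1):
    """Return {expert_id: gate_weight} for a single token.

    Routed experts are picked only from the token's modality group, with
    gate weights renormalised over the selected top-k. Shared experts are
    always active with weight 1.0 (an assumption made for this sketch).
    """
    group = EXPERT_GROUPS[modality]
    probs = softmax([router_logits[e] for e in group])
    ranked = sorted(zip(group, probs), key=lambda pair: pair[1], reverse=True)
    selected = ranked[:top_k]
    norm = sum(p for _, p in selected)
    weights = {expert: p / norm for expert, p in selected}
    for expert in SHARED_EXPERTS:
        weights[expert] = 1.0
    return weights


if __name__ == "__main__":
    logits = [0.1, 2.0, 5.0, -1.0, 0.0]  # one router score per expert
    print(route(logits, "text"))
    print(route(logits, "speech"))
```

In this sketch a text token never reaches a speech expert even when that expert has the highest raw router score, while the shared expert sees every token and so offers one path for cross-modal knowledge.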
Multimodal LLM Agent with Retrieval Augmented Planning
NUS - Panasonic
Oct 2023 – May 2024
Developed RAP, a multimodal planning agent that leverages past successful experiences to enhance decision-making. Developed EnvBridge, a multimodal embodied agent that transfers knowledge across diverse environments. Achieved SOTA on text-only environments (ALFWorld, WebShop) and significant improvements on multimodal robotics benchmarks (Franka Kitchen, Meta-World, RLBench).
LLM Agents · Retrieval-Augmented Planning · Embodied AI · Cross-Environment Transfer
Vision Model Scaling with Mixture of Experts
HPC-AI Lab
Mar 2021 – Jan 2022
Developed large-scale vision models based on Mixture of Experts: Sparse-MLP and WideNet. Proposed a fully-MLP architecture with conditional computation in two directions and extended MoE to the spatial dimension of image representation.
Mixture of Experts · Vision Models · Conditional Computation
Making large AI models cheaper, faster, and more accessible. A collection of parallel components for distributed training. Managed and contributed to Colossal-AI examples.
Portable fine-tuning skill package for agent-driven workflows; generates and launches LoRA SFT runs on Tinker and the HPC-AI SDK with preflight validation and run-state tracking.
Working on diffusion large language models at InclusionAI, the open-source AI research arm of Ant Group.
Diffusion LLM · Foundation Models
Tencent
Research Intern
Sep 2025 – Mar 2026
Developed DiffuSpeech, the first diffusion-based speech-text language model supporting both understanding and generation. Collaborated on building the ThinkingTalk dataset and speech-text diffusion framework.
Diffusion LLM · Speech-Text · Multimodal Reasoning
HPC-AI Tech
Research Intern
Feb 2022 – Dec 2022
Co-developer of Colossal-AI and Colossal-AI Examples. Built an open-domain dialog system with internet knowledge augmentation.
Colossal-AI · Distributed Training · Dialog Systems
Interactive Entertainment Group, Tencent
Machine Learning Engineer Intern
Mar 2020 – Jun 2020
Mined and cleaned user comment data. Built ML models for emotion classification and deep learning models for abstractive text summarization.
TensorFlow, JAX/Flax, Keras (Google Cloud TPU pods)
Parallel Training & Optimization
Model parallel, tensor parallel, pipeline parallel, sequence parallel, data parallel, mixture-of-experts parallel training
Activities & Interests
During my undergraduate years, I was a member of the debate team of the School of Mathematical Sciences. I really enjoyed presenting evidence, defending and questioning arguments, and developing persuasion techniques. We won the 2017 Fudan Debating Championship.
I also served as a volunteer mathematics teacher for kindergarten children in Yangpu district, introducing basic mathematical concepts and knowledge.
In my free time, I enjoy reading science fiction. The Three-Body Problem and the Foundation series are my favourites.