Mononito Goswami

Applied Scientist at Amazon · Seattle, WA

I build AI agents that reason reliably over complex real-world data, and I develop the evaluation methodology to know when they are trustworthy enough to deploy. I'm an Applied Scientist at Amazon, where I helped lead the science behind the AWS DevOps Agent, a continually learning system for incident response and software operations.

My research spans foundation models (MOMENT, ICML 2024, 2.5M+ downloads), evaluation science for AI systems (TimeSeriesGym, TimeSeriesExamAgent, AQuA), and agents for high-stakes decision-making in domains from software engineering to healthcare. Foundation models provide the reasoning backbone; rigorous evaluation provides the trust layer — what I think of as the science of agents.

I completed my Ph.D. in Robotics at Carnegie Mellon University (advised by Artur Dubrawski) and received the Robotics Institute Distinguished Dissertation Award. Previously at Google Research and AWS AI Labs.

Selected Publications

Agents

SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization
Shravan Chaudhari, Rahul Thomas Jacob, Mononito Goswami, Jiajun Cao, Shihab Rashid, Christian Bock
ArXiv Preprint, 2025
Dense embedding retrieval for localizing code changes from natural language issue descriptions in large software repositories.

TimeSeriesGym: A Scalable Benchmark for (Time Series) Machine Learning Engineering Agents
Yifu Cai, Xinyu Li, Mononito Goswami, Michał Wiliński, Gus Welter, Artur Dubrawski
ArXiv Preprint, 2025
A benchmarking environment for evaluating ML engineering agents on the full time-series modeling stack, from data exploration and modeling to testing and deployment.

Foundation Models

Chronos-2: From Univariate to Universal Forecasting
Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, George Karypis, Yuyang Wang, Michael Bohlke-Schneider
ArXiv Preprint, 2025
Extends the Chronos forecasting model to handle multivariate, covariate-conditioned, and probabilistic forecasting in a single unified architecture.

Exploring Representations and Interventions in Time Series Foundation Models
Michał Wiliński, Mononito Goswami, Willa Potosnak, Nina Żukowska, Artur Dubrawski
International Conference on Machine Learning (ICML), 2025
First to show that time series foundation models learn interpretable concepts (trends, seasonality) despite self-supervised training, and that these representations can be targeted for intervention.

MOMENT: A Family of Open Time-series Foundation Models
Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, Artur Dubrawski
International Conference on Machine Learning (ICML), 2024
One of the first open-source time series foundation models. 2.5M+ downloads on HuggingFace, 700+ GitHub stars.

Evaluation Science

TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
Małgorzata Gwiazda, Yifu Cai, Mononito Goswami, Arjun Choudhry, Artur Dubrawski
International Conference on Learning Representations (ICLR), 2026
Benchmark for temporal reasoning in LLMs, with scalable task generation via LLM agents and item response theory.

AQuA: A Benchmarking Tool for Label Quality Assessment
Mononito Goswami, Vedant Sanil, Arjun Choudhry, Arvind Srinivasan, Chalisa Udompanyawit, Artur Dubrawski
Neural Information Processing Systems (NeurIPS), 2023 Datasets and Benchmarks Track
A comprehensive benchmarking tool for evaluating label error detection methods across diverse datasets and annotation types.

Unsupervised Model Selection for Time-series Anomaly Detection (Spotlight)
Mononito Goswami, Cristian Challu, Laurent Callot, Lenon Minorics, Andrey Kan
International Conference on Learning Representations (ICLR), 2023
Shows that ensembles of unsupervised heuristics and weak supervision can accurately select anomaly detection models without any ground-truth labels.
