Mononito Goswami

Applied Scientist at Amazon · Seattle, WA

I build AI agents that reason reliably over complex real-world data, and I develop the evaluation methodology to know when they are trustworthy enough to deploy. I'm an Applied Scientist at Amazon, where I helped lead the science behind the AWS DevOps Agent, a continually learning system for incident response and software operations.

My research spans foundation models (MOMENT, ICML 2024, 2.5M+ downloads), evaluation science for AI systems (TimeSeriesGym, TimeSeriesExamAgent, AQuA), and agents for high-stakes decision-making in domains from software engineering to healthcare. Foundation models provide the reasoning backbone; rigorous evaluation provides the trust layer — what I think of as the science of agents.

I completed my Ph.D. in Robotics at Carnegie Mellon University (advised by Artur Dubrawski) and received the Robotics Institute Distinguished Dissertation Award. Previously at Google Research and AWS AI Labs.

Selected Publications

Agents

SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization
Shravan Chaudhari, Rahul Thomas Jacob, Mononito Goswami, Jiajun Cao, Shihab Rashid, Christian Bock
ArXiv Preprint, 2025
Dense embedding retrieval for localizing code changes from natural language issue descriptions in large software repositories.

TimeSeriesGym: A Scalable Benchmark for (Time Series) Machine Learning Engineering Agents
Yifu Cai, Xinyu Li, Mononito Goswami, Michał Wiliński, Gus Welter, Artur Dubrawski
ArXiv Preprint, 2025
A benchmarking environment for evaluating ML engineering agents on the full time-series modeling stack, from data exploration and modeling to testing and deployment.

Foundation Models

Chronos-2: From Univariate to Universal Forecasting
Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, George Karypis, Yuyang Wang, Michael Bohlke-Schneider
ArXiv Preprint, 2025
Extends the Chronos forecasting model to handle multivariate, covariate-conditioned, and probabilistic forecasting in a single unified architecture.

Exploring Representations and Interventions in Time Series Foundation Models
Michał Wiliński, Mononito Goswami, Willa Potosnak, Nina Żukowska, Artur Dubrawski
International Conference on Machine Learning (ICML), 2025
First to show that time series foundation models learn interpretable concepts (trends, seasonality) despite self-supervised training, and that these representations can be targeted for intervention.

MOMENT: A Family of Open Time-series Foundation Models
Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, Artur Dubrawski
International Conference on Machine Learning (ICML), 2024
One of the first open-source time series foundation models. 2.5M+ downloads on HuggingFace, 700+ GitHub stars.

Evaluation Science

TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
Małgorzata Gwiazda, Yifu Cai, Mononito Goswami, Arjun Choudhry, Artur Dubrawski
International Conference on Learning Representations (ICLR), 2026
Benchmark for temporal reasoning in LLMs, with scalable task generation via LLM agents and item response theory.

AQuA: A Benchmarking Tool for Label Quality Assessment
Mononito Goswami, Vedant Sanil, Arjun Choudhry, Arvind Srinivasan, Chalisa Udompanyawit, Artur Dubrawski
Neural Information Processing Systems (NeurIPS), 2023 Datasets and Benchmarks Track
A comprehensive benchmarking tool for evaluating label error detection methods across diverse datasets and annotation types.

Unsupervised Model Selection for Time-series Anomaly Detection (Spotlight)
Mononito Goswami, Cristian Challu, Laurent Callot, Lenon Minorics, Andrey Kan
International Conference on Learning Representations (ICLR), 2023
Shows that ensembles of unsupervised heuristics and weak supervision can accurately select anomaly detection models without any ground-truth labels.
