Microsoft AI & Pizza talk - 13 June 2024

Dear Cambridge AI & Machine Learning Enthusiast,

The AI & Pizza talk series rolls on! We’re excited to announce our next AI & Pizza talks event, scheduled for 5:30 pm on Thursday, June 13th, 2024. Save the date to hear about the latest advancements in AI and machine learning, right here in Cambridge.

P.S. We are constantly looking for speakers from related fields. Please do reach out to us (chaoma@microsoft.com; wenbogong@microsoft.com) if you are interested!

Location: The Small Lecture Theatre, 21 Station Rd

Date: Thursday, June 13th, 2024

Agenda:

5:30 pm – 6:00 pm: Talks

6:00 pm – 7:00 pm: Networking, Pizza, and Refreshments

Speaker: Meyer Scetbon (Microsoft Research)

Title: FiP: a Fixed-Point Approach for Causal Generative Modeling

5:30 pm – 5:45 pm

Abstract: Modeling true world data-generating processes lies at the heart of empirical science. Structural Causal Models (SCMs) and their associated Directed Acyclic Graphs (DAGs) provide an increasingly popular answer to such problems by defining the causal generative process that transforms random noise into observations. However, learning them from observational data poses an ill-posed and NP-hard inverse problem in general. In this work, we propose a new and equivalent formalism that does not require DAGs to describe them, viewed as fixed-point problems on the causally ordered variables, and we show three important cases where they can be uniquely recovered given the topological ordering (TO). To the best of our knowledge, we obtain the weakest conditions for their recovery when TO is known. Based on this, we design a two-stage causal generative model that first infers the causal order from observations in a zero-shot manner, thus by-passing the search, and then learns the generative fixed-point SCM on the ordered variables. To infer TOs from observations, we propose to amortize the learning of TOs on generated datasets by sequentially predicting the leaves of graphs seen during training. To learn fixed-point SCMs, we design a transformer-based architecture that exploits a new attention mechanism enabling the modeling of causal structures, and show that this parameterization is consistent with our formalism. Finally, we conduct an extensive evaluation of each method individually, and show that when combined, our model outperforms various baselines on generated out-of-distribution problems.

Bio: Meyer Scetbon is a Researcher at Microsoft Research Cambridge, previously a Research Scientist Intern at Meta AI Paris. He earned his PhD from CREST – ENSAE, Institut Polytechnique de Paris in April 2023, supervised by Marco Cuturi. His research interests include foundation models, particularly their integration of causality with observable world understanding. His doctoral work focused on applying optimal transport to large-scale Machine Learning challenges and related applications.

Speaker: Wanru Zhao (University of Cambridge)

Title: Decentralised Collaborative Large Language Model: Multilingual Applications and Data Quality Control

5:45 pm – 6:00 pm

Abstract: Recent research has highlighted the importance of datasets in scaling large language models (LLMs); however, data is spread across silos and locations in different formats and is hard to find. This talk covers our recent work exploring decentralised collaborative development of LLMs as a solution to data-sharing constraints, which could benefit the broader public in the era of LLMs. Automated data quality control faces unique challenges in collaborative settings where data cannot be directly shared between different silos. To tackle this issue, we propose a novel data quality control technique based on training dynamics to enhance the quality of data from different private domains in collaborative training settings such as model merging.

Bio: Wanru Zhao is a PhD student in the Department of Computer Science and Technology at the University of Cambridge. Her research currently focuses on decentralized collaborative development of modular large language models with cheaply communicable updates and data attribution.
