Microsoft AI & Pizza talk: 25 April 2024

Dear Cambridge AI & Machine Learning Enthusiast,

We’re thrilled to announce our next eagerly anticipated AI & Pizza event! Mark your calendars for an insightful evening at 5:30 pm on Thursday, April 25th, 2024. It’s your chance to savour the latest advances in AI and machine learning, right here in Cambridge. Of course, pizza will be provided after the talks!

Location: The Auditorium, 21 Station Rd

Agenda:

5:30 pm – 6:00 pm: Talks

6:00 pm – 7:00 pm: Networking, Pizza, and Refreshments

Talk 1

Time: 5:30 pm – 5:45 pm

Speaker: Samuel Holt

Title: Extending Large Language Models for Large Code Base Generation and Machine Learning Advances in Treatment Effect Analysis, Continuous-time Control and Symbolic Regression.

Abstract:

This talk covers (1) how large language models (LLMs) can be extended for large code base generation, (2) a novel approach to inferring unbiased treatment effects in longitudinal settings using a closed-form ordinary differential equation (ODE) instead of traditional neural network models, and (3) machine learning (ML) advances in multi-modal transformers that can encode a dataset for symbolic regression, together with ML advances in continuous-time control.

First (1), transformer-based LLMs are constrained by the fixed context window of the underlying transformer architecture, which hinders their ability to produce long and coherent outputs. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long output generation tasks because they either (i) focus only on reading memory and reduce its evolution to the concatenation of new memories, or (ii) use very specialized memories that cannot adapt to other domains. This work presents L2MAC, the first practical LLM-based stored-program automatic computer (von Neumann architecture) framework, an LLM-based multi-agent system for long and consistent output generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which will contain the final and intermediate outputs. Each instruction is executed in turn by a separate LLM agent, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate extensive outputs, bypassing the constraints of the finite context window while producing outputs that fulfill a complex user-specified task. We empirically demonstrate that L2MAC achieves state-of-the-art performance in generating large codebases for system design tasks, including HumanEval, significantly outperforming other coding methods in implementing the detailed user-specified task, and we provide valuable insights into the reasons for this performance gap.

Second (2), inferring unbiased treatment effects has received widespread attention in the machine learning community. In recent years, our community has proposed numerous solutions in standard settings, high-dimensional treatment settings, and even longitudinal settings. While very diverse, these solutions have mostly relied on neural networks for inference and simultaneous correction of assignment bias. New approaches typically build on previous ones by proposing new (or refined) architectures and learning algorithms. However, the end result, a neural-network-based inference machine, remains unchallenged. In this work, we introduce a different type of solution in the longitudinal setting: a closed-form ordinary differential equation (ODE). While we still rely on continuous optimization to learn an ODE, the resulting inference machine is no longer a neural network. Doing so yields several advantages, such as interpretability, support for irregular sampling, and a different set of identification assumptions. Above all, we consider the introduction of a completely new type of solution to be our most important contribution, as it may spark entirely new innovations in treatment effects in general. We facilitate this by formulating our contribution as a framework that can transform any ODE discovery method into a treatment effects method.

Third (3), we propose a novel multi-modal transformer architecture that can encode an entire dataset. It is trained with PPO and fine-tuned at inference time to achieve a new state of the art on the problem of symbolic regression, within a framework called Deep Generative Symbolic Regression. We also propose new state-of-the-art methods for (A) continuous-time control with observation costs and (B) continuous-time control with fixed delays. Both methods are model-based RL frameworks, with (A) using a probabilistic ensemble dynamics model and (B) using a newly proposed Neural Laplace dynamics model. In summary, all these works lay exciting foundations for future research in these areas.

Talk 2

Time: 5:45 pm – 6:00 pm

Speaker: Ira J. S. Shokar

Title: Extending Deep Learning Emulation Across Parameter Regimes for Turbulent Flows

Abstract:

Given the computational expense of simultaneous multi-task learning, we use fine-tuning to generalise a transformer-based network that emulates dynamical systems across a range of parameters, rather than training ab initio for each new parameter. This allows rapid adaptation of the deep learning model, which can then be used across a large range of the parameter space or tailored to a specific regime of study. We demonstrate the model’s ability to capture the relevant behaviour, even at parameter values not seen during training. Applied to an idealised model of atmospheric turbulence, the speed-up provided by the deep learning model over numerical integration makes statistical study of rare events in the physical system computationally feasible.

Best regards,

Wenbo Gong
