Dear Cambridge AI & Machine Learning Enthusiast,

We’re happy to announce the next event in our AI & Pizza talk series! Mark your calendars for an insightful evening at 5:15 pm on Thursday, November 30th, 2023. It’s your chance to savour the latest advances in AI and machine learning, right here in Cambridge.

P.S. We are always looking for speakers from related fields. Please do reach out to us (chaoma@microsoft.com, wenbogong@microsoft.com) if you are interested!

Location: The Auditorium, 21 Station Rd 

Date: Thursday, November 30th, 2023

Agenda:

5:15 pm – 5:45 pm: Talks

5:45 pm – 7:00 pm: Networking, Pizza, and Refreshments

Title: Taylorformer: Probabilistic Modelling for Random Processes including Time Series

Speakers: Raghul Parthipan and Omer Nivron, University of Cambridge and British Antarctic Survey

Time: 5:15 pm – 5:30 pm

Abstract:

What architectures and inductive biases should we use when modelling random processes such as time series? We propose the Taylorformer. Its two key components are: 1) the LocalTaylor wrapper, which adapts Taylor approximations (used in dynamical systems) for use in neural-network-based probabilistic models, and 2) the MHA-X attention block, which makes predictions in a way inspired by Gaussian Processes. The Taylorformer approximates a consistent stochastic process and provides uncertainty-aware predictions. It outperforms the state of the art in log-likelihood on 5 of 6 classic Neural Process tasks, such as meta-learning 1D functions, and improves MSE by at least 14% on forecasting tasks including electricity, oil temperatures and exchange rates.

Reference: https://openreview.net/pdf?id=JbwpM5rJs5
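
For intuition about the LocalTaylor idea described in the abstract above, here is a minimal, hypothetical sketch in PyTorch. All names (LocalTaylorSketch, residual_net) and the finite-difference slope rule are illustrative assumptions, not the authors' implementation; the point is only that the network predicts a correction around a first-order Taylor estimate anchored at a nearby observed point, rather than predicting the target directly.

```python
# Hypothetical sketch of the LocalTaylor idea; illustrative names, not the
# Taylorformer codebase. The network models only what a first-order Taylor
# approximation (anchored at the nearest observed point) misses.
import torch
import torch.nn as nn

class LocalTaylorSketch(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        # Small MLP predicting a residual on top of the Taylor estimate.
        self.residual_net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x_ctx, y_ctx, x_tgt):
        # x_ctx, y_ctx: [B, C] observed 1D inputs/outputs; x_tgt: [B, T].
        # Sort the context so finite differences pair true neighbours.
        x_s, order = torch.sort(x_ctx, dim=1)
        y_s = torch.gather(y_ctx, 1, order)

        # Nearest context point for each target location.
        idx = (x_tgt.unsqueeze(-1) - x_s.unsqueeze(1)).abs().argmin(dim=-1)  # [B, T]
        x0 = torch.gather(x_s, 1, idx)
        y0 = torch.gather(y_s, 1, idx)

        # Crude derivative estimate at the anchor via finite differences.
        slope = (y_s[:, 1:] - y_s[:, :-1]) / (x_s[:, 1:] - x_s[:, :-1] + 1e-8)
        dy = torch.gather(torch.cat([slope, slope[:, -1:]], dim=1), 1, idx)

        delta = x_tgt - x0
        taylor = y0 + dy * delta                       # first-order Taylor estimate
        feats = torch.stack([delta, y0, dy], dim=-1)   # [B, T, 3]
        return taylor + self.residual_net(feats).squeeze(-1)

model = LocalTaylorSketch()
x_ctx, y_ctx, x_tgt = torch.rand(2, 10), torch.rand(2, 10), torch.rand(2, 5)
print(model(x_ctx, y_ctx, x_tgt).shape)  # torch.Size([2, 5])
```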

Title: Multimodal self-supervised learning for real-world signals – does the key to specialized models lie in language?

Speaker: Dimitris Spathis, Nokia Bell Labs

Time: 5:30 pm – 5:45 pm

Abstract:

The limited availability of labels for machine learning on multimodal data hampers progress in the field. Self-supervised learning (SSL) learns data representations without labels; however, current methods require expensive computations for negative pairs and are designed for single modalities, limiting their versatility. With CroSSL (Cross-modal SSL), we introduce a new approach that masks intermediate embeddings within a contrastive framework, enabling end-to-end cross-modal learning (see the sketch after this abstract). CroSSL outperforms other SSL and supervised techniques while using minimal labeled data. We additionally analyze the impact of different masking ratios and strategies and assess the robustness of the learned representations to missing modalities.
Self-supervision offers promise; however, even large unlabeled biosignal datasets from personal devices or health records are hard to collect, as they are not publicly available. As a result, pre-trained models are limited in size and generalization capability compared to popular foundation models. What if we could use Large Language Models (LLMs) as data-agnostic pre-trained models? LLMs exhibit remarkable generalization but stumble with numerical and temporal data. We discuss tokenization challenges in LLMs and potential solutions to address the critical “modality gap” (illustrated in the short sketch at the end of this announcement).
From CroSSL’s advances in cross-modal learning to addressing LLM challenges, my research envisions a future where models learn to combine multiple “senses” with the knowledge encoded in large models, moving towards more accurate and robust AI.
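
As a rough illustration of the masking-plus-contrastive recipe described in the first paragraph of this abstract, here is a hedged, self-contained sketch. The encoder shapes, the whole-modality masking rule, and the InfoNCE objective are assumptions for illustration; the paper's actual architecture and loss may differ.

```python
# Hypothetical sketch of a CroSSL-style recipe: encode each modality, randomly
# mask some intermediate embeddings, pool what is left, and align two masked
# views of the same sample. Names and the InfoNCE loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedCrossModal(nn.Module):
    def __init__(self, in_dims, dim=128):
        super().__init__()
        # One small encoder per modality (e.g. accelerometer, ECG, ...).
        self.encoders = nn.ModuleList(nn.Linear(d, dim) for d in in_dims)
        self.projector = nn.Linear(dim, dim)

    def embed(self, views, mask_ratio=0.5):
        # views: one [B, d_m] tensor per modality.
        embs = torch.stack([enc(v) for enc, v in zip(self.encoders, views)], dim=1)  # [B, M, dim]
        # Randomly drop whole modality embeddings (the "latent masking" step).
        keep = (torch.rand(embs.shape[:2], device=embs.device) > mask_ratio).float().unsqueeze(-1)
        pooled = (embs * keep).sum(1) / keep.sum(1).clamp(min=1.0)  # mean over kept modalities
        return F.normalize(self.projector(pooled), dim=-1)

def info_nce(z1, z2, tau=0.1):
    # Matching masked views of the same sample are positives; the rest of
    # the batch serves as negatives.
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

model = MaskedCrossModal(in_dims=[32, 16])        # two toy modalities
views = [torch.randn(8, 32), torch.randn(8, 16)]
loss = info_nce(model.embed(views), model.embed(views))  # two random maskings
```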

References:
– Latent Masking for Multimodal Self-supervised Learning in Health Timeseries (WSDM ’24 & ICMLw ’23) https://arxiv.org/abs/2307.16847 
– The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models (GenAI4PC @ Ubicomp ’23) https://arxiv.org/abs/2309.06236
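
To make the tokenization point in the second abstract concrete, here is a small, hedged illustration (plain Python, no particular LLM or tokenizer API assumed) of how the way a numeric series is serialized affects what a subword tokenizer sees.

```python
# Hedged illustration of the "modality gap" discussed above: the way a numeric
# series is written as text changes how a typical subword (BPE) tokenizer will
# chunk it. Plain Python; no specific LLM or tokenizer API is assumed.
series = [72.53, 72.61, 72.48]

# Naive serialization: BPE vocabularies tend to merge digits into arbitrary
# multi-digit tokens (e.g. "72", ".5", "3"), so the same decimal place can
# land in different tokens at different timesteps.
naive = ", ".join(str(v) for v in series)

# Digit-spaced serialization, one workaround explored in the LLM-forecasting
# literature: spacing the digits forces a stable one-digit-per-token mapping.
spaced = " , ".join(" ".join(f"{v:.2f}") for v in series)

print(naive)   # 72.53, 72.61, 72.48
print(spaced)  # 7 2 . 5 3 , 7 2 . 6 1 , 7 2 . 4 8
```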