The IAIFI Summer Workshop brings together researchers from across Physics and AI for plenary talks, poster sessions, and networking to promote research at the intersection of Physics and AI.

Registration is now open. The registration fee is $200 and includes a Workshop dinner.

Interested in submitting a contributed talk or poster? Submit here by Monday, June 24, 2024 to be considered. Submissions will be reviewed on a rolling basis.

Register for the Summer Workshop

  • The 2024 Summer Workshop will be held August 12–16, 2024
  • Location: MIT Media Lab
  • Registration deadline: July 31, 2024


Videos of the plenary talks from the 2023 IAIFI Summer Workshop are now available on YouTube.


About

The Institute for Artificial Intelligence and Fundamental Interactions (IAIFI) is enabling physics discoveries and advancing foundational AI through the development of novel AI approaches that incorporate first principles, best practices, and domain knowledge from fundamental physics. The goal of the Workshop is to serve as a meeting place to facilitate advances and connections across this growing interdisciplinary field.

Agenda

Monday, August 12, 2024

9:15–9:30 am ET

Welcome

9:30–10:15 am ET

10,000 Einsteins: AI and the future of theoretical physics

Matt Schwartz, Harvard/IAIFI

Abstract: AI has already proved revolutionary in many areas of physics, particularly those focused on data analysis. However, machines are also advancing rapidly in symbolic tasks. As much of what is done in theoretical physics is symbolic, there is tremendous potential for machines to transition from data analysis to formal theoretical work. This talk will discuss some initial progress in this direction and a vision for how machines and humans might collaborate in the future to solve some of the most challenging problems in fundamental physics.

10:15–11:00 am ET

Dynamic Models from Data

Nathan Kutz, University of Washington

Abstract: Physics-based models and governing equations dominate science and engineering practice. The advent of scientific computing has transformed every discipline, as complex, high-dimensional, nonlinear systems can be simulated using numerical integration schemes whose accuracy and stability can be controlled. With the advent of machine learning, a new paradigm has emerged in computing whereby we can build models directly from data. In this talk, strategies for integrating the advantages of both traditional scientific computing and emerging machine learning techniques are discussed. Using domain knowledge and physics-informed principles, new paradigms are available to aid in engineering understanding, design, and control.
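
For readers new to data-driven model discovery, the idea can be made concrete with a sparse-regression sketch in the spirit of SINDy (sparse identification of nonlinear dynamics). The oscillator, candidate library, and threshold below are illustrative choices, not taken from the talk:

```python
# Sparse identification of dynamics from data, in the spirit of SINDy:
# regress measured derivatives onto a library of candidate terms and
# iteratively threshold small coefficients to recover a parsimonious model.
import numpy as np

# Synthetic data from a damped oscillator: x' = y, y' = -x - 0.1*y
dt, T = 0.01, 10.0
t = np.arange(0, T, dt)
X = np.zeros((len(t), 2))
X[0] = [1.0, 0.0]
for k in range(len(t) - 1):
    x, y = X[k]
    X[k + 1] = X[k] + dt * np.array([y, -x - 0.1 * y])

dXdt = np.gradient(X, dt, axis=0)            # numerical derivatives
x, y = X[:, 0], X[:, 1]

# Candidate-term library (an illustrative choice): [1, x, y, x^2, xy, y^2]
Theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

# Sequentially thresholded least squares
Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
for _ in range(10):
    Xi[np.abs(Xi) < 0.05] = 0.0              # prune small coefficients
    for j in range(2):                        # refit the surviving terms
        big = np.abs(Xi[:, j]) >= 0.05
        Xi[big, j] = np.linalg.lstsq(Theta[:, big], dXdt[:, j], rcond=None)[0]

print(np.round(Xi, 3))  # should recover x' ~ y and y' ~ -x - 0.1*y
```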

11:00–11:30 am ET

Break

11:30 am–12:00 pm ET

Accurate, efficient, and reliable learning of deep neural operators for multiphysics and multiscale problems

Lu Lu, Yale University

Abstract: It is widely known that neural networks (NNs) are universal approximators of functions. However, a lesser-known but powerful result is that an NN can accurately approximate any nonlinear operator. This universal approximation theorem of operators is suggestive of the potential of deep neural networks (DNNs) in learning operators of complex systems. In this talk, I will present the deep operator network (DeepONet) to learn various operators that represent deterministic and stochastic differential equations. I will also present several extensions of DeepONet, such as DeepM&Mnet for multiphysics problems, DeepONet with proper orthogonal decomposition or Fourier decoder layers, MIONet for multiple-input operators, and multifidelity DeepONet. I will demonstrate the effectiveness of DeepONet and its extensions on diverse multiphysics and multiscale problems, such as bubble growth dynamics, high-speed boundary layers, electroconvection, hypersonics, geological carbon sequestration, and full waveform inversion. Deep learning models are usually limited to interpolation scenarios, and I will quantify the extrapolation complexity and develop a complete workflow to address the challenge of extrapolation for deep neural operators.
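
The branch-trunk construction at the heart of DeepONet is compact enough to sketch. The toy version below assumes a scalar output and pointwise queries; the layer sizes and the antiderivative example are placeholders, not the speaker's setup:

```python
# Minimal DeepONet-style operator learner: a branch net encodes the input
# function u sampled at m fixed sensors, a trunk net encodes the query
# point y, and the output G(u)(y) is their inner product plus a bias.
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, m_sensors: int, p: int = 64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(m_sensors, 128), nn.Tanh(), nn.Linear(128, p)
        )
        self.trunk = nn.Sequential(
            nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, p)
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u: (batch, m_sensors) function samples; y: (batch, 1) query points
        b = self.branch(u)                    # (batch, p)
        t = self.trunk(y)                     # (batch, p)
        return (b * t).sum(dim=-1, keepdim=True) + self.bias

# Toy usage: targets would come from an operator such as the antiderivative
m = 100
xs = torch.linspace(0, 1, m)
model = DeepONet(m)
u = torch.sin(2 * torch.pi * xs).repeat(8, 1)    # batch of input functions
y = torch.rand(8, 1)                             # random query locations
print(model(u, y).shape)                         # -> torch.Size([8, 1])
```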

12:15–1:30 pm ET

Lunch

1:30–3:00 pm ET

Contributed Talks Session A - Representation/Manifold Learning

Symmetries and neural tangent kernels: using physical principles to understand deep learning, Jan Gerken (Chalmers University of Technology) Despite its extraordinary success in applications, a thorough theoretical understanding of deep learning is still lacking, making progress depend largely on costly trial-and-error procedures. At the same time, theoretical physics has a long history of developing deep mathematical understanding of complex systems. In this talk, I will present some recent work on how techniques from theoretical physics can be used to deepen our understanding of deep learning and lead to practically relevant insights. In particular, symmetries, which are an established cornerstone of theoretical physics, have reached widespread popularity as a guiding principle in deep learning as well. In machine learning, symmetries feature most prominently in the form of data augmentation and equivariant neural networks. At the same time, neural tangent kernels, which are closely related to statistical field theory, have emerged as a powerful tool to understand neural networks both at initialization and during training. Combining these paradigms leads to practically relevant statements in deep learning. Furthermore, it opens the door towards further deepening the connection between theoretical physics and our understanding of neural networks.
Approximately-symmetric neural networks for quantum spin liquids, Dominik Kufel (Harvard University) We propose and analyze a family of approximately-symmetric neural networks for quantum spin liquid problems. These tailored architectures are parameter-efficient, scalable, and significantly outperform existing symmetry-unaware neural network architectures. Utilizing the mixed-field toric code model, we demonstrate that our approach is competitive with the state-of-the-art tensor network and quantum Monte Carlo methods. Moreover, at the largest system sizes (N=480), our method allows us to explore Hamiltonians with sign problems beyond the reach of both quantum Monte Carlo and finite-size matrix-product states. The network comprises an exactly symmetric block following a non-symmetric block, which we argue learns a transformation of the ground state analogous to quasiadiabatic continuation. Our work paves the way toward investigating quantum spin liquid problems within interpretable neural network architectures.
Interpretable representation learning from Chandra X-ray data, Shivam Raval (Harvard University) We build novel, high-quality data representation vectors (also called embeddings) for X-ray spectral data obtained from the Chandra X-ray observatory, as a means to extract meaningful patterns. We show that these embeddings are interpretable and can be further processed for downstream machine-learning tasks. The data representations are generated using specialized state-of-the-art transformer architectures and state-space models that incorporate the symmetries and correlations in spectral data through a self-supervised or mask-modeling training scheme. The learned representations are powerful and can be utilized for classification and regression tasks to predict various properties of the recorded observation, such as the spectral parameters of the physical models most likely to describe the observations. This is a step towards generating representations of X-ray data that can be used for the classification of serendipitous X-ray sources, most of which remain unclassified, identification of spectral anomalies in high energy datasets, and also as the raw material for future foundation models in astrophysics. This is one of the earliest attempts to construct transformer-based representations in X-ray datasets, and a stepping stone to create learned representations in an energy regime poorly explored so far with machine learning algorithms.
A Neural Net Model for Distillation with Weights Explained, Berfin Simsek (NYU/Flatiron Institute) It is important to understand how large models represent knowledge to make them efficient and safe. We study a toy model of neural nets that exhibits non-linear dynamics and a phase transition. Although the model is complex, it allows finding a family of the so-called "copy-average" critical points of the loss. The gradient flow initialized with random weights consistently converges to one such critical point for networks up to a certain width, which we proved to be optimal among all copy-average points. Moreover, we can explain every neuron of a trained neural network of any width. As the width grows, the network changes the compression strategy and exhibits a phase transition. We close by listing open questions calling for further mathematical analysis and extensions of the model considered here.

Physics-Motivated Optimization

Beyond Closure Models: Estimating Long-term Statistics of Chaotic Systems via Physics-Informed Neural Operators, Chuwei Wang (Caltech) Accurately predicting the long-term behavior of chaotic systems is important in many applications. This requires iterative computations on a dense spatiotemporal grid to account for the unstable nature of chaotic systems, which is expensive and impractical in many real-world scenarios. The alternative approach to such a fully-resolved simulation is using a coarse grid and then correcting its errors through a 'closure model', which approximates the overall information from fine scales not captured in the coarse-grid simulation. Recently, ML approaches have been used for closure modeling, but they typically require a large number of training samples from expensive fully-resolved simulations (FRS). In this work, through the lens of Liouville flow in function spaces, we prove an even more fundamental limitation, viz., the standard approach to learning closure models suffers from a large approximation error for generic problems, no matter how large the model is, and it stems from the non-uniqueness of the mapping. We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation by not using a closure model or a coarse-grid solver. We first train the PINO model on data from a coarse-grid solver and then fine-tune it with (a small amount of) FRS and physics-based losses on a fine grid. The discretization-free nature of neural operators means that they do not suffer from the restriction of a coarse grid that closure models face, and they can provably approximate the long-term statistics of chaotic systems. In our experiments on fluid dynamics, our PINO model achieves a 120x speedup compared to FRS with a relative error of ~5%. In contrast, the closure model coupled with a coarse-grid solver is 58x slower than PINO while having a much higher error of 205% when the closure model is trained on the same FRS dataset.
Determining Heterogeneous Elastic Properties of Soft Materials using Physics-Informed Neural Networks, Wensi Wu (Children's Hospital of Philadelphia) The heterogeneous mechanical properties found in biological materials have profound implications for both engineering and medical applications. Within the engineering community, these properties are frequently studied to guide the design of mechanical devices such as artificial organs and soft robots. Concurrently, in the medical field, the mechanical properties of tissues play a crucial role in providing diagnostic information about various diseases and conditions. The significance of material mechanical properties across these diverse domains has driven a need to better understand the underlying mechanisms governing the microscopic properties of biological tissues and their associated functions, whether for improving material designs or disease diagnosis. In traditional engineering, identifying unknown material parameters requires iterative inverse finite element analyses and optimization of the constitutive parameters until the finite element model achieves an acceptable level of mechanical response, aligning with experimental data. While this method is efficient with homogeneous materials, optimizing the elasticity map of heterogeneous materials is challenging. In this work, we propose using physics-informed neural networks (PINNs) to identify the full-field elastic properties of highly nonlinear, hyperelastic materials. We applied our improved PINNs to six structurally complex materials and three constitutive material models (Neo-Hookean, Mooney-Rivlin, and Gent) to evaluate the accuracy of full-field elasticity maps estimated by PINNs. Our PINN model consistently produced highly accurate estimates of the full-field elastic properties, even when there was up to 10% noise present in the training data.
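
As background for the physics-informed losses used in both talks above, the basic PINN recipe is easy to sketch: a network represents the solution field, and a PDE residual built with automatic differentiation is minimized alongside boundary terms. The toy problem below (a 1D Poisson equation, not the elasticity or closure models discussed above) illustrates the mechanics:

```python
# Toy physics-informed neural network: fit u(x) so that u''(x) = -sin(pi x)
# on (0, 1) with u(0) = u(1) = 0, using autograd for the PDE residual.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 32),
                    nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)        # collocation points
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    pde_loss = ((d2u + torch.sin(torch.pi * x)) ** 2).mean()

    xb = torch.tensor([[0.0], [1.0]])                 # boundary points
    bc_loss = (net(xb) ** 2).mean()

    loss = pde_loss + 10.0 * bc_loss                  # weighted total loss
    opt.zero_grad(); loss.backward(); opt.step()

# Exact solution is u(x) = sin(pi x) / pi^2; check the midpoint value:
print(net(torch.tensor([[0.5]])).item(), 1 / torch.pi**2)
```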

Contributed Talks Session B - Generative Models

Machine learning phase transitions: A probabilistic perspective, Julian Arnold (University of Basel) The identification of phase transitions and the classification of different phases of matter from data are among the most popular applications of machine learning in physics. Neural network (NN)-based approaches have proven to be particularly powerful due to the ability of NNs to learn arbitrary functions. Many such approaches work by computing indicators of phase transitions from the output of NNs trained to solve specific classification problems. In this talk, I will derive the optimal solutions to these classification problems given by Bayes classifiers that take into account the probability distributions underlying the physical system under consideration [1]. This probabilistic viewpoint allows us to gain a deeper understanding of previous NN-based studies, highlighting the strengths and weaknesses of individual methods [1], enables us to root the methods in information theory [2], yields more efficient numerical routines based on the incorporation of readily available generative models [3], and widens the application domain of these methods to systems outside physics (such as diffusion models or transformers) [4,5]. [1] J. Arnold and F. Schäfer, PRX 12, 031044 (2022) [2] J. Arnold et al., arXiv:2311.10710 (2023) [3] J. Arnold et al., PRL 132, 207301 (2024) [4] J. Arnold et al., arXiv:2311.09128 (2023) [5] J. Arnold et al., arXiv:2405.17088 (2024)
Accelerating Molecular Discovery with Machine Learning, Yuanqi Du (Cornell University) Recent advancements in machine learning have paved the way for groundbreaking opportunities in the realm of molecular discovery. At the forefront of this evolution are improved computational tools with proper inductive biases and efficient optimization. In this talk, I will delve into our efforts around these themes from a geometry, sampling and optimization perspective. I will first introduce how to encode symmetries in the design of neural networks and the balance of expressiveness and computational efficiency. Next, I will discuss how generative models enable a wide range of design and optimization tasks in molecular discovery. In the third part, I will talk about how the advancements in stochastic optimal control, sampling and optimal transport can be applied to find transition states in chemical reactions.
Understanding Diffusion Models by Feynman's Path Integral, Yuji Hirono (Osaka University) Score-based diffusion models have proven effective in image generation and have gained widespread usage. We introduce a novel formulation of diffusion models using Feynman's path integral [1]. We find that this formulation provides a comprehensive description of score-based generative models, and demonstrate the derivation of backward stochastic differential equations and loss functions. The formulation accommodates an interpolating parameter connecting stochastic and deterministic sampling schemes, and we identify this parameter as a counterpart of Planck's constant in quantum physics. This analogy enables us to apply the Wentzel-Kramers-Brillouin (WKB) expansion, a well-established technique in quantum physics, for evaluating the negative log-likelihood to assess the performance disparity between stochastic and deterministic sampling schemes. Reference: [1] Yuji Hirono, Akinori Tanaka, Kenji Fukushima, accepted in ICML2024 [arXiv:2403.11262].
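
As background, the standard score-based SDE formulation the talk builds on can be summarized as follows; the one-parameter family in the last equation is the commonly used interpolation between stochastic and deterministic sampling, stated here in generic form rather than the paper's exact path-integral parametrization (the talk's ℏ-like parameter plays an analogous role):

```latex
% Forward noising SDE and its reverse-time counterparts.
\begin{align}
  \mathrm{d}x &= f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w, \\
  % Reverse-time SDE (stochastic sampling):
  \mathrm{d}x &= \left[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\right]\mathrm{d}t
                 + g(t)\,\mathrm{d}\bar{w}, \\
  % Probability-flow ODE (deterministic sampling):
  \mathrm{d}x &= \left[f(x,t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x)\right]\mathrm{d}t.
\end{align}
% A one-parameter family with the same marginals interpolates the two
% (lambda = 1: reverse SDE; lambda = 0: probability-flow ODE):
\begin{equation}
  \mathrm{d}x = \left[f(x,t) - \tfrac{1+\lambda^2}{2}\, g(t)^2\,
    \nabla_x \log p_t(x)\right]\mathrm{d}t + \lambda\, g(t)\,\mathrm{d}\bar{w}.
\end{equation}
```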
Neural Entropy, Akhil Premkumar (University of Chicago) What is the smallest neural network that can do a particular task? To answer this question we need to understand the capacity of neural networks to encode and store information. In the context of generative diffusion models, we show that it is possible to identify the entropy of the network, which characterizes precisely its storage capacity.
GANSky: fast full sky weak lensing simulations using physics-informed GANs, Supranta Sarma Boruah (University of Pennsylvania) Producing accurate weak lensing simulations for future cosmological surveys will be a severe computational bottleneck. We present a method that uses generative adversarial networks (GANs) to produce accurate weak lensing simulations from fast lognormal simulations. This enables us to produce full-sky weak lensing simulations in seconds. Our method (1) is physics-informed and explainable: large scales are described by an analytic lognormal model, and the GAN only learns the local redistribution of matter in these approximate maps; (2) works on the full sky, as required for future wide-field surveys; (3) requires fewer simulations to train; and (4) accurately reproduces the non-Gaussianities of the weak lensing convergence field. This breakthrough enables fast simulation-based or field-based inference with weak lensing data.
Predicting Missing Regions in Charged Particle Tracks Using a Sparse 3D Convolutional Neural Network, Hilary Utaegbulam (University of Rochester) The 2x2 Demonstrator is a prototype detector for the Deep Underground Neutrino Experiment (DUNE)'s Near Detector. Both the 2x2 Demonstrator and the Near Detector itself will have inactive regions wherein there is reduced or no sensitivity to charge deposition and light signals that arise from charged particle interactions with liquid argon. In the 2x2, these inactive regions are positioned between the active detector modules, which introduces the challenge of inferring what charge signals ought to look like in these regions. This study explores the use of a Sparse 3D Convolutional Neural Network (ConvNet) to infer missing regions in charged particle tracks. Hits corresponding to energy depositions are voxelized into a three-dimensional (3D) grid for each track. Inactive regions within the tracks are replaced with a dense, rectangular 3D grid of voxels, ensuring consistent step sizes in the X, Y, and Z directions. Voxels in these dense regions are initialized with an energy value of -1, indicating nonphysical energy or charge. The model is trained to predict which voxels should activate as part of the track and which should not, with the goal of eventually inferring the missing charge or energy values in these voxels. Results indicate that the model accurately predicts track voxels to within ±3 units in the X, Y, or Z directions and effectively identifies non-track voxels, despite some overprediction. The approach shows promise for predicting missing track regions with some accuracy.

3:00–3:30 pm ET

Break

3:30–4:15 pm ET

What Do Language Models Have To Say About Fundamental Physics?

Mariel Pettee, LBNL/Flatiron

Abstract: The launch of ChatGPT in November 2022 ignited an ongoing worldwide conversation about the possible impacts of Large Language Models (LLMs) on the way we work. As scientists, however, the changes in our workflows since the advent of this technology have been relatively minor. Will this still be the case in 10 years? Could an analogous paradigm shift arise from a foundation model trained on a large amount of scientific data, transforming the way we conduct our research? If so, what can we learn from the development of other foundation models, particularly LLMs, in their evolution from specialists to (quasi-)generalists? In this talk, I will present some recent work exploring how language models could help form a foundation model of fundamental physics. I'll also share my perspective on how we should strive to shape such models to reflect our highest priorities as scientists.

4:15–5:00 pm ET

Solving the nuclear many-body problem with neural quantum states

Alessandro Lovato, Argonne National Laboratory

Abstract: Artificial neural networks can be employed to accurately and compactly represent quantum many-body states relevant to many applications, including nuclear physics, quantum chemistry, and condensed matter problems. I will argue that a variational Monte Carlo algorithm based on neural-network quantum states provides a systematically improvable solution to the nuclear Schrödinger equation with a polynomial cost in the number of nucleons. After presenting recent progress in describing atomic nuclei, neutron-star matter, and hypernuclei, I will illustrate an application to condensed-matter systems, specifically ultra-cold Fermi gases near the unitary limit. Detailed benchmarks with continuum Quantum Monte Carlo methods will be presented.
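
For readers unfamiliar with variational Monte Carlo, a deliberately tiny version of the loop is sketched below, with a one-parameter Gaussian ansatz standing in for the neural-network quantum state and a 1D harmonic oscillator in place of the nuclear Hamiltonian; all numbers are illustrative:

```python
# Minimal variational Monte Carlo: Metropolis-sample |psi|^2, estimate the
# energy from local energies, and follow the standard log-derivative
# gradient  dE/da = 2 <(E_loc - <E_loc>) dlog(psi)/da>.
# Ansatz psi_a(x) = exp(-a x^2); H = -1/2 d^2/dx^2 + x^2/2 (exact E0 = 0.5).
import numpy as np

rng = np.random.default_rng(1)

def local_energy(x, a):
    # E_loc = -(1/2) psi''/psi + x^2/2  for psi = exp(-a x^2)
    return a - 2 * a**2 * x**2 + 0.5 * x**2

def dlogpsi(x, a):           # d log(psi) / da
    return -x**2

a = 1.2                      # start away from the optimum a = 0.5
for it in range(200):
    x, samples = 0.0, []     # Metropolis sampling of |psi_a|^2
    for step in range(5000):
        xp = x + rng.normal(0, 0.5)
        if rng.random() < np.exp(-2 * a * (xp**2 - x**2)):
            x = xp
        if step > 500:       # discard burn-in
            samples.append(x)
    xs = np.array(samples)

    e = local_energy(xs, a)
    grad = 2 * np.mean((e - e.mean()) * dlogpsi(xs, a))
    a -= 0.5 * grad          # plain gradient descent on the energy

print(a, e.mean())           # converges toward a = 0.5, E = 0.5
```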

5:00–7:00 pm ET

Poster Session

  • Data Compression and Inference in Cosmology with Self-Supervised Machine Learning, Aizhan Akhmetzhanova (Harvard University)
  • CNN and Transformer architecture for jets events classification, Juvenal Bassa (University of Puerto Rico - Mayaguez)
  • Data-Driven Discovery of X-ray Transients with Machine Learning, Steven Dillmann (University of Cambridge)
  • Sampling Transition Dynamics with Machine Learning Approaches, Yuanqi Du (Cornell University)
  • Multi-Modal Generalized Class Discovery for Scalable Autonomous All-Sky Surveys, Sriram Elango (Harvard University)
  • Optimizing Self-Assembly Yields of Chain-like Structures: A Computational Approach using Implicit Differentiation, Livia Guttieres (Harvard University)
  • Inverse Design of Complex Fluids with Fully-Differentiable Lagrangian Particle Dynamics, Kaylie Hausknecht (Harvard University and MIT)
  • Perfect Jet Classification Through Equivariant Regression, Timothy Hoffman (University of Chicago)
  • Flow-Based Generative Emulation of Grids of Stellar Evolutionary Models, Marc Hon (MIT Kavli Institute for Astrophysics and Space Research)
  • Enhancing Cosmological Simulations with Efficient and Interpretable Machine Learning in the Wavelet Basis, Cooper Jacobus (UC Berkeley: Dept. Astrophysics, Lawrence Berkeley National Lab: Computational Cosmology Center)
  • Training neural operators to preserve invariant measures of chaotic attractors, Ruoxi Jiang (University of Chicago)
  • Hidden Giants: Redefining QSO Classification and Outlier Detection with Redshift Invariant Autoencoders, Thaddaeus Kiker (Columbia University)
  • KAN: Kolmogorov-Arnold Networks, Ziming Liu (MIT, IAIFI)
  • Phase Transitions in the Output Distribution of Large Language Models, Niels Loerch (University of Basel)
  • Tackling reasoning problems with AI, Rishabh Mallik (Forschungszentrum Jülich)
  • Recurrent Features of Amplitudes in Planar N = 4 Super Yang-Mills Theory, Garrett Merz (University of Wisconsin-Madison)
  • Ultrafast Jet Classification using Geometric Learning, Patrick Odagiu (ETH Zurich)
  • Deep Stochastic Mechanics, Elena Orlova (The University of Chicago)
  • Differentiable and Distributional Cosmological Stasis, Sneh Pandya (Northeastern / IAIFI)
  • Exploring Astronomical Catalog Crossmatching with Machine Learning, Victor Samuel Perez Diaz (Center for Astrophysics | Harvard & Smithsonian, IAIFI)
  • Towards an AI-enabled astronomy system: natural language processing of Chandra data archive, Shivam Raval (Harvard University)
  • Auto-decoding Poisson Processes for Unsupervised X-ray Sources Learning, Yanke Song (Harvard University, Department of Statistics)
  • Development of photothermal techniques for the detection of cancer biomarkers, Ilhem Soyah (Higher School of Sciences and Technology of Hammam Sousse)
  • Multi-Modal Contrastive Training for Robust VQA, Mitra Tajrobehkar (Vertical Oceans)
  • Zero-Shot Classification of Astronomical Images with Large Multimodal Models, Dimitrios Tanoglidis (University of Pennsylvania)
  • Vertex finding and jet class classification using Wasserstein Neural Network, Diego F. Vasquez Plaza (University of Puerto Rico - Mayagüez)
  • Learning Group Invariant CY Metrics by Fundamental Domain Projections, Moritz Walden (Uppsala University)
  • Accelerating Energy Computation in Many-electron Systems with Forward Laplacian, Chuwei Wang (Caltech)
  • Emulating the Effects of Pile-Up on X-ray Spectra, Justina Yang (Harvard University)
  • Hessian Methods for Periodic Orbits, Leo Yao (MIT)
  • HyperTagging: Reconstruction of Full Decays using Transformers and Hyperbolic Embedding, Boyang Yu (LMU Munich, Germany)
  • Neural scaling laws from large-N field theory, Zhengkang Zhang (University of Utah)
  • Revealing the 3D Cosmic Web with Physics Constrained Neural Fields, Brandon Zhao (Caltech)

6:00–8:00 pm ET

Welcome Reception

Tuesday, August 13, 2024

9:30–10:15 am ET

Trends in AI for particle accelerators

Verena Kain, CERN

Abstract: AI is without doubt radically transforming science, with many successful applications in molecular biology, astrophysics, nuclear physics, and particle physics. It has enabled significant technological advances in robotics, particularly enhancing a system's perception, navigation, manipulation, and interaction abilities. In control, it enables novel and faster learning of tasks, replacing or augmenting classical control techniques for hard problems such as real-time control of the nonlinear plasma dynamics in a fusion reactor's tokamak, or navigating drones with superhuman performance. Given this success and the types of use cases that AI algorithms can solve, accelerator physics and its associated technologies have also picked up on AI in the last 5 to 10 years, with the number of ML applications steadily rising, and with them the number of ML-related papers at the major particle accelerator conferences. This contribution will give a brief overview of the typical use cases for AI in particle accelerators, show recent trends, and describe the potential and vision of AI for particle accelerators, with emphasis on control and optimisation.

10:15–11:00 am ET

An introduction to neural ODEs in scientific machine learning

Patrick Kidger, Cradle.io

Abstract: This is an introduction to neural ODEs for scientific applications. The goal is to (a) provide a modelling tool that enhances the expressivity of existing theory-driven approaches, (b) demonstrate that neural ODEs are easy to use via modern autodifferentiable software, and (c) give enough of the tips and tricks needed to make neural ODEs work in practice!
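
A minimal discretize-then-optimize version of a neural ODE fits in a few lines: an MLP vector field integrated with fixed-step RK4, trained by backpropagating through the solver. Dedicated libraries (such as the speaker's Diffrax for JAX, or torchdiffeq for PyTorch) add adaptive solvers and adjoint methods; this sketch, with placeholder data generated from a known linear system, only shows the moving parts:

```python
# Toy neural ODE: dy/dt = f_theta(y), integrated with fixed-step RK4 and
# trained by backpropagating through the solver (discretize-then-optimize).
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))

def odeint_rk4(func, y0, t0, t1, n_steps=40):
    h = (t1 - t0) / n_steps
    y = y0
    for _ in range(n_steps):
        k1 = func(y)
        k2 = func(y + 0.5 * h * k1)
        k3 = func(y + 0.5 * h * k2)
        k4 = func(y + h * k3)
        y = y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

# Fit the time-1 flow map of a known linear system y' = A y (stand-in data)
A = torch.tensor([[0.0, 1.0], [-1.0, -0.1]])
y0 = torch.randn(256, 2)
y1_true = odeint_rk4(lambda y: y @ A.T, y0, 0.0, 1.0)

opt = torch.optim.Adam(f.parameters(), lr=1e-3)
for step in range(500):
    loss = ((odeint_rk4(f, y0, 0.0, 1.0) - y1_true) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```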

11:00–11:30 am ET

Break

11:30 am–12:00 pm ET

Details to come

Rose Yu, UCSD

Abstract: Despite the success of equivariant neural networks in scientific applications, they require knowing the symmetry group a priori. In practice, however, it may be difficult to know which symmetry to use as an inductive bias, and enforcing the wrong symmetry can even hurt performance. In this talk, I will discuss our effort in developing a deep learning framework that can automatically discover symmetries from data. Our framework, LieGAN, represents symmetry as an interpretable Lie algebra basis and uses a paradigm akin to generative adversarial training. We further generalized it to LaLieGAN to discover non-linear symmetries from high-dimensional data. Empirically, the learned symmetry can be readily used in existing equivariant neural networks to improve accuracy and generalization in prediction. It can also improve equation discovery and long-term forecasting for various dynamical systems.

12:15–1:30 pm ET

Lunch

1:30–3:00 pm ET

Contributed Talks Session A - Foundational ML

Diversity with Similarity as a Measure of Dataset Quality, Josiah Couch (Beth Israel Deaconess Medical Center) Dataset size and class balance are important measures in deep learning. Maximizing them is seen as a way to ensure that datasets contain diverse images, which models are thought to need in order to generalize well. Yet neither size nor class balance measure image diversity directly, raising the possibility that better measures of dataset quality might exist. To test this hypothesis, we turned to a comprehensive framework of diversity measures that generalizes familiar quantities like Shannon entropy by accounting for the similarities and differences among images. (Size and class balance emerge from this framework as special cases.) We created several thousand diverse datasets by subsampling a variety of large medical-image datasets representing a range of imaging modalities, trained classifiers on these subsets, and calculated the correlation between subset diversity and model accuracy using diversity measures from the framework.
RG flow of the NTK dynamics at finite-width from Feynman diagrams, Max Guillen (Chalmers University of Technology) Deep Learning is nowadays a well-established method for different applications in science and technology. However, it has been unclear for a long time how the "learning process" actually occurs in different architectures, and how this knowledge could be used to optimize performance and efficiency. Recently, high-energy-physics-based ideas have been applied to the modelling of Deep Learning, thus translating the learning problem to an RG flow analysis in Quantum Field Theory (QFT). In this talk, I will explain how these quite complicated formulae describing such RG flows for different observables in neural networks at initialization can be easily obtained from a few rules resembling Feynman rules in QFT. I will also comment on some work in progress which implements such rules for computing higher-order corrections to the frozen (infinite-width) NTK for particular activation functions, and how they evolve after a few steps of SGD.
Supervised learning of infinitely-overparameterized DNNs through the lens of Wilsonian RG, Anindita Maiti (Perimeter Institute) The key to the performance of ML algorithms is an ability to segregate relevant features in input datasets from the irrelevant ones. In a setup where data features play the role of an energy scale, we develop a Wilsonian RG framework to integrate out unlearnable modes associated with the Neural Network Gaussian Process (NNGP) kernel, in the regression context. Such a framework in the case of Gaussian features leads to a universal flow of the ridge parameter, whereas, non-Gaussianities in data features result in rich input-dependent RG flows. This framework goes beyond the usual analogies between RG flows and learning dynamics, and offers potential improvements to our understanding of feature learning and universality classes of models.
Input Space Mode Connectivity in Deep Neural Networks, Jakub Vrabel (CEITEC, Brno University of Technology) We extend the concept of loss landscape mode connectivity to the input space of deep neural networks. Mode connectivity was originally studied within parameter space, where it describes the existence of low-loss paths between different solutions (loss minimizers) obtained through gradient descent. We present theoretical and empirical evidence of its presence in the input space of deep networks, thereby highlighting the broader nature of the phenomenon. We observe that different input images with similar predictions are generally connected, and for trained models, the path tends to be simple, with only a small deviation from being a linear path. Our methodology utilizes real, interpolated, and synthetic inputs created using the input optimization technique for feature visualization. To prove the existence of general mode connectivity in high-dimensional input spaces, we employ percolation theory. We argue that the approximate linear mode connectivity post-training is a manifestation of some implicit bias. We exploit mode connectivity to obtain new insights about adversarial examples and demonstrate its potential for adversarial detection. Additionally, we discuss applications for the interpretability of deep networks.
Neural scaling laws from large-N field theory, Zhengkang Zhang (University of Utah) Many machine learning models based on neural networks exhibit scaling laws: their performance scales as power laws with respect to the sizes of the model and training data set. We use large-N field theory methods to solve a model recently proposed by Maloney, Roberts and Sully which provides a simplified setting to study neural scaling laws. Our solution extends the result in this latter paper to general nonzero values of the ridge parameter, which are essential to regularize the behavior of the model. In addition to obtaining new and more precise scaling laws, we also uncover a duality transformation at the level of diagrams which explains the symmetry between model and training data set sizes. The same duality underlies recent efforts to design neural networks to simulate quantum field theories.
Fourier-enhanced deep operator network for geophysics with improved accuracy, efficiency, and generalizability, Min Zhu (Yale University) Full waveform inversion (FWI) and geologic carbon sequestration (GCS) are two significant topics in geophysics. FWI infers subsurface structure information from seismic waveform data by solving a non-convex optimization problem. On the other hand, solving multiphase flow in porous media is essential for CO2 migration and pressure fields in the subsurface associated with GCS. However, numerical simulations for both FWI and GCS are computationally challenging and expensive due to the highly nonlinear governing partial differential equations (PDEs). Here, we develop a Fourier-enhanced deep operator network (Fourier-DeepONet) to address this issue. For FWI, compared with existing data-driven FWI methods, Fourier-DeepONet achieves more accurate predictions of subsurface structures across a wide range of source parameters. Additionally, Fourier-DeepONet demonstrates superior robustness when handling data with Gaussian noise or missing traces. For GCS, compared to the state-of-the-art Fourier neural operator (FNO), Fourier-DeepONet offers superior computational efficiency, with 90% fewer unknown parameters, significantly reduced training time (approximately 3.5 times faster), and much lower GPU memory requirements (less than 35%). Furthermore, Fourier-DeepONet maintains good accuracy when predicting out-of-distribution (OOD) data. This excellent generalizability is enabled by its adherence to the physical principle that the solution to a PDE is continuous over time.

Contributed Talks Session B - Physics-Motivated Optimization

Search for new physics using event-based anomaly detection at the ATLAS detector of CERN, Wasikul Islam (University of Wisconsin-Madison) Searches for new resonances in two-body invariant mass distributions are performed using an unsupervised anomaly detection technique in events produced in proton-proton collisions at a center-of-mass energy of 13 TeV recorded by the ATLAS detector at the LHC. Studies are conducted in data containing at least one isolated lepton. An autoencoder network is trained with 1% randomly selected collision events and anomalous regions are then defined which contain events with high reconstruction losses from the decoder. Nine invariant mass distributions are inspected which contain pairs of one light jet (or one b-jet) and one lepton, photon, or a second light jet (b-jet). The 95% confidence level upper limits on contributions from generic Gaussian signals are reported for the studied invariant mass distributions. The obtained model-independent limits show strong potential to exclude generic heavy states with complex decays.
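
The anomaly-detection mechanism described above (train an autoencoder on mostly-background events, then flag events in the high-reconstruction-loss tail) looks schematically like this; the feature count, architecture, and threshold are placeholders, not the ATLAS analysis choices:

```python
# Schematic autoencoder anomaly detection: train on (mostly background)
# events, then flag events whose reconstruction error is in the upper tail.
import torch
import torch.nn as nn

n_features = 20                      # e.g. kinematic features per event
ae = nn.Sequential(                  # encoder -> bottleneck -> decoder
    nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 4),
    nn.ReLU(), nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, n_features),
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

train = torch.randn(10000, n_features)          # stand-in for collision data
for epoch in range(20):
    for batch in train.split(256):
        loss = ((ae(batch) - batch) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

# Score new events; the anomalous region is the high-loss tail.
with torch.no_grad():
    new_events = torch.randn(1000, n_features)
    scores = ((ae(new_events) - new_events) ** 2).mean(dim=1)
    threshold = scores.quantile(0.99)
    print("anomalous events:", (scores > threshold).sum().item())
```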
PolarBERT: a Foundation Model for Neutrino Telescope Data, Inar Timiryasov (Niels Bohr Institute, Copenhagen University) Neutrinos are elusive particles that require massive detectors for observation. The IceCube neutrino observatory at the South Pole is a cubic kilometer of Antarctic ice, instrumented with 5,160 digital optical modules. Its results play an essential role in both particle physics and astrophysics. Deep learning methods, such as graph neural networks, have been successfully applied to the steady stream of data that IceCube receives. In this talk, we will present a foundation model for the IceCube data. It is trained in a self-supervised way without any data labeling. We further fine-tune this pretrained model for the downstream task of directional reconstruction of neutrino events. We show that this pretrained model significantly outperforms models trained from scratch. Remarkably, the foundation model does not require any knowledge of the IceCube detector geometry or characteristics of its electronics, since it extracts all the necessary information from the raw data.
Expediting Astronomical Discovery with Large Language Models: Progress, Challenges, and Future Directions, Yuan-Sen Ting (Ohio State University) The vast and interdisciplinary nature of astronomy, coupled with its open-access ethos, makes it an ideal testbed for exploring the potential of Large Language Models (LLMs) in automating and accelerating scientific discovery. In this talk, we present our recent progress in applying LLMs to tackle real-life astronomy problems. We demonstrate the ability of LLM agents to perform end-to-end research tasks, from data fitting and analysis to iterative strategy improvement and outlier detection, mimicking human intuition and deep literature understanding. However, the cost-effectiveness of closed-source solutions remains a challenge for large-scale applications involving billions of sources. To address this issue, we introduce our ongoing work at AstroMLab on training lightweight, open-source specialized models and our effort to benchmark these models with carefully curated astronomy benchmark datasets. We will also discuss our effort to construct the first LLM-based knowledge graph in astronomy, leveraging citation-reference relations. The open-source specialized LLMs and knowledge graph are expected to guide more efficient strategy searches in autonomous research pipelines. While many challenges lie ahead, we explore the immense potential of scaling up automated inference in astronomy, revolutionizing the way astronomical research is conducted, ultimately accelerating scientific breakthroughs and deepening our understanding of the Universe.
Marginalize, Don't Subtract: Spectral Component Separation for Faint Objects in DESI, Ana Sofia Uzsoy (Harvard University) Component separation is a critical step in disentangling multiple signals and in extracting useful information from spectra. In this talk, I present MADGICS (Marginalized Analytic Dataspace Gaussian Inference for Component Separation), a data-driven Bayesian component separation technique that can separate a spectrum into any number of Gaussian-distributed components. I then discuss the application of this technique for automatically determining redshifts for Lyman Alpha Emitter (LAE) galaxies observed with DESI while marginalizing over sky residuals to separate sky from target emission lines. We create a covariance matrix from visually inspected DESI LAE targets to provide physically motivated priors, and determine redshift by jointly inferring sky, LAE, and residual components for each individual spectrum. This component separation technique will allow us to create a high-quality catalog of LAE spectra and redshifts from DESI data and is also broadly generalizable to other spectral features of interest.
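
The linear-Gaussian core of a MADGICS-style separation fits in a few lines: with data modeled as a sum of independent Gaussian components, each component's posterior mean follows from a single joint solve (a Wiener filter). The covariances below are toy placeholders, not DESI priors:

```python
# Gaussian component separation in one solve: observe d = sum_i x_i with
# independent components x_i ~ N(0, C_i); the posterior mean of each
# component is  x_i_hat = C_i @ inv(sum_j C_j) @ d.
import numpy as np

rng = np.random.default_rng(0)
n = 200
lam = np.arange(n)

def rbf_cov(scale, amp):             # placeholder smooth prior covariance
    cov = amp * np.exp(-0.5 * ((lam[:, None] - lam[None, :]) / scale) ** 2)
    return cov + 1e-8 * np.eye(n)    # jitter for numerical stability

C_sky = rbf_cov(scale=2.0, amp=1.0)        # "sky": short correlation length
C_obj = rbf_cov(scale=15.0, amp=0.5)       # "target": broader features
C_noise = 0.05 * np.eye(n)

# Draw a truth and the combined observation
x_sky = rng.multivariate_normal(np.zeros(n), C_sky)
x_obj = rng.multivariate_normal(np.zeros(n), C_obj)
d = x_sky + x_obj + rng.multivariate_normal(np.zeros(n), C_noise)

C_tot = C_sky + C_obj + C_noise
x_obj_hat = C_obj @ np.linalg.solve(C_tot, d)   # marginalizes over the rest

print(np.corrcoef(x_obj, x_obj_hat)[0, 1])      # recovery quality
```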
Hessian Methods for Periodic Orbits, Leo Yao (MIT) We discuss Hessian-based methods for analyzing periodic orbits in chaotic systems. Using a loss parametrization on the variational method of finding orbits, we use Hessian eigendecompositions to obtain solutions in the vicinity of unstable fixed points, to propagate along continuous families of periodic solutions, and to determine bifurcation points in the periodic orbit spectrum. We demonstrate applications to the PCR3BP and the double pendulum, including full collections of orbits between pairs of fixed points.
Revealing the 3D Cosmic Web with Physics Constrained Neural Fields, Brandon Zhao (Caltech) Weak gravitational lensing is the slight distortion of galaxy shapes by the gravitational effect of the large-scale structure. In our work, we seek to invert the weak lensing signal found in 2D telescope images to obtain a 3D reconstruction of the universe’s dark matter field. While typically this inversion is done in 2D to obtain a projection of the dark matter field, accurate 3D maps of the dark matter distribution are particularly useful as they allow us to detect and localize structures of interest such as galaxy clusters, as well as disambiguate them from intervening matter along the line of sight. This inversion is ill-posed for several reasons. First, images are only observed from a single viewing angle, which must be inverted into a 3D mass distribution. Second, the exact locations and shapes of unlensed galaxies are in general unknown, and can only be estimated with a degree of uncertainty. This introduces a large amount of noise to our measurement of the lensing signal. We propose a novel methodology using a physics-constrained, coordinate-based neural field to model the underlying continuous matter distribution. We take an analysis-by-synthesis approach, optimizing the weights of the neural network through a fully differentiable physical forward model to reproduce the lensing signal present in image measurements. We showcase reconstruction results on simulated measurements of dark matter distributions from a low resolution N-Body particle simulation, and compare our approach with earlier 3D inversion methods.

3:00–3:30 pm ET

Break

3:30–4:15 pm ET

KAN: Kolmogorov-Arnold Networks

Ziming Liu, MIT/IAIFI

Abstract: Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.
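
The "learnable activation functions on edges" idea can be captured in a toy layer. The version below substitutes a Gaussian radial-basis expansion for the paper's B-spline parametrization, which keeps the code short while preserving the structure:

```python
# Toy KAN-style layer: every edge (i -> j) carries its own learnable 1D
# function, here parametrized as a sum of Gaussian radial basis functions
# (the paper uses B-splines; the RBF basis is a simplification).
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    def __init__(self, d_in: int, d_out: int, n_basis: int = 8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-2, 2, n_basis),
                                    requires_grad=False)
        # One coefficient vector per edge: shape (d_out, d_in, n_basis)
        self.coef = nn.Parameter(0.1 * torch.randn(d_out, d_in, n_basis))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in) -> basis expansion (batch, d_in, n_basis)
        phi = torch.exp(-((x[..., None] - self.centers) ** 2))
        # y_j = sum over edges i and basis k of c_{j,i,k} * phi_k(x_i)
        return torch.einsum("bik,oik->bo", phi, self.coef)

model = nn.Sequential(ToyKANLayer(2, 5), ToyKANLayer(5, 1))
x = torch.randn(16, 2)
print(model(x).shape)    # -> torch.Size([16, 1])
```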

4:15–5:00 pm ET

Details to come

Pulkit Agrawal, MIT/IAIFI

Abstract: Details to come

Wednesday, August 14, 2024

9:30–10:15 am ET

Navigating Complex Models: Neural Networks for High-Dimensional Statistical Inference

Christoph Weniger, University of Amsterdam

Abstract: Details to come

10:15–11:00 am ET

Data-Driven High-Dimensional Inverse Problems: A Journey Through Strong Lensing Data Analysis

Laurence Perreault-Levasseur, University of Montreal

Abstract: Details to come

11:00–11:30 am ET

Break

11:30 am–12:00 pm ET

Machine Learning and Physics: The Alliance of the Titans

Ayan Paul, Northeastern

Abstract: Leaps in our understanding of Physics have been concomitant with the adoption of new and increasingly powerful mathematical structures that shift our perspective of how we probe the dynamics of the universe and allow us to unravel complex concepts that were hitherto inaccessible to us. In the realm of data-driven science, where physics is firmly planted, machine learning is proving to be a long-awaited and much-needed mathematical structure that has showcased its worth in aiding landmark discoveries, understanding the underlying symmetries of theories that we propose, and connecting signals to kinematics interpretably, to mention a few. In this parable on the charm of machine learning in physics, we will discuss the nuances of some of these achievements and lay out what we can expect from the future.

12:15–1:30 pm ET

Lunch

1:30–2:15 pm ET

Geometric Machine Learning

Melanie Weber, Harvard University

Abstract: A recent surge of interest in exploiting geometric structure in data and models in machine learning has motivated the design of a range of geometric algorithms and architectures. This lecture will give an overview of this emerging research area and its mathematical foundation. We will cover topics at the intersection of Geometry and Machine Learning, including relevant tools from differential geometry and group theory, geometric representation learning, graph machine learning, and geometric deep learning.

2:15–3:00 pm ET

Machine Learning for LHC Theory

Tilman Plehn, Heidelberg

Abstract: Details to come

3:00–3:30 pm ET

Break

3:30–4:15 pm ET

Asteroseismic probes of far-ranging astrophysics with big data and machine learning

Earl Bellinger, Yale University

Abstract: Space telescopes like the NASA Kepler and TESS missions as well as the forthcoming PLATO mission are driving a data revolution in stellar astrophysics. The ultra-precise observations provided by these missions are challenging our best models of how stars evolve, and are in turn granting insights into the formation and evolution of planetary systems and the Galaxy as a whole. They furthermore present novel opportunities to probe far-ranging physics, such as dark matter and theories of gravity beyond general relativity. In this talk, I will give an overview of the data, models, challenges, and opportunities in asteroseismology, and highlight the role that machine learning is playing in advancing our knowledge across astrophysics.

4:15–5:00 pm ET

Big data cosmology meets AI

Carol Cuesta-Lazaro, IAIFI Fellow

Abstract: The upcoming era of cosmological surveys promises an unprecedented wealth of observational data that will transform our understanding of the universe. Surveys such as DESI, Euclid, and the Vera C. Rubin Observatory will provide extremely detailed maps of billions of galaxies out to high redshifts. Analyzing these massive datasets poses exciting challenges that machine learning is uniquely poised to help overcome. In this talk, I will highlight recent examples from my work on probabilistic machine learning for cosmology. First, I will explain how a point cloud diffusion model can be used both as a generative model for 3D maps of galaxy clustering and as a likelihood model for such datasets. Moreover, I will present a generative model developed to reconstruct the initial conditions of the Universe from spectroscopic survey observations. When combined with the wealth of data from upcoming surveys, these machine learning techniques have the potential to provide new insights into fundamental questions about the nature of the universe.

5:30–6:30 pm ET

Panel on Industry–Academia Collaboration

(Additional panelists to be announced)

  • Moderator: Carol Cuesta-Lazaro, IAIFI Fellow

  • Bill Freeman, Professor of EECS, MIT

  • Partha Saha, Distinguished Engineer, Data and AI Platform, Visa

Thursday, August 15, 2024

9:30–10:15 am ET

Uncertainty Quantification from Neural Network Correlation Functions

Yonatan Kahn, University of Illinois Urbana-Champaign

Abstract: Details to come

10:15–11:00 am ET

Transformers to transform Scattering Amplitudes Calculation

Tianji Cai, SLAC

Abstract: AI for fundamental physics is now a burgeoning field, with numerous efforts pushing the boundaries of experimental and theoretical physics. In this talk, I will introduce a recent innovative application of Natural Language Processing to state-of-the-art calculations of scattering amplitudes. Specifically, we use Transformers to predict the symbols at high loop orders of the three-gluon form factors in planar N=4 Super Yang-Mills theory. Our results demonstrate the great promise of Transformers for amplitude calculations, opening the door to an exciting new scientific paradigm where discoveries and human insights are inspired and aided by AI.

11:00–11:30 am ET

Break

11:30 am–12:00 pm ET

Neural ansätze for physical systems

Nima Dehmamy, IBM Research MIT-IBM Lab

Abstract: Details to come

12:15–1:30 pm ET

Lunch

1:30–3:00 pm ET

Contributed Talks Session A - Uncertainty Quantification/Robust AI

Jolideco: A Hybrid ML-Statistical Approach for Robust Image Deconvolution in Sparse Poisson Regimes, Axel Donath (Center for Astrophysics | Harvard & Smithsonian) Machine learning for sparse image data reconstruction remains challenging, particularly in Astronomy where ground truth is often unavailable. While simulations and transfer learning offer partial solutions, high-dimensional parameter spaces can render these approaches computationally expensive or infeasible. Moreover, in low-count Poisson domains, quantifying uncertainties is crucial. We present Jolideco, a novel hybrid method for joint likelihood image deconvolution that synergizes machine learning with classical statistical modeling. This approach leverages a hand-crafted forward model for the imaging process, incorporating prior information such as telescope characteristics and noise distributions. Simultaneously, it employs a high-dimensional, patch-based image prior trained via ML on astronomical images from other wavelengths to regularize image structure. Jolideco demonstrates significantly improved reconstruction quality across diverse source scenarios and signal-to-noise regimes. Its closed statistical framework facilitates multi-telescope data integration and robust uncertainty quantification. We showcase Jolideco's effectiveness using example data from the Chandra X-ray Observatory and the Fermi-LAT Gamma-ray Space Telescope, illustrating its potential to advance astronomical image analysis in the Poisson regime.
Towards Quantitatively Trustworthy AI, Nicholas Kersting (Visa, Inc.) Safe and effective application of AI to science and industry can only proceed by measuring trustworthiness quantitatively, so that we may track and report progress. Traditional statistical metrics such as precision, recall, and AUC are no longer sufficient on their own; they must be supplemented with measures of reliability such as Explainable AI (XAI) and, most recently, measures of groundedness and hallucination in Large Language Models. We report especially on recent progress in the latter from research and applications at Visa.
Evidence-based Inverse Problem Solvers for QCD: Demystifying Uncertainty in Inverse Problem Solutions of Parton Distribution Functions, Brandon Kriesten (Argonne National Laboratory) Representing parton distribution functions (PDFs) of hadrons through robust, high-fidelity parameterizations has been a long-standing goal of particle physics phenomenology. Additionally, quantitatively connecting the underlying theory assumptions and chosen fitted datasets to the properties of the PDF’s flavor and x-dependence is a long-standing challenge. We use a variational autoencoder-based inverse mapper to find solutions to the inverse problem of decoding PDFs from experimental measurements / lattice QCD data while simultaneously dissecting patterns of learned correlations between the encoded data and reconstructed PDFs. Finally using evidence-based techniques, we seek to quantify the uncertainty of these models and separate data (aleatoric) and knowledge (epistemic) uncertainty while identifying out of distribution samples. I will show progress towards implementing these evidence-based inverse problem solvers for PDFs in an implementation that mirrors a phenomenological fit.
Simulation Based Inference for FCC-ee, Lingfeng Li (Brown University) We apply machine-learning techniques to the effective-field-theory analysis of the e+e−→W+W− processes at future lepton colliders, and demonstrate their advantages in comparison with conventional methods, such as optimal observables. In particular, we show that machine-learning methods are more robust to detector effects and backgrounds, and could in principle produce unbiased results with sufficient Monte Carlo simulation samples that accurately describe experiments. This is crucial for the analyses at future lepton colliders given the outstanding precision of the e+e−→W+W− measurement (~O(10⁻⁴) in terms of anomalous triple gauge couplings, or even better) that can be reached. Our framework can be generalized to other effective-field-theory analyses, such as the one of e+e−→tt̄ or similar processes at muon colliders.
Embed and Emulate: Contrastive representations for simulation-based inference, Peter Lu (University of Chicago) Scientific modeling and engineering applications rely heavily on parameter estimation methods to fit physical models and calibrate numerical simulations using real-world data. In the absence of an analytic statistical model, modern simulation-based inference (SBI) approaches first use a numerical simulator to generate a dataset consisting of parameters and corresponding model outputs, such as trajectories from a dynamical system. Then, given real experimental data, the system parameters can be inferred using a variety of SBI methods, some of which use machine learning emulators to accelerate data generation and inference. However, parameter estimation for dynamical systems, such as weather and climate, is still often difficult due to the high-dimensional nature of the data as well as the complexity of the physical models and simulations. We introduce Embed and Emulate (E&E): a new likelihood-free inference method for estimating arbitrary parameter posteriors based on contrastive learning. E&E learns a low-dimensional embedding for the data (i.e. a summary statistic) and a corresponding fast emulator in the embedding space, bypassing the need for running an expensive simulation or a high-dimensional emulator during inference. We validate our theoretical results on a synthetic toy experiment, which illustrates properties of the learned embedding as a contrastive representation, and then benchmark E&E on a realistic multimodal parameter estimation task using the high-dimensional, chaotic Lorenz 96 system.
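
Schematically, the contrastive half of such an approach pulls matched (trajectory, parameter) pairs together in a shared embedding space with an InfoNCE-style loss. The networks, dimensions, and random data below are placeholders, not the paper's Lorenz 96 setup:

```python
# Schematic contrastive objective in the spirit of Embed and Emulate:
# embed simulation outputs and their generating parameters into a shared
# space, pulling matched (trajectory, parameter) pairs together.
import torch
import torch.nn as nn
import torch.nn.functional as F

traj_encoder = nn.Sequential(nn.Linear(500, 256), nn.ReLU(), nn.Linear(256, 32))
param_encoder = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 32))

def info_nce(z_traj, z_param, tau=0.1):
    z_traj = F.normalize(z_traj, dim=-1)
    z_param = F.normalize(z_param, dim=-1)
    logits = z_traj @ z_param.T / tau            # (batch, batch) similarities
    labels = torch.arange(len(logits))           # pair i matches pair i
    return F.cross_entropy(logits, labels)

# One training step on simulated (trajectory, parameter) pairs
params = torch.randn(64, 4)                      # stand-in model parameters
trajs = torch.randn(64, 500)                     # flattened model outputs
loss = info_nce(traj_encoder(trajs), param_encoder(params))
loss.backward()
print(loss.item())
```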
Going beyond the jet tagging frontier using knowledge distillation, Yuanchen Zhou (Brown University) Classifying jets for proton-proton collisions is a challenging problem, and several Artificial Intelligence / Machine Learning classifiers have been introduced to help handle the task. Different classifiers have tradeoffs in terms of their accuracy, model dependency, processing time, etc. We study these tradeoffs for different model architectures, and explore techniques to improve their overall performance. In particular, we study the technique of Knowledge Distillation, which distills knowledge from a complex model with high accuracy to a simpler model with faster processing time and potentially less model-dependence to see if it is possible to increase the accuracy of the simpler model while maintaining its other advantages.

Contributed Talks Session B - Representation/Manifold Learning

Multi-modal generalized class discovery for scalable autonomous all-sky surveys, Laura Domine (Center for Astrophysics, Harvard University) The Galileo Project is a systematic scientific research program focused on understanding the origins and nature of Unidentified Aerial Phenomena (UAP). To date there is very little data on UAP whose properties and kinematics purportedly reside outside the performance envelope of known phenomena. We are in the process of designing, building and commissioning a multi-modal, multi-spectral detector to continuously monitor the sky and collect UAP data through a rigorous aerial census of natural and human-made phenomena. This open-world setting is a major challenge for artificial intelligence (AI) techniques which need to both (i) accurately detect and classify objects from known classes and (ii) cluster unknown, out-of-distribution objects. Using a commissioning dataset, which includes several months of videos from an all-sky array of eight long wave-infrared cameras and audible recordings, I will discuss our work developing a multi-modal generalized class discovery method to automatically identify new classes of objects in unlabeled data in addition to known classes. It opens the door to an autonomous aerial census where categorization relies less on our prior expectations.
SPECTER: Efficient Evaluation of the Spectral EMD, Rikab Gambhir (MIT) The Energy Mover’s Distance (EMD) has seen use in collider physics as a metric between events and as a geometric method of defining infrared and collinear safe observables. Recently, the spectral Energy Mover’s Distance (SEMD) has been proposed as a more analytically tractable alternative to the EMD. In this work, we obtain a closed-form expression for the Riemannian-like p = 2 SEMD metric between events, eliminating the need to numerically solve an optimal transport problem. Additionally, we show how the SEMD can be used to define event and jet shape observables by minimizing the metric between event and parameterized energy flows (similar to the EMD), and we obtain closed-form expressions for several of these observables. We also present the SPECTER framework, an efficient and highly parallelized implementation of the SEMD metric and SEMD-derived shape observables. We demonstrate that the SEMD and SPECTER provide nearly thousand-fold compute time improvements over evaluation of the EMD.
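Part of what makes a spectral formulation tractable is that optimal transport in one dimension has a closed form: sorting replaces the transport solver. The snippet below shows this 1-D p = 2 case on stand-in spectra; the actual construction of spectral functions from collider events is the subject of the talk, so treat this only as an analogy.

```python
import numpy as np

# Closed-form 1-D optimal transport: for two sets of N equal-weight points,
# the p = 2 Wasserstein distance is obtained by sorting and matching in
# order, with no transport solver required.
def w2_1d(a, b):
    a, b = np.sort(a), np.sort(b)
    return np.sqrt(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
event1 = rng.normal(0.0, 1.0, size=1000)   # stand-in 1-D "spectra"
event2 = rng.normal(0.5, 1.2, size=1000)
print(f"W2(event1, event2) = {w2_1d(event1, event2):.3f}")
```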
Multi-modal Contrastive Learning for Robust Text Representation Classification, Mitra Tajrobehkar (Vertical Oceans) Contrastive representation learning has emerged as a powerful technique in both Computer Vision (CV) and Natural Language Processing (NLP) domains, enabling the acquisition of practical and meaningful representations from text data. This talk will explore the captivating realm of contrastive representation learning in NLP, investigating its underlying principles and applications in tasks such as question answering. We will delve into the remarkable success of contrastive learning in enhancing language understanding, transfer learning, and domain adaptation in NLP tasks. Additionally, we will address the challenges associated with training language models, including limitations arising from data scarcity and bias. Join us to discover the potential of contrastive representation learning in advancing the capabilities of pre-trained language models.
Parameter Symmetry and Formation of Latent Representations, Liu Ziyin (MIT, NTT Research) Symmetries are abundant in the loss functions of neural networks. We characterize the learning dynamics of stochastic gradient descent (SGD) when exponential symmetries, a broad subclass of continuous symmetries, exist in the loss function. We establish that when gradient noise is unbalanced, SGD tends to move the model parameters toward a point where noise from different directions is balanced. A special type of fixed point in the directions along which the loss is constant thereby emerges as a candidate solution for SGD. As the main theoretical result, we prove that every parameter is connected, without a loss-function barrier, to a unique noise-balanced fixed point. Lastly, we discuss how the theory can be leveraged to understand common phenomena in deep learning, such as progressive sharpening and flattening and the formation of latent representations.
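The noise-balancing effect can be seen in the smallest model with a rescaling symmetry. In this sketch (our toy illustration, not the talk's general theorem), f(x) = u*v*x is trained with SGD on noisy labels; one can check that each step contracts u^2 - v^2 by a factor (1 - lr^2 * g^2), so the parameters drift to the noise-balanced point |u| = |v|, even though gradient flow would conserve the imbalance exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = u*v*x has the rescaling symmetry (u, v) -> (c*u, v/c).
# Gradient flow conserves u^2 - v^2; noisy SGD contracts it toward 0.
u, v, lr = 3.0, 0.3, 0.02
for _ in range(50_000):
    x = rng.normal()
    y = x + rng.normal()               # noisy labels keep gradient noise alive
    g = (u * v * x - y) * x            # shared gradient factor
    u, v = u - lr * g * v, v - lr * g * u
print(f"u*v = {u*v:.2f} (target 1), u^2 - v^2 = {u**2 - v**2:.1e} (-> 0)")
```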

Privacy in Machine Learning

Privacy Enhancing Technologies and Machine Learning: A Match Made in Heaven, Anderson Nascimento (Visa) Data is the driving force behind the ongoing AI revolution. Sharing this data across various organizations and individuals has immense potential to transform diverse fields, from automating money laundering detection to collecting health data for drug development. However, privacy concerns and regulatory restrictions often make this level of data sharing impossible. Privacy-enhancing technologies like secure multiparty computation, fully homomorphic encryption, differential privacy, and federated learning present a solution to this issue. They allow us to enjoy the advantages of data collaboration without compromising the privacy of those providing the data. These technologies are costly, however, affecting the runtime, accuracy, and fairness of the final result. This talk will introduce several of these privacy-enhancing technologies, focusing primarily on secure multiparty computation and differential privacy. We will demonstrate how these technologies can be integrated to achieve privacy-preserving machine learning, explain their limitations, and discuss how these research findings are being applied in real-world scenarios. We will also present current open research problems in this field. This talk is designed for a general audience. No prior experience with secure multiparty computation or differential privacy is necessary. Bio: Dr. Anderson C. A. Nascimento is an information theory and privacy expert with two decades of post-Ph.D. experience. His career milestones include being a Senior Director and Head of Security Research at Visa, an endowed professor and Computer Science chair at the University of Washington (Tacoma Campus), a permanent member of the Cryptography Lab at Nippon Telegraph and Telephone, a research scientist at Meta, and a professor at the University of Brasilia, Brazil. Anderson is a University of Tokyo Ph.D. graduate (2004), and he specializes in cryptography and privacy-preserving machine learning, developing novel techniques for various domains. He has won international research competitions, including the iDASH competition for privacy-preserving genomic data, and he took second place in the PETs prize organized by the US and UK governments. He has 100+ publications, edited four books, supervised 26 theses and dissertations, and served as a panelist or reviewer for the National Science Foundation, the National Council of Research and Development of Brazil, and the European Science Foundation. He has also presented expert testimony to the Brazilian Supreme Court on privacy issues. His Erdos number is 3.
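As a taste of the two main ingredients, the sketch below shows (i) additive secret sharing, the arithmetic core of secure multiparty computation, and (ii) the Laplace mechanism from differential privacy. Both are textbook constructions reduced to a few lines; real deployments add finite-field arithmetic, integrity checks, and careful privacy accounting.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Additive secret sharing (core of secure multiparty computation) ---
# Each of three parties splits its private value into random shares that
# sum to the value; pooling all shares reveals only the total, never the
# individual inputs.
secrets = np.array([42.0, 17.0, 99.0])        # one private value per party
shares = rng.normal(0, 1000, size=(3, 3))     # random masking shares
shares[:, -1] = secrets - shares[:, :-1].sum(axis=1)
print(f"joint sum via shares: {shares.sum():.1f} (true {secrets.sum():.1f})")

# --- Differential privacy: the Laplace mechanism ---
# Releasing a count plus Laplace noise of scale sensitivity/epsilon yields
# an epsilon-differentially-private answer.
true_count, sensitivity, epsilon = 1234, 1.0, 0.5
noisy_count = true_count + rng.laplace(scale=sensitivity / epsilon)
print(f"DP count (epsilon={epsilon}): {noisy_count:.1f}")
```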

3:00–3:30 pm ET

Break

3:30–4:15 pm ET

Applications of Neural Networks to Mitigate Unique Challenges in Neutrino Experiments

Jessie Micallef, IAIFI Fellow

Abstract Details to come

4:15–5:00 pm ET

Equivariant Convolutional Networks & Group Steerable Kernels

Maurice Weiler, MIT

Abstract Equivariance imposes symmetry constraints on the connectivity of neural networks. This talk investigates the case of equivariant networks for feature vector fields or point clouds, which generally requires 1) spatial (convolutional) weight sharing, and 2) G-steerability constraints on the shared weights themselves. It gives an intuition for steerable convolution kernels, discusses how they can be implemented directly via harmonic bases or implicitly via equivariant MLPs, and clarifies the relation to typical message passing operations in equivariant MPNNs. A gauge theoretic formulation of equivariant CNNs and MPNNs shows that these models are not only equivariant under global transformations, but under more general local gauge transformations as well.
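The simplest instance of a steerability constraint can be checked in a few lines: for scalar images and the group C4 of 90-degree rotations, a kernel satisfying the constraint is just one that is invariant under rotation, and convolution with it then commutes with rotating the input. This toy (scalar fields, C4 only) is a drastically reduced version of the general G-steerable kernels for feature fields discussed in the talk.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)

# A kernel invariant under 90-degree rotation satisfies the C4
# steerability constraint for scalar (trivial-representation) features.
kernel = np.array([[0.0, 1.0, 0.0],
                   [1.0, 4.0, 1.0],
                   [0.0, 1.0, 0.0]])
assert np.allclose(kernel, np.rot90(kernel))

x = rng.normal(size=(16, 16))
lhs = convolve2d(np.rot90(x), kernel, mode="same")   # rotate, then convolve
rhs = np.rot90(convolve2d(x, kernel, mode="same"))   # convolve, then rotate
print("equivariant:", np.allclose(lhs, rhs))         # -> True
```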

5:00–5:30 pm ET

Break

5:30–7:30 pm ET

Workshop Dinner

Friday, August 16, 2024

9:30–10:15 am ET

Neural Networks and Conformal Field Theory

Jim Halverson, Northeastern/IAIFI

Abstract I'll present an essential result in ML theory, explain how it motivates a new approach to field theory, and present some key findings. Next, I'll discuss new work building on a result of Dirac relating Lorentz invariance and conformal invariance, and show how it can be used to construct new conformal field theories from neural networks.
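If the essential result meant here is the infinite-width limit (our guess from the neural-network field theory literature, not stated in the abstract), it can be seen numerically: outputs of randomly initialized networks become Gaussian as width grows, i.e. a free field, with finite-width corrections playing the role of interactions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Output distribution of a random one-hidden-layer tanh network at a fixed
# input; the 1/sqrt(width) output scaling gives the Gaussian-process limit.
def random_net_output(x, width, n_draws=4000):
    W1 = rng.normal(size=(n_draws, width, 1))
    b1 = rng.normal(size=(n_draws, width, 1))
    W2 = rng.normal(size=(n_draws, 1, width)) / np.sqrt(width)
    return (W2 @ np.tanh(W1 * x + b1)).squeeze()

for width in (2, 1000):
    out = random_net_output(0.7, width)
    z = (out - out.mean()) / out.std()
    # Vanishing excess kurtosis signals Gaussianity: the connected 4-point
    # function (an "interaction") turns off as width grows.
    print(f"width={width:5d}: excess kurtosis = {np.mean(z**4) - 3:+.3f}")
```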

10:15–11:00 am ET

Solving inverse problems with adaptive MCMC

Kaze Wong, Flatiron

Abstract Solving inverse problems, that is, estimating the probability distribution of parameters of interest given data, is one of the most common tasks in the natural sciences. MCMC has been the main workhorse for solving inverse problems for decades. However, the relative inefficiency of MCMC compared to alternatives such as simulation-based inference makes it unsuitable for many modern datasets. In this talk, I will discuss adaptive MCMC, which combines MCMC with neural networks to improve sampling efficiency while maintaining MCMC's guarantees.
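For orientation, here is a classic hand-tuned version of the idea, Haario-style adaptive Metropolis, in which the random-walk proposal covariance is re-estimated from the chain's own history. The method in the talk keeps this adapt-while-sampling structure but learns the proposal with neural networks instead; the toy posterior and tuning constants below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy correlated-Gaussian posterior.
prec = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
def log_post(x):
    return -0.5 * x @ prec @ x

x = np.zeros(2)
samples, cov = [], np.eye(2) * 0.1
for step in range(10_000):
    prop = rng.multivariate_normal(x, cov)           # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(x):
        x = prop
    samples.append(x)
    # Periodically re-fit the proposal covariance to the history (textbook
    # caveat: adaptation must diminish for exact asymptotic correctness).
    if step > 1000 and step % 500 == 0:
        cov = np.cov(np.array(samples).T) * 2.38**2 / 2 + 1e-6 * np.eye(2)

print("posterior covariance estimate:\n", np.cov(np.array(samples).T).round(2))
```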

11:00–11:30 am ET

Break

11:30 am–12:00 pm ET

Generative AI and the natural sciences: Governance strategies and historical perspectives

David Kaiser, MIT

Abstract Details to come

12:15–1:30 pm ET

Lunch

1:30–2:15 pm ET

Details to come

Dirk Englund, MIT

Abstract Details to come

2:15–3:00 pm ET

Details to come

Auralee Edelen, SLAC

Abstract Details to come

3:00–3:30 pm ET

Closing

Speakers

Speakers will be announced as they are confirmed.

Pulkit Agrawal
Assistant Professor, EECS, MIT
Earl Bellinger
Assistant Professor, Department of Astronomy, Yale University
Carolina Cuesta-Lazaro
IAIFI Fellow, IAIFI
Nima Dehmamy
Research Staff Member, IBM Research
Auralee Edelen
Associate Scientist, SLAC National Accelerator Laboratory
Dirk Englund
Associate Professor, MIT
Jim Halverson
Associate Professor, Physics, Northeastern
Yonatan Kahn
Assistant Professor, Physics, UIUC
Verena Kain
Scientist, CERN
David Kaiser
Professor, History of Science/Physics, MIT
Patrick Kidger
Mathematician and Machine Learning Researcher, Cradle.bio
J. Nathan Kutz
Professor, University of Washington
Laurence Perreault-Levasseur
Assistant Professor, University of Montreal
Ziming Liu
Graduate Student, MIT
Alessandro Lovato
Physicist, Argonne National Laboratory
Lu Lu
Assistant Professor, Yale University
Jessie Micallef
IAIFI Fellow, IAIFI
Ayan Paul
Research Scientist, The Institute for Experiential AI - Northeastern University
Mariel Pettee
Chamberlain Postdoctoral Research Fellow, Lawrence Berkeley National Lab
Tilman Plehn
Professor, ITP - Heidelberg University
Matt Schwartz
Professor, Harvard
Melanie Weber
Assistant Professor of Applied Mathematics and of Computer Science, Harvard
Maurice Weiler
Deep Learning Researcher, University of Amsterdam
Christoph Weniger
Associate Professor, University of Amsterdam
Kaze Wong
Research Fellow, Flatiron Institute/CCA
Rose Yu
Assistant Professor, Computer Science and Engineering, UC San Diego


Accommodations

We have secured discounted rates at the following hotels:

  • Royal Sonesta Boston, 40 Edwin H Land Blvd, Cambridge, MA 02142.

    $224 nightly rate (1-2 people per room)

    Deadline to book: July 29

    Book now

Workshop attendees are also welcome to book dorms for a discounted rate at Boston University:

  • 10 Buick Street, Boston

    $97.50 nightly rate (1 person per room, shared bathroom with 1 other person)

    Book now

FAQ

  • Who can attend the Summer Workshop? Any researcher working at or interested in the intersection of physics and AI is encouraged to attend the Summer Workshop.
  • What is the cost to attend the Summer Workshop? The registration fee for the Summer Workshop is $200 and includes the Workshop dinner, as well as coffee breaks and snacks.
  • If I come to the Summer School, can I also attend the Workshop? Yes! We encourage you to stay for the Workshop, and you can book the dorms for both events if you choose (at your own expense).
  • Will the recordings of the talks be available? We plan to share the talks on our YouTube channel.

Submit a question or comment

2024 Organizing Committee

  • Fabian Ruehle, Chair (Northeastern University)
  • Demba Ba (Harvard)
  • Alex Gagliano (IAIFI Fellow)
  • Di Luo (IAIFI Fellow)
  • Polina Abratenko (Tufts)
  • Owen Dugan (MIT)
  • Sneh Pandya (Northeastern)
  • Yidi Qi (Northeastern)
  • Manos Theodosis (Harvard)
  • Sokratis Trifinopoulos (MIT)