IAIFI Papers

View high energy physics IAIFI papers on INSPIRE

Theoretical Physics

Pre-prints

The Frozen Phase of Heterotic F-theory Duality
Paul-Konstantin Oehlmann, Fabian Ruehle, Benjamin Sung
[ arXiv:2404.02191 ]

Abstract We study the duality between the Spin(32)/ℤ2 heterotic string without vector structure and F-theory with frozen singularities. We give a complete description in theories with 6d 𝒩=(1,0) supersymmetry and identify the duals of Spin(32)/ℤ2-instantons on ADE singularities without vector structure in the frozen phase of F-theory using an ansatz introduced by Bhardwaj, Morrison, Tachikawa, and Tomasiello. As a consequence, we obtain a strongly coupled description of orbifold phases of type I string theory without vector structure, substantially expanding the list of known examples of 6d F-theory compactifications with frozen singularities. Supergravity theories can be fused from these instanton theories, in a way that commutes with switching off vector structure, which we use to propose new consistency checks via neutral hypermultiplet counting. Finally, we describe various Higgsings of this duality, and comment on constraints on higher form symmetries.

PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models
Siddharth Mishra-Sharma, Yiding Song, Jesse Thaler
[ arXiv:2403.08851 ]

Abstract We present PAPERCLIP (Proposal Abstracts Provide an Effective Representation for Contrastive Language-Image Pre-training), a method which associates astronomical observations imaged by telescopes with natural language using a neural network model. The model is fine-tuned from a pre-trained Contrastive Language-Image Pre-training (CLIP) model using successful observing proposal abstracts and corresponding downstream observations, with the abstracts optionally summarized via guided generation using large language models (LLMs). Using observations from the Hubble Space Telescope (HST) as an example, we show that the fine-tuned model embodies a meaningful joint representation between observations and natural language through tests targeting image retrieval (i.e., finding the most relevant observations using natural language queries) and description retrieval (i.e., querying for astrophysical object classes and use cases most relevant to a given observation). Our study demonstrates the potential for using generalist foundation models rather than task-specific models for interacting with astronomical data by leveraging text as an interface.

Moments of Clarity: Streamlining Latent Spaces in Machine Learning using Moment Pooling
Rikab Gambhir, Athis Osathapan, Jesse Thaler
[ arXiv:2403.08854 ]

Abstract Many machine learning applications involve learning a latent representation of data, which is often high-dimensional and difficult to directly interpret. In this work, we propose "Moment Pooling", a natural extension of Deep Sets networks which drastically decreases the latent space dimensionality of these networks while maintaining or even improving performance. Moment Pooling generalizes the summation in Deep Sets to arbitrary multivariate moments, which enables the model to achieve a much higher effective latent dimensionality for a fixed latent dimension. We demonstrate Moment Pooling on the collider physics task of quark/gluon jet classification by extending Energy Flow Networks (EFNs) to Moment EFNs. We find that Moment EFNs with latent dimensions as small as 1 perform similarly to ordinary EFNs with higher latent dimension. This small latent dimension allows for the internal representation to be directly visualized and interpreted, which in turn enables the learned internal jet representation to be extracted in closed form.

On classical de Sitter solutions and parametric control
David Andriot, Fabian Ruehle
[ arXiv:2403.07065 ]

Abstract Finding string backgrounds with de Sitter spacetime, where all approximations and corrections are controlled, is an open problem. We revisit the search for de Sitter solutions in the classical regime for specific type IIB supergravity compactifications on group manifolds, an under-explored corner of the landscape that offers an interesting testing ground for swampland conjectures. While the supergravity de Sitter solutions we obtain numerically are ambiguous in terms of their classicality, we find an analytic scaling that makes four out of six compactification radii, as well as the overall volume, arbitrarily large. This potentially provides parametric control over corrections. If we could show that these solutions, or others to be found, are fully classical, they would constitute a counterexample to conjectures stating that asymptotic de Sitter solutions do not exist. We discuss this point in great detail.

Photonic probabilistic machine learning using quantum vacuum noise
Seou Choi, Yannick Salamin, Charles Roques-Carmes, Rumen Dangovski, Di Luo, Zhuo Chen, Michael Horodynski, Jamison Sloan, Shiekh Zia Uddin, Marin Soljačić
[ arXiv:2403.04731 ]

Abstract Probabilistic machine learning utilizes controllable sources of randomness to encode uncertainty and enable statistical modeling. Harnessing the pure randomness of quantum vacuum noise, which stems from fluctuating electromagnetic fields, has shown promise for high speed and energy-efficient stochastic photonic elements. Nevertheless, photonic computing hardware which can control these stochastic elements to program probabilistic machine learning algorithms has been limited. Here, we implement a photonic probabilistic computer consisting of a controllable stochastic photonic element - a photonic probabilistic neuron (PPN). Our PPN is implemented in a bistable optical parametric oscillator (OPO) with vacuum-level injected bias fields. We then program a measurement-and-feedback loop for time-multiplexed PPNs with electronic processors (FPGA or GPU) to solve certain probabilistic machine learning tasks. We showcase probabilistic inference and image generation of MNIST-handwritten digits, which are representative examples of discriminative and generative models. In both implementations, quantum vacuum noise is used as a random seed to encode classification uncertainty or probabilistic generation of samples. In addition, we propose a path towards an all-optical probabilistic computing platform, with an estimated sampling rate of ~ 1 Gbps and energy consumption of ~ 5 fJ/MAC. Our work paves the way for scalable, ultrafast, and energy-efficient probabilistic machine learning hardware.

Operator Learning Renormalization Group
Xiu-Zhe Luo, Di Luo, Roger G. Melko
[ arXiv:2403.03199 ]

Abstract In this paper, we present a general framework for quantum many-body simulations called the operator learning renormalization group (OLRG). Inspired by machine learning perspectives, OLRG is a generalization of Wilson's numerical renormalization group and White's density matrix renormalization group, which recursively builds a simulatable system to approximate a target system of the same number of sites via operator maps. OLRG uses a loss function to minimize the error of a target property directly by learning the operator map in lieu of a state ansatz. This loss function is designed by a scaling consistency condition that also provides a provable bound for real-time evolution. We implement two versions of the operator maps for classical and quantum simulations. The former, which we call the Operator Matrix Map, can be implemented via neural networks on classical computers. The latter, which we call the Hamiltonian Expression Map, generates device pulse sequences to leverage the capabilities of quantum computing hardware. We illustrate the performance of both maps for calculating time-dependent quantities in the quantum Ising model Hamiltonian.

Multi-particle interpolating operators in quantum field theories with cubic symmetry
William Detmold, William I. Jay, Gurtej Kanwar, Phiala E. Shanahan, Michael L. Wagman
[ arXiv:2403.00672 ]

Abstract Numerical studies of lattice quantum field theories are conducted in finite spatial volumes, typically with cubic symmetry in the spatial coordinates. Motivated by these studies, this work presents a general algorithm to construct multi-particle interpolating operators for quantum field theories with cubic symmetry. The algorithm automates the block diagonalization required to combine multiple operators of definite linear momentum into irreducible representations of the appropriate little group. Examples are given for distinguishable and indistinguishable particles including cases with both zero and non-zero spin. An implementation of the algorithm is publicly available at this https URL.

Rigor with Machine Learning from Field Theory to the Poincaré Conjecture
Sergei Gukov, James Halverson, Fabian Ruehle
[ arXiv:2402.13321 ]

Abstract Machine learning techniques are increasingly powerful, leading to many breakthroughs in the natural sciences, but they are often stochastic, error-prone, and blackbox. How, then, should they be utilized in fields such as theoretical physics and pure mathematics that place a premium on rigor and understanding? In this Perspective we discuss techniques for obtaining rigor in the natural sciences with machine learning. Non-rigorous methods may lead to rigorous results via conjecture generation or verification by reinforcement learning. We survey applications of these techniques-for-rigor ranging from string theory to the smooth 4d Poincaré conjecture in low-dimensional topology. One can also imagine building direct bridges between machine learning theory and either mathematics or theoretical physics. As examples, we describe a new approach to field theory motivated by neural network theory, and a theory of Riemannian metric flows induced by neural network gradient descent, which encompasses Perelman's formulation of the Ricci flow that was utilized to resolve the 3d Poincaré conjecture.

Long-Distance Nuclear Matrix Elements for Neutrinoless Double-Beta Decay from Lattice QCD
Zohreh Davoudi, William Detmold, Zhenghao Fu, Anthony V. Grebe, William Jay, David Murphy, Patrick Oare, Phiala E. Shanahan, Michael L. Wagman
[ arXiv:2402.09362 ]

Abstract Neutrinoless double-beta (0νββ) decay is a heretofore unobserved process which, if observed, would imply that neutrinos are Majorana particles. Interpretations of the stringent experimental constraints on 0νββ-decay half-lives require calculations of nuclear matrix elements. This work presents the first lattice quantum-chromodynamics (LQCD) calculation of the matrix element for 0νββ decay in a multi-nucleon system, specifically the nn→ppee transition, mediated by a light left-handed Majorana neutrino propagating over nuclear-scale distances. This calculation is performed with quark masses corresponding to a pion mass of m_π = 806 MeV at a single lattice spacing and volume. The statistically cleaner Σ⁻→Σ⁺ee transition is also computed in order to investigate various systematic uncertainties. The prospects for matching the results of LQCD calculations onto a nuclear effective field theory to determine a leading-order low-energy constant relevant for 0νββ decay with a light Majorana neutrino are investigated. This work, therefore, sets the stage for future calculations at physical values of the quark masses that, combined with effective field theory and nuclear many-body studies, will provide controlled theoretical inputs to experimental searches of 0νββ decay.

Real-time Dynamics of the Schwinger Model as an Open Quantum System with Neural Density Operators
Joshua Lin, Di Luo, Xiaojun Yao, Phiala E. Shanahan
[ arXiv:2402.06607 ]

Abstract Ab-initio simulations of multiple heavy quarks propagating in a Quark-Gluon Plasma are computationally difficult to perform due to the large dimension of the space of density matrices. This work develops machine learning algorithms to overcome this difficulty by approximating exact quantum states with neural network parametrisations, specifically Neural Density Operators. As a proof of principle demonstration in a QCD-like theory, the approach is applied to solve the Lindblad master equation in the 1+1d lattice Schwinger Model as an open quantum system. Neural Density Operators enable the study of in-medium dynamics on large lattice volumes, where multiple-string interactions and their effects on string-breaking and recombination phenomena can be studied. Thermal properties of the system at equilibrium can also be probed with these methods by variationally constructing the steady state of the Lindblad master equation. Scaling of this approach with system size is studied, and numerical demonstrations on up to 32 spatial lattice sites and with up to 3 interacting strings are performed.

Applications of flow models to the generation of correlated lattice QCD ensembles
Ryan Abbott, Aleksandar Botev, Denis Boyda, Daniel C. Hackett, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Julian M. Urban
[ arXiv:2401.10874 ]

Abstract Machine-learned normalizing flows can be used in the context of lattice quantum field theory to generate statistically correlated ensembles of lattice gauge fields at different action parameters. This work demonstrates how these correlations can be exploited for variance reduction in the computation of observables. Three different proof-of-concept applications are demonstrated using a novel residual flow architecture: continuum limits of gauge theories, the mass dependence of QCD observables, and hadronic matrix elements based on the Feynman-Hellmann approach. In all three cases, it is shown that statistical uncertainties are significantly reduced when machine-learned flows are incorporated as compared with the same calculations performed with uncorrelated ensembles or direct reweighting.

Anomaly Detection in Collider Physics via Factorized Observables
Eric M. Metodiev, Jesse Thaler, Raymond Wynne
[ arXiv:2312.00119 ]

Abstract To maximize the discovery potential of high-energy colliders, experimental searches should be sensitive to unforeseen new physics scenarios. This goal has motivated the use of machine learning for unsupervised anomaly detection. In this paper, we introduce a new anomaly detection strategy called FORCE: factorized observables for regressing conditional expectations. Our approach is based on the inductive bias of factorization, which is the idea that the physics governing different energy scales can be treated as approximately independent. Assuming factorization holds separately for signal and background processes, the appearance of non-trivial correlations between low- and high-energy observables is a robust indicator of new physics. Under the most restrictive form of factorization, a machine-learned model trained to identify such correlations will in fact converge to the optimal new physics classifier. We test FORCE on a benchmark anomaly detection task for the Large Hadron Collider involving collimated sprays of particles called jets. By teasing out correlations between the kinematics and substructure of jets, our method can reliably extract percent-level signal fractions. This strategy for uncovering new physics adds to the growing toolbox of anomaly detection methods for collider physics with a complementary set of assumptions.

Safe but Incalculable: Energy-weighting is not all you need
Samuel Bright-Thonney, Benjamin Nachman, Jesse Thaler
[ arXiv:2311.07652 ]

Abstract Infrared and collinear (IRC) safety has long been used as a proxy for robustness when developing new jet substructure observables. This guiding philosophy has been carried into the deep learning era, where IRC-safe neural networks have been used for many jet studies. For graph-based neural networks, the most straightforward way to achieve IRC safety is to weight particle inputs by their energies. However, energy-weighting by itself does not guarantee that perturbative calculations of machine-learned observables will enjoy small non-perturbative corrections. In this paper, we demonstrate the sensitivity of IRC-safe networks to non-perturbative effects, by training an energy flow network (EFN) to maximize its sensitivity to hadronization. We then show how to construct Lipschitz Energy Flow Networks (L-EFNs), which are both IRC safe and relatively insensitive to non-perturbative corrections. We demonstrate the performance of L-EFNs on generated samples of quark and gluon jets, and showcase fascinating differences between the learned latent representations of EFNs and L-EFNs.

T-Duality and Flavor Symmetries in Little String Theories
Hamza Ahmed, Paul-Konstantin Oehlmann, Fabian Ruehle
[ arXiv:2311.02168 ]

Abstract We explore the T-duality web of 6D Heterotic Little String Theories, focusing on flavor algebra reducing deformations. A careful analysis of the full flavor algebra, including Abelian factors, shows that the flavor rank is preserved under T-duality. This suggests a new T-duality invariant in addition to the Coulomb branch dimension and the two-group structure constants. We also engineer Little String Theories with non-simply laced flavor algebras, whose appearance we attribute to certain discrete 3-form fluxes in M-theory. Geometrically, these theories are engineered in F-theory with non-Kähler favorable K3 fibers. This geometric origin leads us to propose that freezing fluxes are preserved across T-duality. Along the way, we discuss various exotic models, including two inequivalent Spin(32)/ℤ2 models that are dual to the same E8×E8 theory, and a family of self-T-dual models.

Pairing-based graph neural network for simulating quantum materials
Di Luo, David D. Dai, Liang Fu
[ arXiv:2311.02143 ]

Abstract We develop a pairing-based graph neural network for simulating quantum many-body systems. Our architecture augments a BCS-type geminal wavefunction with a generalized pair amplitude parameterized by a graph neural network. Variational Monte Carlo with our neural network simultaneously provides an accurate, flexible, and scalable method for simulating many-electron systems. We apply this method to two-dimensional semiconductor electron-hole bilayers and obtain accurate results on a variety of interaction-induced phases, including the exciton Bose-Einstein condensate, electron-hole superconductor, and bilayer Wigner crystal. Our study demonstrates the potential of physically-motivated neural network wavefunctions for quantum materials simulations.

Lattice QCD Constraints on the Fourth Mellin Moment of the Pion Light Cone Distribution Amplitude using the HOPE method
William Detmold, Anthony V. Grebe, Issaku Kanamori, C.-J. David Lin, Robert J. Perry, Yong Zhao
[ arXiv:2311.01322 ]

Abstract The light-cone distribution amplitude (LCDA) of the pion contains information about the parton momentum carried by the quarks and is an important theoretical input for various predictions of exclusive processes at high energy, including the pion electromagnetic form factor. Progress towards constraining the fourth Mellin moment of the LCDA using the heavy-quark operator product expansion (HOPE) method is presented.

Metric Flows with Neural Networks
James Halverson, Fabian Ruehle
[ arXiv:2310.19870 ]

Abstract We develop a theory of flows in the space of Riemannian metrics induced by neural network gradient descent. This is motivated in part by recent advances in approximating Calabi-Yau metrics with neural networks and is enabled by recent advances in understanding flows in the space of neural networks. We derive the corresponding metric flow equations, which are governed by a metric neural tangent kernel, a complicated, non-local object that evolves in time. However, many architectures admit an infinite-width limit in which the kernel becomes fixed and the dynamics simplify. Additional assumptions can induce locality in the flow, which allows for the realization of Perelman's formulation of Ricci flow that was used to resolve the 3d Poincaré conjecture. We apply these ideas to numerical Calabi-Yau metrics, including a discussion on the importance of feature learning.

Signal-to-noise improvement through neural network contour deformations for 3D SU(2) lattice gauge theory
William Detmold, Gurtej Kanwar, Yin Lin, Phiala E. Shanahan, Michael L. Wagman
[ arXiv:2309.00600 ]

Abstract Complex contour deformations of the path integral have been demonstrated to significantly improve the signal-to-noise ratio of observables in previous studies of two-dimensional gauge theories with open boundary conditions. In this work, new developments based on gauge fixing and a neural network definition of the deformation are introduced, which enable an effective application to theories in higher dimensions and with generic boundary conditions. Improvements of the signal-to-noise ratio by up to three orders of magnitude for Wilson loop measurements are shown in SU(2) lattice gauge theory in three spacetime dimensions.

Reconstructing S-matrix Phases with Machine Learning
Aurélien Dersy, Matthew D. Schwartz, Alexander Zhiboedov
[ arXiv:2308.09451 ]

Abstract An important element of the S-matrix bootstrap program is the relationship between the modulus of an S-matrix element and its phase. Unitarity relates them by an integral equation. Even in the simplest case of elastic scattering, this integral equation cannot be solved analytically and numerical approaches are required. We apply modern machine learning techniques to studying the unitarity constraint. We find that for a given modulus, when a phase exists it can generally be reconstructed to good accuracy with machine learning. Moreover, the loss of the reconstruction algorithm provides a good proxy for whether a given modulus can be consistent with unitarity at all. In addition, we study the question of whether multiple phases can be consistent with a single modulus, finding novel phase-ambiguous solutions. In particular, we find a new phase-ambiguous solution which pushes the known limit on such solutions significantly beyond the previous bound.

Score-based Diffusion Models for Generating Liquid Argon Time Projection Chamber Images
Zeviel Imani, Shuchin Aeron, Taritree Wongjirad
[ arXiv:2307.13687 ]

Abstract We show, for the first time, high-fidelity generation of LArTPC-like data using a generative neural network. This demonstrates that methods developed for natural images do transfer to LArTPC-produced images, which, in contrast to natural images, are globally sparse but locally dense. We present the method we employ, which is a variant of score-based generative diffusion models. We evaluate the fidelity of the generated images using several different approaches that include using a variant of measures used to evaluate natural images, comparisons between high-dimensional distributions, and comparisons relevant to LArTPC experiments.

Neural Network Field Theories: Non-Gaussianity, Actions, and Locality
Mehmet Demirtas, James Halverson, Anindita Maiti, Matthew D. Schwartz, Keegan Stoner
[ arXiv:2307.03223 ]

Abstract Both the path integral measure in field theory and ensembles of neural networks describe distributions over functions. When the central limit theorem can be applied in the infinite-width (infinite-N) limit, the ensemble of networks corresponds to a free field theory. Although an expansion in 1/N corresponds to interactions in the field theory, others, such as in a small breaking of the statistical independence of network parameters, can also lead to interacting theories. These other expansions can be advantageous over the 1/N-expansion, for example by improved behavior with respect to the universal approximation theorem. Given the connected correlators of a field theory, one can systematically reconstruct the action order-by-order in the expansion parameter, using a new Feynman diagram prescription whose vertices are the connected correlators. This method is motivated by the Edgeworth expansion and allows one to derive actions for neural network field theories. Conversely, the correspondence allows one to engineer architectures realizing a given field theory by representing action deformations as deformations of neural network parameter densities. As an example, ϕ⁴ theory is realized as an infinite-N neural network field theory.

Hierarchical Neural Simulation-Based Inference Over Event Ensembles
Lukas Heinrich, Siddharth Mishra-Sharma, Chris Pollard, Philipp Windischhofer
[ arXiv:2306.12584 ]

Abstract When analyzing real-world data it is common to work with event ensembles, which comprise sets of observations that collectively constrain the parameters of an underlying model of interest. Such models often have a hierarchical structure, where "local" parameters impact individual events and "global" parameters influence the entire dataset. We introduce practical approaches for optimal dataset-wide probabilistic inference in cases where the likelihood is intractable, but simulations can be realized via forward modeling. We construct neural estimators for the likelihood(-ratio) or posterior and show that explicitly accounting for the model's hierarchical structure can lead to tighter parameter constraints. We ground our discussion using case studies from the physical sciences, focusing on examples from particle physics (particle collider data) and astrophysics (strong gravitational lensing observations).

SN2023ixf in Messier 101: A Variable Red Supergiant as the Progenitor Candidate to a Type II Supernova
Charles D. Kilpatrick, Ryan J. Foley, Wynn V. Jacobson-Galán, Anthony L. Piro, Stephen J. Smartt, Maria R. Drout, Alexander Gagliano, Christa Gall, Jens Hjorth, David O. Jones, Kaisey S. Mandel, Raffaella Margutti, Conor L. Ransome, V. Ashley Villar, David A. Coulter, Hua Gao, David Jacob Matthews, Yossef Zenati
[ arXiv:2306.04722 ]

Abstract We present pre-explosion optical and infrared (IR) imaging at the site of the type II supernova (SN II) 2023ixf in Messier 101 at 6.9 Mpc. We astrometrically registered a ground-based image of SN 2023ixf to archival Hubble Space Telescope (HST), Spitzer Space Telescope (Spitzer), and ground-based near-IR images. A single point source is detected at a position consistent with the SN at wavelengths ranging from HST R-band to Spitzer 4.5 μm. Fitting to blackbody and red supergiant (RSG) spectral-energy distributions (SEDs), we find that the source is anomalously cool with a significant mid-IR excess. We interpret this SED as reprocessed emission in an 8600 R⊙ circumstellar shell of dusty material with a mass ∼5×10⁻⁵ M⊙ surrounding a log(L/L⊙) = 4.74±0.07 and Teff = 3920 (+200/−160) K RSG. This luminosity is consistent with RSG models of initial mass 11 M⊙, depending on assumptions of rotation and overshooting. In addition, the counterpart was significantly variable in pre-explosion Spitzer 3.6 μm and 4.5 μm imaging, exhibiting ∼70% variability in both bands correlated across 9 yr and 29 epochs of imaging. The variations appear to have a timescale of 2.8 yr, which is consistent with κ-mechanism pulsations observed in RSGs, albeit with a much larger amplitude than RSGs such as α Orionis (Betelgeuse).

Quantum Computation and Simulation using Fermion-Pair Registers
Xiangkai Sun, Di Luo, Soonwon Choi
[ arXiv:2306.03905 ]

Abstract We propose and analyze an approach to realize quantum computation and simulation using fermionic particles under quantum gas microscopes. Our work is inspired by a recent experimental demonstration of large-scale quantum registers, where tightly localized fermion pairs are used to encode qubits exhibiting long coherence time and robustness against laser intensity noise. We describe how to engineer the SWAP gate and high-fidelity controlled-phase gates by adjusting the fermion hopping as well as Feshbach interaction strengths. Combined with previously demonstrated single-qubit rotations, these gates establish the computational universality of the system. Furthermore, we show that 2D quantum Ising Hamiltonians with tunable transverse and longitudinal fields can be efficiently simulated by modulating Feshbach interaction strengths. We present a sample-efficient protocol to characterize engineered gates and Hamiltonian dynamics based on an improved classical shadow process tomography that requires minimal experimental controls. Our work opens up new opportunities to harness existing ultracold quantum gases for quantum information sciences.

First Impressions: Early-Time Classification of Supernovae using Host Galaxy Information and Shallow Learning
Alexander Gagliano, Gabriella Contardo, Daniel Foreman-Mackey, Alex I. Malz, Patrick D. Aleo
[ arXiv:2305.08894 ]

Abstract Substantial effort has been devoted to the characterization of transient phenomena from photometric information. Automated approaches to this problem have taken advantage of complete phase-coverage of an event, limiting their use for triggering rapid follow-up of ongoing phenomena. In this work, we introduce a neural network with a single recurrent layer designed explicitly for early photometric classification of supernovae. Our algorithm leverages transfer learning to account for model misspecification, host galaxy photometry to solve the data scarcity problem soon after discovery, and a custom weighted loss to prioritize accurate early classification. We first train our algorithm using state-of-the-art transient and host galaxy simulations, then adapt its weights and validate it on the spectroscopically-confirmed SNe Ia, SNe II, and SNe Ib/c from the Zwicky Transient Facility Bright Transient Survey. On observed data, our method achieves an overall accuracy of 82±2% within 3 days of an event's discovery, and an accuracy of 87±5% within 30 days of discovery. At both early and late phases, our method achieves comparable or superior results to the leading classification algorithms with a simpler network architecture. These results help pave the way for rapid photometric and spectroscopic follow-up of scientifically-valuable transients discovered in massive synoptic surveys.

Normalizing flows for lattice gauge theory in arbitrary space-time dimension
Ryan Abbott, Michael S. Albergo, Aleksandar Botev, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Alexander G.D.G. Matthews, Sébastien Racanière, Ali Razavi, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Julian M. Urban
[ arXiv:2305.02402 ]

Abstract Applications of normalizing flows to the sampling of field configurations in lattice gauge theory have so far been explored almost exclusively in two space-time dimensions. We report new algorithmic developments of gauge-equivariant flow architectures facilitating the generalization to higher-dimensional lattice geometries. Specifically, we discuss masked autoregressive transformations with tractable and unbiased Jacobian determinants, a key ingredient for scalable and asymptotically exact flow-based sampling algorithms. For concreteness, results from a proof-of-principle application to SU(3) lattice gauge theory in four space-time dimensions are reported.

Searching for ribbons with machine learning
Sergei Gukov, James Halverson, Ciprian Manolescu, Fabian Ruehle
[ arXiv:2304.09304 ]

Abstract We apply Bayesian optimization and reinforcement learning to a problem in topology: the question of when a knot bounds a ribbon disk. This question is relevant in an approach to disproving the four-dimensional smooth Poincaré conjecture; using our programs, we rule out many potential counterexamples to the conjecture. We also show that the programs are successful in detecting many ribbon knots in the range of up to 70 crossings.

Correlation function distributions for O(N) lattice field theories in the disordered phase
Cagin Yunus, William Detmold
[ arXiv:2304.03820 ]

Abstract Numerical computations in strongly-interacting quantum field theories are often performed using Monte-Carlo sampling methods. A key task in these calculations is to estimate the value of a given physical quantity from the distribution of stochastic samples that are generated using the Monte-Carlo method. Typically, the sample mean and sample variance are used to define the expectation values and uncertainties of computed quantities. However, the Monte-Carlo sample distribution contains more information than these basic properties and it is useful to investigate it more generally. In this work, the exact form of the probability distributions of two-point correlation functions at zero momentum in O(N) lattice field theories in the disordered phase and in infinite volume is determined. These distributions allow for a robust investigation of the efficacy of the Monte-Carlo sampling procedure and are also shown to allow improved estimators of the target physical quantity to be constructed. The theoretical expectations are shown to agree with numerical calculations in the O(2) model.

Autoregressive Neural TensorNet: Bridging Neural Networks and Tensor Networks for Quantum Many-Body Simulation
Zhuo Chen, Laker Newhouse, Eddie Chen, Di Luo, Marin Soljačić
[ arXiv:2304.01996 ]

Abstract Quantum many-body physics simulation has important impacts on understanding fundamental science and has applications to quantum materials design and quantum technology. However, due to the exponentially growing size of the Hilbert space with respect to the particle number, a direct simulation is intractable. While representing quantum states with tensor networks and neural networks are the two state-of-the-art methods for approximate simulations, each has its own limitations in terms of expressivity and optimization. To address these challenges, we develop a novel architecture, Autoregressive Neural TensorNet (ANTN), which bridges tensor networks and autoregressive neural networks. We show that Autoregressive Neural TensorNet parameterizes normalized wavefunctions with exact sampling, generalizes the expressivity of tensor networks and autoregressive neural networks, and inherits a variety of symmetries from autoregressive neural networks. We demonstrate our approach on the 2D J1-J2 Heisenberg model with different system sizes and coupling parameters, outperforming both tensor networks and autoregressive neural networks. Our work opens up new opportunities for both scientific simulations and machine learning applications.

Artificial intelligence for artificial materials: moiré atom
Di Luo, Aidan P. Reddy, Trithep Devakul, Liang Fu
[ arXiv:2303.08162 ]

Abstract Moiré engineering in atomically thin van der Waals heterostructures creates artificial quantum materials with designer properties. We solve the many-body problem of interacting electrons confined to a moiré superlattice potential minimum (the moiré atom) using a 2D fermionic neural network. We show that strong Coulomb interactions in combination with the anisotropic moiré potential lead to striking "Wigner molecule" charge density distributions observable with scanning tunneling microscopy.

Q-Flow: Generative Modeling for Differential Equations of Open Quantum Dynamics with Normalizing Flows
Owen Dugan, Peter Y. Lu, Rumen Dangovski, Di Luo, Marin Soljačić
[ arXiv:2302.12235 ]

Abstract Studying the dynamics of open quantum systems holds the potential to enable breakthroughs both in fundamental physics and applications to quantum engineering and quantum computation. Due to the high-dimensional nature of the problem, customized deep generative neural networks have been instrumental in modeling the high-dimensional density matrix ρ, which is the key description for the dynamics of such systems. However, the complex-valued nature and normalization constraints of ρ, as well as its complicated dynamics, prohibit a seamless connection between open quantum systems and the recent advances in deep generative modeling. Here we lift that limitation by utilizing a reformulation of open quantum system dynamics to a partial differential equation (PDE) for a corresponding probability distribution Q, the Husimi Q function. Thus, we model the Q function seamlessly with off-the-shelf deep generative models such as normalizing flows. Additionally, we develop novel methods for learning normalizing flow evolution governed by high-dimensional PDEs, based on the Euler method and the application of the time-dependent variational principle. We name the resulting approach Q-Flow and demonstrate the scalability and efficiency of Q-Flow on open quantum system simulations, including the dissipative harmonic oscillator and the dissipative bosonic model. Q-Flow is superior to conventional PDE solvers and state-of-the-art physics-informed neural network solvers, especially in high-dimensional systems.

Geometry of contact: contact planning for multi-legged robots via spin models duality
Baxi Chong, Di Luo, Tianyu Wang, Gabriel Margolis, Juntao He, Pulkit Agrawal, Marin Soljačić, Daniel I. Goldman
[ arXiv:2302.03019 ]

Abstract Contact planning is crucial in locomoting systems. Specifically, appropriate contact planning can enable versatile behaviors (e.g., sidewinding in limbless locomotors) and facilitate speed-dependent gait transitions (e.g., walk-trot-gallop in quadrupedal locomotors). The challenges of contact planning include determining not only the sequence by which contact is made and broken between the locomotor and the environment, but also the sequence of internal shape changes (e.g., body bending and limb shoulder joint oscillation). Most state-of-the-art contact planning algorithms have focused on conventional robots (e.g., biped and quadruped) and conventional tasks (e.g., forward locomotion), and there is a lack of study on general contact planning in multi-legged robots. In this paper, we show that using a geometric mechanics framework, we can obtain the global optimal contact sequence given the internal shape changes sequence. Therefore, we simplify the contact planning problem to a graph optimization problem to identify the internal shape changes. Taking advantage of the spatio-temporal symmetry in locomotion, we map the graph optimization problem to special cases of spin models, which allows us to obtain the global optima in polynomial time. We apply our approach to develop new forward and sidewinding behaviors in a hexapod and a 12-legged centipede. We verify our predictions using numerical and robophysical models, and obtain novel and effective locomotion behaviors.

Simulating 2+1D Lattice Quantum Electrodynamics at Finite Density with Neural Flow Wavefunctions
Zhuo Chen, Di Luo, Kaiwen Hu, Bryan K. Clark
[ arXiv:2212.06835 ]

Abstract We present a neural flow wavefunction, Gauge-Fermion FlowNet, and use it to simulate 2+1D lattice compact quantum electrodynamics with finite density dynamical fermions. The gauge field is represented by a neural network which parameterizes a discretized flow-based transformation of the amplitude while the fermionic sign structure is represented by a neural net backflow. This approach directly represents the U(1) degree of freedom without any truncation, obeys Gauss's law by construction, samples autoregressively avoiding any equilibration time, and variationally simulates Gauge-Fermion systems with sign problems accurately. In this model, we investigate confinement and string breaking phenomena in different fermion density and hopping regimes. We study the phase transition from the charge crystal phase to the vacuum phase at zero density, and observe the phase separation and the net charge penetration blocking effect under magnetic interaction at finite density. In addition, we investigate a magnetic phase transition due to the competition effect between the kinetic energy of fermions and the magnetic energy of the gauge field. With our method, we further note potential differences on the order of the phase transitions between a continuous U(1) system and one with finite truncation. Our state-of-the-art neural network approach opens up new possibilities to study different gauge theories coupled to dynamical matter in higher dimensions.

Gauge Equivariant Neural Networks for 2+1D U(1) Gauge Theory Simulations in Hamiltonian Formulation
Di Luo, Shunyue Yuan, James Stokes, Bryan K. Clark
[ arXiv:2211.03198 ]

Abstract Gauge Theory plays a crucial role in many areas in science, including high energy physics, condensed matter physics and quantum information science. In quantum simulations of lattice gauge theory, an important step is to construct a wave function that obeys gauge symmetry. In this paper, we have developed gauge equivariant neural network wave function techniques for simulating continuous-variable quantum lattice gauge theories in the Hamiltonian formulation. We have applied the gauge equivariant neural network approach to find the ground state of 2+1-dimensional lattice gauge theory with U(1) gauge group using variational Monte Carlo. We have benchmarked our approach against the state-of-the-art complex Gaussian wave functions, demonstrating improved performance in the strong coupling regime and comparable results in the weak coupling regime.

QuACK: Accelerating Gradient-Based Quantum Optimization with Koopman Operator Learning
Di Luo, Jiayu Shen, Rumen Dangovski, Marin Soljačić
[ arXiv:2211.01365 ]

Abstract Finding efficient optimization methods plays an important role for quantum optimization and quantum machine learning on near-term quantum computers. While backpropagation on classical computers is computationally efficient, obtaining gradients on quantum computers is not, because the computational complexity usually scales with the number of parameters and measurements. In this paper, we connect Koopman operator theory, which has been successful in predicting nonlinear dynamics, with natural gradient methods in quantum optimization. We propose a data-driven approach using Koopman operator learning to accelerate quantum optimization and quantum machine learning. We develop two new families of methods: the sliding window dynamic mode decomposition (DMD) and the neural DMD for efficiently updating parameters on quantum computers. We show that our methods can predict gradient dynamics on quantum computers and accelerate the variational quantum eigensolver used in quantum optimization, as well as quantum machine learning. We further implement our Koopman operator learning algorithm on a real IBM quantum computer and demonstrate its practical effectiveness.

Learning to Optimize Quasi-Newton Methods
Isaac Liao, Rumen R. Dangovski, Jakob N. Foerster, Marin Soljačić
[ arXiv:2210.06171 ]

Abstract We introduce a novel machine learning optimizer called LODO, which online meta-learns an implicit inverse Hessian of the loss as a subroutine of quasi-Newton optimization. Our optimizer merges Learning to Optimize (L2O) techniques with quasi-Newton methods to learn neural representations of symmetric matrix vector products, which are more flexible than those in other quasi-Newton methods. Unlike other L2O methods, ours does not require any meta-training on a training task distribution, and instead learns to optimize on the fly while optimizing on the test task, adapting to the local characteristics of the loss landscape while traversing it. Theoretically, we show that our optimizer approximates the inverse Hessian in noisy loss landscapes and is capable of representing a wide range of inverse Hessians. We experimentally verify our algorithm's performance in the presence of noise, and show that simpler alternatives for representing the inverse Hessians worsen performance. Lastly, we use our optimizer to train a semi-realistic deep neural network with 95k parameters, and obtain competitive results against standard neural network optimizers.

On the Importance of Calibration in Semi-supervised Learning
Charlotte Loh, Rumen Dangovski, Shivchander Sudalairaj, Seungwook Han, Ligong Han, Leonid Karlinsky, Marin Soljačić, Akash Srivastava
[ arXiv:2210.04783 ]

Abstract State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data by combining techniques of consistency regularization and pseudo-labeling. During pseudo-labeling, the model's predictions on unlabeled data are used for training and thus, model calibration is important in mitigating confirmation bias. Yet, many SOTA methods are optimized for model performance, with little focus directed to improve model calibration. In this work, we empirically demonstrate that model calibration is strongly correlated with model performance and propose to improve calibration via approximate Bayesian techniques. We introduce a family of new SSL models that optimizes for calibration and demonstrate their effectiveness across standard vision benchmarks of CIFAR-10, CIFAR-100 and ImageNet, giving up to 15.9% improvement in test accuracy. Furthermore, we also demonstrate their effectiveness in additional realistic and challenging problems, such as class-imbalanced datasets and in photonics science.

Sampling QCD field configurations with gauge-equivariant flow models
Ryan Abbott, Michael S. Albergo, Aleksandar Botev, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Alexander G. D. G. Matthews, Sébastien Racanière, Ali Razavi, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Julian M. Urban
[ arXiv:2208.03832 ]

Abstract Machine learning methods based on normalizing flows have been shown to address important challenges, such as critical slowing-down and topological freezing, in the sampling of gauge field configurations in simple lattice field theories. A critical question is whether this success will translate to studies of QCD. This Proceedings presents a status update on advances in this area. In particular, it is illustrated how recently developed algorithmic components may be combined to construct flow-based sampling algorithms for QCD in four dimensions. The prospects and challenges for future use of this approach in at-scale applications are summarized.

Strong Lensing Source Reconstruction Using Continuous Neural Fields
Siddharth Mishra-Sharma, Ge Yang
[ arXiv:2206.14820 ]

Abstract From the nature of dark matter to the rate of expansion of our Universe, observations of distant galaxies distorted through strong gravitational lensing have the potential to answer some of the major open questions in astrophysics. Modeling galaxy-galaxy strong lensing observations presents a number of challenges as the exact configuration of both the background source and foreground lens galaxy is unknown. A timely call, prompted by a number of upcoming surveys anticipating high-resolution lensing images, demands methods that can efficiently model lenses at their full complexity. In this work, we introduce a method that uses continuous neural fields to non-parametrically reconstruct the complex morphology of a source galaxy while simultaneously inferring a distribution over foreground lens galaxy configurations. We demonstrate the efficacy of our method through experiments on simulated data targeting high-resolution lensing images similar to those anticipated in near-future astrophysical surveys.

Simplifying Polylogarithms with Machine Learning
Aurélien Dersy, Matthew D. Schwartz, Xiaoyuan Zhang
[ arXiv:2206.04115 ]

Abstract Polylogarithmic functions, such as the logarithm or dilogarithm, satisfy a number of algebraic identities. For the logarithm, all the identities follow from the product rule. For the dilogarithm and higher-weight classical polylogarithms, the identities can involve five functions or more. In many calculations relevant to particle physics, complicated combinations of polylogarithms often arise from Feynman integrals. Although the initial expressions resulting from the integration usually simplify, it is often difficult to know which identities to apply and in what order. To address this bottleneck, we explore to what extent machine learning methods can help. We consider both a reinforcement learning approach, where the identities are analogous to moves in a game, and a transformer network approach, where the problem is viewed analogously to a language-translation task. While both methods are effective, the transformer network appears more powerful and holds promise for practical use in symbolic manipulation tasks in mathematical physics.

Identifying equivalent Calabi–Yau topologies: A discrete challenge from math and physics for machine learning
Vishnu Jejjala, Washington Taylor, Andrew Turner
[ arXiv:2202.07590 ]

Abstract We review briefly the characteristic topological data of Calabi–Yau threefolds and focus on the question of when two threefolds are equivalent through related topological data. This provides an interesting test case for machine learning methodology in discrete mathematics problems motivated by physics.

Finite-Volume Pionless Effective Field Theory for Few-Nucleon Systems with Differentiable Programming
Xiangkai Sun, William Detmold, Di Luo, Phiala E. Shanahan
[ arXiv:2202.03530 ]

Abstract Finite-volume pionless effective field theory provides an efficient framework for the extrapolation of nuclear spectra and matrix elements calculated at finite volume in lattice QCD to infinite volume, and to nuclei with larger atomic number. In this work, it is demonstrated how this framework may be implemented via a set of correlated Gaussian wavefunctions optimised using differentiable programming and via solution of a generalised eigenvalue problem. This approach is shown to be significantly more efficient than a stochastic implementation of the variational method based on the same form of correlated Gaussian wavefunctions, yielding comparably accurate representations of the ground-state wavefunctions with an order of magnitude fewer terms. The efficiency of representation allows such calculations to be extended to larger systems than in previous work. The method is demonstrated through calculations of the binding energies of nuclei with atomic number A∈{2,3,4} in finite volume, matched to lattice QCD calculations at quark masses corresponding to m_π = 806 MeV, and infinite-volume effective field theory calculations of A∈{2,3,4,5,6} systems based on this matching.

Building Quantum Field Theories Out of Neurons
James Halverson
[ arXiv:2112.04527 ]

Abstract An approach to field theory is studied in which fields are comprised of N constituent random neurons. Gaussian theories arise in the infinite-N limit when neurons are independently distributed, via the Central Limit Theorem, while interactions arise due to finite-N effects or non-independently distributed neurons. Euclidean-invariant ensembles of neurons are engineered, with tunable two-point function, yielding families of Euclidean-invariant field theories. Some Gaussian, Euclidean invariant theories are reflection positive, which allows for analytic continuation to a Lorentz-invariant quantum field theory. Examples are presented that yield dual theories at infinite-N, but have different symmetries at finite-N. Landscapes of classical field configurations are determined by local maxima of parameter distributions. Predictions arise from mixed field-neuron correlators. Near-Gaussianity is exhibited at large-N, potentially explaining a feature of field theories in Nature.

Quantum reservoir computing using arrays of Rydberg atoms
Rodrigo Araiza Bravo, Khadijeh Najafi, Xun Gao, Susanne F. Yelin
[ arXiv:2111.10956 ]

Abstract Quantum computing promises to provide machine learning with computational advantages. However, noisy intermediate-scale quantum (NISQ) devices pose engineering challenges to realizing quantum machine learning (QML) advantages. Recently, a series of QML computational models inspired by the noise-tolerant dynamics in the brain have emerged as a means to circumvent the hardware limitations of NISQ devices. In this article, we introduce a quantum version of a recurrent neural network (RNN), a well-known model for neural circuits in the brain. Our quantum RNN (qRNN) makes use of the natural Hamiltonian dynamics of an ensemble of interacting spin-1/2 particles as a means for computation. In the limit where the Hamiltonian is diagonal, the qRNN recovers the dynamics of the classical version. Beyond this limit, we observe that the quantum dynamics of the qRNN provide it with quantum computational features that can aid it in computation. To this end, we study a qRNN based on arrays of Rydberg atoms, and show that the qRNN is indeed capable of replicating the learning of several cognitive tasks such as multitasking, decision making, and long-term memory by taking advantage of several key features of this platform such as interatomic species interactions, and quantum many-body scars.

Inferring dark matter substructure with astrometric lensing beyond the power spectrum
Siddharth Mishra-Sharma
[ arXiv:2110.01620 ]

Abstract Astrometry -- the precise measurement of positions and motions of celestial objects -- has emerged as a promising avenue for characterizing the dark matter population in our Galaxy. By leveraging recent advances in simulation-based inference and neural network architectures, we introduce a novel method to search for global dark matter-induced gravitational lensing signatures in astrometric datasets. Our method based on neural likelihood-ratio estimation shows significantly enhanced sensitivity to a cold dark matter population and more favorable scaling with measurement noise compared to existing approaches based on two-point correlation statistics, establishing machine learning as a powerful tool for characterizing dark matter using astrometric data.

Flow-based sampling for multimodal distributions in lattice field theory
Daniel C. Hackett, Chung-Chun Hsieh, Michael S. Albergo, Denis Boyda, Jiunn-Wei Chen, Kai-Feng Chen, Kyle Cranmer, Gurtej Kanwar, Phiala E. Shanahan
[ arXiv:2107.00734 ]

Abstract Recent results have demonstrated that samplers constructed with flow-based generative models are a promising new approach for configuration generation in lattice field theory. In this paper, we present a set of methods to construct flow models for targets with multiple separated modes (i.e. theories with multiple vacua). We demonstrate the application of these methods to modeling two-dimensional real scalar field theory in its symmetry-broken phase. In this context we investigate the performance of different flow-based sampling algorithms, including a composite sampling algorithm where flow-based proposals are occasionally augmented by applying updates using traditional algorithms like HMC.

Symmetry-via-Duality: Invariant Neural Network Densities from Parameter-Space Correlators
Anindita Maiti, Keegan Stoner, James Halverson
[ arXiv:2106.00694 ]

Abstract Parameter-space and function-space provide two different duality frames in which to study neural networks. We demonstrate that symmetries of network densities may be determined via dual computations of network correlation functions, even when the density is unknown and the network is not equivariant. Symmetry-via-duality relies on invariance properties of the correlation functions, which stem from the choice of network parameter distributions. Input and output symmetries of neural network densities are determined, which recover known Gaussian process results in the infinite width limit. The mechanism may also be utilized to determine symmetries during training, when parameters are correlated, as well as symmetries of the Neural Tangent Kernel. We demonstrate that the amount of symmetry in the initialization density affects the accuracy of networks trained on Fashion-MNIST, and that symmetry breaking helps only when it is in the direction of ground truth.

Introduction to Normalizing Flows for Lattice Field Theory
Michael S. Albergo, Denis Boyda, Daniel C. Hackett, Gurtej Kanwar, Kyle Cranmer, Sébastien Racanière, Danilo Jimenez Rezende, Phiala E. Shanahan
[ arXiv:2101.08176 ]

Abstract This notebook tutorial demonstrates a method for sampling Boltzmann distributions of lattice field theories using a class of machine learning models known as normalizing flows. The ideas and approaches proposed in arXiv:1904.12072, arXiv:2002.02428, and arXiv:2003.06413 are reviewed and a concrete implementation of the framework is presented. We apply this framework to a lattice scalar field theory and to U(1) gauge theory, explicitly encoding gauge symmetries in the flow-based approach to the latter. This presentation is intended to be interactive and working with the attached Jupyter notebook is recommended.

Published

Advances in machine-learning-based sampling motivated by lattice quantum chromodynamics
Kyle Cranmer, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Phiala E. Shanahan
Nature Reviews Physics, 2023, Volume 5 [ arXiv:2309.01156 ]

Abstract Sampling from known probability distributions is a ubiquitous task in computational science, underlying calculations in domains from linguistics to biology and physics. Generative machine-learning (ML) models have emerged as a promising tool in this space, building on the success of this approach in applications such as image, text, and audio generation. Often, however, generative tasks in scientific domains have unique structures and features -- such as complex symmetries and the requirement of exactness guarantees -- that present both challenges and opportunities for ML. This Perspective outlines the advances in ML-based sampling motivated by lattice quantum field theory, in particular for the theory of quantum chromodynamics. Enabling calculations of the structure and interactions of matter from our most fundamental understanding of particle physics, lattice quantum chromodynamics is one of the main consumers of open-science supercomputing worldwide. The design of ML algorithms for this application faces profound challenges, including the necessity of scaling custom ML architectures to the largest supercomputers, but also promises immense benefits, and is spurring a wave of development in ML-based sampling more broadly. In lattice field theory, if this approach can realize its early promise it will be a transformative step towards first-principles physics calculations in particle, nuclear and condensed matter physics that are intractable with traditional approaches.

Subhalo effective density slope measurements from HST strong lensing data with neural likelihood-ratio estimation
Gemma Zhang, Atınç Çağan Şengül, Cora Dvorkin
Monthly Notices of the Royal Astronomical Society, 2024, Volume 527, Issue 2 [ arXiv:2308.09739 ]

Abstract Examining the properties of subhalos with strong gravitational lensing images can shed light on the nature of dark matter. From upcoming large-scale surveys, we expect to discover orders of magnitude more strong lens systems that can be used for subhalo studies. To optimally extract information from a large number of strong lensing images, machine learning provides promising avenues for efficient analysis that is unachievable with traditional analysis methods, but application of machine learning techniques to real observations is still limited. We build upon previous work, which uses a neural likelihood-ratio estimator, to constrain the effective density slopes of subhalos and demonstrate the feasibility of this method on real strong lensing observations. To do this, we implement significant improvements to the forward simulation pipeline and undertake careful model evaluation using simulated images. Ultimately, we use our trained model to predict the effective subhalo density slope from combining a set of strong lensing images taken by the Hubble Space Telescope. We found the subhalo slope measurement of this set of observations to be steeper than the slope predictions of cold dark matter subhalos. Our result adds to several previous works that also measured high subhalo slopes in observations. Although a possible explanation for this is that subhalos with steeper slopes are easier to detect due to selection effects and thus contribute to statistical bias, our result nevertheless points to the need for careful analysis of more strong lensing observations from future surveys.

Data Compression and Inference in Cosmology with Self-Supervised Machine Learning
Aizhan Akhmetzhanova, Siddharth Mishra-Sharma, Cora Dvorkin
Monthly Notices of the Royal Astronomical Society, 2023, Volume 527, Issue 3 [ arXiv:2308.09751 ]

Abstract The influx of massive amounts of data from current and upcoming cosmological surveys necessitates compression schemes that can efficiently summarize the data with minimal loss of information. We introduce a method that leverages the paradigm of self-supervised machine learning in a novel manner to construct representative summaries of massive datasets using simulation-based augmentations. Deploying the method on hydrodynamical cosmological simulations, we show that it can deliver highly informative summaries, which can be used for a variety of downstream tasks, including precise and accurate parameter inference. We demonstrate how this paradigm can be used to construct summary representations that are insensitive to prescribed systematic effects, such as the influence of baryonic physics. Our results indicate that self-supervised machine learning techniques offer a promising new approach for compression of cosmological data as well as its analysis.

Gravitational action for a massive Majorana fermion in 2d quantum gravity
Corinne de Lacroix, Harold Erbin, Vincent Lahoche
Journal of High Energy Physics, 2024, Volume 2024, Article number 68 [ arXiv:2308.08342 ]

Abstract We compute the gravitational action of a free massive Majorana fermion coupled to two-dimensional gravity on compact Riemann surfaces of arbitrary genus. The structure is similar to the case of the massive scalar. The small-mass expansion of the gravitational action yields the Liouville action at zeroth order, and we can identify the Mabuchi action at first order. While the massive Majorana action is a conformal deformation of the massless Majorana CFT, we find an action different from the one given by the David-Distler-Kawai (DDK) ansatz.

Lattice quantum chromodynamics at large isospin density: 6144 pions in a box
Ryan Abbott, William Detmold, Fernando Romero-López, Zohreh Davoudi, Marc Illa, Assumpta Parreño, Robert J. Perry, Phiala E. Shanahan, Michael L. Wagman
Physical Review D, 2023, Volume 108, Issue 11 [ arXiv:2307.15014 ]

Abstract We present an algorithm to compute correlation functions for systems with the quantum numbers of many identical mesons from lattice quantum chromodynamics (QCD). The algorithm is numerically stable and allows for the computation of n-pion correlation functions for n∈{1,…,N} using a single N×N matrix decomposition, improving on previous algorithms. We apply the algorithm to calculations of correlation functions with up to 6144 π^+'s using two ensembles of gauge field configurations generated with quark masses corresponding to a pion mass m_π = 170 MeV and spacetime volumes of (4.4^3 × 8.8) fm^4 and (5.8^3 × 11.6) fm^4. We also discuss statistical techniques for the analysis of such systems, in which the correlation functions vary over many orders of magnitude. In particular, we observe that the many-pion correlation functions are well approximated by log-normal distributions, allowing the extraction of the energies of these systems. Using these energies, the large-isospin-density, zero-baryon-density region of the QCD phase diagram is explored. A peak is observed in the energy density at an isospin chemical potential μ_I ∼ 1.5 m_π, signalling the transition into a Bose-Einstein condensed phase. The isentropic speed of sound in the medium is seen to exceed the ideal-gas (conformal) limit (c_s^2 ≤ 1/3) over a wide range of chemical potential before falling towards the asymptotic expectation at μ_I ∼ 15 m_π. These, and other thermodynamic observables, indicate that the isospin chemical potential must be large for the system to be well described by an ideal gas or perturbative QCD.
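
As a rough illustration of why the log-normal observation in the abstract above is useful: if log C is normally distributed with mean μ and variance σ^2, then the mean of C is exp(μ + σ^2/2), so the mean can be estimated from the statistics of log C instead of from the heavy-tailed samples themselves. The sketch below assumes strictly positive samples; the function name `lognormal_mean` is hypothetical, and this is not claimed to be the estimator used in the paper.

```python
import math


def lognormal_mean(samples):
    """Estimate E[C] for positive samples, assuming C is log-normally distributed.

    Uses E[C] = exp(mu + sigma^2 / 2), where mu and sigma^2 are the sample
    mean and (unbiased) sample variance of log C.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a variance")
    logs = [math.log(s) for s in samples]  # requires every sample > 0
    n = len(logs)
    mu = sum(logs) / n
    var = sum((x - mu) ** 2 for x in logs) / (n - 1)
    return math.exp(mu + 0.5 * var)
```

When the log-normal assumption holds, this estimate is typically less sensitive to rare large samples than the plain sample mean; when it does not hold, the estimate is biased.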

A Spectral Metric for Collider Geometry
Andrew J. Larkoski, Jesse Thaler
Journal of High Energy Physics, 2023, Volume 2023, Article number 107 [ arXiv:2305.03751 ]

Abstract By quantifying the distance between two collider events, one can triangulate a metric space and reframe collider data analysis as computational geometry. One popular geometric approach is to first represent events as an energy flow on an idealized celestial sphere and then define the metric in terms of optimal transport in two dimensions. In this paper, we advocate for representing events in terms of a spectral function that encodes pairwise particle angles and products of particle energies, which enables a metric distance defined in terms of one-dimensional optimal transport. This approach has the advantage of automatically incorporating obvious isometries of the data, like rotations about the colliding beam axis. It also facilitates first-principles calculations, since there are simple closed-form expressions for optimal transport in one dimension. Up to isometries and event sets of measure zero, the spectral representation is unique, so the metric on the space of spectral functions is a metric on the space of events. At lowest order in perturbation theory in electron-positron collisions, our metric is simply the summed squared invariant masses of the two event hemispheres. Going to higher orders, we present predictions for the distribution of metric distances between jets in fixed-order and resummed perturbation theory as well as in parton-shower generators. Finally, we speculate on whether the spectral approach could furnish a useful metric on the space of quantum field theories.
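
The closed-form character of one-dimensional optimal transport mentioned in the abstract above can be illustrated with a minimal sketch. It assumes two equal-size samples with uniform weights, for which the p-Wasserstein distance reduces to pairing sorted values. The function name `wasserstein_1d` is hypothetical, and this is a generic illustration, not the paper's spectral metric.

```python
def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size, uniformly weighted 1D samples.

    In one dimension the optimal transport plan pairs the i-th smallest point
    of one sample with the i-th smallest point of the other, so no
    optimization is required.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return cost ** (1.0 / p)
```

For example, `wasserstein_1d([0.0, 1.0], [2.0, 1.0], p=1)` pairs 0 with 1 and 1 with 2. Handling unequal weights or sample sizes would require comparing the full quantile functions instead.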

Level Crossings, Attractor Points and Complex Multiplication
Hamza Ahmed, Fabian Ruehle
Journal of High Energy Physics, 2023, Volume 2023, Article number 164 [ arXiv:2304.00027 ]

Abstract We study the complex structure moduli dependence of the scalar Laplacian eigenmodes for one-parameter families of Calabi-Yau n-folds in P^{n+1}. It was previously observed that some eigenmodes get lighter while others get heavier as a function of these moduli, which leads to eigenvalue crossing. We identify the cause for this behavior for the torus. We then show that at points in a sublocus of complex structure moduli space where Laplacian eigenmodes cross, the torus has complex multiplication. We speculate that the generalization to arbitrary Calabi-Yau manifolds could be that level crossing is related to rank one attractor points. To test this, we compute the eigenmodes numerically for the quartic K3 and the quintic threefold, and match crossings to CM and attractor points in these varieties. To quantify the error of our numerical methods, we also study the dependence of the numerical spectrum on the quality of the Calabi-Yau metric approximation, the number of points sampled from the Calabi-Yau variety, the truncation of the eigenbasis, and the distance from degeneration points in complex structure moduli space.

Exploring the CP-violating Dashen phase in the Schwinger model with tensor networks
Lena Funcke, Karl Jansen, Stefan Kühn
Physical Review D, 2023, Volume 108, Issue 1 [ arXiv:2303.03799 ]

Abstract We numerically study the phase structure of the two-flavor Schwinger model with matrix product states, focusing on the (1+1)-dimensional analog of the CP-violating Dashen phase in QCD. We simulate the two-flavor Schwinger model around the point where the positive mass of one fermion flavor corresponds to the negative mass of the other fermion flavor, which is a sign-problem afflicted regime for conventional Monte Carlo techniques. Our results indicate that the model undergoes a CP-violating Dashen phase transition at this point, which manifests itself in abrupt changes of the average electric field and the analog of the pion condensate in the model. Studying the scaling of the bipartite entanglement entropy as a function of the volume, we find clear indications that this transition is not of first order.

Computational Mirror Symmetry
Mehmet Demirtas, Manki Kim, Liam McAllister, Jakob Moritz, Andres Rios-Tascon
Journal of High Energy Physics, 2024, Volume 2024, Article number 184 [ arXiv:2303.00757 ]

Abstract We present an efficient algorithm for computing the prepotential in compactifications of type II string theory on mirror pairs of Calabi-Yau threefolds in toric varieties. Applying this method, we exhibit the first systematic computation of genus-zero Gopakumar-Vafa invariants in compact threefolds with many moduli, including examples with up to 491 vector multiplets.

SHAPER: Can You Hear the Shape of a Jet?
Demba Ba, Akshunna S. Dogra, Rikab Gambhir, Abiy Tasissa, Jesse Thaler
Journal of High Energy Physics, 2023, Volume 2023, Article 195 [ arXiv:2302.12266 ]

Abstract The identification of interesting substructures within jets is an important tool for searching for new physics and probing the Standard Model at colliders. Many of these substructure tools have previously been shown to take the form of optimal transport problems, in particular the Energy Mover's Distance (EMD). In this work, we show that the EMD is in fact the natural structure for comparing collider events, which accounts for its recent success in understanding event and jet substructure. We then present a Shape Hunting Algorithm using Parameterized Energy Reconstruction (SHAPER), which is a general framework for defining and computing shape-based observables. SHAPER generalizes N-jettiness from point clusters to any extended, parametrizable shape. This is accomplished by efficiently minimizing the EMD between events and parameterized manifolds of energy flows representing idealized shapes, implemented using the dual-potential Sinkhorn approximation of the Wasserstein metric. We show how the geometric language of observables as manifolds can be used to define novel observables with built-in infrared-and-collinear safety. We demonstrate the efficacy of the SHAPER framework by performing empirical jet substructure studies using several examples of new shape-based observables.
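
To make the Sinkhorn idea mentioned above concrete, here is a minimal sketch of entropy-regularized optimal transport using the plain matrix-scaling iteration. This is a simpler variant than the dual-potential formulation the abstract refers to, and it is not SHAPER's implementation; the function name `sinkhorn_cost`, the regularization `eps`, and the toy inputs are all assumptions.

```python
import math

def sinkhorn_cost(a, b, cost, eps=0.05, n_iter=500):
    """Approximate transport cost between weight vectors a and b.

    cost[i][j] is the ground cost of moving mass from i to j.
    Smaller eps approaches unregularized optimal transport but converges more slowly.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan is u[i] * kernel[i][j] * v[j]; contract it with the cost.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

# Toy check: moving one unit of "energy" across a distance-like cost of 2.
single = sinkhorn_cost([1.0], [1.0], [[2.0]])
```

In a shape-fitting setting, a cost of this kind would be minimized over the parameters of the idealized shape; that outer optimization is omitted here.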

EPiC-GAN: Equivariant Point Cloud Generation for Particle Jets
Erik Buhmann, Gregor Kasieczka, Jesse Thaler
SciPost Physics, 2023, Volume 15, Issue 4 [ arXiv:2301.08128 ]

Abstract With the vast data-collecting capabilities of current and future high-energy collider experiments, there is an increasing demand for computationally efficient simulations. Generative machine learning models enable fast event generation, yet so far these approaches are largely constrained to fixed data structures and rigid detector geometries. In this paper, we introduce EPiC-GAN - equivariant point cloud generative adversarial network - which can produce point clouds of variable multiplicity. This flexible framework is based on deep sets and is well suited for simulating sprays of particles called jets. The generator and discriminator utilize multiple EPiC layers with an interpretable global latent vector. Crucially, the EPiC layers do not rely on pairwise information sharing between particles, which leads to a significant speed-up over graph- and transformer-based approaches with more complex relation diagrams. We demonstrate that EPiC-GAN scales well to large particle multiplicities and achieves high generation fidelity on benchmark jet generation tasks.

Comparing Point Cloud Strategies for Collider Event Classification
Peter Onyisi, Delon Shen, Jesse Thaler
Physical Review D, 2023, Volume 108, Issue 1 [ arXiv:2212.10659 ]

Abstract In this paper, we compare several event classification architectures defined on the point cloud representation of collider events. These approaches, which are based on the frameworks of deep sets and edge convolutions, circumvent many of the difficulties associated with traditional feature engineering. To benchmark our architectures against more traditional event classification strategies, we perform a case study involving Higgs boson decays to tau leptons. We find a 2.5 times increase in performance compared to a baseline ATLAS analysis with engineered features. Our point cloud architectures can be viewed as simplified versions of graph neural networks, where each particle in the event corresponds to a graph node. In our case study, we find the best balance of performance and computational cost for simple pairwise architectures, which are based on learned edge features.

Characterizing 4-string contact interaction using machine learning
Harold Erbin, Atakan Hilmi Fırat
Journal of High Energy Physics, 2024, Article 16 [ arXiv:2211.09129 ]

Abstract The geometry of 4-string contact interaction of closed string field theory is characterized using machine learning. We obtain Strebel quadratic differentials on 4-punctured spheres as a neural network by performing unsupervised learning with a custom-built loss function. This allows us to solve for local coordinates and compute their associated mapping radii numerically. We also train a neural network distinguishing vertex from Feynman region. As a check, 4-tachyon contact term in the tachyon potential is computed and a good agreement with the results in the literature is observed. We argue that our algorithm is manifestly independent of number of punctures and scaling it to characterize the geometry of n-string contact interaction is feasible.

Aspects of scaling and scalability for flow-based sampling of lattice QCD
Ryan Abbott, Michael S. Albergo, Aleksandar Botev, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Alexander G. D. G. Matthews, Sébastien Racanière, Ali Razavi, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Julian M. Urban
The European Physical Journal A, 2023, Volume 59, Article number 257 [ arXiv:2211.07541 ]

Abstract Recent applications of machine-learned normalizing flows to sampling in lattice field theory suggest that such methods may be able to mitigate critical slowing down and topological freezing. However, these demonstrations have been at the scale of toy models, and it remains to be determined whether they can be applied to state-of-the-art lattice quantum chromodynamics calculations. Assessing the viability of sampling algorithms for lattice field theory at scale has traditionally been accomplished using simple cost scaling laws, but as we discuss in this work, their utility is limited for flow-based approaches. We conclude that flow-based approaches to sampling are better thought of as a broad family of algorithms with different scaling properties, and that scalability must be assessed experimentally.

Large-time correlation functions in bosonic lattice field theories
Cagin Yunus, William Detmold
Physics Letters B, 2023, Volume 840, 137890 [ arXiv:2210.15789 ]

Abstract Large-time correlation functions have a pivotal role in extracting particle masses from Euclidean lattice field theory calculations, however little is known about the statistical properties of these quantities. In this work, the asymptotic form of the distributions of the correlation functions at vanishing momentum is determined for bosonic interacting lattice field theories with a unique gapped vacuum. It is demonstrated that the deviations from the asymptotic form at large Euclidean times can be utilized to determine the spectrum of the theory.

Deep Learning for Bayesian Optimization of Scientific Problems with High-Dimensional Structure
Samuel Kim, Peter Y. Lu, Charlotte Loh, Jamie Smith, Jasper Snoek, Marin Soljačić
Transactions on Machine Learning Research 2022 [ ]

Abstract Bayesian optimization (BO) is a popular paradigm for global optimization of expensive black-box functions, but there are many domains where the function is not completely a black-box. The data may have some known structure (e.g. symmetries) and/or the data generation process may be a composite process that yields useful intermediate or auxiliary information in addition to the value of the optimization objective. However, surrogate models traditionally employed in BO, such as Gaussian Processes (GPs), scale poorly with dataset size and do not easily accommodate known structure. Instead, we use Bayesian neural networks, a class of scalable and flexible surrogate models with inductive biases, to extend BO to complex, structured problems with high dimensionality. We demonstrate BO on a number of realistic problems in physics and chemistry, including topology optimization of photonic crystal materials using convolutional neural networks, and chemical property optimization of molecules using graph neural networks. On these complex tasks, we show that neural networks often outperform GPs as surrogate models for BO in terms of both sampling efficiency and computational cost.

Symmetries of Calabi-Yau Prepotentials with Isomorphic Flops
Andre Lukas, Fabian Ruehle
Journal of High Energy Physics, 2023, Article 175 [ arXiv:2210.09369 ]

Abstract Calabi-Yau threefolds with infinitely many flops to isomorphic manifolds have an extended Kahler cone made up from an infinite number of individual Kahler cones. These cones are related by reflection symmetries across flop walls. We study the implications of this cone structure for mirror symmetry, by considering the instanton part of the prepotential in Calabi-Yau threefolds. We show that such isomorphic flops across facets of the Kahler cone boundary give rise to symmetry groups isomorphic to Coxeter groups. In the dual Mori cone, non-flopping curve classes that are identified under these groups have the same Gopakumar-Vafa invariants. This leads to instanton prepotentials invariant under Coxeter groups, which we make manifest by introducing appropriate invariant functions. For some cases, these functions can be expressed in terms of theta functions whose appearance can be linked to an elliptic fibration structure of the Calabi-Yau manifold.

Electric-Magnetic Duality in a Class of G2-Compactifications of M-theory
James Halverson, Benjamin Sung, Jiahua Tian
Journal of High Energy Physics, 2023, Volume 2023, Article 89 [ arXiv:2210.08628 ]

Abstract We study electric-magnetic duality in compactifications of M-theory on twisted connected sum (TCS) G2 manifolds via duality with F-theory. Specifically, we study the physics of the D3-branes in F-theory compactified on a Calabi-Yau fourfold Y, dual to a compactification of M-theory on a TCS G2 manifold X. 𝒩=2 supersymmetry is restored in an appropriate geometric limit. In that limit, we demonstrate that the dual of D3-branes probing seven-branes corresponds to the shrinking of certain surfaces and curves, yielding light particles that may carry both electric and magnetic charges. We provide evidence that the Minahan-Nemeschansky theories with E_n flavor symmetry may be realized in this way. The SL(2,ℤ) monodromy of the 3/7-brane system is dual to a Fourier-Mukai transform of the dual IIA/M-theory geometry in this limit, and we extrapolate this monodromy action to the global compactification. Away from the limit, the theory is broken to 𝒩=1 supersymmetry by a D-term.

Degeneracy Engineering for Classical and Quantum Annealing: A Case Study of Sparse Linear Regression in Collider Physics
Eric R. Anschuetz, Lena Funcke, Patrick T. Komiske, Serhii Kryhin, Jesse Thaler
Physical Review D, Volume 106, Article 056008 [ arXiv:2205.10375 ]

Abstract Classical and quantum annealing are computing paradigms that have been proposed to solve a wide range of optimization problems. In this paper, we aim to enhance the performance of annealing algorithms by introducing the technique of degeneracy engineering, through which the relative degeneracy of the ground state is increased by modifying a subset of terms in the objective Hamiltonian. We illustrate this novel approach by applying it to the example of ℓ0-norm regularization for sparse linear regression, which is in general an NP-hard optimization problem. Specifically, we show how to cast ℓ0-norm regularization as a quadratic unconstrained binary optimization (QUBO) problem, suitable for implementation on annealing platforms. As a case study, we apply this QUBO formulation to energy flow polynomials in high-energy collider physics, finding that degeneracy engineering substantially improves the annealing performance. Our results motivate the application of degeneracy engineering to a variety of regularized optimization problems.

Discovering Conservation Laws using Optimal Transport and Manifold Learning
Peter Y. Lu, Rumen Dangovski, Marin Soljačić
Nature Communications [ arXiv:2208.14995 ]

Abstract Conservation laws are key theoretical and practical tools for understanding, characterizing, and modeling nonlinear dynamical systems. However, for many complex dynamical systems, the corresponding conserved quantities are difficult to identify, making it hard to analyze their dynamics and build efficient, stable predictive models. Current approaches for discovering conservation laws often depend on detailed dynamical information, such as the equation of motion or fine-grained time measurements, with many recent proposals also relying on black box parametric deep learning methods. We instead reformulate this task as a manifold learning problem and propose a non-parametric approach, combining the Wasserstein metric from optimal transport with diffusion maps, to discover conserved quantities that vary across trajectories sampled from a dynamical system. We test this new approach on a variety of physical systems—including conservative Hamiltonian systems, dissipative systems, and spatiotemporal systems—and demonstrate that our manifold learning method is able to both identify the number of conserved quantities and extract their values. Using tools from optimal transport theory and manifold learning, our proposed method provides a direct geometric approach to identifying conservation laws that is both robust and interpretable without requiring an explicit model of the system nor accurate time information.

Inferring subhalo effective density slopes from strong lensing observations with neural likelihood-ratio estimation
Gemma Zhang, Siddharth Mishra-Sharma, Cora Dvorkin
Monthly Notices of the Royal Astronomical Society, 2022, Volume 517, Issue 3 [ arXiv:2208.13796 ]

Abstract Strong gravitational lensing has emerged as a promising approach for probing dark matter models on sub-galactic scales. Recent work has proposed the subhalo effective density slope as a more reliable observable than the commonly used subhalo mass function. The subhalo effective density slope is a measurement independent of assumptions about the underlying density profile and can be inferred for individual subhalos through traditional sampling methods. To go beyond individual subhalo measurements, we leverage recent advances in machine learning and introduce a neural likelihood-ratio estimator to infer an effective density slope for populations of subhalos. We demonstrate that our method is capable of harnessing the statistical power of multiple subhalos (within and across multiple images) to distinguish between characteristics of different subhalo populations. The computational efficiency warranted by the neural likelihood-ratio estimator over traditional sampling enables statistical studies of dark matter perturbers and is particularly useful as we expect an influx of strong lensing systems from upcoming surveys.

Gauge-equivariant flow models for sampling in lattice field theories with pseudofermions
Ryan Abbott, Michael S. Albergo, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Betsy Tian, Julian M. Urban
Physical Review D, 2022, Volume 106, Issue 7 [ arXiv:2207.08945 ]

Abstract This work presents gauge-equivariant architectures for flow-based sampling in fermionic lattice field theories using pseudofermions as stochastic estimators for the fermionic determinant. This is the default approach in state-of-the-art lattice field theory calculations, making this development critical to the practical application of flow models to theories such as QCD. Methods by which flow-based sampling approaches can be improved via standard techniques such as even/odd preconditioning and the Hasenbusch factorization are also outlined. Numerical demonstrations in two-dimensional U(1) and SU(3) gauge theories with Nf=2 flavors of fermions are provided.

Radio excess from stimulated dark matter decay
Andrea Caputo, Hongwan Liu, Siddharth Mishra-Sharma, Maxim Pospelov, Joshua T. Ruderman
Physical Review D, 2023, Volume 107, Issue 12 [ ]

Abstract Despite an intense theoretical and experimental effort over the past decade, observations of the extragalactic radio background at multiple frequencies below 10 GHz are not understood in terms of known radio sources and may represent a sign of new physics. In this paper, we identify a new class of dark sector models with feebly interacting particles, where dark photons oscillate into ordinary photons that contribute to the radio background. Our scenario can explain both the magnitude and the spectral index of the radio background, while being consistent with other cosmological and astrophysical constraints. These models predict new relativistic degrees of freedom and spectral distortions of the cosmic microwave background, which could be detected in the next generation of experiments.

Power Counting Energy Flow Polynomials
Pedro Cal, Jesse Thaler, Wouter J. Waalewijn
Journal of High Energy Physics, 2022, Article 21 [ arXiv:2205.06818 ]

Abstract Power counting is a systematic strategy for organizing collider observables and their associated theoretical calculations. In this paper, we use power counting to characterize a class of jet substructure observables called energy flow polynomials (EFPs). EFPs provide an overcomplete linear basis for infrared-and-collinear safe jet observables, but it is known that in practice, a small subset of EFPs is often sufficient for specific jet analysis tasks. By applying power counting arguments, we obtain linear relationships between EFPs that hold for quark and gluon jets to a specific order in the power counting. We test these relations in the parton shower generator Pythia, finding excellent agreement. Power counting allows us to truncate the basis of EFPs without affecting performance, which we corroborate through a study of quark-gluon tagging and regression.
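
For readers unfamiliar with EFPs, the sketch below evaluates one by brute force from its multigraph: each vertex is summed over all particles with an energy-fraction factor, and each edge contributes a pairwise angular factor. The function name `efp` and the toy jet are hypothetical, and any angular exponent is assumed to be already absorbed into `theta`.

```python
import itertools
import math

def efp(graph_edges, n_vertices, z, theta):
    """Energy flow polynomial for a multigraph given as a list of (k, l) edges.

    z[i] is the energy fraction of particle i and theta[i][j] the pairwise
    angular distance. Brute force over all index assignments, so the cost
    grows as len(z) ** n_vertices; only suitable for tiny examples.
    """
    total = 0.0
    for idx in itertools.product(range(len(z)), repeat=n_vertices):
        term = math.prod(z[i] for i in idx)      # one energy factor per vertex
        for k, l in graph_edges:
            term *= theta[idx[k]][idx[l]]         # one angular factor per edge
        total += term
    return total

# Toy two-particle "jet": equal energy fractions, unit angular separation.
z = [0.5, 0.5]
theta = [[0.0, 1.0], [1.0, 0.0]]
single_edge = efp([(0, 1)], 2, z, theta)          # graph with one edge
```

Linear relations of the kind discussed in the abstract would be statements that certain combinations of such values vanish to a given order; testing that requires realistic jets and is beyond this toy.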

Disentangling Quarks and Gluons with CMS Open Data
Patrick T. Komiske, Serhii Kryhin, Jesse Thaler
Physical Review D, 2022, Volume 106, Article 094021 [ arXiv:2205.04459 ]

Abstract We study quark and gluon jets separately using public collider data from the CMS experiment. Our analysis is based on 2.3/fb of proton-proton collisions at 7 TeV, collected at the Large Hadron Collider in 2011. We define two non-overlapping samples via a pseudorapidity cut -- central jets with |eta| < 0.65 and forward jets with |eta| > 0.65 -- and employ jet topic modeling to extract individual distributions for the maximally separable categories. Under certain assumptions, such as sample independence and mutual irreducibility, these categories correspond to "quark" and "gluon" jets, as given by a recently proposed operational definition. We consider a number of different methods for extracting reducibility factors from the central and forward datasets, from which the fractions of quark jets in each sample can be determined. The greatest stability and robustness to statistical uncertainties is achieved by a novel method based on parametrizing the endpoints of a receiver operating characteristic (ROC) curve. To mitigate detector effects, which would otherwise induce unphysical differences between central and forward jets, we use the OmniFold method to perform central value unfolding. As a demonstration of the power of this method, we extract the intrinsic dimensionality of the quark and gluon jet samples, which exhibit Casimir scaling, as expected from the strongly-ordered limit. To our knowledge, this work is the first application of full phase space unfolding to real collider data, and one of the first applications of topic modeling to extract separate quark and gluon distributions at the LHC.
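
The link between reducibility factors and quark fractions described above can be sketched with binned toy distributions. This example uses the simplest estimator, the minimum bin-by-bin ratio of two histograms, rather than the ROC-endpoint parametrization the abstract favours. The names `reducibility` and `topic_fractions`, the toy templates, and the assumption that sample A is richer in the first category are all illustrative.

```python
def reducibility(p, q):
    """kappa(p|q): minimum over bins of p/q, i.e. how much of q can be subtracted from p."""
    return min(pi / qi for pi, qi in zip(p, q) if qi > 0)

def topic_fractions(p_a, p_b):
    """Fractions of the first category in samples A and B.

    Assumes the two underlying categories are mutually irreducible and that
    A contains a larger fraction of the first category than B.
    """
    k_ab = reducibility(p_a, p_b)
    k_ba = reducibility(p_b, p_a)
    f_a = (1 - k_ab) / (1 - k_ab * k_ba)
    return f_a, k_ba * f_a

def mix(f, first, second):
    """Histogram of a sample with fraction f of `first` and 1 - f of `second`."""
    return [f * x + (1 - f) * y for x, y in zip(first, second)]

# Toy templates (assumption): each has a bin where the other vanishes.
quark_like = [0.5, 0.5, 0.0]
gluon_like = [0.0, 0.5, 0.5]
central = mix(0.7, quark_like, gluon_like)
forward = mix(0.4, quark_like, gluon_like)

f_central, f_forward = topic_fractions(central, forward)
```

With real data the minimum ratio is sensitive to low-statistics bins, which is one motivation for more robust extraction methods such as the one the abstract describes.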

Infinite Variance in Monte Carlo Sampling of Lattice Field Theories
Cagin Yunus, William Detmold
Physical Review D, Volume 106, Article 094506 [ arXiv:2205.01001 ]

Abstract In Monte Carlo calculations of expectation values in lattice quantum field theories, the stochastic variance of the sampling procedure that is used defines the precision of the calculation for a fixed number of samples. If the variance of an estimator of a particular quantity is formally infinite, or in practice very large compared to the square of the mean, then that quantity can not be reliably estimated using the given sampling procedure. There are multiple scenarios in which this occurs, including in Lattice Quantum Chromodynamics, and a particularly simple example is given by the Gross-Neveu model where Monte Carlo calculations involve the introduction of auxiliary bosonic variables through a Hubbard-Stratonovich (HS) transformation. Here, it is shown that the variances of HS estimators for classes of operators involving fermion fields are divergent in this model and an even simpler zero-dimensional analogue. To correctly estimate these observables, two alternative sampling methods are proposed and numerically investigated.

Going Beyond the Galaxy Power Spectrum: an Analysis of BOSS Data with Wavelet Scattering Transforms
Georgios Valogiannis, Cora Dvorkin
Physical Review D, 2022, Volume 106, Article 103509 [ arXiv:2204.13717 ]

Abstract We perform the first application of the wavelet scattering transform (WST) on actual galaxy observations, through a WST analysis of the BOSS DR12 CMASS dataset. We lay out the detailed procedure on how to capture all necessary layers of realism for an application on data obtained from a spectroscopic survey, including the effects of redshift-space anisotropy, non-trivial survey geometry, the shortcomings of the dataset through a set of systematic weights and the Alcock-Paczynski distortion effect. In order to capture the cosmological dependence of the WST, we use galaxy mocks obtained from the state-of-the-art ABACUSSUMMIT simulations, tuned to match the anisotropic correlation function of the BOSS CMASS sample in the redshift range 0.46<z<0.60. Using our theory model for the WST coefficients, as well as for the first 2 multipoles of the galaxy power spectrum, that we use as reference, we perform a likelihood analysis of the CMASS data and obtain the posterior probability distributions of 4 cosmological parameters, {ω_b, ω_c, n_s, σ_8}, as well as the Hubble constant, derived from a fixed value of the angular size of the sound horizon at last scattering measured by the Planck satellite, all of which are marginalized over the 7 nuisance parameters of the Halo Occupation Distribution model. The WST is found to deliver a substantial improvement in the values of the predicted 1σ errors compared to the regular power spectrum, which are tighter by a factor in the range 3−6 in the case of flat and uninformative priors and by a factor of 4−28, when a Big Bang Nucleosynthesis prior is applied on the value of ω_b. Furthermore, in the latter case, we obtain a 0.6% measurement of the Hubble constant. Our results are investigative and subject to certain approximations in our analysis, that we discuss in the text.

Creating Simple, Interpretable Anomaly Detectors for New Physics in Jet Substructure
Layne Bradshaw, Spencer Chang, Bryan Ostdiek
Physical Review D, 2022, Volume 106, Article 035014 [ arXiv:2203.01343 ]

Abstract Anomaly detection with convolutional autoencoders is a popular method to search for new physics in a model-agnostic manner. These techniques are powerful, but they are still a "black box," since we do not know what high-level physical observables determine how anomalous an event is. To address this, we adapt a recently proposed technique by Faucett et al., which maps out the physical observables learned by a neural network classifier, to the case of anomaly detection. We propose two different strategies that use a small number of high-level observables to mimic the decisions made by the autoencoder on background events. Despite the underlying differences in their approach, we find that both strategies have similar ordering performance as the autoencoder and independently use the same five high-level observables. From there, we compare the performance of these networks as anomaly detectors. We find that both strategies perform similarly to the autoencoder across a variety of signals, giving a nontrivial demonstration that learning to order background events transfers to ordering a variety of signal events.

Flow-based sampling in the lattice Schwinger model at criticality
Michael S. Albergo, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Julian M. Urban
Physical Review D, 2022, Volume 106, Article 014514 [ arXiv:2202.11712 ]

Abstract Recent results suggest that flow-based algorithms may provide efficient sampling of field distributions for lattice field theory applications, such as studies of quantum chromodynamics and the Schwinger model. In this work, we provide a numerical demonstration of robust flow-based sampling in the Schwinger model at the critical value of the fermion mass. In contrast, at the same parameters, conventional methods fail to sample all parts of configuration space, leading to severely underestimated uncertainties.

Parton physics from a heavy-quark operator product expansion: Lattice QCD calculation of the second moment of the pion distribution amplitude
William Detmold, Anthony Grebe, Issaku Kanamori, C.-J. David Lin, Santanu Mondal, Robert Perry, Yong Zhao
Physical Review D, Volume 105, Article 034506 [ arXiv:2109.15241 ]

Abstract The pion light-cone distribution amplitude (LCDA) is a central non-perturbative object of interest for high-energy exclusive processes in quantum chromodynamics. In this article, the second Mellin moment of the pion LCDA is determined as a proof-of-concept calculation for the first numerical implementation of the heavy-quark operator product expansion (HOPE) method. The resulting value for the second Mellin moment, determined in quenched QCD at a pion mass of m_π=550 MeV at a factorization scale of 2 GeV is ⟨ξ^2⟩=0.210±0.013 (stat.)±0.034 (sys.). This result is compatible with those from previous determinations of this quantity.
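
For orientation, the Mellin moments quoted above are usually defined as integrals of the LCDA over the momentum-fraction difference ξ. The block below states that standard convention and, purely as a worked comparison that is not taken from the abstract, evaluates the second moment of the asymptotic form of the distribution amplitude; the symbol φ_π and the normalization are assumptions of this sketch.

```latex
% Standard convention (assumed here): unit-normalized LCDA on -1 <= xi <= 1
\langle \xi^n \rangle = \int_{-1}^{1} d\xi \, \xi^n \, \phi_\pi(\xi, \mu),
\qquad
\int_{-1}^{1} d\xi \, \phi_\pi(\xi, \mu) = 1 .

% Worked comparison: asymptotic form \phi_{\rm as}(\xi) = \tfrac{3}{4}(1 - \xi^2)
\langle \xi^2 \rangle_{\rm as}
  = \frac{3}{4} \int_{-1}^{1} d\xi \, (\xi^2 - \xi^4)
  = \frac{3}{4} \left( \frac{2}{3} - \frac{2}{5} \right)
  = \frac{1}{5} .
```

The asymptotic value of 0.2 gives a convenient reference point when reading the quoted lattice result and its uncertainties.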

Strictification and gluing of Lagrangian distributions on derived schemes with shifted symplectic forms
Dennis Borisov, Ludmil Katzarkov, Artan Sheshmani, Shing-Tung Yau
Advances in Mathematics, 2024, Volume 438 [ arXiv:1908.00651 ]

Abstract A strictification result is proved for isotropic distributions on derived schemes equipped with negatively shifted homotopically closed 2-forms. It is shown that any derived scheme over ℂ equipped with a −2-shifted symplectic structure, and having a Hausdorff space of classical points, admits a globally defined Lagrangian distribution as a dg C^∞-manifold.

Towards Quantum Simulations in Particle Physics and Beyond on Noisy Intermediate-Scale Quantum Devices
Lena Funcke, Tobias Hartung, Karl Jansen, Stefan Kühn, Manuel Schneider, Paolo Stornati, Xiaoyang Wang
Philosophical Transactions of the Royal Society A [ arXiv:2110.03809 ]

Abstract We review two algorithmic advances that bring us closer to reliable quantum simulations of model systems in high energy physics and beyond on noisy intermediate-scale quantum (NISQ) devices. The first method is the dimensional expressivity analysis of quantum circuits, which allows for constructing minimal but maximally expressive quantum circuits. The second method is an efficient mitigation of readout errors on quantum devices. Both methods can lead to significant improvements in quantum simulations, e.g., when variational quantum eigensolvers are used.

SymmetryGAN: Symmetry Discovery with Deep Learning
Krish Desai, Benjamin Nachman, Jesse Thaler
Physical Review D, 2022, Volume 105, Article 096031 [ arXiv:2112.05722 ]

Abstract What are the symmetries of a dataset? Whereas the symmetries of an individual data element can be characterized by its invariance under various transformations, the symmetries of an ensemble of data elements are ambiguous due to Jacobian factors introduced while changing coordinates. In this paper, we provide a rigorous statistical definition of the symmetries of a dataset, which involves inertial reference densities, in analogy to inertial frames in classical mechanics. We then propose SymmetryGAN as a novel and powerful approach to automatically discover symmetries using a deep learning method based on generative adversarial networks (GANs). When applied to Gaussian examples, SymmetryGAN shows excellent empirical performance, in agreement with expectations from the analytic loss landscape. SymmetryGAN is then applied to simulated dijet events from the Large Hadron Collider (LHC) to demonstrate the potential utility of this method in high energy collider physics applications. Going beyond symmetry discovery, we consider procedures to infer the underlying symmetry group from empirical data.

PQ Axiverse
Mehmet Demirtas, Naomi Gendler, Cody Long, Liam McAllister, Jakob Moritz
Journal of High Energy Physics, 2023, Volume 2023, Article number 92 [ arXiv:2112.04503 ]

Abstract We show that the strong CP problem is solved in a large class of compactifications of string theory. The Peccei-Quinn mechanism solves the strong CP problem if the CP-breaking effects of the ultraviolet completion of gravity and of QCD are small compared to the CP-preserving axion potential generated by low-energy QCD instantons. We characterize both classes of effects. To understand quantum gravitational effects, we consider an ensemble of flux compactifications of type IIB string theory on orientifolds of Calabi-Yau hypersurfaces in the geometric regime, taking a simple model of QCD on D7-branes. We show that the D-brane instanton contribution to the neutron electric dipole moment falls exponentially in N^4, with N the number of axions. In particular, this contribution is negligible in all models in our ensemble with N>17. We interpret this result as a consequence of large N effects in the geometry that create hierarchies in instanton actions and also suppress the ultraviolet cutoff. We also compute the CP breaking due to high-energy instantons in QCD. In the absence of vectorlike pairs, we find contributions to the neutron electric dipole moment that are not excluded, but that could be accessible to future experiments if the scale of supersymmetry breaking is sufficiently low. The existence of vectorlike pairs can lead to a larger dipole moment. Finally, we show that a significant fraction of models are allowed by standard cosmological and astrophysical constraints.

Machine Learning in Nuclear Physics
Amber Boehnlein, Markus Diefenthaler, Nobuo Sato, Malachi Schram, Veronique Ziegler, Cristiano Fanelli, Morten Hjorth-Jensen, Tanja Horn, Michelle P. Kuchera, Dean Lee, Witold Nazarewicz, Peter Ostroumov, Kostas Orginos, Alan Poon, Xin-Nian Wang, Alexander Scheinker, Michael S. Smith, and Long-Gang Pang
Reviews of Modern Physics, 2022, Volume 94, Article 031003 [ arXiv:2112.02309 ]

Abstract Advances in machine learning methods provide tools that have broad applicability in scientific research. These techniques are being applied across the diversity of nuclear physics research topics, leading to advances that will facilitate scientific discoveries and societal applications. This Colloquium provides a snapshot of nuclear physics research, which has been transformed by machine learning techniques.

Infinite Neural Network Quantum States
Di Luo, James Halverson
Machine Learning: Science and Technology, 2023, Volume 4, Number 2 [ arXiv:2112.00723 ]

Abstract We study infinite limits of neural network quantum states (∞-NNQS), which exhibit representation power through ensemble statistics, and also tractable gradient descent dynamics. Ensemble averages of Renyi entropies are expressed in terms of neural network correlators, and architectures that exhibit volume-law entanglement are presented. A general framework is developed for studying the gradient descent dynamics of neural network quantum states (NNQS), using a quantum state neural tangent kernel (QS-NTK). For ∞-NNQS the training dynamics is simplified, since the QS-NTK becomes deterministic and constant. An analytic solution is derived for quantum state supervised learning, which allows an ∞-NNQS to recover any target wavefunction. Numerical experiments on finite and infinite NNQS in the transverse field Ising model and Fermi Hubbard model demonstrate excellent agreement with theory. ∞-NNQS opens up new opportunities for studying entanglement and training dynamics in other physics applications, such as in finding ground states.

Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science
Charlotte Loh, Thomas Christensen, Rumen Dangovski, Samuel Kim, Marin Soljačić
Nature Communications, 2022, Volume 13, Article 4223 [ arXiv:2110.08406 ]

Abstract Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labelled data needed to train the model; this poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Here, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three "inexpensive" and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: 1) abundant unlabeled data, 2) prior knowledge of symmetries or invariances and 3) surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently results in orders of magnitude reduction in the number of labels needed to achieve the same network accuracies.

Pruning a restricted Boltzmann machine for quantum state reconstruction
Anna Golubeva, Roger G. Melko
Physical Review B, 2022, Volume 105, Article 125124 [ arXiv:2110.03676 ]

Abstract Restricted Boltzmann machines (RBMs) have proven to be a powerful tool for learning quantum wavefunction representations from qubit projective measurement data. Since the number of classical parameters needed to encode a quantum wavefunction scales rapidly with the number of qubits, the ability to learn efficient representations is of critical importance. In this paper we study magnitude-based pruning as a way to compress the wavefunction representation in an RBM, focusing on RBMs trained on data from the transverse-field Ising model in one dimension. We find that pruning can reduce the total number of RBM weights, but the threshold at which the reconstruction accuracy starts to degrade varies significantly depending on the phase of the model. In a gapped region of the phase diagram, the RBM admits pruning over half of the weights while still accurately reproducing relevant physical observables. At the quantum critical point however, even a small amount of pruning can lead to significant loss of accuracy in the physical properties of the reconstructed quantum state. Our results highlight the importance of tracking all relevant observables as their sensitivity varies strongly with pruning. Finally, we find that sparse RBMs are trainable and discuss how a successful sparsity pattern can be created without pruning.

Classical Shadows for Quantum Process Tomography on Near-term Quantum Computers
Ryan Levy, Di Luo, Bryan K. Clark
Physical Review Research, 2024, Volume 6, Issue 1 [ arXiv:2110.02965 ]

Abstract Quantum process tomography is a powerful tool for understanding quantum channels and characterizing properties of quantum devices. Inspired by recent advances using classical shadows in quantum state tomography [H.-Y. Huang, R. Kueng, and J. Preskill, Nat. Phys. 16, 1050 (2020).], we have developed ShadowQPT, a classical shadow method for quantum process tomography. We introduce two related formulations with and without ancilla qubits. ShadowQPT stochastically reconstructs the Choi matrix of the device allowing for an a posteriori classical evaluation of the device on arbitrary inputs with respect to arbitrary outputs. Using shadows we then show how to compute overlaps, generate all k-weight reduced processes, and perform reconstruction via Hamiltonian learning. These latter two tasks are efficient for large systems as the number of quantum measurements needed scales only logarithmically with the number of qubits. A number of additional approximations and improvements are developed including the use of a pair-factorized Clifford shadow and a series of post-processing techniques which significantly enhance the accuracy for recovering the quantum channel. We have implemented ShadowQPT using both Pauli and Clifford measurements on the IonQ trapped ion quantum computer for quantum processes up to n=4 qubits and achieved good performance.

Deep Set Auto Encoders for Anomaly Detection in Particle Physics
Bryan Ostdiek
SciPost Physics, 2022, Vol. 12, Issue 1 [ arXiv:2109.01695 ]

Abstract There is an increased interest in model agnostic search strategies for physics beyond the standard model at the Large Hadron Collider. We introduce a Deep Set Variational Autoencoder and present results on the Dark Machines Anomaly Score Challenge. We find that the method attains the best anomaly detection ability when there is no decoding step for the network, and the anomaly score is based solely on the representation within the encoded latent space. This method was one of the top-performing models in the Dark Machines Challenge, both for the open data sets as well as the blinded data sets.

A variational study of two-nucleon systems with lattice QCD
Saman Amarasinghe, Riyadh Baghdadi, Zohreh Davoudi, William Detmold, Marc Illa, Assumpta Parreno, Andrew V. Pochinsky, Phiala E. Shanahan, Michael L. Wagman
Physical Review D, Volume 107, Issue 9 [ arXiv:2108.03571 ]

Abstract The low-energy spectrum and scattering of two-nucleon systems are studied with lattice quantum chromodynamics using a variational approach. A wide range of interpolating operators are used: dibaryon operators built from products of plane-wave nucleons, hexaquark operators built from six localized quarks, and quasi-local operators inspired by two-nucleon bound-state wavefunctions in low-energy effective theories. Sparsening techniques are used to compute the timeslice-to-all quark propagators required to form correlation-function matrices using products of these operators. Projection of these matrices onto irreducible representations of the cubic group, including spin-orbit coupling, is detailed. Variational methods are applied to constrain the low-energy spectra of two-nucleon systems in a single finite volume with quark masses corresponding to a pion mass of 806 MeV. Results for S- and D-wave phase shifts in the isospin singlet and triplet channels are obtained under the assumption that partial-wave mixing is negligible. Tests of interpolating-operator dependence are used to investigate the reliability of the energy spectra obtained and highlight both the strengths and weaknesses of variational methods. These studies and comparisons to previous studies using the same gauge-field ensemble demonstrate that interpolating-operator dependence can lead to significant effects on the two-nucleon energy spectra obtained using both variational and non-variational methods, including missing energy levels and other discrepancies. While this study is inconclusive regarding the presence of two-nucleon bound states at this quark mass, it provides robust upper bounds on two-nucleon energy levels that can be improved in future calculations using additional interpolating operators and is therefore a step toward reliable nuclear spectroscopy from the underlying Standard Model of particle physics.

Real-time lattice gauge theory actions: unitarity, convergence, and path integral contour deformations
Gurtej Kanwar, Michael L. Wagman
Physical Review D, Volume 104, Article 014513 [ arXiv:2103.02602 ]

Abstract The Wilson action for Euclidean lattice gauge theory defines a positive-definite transfer matrix that corresponds to a unitary lattice gauge theory time-evolution operator if analytically continued to real time. Hoshina, Fujii, and Kikukawa (HFK) recently pointed out that applying the Wilson action discretization to continuum real-time gauge theory does not lead to this, or any other, unitary theory and proposed an alternate real-time lattice gauge theory action that does result in a unitary real-time transfer matrix. The character expansion defining the HFK action is divergent, and in this work we apply a path integral contour deformation to obtain a convergent representation for U(1) HFK path integrals suitable for numerical Monte Carlo calculations. We also introduce a class of real-time lattice gauge theory actions based on analytic continuation of the Euclidean heat-kernel action. Similar divergent sums are involved in defining these actions, but for one action in this class this divergence takes a particularly simple form, allowing construction of a path integral contour deformation that provides absolutely convergent representations for U(1) and SU(N) real-time lattice gauge theory path integrals. We perform proof-of-principle Monte Carlo calculations of real-time U(1) and SU(3) lattice gauge theory and verify that exact results for unitary time evolution of static quark-antiquark pairs in (1 + 1)D are reproduced.

Towards an Optimal Estimation of Cosmological Parameters with the Wavelet Scattering Transform
Georgios Valogiannis, Cora Dvorkin
Physical Review D, 2022, 105, 103534 [ arXiv:2108.07821 ]

Abstract Optimal extraction of the non-Gaussian information encoded in the Large-Scale Structure (LSS) of the universe lies at the forefront of modern precision cosmology. We propose achieving this task through the use of the Wavelet Scattering Transform (WST), which subjects an input field to a layer of non-linear transformations that are sensitive to non-Gaussianity in spatial density distributions through a generated set of WST coefficients. In order to assess its applicability in the context of LSS surveys, we apply the WST on the 3D overdensity field obtained by the Quijote simulations, out of which we extract the Fisher information in 6 cosmological parameters. It is subsequently found to deliver a large improvement in the marginalized errors on all parameters, ranging between 1.2−4× tighter than the corresponding ones obtained from the regular 3D cold dark matter + baryon power spectrum, as well as a 50% improvement over the neutrino mass constraint given by the marked power spectrum. Through this first application on 3D cosmological fields, we demonstrate the great promise held by this novel statistic and set the stage for its future application to actual galaxy observations.

Deep multi-task mining Calabi-Yau four-folds
Harold Erbin, Riccardo Finotello, Robin Schneider, Mohamed Tamaazousti
Machine Learning: Science and Technology, 2021, Volume 3, Number 1 [ arXiv:2108.02221 ]

Abstract We continue earlier efforts in computing the dimensions of tangent space cohomologies of Calabi-Yau manifolds using deep learning. In this paper, we consider the dataset of all Calabi-Yau four-folds constructed as complete intersections in products of projective spaces. Employing neural networks inspired by state-of-the-art computer vision architectures, we improve earlier benchmarks and demonstrate that all four non-trivial Hodge numbers can be learned at the same time using a multi-task architecture. With 30% (80%) training ratio, we reach an accuracy of 100% for h(1,1) and 97% for h(2,1) (100% for both), 81% (96%) for h(3,1), and 49% (83%) for h(2,2). Assuming that the Euler number is known, as it is easy to compute, and taking into account the linear constraint arising from index computations, we get 100% total accuracy.

Nonperturbative renormalization for the neural network–QFT correspondence
Harold Erbin, Vincent Lahoche, Dine Ousmane Samary
Machine Learning: Science and Technology, 2022, Volume 3, Number 1, Article 015027 [ arXiv:2108.01403 ]

Abstract In a recent work [1], Halverson, Maiti and Stoner proposed a description of neural networks in terms of a Wilsonian effective field theory. The infinite-width limit is mapped to a free field theory, while finite N corrections are taken into account by interactions (non-Gaussian terms in the action). In this paper, we study two related aspects of this correspondence. First, we comment on the concepts of locality and power-counting in this context. Indeed, these usual space-time notions may not hold for neural networks (since inputs can be arbitrary); however, the renormalization group provides natural notions of locality and scaling. Moreover, we comment on several subtleties, for example, that data components may not have a permutation symmetry: in that case, we argue that random tensor field theories could provide a natural generalization. Second, we improve the perturbative Wilsonian renormalization from [1] by providing an analysis in terms of the nonperturbative renormalization group using the Wetterich-Morris equation. An important difference with usual nonperturbative RG analysis is that only the effective (IR) 2-point function is known, which requires setting the problem with care. Our aim is to provide a useful formalism to investigate neural network behavior beyond the large-width limit (i.e., far from the Gaussian limit) in a nonperturbative fashion. A major result of our analysis is that changing the standard deviation of the neural network weight distribution can be interpreted as a renormalization flow in the space of networks. We focus on translation-invariant kernels and provide preliminary numerical results.

Neural Conditional Reweighting
Benjamin Nachman, Jesse Thaler
Physical Review D, Volume 105, Article 076015 [ arXiv:2107.08979 ]

Abstract There is a growing use of neural network classifiers as unbinned, high-dimensional (and variable-dimensional) reweighting functions. To date, the focus has been on marginal reweighting, where a subset of features are used for reweighting while all other features are integrated over. There are some situations, though, where it is preferable to condition on auxiliary features instead of marginalizing over them. In this paper, we introduce neural conditional reweighting, which extends neural marginal reweighting to the conditional case. This approach is particularly relevant in high-energy physics experiments for reweighting detector effects conditioned on particle-level truth information. We leverage a custom loss function that not only allows us to achieve neural conditional reweighting through a single training procedure, but also yields sensible interpolation even in the presence of phase space holes. As a specific example, we apply neural conditional reweighting to the energy response of high-energy jets, which could be used to improve the modeling of physics objects in parametrized fast simulation packages.

Single electrons on solid neon as a solid-state qubit platform
Xianjing Zhou, Gerwin Koolstra, Xufeng Zhang, Ge Yang, Xu Han, Brennan Dizdar, Ralu Divan, Wei Guo, Kater W. Murch, David I. Schuster, Dafei Jin
Nature, 2022, 605, 46-50 [ arXiv:2106.10326 ]

Abstract Progress toward the realization of quantum computers requires persistent advances in their constituent building blocks - qubits. Novel qubit platforms that simultaneously embody long coherence, fast operation, and large scalability offer compelling advantages in the construction of quantum computers and many other quantum information systems. Electrons, ubiquitous elementary particles of nonzero charge, spin, and mass, have commonly been perceived as paradigmatic local quantum information carriers. Despite superior controllability and configurability, their practical performance as qubits via either motional or spin states depends critically on their material environment. Here we report our experimental realization of a new qubit platform based upon isolated single electrons trapped on an ultraclean solid neon surface in vacuum. By integrating an electron trap in a circuit quantum electrodynamics architecture, we achieve strong coupling between the motional states of a single electron and a single microwave photon in an on-chip superconducting resonator. Qubit gate operations and dispersive readout are implemented to measure the energy relaxation time T1 of 15 μs and phase coherence time T2 over 200 ns. These results indicate that the electron-on-solid-neon qubit already performs near the state of the art as a charge qubit.

Flow-based sampling for fermionic lattice field theories
Michael S. Albergo, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Julian M. Urban, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Phiala E. Shanahan
Physical Review D, 2021, Vol. 104, Iss. 11 – 1 [ arXiv:2106.05934 ]

Abstract Algorithms based on normalizing flows are emerging as promising machine learning approaches to sampling complicated probability distributions in a way that can be made asymptotically exact. In the context of lattice field theory, proof-of-principle studies have demonstrated the effectiveness of this approach for scalar theories, gauge theories, and statistical systems. This work develops approaches that enable flow-based sampling of theories with dynamical fermions, which is necessary for the technique to be applied to lattice field theory studies of the Standard Model of particle physics and many condensed matter systems. As a practical demonstration, these methods are applied to the sampling of field configurations for a two-dimensional theory of massless staggered fermions coupled to a scalar field via a Yukawa interaction.

Preserving New Physics while Simultaneously Unfolding All Observables
Patrick Komiske, W. Patrick McCormack, Benjamin Nachman
Physical Review D, Volume 104, Article 076027 [ arXiv:2105.09923 ]

Abstract Direct searches for new particles at colliders have traditionally been factorized into model proposals by theorists and model testing by experimentalists. With the recent advent of machine learning methods that allow for the simultaneous unfolding of all observables in a given phase space region, there is a new opportunity to blur these traditional boundaries by performing searches on unfolded data. This could facilitate a research program where data are explored in their natural high dimensionality with as little model bias as possible. We study how the information about physics beyond the Standard Model is preserved by full phase space unfolding using an important physics target at the Large Hadron Collider (LHC): exotic Higgs boson decays involving hadronic final states. We find that if the signal cross section is high enough, information about the new physics is visible in the unfolded data. We will show that in some cases, quantifiably all of the information about the new physics is encoded in the unfolded data. Finally, we show that there are still many cases when the unfolding does not work fully or precisely, such as when the signal cross section is small. This study will serve as an important benchmark for enhancing unfolding methods for the LHC and beyond.

A Compound Poisson Generator approach to Point-Source Inference in Astrophysics
Gabriel H. Collin, Nicholas L. Rodd, Tyler Erjavec, Kerstin Perez
The Astrophysical Journal, 2022, Volume 260, Number 2 [ arXiv:2104.04529 | code ]

Abstract The identification and description of point sources is one of the oldest problems in astronomy; yet, even today the correct statistical treatment for point sources remains one of the field's hardest problems. For dim or crowded sources, likelihood based inference methods are required to estimate the uncertainty on the characteristics of the source population. In this work, a new parametric likelihood is constructed for this problem using Compound Poisson Generator (CPG) functionals which incorporate instrumental effects from first principles. We demonstrate that the CPG approach exhibits a number of advantages over Non-Poissonian Template Fitting (NPTF) - an existing parametric likelihood method - in a series of test scenarios in the context of X-ray astronomy. These demonstrations show that the effect of the point-spread function, effective area, and choice of point-source spatial distribution cannot, in general, be factorised as they are in the NPTF construction, while the new CPG construction is validated in these scenarios. Separately, an examination of the diffuse-flux emission limit is used to show that most simple choices of priors on the standard parameterisation of the population model can result in unexpected biases: when a model comprising both a point-source population and diffuse component is applied to this limit, nearly all observed flux will be assigned to either the population or to the diffuse component. A new parametrisation is presented for these priors which is demonstrated to properly estimate the uncertainties in this limit. In this choice of priors, the CPG correctly identifies that the fraction of flux assigned to the population model cannot be constrained by the data.

Modern Machine Learning and Particle Physics
Matthew D. Schwartz
Harvard Data Science Review, 2021, Issue 3.2, 13 May [ arXiv:2103.12226 ]

Abstract Over the past five years, modern machine learning has been quietly revolutionizing particle physics. Old methodology is being outdated and entirely new ways of thinking about data are becoming commonplace. This article will review some aspects of the natural synergy between modern machine learning and particle physics, focusing on applications at the Large Hadron Collider. A sampling of examples is given, from signal/background discrimination tasks using supervised learning to direct data-driven approaches. Some comments on persistent challenges and possible future directions for the field are included at the end.

Topological obstructions to autoencoding
Joshua Batson, C. Grace Haaf, Yonatan Kahn, Daniel A. Roberts
Journal of High Energy Physics, 2021, Issue 4, Article 280 [ arXiv:2102.08380 ]

Abstract Autoencoders have been proposed as a powerful tool for model-independent anomaly detection in high-energy physics. The operating principle is that events which do not belong to the space of training data will be reconstructed poorly, thus flagging them as anomalies. We point out that in a variety of examples of interest, the connection between large reconstruction error and anomalies is not so clear. In particular, for data sets with nontrivial topology, there will always be points that erroneously seem anomalous due to global issues. Conversely, neural networks typically have an inductive bias or prior to locally interpolate such that undersampled or rare events may be reconstructed with small error, despite actually being the desired anomalies. Taken together, these facts are in tension with the simple picture of the autoencoder as an anomaly detector. Using a series of illustrative low-dimensional examples, we show explicitly how the intrinsic and extrinsic topology of the dataset affects the behavior of an autoencoder and how this topology is manifested in the latent space representation during training. We ground this analysis in the discussion of a mock "bump hunt" in which the autoencoder fails to identify an anomalous "signal" for reasons tied to the intrinsic topology of n-particle phase space.

Few-nucleon matrix elements in pionless effective field theory in a finite volume
W. Detmold and P. E. Shanahan
Physical Review D, Volume 103, Article 074503 [ arXiv:2102.04329 ]

Abstract Pionless effective field theory in a finite volume (FVEFT$_{\pi\!\!\!/}$) is investigated as a framework for the analysis of multi-nucleon spectra and matrix elements calculated in lattice QCD (LQCD). By combining FVEFT$_{\pi\!\!\!/}$ with the stochastic variational method, the spectra of nuclei with atomic number $A\in\{2,3\}$ are matched to existing finite-volume LQCD calculations at heavier-than-physical quark masses corresponding to a pion mass $m_\pi=806$ MeV, thereby enabling infinite-volume binding energies to be determined using infinite-volume variational calculations. Based on the variational wavefunctions that are constructed in this approach, the finite-volume matrix elements of various local operators are computed in FVEFT$_{\pi\!\!\!/}$ and matched to LQCD calculations of the corresponding QCD operators in the same volume, thereby determining the relevant one and two-body EFT counterterms and enabling an extrapolation of the LQCD matrix elements to infinite volume. As examples, the scalar, tensor, and axial matrix elements are considered, as well as the magnetic moments and the isovector longitudinal momentum fraction.

Path integral contour deformations for observables in SU(N) gauge theory
William Detmold, Gurtej Kanwar, Henry Lamm, Michael L. Wagman, Neill C. Warrington
Physical Review D, 2021, Vol. 103, Issue 9, Article 094517 [ arXiv:2101.12668 ]

Abstract Path integral contour deformations have been shown to mitigate sign and signal-to-noise problems associated with phase fluctuations in lattice field theories. We define a family of contour deformations applicable to SU(N) lattice gauge theory that can reduce sign and signal-to-noise problems associated with complex actions and complex observables. For observables, these contours can be used to define deformed observables with identical expectation value but different variance. As a proof-of-principle, we apply machine learning techniques to optimize the deformed observables associated with Wilson loops in two dimensional SU(2) and SU(3) gauge theory. We study loops consisting of up to 64 plaquettes and achieve variance reduction of up to 4 orders of magnitude.

Learning to Unknot
Sergei Gukov, James Halverson, Fabian Ruehle, and Piotr Sułkowski
Machine Learning: Science and Technology, 2021, Volume 2, Number 2, Article 025035 [ arXiv:2010.16263 ]

Abstract We introduce natural language processing into the study of knot theory, as made natural by the braid word representation of knots. We study the UNKNOT problem of determining whether or not a given knot is the unknot. After describing an algorithm to randomly generate $N$-crossing braids and their knot closures and discussing the induced prior on the distribution of knots, we apply binary classification to the UNKNOT decision problem. We find that the Reformer and shared-QK Transformer network architectures outperform fully-connected networks, though all perform well. Perhaps surprisingly, we find that accuracy increases with the length of the braid word, and that the networks learn a direct correlation between the confidence of their predictions and the degree of the Jones polynomial. Finally, we utilize reinforcement learning (RL) to find sequences of Markov moves and braid relations that simplify knots and can identify unknots by explicitly giving the sequence of unknotting actions. Trust region policy optimization (TRPO) performs consistently well for a wide range of crossing numbers and thoroughly outperformed other RL algorithms and random walkers. Studying these actions, we find that braid relations are more useful in simplifying to the unknot than one of the Markov moves.

Elliptic stable envelopes and hypertoric loop spaces
Michael McBreen, Artan Sheshmani, Shing-Tung Yau
Selecta Mathematica, 2023, Volume 29, Article number 73 [ arXiv:2010.0067 ]

Abstract This paper relates the elliptic stable envelopes of a hypertoric variety X with the K-theoretic stable envelopes of the loop hypertoric space, ℒ˜X. It thus points to a possible categorification of elliptic stable envelopes.

Experimental Physics

Pre-prints

Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies with CAMELS
Victoria Ono, Core Francisco Park, Nayantara Mudur, Yueying Ni, Carolina Cuesta-Lazaro, Francisco Villaescusa-Navarro
[ arXiv:2403.10648 ]

Abstract Galaxies are biased tracers of the underlying cosmic web, which is dominated by dark matter components that cannot be directly observed. Galaxy formation simulations can be used to study the relationship between dark matter density fields and galaxy distributions. However, this relationship can be sensitive to assumptions in cosmology and astrophysical processes embedded in the galaxy formation models, that remain uncertain in many aspects. In this work, we develop a diffusion generative model to reconstruct dark matter fields from galaxies. The diffusion model is trained on the CAMELS simulation suite that contains thousands of state-of-the-art galaxy formation simulations with varying cosmological parameters and sub-grid astrophysics. We demonstrate that the diffusion model can predict the unbiased posterior distribution of the underlying dark matter fields from the given stellar mass fields, while being able to marginalize over uncertainties in cosmological and astrophysical models. Interestingly, the model generalizes to simulation volumes approximately 500 times larger than those it was trained on, and across different galaxy formation models. Code for reproducing these results is publicly available.

Supernovae Time Profiles as a Probe of New Physics at Neutrino Telescopes
Jeff Lazar, Ying-Ying Li, Carlos A. Arguelles, Vedran Brdar
[ arXiv:2403.09781 ]

Abstract Neutrino telescopes, including IceCube, can detect galactic supernova events by observing the collective rise in photomultiplier count rates with a sub-second time resolution. Leveraging precise timing, we demonstrate the ability of neutrino telescopes to explore new weakly coupled states emitted from supernovae and subsequently decaying to neutrinos. Our approach utilizes publicly available packages, ASTERIA and SNEWPY, for simulating detector responses and parametrizing neutrino fluxes originating from Standard Model and new physics. We present results for two beyond the Standard Model scenarios and introduce the tool developed for testing a diverse range of new physics models.

Superphot+: Realtime Fitting and Classification of Supernova Light Curves
Kaylee M. de Soto (1), Ashley Villar (1), Edo Berger (1 and 2), Sebastian Gomez (3), Griffin Hosseinzadeh (4), Doug Branton (5), Sandro Campos (6), Melissa DeLucchi (6), Jeremy Kubica (6), Olivia Lynn (6), Konstantin Malanchev (6), Alex I. Malz (6) ((1) Center for Astrophysics | Harvard & Smithsonian, (2) The NSF AI Institute for Artificial Intelligence and Fundamental Interactions, (3) Space Telescope Science Institute, (4) Steward Observatory | University of Arizona, (5) DiRAC Institute and the Department of Astronomy | University of Washington, (6) McWilliams Center for Cosmology | Department of Physics at Carnegie Mellon University)
[ arXiv:2403.07975 ]

Abstract Photometric classifications of supernova (SN) light curves have become necessary to utilize the full potential of large samples of observations obtained from wide-field photometric surveys, such as the Zwicky Transient Facility (ZTF) and the Vera C. Rubin Observatory. Here, we present a photometric classifier for SN light curves that does not rely on redshift information and still maintains comparable accuracy to redshift-dependent classifiers. Our new package, Superphot+, uses a parametric model to extract meaningful features from multiband SN light curves. We train a gradient-boosted machine with fit parameters from 6,061 ZTF SNe that pass data quality cuts and are spectroscopically classified as one of five classes: SN Ia, SN II, SN Ib/c, SN IIn, and SLSN-I. Without redshift information, our classifier yields a class-averaged F1-score of 0.61 +/- 0.02 and a total accuracy of 0.83 +/- 0.01. Including redshift information improves these metrics to 0.71 +/- 0.02 and 0.88 +/- 0.01, respectively. We assign new class probabilities to 3,558 ZTF transients that show SN-like characteristics (based on the ALeRCE Broker light curve and stamp classifiers), but lack spectroscopic classifications. Finally, we compare our predicted SN labels with those generated by the ALeRCE light curve classifier, finding that the two classifiers agree on photometric labels for 82 +/- 2% of light curves with spectroscopic labels and 72% of light curves without spectroscopic labels. Superphot+ is currently classifying ZTF SNe in real time via the ANTARES Broker, and is designed for simple adaptation to six-band Rubin light curves in the future.

New Pathways in Neutrino Physics via Quantum-Encoded Data Analysis
Jeffrey Lazar, Santiago Giner Olavarrieta, Giancarlo Gatti, Carlos A. Argüelles, Mikel Sanz
[ arXiv:2402.19306 ]

Abstract An ever-increasing amount of data is produced by particle detectors in their quest to unveil the laws of Nature. The large data rate requires the use of specialized triggers that promptly reduce the data rate to a manageable level; however, in doing so, unexpected new phenomena may escape detection. Additionally, the large data rate is increasingly difficult to analyze effectively, which has led to a recent revolution in machine learning techniques. Here, we present a methodology based on recent quantum compression techniques that has the capacity to store exponentially more information than classically available methods. To demonstrate this, we encode the full neutrino telescope event information using parity observables in an IBM quantum processor using 8 qubits. Then we show that we can recover the information stored on the quantum computer with a fidelity of 84%. Finally, we illustrate the use of our protocol by performing a classification task that separates electron-neutrino events from muon-neutrino events in a neutrino telescope. This new capability would eventually allow us to solve the streetlight effect in particle physics, where we only record signatures of particles with which we are familiar.

Full-shape analysis with simulation-based priors: constraints on single field inflation from BOSS
Mikhail M. Ivanov, Carolina Cuesta-Lazaro, Siddharth Mishra-Sharma, Andrej Obuljen, Michael W. Toomey
[ arXiv:2402.13310 ]

Abstract We present an efficient approach to set informative physically motivated priors for EFT-based full-shape analyses of galaxy survey data. We extract these priors from simulated galaxy catalogs based on halo occupation distribution (HOD) models. As a first step, we build a joint distribution between EFT galaxy bias and HOD parameters from a set of 10,500 HOD mock catalogs. We make use of the field level EFT technique that allows for cosmic variance cancellation, enabling a precision calibration of EFT parameters from computationally inexpensive small-volume simulations. As a second step, we use neural density estimators -- normalizing flows -- to model the marginal probability density of the EFT parameters, which can be used as a prior distribution in full shape analyses. As a first application, we use our HOD-based prior distribution in a new analysis of galaxy power spectra and bispectra from the BOSS survey in the context of single field primordial non-Gaussianity. We find that our approach leads to a reduction of the posterior volume of bias parameters by an order of magnitude. We also find $f_{\rm NL}^{\rm equil}=650\pm310$ and $f_{\rm NL}^{\rm ortho}=42\pm130$ (at 68% CL) in a combined two-template analysis, representing a ≈40% improvement in constraints on single field primordial non-Gaussianity, equivalent to doubling the survey volume.

LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology
Matthew Ho, Deaglan J. Bartlett, Nicolas Chartier, Carolina Cuesta-Lazaro, Simon Ding, Axel Lapel, Pablo Lemos, Christopher C. Lovell, T. Lucas Makinen, Chirag Modi, Viraj Pandya, Shivam Pandey, Lucia A. Perez, Benjamin Wandelt, Greg L. Bryan
[ arXiv:2402.05137 ]

Abstract This paper presents the Learning the Universe Implicit Likelihood Inference (LtU-ILI) pipeline, a codebase for rapid, user-friendly, and cutting-edge machine learning (ML) inference in astrophysics and cosmology. The pipeline includes software for implementing various neural architectures, training schema, priors, and density estimators in a manner easily adaptable to any research workflow. It includes comprehensive validation metrics to assess posterior estimate coverage, enhancing the reliability of inferred results. Additionally, the pipeline is easily parallelizable, designed for efficient exploration of modeling hyperparameters. To demonstrate its capabilities, we present real applications across a range of astrophysics and cosmology problems, such as: estimating galaxy cluster masses from X-ray photometry; inferring cosmology from matter power spectra and halo point clouds; characterising progenitors in gravitational wave signals; capturing physical dust parameters from galaxy colors and luminosities; and establishing properties of semi-analytic models of galaxy formation. We also include exhaustive benchmarking and comparisons of all implemented methods as well as discussions about the challenges and pitfalls of ML inference in astronomical sciences. All code and examples are made publicly available at this https URL.

The Escape Velocity Profile of the Milky Way from Gaia DR3
Arthur Tsang, Atınç Çağan Şengül, Cora Dvorkin
[ arXiv:2402.00108 ]

Abstract The escape velocity profile of the Milky Way offers a crucial and independent measurement of its underlying mass distribution and dark matter properties. Using a sample of stars from Gaia DR3 with 6D kinematics and strict quality cuts, we obtain an escape velocity profile of the Milky Way from 4 kpc to 11 kpc in Galactocentric radius. To infer the escape velocity in radial bins, we model the tail of the stellar speed distribution with both traditional power law models and a new functional form that we introduce. While power law models tend to rely on extrapolation to high speeds, we find our new functional form gives the most faithful representation of the observed distribution. Using this for the escape velocity profile, we constrain the properties of the Milky Way's dark matter halo modeled as a Navarro-Frenk-White profile. Combined with constraints from the circular velocity at the solar position, we obtain a concentration and mass of $c_{200c}^{\rm DM}=13.9^{+6.2}_{-4.3}$ and $M_{200c}^{\rm DM}=0.55^{+0.15}_{-0.14}\times10^{12}\,M_\odot$. This corresponds to a total Milky Way mass of $M_{200c}=0.64^{+0.15}_{-0.14}\times10^{12}\,M_\odot$, which is on the low end of the historic range of the Galaxy's mass, but in line with other recent estimates.

Substructure Detection in Realistic Strong Lensing Systems with Machine Learning
Arthur Tsang, Atınç Çağan Şengül, Cora Dvorkin
[ arXiv:2402.16624 ]

Abstract Tens of thousands of galaxy-galaxy strong lensing systems are expected to be discovered by the end of the decade. These will form a vast new dataset that can be used to probe subgalactic dark matter structures through their gravitational effects, which will in turn allow us to study the nature of dark matter at small length scales. This work shows how we can leverage machine learning to search through the data and identify which systems are most likely to contain dark matter substructure and thus can be studied in greater depth. We use a UNet, an image segmentation architecture, on a simulated strongly-lensed dataset with realistic sources (COSMOS galaxies), lenses (power-law elliptical profiles with multipoles and external shear), and noise. Our machine learning algorithm is able to quickly detect most substructure at high image resolution and subhalo concentration. At a false positive rate of 10%, we are able to identify systems with substructure at a true positive rate of 71% for a subhalo mass range of $10^{9}$-$10^{9.5}\,M_\odot$. While recent detections are consistent with higher concentrations, we find that our algorithm fails at detecting subhalos with lower concentrations (expected from ΛCDM simulations).

A Physics-Informed Variational Autoencoder for Rapid Galaxy Inference and Anomaly Detection
Alexander Gagliano, V. Ashley Villar
[ arXiv:2312.16687 ]

Abstract The Vera C. Rubin Observatory is slated to observe nearly 20 billion galaxies during its decade-long Legacy Survey of Space and Time. The rich imaging data it collects will be an invaluable resource for probing galaxy evolution across cosmic time, characterizing the host galaxies of transient phenomena, and identifying novel populations of anomalous systems. To facilitate these studies, we introduce a convolutional variational autoencoder trained to estimate the redshift, stellar mass, and star-formation rates of galaxies from multi-band imaging data. We train and test our physics-informed CVAE on a spectroscopic sample of ∼26,000 galaxies within z<1 imaged through the Dark Energy Camera Legacy Survey. We show that our model can infer redshift and stellar mass more accurately than the latest image-based self-supervised learning approaches, and is >100x faster than more computationally-intensive SED-fitting techniques. Using a small sample of Green Pea and Red Spiral galaxies reported in the literature, we further demonstrate how this CVAE can be used to rapidly identify rare galaxy populations and interpret what makes them unique.

Applications of Lipschitz neural networks to the Run 3 LHCb trigger system
Blaise Delaney, Nicole Schulte, Gregory Ciezarek, Niklas Nolte, Mike Williams, Johannes Albrecht
[ arXiv:2312.14265 ]

Abstract The operating conditions defining the current data taking campaign at the Large Hadron Collider, known as Run 3, present unparalleled challenges for the real-time data acquisition workflow of the LHCb experiment at CERN. To address the anticipated surge in luminosity and consequent event rate, the LHCb experiment is transitioning to a fully software-based trigger system. This evolution necessitated innovations in hardware configurations, software paradigms, and algorithmic design. A significant advancement is the integration of monotonic Lipschitz neural networks into the LHCb trigger system. These deep learning models offer certified robustness against detector instabilities, and the ability to encode domain-specific inductive biases. Such properties are crucial for the inclusive heavy-flavour triggers and, most notably, for the topological triggers designed to inclusively select b-hadron candidates by exploiting the unique kinematic and decay topologies of beauty decays. This paper describes the recent progress in integrating Lipschitz neural networks into the topological triggers, highlighting the resulting enhanced sensitivity to highly displaced multi-body candidates produced within the LHCb acceptance.

First search for dark-trident processes using the MicroBooNE detector
MicroBooNE collaboration
[ arXiv:2312.13945 ]

Abstract We present a first search for dark-trident scattering in a neutrino beam using a data set corresponding to $7.2\times10^{20}$ protons on target taken with the MicroBooNE detector at Fermilab. Proton interactions in the neutrino target at the Main Injector produce $\pi^0$ and $\eta$ mesons, which could decay into dark-matter (DM) particles mediated via a dark photon $A'$. A convolutional neural network is trained to identify interactions of the DM particles in the liquid-argon time projection chamber (LArTPC) exploiting its image-like reconstruction capability. In the absence of a DM signal, we provide limits at the 90% confidence level on the squared kinematic mixing parameter $\varepsilon^2$ as a function of the dark-photon mass in the range $10\le M_{A'}\le400$ MeV. The limits cover previously unconstrained parameter space for the production of fermion or scalar DM particles $\chi$ for two benchmark models with mass ratios $M_\chi/M_{A'}=0.6$ and 2 and for dark fine-structure constants $0.1\le\alpha_D\le1$.

Inhomogeneous Energy Injection in the 21-cm Power Spectrum: Sensitivity to Dark Matter Decay
Yitian Sun, Joshua W. Foster, Hongwan Liu, Julian B. Muñoz, Tracy R. Slatyer
[ arXiv:2312.11608 ]

Abstract The 21-cm signal provides a novel avenue to measure the thermal state of the universe during cosmic dawn and reionization (redshifts $z\sim5-30$), and thus to probe energy injection from decaying or annihilating dark matter (DM). These DM processes are inherently inhomogeneous: both decay and annihilation are density dependent, and furthermore the fraction of injected energy that is deposited at each point depends on the gas ionization and density, leading to further anisotropies in absorption and propagation. In this work, we develop a new framework for modeling the impact of spatially inhomogeneous energy injection and deposition during cosmic dawn, accounting for ionization and baryon density dependence, as well as the attenuation of propagating photons. We showcase how this first completely inhomogeneous treatment affects the predicted 21-cm power spectrum in the presence of exotic sources of energy injection, and forecast the constraints that upcoming HERA measurements of the 21-cm power spectrum will set on DM decays to photons and to electron/positron pairs. These projected constraints considerably surpass those derived from CMB and Lyman-α measurements, and for decays to electron/positron pairs they exceed all existing constraints in the sub-GeV mass range, reaching lifetimes of $\sim10^{28}$ s. Our analysis demonstrates the unprecedented sensitivity of 21-cm cosmology to exotic sources of energy injection during the cosmic dark ages. Our code, DM21cm, includes all these effects and is publicly available in an accompanying release.

Cosmological Field Emulation and Parameter Inference with Diffusion Models
Nayantara Mudur, Carolina Cuesta-Lazaro, Douglas P. Finkbeiner
[ arXiv:2312.07534 ]

Abstract Cosmological simulations play a crucial role in elucidating the effect of physical parameters on the statistics of fields and on constraining parameters given information on density fields. We leverage diffusion generative models to address two tasks of importance to cosmology -- as an emulator for cold dark matter density fields conditional on input cosmological parameters Ωm and σ8, and as a parameter inference model that can return constraints on the cosmological parameters of an input field. We show that the model is able to generate fields with power spectra that are consistent with those of the simulated target distribution, and capture the subtle effect of each parameter on modulations in the power spectrum. We additionally explore their utility as parameter inference models and find that we can obtain tight constraints on cosmological parameters.

Learning an Effective Evolution Equation for Particle-Mesh Simulations Across Cosmologies
Nicolas Payot, Pablo Lemos, Laurence Perreault-Levasseur, Carolina Cuesta-Lazaro, Chirag Modi, Yashar Hezaveh
[ arXiv:2311.18017 ]

Abstract Particle-mesh simulations trade small-scale accuracy for speed compared to traditional, computationally expensive N-body codes in cosmological simulations. In this work, we show how a data-driven model could be used to learn an effective evolution equation for the particles, by correcting the errors of the particle-mesh potential incurred on small scales during simulations. We find that our learnt correction yields evolution equations that generalize well to new, unseen initial conditions and cosmologies. We further demonstrate that the resulting corrected maps can be used in a simulation-based inference framework to yield an unbiased inference of cosmological parameters. The model, a network implemented in Fourier space, is exclusively trained on the particle positions and velocities.

A point cloud approach to generative modeling for galaxy surveys at the field level
Carolina Cuesta-Lazaro, Siddharth Mishra-Sharma
[ arXiv:2311.17141 ]

Abstract We introduce a diffusion-based generative model to describe the distribution of galaxies in our Universe directly as a collection of points in 3-D space (coordinates) optionally with associated attributes (e.g., velocities and masses), without resorting to binning or voxelization. The custom diffusion model can be used both for emulation, reproducing essential summary statistics of the galaxy distribution, as well as inference, by computing the conditional likelihood of a galaxy field. We demonstrate a first application to massive dark matter haloes in the Quijote simulation suite. This approach can be extended to enable a comprehensive analysis of cosmological data, circumventing limitations inherent to summary statistic -- as well as neural simulation-based inference methods.

Probabilistic reconstruction of Dark Matter fields from biased tracers using diffusion models
Core Francisco Park, Victoria Ono, Nayantara Mudur, Yueying Ni, Carolina Cuesta-Lazaro
[ arXiv:2311.08558 ]

Abstract Galaxies are biased tracers of the underlying cosmic web, which is dominated by dark matter components that cannot be directly observed. The relationship between dark matter density fields and galaxy distributions can be sensitive to assumptions in cosmology and astrophysical processes embedded in the galaxy formation models, that remain uncertain in many aspects. Based on state-of-the-art galaxy formation simulation suites with varied cosmological parameters and sub-grid astrophysics, we develop a diffusion generative model to predict the unbiased posterior distribution of the underlying dark matter fields from the given stellar mass fields, while being able to marginalize over the uncertainties in cosmology and galaxy formation.

Two Watts is All You Need: Enabling In-Detector Real-Time Machine Learning for Neutrino Telescopes Via Edge Computing
Miaochen Jin, Yushi Hu, Carlos A. Argüelles
[ arXiv:2311.04983 ]

Abstract The use of machine learning techniques has significantly increased the physics discovery potential of neutrino telescopes. In the upcoming years, we expect upgrades of existing detectors and new telescopes with novel experimental hardware, yielding more statistics as well as more complicated data signals. This calls for an upgrade on the software side to handle this more complex data more efficiently. Specifically, we seek low-power and fast software methods to achieve real-time signal processing, where current machine learning methods are too expensive to be deployed in the resource-constrained regions where these experiments are located. We present the first attempt at, and a proof-of-concept for, enabling machine learning methods to be deployed in-detector for water/ice neutrino telescopes via quantization and deployment on Google Edge Tensor Processing Units (TPUs). We design a recursive neural network with a residual convolutional embedding, and adapt a quantization process to deploy the algorithm on a Google Edge TPU. This algorithm can achieve similar reconstruction accuracy compared with traditional GPU-based machine learning solutions while requiring the same amount of power compared with CPU-based regression solutions, combining the high accuracy and low power advantages and enabling real-time in-detector machine learning in even the most power-restricted environments.

E(2) Equivariant Neural Networks for Robust Galaxy Morphology Classification
Sneh Pandya, Purvik Patel, Franc O, Jonathan Blazek
[ arXiv:2311.01500 | code ]

Abstract We propose the use of group convolutional neural network architectures (GCNNs) equivariant to the 2D Euclidean group, E(2), for the task of galaxy morphology classification by utilizing symmetries of the data present in galaxy images as an inductive bias in the architecture. We conduct robustness studies by introducing artificial perturbations via Poisson noise insertion and one-pixel adversarial attacks to simulate the effects of limited observational capabilities. We train, validate, and test GCNNs equivariant to discrete subgroups of E(2) - the cyclic and dihedral groups of order N - on the Galaxy10 DECals dataset and find that GCNNs achieve higher classification accuracy and are consistently more robust than their non-equivariant counterparts, with an architecture equivariant to the group D16 achieving a 95.52±0.18% test-set accuracy. We also find that the model loses <6% accuracy on a 50%-noise dataset and all GCNNs are less susceptible to one-pixel perturbations than an identically constructed CNN.

Search for heavy neutral leptons in electron-positron and neutral-pion final states with the MicroBooNE detector
MicroBooNE collaboration
[ arXiv:2310.07660 ]

Abstract We present the first search for heavy neutral leptons (HNL) decaying into $\nu e^+e^-$ or $\nu\pi^0$ final states in a liquid-argon time projection chamber using data collected with the MicroBooNE detector. The data were recorded synchronously with the NuMI neutrino beam from Fermilab's Main Injector corresponding to a total exposure of $7.01\times10^{20}$ protons on target. We set upper limits at the 90% confidence level on the mixing parameter $|U_{\mu4}|^2$ in the mass ranges $10\le m_{\rm HNL}\le150$ MeV for the $\nu e^+e^-$ channel and $150\le m_{\rm HNL}\le245$ MeV for the $\nu\pi^0$ channel, assuming $|U_{e4}|^2=|U_{\tau4}|^2=0$. These limits represent the most stringent constraints in the mass range $35<m_{\rm HNL}<175$ MeV and the first constraints from a direct search for $\nu\pi^0$ decays.

Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample
Enrique Paillas, Carolina Cuesta-Lazaro, Will J. Percival, Seshadri Nadathur, Yan-Chuan Cai, Sihan Yuan, Florian Beutler, Arnaud de Mattia, Daniel Eisenstein, Daniel Forero-Sanchez, Nelson Padilla, Mathilde Pinon, Vanina Ruhlmann-Kleider, Ariel G. Sánchez, Georgios Valogiannis, Pauline Zarrouk
[ arXiv:2309.16541 ]

Abstract We present a clustering analysis of the BOSS DR12 CMASS galaxy sample, combining measurements of the galaxy two-point correlation function and density-split clustering down to a scale of $1\,h^{-1}\,\mathrm{Mpc}$. Our theoretical framework is based on emulators trained on high-fidelity mock galaxy catalogues that forward model the cosmological dependence of the clustering statistics within an extended-ΛCDM framework, including redshift-space and Alcock-Paczynski distortions. Our base-ΛCDM analysis finds $\omega_{\rm cdm}=0.1201\pm0.0022$, $\sigma_8=0.792\pm0.034$, and $n_s=0.970\pm0.018$, corresponding to $f\sigma_8=0.462\pm0.020$ at $z\approx0.525$, which is in agreement with Planck 2018 predictions and various clustering studies in the literature. We test single-parameter extensions to base-ΛCDM, varying the running of the spectral index, the dark energy equation of state, and the density of massless relic neutrinos, finding no compelling evidence for deviations from the base model. We model the galaxy-halo connection using a halo occupation distribution framework, finding signatures of environment-based assembly bias in the data. We validate our pipeline against mock catalogues that match the clustering and selection properties of CMASS, showing that we can recover unbiased cosmological constraints even with a volume 84 times larger than the one used in this study.

SUNBIRD: A simulation-based model for full-shape density-split clustering
Carolina Cuesta-Lazaro, Enrique Paillas, Sihan Yuan, Yan-Chuan Cai, Seshadri Nadathur, Will J. Percival, Florian Beutler, Arnaud de Mattia, Daniel Eisenstein, Daniel Forero-Sanchez, Nelson Padilla, Mathilde Pinon, Vanina Ruhlmann-Kleider, Ariel G. Sánchez, Georgios Valogiannis, Pauline Zarrouk
[ arXiv:2309.16539 ]

Abstract Combining galaxy clustering information from regions of different environmental densities can help break cosmological parameter degeneracies and access non-Gaussian information from the density field that is not readily captured by the standard two-point correlation function (2PCF) analyses. However, modelling these density-dependent statistics down to the non-linear regime has so far remained challenging. We present a simulation-based model that is able to capture the cosmological dependence of the full shape of the density-split clustering (DSC) statistics down to intra-halo scales. Our models are based on neural-network emulators that are trained on high-fidelity mock galaxy catalogues within an extended-ΛCDM framework, incorporating the effects of redshift-space and Alcock-Paczynski distortions and models of the halo-galaxy connection. Our models reach sub-percent level accuracy down to $1\,h^{-1}\,\mathrm{Mpc}$ and are robust against different choices of galaxy-halo connection modelling. When combined with the galaxy 2PCF, DSC can tighten the constraints on $\omega_{\rm cdm}$, $\sigma_8$, and $n_s$ by factors of 2.9, 1.9, and 2.1, respectively, compared to a 2PCF-only analysis. DSC additionally puts strong constraints on environment-based assembly bias parameters. Our code is made publicly available on GitHub.

Chained Quantile Morphing with Normalizing Flows
Samuel Bright-Thonney, Philip Harris, Patrick McCormack, Simon Rothman
[ arXiv:2309.15912 ]

Abstract Accounting for inaccuracies in Monte Carlo simulations is a crucial step in any high energy physics analysis. It becomes especially important when training machine learning models, which can amplify simulation inaccuracies and introduce large discrepancies and systematic uncertainties when the model is applied to data. In this paper, we introduce a method to transform simulated events to better match data using normalizing flows, a class of deep learning-based density estimation models. Our proposal uses a technique called chained quantile morphing, which corrects a set of observables by iteratively shifting each entry according to a conditional cumulative density function. We demonstrate the technique on a realistic particle physics dataset, and compare it to a neural network-based reweighting method. We also introduce a new contrastive learning technique to correct high dimensional particle-level inputs, which naively cannot be efficiently corrected with morphing strategies.

GWAK: Gravitational-Wave Anomalous Knowledge with Recurrent Autoencoders
Ryan Raikman, Eric A. Moreno, Ekaterina Govorkova, Ethan J Marx, Alec Gunny, William Benoit, Deep Chatterjee, Rafia Omer, Muhammed Saleem, Dylan S Rankin, Michael W Coughlin, Philip C Harris, Erik Katsavounidis
[ arXiv:2309.11537 ]

Abstract Matched-filtering detection techniques for gravitational-wave (GW) signals in ground-based interferometers rely on having well-modeled templates of the GW emission. Such techniques have been traditionally used in searches for compact binary coalescences (CBCs), and have been employed in all known GW detections so far. However, interesting science cases aside from compact mergers do not yet have accurate enough modeling to make matched filtering possible, including core-collapse supernovae and sources where stochasticity may be involved. Therefore the development of techniques to identify sources of these types is of significant interest. In this paper, we present a method of anomaly detection based on deep recurrent autoencoders to enhance the search region to unmodeled transients. We use a semi-supervised strategy that we name Gravitational Wave Anomalous Knowledge (GWAK). While the semi-supervised nature of the problem comes with a cost in terms of accuracy as compared to supervised techniques, there is a qualitative advantage in generalizing experimental sensitivity beyond pre-computed signal templates. We construct a low-dimensional embedded space using the GWAK method, capturing the physical signatures of distinct signals on each axis of the space. By introducing signal priors that capture some of the salient features of GW signals, we allow for the recovery of sensitivity even when an unmodeled anomaly is encountered. We show that regions of the GWAK space can identify CBCs, detector glitches and also a variety of unmodeled astrophysical sources.

Simulation-based Inference for Exoplanet Atmospheric Retrieval: Insights from winning the Ariel Data Challenge 2023 using Normalizing Flows
Mayeul Aubin, Carolina Cuesta-Lazaro, Ethan Tregidga, Javier Viaña, Cecilia Garraffo, Iouli E. Gordon, Mercedes López-Morales, Robert J. Hargreaves, Vladimir Yu. Makhnev, Jeremy J. Drake, Douglas P. Finkbeiner, Phillip Cargile
[ arXiv:2309.09337 ]

Abstract Advancements in space telescopes have opened new avenues for gathering vast amounts of data on exoplanet atmosphere spectra. However, accurately extracting chemical and physical properties from these spectra poses significant challenges due to the non-linear nature of the underlying physics. This paper presents novel machine learning models developed by the AstroAI team for the Ariel Data Challenge 2023, where one of the models secured the top position among 293 competitors. Leveraging Normalizing Flows, our models predict the posterior probability distribution of atmospheric parameters under different atmospheric assumptions. Moreover, we introduce an alternative model that exhibits higher performance potential than the winning model, despite scoring lower in the challenge. These findings highlight the need to reevaluate the evaluation metric and prompt further exploration of more efficient and accurate approaches for exoplanet atmosphere spectra analysis. Finally, we present recommendations to enhance the challenge and models, providing valuable insights for future applications on real observational data. These advancements pave the way for more effective and timely analysis of exoplanet atmospheric properties, advancing our understanding of these distant worlds.

Birth of the first stars amidst decaying and annihilating dark matter
Wenzer Qin, Julian B. Munoz, Hongwan Liu, Tracy R. Slatyer
[ arXiv:2308.12992 ]

Abstract The first stars are expected to form through molecular-hydrogen (H2) cooling, a channel that is especially sensitive to the thermal and ionization state of gas, and can thus act as a probe of exotic energy injection from decaying or annihilating dark matter (DM). Here, we use a toy halo model to study the impact of DM-sourced energy injection on the H2 content of the first galaxies, and thus estimate the threshold mass required for a halo to form stars at high redshifts. We find that currently allowed DM models can significantly change this threshold, producing both positive and negative feedback. In some scenarios, the extra heating of the gas raises the halo mass required for collapse, whereas in others, energy injection lowers the threshold by increasing the free-electron fraction and catalyzing H2 formation. The direction of the effect can be redshift-dependent. We also bracket the uncertainties from self-shielding of halos from Lyman-Werner radiation. Hence, exotic energy injection can both delay and accelerate the onset of star formation; we show how this can impact the timing of 21cm signals at cosmic dawn. We encourage detailed simulation follow-ups in the most promising regions of parameter space identified in this work.

First demonstration for a LArTPC-based search for intranuclear neutron-antineutron transitions and annihilation in $^{40}$Ar using the MicroBooNE detector
MicroBooNE collaboration
[ arXiv:2308.03924 ]

Abstract In this paper, we present a novel methodology to search for intranuclear neutron-antineutron transition (n→n̄) followed by annihilation within an 40Ar nucleus, using the MicroBooNE liquid argon time projection chamber (LArTPC) detector. A discovery of n→n̄ transition or increased lower limit on the lifetime of this process would either constitute physics beyond the Standard Model or greatly constrain theories of baryogenesis, respectively. The approach presented in this paper makes use of deep learning methods to select n→n̄ events based on their unique features and differentiate them from cosmogenic backgrounds. The achieved signal and background efficiencies are (70±6)% and (0.0020±0.0003)%, respectively. A demonstration of a search is performed with a data set corresponding to an exposure of 3.32×10^26 neutron-years, and where the background rate is constrained through direct measurement, assuming the presence of a negligible signal. With this approach, no excess of events over the background prediction is observed, setting a demonstrative lower bound on the n→n̄ lifetime in 40Ar of τ_m > 1.1×10^26 years, and on the free n→n̄ transition time of τ_{n−n̄} > 2.6×10^5 s, each at the 90% confidence level. This analysis represents a first-ever proof-of-principle demonstration of the ability to search for this rare process in LArTPCs with high efficiency and low background.

NuCLR, Nuclear Co-Learned Representations
Ouail Kitouni, Niklas Nolte, Sokratis Trifinopoulos, Subhash Kantamneni, Mike Williams
[ arXiv:2306.06099 ]

Abstract We introduce Nuclear Co-Learned Representations (NuCLR), a deep learning model that predicts various nuclear observables, including binding and decay energies, and nuclear charge radii. The model is trained using a multi-task approach with shared representations and obtains state-of-the-art performance, achieving levels of precision that are crucial for understanding fundamental phenomena in nuclear (astro)physics. We also report an intriguing finding that the learned representation of NuCLR exhibits the prominent emergence of crucial aspects of the nuclear shell model, namely the shell structure, including the well-known magic numbers, and the Pauli Exclusion Principle. This suggests that the model is capable of capturing the underlying physical principles and that our approach has the potential to offer valuable insights into nuclear theory.

Synthetic Gaia DR3 surveys from the FIRE cosmological simulations of Milky-Way-mass galaxies
Tri Nguyen, Xiaowei Ou, Nondh Panithanpaisal, Nora Shipp, Lina Necib, Robyn Sanderson, Andrew Wetzel
[ arXiv:2306.16475 ]

Abstract The third data release (DR3) of Gaia has provided a five-fold increase in the number of radial velocity measurements of stars, as well as a stark improvement in parallax and proper motion measurements. To help with studies that seek to test models and interpret Gaia DR3, we present nine Gaia synthetic surveys, based on three solar positions in three Milky-Way-mass galaxies of the Latte suite of the FIRE-2 cosmological simulations. These synthetic surveys match the selection function, radial velocity measurements, and photometry of Gaia DR3, adapting the code base Ananke, previously used to match the Gaia DR2 release in Sanderson et al. 2020. The synthetic surveys are publicly available and can be found at this http URL. Similarly to the previous release of Ananke, these surveys are based on cosmological simulations and thus able to model non-equilibrium dynamical effects, making them a useful tool in testing and interpreting Gaia DR3.

Development of the Topological Trigger for LHCb Run 3
Nicole Schulte, Blaise Raheem Delaney, Niklas Nolte, Gregory Max Ciezarek, Johannes Albrecht, Mike Williams
[ arXiv:2306.09873 ]

Abstract The data-taking conditions expected in Run 3 of the LHCb experiment at CERN are unprecedented and challenging for the software and computing systems. Despite that, the LHCb collaboration pioneers the use of a software-only trigger system to cope with the increased event rate efficiently. The beauty physics programme of LHCb is heavily reliant on topological triggers. These are devoted to selecting beauty-hadron candidates inclusively, based on the characteristic decay topology and kinematic properties expected from beauty decays. The following proceeding describes the current progress of the Run 3 implementation of the topological triggers using Lipschitz monotonic neural networks. This architecture offers robustness under varying detector conditions and sensitivity to long-lived candidates, improving the possibility of discovering New Physics at LHCb.

Multiple Peaks and a Long Precursor in the Type IIn Supernova 2021qqp: An Energetic Explosion in a Complex Circumstellar Environment
Daichi Hiramatsu, Tatsuya Matsumoto, Edo Berger, Conor Ransome, V. Ashley Villar, Sebastian Gomez, Yvette Cendes, Kishalay De, K. Azalee Bostroem, Joseph Farah, D. Andrew Howell, Curtis McCully, Megan Newsome, Estefania Padilla Gonzalez, Craig Pellegrino, Akihiro Suzuki, Giacomo Terreran
[ arXiv:2305.11168 ]

Abstract We present optical photometry and spectroscopy of the Type IIn supernova (SN) 2021qqp. Its unusual light curve is marked by a long precursor for ≈300 days, a rapid increase in brightness for ≈60 days, and then a sharp increase of ≈1.6 mag in only a few days to a first peak of M_r ≈ −19.5 mag. The light curve then declines rapidly until it re-brightens to a second distinct peak of M_r ≈ −17.3 mag centered at ≈335 days after the first peak. The spectra are dominated by Balmer lines with a complex morphology, including a narrow component with a width of ≈1300 km s^−1 (first peak) and ≈2500 km s^−1 (second peak) that we associate with the circumstellar medium (CSM) and a P Cygni component with an absorption velocity of ≈8500 km s^−1 (first peak) and ≈5600 km s^−1 (second peak) that we associate with the SN-CSM interaction shell. Using the luminosity and velocity evolution, we construct a flexible analytical model, finding two significant mass-loss episodes with peak mass loss rates of ≈10 and ≈5 M⊙ yr^−1 about 0.8 and 2 yr before explosion, respectively, with a total CSM mass of ≈2−4 M⊙. We show that the most recent mass-loss episode could explain the precursor for the year preceding the explosion. The SN ejecta mass is constrained to be ≈5−30 M⊙ for an explosion energy of ≈(3−10)×10^51 erg. We discuss eruptive massive stars (luminous blue variable, pulsational pair instability) and an extreme stellar merger with a compact object as possible progenitor channels.

Symbolic Regression on FPGAs for Fast Machine Learning Inference
Ho Fung Tsoi, Adrian Alan Pol, Vladimir Loncar, Ekaterina Govorkova, Miles Cranmer, Sridhara Dasu, Peter Elmer, Philip Harris, Isobel Ojalvo, Maurizio Pierini
[ arXiv:2305.04099 ]

Abstract The high-energy physics community is investigating the feasibility of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs) to improve physics sensitivity while meeting data processing latency limitations. In this contribution, we introduce a novel end-to-end procedure that utilizes a machine learning technique called symbolic regression (SR). It searches equation space to discover algebraic relations approximating a dataset. We use PySR (software for uncovering these expressions based on an evolutionary algorithm) and extend the functionality of hls4ml (a package for machine learning inference in FPGAs) to support PySR-generated expressions for resource-constrained production environments. Deep learning models often optimise the top metric by pinning the network size because the vast hyperparameter space prevents extensive neural architecture search. Conversely, SR selects a set of models on the Pareto front, which allows for optimising the performance-resource tradeoff directly. By embedding symbolic forms, our implementation can dramatically reduce the computational resources needed to perform critical tasks. We validate our procedure on a physics benchmark: multiclass classification of jets produced in simulated proton-proton collisions at the CERN Large Hadron Collider, and show that we approximate a 3-layer neural network with an inference model that has as low as 5 ns execution time (a reduction by a factor of 13) and over 90% approximation accuracy.
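
As a rough illustration of what selecting "a set of models on the Pareto front" means, the sketch below keeps only those candidate expressions for which no other candidate is both simpler and more accurate. The candidate names, complexity scores, and loss values are hypothetical, and the function is a generic illustration rather than the PySR or hls4ml API.

```python
def pareto_front(models):
    """Return the models that are not dominated in (complexity, loss).

    Each model is a (name, complexity, loss) tuple; lower is better for both.
    """
    front = []
    # Walk through candidates from simplest to most complex; a candidate is
    # kept only if it improves on the best loss seen among simpler ones.
    for name, complexity, loss in sorted(models, key=lambda m: (m[1], m[2])):
        if not front or loss < front[-1][2]:
            front.append((name, complexity, loss))
    return front


# Hypothetical candidates: (label, expression complexity, validation loss).
candidates = [
    ("a", 1, 0.40),
    ("b", 3, 0.25),
    ("c", 4, 0.30),  # dominated by "b": more complex and less accurate
    ("d", 6, 0.10),
    ("e", 8, 0.10),  # dominated by "d": more complex, no gain in accuracy
]

print([name for name, _, _ in pareto_front(candidates)])  # ['a', 'b', 'd']
```

Choosing a point along this front is what lets the performance-resource tradeoff be tuned directly, since expression complexity is a proxy for the resources an expression would need on the device.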

Pileup and Infrared Radiation Annihilation (PIRANHA): A Paradigm for Continuous Jet Grooming
Samuel Alipour-Fard, Patrick T. Komiske, Eric M. Metodiev, Jesse Thaler
[ arXiv:2305.00989 ]

Abstract Jet grooming is an important strategy for analyzing relativistic particle collisions in the presence of contaminating radiation. Most jet grooming techniques introduce hard cutoffs to remove soft radiation, leading to discontinuous behavior and associated experimental and theoretical challenges. In this paper, we introduce Pileup and Infrared Radiation Annihilation (PIRANHA), a paradigm for continuous jet grooming that overcomes the discontinuity and infrared sensitivity of hard-cutoff grooming procedures. We motivate PIRANHA from the perspective of optimal transport and the Energy Movers Distance and review Apollonius Subtraction and Iterated Voronoi Subtraction as examples of PIRANHA-style grooming. We then introduce a new tree-based implementation of PIRANHA, Recursive Subtraction, with reduced computational costs. Finally, we demonstrate the performance of Recursive Subtraction in mitigating sensitivity to soft distortions from hadronization and detector effects, and additive contamination from pileup and the underlying event.

Prometheus: An Open-Source Neutrino Telescope Simulation
Jeffrey Lazar, Stephan Meighen-Berger, Christian Haack, David Kim, Santiago Giner, Carlos A. Argüelles
[ arXiv:2304.14526 ]

Abstract Neutrino telescopes are gigaton-scale neutrino detectors comprised of individual light-detection units. Though constructed from simple building blocks, they have opened a new window to the Universe and are able to probe center-of-mass energies that are comparable to those of collider experiments. Prometheus is a new, open-source simulation tailored for this kind of detector. Our package, which is written in a combination of C++ and Python, provides a balance of ease of use and performance and allows the user to simulate a neutrino telescope with arbitrary geometry deployed in ice or water. Prometheus simulates the neutrino interactions in the volume surrounding the detector, computes the light yield of the hadronic shower and the out-going lepton, propagates the photons in the medium, and records their arrival times and position in user-defined regions. Finally, Prometheus events are serialized into a parquet file, which is a compact and interoperational file format that allows prompt access to the events for further analysis.

The dark matter profile of the Milky Way inferred from its circular velocity curve
Xiaowei Ou, Anna-Christina Eilers, Lina Necib, Anna Frebel
[ arXiv:2303.12838 ]

Abstract All galaxies are formed within dark matter halos, the nature of which is yet to be understood. The circular velocity curve, one of the first pieces of evidence for dark matter, is a direct probe of the Galaxy's potential, which allows studies of the nature of these dark matter halos. Recent large surveys have provided valuable information for determining the Milky Way circular velocity curve. In this study, we derive precise parallaxes for 120,309 stars with a data-driven model, using APOGEE DR17 spectra combined with photometry measurements from Gaia, 2MASS, and WISE. We measure the circular velocity curve of the Milky Way out to ∼30 kpc, and use it to provide an updated model of the dark matter density profile. We find a significantly faster decline in the circular velocity curve at outer galactic radii. To address this decline, we find that a cored Einasto profile with slope parameter 1.13 (+0.06/−0.06) is a better fit to the data than a generalized or contracted Navarro-Frenk-White (NFW), as was argued in previous studies. The virial mass of the best-fit dark matter halo is 1.50 (+0.04/−0.04) × 10^11 M⊙, significantly lower than that from a generalized NFW profile, but the corresponding local dark matter density at the solar position is 0.425 (+0.004/−0.004) GeV cm^−3, consistent with the literature. We additionally find the J-factor for annihilating dark matter at a 15° view angle towards the galactic centre is 9.96 (+0.64/−0.57) × 10^22 GeV^2 cm^−5, ∼8% of the value found from a standard NFW profile used in the literature. Our results further demonstrate the capability of the circular velocity curve, especially in light of the recent wave of data, in constraining the Milky Way's dark matter halo.

Via Machinae 2.0: Full-Sky, Model-Agnostic Search for Stellar Streams in Gaia DR2
David Shih, Matthew R. Buckley, Lina Necib
[ arXiv:2303.01529 ]

Abstract We present an update to Via Machinae, an automated stellar stream-finding algorithm based on the deep learning anomaly detector ANODE. Via Machinae identifies stellar streams within Gaia, using only angular positions, proper motions, and photometry, without reference to a model of the Milky Way potential for orbit integration or stellar distances. This new version, Via Machinae 2.0, includes many improvements and refinements to nearly every step of the algorithm, that altogether result in more robust and visually distinct stream candidates than our original formulation. In this work, we also provide a quantitative estimate of the false positive rate of Via Machinae 2.0 by applying it to a simulated Gaia-mock catalog based on Galaxia, a smooth model of the Milky Way that does not contain substructure or stellar streams. Finally, we perform the first full-sky search for stellar streams with Via Machinae 2.0, identifying 102 streams at high significance within the Gaia Data Release 2, of which only 10 have been previously identified. While follow-up observations for further confirmation are required, taking into account the false positive rate presented in this work, we expect approximately 90 of these stream candidates to correspond to real stellar structures.

Measuring the 8621 Å Diffuse Interstellar Band in Gaia DR3 RVS Spectra: Obtaining a Clean Catalog by Marginalizing over Stellar Types
Andrew K. Saydjari, Catherine Zucker, J. E. G. Peek, Douglas P. Finkbeiner
[ arXiv:2212.03879 ]

Abstract Diffuse interstellar bands (DIBs) are broad absorption features associated with interstellar dust and can serve as chemical and kinematic tracers. Conventional measurements of DIBs in stellar spectra are complicated by residuals between observations and best-fit stellar models. To overcome this, we simultaneously model the spectrum as a combination of stellar, dust, and residual components, with full posteriors on the joint distribution of the components. This decomposition is obtained by modeling each component as a draw from a high-dimensional Gaussian distribution in the data-space (the observed spectrum) -- a method we call "Marginalized Analytic Data-space Gaussian Inference for Component Separation" (MADGICS). We use a data-driven prior for the stellar component, which avoids missing stellar features not included in synthetic line lists. This technique provides statistically rigorous uncertainties and detection thresholds, which are required to work in the low signal-to-noise regime that is commonplace for dusty lines of sight. We reprocess all public Gaia DR3 RVS spectra and present an improved 8621 Å DIB catalog, free of detectable stellar line contamination. We constrain the rest-frame wavelength to 8623.14±0.087 Å (vacuum), find no significant evidence for DIBs in the Local Bubble from the 1/6th of RVS spectra that are public, and show unprecedented correlation with kinematic substructure in Galactic CO maps. We validate the catalog, its reported uncertainties, and biases using synthetic injection tests. We believe MADGICS provides a viable path forward for large-scale spectral line measurements in the presence of complex spectral contamination.

Can denoising diffusion probabilistic models generate realistic astrophysical fields?
Nayantara Mudur, Douglas P. Finkbeiner
[ arXiv:2211.12444 ]

Abstract Score-based generative models have emerged as alternatives to generative adversarial networks (GANs) and normalizing flows for tasks involving learning and sampling from complex image distributions. In this work we investigate the ability of these models to generate fields in two astrophysical contexts: dark matter mass density fields from cosmological simulations and images of interstellar dust. We examine the fidelity of the sampled cosmological fields relative to the true fields using three different metrics, and identify potential issues to address. We demonstrate a proof-of-concept application of the model trained on dust in denoising dust images. To our knowledge, this is the first application of this class of models to the interstellar medium.

Finding NEEMo: Geometric Fitting using Neural Estimation of the Energy Mover’s Distance
Ouail Kitouni, Niklas Nolte, Mike Williams
[ arXiv:2209.15624 ]

Abstract A novel neural architecture was recently developed that enforces an exact upper bound on the Lipschitz constant of the model by constraining the norm of its weights in a minimal way, resulting in higher expressiveness compared to other techniques. We present a new and interesting direction for this architecture: estimation of the Wasserstein metric (Earth Mover's Distance) in optimal transport by employing the Kantorovich-Rubinstein duality to enable its use in geometric fitting applications. Specifically, we focus on the field of high-energy particle physics, where it has been shown that a metric for the space of particle-collider events can be defined based on the Wasserstein metric, referred to as the Energy Mover's Distance (EMD). This metrization has the potential to revolutionize data-driven collider phenomenology. The work presented here represents a major step towards realizing this goal by providing a differentiable way of directly calculating the EMD. We show how the flexibility that our approach enables can be used to develop novel clustering algorithms.
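
For reference, the Kantorovich-Rubinstein duality mentioned above can be written in its standard textbook form as follows. The symbols P, Q, and f are notation chosen here, not taken from the abstract, and the extra terms that arise when comparing collider events of unequal total energy are omitted.

```latex
% Kantorovich-Rubinstein duality for the 1-Wasserstein distance between
% probability distributions P and Q (standard form; notation assumed here).
W_1(P, Q) \;=\; \sup_{\|f\|_{L} \le 1}
  \Big( \mathbb{E}_{x \sim P}\big[f(x)\big] - \mathbb{E}_{y \sim Q}\big[f(y)\big] \Big)
% The supremum runs over all functions f with Lipschitz constant at most 1.
```

Because the supremum is taken over 1-Lipschitz functions, a network whose Lipschitz constant is bounded by construction can stand in for f, and maximizing the bracketed difference by gradient ascent then gives a differentiable estimate of the distance.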

Hardware-accelerated Inference for Real-Time Gravitational-Wave Astronomy
Alec Gunny, Dylan Rankin, Jeffrey Krupa, Muhammed Saleem, Tri Nguyen, Michael Coughlin, Philip Harris, Erik Katsavounidis, Steven Timm, Burt Holzman
[ arXiv:2108.12430 ]

Abstract The field of transient astronomy has seen a revolution with the first gravitational-wave detections and the arrival of multi-messenger observations they enabled. Transformed by the first detection of binary black hole and binary neutron star mergers, computational demands in gravitational-wave astronomy are expected to grow by at least a factor of two over the next five years as the global network of kilometer-scale interferometers is brought to design sensitivity. With the increase in detector sensitivity, real-time delivery of gravitational-wave alerts will become increasingly important as an enabler of multi-messenger followup. In this work, we report a novel implementation and deployment of deep learning inference for real-time gravitational-wave data denoising and astrophysical source identification. This is accomplished using a generic Inference-as-a-Service model that is capable of adapting to the future needs of gravitational-wave data analysis. Our implementation allows seamless incorporation of hardware accelerators and also enables the use of commercial or private (dedicated) as-a-service computing. Based on our results, we propose a paradigm shift in low-latency and offline computing in gravitational-wave astronomy. Such a shift can address key challenges in peak-usage, scalability and reliability, and provide a data analysis platform particularly optimized for deep learning applications. The achieved sub-millisecond scale latency will also be relevant for any machine learning-based real-time control systems that may be invoked in the operation of near-future and next generation ground-based laser interferometers, as well as the front-end collection, distribution and processing of data from such instruments.

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
Farah Fahim, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo Jindariani, Nhan Tran, Luca P. Carloni, Giuseppe Di Guglielmo, Philip Harris, Jeffrey Krupa, Dylan Rankin, Manuel Blanco Valentin, Josiah Hester, Yingyi Luo, John Mamish, Seda Orgrenci-Memik, Thea Aarrestad, Hamza Javed, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, Javier Duarte, Scott Hauck, Shih-Chieh Hsu, Jennifer Ngadiuba, Mia Liu, Duc Hoang, Edward Kreinar, Zhenbin Wu
[ arXiv:2103.05579 ]

Abstract Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends including an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.

Published

An Extensive Hubble Space Telescope Study of the Offset and Host Light Distributions of Type I Superluminous Supernovae
Brian Hsu, Peter K. Blanchard, Edo Berger, Sebastian Gomez
The Astrophysical Journal 2024, Volume 961, Number 2 [ arXiv:2308.07271 ]

Abstract We present an extensive Hubble Space Telescope (HST) rest-frame ultraviolet (UV) imaging study of the locations of Type I superluminous supernovae (SLSNe) within their host galaxies. The sample includes 65 SLSNe with detected host galaxies in the redshift range z≈0.05−2. Using precise astrometric matching with SN images, we determine the distributions of physical and host-normalized offsets relative to the host centers, as well as the fractional flux distribution relative to the underlying UV light distribution. We find that the host-normalized offsets of SLSNe roughly track an exponential disk profile, but exhibit an overabundance of sources with large offsets of 1.5−4 times their host half-light radius. The SLSNe normalized offsets are systematically larger than those of long gamma-ray bursts (LGRBs), and even Type Ib/c and II SNe. Furthermore, we find that about 40% of all SLSNe occur in the dimmest regions of their host galaxies (fractional flux of 0), in stark contrast to LGRBs and Type Ib/c and II SNe. We do not detect any significant trends in the locations of SLSNe as a function of redshift, or as a function of explosion and magnetar engine parameters inferred from modeling of their optical light curves. The significant difference in SLSN locations compared to LGRBs (and normal core-collapse SNe) suggests that at least some of their progenitors follow a different evolutionary path. We speculate that SLSNe arise from massive runaway stars from disrupted binary systems, with velocities of ∼10^2 km s^−1.

A Parsec-Scale Galactic 3D Dust Map out to 1.25 kpc from the Sun
Gordian Edenhofer, Catherine Zucker, Philipp Frank, Andrew K. Saydjari, Joshua S. Speagle, Douglas Finkbeiner, Torsten Enßlin
Astronomy & Astrophysics, Forthcoming article, 2024, Section Interstellar and circumstellar matter [ arXiv:2308.01295 ]

Abstract High-resolution 3D maps of interstellar dust are critical for probing the underlying physics shaping the structure of the interstellar medium, and for foreground correction of astrophysical observations affected by dust. We aim to construct a new 3D map of the spatial distribution of interstellar dust extinction out to a distance of 1.25 kpc from the Sun. We leverage distance and extinction estimates to 54 million nearby stars derived from the Gaia BP/RP spectra. Using the stellar distance and extinction information, we infer the spatial distribution of dust extinction. We model the logarithmic dust extinction with a Gaussian Process in a spherical coordinate system via Iterative Charted Refinement and a correlation kernel inferred in previous work. We probe our 661 million dimensional posterior distribution using the variational inference method MGVI. Our 3D dust map achieves an angular resolution of 14' (Nside = 256). We sample the dust extinction in 516 distance bins spanning 69 pc to 1250 pc. We obtain a maximum distance resolution of 0.4 pc at 69 pc and a minimum distance resolution of 7 pc at 1.25 kpc. Our map resolves the internal structure of hundreds of molecular clouds in the solar neighborhood and will be broadly useful for studies of star formation, Galactic structure, and young stellar populations.
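
The quoted angular resolution can be sanity-checked with a few lines of arithmetic. The sketch below assumes that Nside refers to the HEALPix pixelization, in which the sphere is divided into 12 × Nside² equal-area pixels; the abstract does not name the scheme, so this is an assumption, and the function name is illustrative.

```python
import math


def healpix_resolution_arcmin(nside):
    """Approximate pixel size (square root of the pixel area) in arcminutes.

    Assumes the HEALPix convention of 12 * nside**2 equal-area pixels.
    """
    npix = 12 * nside ** 2
    pixel_area_sr = 4.0 * math.pi / npix  # full sky is 4*pi steradians
    return math.degrees(math.sqrt(pixel_area_sr)) * 60.0


print(round(healpix_resolution_arcmin(256), 1))  # 13.7
```

Under that assumption, Nside = 256 gives a pixel scale of about 13.7 arcminutes, consistent with the 14' quoted in the abstract.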

From Discovery to the First Month of the Type II Supernova 2023ixf: High and Variable Mass Loss in the Final Year before Explosion
Daichi Hiramatsu, Daichi Tsuna, Edo Berger, Koichi Itagaki, Jared A. Goldberg, Sebastian Gomez, Kishalay De, Griffin Hosseinzadeh, K. Azalee Bostroem, Peter J. Brown, Iair Arcavi, Allyson Bieryla, Peter K. Blanchard, Gilbert A. Esquerdo, Joseph Farah, D. Andrew Howell, Tatsuya Matsumoto, Curtis McCully, Megan Newsome, Estefania Padilla Gonzalez, Craig Pellegrino, Jaehyon Rhee, Giacomo Terreran, József Vinkó, J. Craig Wheeler
The Astrophysical Journal Letters 2023, Volume 955, Number 1 [ arXiv:2307.03165 ]

Abstract We present the discovery of the Type II supernova SN 2023ixf in M101 and follow-up photometric and spectroscopic observations, respectively, in the first month and week of its evolution. Our discovery was made within a day of estimated first light, and the following light curve is characterized by a rapid rise (≈5 days) to a luminous peak (M_V ≈ −18.2 mag) and plateau (M_V ≈ −17.6 mag) extending to 30 days with a fast decline rate of ≈0.03 mag day^−1. During the rising phase, U−V color shows blueward evolution, followed by redward evolution in the plateau phase. Prominent flash features of hydrogen, helium, carbon, and nitrogen dominate the spectra up to ≈5 days after first light, with a transition to a higher ionization state in the first ≈2 days. Both the U−V color and flash ionization states suggest a rise in the temperature, indicative of a delayed shock breakout inside dense circumstellar material (CSM). From the timescales of CSM interaction, we estimate its compact radial extent of ∼(3−7)×10^14 cm. We then construct numerical light-curve models based on both continuous and eruptive mass-loss scenarios shortly before explosion. For the continuous mass-loss scenario, we infer a range of mass-loss history with 0.1−1.0 M⊙ yr^−1 in the final 2−1 yr before explosion, with a potentially decreasing mass loss of 0.01−0.1 M⊙ yr^−1 in ∼0.7−0.4 yr toward the explosion. For the eruptive mass-loss scenario, we favor eruptions releasing 0.3−1 M⊙ of the envelope at about a year before explosion, which result in CSM with mass and extent similar to the continuous scenario. We discuss the implications of the available multiwavelength constraints obtained thus far on the progenitor candidate and SN 2023ixf to our variable CSM models.

Exotic energy injection in the early universe I: a novel treatment for low-energy electrons and photons
Hongwan Liu, Wenzer Qin, Gregory W. Ridgway, Tracy R. Slatyer
APS Journals 2023, Volume 108, Issue 4 [ arXiv:2303.07366 ]

Abstract Decaying or annihilating dark matter and other exotic energy injections can modify the spectrum of the universe's photon bath, resulting in e.g. new contributions to spectral distortions of the cosmic microwave background blackbody spectrum and modifications to the temperature and ionization history of the universe. Here, we present an improved version of the DarkHistory code, which is now capable of consistently calculating the spectrum of low-energy photons by properly treating the interactions of these photons with the levels of hydrogen atoms. Other changes to the code include a more detailed treatment of energy deposition by low-energy electrons, and spectral distortions from heating of the intergalactic medium. All of the improvements we have made to DarkHistory are publicly available.

Exotic energy injection in the early universe II: CMB spectral distortions and constraints on light dark matter
Hongwan Liu, Wenzer Qin, Gregory W. Ridgway, Tracy R. Slatyer
APS Journals 2023, Volume 108, Issue 4 [ arXiv:2303.07370 ]

Abstract We calculate the post-recombination contribution to the Cosmic Microwave Background (CMB) spectral distortion due to general exotic energy injections, including dark matter (DM) decaying or annihilating to Standard Model particles. Upon subtracting the background distortion that would be present even without such energy injections, we find residual distortions that are still potentially large enough to be detectable by future experiments such as PIXIE. The distortions also have a high-energy spectral feature that is a unique signature of the injection of high-energy particles. We present a calculation of the global ionization history in the presence of decaying dark matter with sub-keV masses, and also show that previous calculations of the global ionization history in the presence of energy injection are not significantly modified by these additional spectral distortions. Our improved treatment of low-energy electrons allows us to extend calculations of the CMB anisotropy constraints for decaying DM down to arbitrarily low masses. We also recast these bounds as constraints on the coupling of axion-like particles to photons.

Expressive Monotonic Neural Networks
Niklas Nolte, Ouail Kitouni, Mike Williams
International Conference on Learning Representations 2023 [ ]

Abstract The monotonic dependence of the outputs of a neural network on some of its inputs is a crucial inductive bias in many scenarios where domain knowledge dictates such behavior. This is especially important for interpretability and fairness considerations. In a broader context, scenarios in which monotonicity is important can be found in finance, medicine, physics, and other disciplines. It is thus desirable to build neural network architectures that implement this inductive bias provably. In this work, we propose a weight-constrained architecture with a single residual connection to achieve exact monotonic dependence in any subset of the inputs. The weight constraint scheme directly controls the Lipschitz constant of the neural network and thus provides the additional benefit of robustness. Compared to currently existing techniques used for monotonicity, our method is simpler in implementation and in theory foundations, has negligible computational overhead, is guaranteed to produce monotonic dependence, and is highly expressive. We show how the algorithm is used to train powerful, robust, and interpretable discriminators that achieve competitive performance compared to current state-of-the-art methods across various benchmarks, from social applications to the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider.
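
The idea of combining a Lipschitz bound with a single residual connection can be sketched in a few lines. If a function g changes by at most λ per unit change in any one input, then g(x) + λ·Σᵢ xᵢ can never decrease when an input increases. The toy network below is an illustrative assumption throughout: the ReLU activation, the particular weight normalization, the untrained random weights, and every name are chosen for this sketch and are not the paper's construction.

```python
import random


def make_monotonic_net(n_in, n_hidden, lam=1.0, seed=0):
    """Build a toy one-hidden-layer network that is non-decreasing in every input."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]

    # Constrain the weights so the unscaled network is 1-Lipschitz with respect
    # to the L1 norm of the input: entries of W1 are bounded by 1 in magnitude,
    # and the absolute values of w2 sum to at most 1.
    s1 = max(1.0, max(abs(w) for row in W1 for w in row))
    W1 = [[w / s1 for w in row] for row in W1]
    s2 = max(1.0, sum(abs(w) for w in w2))
    w2 = [w / s2 for w in w2]

    def f(x):
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
        g = sum(w * h for w, h in zip(w2, hidden))
        # Residual term lam * sum(x) outweighs the worst-case slope -lam of lam * g.
        return lam * g + lam * sum(x)

    return f


f = make_monotonic_net(n_in=3, n_hidden=8, lam=2.0, seed=1)
x = [0.5, -1.0, 2.0]
x_up = [0.5, -0.2, 2.0]  # second input increased, others unchanged
print(f(x_up) >= f(x))   # True
```

The guarantee comes from the weight constraint rather than from training, so it holds for any weights that satisfy the bound. Restricting the residual sum to a subset of inputs would make the function monotonic in only those inputs.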

Non-perturbative strong coupling at timelike momenta
Jan Horak, Jan M. Pawlowski, Jonas Turnwald, Julian M. Urban, Nicolas Wink, Savvas Zafeiropoulos
Physical Review D 2023, Volume 107, Issue 7 [ arXiv:2301.08128 ]

Abstract We compute the strong coupling constant of Landau gauge QCD in the full complex momentum plane, both directly and via spectral reconstruction. In particular, we consider the Taylor coupling given by the product of ghost and gluon dressing functions. Assuming spectral representations for the latter, we first show that also the coupling obeys such a representation. The subsequent spectral reconstruction of the coupling data, obtained from 2+1 flavour lattice QCD results for the ghost and gluon, is based on a probabilistic inversion of this representation using Gaussian process regression with analytically enforced asymptotics. In contradistinction, our direct calculation relies on earlier reconstruction results for the ghost and gluon spectral functions themselves, as well as data obtained in functional QCD. Apart from its relevance for studies of resonances or scattering processes, the calculation also serves as a non-trivial benchmark of our reconstruction approach. The results show remarkable agreement, testifying to the reliability of the method.

First demonstration of neural sensing and control in a kilometer-scale gravitational wave observatory
Nikhil Mukund, James Lough, Aparna Bisht, Holger Wittel, Séverin Landry Nadji, Christoph Affeldt, Fabio Bergamin, Marc Brinkmann, Volker Kringel, Harald Lück, Michael Weinert, Karsten Danzmann
Physical Review Applied, 2023, Volume 20, Issue 6 [ arXiv:2301.06221 ]

Abstract Suspended optics in gravitational wave (GW) observatories are susceptible to alignment perturbations, particularly slow drifts over time, due to variations in temperature and seismic levels. Such misalignments affect the coupling of the incident laser beam into the optical cavities, degrade both circulating power and optomechanical photon squeezing and thus decrease the astrophysical sensitivity to merging binaries. Traditional alignment techniques involve differential wavefront sensing using multiple quadrant photodiodes but are often restricted in bandwidth and are limited by the sensing noise. We present the first-ever successful implementation of neural network-based sensing and control at a gravitational wave observatory and demonstrate low-frequency control of the signal recycling mirror at the GEO 600 detector. Alignment information for three critical optics is simultaneously extracted from the interferometric dark port camera images via a CNN-LSTM network architecture and is then used for MIMO control using soft actor-critic-based deep reinforcement learning. Overall sensitivity improvement achieved using our scheme demonstrates deep learning's capabilities as a viable tool for real-time sensing and control for current and next-generation GW interferometers.

Non-parametric Lagrangian biasing from the insights of neural nets
Xiaohan Wu, Julian B. Munoz, Daniel J. Eisenstein
Journal of Cosmology and Astroparticle Physics 2023, Volume 2023 [ arXiv:2212.08095 ]

Abstract We present a Lagrangian model of galaxy clustering bias in which we train a neural net using the local properties of the smoothed initial density field to predict the late-time mass-weighted halo field. By fitting the mass-weighted halo field in the AbacusSummit simulations at z=0.5, we find that including three coarsely spaced smoothing scales gives the best recovery of the halo power spectrum. Adding more smoothing scales may lead to 2-5% underestimation of the large-scale power and can cause the neural net to overfit. We find that the fitted halo-to-mass ratio can be well described by two directions in the original high-dimension feature space. Projecting the original features into these two principal components and re-training the neural net either reproduces the original training result, or outperforms it with a better match of the halo power spectrum. The elements of the principal components are unlikely to be assigned physical meanings, partly owing to the features being highly correlated between different smoothing scales. Our work illustrates a potential need to include multiple smoothing scales when studying galaxy bias, and this can be done easily with machine-learning methods that can take in high dimensional input feature space.

Stellar Reddening Based Extinction Maps for Cosmological Applications
Nayantara Mudur, Core Francisco Park, Douglas P. Finkbeiner
The Astrophysical Journal, 2023, Volume 949, Number 2 [ arXiv:2212.04514 ]

Abstract Cosmological surveys must correct their observations for the reddening of extragalactic objects by Galactic dust. Existing dust maps, however, have been found to have spatial correlations with the large-scale structure of the Universe. Errors in extinction maps can propagate systematic biases into samples of dereddened extragalactic objects and into cosmological measurements such as correlation functions between foreground lenses and background objects and the primordial non-gaussianity parameter fNL. Emission-based maps are contaminated by the cosmic infrared background, while maps inferred from stellar-reddenings suffer from imperfect removal of quasars and galaxies from stellar catalogs. Thus, stellar-reddening based maps using catalogs without extragalactic objects offer a promising path to making dust maps with minimal correlations with large-scale structure. We present two high-latitude integrated extinction maps based on stellar reddenings, with a point spread function of full-width half-maximum 6.1' and 15'. We employ a strict selection of catalog objects to filter out galaxies and quasars and measure the spatial correlation of our extinction maps with extragalactic structure. Our galactic extinction maps have reduced spatial correlation with large scale structure relative to most existing stellar-reddening based and emission-based extinction maps.

Variational Neural-Network Ansatz for Continuum Quantum Field Theory
John M. Martyn, Khadijeh Najafi, Di Luo
Physical Review Letters, 2023, Volume 131, Issue 8 [ arXiv:2212.00782 ]

Abstract Physicists dating back to Feynman have lamented the difficulties of applying the variational principle to quantum field theories. In non-relativistic quantum field theories, the challenge is to parameterize and optimize over the infinitely many n-particle wave functions comprising the state's Fock space representation. Here we approach this problem by introducing neural-network quantum field states, a deep learning ansatz that enables application of the variational principle to non-relativistic quantum field theories in the continuum. Our ansatz uses the Deep Sets neural network architecture to simultaneously parameterize all of the n-particle wave functions comprising a quantum field state. We employ our ansatz to approximate ground states of various field theories, including an inhomogeneous system and a system with long-range interactions, thus demonstrating a powerful new tool for probing quantum field theories.
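The reason a Deep Sets architecture can cover all n-particle sectors at once is that it pools per-particle features with a sum, so one set of parameters accepts any number of particles and is invariant under their permutation. The following sketch shows only that structural property. The feature map `phi` and the readout `rho` are hypothetical fixed functions standing in for trained networks, and nothing here reproduces the paper's actual ansatz.

```python
import math


def phi(x):
    """Per-particle feature map (hypothetical; a trained network in practice)."""
    return [x, x * x, math.cos(x)]


def rho(pooled):
    """Readout from pooled features to a scalar amplitude (hypothetical)."""
    return math.exp(-0.5 * pooled[1]) * (1.0 + 0.1 * pooled[2])


def amplitude(positions):
    """Deep Sets form rho(sum_i phi(x_i)); defined for any particle number."""
    pooled = [0.0, 0.0, 0.0]
    for x in positions:
        for k, value in enumerate(phi(x)):
            pooled[k] += value
    return rho(pooled)
```

Calling `amplitude` on the same positions in a different order gives the same value up to floating-point rounding, and the same function handles zero, one, or many particles.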

Search for boosted Higgs boson decay to a charm quark-antiquark pair in proton-proton collisions at √s = 13 TeV
CMS Collaboration
Physical Review Letters, 2023, Volume 131, Issue 4 [ arXiv:2211.14181 ]

Abstract A search for the standard model (SM) Higgs boson (H) produced with transverse momentum greater than 450 GeV and decaying to a charm quark-antiquark (cc̄) pair is presented. The search is performed using proton-proton collision data collected at √s = 13 TeV by the CMS experiment at the LHC, corresponding to an integrated luminosity of 138 fb⁻¹. Boosted H→cc̄ decay products are reconstructed as a single large-radius jet and identified using a deep neural network charm tagging technique. The method is validated by measurement of the Z→cc̄ decay process, which is observed with a signal strength of 1.00 +0.17/−0.14 (syst) ± 0.08 (theo) ± 0.06 (stat), defined as the ratio of the observed process rate to the standard model expectation. The observed (expected) upper limit on σ(H) B(H→cc̄) is set at 47 (39) times the SM prediction at 95% confidence level.

Endothermic self-interacting dark matter in Milky Way-like dark matter haloes
Stephanie O'Neil, Mark Vogelsberger, Saniya Heeba, Katelin Schutz, Jonah C. Rose, Paul Torrey, Josh Borrow, Ryan Low, Rakshak Adhikari, Mikhail V. Medvedev, Tracy R. Slatyer, Jesús Zavala
Monthly Notices of the Royal Astronomical Society, 2023, Volume 524, Issue 1 [ arXiv:2210.16328 ]

Abstract Self-interacting dark matter (SIDM) offers the potential to mitigate some of the discrepancies between simulated cold dark matter (CDM) and observed galactic properties. We introduce a physically motivated SIDM model to understand the effects of self interactions on the properties of Milky Way and dwarf galaxy sized haloes. This model consists of dark matter with a nearly degenerate excited state, which allows for both elastic and inelastic scattering. In particular, the model includes a significant probability for particles to up-scatter from the ground state to the excited state. We simulate a suite of zoom-in Milky Way-sized N-body haloes with six models with different scattering cross sections to study the effects of up-scattering in SIDM models. We find that the up-scattering reaction greatly increases the central densities of the main halo through the loss of kinetic energy. However, the physical model still results in significant coring due to the presence of elastic scattering and down-scattering. These effects are not as apparent in the subhalo population compared to the main halo, but the number of subhaloes is reduced compared to CDM.

Deep Learning Detection and Classification of Gravitational Waves from Neutron Star-Black Hole Mergers
Richard Qiu, Plamen Krastev, Kiranjyot Gill, Edo Berger
Physics Letters B, 2023, Volume 840 [ arXiv:2210.15888 ]

Abstract The Laser Interferometer Gravitational-Wave Observatory (LIGO) and Virgo Interferometer Collaborations have now detected all three classes of compact binary mergers: binary black hole (BBH), binary neutron star (BNS), and neutron star-black hole (NSBH). For coalescences involving neutron stars, the simultaneous observation of gravitational and electromagnetic radiation produced by an event has broader potential to enhance our understanding of these events, and also to probe the equation of state (EOS) of dense matter. However, electromagnetic follow-up to gravitational wave (GW) events requires rapid real-time detection and classification of GW signals, and conventional detection approaches are computationally prohibitive for the anticipated rate of detection of next-generation GW detectors. In this work, we present the first deep learning based results of classification of GW signals from NSBH mergers in real LIGO data. We show for the first time that a deep neural network can successfully distinguish all three classes of compact binary mergers and separate them from detector noise. Specifically, we train a convolutional neural network (CNN) on ∼500,000 data samples of real LIGO noise with injected BBH, BNS, and NSBH GW signals, and we show that our network has high sensitivity and accuracy. Most importantly, we successfully recover the two confirmed NSBH events to date (GW200105 and GW200115) and the two confirmed BNS mergers to date (GW170817 and GW190425), together with ≈90% of all BBH candidate events from the third Gravitational Wave Transient Catalog, GWTC-3. These results are an important step towards low-latency real-time GW detection, enabling multi-messenger astronomy.

Identifying Tidal Disruption Events with an Expansion of the FLEET Machine Learning Algorithm
Sebastian Gomez, V. Ashley Villar, Edo Berger, Suvi Gezari, Sjoert van Velzen, Matt Nicholl, Peter K. Blanchard, Kate D. Alexander
The Astrophysical Journal, 2023, Volume 949, Issue 113 [ arXiv:2210.10810 ]

Abstract We present an expansion of FLEET, a machine learning algorithm optimized to select transients that are most likely to be tidal disruption events (TDEs). FLEET is based on a random forest algorithm trained on the light curves and host galaxy information of 4,779 spectroscopically classified transients. For transients with a probability of being a TDE, P(TDE) > 0.5, we can successfully recover TDEs with a ≈40% completeness and a ≈30% purity when using the first 20 days of photometry, or a similar completeness and ≈50% purity when including 40 days of photometry. We find that the most relevant features for differentiating TDEs from other transients are the normalized host separation, and the light curve (g−r) color during peak. Additionally, we use FLEET to produce a list of the 39 most likely TDE candidates discovered by the Zwicky Transient Facility that remain currently unclassified. We explore the use of FLEET for the Legacy Survey of Space and Time on the Vera C. Rubin Observatory (Rubin) and the Nancy Grace Roman Space Telescope (Roman). We simulate the Rubin and Roman survey strategies and estimate that ∼10^4 TDEs could be discovered every year by Rubin, and ∼200 TDEs per year by Roman. Finally, we run FLEET on the TDEs in our Rubin survey simulation and find that we can recover ∼30% of those at a redshift z < 0.5 with P(TDE) > 0.5. This translates to ∼3,000 TDEs per year that FLEET could uncover from Rubin. FLEET is provided as an open source package on GitHub.
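The completeness and purity figures quoted above follow the usual definitions for a probability cut: completeness is the fraction of true TDEs that pass the cut, and purity is the fraction of selected transients that are true TDEs. The sketch below computes both for a toy sample. The function name, the toy probabilities, and the labels are made up for illustration and are not FLEET output or part of the FLEET package.

```python
def completeness_and_purity(probabilities, is_tde, threshold=0.5):
    """Return (completeness, purity) for the selection P(TDE) > threshold."""
    selected = [p > threshold for p in probabilities]
    true_positives = sum(1 for s, t in zip(selected, is_tde) if s and t)
    n_true = sum(1 for t in is_tde if t)
    n_selected = sum(1 for s in selected if s)
    completeness = true_positives / n_true if n_true else float("nan")
    purity = true_positives / n_selected if n_selected else float("nan")
    return completeness, purity


# Toy classifier scores and ground-truth labels (illustrative values only).
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1]
labels = [True, False, True, True, False, False, True, False]
completeness, purity = completeness_and_purity(probs, labels)
```

Raising `threshold` typically trades completeness for purity, which is why the abstract quotes both numbers at a fixed cut.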

The First Two Years of FLEET: an Active Search for Superluminous Supernovae
Sebastian Gomez, Edo Berger, Peter K. Blanchard, Griffin Hosseinzadeh, Matt Nicholl, Daichi Hiramatsu, V. Ashley Villar, Yao Yin
The Astrophysical Journal, 2023, Volume 949, Issue 114 [ arXiv:2210.10811 ]

Abstract In November 2019 we began operating FLEET (Finding Luminous and Exotic Extragalactic Transients), a machine learning algorithm designed to photometrically identify Type I superluminous supernovae (SLSNe) in transient alert streams. Using FLEET, we spectroscopically classified 21 of the 50 SLSNe identified worldwide between November 2019 and January 2022. Based on our original algorithm, we anticipated that FLEET would achieve a purity of about 50% for transients with a probability of being a SLSN, P(SLSN) > 0.5; the true on-sky purity we obtained is closer to 80%. Similarly, we anticipated FLEET could reach a completeness of about 30%, and we indeed measure an upper limit on the completeness of ≈33%. Here, we present FLEET 2.0, an updated version of FLEET trained on 4,780 transients (almost 3 times more than in FLEET 1.0). FLEET 2.0 has a similar predicted purity to FLEET 1.0, but outperforms FLEET 1.0 in terms of completeness, which is now closer to ≈40% for transients with P(SLSN) > 0.5. Additionally, we explore possible systematics that might arise from the use of FLEET for target selection. We find that the population of SLSNe recovered by FLEET is mostly indistinguishable from the overall SLSN population, in terms of physical and most observational parameters. We provide FLEET as an open source package on GitHub.

Uncovering dark matter density profiles in dwarf galaxies with graph neural networks
Tri Nguyễn, Siddharth Mishra-Sharma, Reuel Williams, Lina Necib
Physical Review D, 2023, Volume 107, Issue 4 [ arXiv:2208.12825 ]

Abstract Dwarf galaxies are small, dark matter-dominated galaxies, some of which are embedded within the Milky Way. Their lack of baryonic matter (e.g., stars and gas) makes them perfect test beds for probing the properties of dark matter -- understanding the spatial dark matter distribution in these systems can be used to constrain microphysical dark matter interactions that influence the formation and evolution of structures in our Universe. We introduce a new method that leverages simulation-based inference and graph-based machine learning in order to infer the dark matter density profiles of dwarf galaxies from observable kinematics of stars gravitationally bound to these systems. Our approach aims to address some of the limitations of established methods based on dynamical Jeans modeling. We show that this novel method can place stronger constraints on dark matter profiles and, consequently, has the potential to weigh in on some of the ongoing puzzles associated with the small-scale structure of dark matter halos, such as the core-cusp discrepancy.

Neural Embedding: Learning the Embedding of the Manifold of Physics Data
Sang Eon Park, Philip Harris, Bryan Ostdiek
Journal of High Energy Physics, 2023, Volume 2023, Article 108 [ arXiv:2208.05484 ]

Abstract In this paper, we present a method of embedding physics data manifolds with metric structure into lower dimensional spaces with simpler metrics, such as Euclidean and Hyperbolic spaces. We then demonstrate that it can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedding approach learns the underlying latent structure. With the notion of volume in Euclidean spaces, we provide for the first time a viable solution to quantifying the true search capability of model agnostic search algorithms in collider physics (i.e. anomaly detection). Finally, we discuss how the ideas presented in this paper can be employed to solve many practical challenges that require the extraction of physically meaningful representations from information in complex high dimensional datasets.

Robust Clustering of the Local Milky Way Stellar Kinematic Substructures with Gaia eDR3
Xiaowei Ou, Lina Necib, Anna Frebel
Monthly Notices of the Royal Astronomical Society, 2023, Volume 521, Issue 2 [ arXiv:2208.01056 ]

Abstract We apply the clustering algorithm HDBSCAN on the Gaia early third data release astrometry combined with the Gaia second data release radial velocity measurements of almost 5.5 million stars to identify the local stellar kinematic substructures in the solar neighborhood. Understanding these structures helps build a more complete picture of the formation of the Milky Way, as well as an empirical phase space distribution of dark matter that would inform detection experiments. The main goal of this study is to provide a list of the most stable clusters, by taking into account the measurement uncertainties and studying the stability of the clustering results. We apply the clustering algorithm in two spaces, in velocity space in order to study recently accreted structures, and in action-angle space to find phase-mixed structures. We find 23 (6) robust clusters in velocity space (action-angle space) that are consistently not associated with noise. They are attributed to the known structures: the Gaia Sausage-Enceladus, the Helmi Stream, and globular cluster NGC 3201 are found in both spaces, while NGC 104 and the thick disk (Sequoia) are identified in velocity space (action-angle space). We discuss the kinematic properties of these structures and study whether many of the small clusters belong to a similar larger cluster based on their chemical abundances. Although we do not identify any new structures, we find that the HDBSCAN member selection of already known structures is unstable to input kinematics of the stars when resampled within their uncertainties. We therefore present the most stable subset of local kinematic structures, which are consistently identified by the clustering algorithm, and emphasize the need to take into account error propagation during both the manual and automated identification of stellar structures, both for existing ones as well as future discoveries. (abridged)

Characterizing the Expected Behavior of Non-Poissonian Template Fitting
Luis Gabriel C. Bariuan, Tracy R. Slatyer
Physical Review D, 2023, Volume 107, Issue 10–15 [ arXiv:2207.13097 ]

Abstract We have performed a systematic study of the statistical behavior of non-Poissonian template fitting (NPTF), a method designed to analyze and characterize unresolved point sources in general counts datasets. In this paper, we focus on the properties and characteristics of the Fermi-LAT gamma-ray data set. In particular, we have simulated and analyzed gamma-ray sky maps under varying conditions of exposure, angular resolution, pixel size, energy window, event selection, and source brightness. We describe how these conditions affect the sensitivity of NPTF to the presence of point sources, for inner-galaxy studies of point sources within the Galactic Center excess, and for the simplified case of isotropic emission. We do not find opportunities for major gains in sensitivity from varying these choices, within the range available with current Fermi-LAT data. We provide an analytic estimate of the NPTF sensitivity to point sources for the case of isotropic emission and perfect angular resolution, and find good agreement with our numerical results for that case.

Reconstructing Cosmological Initial Conditions from Late-Time Structure with Convolutional Neural Networks
Christopher J. Shallue, Daniel J. Eisenstein
Monthly Notices of the Royal Astronomical Society, 2023, Volume 520, Issue 4 [ arXiv:2207.12511 ]

Abstract We present a method to reconstruct the initial linear-regime matter density field from the late-time non-linearly evolved density field in which we channel the output of standard first-order reconstruction to a convolutional neural network (CNN). Our method shows dramatic improvement over the reconstruction of either component alone. We show why CNNs are not well-suited for reconstructing the initial density directly from the late-time density: CNNs are local models, but the relationship between initial and late-time density is not local. Our method leverages standard reconstruction as a preprocessing step, which inverts bulk gravitational flows sourced over very large scales, transforming the residual reconstruction problem from long-range to local and making it ideally suited for a CNN. We develop additional techniques to account for redshift distortions, which warp the density fields measured by galaxy surveys. Our method improves the range of scales of high-fidelity reconstruction by a factor of 2 in wavenumber above standard reconstruction, corresponding to a factor of 8 increase in the number of well-reconstructed modes. In addition, our method almost completely eliminates the anisotropy caused by redshift distortions. As galaxy surveys continue to map the Universe in increasingly greater detail, our results demonstrate the opportunity offered by CNNs to untangle the non-linear clustering at intermediate scales more accurately than ever before.
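The step from a factor of 2 in wavenumber to a factor of 8 in the number of modes is the standard mode-counting argument in three dimensions, written out here for reference. In a periodic volume V the Fourier modes sit on a grid with density V/(2π)^3, so the number inside a sphere of radius k_max scales as the cube of k_max:

```latex
N(k_{\max}) \simeq \frac{V}{(2\pi)^3}\,\frac{4\pi}{3}\,k_{\max}^{3}
\quad\Longrightarrow\quad
\frac{N(2k_{\max})}{N(k_{\max})} = 2^{3} = 8 .
```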

Modeling early-universe energy injection with Dense Neural Networks
Yitian Sun, Tracy R. Slatyer
Physical Review D, Volume 107, Article 063541 [ arXiv:2207.06425 ]

Abstract We show that Dense Neural Networks can be used to accurately model the cooling of high-energy particles in the early universe, in the context of the public code package DarkHistory. DarkHistory self-consistently computes the temperature and ionization history of the early universe in the presence of exotic energy injections, such as might arise from the annihilation or decay of dark matter. The original version of DarkHistory uses large pre-computed transfer function tables to evolve photon and electron spectra in redshift steps, which require a significant amount of memory and storage space. We present a light version of DarkHistory that makes use of simple Dense Neural Networks to store and interpolate the transfer functions, which performs well on small computers without heavy memory or storage usage. This method anticipates future expansion with additional parametric dependence in the transfer functions without requiring exponentially larger data tables.

The Dark Energy Camera Plane Survey 2 (DECaPS2): More Sky, Less Bias, and Better Uncertainties
A. K. Saydjari, E. F. Schlafly, D. Lang, A. M. Meisner, G. M. Green, C. Zucker, I. Zelko, J. S. Speagle, T. Daylan, A. Lee, F. Valdes, D. Schlegel, D. P. Finkbeiner
The Astrophysical Journal Supplement Series, 2023, Volume 264, Number 2 [ arXiv:2206.11909 ]

Abstract Deep optical and near-infrared imaging of the entire Galactic plane is essential for understanding our Galaxy's stars, gas, and dust. The second data release of the DECam Plane Survey (DECaPS2) extends the five-band optical and near-infrared survey of the southern Galactic plane to cover 6.5% of the sky, |b| < 10° and 6° > l > -124°, complementary to coverage by Pan-STARRS1. Typical single-exposure effective depths, including crowding effects and other complications, are 23.5, 22.6, 22.1, 21.6, and 20.8 mag in g, r, i, z, and Y bands, respectively, with around 1 arcsecond seeing. The survey comprises 3.32 billion objects built from 34 billion detections in 21.4 thousand exposures, totaling 260 hours open shutter time on the Dark Energy Camera (DECam) at Cerro Tololo. The data reduction pipeline features several improvements, including the addition of synthetic source injection tests to validate photometric solutions across the entire survey footprint. A convenient functional form for the detection bias in the faint limit was derived and leveraged to characterize the photometric pipeline performance. A new post-processing technique was applied to every detection to de-bias and improve uncertainty estimates of the flux in the presence of structured backgrounds, specifically targeting nebulosity. The images and source catalogs are publicly available at http://decaps.skymaps.info/

Revealing the Milky Way’s Most Recent Major Merger with a Gaia EDR3 Catalog of Machine-Learned Line-of-Sight Velocities
Adriana Dropulic, Hongwan Liu, Bryan Ostdiek, Mariangela Lisanti
Monthly Notices of the Royal Astronomical Society, May 2023, Volume 521, Issue 2 [ arXiv:2205.12278 ]

Abstract Machine learning can play a powerful role in inferring missing line-of-sight velocities from astrometry in surveys such as Gaia. In this paper, we apply a neural network to Gaia Early Data Release 3 (EDR3) and obtain line-of-sight velocities and associated uncertainties for ~92 million stars. The network, which takes as input a star's parallax, angular coordinates, and proper motions, is trained and validated on ~6.4 million stars in Gaia with complete phase-space information. The network's uncertainty on its velocity prediction is a key aspect of its design; by properly convolving these uncertainties with the inferred velocities, we obtain accurate stellar kinematic distributions. As a first science application, we use the new network-completed catalog to identify candidate stars that belong to the Milky Way's most recent major merger, Gaia-Sausage-Enceladus (GSE). We present the kinematic, energy, angular momentum, and spatial distributions of the ~450,000 GSE candidates in this sample, and also study the chemical abundances of those with cross matches to GALAH and APOGEE. The network's predictive power will only continue to improve with future Gaia data releases as the training set of stars with complete phase-space information grows. This work provides a first demonstration of how to use machine learning to exploit high-dimensional correlations on data to infer line-of-sight velocities, and offers a template for how to train, validate and apply such a neural network when complete observational data is not available.

Bias and Priors in Machine Learning Calibrations for High Energy Physics
Rikab Gambhir, Benjamin Nachman, Jesse Thaler
Physical Review D, Volume 106, Article 036011 [ arXiv:2205.05084 ]

Abstract Machine learning offers an exciting opportunity to improve the calibration of nearly all reconstructed objects in high-energy physics detectors. However, machine learning approaches often depend on the spectra of examples used during training, an issue known as prior dependence. This is an undesirable property of a calibration, which needs to be applicable in a variety of environments. The purpose of this paper is to explicitly highlight the prior dependence of some machine learning-based calibration strategies. We demonstrate how some recent proposals for both simulation-based and data-based calibrations inherit properties of the sample used for training, which can result in biases for downstream analyses. In the case of simulation-based calibration, we argue that our recently proposed Gaussian Ansatz approach can avoid some of the pitfalls of prior dependence, whereas prior-independent data-based calibration remains an open problem.

Learning Uncertainties the Frequentist Way: Calibration and Correlation in High Energy Physics
Rikab Gambhir, Benjamin Nachman, Jesse Thaler
Physical Review Letters, 2022, Volume 129, Article 082001 [ arXiv:2205.03413 ]

Abstract Calibration is a common experimental physics problem, whose goal is to infer the value and uncertainty of an unobservable quantity Z given a measured quantity X. Additionally, one would like to quantify the extent to which X and Z are correlated. In this paper, we present a machine learning framework for performing frequentist maximum likelihood inference with Gaussian uncertainty estimation, which also quantifies the mutual information between the unobservable and measured quantities. This framework uses the Donsker-Varadhan representation of the Kullback-Leibler divergence -- parametrized with a novel Gaussian Ansatz -- to enable a simultaneous extraction of the maximum likelihood values, uncertainties, and mutual information in a single training. We demonstrate our framework by extracting jet energy corrections and resolution factors from a simulation of the CMS detector at the Large Hadron Collider. By leveraging the high-dimensional feature space inside jets, we improve upon the nominal CMS jet resolution by upwards of 15%.
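The Donsker-Varadhan representation mentioned above states that the Kullback-Leibler divergence between P and Q is the supremum over functions T of E_P[T] − log E_Q[exp(T)], and mutual information is the special case where P is the joint distribution and Q the product of marginals. The following numerical check illustrates the bound on a small discrete example. The distributions `p` and `q` and the trial function `T_other` are arbitrary values chosen for the example, and the paper's Gaussian Ansatz parametrization of T is not reproduced here.

```python
import math


def kl_divergence(p, q):
    """Exact KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dv_objective(T, p, q):
    """Donsker-Varadhan objective E_p[T] - log E_q[exp(T)]."""
    expect_p = sum(pi * ti for pi, ti in zip(p, T))
    log_expect_q = math.log(sum(qi * math.exp(ti) for qi, ti in zip(q, T)))
    return expect_p - log_expect_q


p = [0.5, 0.3, 0.2]   # illustrative distribution P
q = [0.2, 0.3, 0.5]   # illustrative distribution Q

T_opt = [math.log(pi / qi) for pi, qi in zip(p, q)]  # maximizer: log density ratio
T_other = [0.3, -0.1, 0.0]                           # arbitrary suboptimal choice
```

Evaluating `dv_objective(T_opt, p, q)` reproduces `kl_divergence(p, q)`, while any other choice such as `T_other` gives a smaller value, which is what makes the objective usable as a training loss for a learned T.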

Photometrically-Classified Superluminous Supernovae from the Pan-STARRS1 Medium Deep Survey: A Case Study for Science with Machine Learning-Based Classification
Brian Hsu, Griffin Hosseinzadeh, V. Ashley Villar, Edo Berger
The Astrophysical Journal, 2022, Volume 937, Number 1 [ arXiv:2204.09809 ]

Abstract With the upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), it is expected that only ∼0.1% of all transients will be classified spectroscopically. To conduct studies of rare transients, such as Type I superluminous supernovae (SLSNe), we must instead rely on photometric classification. In this vein, here we carry out a pilot study of SLSNe from the Pan-STARRS1 Medium-Deep Survey (PS1-MDS) classified photometrically with our SuperRAENN and Superphot algorithms. We first construct a sub-sample of the photometric sample using a list of simple selection metrics designed to minimize contamination and ensure sufficient data quality for modeling. We then fit the multi-band light curves with a magnetar spin-down model using the Modular Open-Source Fitter for Transients (MOSFiT). Comparing the magnetar engine and ejecta parameter distributions of the photometric sample to those of the PS1-MDS spectroscopic sample and a larger literature spectroscopic sample, we find that these samples are overall consistent, but that the photometric sample extends to slower spins and lower ejecta masses, which correspond to lower luminosity events, as expected for photometric selection. While our PS1-MDS photometric sample is still smaller than the overall SLSN spectroscopic sample, our methodology paves the way to an orders-of-magnitude increase in the SLSN sample in the LSST era through photometric selection and study.

Luminous Supernovae: Unveiling a Population Between Superluminous and Normal Core-collapse Supernovae
Sebastian Gomez, Edo Berger, Matt Nicholl, Peter K. Blanchard, Griffin Hosseinzadeh
The Astrophysical Journal, 2022, Volume 941, Number 2 [ arXiv:2204.08486 ]

Abstract Stripped-envelope core-collapse supernovae can be divided into two broad classes: the common Type Ib/c supernovae (SNe Ib/c), powered by the radioactive decay of ⁵⁶Ni, and the rare superluminous supernovae (SLSNe), most likely powered by the spin-down of a magnetar central engine. Up to now, the intermediate regime between these two populations has remained mostly unexplored. Here, we present a comprehensive study of 40 luminous supernovae (LSNe), SNe with peak magnitudes of M_r = −19 to −20 mag, bound by SLSNe on the bright end and by SNe Ib/c on the dim end. Spectroscopically, LSNe appear to form a continuum between Type Ic SNe and SLSNe. Given their intermediate nature, we model the light curves of all LSNe using a combined magnetar plus radioactive decay model and find that they are indeed intermediate, not only in terms of their peak luminosity and spectra, but also in their rise times, power sources, and physical parameters. We sub-classify LSNe into distinct groups that are either as fast-evolving as SNe Ib/c or as slow-evolving as SLSNe, and appear to be either radioactively or magnetar powered, respectively. Our findings indicate that LSNe are powered by either an over-abundant production of ⁵⁶Ni or by weak magnetar engines, and may serve as the missing link between the two populations.

Quantification of high dimensional non-Gaussianities and its implication to Fisher analysis in cosmology
Core Francisco Park, Erwan Allys, Francisco Villaescusa-Navarro, Douglas P. Finkbeiner
The Astrophysical Journal, Volume 946, Number 2 [ arXiv:2204.05435 ]

Abstract It is well known that the power spectrum is not able to fully characterize the statistical properties of non-Gaussian density fields. Recently, many different statistics have been proposed to extract information from non-Gaussian cosmological fields that perform better than the power spectrum. The Fisher matrix formalism is commonly used to quantify the accuracy with which a given statistic can constrain the value of the cosmological parameters. However, these calculations typically rely on the assumption that the likelihood of the considered statistic follows a multivariate Gaussian distribution. In this work we follow Sellentin & Heavens (2017) and use two different statistical tests to identify non-Gaussianities in different statistics such as the power spectrum, bispectrum, marked power spectrum, and wavelet scattering transform (WST). We remove the non-Gaussian components of the different statistics and perform Fisher matrix calculations with the Gaussianized statistics using Quijote simulations. We show that constraints on the parameters can change by a factor of ∼2 in some cases. We show with simple examples how statistics that do not follow a multivariate Gaussian distribution can achieve artificially tight bounds on the cosmological parameters when using the Fisher matrix formalism. We think that the non-Gaussian tests used in this work represent a powerful tool to quantify the robustness of Fisher matrix calculations and their underlying assumptions. We release the code used to compute the power spectra, bispectra, and WST that can be run on both CPUs and GPUs.

Constraining the Time of Gravitational Wave Emission from Core-Collapse Supernovae
Kiranjyot Gill, Griffin Hosseinzadeh, Edo Berger, Michele Zanolin, Marek Szczepanczyk
The Astrophysical Journal, 2022, Volume 931, Number 2 [ arXiv:2201.03609 ]

Abstract The advent of sensitive gravitational wave (GW) detectors, coupled with wide-field, high cadence optical time-domain surveys, raises the possibility of the first joint GW-electromagnetic (EM) detections of core-collapse supernovae (CCSNe). For targeted searches of GWs from CCSNe, optical observations can be used to increase the sensitivity of the search by restricting the relevant time interval, defined here as the GW search window (GSW). The extent of the GSW is a critical factor in determining the achievable false alarm probability (FAP) for a triggered CCSN search. The ability to constrain the GSW from optical observations depends on how early a CCSN is detected, as well as the ability to model the early optical emission. Here we present several approaches to constrain the GSW, ranging in complexity from model-independent analytical fits of the early light curve, model-dependent fits of the rising or entire light curve, and a new data-driven approach using existing well-sampled CCSN light curves from Kepler and the Transiting Exoplanet Survey Satellite (TESS). We use these approaches to determine the time of core-collapse and its associated uncertainty (i.e., the GSW). We apply our methods to two Type II SNe that occurred during LIGO/Virgo Observing Run 3: SN 2019fcn and SN 2019ejj (both in the same galaxy at d = 15.7 Mpc). Our approach shortens the duration of the GSW and improves the robustness of the GSW compared to techniques used in past GW CCSN searches.

Photometry on Structured Backgrounds: Local Pixelwise Infilling by Regression
Andrew K. Saydjari, Douglas P. Finkbeiner
The Astrophysical Journal, 2022, Volume 933, Number 2 [ arXiv:2201.07246 ]

Abstract Photometric pipelines struggle to estimate both the flux and flux uncertainty for stars in the presence of structured backgrounds such as filaments or clouds. However, it is exactly stars in these complex regions that are critical to understanding star formation and the structure of the interstellar medium. We develop a method, similar to Gaussian process regression, which we term local pixelwise infilling (LPI). Using a local covariance estimate, we predict the background behind each star and the uncertainty on that prediction in order to improve estimates of flux and flux uncertainty. We show the validity of our model on synthetic data and real dust fields. We further demonstrate that the method is stable even in the crowded field limit. While we focus on optical-IR photometry, this method is not restricted to those wavelengths. We apply this technique to the 34 billion detections in the second data release of the Dark Energy Camera Plane Survey (DECaPS2). In addition to removing many >3σ outliers and improving uncertainty estimates by a factor of ∼2−3 on nebulous fields, we also show that our method is well-behaved on uncrowded fields. The entirely post-processing nature of our implementation of LPI photometry allows it to easily improve the flux and flux uncertainty estimates of past as well as future surveys.

Impact of Massive Binary Star and Cosmic Evolution on Gravitational Wave Observations II: Double Compact Object Rates and Properties
Floor S. Broekgaarden, Edo Berger, Simon Stevenson, Stephen Justham, Ilya Mandel, Martyna Chruślińska, Lieke A. C. van Son, Tom Wagg, Alejandro Vigna-Gómez, Selma E. De Mink, Debatri Chattopadhyay, Coenraad J. Neijssel
Monthly Notices of the Royal Astronomical Society, 2022, Volume 516, Issue 4, Pages 5737–5761 [ arXiv:2112.05763 ]

Abstract Making the most of the rapidly increasing population of gravitational-wave detections of black hole (BH) and neutron star (NS) mergers requires comparing observations with population synthesis predictions. In this work we investigate the combined impact from the key uncertainties in population synthesis modelling of the isolated binary evolution channel: the physical processes in massive binary-star evolution and the star formation history as a function of metallicity, Z, and redshift z, S(Z,z). Considering these uncertainties we create 560 different publicly available model realizations and calculate the rate and distribution characteristics of detectable BHBH, BHNS, and NSNS mergers. We find that our stellar evolution and S(Z,z) variations can impact the predicted intrinsic and detectable merger rates by factors of 10^2-10^4. We find that BHBH rates are dominantly impacted by S(Z,z) variations, NSNS rates by stellar evolution variations and BHNS rates by both. We then consider the combined impact from all uncertainties considered in this work on the detectable mass distribution shapes (chirp mass, individual masses and mass ratio). We find that the BHNS mass distributions are predominantly impacted by massive binary-star evolution changes. For BHBH and NSNS we find that both uncertainties are important. We also find that the shape of the delay time and birth metallicity distributions are typically dominated by the choice of S(Z,z) for BHBH, BHNS and NSNS. We identify several examples of robust features in the mass distributions predicted by all 560 models, such that we expect more than 95% of BHBH detections to contain a BH ≳8 M⊙ and have mass ratios ≲4. Our work demonstrates that it is essential to consider a wide range of allowed models to study double compact object merger rates and properties.

Substructure Detection Reanalyzed: Dark Perturber shown to be a Line-of-Sight Halo
Atınç Çağan Şengül, Cora Dvorkin, Bryan Ostdiek, Arthur Tsang
Monthly Notices of the Royal Astronomical Society, 2022, Volume 515, Issue 3, Pages 4391–4401 [ arXiv:2112.00749 ]

Abstract Observations of structure at sub-galactic scales are crucial for probing the properties of dark matter, which is the dominant source of gravity in the universe. It will become increasingly important for future surveys to distinguish between line-of-sight halos and subhalos to avoid wrong inferences on the nature of dark matter. We reanalyze a sub-galactic structure (in lens JVAS B1938+666) that has been previously found using the gravitational imaging technique in galaxy-galaxy lensing systems. This structure has been assumed to be a satellite in the halo of the main lens galaxy. We fit the redshift of the perturber of the system as a free parameter, using the multi-plane thin-lens approximation, and find that the redshift of the perturber is z_int = 1.22^{+0.11}_{−0.11} (with a main lens redshift of z = 0.881). Our analysis indicates that this structure is more massive than the previous result by more than an order of magnitude. This constitutes the first dark perturber shown to be a line-of-sight halo with a gravitational lensing method.

Robust and Provably Monotonic Networks
Ouail Kitouni, Niklas Nolte, Mike Williams
Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021) Proceedings, [ arXiv:2112.00038 ]

Abstract The Lipschitz constant of the map between the input and output space represented by a neural network is a natural metric for assessing the robustness of the model. We present a new method to constrain the Lipschitz constant of dense deep learning models that can also be generalized to other architectures. The method relies on a simple weight normalization scheme during training that ensures the Lipschitz constant of every layer is below an upper limit specified by the analyst. A simple residual connection can then be used to make the model monotonic in any subset of its inputs, which is useful in scenarios where domain knowledge dictates such dependence. Examples can be found in algorithmic fairness requirements or, as presented here, in the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider. Our normalization is minimally constraining and allows the underlying architecture to maintain higher expressiveness compared to other techniques which aim to either control the Lipschitz constant of the model or ensure its monotonicity. We show how the algorithm was used to train a powerful, robust, and interpretable discriminator for heavy-flavor decays in the LHCb realtime data-processing system.
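The two ingredients named in the abstract above (per-layer weight normalization and a residual connection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the choice of the induced 1-norm, the ReLU activation, the forward-pass-only rescaling, and all names (`normalize_weights`, `monotonic_net`, `lip`) are assumptions made for illustration. The idea is that if g is Lipschitz with constant `lip` with respect to the L1 norm of its input, then g(x) + lip * (sum of the selected inputs) cannot decrease when a selected input increases.

```python
import numpy as np

def normalize_weights(weights, lip=1.0):
    """Rescale each matrix so the product of induced 1-norms is at most `lip`.

    Assumption: every layer gets the same budget lip**(1/depth).
    """
    per_layer = lip ** (1.0 / len(weights))
    normalized = []
    for W in weights:
        norm = np.linalg.norm(W, ord=1)  # induced 1-norm: max absolute column sum
        normalized.append(W / max(1.0, norm / per_layer))
    return normalized

def monotonic_net(x, weights, biases, monotone_idx, lip=1.0):
    """Scalar network that is non-decreasing in the inputs listed in monotone_idx."""
    x = np.asarray(x, dtype=float)
    Ws = normalize_weights(weights, lip)
    h = x
    for W, b in zip(Ws[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)  # ReLU is 1-Lipschitz
    g = (Ws[-1] @ h + biases[-1])[0]    # Lipschitz-bounded part
    # Residual term: its slope `lip` dominates any negative slope of g.
    return g + lip * x[monotone_idx].sum()

# Hypothetical toy network: 3 inputs, two hidden layers of width 8, scalar output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 3)), rng.normal(size=(8, 8)), rng.normal(size=(1, 8))]
biases = [rng.normal(size=8), rng.normal(size=8), rng.normal(size=1)]
```

In a training loop the rescaling would be applied at every step rather than once at evaluation time; the sketch only shows why the constraint yields monotonicity.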

New limits on the light dark matter: proton cross section from the cosmic large-scale structure
Keir K. Rogers, Cora Dvorkin, Hiranya V. Peiris
Physical Review Letters, 2022, Volume 128, Article 171301 [ arXiv:2111.10386 ]

Abstract We set the strongest limits to-date on the velocity-independent dark matter (DM) - proton cross section σ for DM masses m = 10 keV to 100 GeV, using large-scale structure traced by the Lyman-alpha forest: e.g., a 95% lower limit σ < 6×10^−30 cm^2, for m = 100 keV. Our results complement direct detection, which has limited sensitivity to sub-GeV DM. We use an emulator of cosmological simulations, combined with data from the smallest cosmological scales used to-date, to model and search for the imprint of primordial DM-proton collisions. Cosmological bounds are improved by up to a factor of 25.

A neural simulation-based inference approach for characterizing the Galactic Center γ-ray excess
Siddharth Mishra-Sharma, Kyle Cranmer
Physical Review D, 2022, Volume 105, Article 063017 [ arXiv:2110.06931 ]

Abstract The nature of the Fermi gamma-ray Galactic Center Excess (GCE) has remained a persistent mystery for over a decade. Although the excess is broadly compatible with emission expected due to dark matter annihilation, an explanation in terms of a population of unresolved astrophysical point sources, e.g., millisecond pulsars, remains viable. The effort to uncover the origin of the GCE is hampered in particular by an incomplete understanding of diffuse emission of Galactic origin. This can lead to spurious features that make it difficult to robustly differentiate smooth emission, as expected for a dark matter origin, from more "clumpy" emission expected for a population of relatively bright, unresolved point sources. We use recent advancements in the field of simulation-based inference, in particular density estimation techniques using normalizing flows, in order to characterize the contribution of modeled components, including unresolved point source populations, to the GCE. Compared to traditional techniques based on the statistical distribution of photon counts, our machine learning-based method is able to utilize more of the information contained in a given model of the Galactic Center emission, and in particular can perform posterior parameter estimation while accounting for pixel-to-pixel spatial correlations in the gamma-ray map. This makes the method demonstrably more resilient to certain forms of model misspecification. On application to Fermi data, the method generically attributes a smaller fraction of the GCE flux to unresolved point sources when compared to traditional approaches. We nevertheless infer such a contribution to make up a non-negligible fraction of the GCE across all analysis variations considered, with at least 38^{+9}_{−19}% of the excess attributed to unresolved point sources in our baseline analysis.

A Deep-learning Approach for Live Anomaly Detection of Extragalactic Transients
Ashley Villar, Miles Cranmer, Edo Berger, Gabriella Contardo, Shirley Ho, Griffin Hosseinzadeh, Joshua Yao-Yu Lin
The Astrophysical Journal Supplement Series, Volume 255 [ ]

Abstract The Laser Interferometer Gravitational-Wave Observatory (LIGO) and Virgo Interferometer Collaborations have now detected all three classes of compact binary mergers: binary black hole (BBH), binary neutron star (BNS), and neutron star-black hole (NSBH). For coalescences involving neutron stars, the simultaneous observation of gravitational and electromagnetic radiation produced by an event has broader potential to enhance our understanding of these events, and also to probe the equation of state (EOS) of dense matter. However, electromagnetic follow-up to gravitational wave (GW) events requires rapid real-time detection and classification of GW signals, and conventional detection approaches are computationally prohibitive for the anticipated rate of detection of next-generation GW detectors. In this work, we present the first deep learning based results of classification of GW signals from NSBH mergers in real LIGO data. We show for the first time that a deep neural network can successfully distinguish all three classes of compact binary mergers and separate them from detector noise. Specifically, we train a convolutional neural network (CNN) on ∼500,000 data samples of real LIGO noise with injected BBH, BNS, and NSBH GW signals, and we show that our network has high sensitivity and accuracy. Most importantly, we successfully recover the two confirmed NSBH events to-date (GW200105 and GW200115) and the two confirmed BNS mergers to-date (GW170817 and GW190425), together with ≈90% of all BBH candidate events from the third Gravitational Wave Transient Catalog, GWTC-3. These results are an important step towards low-latency real-time GW detection, enabling multi-messenger astronomy.

The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider
T. Aarrestad, M. Van Beekveld, M. Bona, A. Boveia, S. Caron, J. Davies, A. De Simone, C. Doglioni, J.M. Duarte, A. Farbin, H. Gupta, L. Hendriks, L. Heinrich, J. Howarth, P. Jawahar, A. Jueid, J. Lastow, A. Leinweber, J. Mamuzic, E. Merényi, A. Morandini, P. Moskvitina, C. Nellist, J. Ngadiuba, B. Ostdiek, M. Pierini, B. Ravina, R. Ruiz de Austri, S. Sekmen, M. Touranakou, M. Vaškevičiūte, R. Vilalta, J.-R. Vlimant, R. Verheyen, M. White, E. Wulff, E. Wallin, K.A. Wozniak, Z. Zhang
SciPost Physics, 2022, Volume 12, Issue 1, Page 43 [ arXiv:2105.14027 | code ]

Abstract We describe the outcome of a data challenge conducted as part of the Dark Machines initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims at detecting signals of new physics at the LHC using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of > 1 billion simulated LHC events corresponding to 10 fb^−1 of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.

A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC
Giuseppe Di Guglielmo, Farah Fahim, Christian Herwig, Manuel Blanco Valentin, Javier Duarte, Cristian Gingu, Philip Harris, James Hirschauer, Martin Kwok, Vladimir Loncar, Yingyi Luo, Llovizna Miranda, Jennifer Ngadiuba, Daniel Noonan, Seda Ogrenci-Memik, Maurizio Pierini, Sioni Summers, Nhan Tran
IEEE Transactions on Nuclear Science, 2021, Vol. 68, Issue 8 [ arXiv:2105.01683 ]

Abstract Despite advances in the programmable logic capabilities of modern trigger systems, a significant bottleneck remains in the amount of data to be transported from the detector to off-detector logic where trigger decisions are made. We demonstrate that a neural network autoencoder model can be implemented in a radiation tolerant ASIC to perform lossy data compression alleviating the data transmission problem while preserving critical information of the detector energy profile. For our application, we consider the high-granularity calorimeter from the CMS experiment at the CERN Large Hadron Collider. The advantage of the machine learning approach is in the flexibility and configurability of the algorithm. By changing the neural network weights, a unique data compression algorithm can be deployed for each sensor in different detector regions, and changing detector or collider conditions. To meet area, performance, and power constraints, we perform a quantization-aware training to create an optimized neural network hardware implementation. The design is achieved through the use of high-level synthesis tools and the hls4ml framework, and was processed through synthesis and physical layout flows based on a LP CMOS 65 nm technology node. The flow anticipates 200 Mrad of ionizing radiation to select gates, and reports a total area of 3.6 mm^2 and consumes 95 mW of power. The simulated energy consumption per inference is 2.4 nJ. This is the first radiation tolerant on-detector ASIC implementation of a neural network that has been designed for particle physics applications.

Towards Designing and Exploiting Generative Networks for Neutrino Physics Experiments using Liquid Argon Time Projection Chambers
Paul Lutkus, Taritree Wongjirad, Shuchin Aeron
Conference paper at ICLR 2021 [ | code ]

Abstract In this paper, we show that a hybrid approach to generative modeling via combining the decoder from an autoencoder together with an explicit generative model for the latent space is a promising method for producing images of particle trajectories in a liquid argon time projection chamber (LArTPC). LArTPCs are a type of particle physics detector used by several current and future experiments focused on studies of the neutrino. We implement a Vector-Quantized Variational Autoencoder (VQ-VAE) and PixelCNN which produces images with LArTPC-like features and introduce a method to evaluate the quality of the images using a semantic segmentation that identifies important physics-based features.

Machine Learning the 6th Dimension: Stellar Radial Velocities from 5D Phase-Space Correlations
Adriana Dropulic, Bryan Ostdiek, Laura J. Chang, Hongwan Liu, Timothy Cohen, and Mariangela Lisanti
The Astrophysical Journal Letters, 2021, 915, L14 [ arXiv:2103.14039 ]

Abstract The Gaia satellite will observe the positions and velocities of over a billion Milky Way stars. In the early data releases, the majority of observed stars do not have complete 6D phase-space information. In this Letter, we demonstrate the ability to infer the missing line-of-sight velocities until more spectroscopic observations become available. We utilize a novel neural network architecture that, after being trained on a subset of data with complete phase-space information, takes in a star's 5D astrometry (angular coordinates, proper motions, and parallax) and outputs a predicted line-of-sight velocity with an associated uncertainty. Working with a mock Gaia catalog, we show that the network can successfully recover the distributions and correlations of each velocity component for stars that fall within ∼5 kpc of the Sun. We also demonstrate that the network can accurately reconstruct the velocity distribution of a kinematic substructure in the stellar halo that is spatially uniform, even when it comprises a small fraction of the total star count.

The Luminous and Double-Peaked Type Ic Supernova 2019stc: Evidence for Multiple Energy Sources
Sebastian Gomez, Edo Berger, Griffin Hosseinzadeh, Peter K. Blanchard, Matt Nicholl, V. Ashley Villar
The Astrophysical Journal, 2021, Vol. 913, Article 143 [ arXiv:2103.02611 ]

Abstract

The LHC Olympics 2020: A Community Challenge for Anomaly Detection in High Energy Physics
Gregor Kasieczka (ed), Benjamin Nachman (ed), David Shih (ed), Oz Amram, Anders Andreassen, Kees Benkendorfer, Blaz Bortolato, Gustaaf Brooijmans, Florencia Canelli, Jack H. Collins, Biwei Dai, Felipe F. De Freitas, Barry M. Dillon, Ioan-Mihail Dinu, Zhongtian Dong, Julien Donini, Javier Duarte, D. A. Faroughy, Julia Gonski, Philip Harris, Alan Kahn, Jernej F. Kamenik, Charanjit K. Khosa, Patrick Komiske, Luc Le Pottier, Pablo Martín-Ramiro, Andrej Matevc, Eric Metodiev, Vinicius Mikuni, Inês Ochoa, Sang Eon Park, Maurizio Pierini, Dylan Rankin, Veronica Sanz, Nilai Sarda, Uros Seljak, Aleks Smolkovic, George Stein, Cristina Mantilla Suarez, Manuel Szewc, Jesse Thaler, Steven Tsan, Silviu-Marian Udrescu, Louis Vaslin, Jean-Roch Vlimant, Daniel Williams, Mikaeel Yunus
Reports on Progress in Physics, 2021, Volume 84, Number 12 [ arXiv:2101.08320 ]

Abstract A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging, and aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a set of simulated collider events. Participants in these Olympics have developed their methods using an R&D dataset and then tested them on black boxes: datasets with an unknown anomaly (or not). This paper will review the LHC Olympics 2020 challenge, including an overview of the competition, a description of methods deployed in the competition, lessons learned from the experience, and implications for data analyses with future datasets as well as future colliders.

E Pluribus Unum Ex Machina: Learning from Many Collider Events at Once
Benjamin Nachman and Jesse Thaler
Physical Review D, 2021, Vol. 103, Issue 11, Article 116013 [ arXiv:2101.07263 | code ]

Abstract There have been a number of recent proposals to enhance the performance of machine learning strategies for collider physics by combining many distinct events into a single ensemble feature. To evaluate the efficacy of these proposals, we study the connection between single-event classifiers and multi-event classifiers under the assumption that collider events are independent and identically distributed (IID). We show how one can build optimal multi-event classifiers from single-event classifiers, and we also show how to construct multi-event classifiers such that they produce optimal single-event classifiers. This is illustrated for a Gaussian example as well as for classification tasks relevant for searches and measurements at the Large Hadron Collider. We extend our discussion to regression tasks by showing how they can be phrased in terms of parametrized classifiers. Empirically, we find that training a single-event (per-instance) classifier is more effective than training a multi-event (per-ensemble) classifier, at least for the cases we studied, and we relate this fact to properties of the loss function gradient in the two cases. While we did not identify a clear benefit from using multi-event classifiers in the collider context, we speculate on the potential value of these methods in cases involving only approximate independence, as relevant for jet substructure studies.
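The construction of a multi-event classifier from a single-event classifier mentioned above can be sketched for a toy Gaussian case. For IID events the likelihood ratio of an ensemble is the product of the per-event likelihood ratios, and an optimal classifier output h maps to a likelihood ratio h / (1 − h). The specific setup below (unit-width Gaussians centred at ±mu, equal class priors) and the function names are assumptions chosen for illustration, not taken from the paper's code.

```python
import numpy as np

def single_event_classifier(x, mu=0.5):
    """Optimal classifier for N(+mu, 1) versus N(-mu, 1) with equal priors.

    The per-event likelihood ratio is exp(2 * mu * x), so h = sigmoid(2 * mu * x).
    """
    return 1.0 / (1.0 + np.exp(-2.0 * mu * np.asarray(x, dtype=float)))

def multi_event_classifier(events, single=single_event_classifier):
    """Combine per-event outputs into one ensemble score, assuming IID events."""
    h = single(events)
    # Sum of per-event log likelihood ratios, log(h / (1 - h)).
    log_lr = np.sum(np.log(h) - np.log1p(-h))
    return 1.0 / (1.0 + np.exp(-log_lr))
```

For example, `multi_event_classifier([0.3, -1.2, 2.0, 0.7])` equals the sigmoid of 2 * mu times the sum of the events, which is the direct ensemble likelihood-ratio result for this toy model.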

Fast convolutional neural networks on FPGAs with hls4ml
Thea Aarrestad, Vladimir Loncar, Nicolò Ghielmetti, Maurizio Pierini, Sioni Summers, Jennifer Ngadiuba, Christoffer Petersson, Hampus Linander, Yutaro Iiyama, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Dylan Rankin, Sergo Jindariani, Kevin Pedro, Nhan Tran, Mia Liu, Edward Kreinar, Zhenbin Wu, Duc Hoang
Machine Learning Science and Technology, 2021, Volume 2, Issue 4, Article 045015 [ arXiv:2101.05108 ]

Abstract We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of 5μs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.

Detection and Parameter Estimation of Gravitational Waves from Binary Neutron-Star Mergers in Real LIGO Data using Deep Learning
Plamen G. Krastev, Kiranjyot Gill, V. Ashley Villar, Edo Berger
Physics Letters B, 2021, Vol. 815, Article 136161 [ arXiv:2012.13101 ]

Abstract One of the key challenges of real-time detection and parameter estimation of gravitational waves from compact binary mergers is the computational cost of conventional matched-filtering and Bayesian inference approaches. In particular, the application of these methods to the full signal parameter space available to the gravitational-wave detectors, and/or real-time parameter estimation is computationally prohibitive. On the other hand, rapid detection and inference are critical for prompt follow-up of the electromagnetic and astro-particle counterparts accompanying important transients, such as binary neutron-star and black-hole neutron-star mergers. Training deep neural networks to identify specific signals and learn a computationally efficient representation of the mapping between gravitational-wave signals and their parameters allows both detection and inference to be done quickly and reliably, with high sensitivity and accuracy. In this work we apply a deep-learning approach to rapidly identify and characterize transient gravitational-wave signals from binary neutron-star mergers in real LIGO data. We show for the first time that artificial neural networks can promptly detect and characterize binary neutron star gravitational-wave signals in real LIGO data, and distinguish them from noise and signals from coalescing black-hole binaries. We illustrate this key result by demonstrating that our deep-learning framework classifies correctly all gravitational-wave events from the Gravitational-Wave Transient Catalog, GWTC-1 [Phys. Rev. X 9 (2019), 031040]. These results emphasize the importance of using realistic gravitational-wave detector data in machine learning approaches, and represent a step towards achieving real-time detection and inference of gravitational waves.

Quasi Anomalous Knowledge: Searching for new physics with embedded knowledge
Sang Eon Park, Dylan Rankin, Silviu-Marian Udrescu, Mikaeel Yunus, Philip Harris
Journal of High Energy Physics, 2021, Article 30 [ arXiv:2011.03550 | code ]

Abstract Discoveries of new phenomena often involve a dedicated search for a hypothetical physics signature. Recently, novel deep learning techniques have emerged for anomaly detection in the absence of a signal prior. However, by ignoring signal priors, the sensitivity of these approaches is significantly reduced. We present a new strategy dubbed Quasi Anomalous Knowledge (QUAK), whereby we introduce alternative signal priors that capture some of the salient features of new physics signatures, allowing for the recovery of sensitivity even when the alternative signal is incorrect. This approach can be applied to a broad range of physics models and neural network architectures. In this paper, we apply QUAK to anomaly detection of new physics events at the CERN Large Hadron Collider utilizing variational autoencoders with normalizing flow.

Enhancing searches for resonances with machine learning and moment decomposition
Ouail Kitouni, Benjamin Nachman, Constantin Weisser, and Mike Williams
Journal of High Energy Physics, 2021, Article 70 [ arXiv:2010.09745 | code ]

Abstract A key challenge in searches for resonant new physics is that classifiers trained to enhance potential signals must not induce localized structures. Such structures could result in a false signal when the background is estimated from data using sideband methods. A variety of techniques have been developed to construct classifiers which are independent from the resonant feature (often a mass). Such strategies are sufficient to avoid localized structures, but are not necessary. We develop a new set of tools using a novel moment loss function (Moment Decomposition or MoDe) which relax the assumption of independence without creating structures in the background. By allowing classifiers to be more flexible, we enhance the sensitivity to new physics without compromising the fidelity of the background estimation.

Foundational AI

Pre-prints

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo
[ arXiv:2404.01413 ]

Abstract The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops discovered that such loops can lead to model collapse, a phenomenon where performance progressively degrades with each model-fitting iteration until the latest model becomes useless. However, several recent papers studying model collapse assumed that new data replace old data over time rather than assuming data accumulate over time. In this paper, we compare these two settings and show that accumulating data prevents model collapse. We begin by studying an analytically tractable setup in which a sequence of linear models are fit to the previous models' predictions. Previous work showed if data are replaced, the test error increases linearly with the number of model-fitting iterations; we extend this result by proving that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations. We next empirically test whether accumulating data similarly prevents model collapse by pretraining sequences of language models on text corpora. We confirm that replacing data does indeed cause model collapse, then demonstrate that accumulating data prevents model collapse; these results hold across a range of model sizes, architectures and hyperparameters. We further show that similar results hold for other deep generative models on real data: diffusion models for molecule generation and variational autoencoders for image generation. Our work provides consistent theoretical and empirical evidence that data accumulation mitigates model collapse.
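The linear-model setting described above can be explored with a small simulation. The sketch below is an illustrative assumption about that setup, not the paper's code: isotropic Gaussian features, ordinary least squares, Gaussian label noise, and the squared parameter error standing in for test error. Each iteration fits a model, then labels a fresh batch with that model's noisy predictions; the next fit sees either only that batch ("replace") or all data so far ("accumulate").

```python
import numpy as np

def simulate(mode, n_iters=20, n_samples=50, dim=5, sigma=1.0, n_trials=100, seed=0):
    """Average squared parameter error after each model-fitting iteration."""
    rng = np.random.default_rng(seed)
    errors = np.zeros(n_iters)
    for _ in range(n_trials):
        w_true = rng.normal(size=dim)
        X = rng.normal(size=(n_samples, dim))
        y = X @ w_true + sigma * rng.normal(size=n_samples)  # real data
        X_all, y_all = X, y
        for it in range(n_iters):
            if mode == "replace":
                w = np.linalg.lstsq(X, y, rcond=None)[0]          # newest batch only
            else:
                w = np.linalg.lstsq(X_all, y_all, rcond=None)[0]  # everything so far
            errors[it] += np.sum((w - w_true) ** 2)
            # Synthetic batch labelled by the model that was just fit.
            X = rng.normal(size=(n_samples, dim))
            y = X @ w + sigma * rng.normal(size=n_samples)
            X_all = np.vstack([X_all, X])
            y_all = np.concatenate([y_all, y])
    return errors / n_trials

replace_err = simulate("replace")
accumulate_err = simulate("accumulate")
```

Plotting `replace_err` and `accumulate_err` against the iteration index is one way to compare the growth of the error in the two settings under these assumptions.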

The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts
[ arXiv:2403.17887 ]

Abstract We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed. To prune these models, we identify the optimal block of layers to prune by considering similarity across layers; then, to "heal" the damage, we perform a small amount of finetuning. In particular, we use parameter-efficient finetuning (PEFT) methods, specifically quantization and Low Rank Adapters (QLoRA), such that each of our experiments can be performed on a single A100 GPU. From a practical perspective, these results suggest that layer pruning methods can complement other PEFT strategies to further reduce computational resources of finetuning on the one hand, and can improve the memory and latency of inference on the other hand. From a scientific perspective, the robustness of these LLMs to the deletion of layers implies either that current pretraining methods are not properly leveraging the parameters in the deeper layers of the network or that the shallow layers play a critical role in storing knowledge.
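
Illustration (not from the paper): one way to "identify the optimal block of layers to prune by considering similarity across layers" is to compare the representation entering a block with the representation leaving it, and pick the block that changes it least. The use of angular distance, the array layout, and the function names are assumptions made for this sketch; the abstract does not specify them.

```python
import numpy as np

def angular_distance(a, b):
    """Angular distance in [0, 1] between matching rows of a and b."""
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    )
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

def best_block_to_prune(hidden, n):
    """Pick the start index of the n-layer block whose removal changes least.

    hidden has shape (L + 1, T, d): hidden[l] is the input to layer l for
    T tokens, and hidden[L] is the output of the final layer.
    """
    num_layers = hidden.shape[0] - 1
    dists = [
        float(angular_distance(hidden[l], hidden[l + n]).mean())
        for l in range(num_layers - n + 1)
    ]
    start = int(np.argmin(dists))
    return start, dists[start]

# Synthetic hidden states: layers 3, 4, 5 barely change the representation.
rng = np.random.default_rng(0)
scales = [1.0, 1.0, 1.0, 0.01, 0.01, 0.01, 1.0, 1.0]
states = [rng.normal(size=(32, 16))]
for s in scales:
    states.append(states[-1] + s * rng.normal(size=(32, 16)))
hidden = np.stack(states)

start, dist = best_block_to_prune(hidden, n=3)
print("prune layers", start, "to", start + 2, "distance", dist)
```

After removing the chosen block, the abstract's "healing" step would correspond to a short finetuning run, which this sketch does not attempt.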

FeatUp: A Model-Agnostic Framework for Features at Any Resolution
Stephanie Fu, Mark Hamilton, Laura Brandt, Axel Feldman, Zhoutong Zhang, William T. Freeman
[ arXiv:2403.10516 ]

Abstract Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features often lack the spatial resolution to directly perform dense prediction tasks like segmentation and depth prediction because models aggressively pool information over large areas. In this work, we introduce FeatUp, a task- and model-agnostic framework to restore lost spatial information in deep features. We introduce two variants of FeatUp: one that guides features with high-resolution signal in a single forward pass, and one that fits an implicit model to a single image to reconstruct features at any resolution. Both approaches use a multi-view consistency loss with deep analogies to NeRFs. Our features retain their original semantics and can be swapped into existing applications to yield resolution and performance gains even without re-training. We show that FeatUp significantly outperforms other feature upsampling and image super-resolution approaches in class activation map generation, transfer learning for segmentation and depth prediction, and end-to-end training for semantic segmentation.

A Resource Model For Neural Scaling Law
Jinyeop Song, Ziming Liu, Max Tegmark, Jeff Gore
[ arXiv:2402.05164 ]

Abstract Neural scaling laws characterize how model performance improves as the model size scales up. Inspired by empirical observations, we introduce a resource model of neural scaling. A task is usually composite and hence can be decomposed into many subtasks, which compete for resources (measured by the number of neurons allocated to subtasks). On toy problems, we empirically find that: (1) The loss of a subtask is inversely proportional to its allocated neurons. (2) When multiple subtasks are present in a composite task, the resources acquired by each subtask uniformly grow as models get larger, keeping the ratios of acquired resources constant. We hypothesize these findings to be generally true and build a model to predict neural scaling laws for general composite tasks, which successfully replicates the neural scaling law of Chinchilla models reported in arXiv:2203.15556. We believe that the notion of resource used in this paper will be a useful tool for characterizing and diagnosing neural networks.

Opening the AI black box: program synthesis via mechanistic interpretability
Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelić, Max Tegmark
[ arXiv:2402.05110 ]

Abstract We present MIPS, a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm. As opposed to large language models, this program synthesis technique makes no use of (and is therefore not limited by) human training data such as algorithms and code from GitHub. We discuss opportunities and challenges for scaling up this approach to make machine-learned models more interpretable and trustworthy.

Generating Interpretable Networks using Hypernetworks
Isaac Liao, Ziming Liu, Max Tegmark
[ arXiv:2312.03051 ]

Abstract An essential goal in mechanistic interpretability is to decode a network, i.e., to convert a neural network's raw weights to an interpretable algorithm. Given the difficulty of the decoding problem, progress has been made to understand the easier encoding problem, i.e., to convert an interpretable algorithm into network weights. Previous works focus on encoding existing algorithms into networks, which are interpretable by definition. However, focusing on encoding limits the possibility of discovering new algorithms that humans have never stumbled upon, but that are nevertheless interpretable. In this work, we explore the possibility of using hypernetworks to generate interpretable networks whose underlying algorithms are not yet known. The hypernetwork is carefully designed such that it can control network complexity, leading to a diverse family of interpretable algorithms ranked by their complexity. All of them are interpretable in hindsight, although some of them are less intuitive to humans, hence providing new insights regarding how to 'think' like a neural network. For the task of computing L1 norms, hypernetworks find three algorithms: (a) the double-sided algorithm, (b) the convexity algorithm, (c) the pudding algorithm, although only the first algorithm was expected by the authors before experiments. We automatically classify these algorithms and analyze how these algorithmic phases develop during training, as well as how they are affected by complexity control. Furthermore, we show that a trained hypernetwork can correctly construct models for input dimensions not seen in training, demonstrating systematic generalization.

One-step Diffusion with Distribution Matching Distillation
Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, Taesung Park
[ arXiv:2311.18828 ]

Abstract Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce that the one-step image generator matches the diffusion model at the distribution level by minimizing an approximate KL divergence whose gradient can be expressed as the difference between two score functions, one of the target distribution and the other of the synthetic distribution being produced by our one-step generator. The score functions are parameterized as two diffusion models trained separately on each distribution. Combined with a simple regression loss matching the large-scale structure of the multi-step diffusion outputs, our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k, comparable to Stable Diffusion but orders of magnitude faster. Utilizing FP16 inference, our model generates images at 20 FPS on modern hardware.

Symphony: Symmetry-Equivariant Point-Centered Spherical Harmonics for Molecule Generation
Ameya Daigavane, Song Kim, Mario Geiger, Tess Smidt
[ arXiv:2311.16199 ]

Abstract We present Symphony, an E(3)-equivariant autoregressive generative model for 3D molecular geometries that iteratively builds a molecule from molecular fragments. Existing autoregressive models such as G-SchNet and G-SphereNet for molecules utilize rotationally invariant features to respect the 3D symmetries of molecules. In contrast, Symphony uses message-passing with higher-degree E(3)-equivariant features. This allows a novel representation of probability distributions via spherical harmonic signals to efficiently model the 3D geometry of molecules. We show that Symphony is able to accurately generate small molecules from the QM9 dataset, outperforming existing autoregressive models and approaching the performance of diffusion models.

Pairing-based graph neural network for simulating quantum materials
Di Luo, David D. Dai, Liang Fu
[ arXiv:2311.02143 ]

Abstract We introduce a pairing-based graph neural network, GemiNet, for simulating quantum many-body systems. Our architecture augments a BCS mean-field wavefunction with a generalized pair amplitude parameterized by a graph neural network. Variational Monte Carlo with GemiNet simultaneously provides an accurate, flexible, and scalable method for simulating many-electron systems. We apply GemiNet to two-dimensional semiconductor electron-hole bilayers and obtain highly accurate results on a variety of interaction-induced phases, including the exciton Bose-Einstein condensate, electron-hole superconductor, and bilayer Wigner crystal. Our study demonstrates the potential of physically-motivated neural network wavefunctions for quantum materials simulations.

Learning to See Physical Properties with Active Sensing Motor Policies
Gabriel B. Margolis, Xiang Fu, Yandong Ji, Pulkit Agrawal
[ arXiv:2311.01405 ]

Abstract Knowledge of terrain's physical properties inferred from color images can aid in making efficient robotic locomotion plans. However, unlike image classification, it is unintuitive for humans to label image patches with physical properties. Without labeled data, building a vision system that takes as input the observed terrain and predicts physical properties remains challenging. We present a method that overcomes this challenge by self-supervised labeling of images captured by robots during real-world traversal with physical property estimators trained in simulation. To ensure accurate labeling, we introduce Active Sensing Motor Policies (ASMP), which are trained to explore locomotion behaviors that increase the accuracy of estimating physical parameters. For instance, the quadruped robot learns to swipe its foot against the ground to estimate the friction coefficient accurately. We show that the visual system trained with a small amount of real-world traversal data accurately predicts physical parameters. The trained system is robust and works even with overhead images captured by a drone despite being trained on data collected by cameras attached to a quadruped robot walking on the ground.

Growing Brains: Co-emergence of Anatomical and Functional Modularity in Recurrent Neural Networks
Ziming Liu, Mikail Khona, Ila R. Fiete, Max Tegmark
[ arXiv:2310.07711 ]

Abstract Recurrent neural networks (RNNs) trained on compositional tasks can exhibit functional modularity, in which neurons can be clustered by activity similarity and participation in shared computational subtasks. Unlike brains, these RNNs do not exhibit anatomical modularity, in which functional clustering is correlated with strong recurrent coupling and spatial localization of functional clusters. Contrasting with functional modularity, which can be ephemerally dependent on the input, anatomically modular networks form a robust substrate for solving the same subtasks in the future. To examine whether it is possible to grow brain-like anatomical modularity, we apply a recent machine learning method, brain-inspired modular training (BIMT), to a network being trained to solve a set of compositional cognitive tasks. We find that functional and anatomical clustering emerge together, such that functionally similar neurons also become spatially localized and interconnected. Moreover, compared to standard L1 or no regularization settings, the model exhibits superior performance by optimally balancing task performance and network sparsity. In addition to achieving brain-like organization in RNNs, our findings also suggest that BIMT holds promise for applications in neuromorphic computing and enhancing the interpretability of neural network architectures.

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks, Max Tegmark
[ arXiv:2310.06824 ]

Abstract Large Language Models (LLMs) have impressive capabilities, but are also prone to outputting falsehoods. Recent work has developed techniques for inferring whether an LLM is telling the truth by training probes on the LLM's internal activations. However, this line of work is controversial, with some authors pointing out failures of these probes to generalize in basic ways, among other conceptual issues. In this work, we curate high-quality datasets of true/false statements and use them to study in detail the structure of LLM representations of truth, drawing on three lines of evidence: 1. Visualizations of LLM true/false statement representations, which reveal clear linear structure. 2. Transfer experiments in which probes trained on one dataset generalize to different datasets. 3. Causal evidence obtained by surgically intervening in an LLM's forward pass, causing it to treat false statements as true and vice versa. Overall, we present evidence that language models linearly represent the truth or falsehood of factual statements. We also introduce a novel technique, mass-mean probing, which generalizes better and is more causally implicated in model outputs than other probing techniques.
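
Illustration (not from the paper): a difference-of-class-means probe is one simple way to read a linear "truth direction" out of activations, which is the idea the name mass-mean probing suggests. The synthetic activations, the midpoint threshold, and the function names below are assumptions made for this sketch; no real LLM activations are used.

```python
import numpy as np

def mass_mean_direction(acts, labels):
    """Probe direction: mean activation of true minus mean of false statements."""
    return acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)

def predict(acts, direction, midpoint):
    """Label a statement true when it falls on the 'true' side of the midpoint."""
    return ((acts - midpoint) @ direction > 0).astype(int)

# Synthetic stand-in for activations: classes differ along one hidden direction.
rng = np.random.default_rng(0)
d = 16
truth_axis = rng.normal(size=d)
truth_axis /= np.linalg.norm(truth_axis)

def make_split(n):
    labels = rng.integers(0, 2, size=n)
    acts = rng.normal(size=(n, d)) + np.outer(2.0 * (2 * labels - 1), truth_axis)
    return acts, labels

train_acts, train_labels = make_split(500)
test_acts, test_labels = make_split(500)

direction = mass_mean_direction(train_acts, train_labels)
midpoint = 0.5 * (
    train_acts[train_labels == 1].mean(axis=0)
    + train_acts[train_labels == 0].mean(axis=0)
)
accuracy = float(np.mean(predict(test_acts, direction, midpoint) == test_labels))
print("held-out accuracy:", accuracy)
```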

Grokking as Compression: A Nonlinear Complexity Perspective
Ziming Liu, Ziqian Zhong, Max Tegmark
[ arXiv:2310.05918 ]

Abstract We attribute grokking, the phenomenon where generalization is much delayed after memorization, to compression. To do so, we define linear mapping number (LMN) to measure network complexity, which is a generalized version of linear region number for ReLU networks. LMN can nicely characterize neural network compression before generalization. Although the L2 norm has been a popular choice for characterizing model complexity, we argue in favor of LMN for a number of reasons: (1) LMN can be naturally interpreted as information/computation, while L2 cannot. (2) In the compression phase, LMN has linear relations with test losses, while L2 is correlated with test losses in a complicated nonlinear way. (3) LMN also reveals an intriguing phenomenon of the XOR network switching between two generalization solutions, while L2 does not. Besides explaining grokking, we argue that LMN is a promising candidate as the neural network version of the Kolmogorov complexity since it explicitly considers local or conditioned linear computations aligned with the nature of modern artificial neural networks.

Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution
Rui Wang, Elyssa Hofgard, Han Gao, Robin Walters, Tess E. Smidt
[ arXiv:2310.02299 ]

Abstract Modeling symmetry breaking is essential for understanding the fundamental changes in the behaviors and properties of physical systems, from microscopic particle interactions to macroscopic phenomena like fluid dynamics and cosmic structures. Thus, identifying sources of asymmetry is an important tool for understanding physical systems. In this paper, we focus on learning asymmetries of data using relaxed group convolutions. We provide both theoretical and empirical evidence that this flexible convolution technique allows the model to maintain the highest level of equivariance that is consistent with data and discover the subtle symmetry-breaking factors in various physical systems. We employ various relaxed group convolution architectures to uncover various symmetry-breaking factors that are interpretable and physically meaningful in different physical systems, including the phase transition of crystal structure, the isotropy and homogeneity breaking in turbulent flow, and the time-reversal symmetry breaking in pendulum systems.

Language Models Represent Space and Time
Wes Gurnee, Max Tegmark
[ arXiv:2310.02207 ]

Abstract The capabilities of large language models (LLMs) have sparked debate over whether such systems just learn an enormous collection of superficial statistics or a set of more coherent and grounded representations that reflect the real world. We find evidence for the latter by analyzing the learned representations of three spatial datasets (world, US, NYC places) and three temporal datasets (historical figures, artworks, news headlines) in the Llama-2 family of models. We discover that LLMs learn linear representations of space and time across multiple scales. These representations are robust to prompting variations and unified across different entity types (e.g. cities and landmarks). In addition, we identify individual 'space neurons' and 'time neurons' that reliably encode spatial and temporal coordinates. While further investigation is needed, our results suggest modern LLMs learn rich spatiotemporal representations of the real world and possess basic ingredients of a world model.

A Neural Scaling Law from Lottery Ticket Ensembling
Ziming Liu, Max Tegmark
[ arXiv:2310.02258 ]

Abstract Neural scaling laws (NSL) refer to the phenomenon where model performance improves with scale. Sharma & Kaplan analyzed NSL using approximation theory and predict that MSE losses decay as $N^{-\alpha}$, $\alpha = 4/d$, where $N$ is the number of model parameters, and $d$ is the intrinsic input dimension. Although their theory works well for some cases (e.g., ReLU networks), we surprisingly find that a simple 1D problem $y = x^2$ manifests a different scaling law ($\alpha = 1$) from their predictions ($\alpha = 4$). We opened the neural networks and found that the new scaling law originates from lottery ticket ensembling: a wider network on average has more 'lottery tickets', which are ensembled to reduce the variance of outputs. We support the ensembling mechanism by mechanistically interpreting single neural networks, as well as studying them statistically. We attribute the $N^{-1}$ scaling law to the 'central limit theorem' of lottery tickets. Finally, we discuss its potential implications for large language models and statistical physics-type theories of learning.
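
Illustration (not from the paper): the central-limit intuition behind an exponent of 1 can be checked numerically by averaging independent noisy predictors. Here the number of ensemble members stands in for the number of 'lottery tickets'; the Gaussian noise model and the function name `ensemble_mse` are assumptions made for this sketch, and no neural network is trained.

```python
import numpy as np

def ensemble_mse(n_members, n_trials=2000, noise=1.0, seed=0):
    """Mean squared error of the average of n_members unbiased noisy predictors."""
    rng = np.random.default_rng(seed)
    # The target is 0, so each member's prediction is pure independent noise.
    preds = noise * rng.normal(size=(n_trials, n_members))
    return float(np.mean(preds.mean(axis=1) ** 2))

sizes = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256])
mse = np.array([ensemble_mse(int(n)) for n in sizes])

# Slope of log(MSE) against log(ensemble size) estimates minus the exponent.
slope = float(np.polyfit(np.log(sizes), np.log(mse), 1)[0])
print("fitted scaling exponent:", -slope)
```

The fitted exponent should come out close to 1, since the variance of an average of independent terms falls inversely with the number of terms.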

Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation
William Shen, Ge Yang, Alan Yu, Jansen Wong, Leslie Pack Kaelbling, Phillip Isola
[ arXiv:2308.07931 ]

Abstract Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models. We present a few-shot learning method for 6-DOF grasping and placing that harnesses these strong spatial and semantic priors to achieve in-the-wild generalization to unseen objects. Using features distilled from a vision-language model, CLIP, we present a way to designate novel objects for manipulation via free-text natural language, and demonstrate its ability to generalize to unseen expressions and novel categories of objects.

Polarization Multi-Image Synthesis with Birefringent Metasurfaces
Dean Hazineh, Soon Wei Daniel Lim, Qi Guo, Federico Capasso, Todd Zickler
[ arXiv:2307.08106 ]

Abstract Optical metasurfaces composed of precisely engineered nanostructures have gained significant attention for their ability to manipulate light and implement distinct functionalities based on the properties of the incident field. Computational imaging systems have started harnessing this capability to produce sets of coded measurements that benefit certain tasks when paired with digital post-processing. Inspired by these works, we introduce a new system that uses a birefringent metasurface with a polarizer-mosaicked photosensor to capture four optically-coded measurements in a single exposure. We apply this system to the task of incoherent opto-electronic filtering, where digital spatial-filtering operations are replaced by simpler, per-pixel sums across the four polarization channels, independent of the spatial filter size. In contrast to previous work on incoherent opto-electronic filtering that can realize only one spatial filter, our approach can realize a continuous family of filters from a single capture, with filters being selected from the family by adjusting the post-capture digital summation weights. To find a metasurface that can realize a set of user-specified spatial filters, we introduce a form of gradient descent with a novel regularizer that encourages light efficiency and a high signal-to-noise ratio. We demonstrate several examples in simulation and with fabricated prototypes, including some with spatial filters that have prescribed variations with respect to depth and wavelength.

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks
Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas
[ arXiv:2306.17844 ]

Abstract Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce the discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm; others implement a previously undescribed, less intuitive, but comprehensible procedure which we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space.
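
Illustration (not from the paper): the Clock idea for modular addition can be written directly with trigonometric identities. Each input is mapped to an angle on a circle, the angles are added, and the answer is the residue whose angle matches best. The single frequency `k`, the modulus 59, and the function name `clock_add` are assumptions made for this simplified sketch; trained networks implement this only approximately and with several frequencies.

```python
import numpy as np

def clock_add(a, b, p, k=1):
    """Compute (a + b) mod p by adding angles on a circle."""
    wa = 2 * np.pi * k * a / p
    wb = 2 * np.pi * k * b / p
    # Angle-addition identities give cos and sin of the summed angle.
    cos_ab = np.cos(wa) * np.cos(wb) - np.sin(wa) * np.sin(wb)
    sin_ab = np.sin(wa) * np.cos(wb) + np.cos(wa) * np.sin(wb)
    # Score every candidate c by cos(w * (a + b - c)); the best match wins.
    wc = 2 * np.pi * k * np.arange(p) / p
    logits = cos_ab * np.cos(wc) + sin_ab * np.sin(wc)
    return int(np.argmax(logits))

p = 59
correct = all(clock_add(a, b, p) == (a + b) % p for a in range(p) for b in range(p))
print("all sums correct:", correct)
```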

Discovering New Interpretable Conservation Laws as Sparse Invariants
Ziming Liu, Patrick Obin Sturm, Saketh Bharadwaj, Sam Silva, Max Tegmark
[ arXiv:2305.19525 ]

Abstract Discovering conservation laws for a given dynamical system is important but challenging. In a theorist setup (differential equations and basis functions are both known), we propose the Sparse Invariant Detector (SID), an algorithm that auto-discovers conservation laws from differential equations. Its algorithmic simplicity allows robustness and interpretability of the discovered conserved quantities. We show that SID is able to rediscover known and even discover new conservation laws in a variety of systems. For two examples in fluid mechanics and atmospheric chemistry, SID discovers 14 and 3 conserved quantities, respectively, where only 12 and 2 were previously known to domain experts.
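
Illustration (not from the paper): when the differential equations and basis functions are both known, a conserved quantity written as a linear combination of basis functions must satisfy a linear condition, so candidates can be found as a null space. The harmonic oscillator, the quadratic basis, and the use of an SVD are assumptions made for this sketch; the sparsification step implied by the method's name is omitted.

```python
import numpy as np

# Harmonic oscillator: dx/dt = p, dp/dt = -x.
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))
x, p = points[:, 0], points[:, 1]
fx, fp = p, -x

# Basis functions b = [x^2, x*p, p^2]. A quantity H = sum_i theta_i * b_i is
# conserved when dH/dt = sum_i theta_i * (grad b_i . f) vanishes everywhere.
G = np.stack(
    [
        2 * x * fx,        # grad(x^2) . f
        p * fx + x * fp,   # grad(x*p) . f
        2 * p * fp,        # grad(p^2) . f
    ],
    axis=1,
)

# The right-singular vector with the smallest singular value spans the null space.
_, singular_values, vt = np.linalg.svd(G, full_matrices=False)
theta = vt[-1]
print("smallest singular value:", singular_values[-1])
print("coefficients on [x^2, x*p, p^2]:", theta)
```

The recovered coefficients are proportional to (1, 0, 1), i.e. the energy x^2 + p^2 up to scale and sign.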

Dynamic Sparse Training with Structured Sparsity
Mike Lasby, Anna Golubeva, Utku Evci, Mihai Nica, Yani Ioannou
[ arXiv:2305.02299 ]

Abstract Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training, matching the generalization of dense models while enabling sparse training and inference. Although the resulting models are highly sparse and theoretically cheaper to train, achieving speedups with unstructured sparsity on real-world hardware is challenging. In this work, we propose a sparse-to-sparse DST method to learn a variant of structured N:M sparsity by imposing a constant fan-in constraint. We demonstrate with both a theoretical analysis and empirical results: state-of-the-art sparse-to-sparse structured DST performance on a variety of network architectures, a condensed representation with a reduced parameter and memory footprint, and reduced inference time compared to dense models with a naive PyTorch CPU implementation of the condensed representation.
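
Illustration (not from the paper): a constant fan-in constraint means every output neuron keeps the same number of incoming weights. The sketch below builds such a mask by keeping the largest-magnitude weights in each row; magnitude-based selection and the function name `constant_fan_in_mask` are assumptions made here, and the paper's actual prune-and-grow update rule is not reproduced.

```python
import numpy as np

def constant_fan_in_mask(weights, fan_in):
    """Boolean mask keeping the fan_in largest-magnitude weights in every row.

    weights has shape (n_out, n_in); each row holds one neuron's incoming weights.
    """
    # argpartition places the fan_in largest magnitudes in the first columns.
    keep = np.argpartition(-np.abs(weights), fan_in - 1, axis=1)[:, :fan_in]
    mask = np.zeros(weights.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return mask

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 32))
mask = constant_fan_in_mask(weights, fan_in=4)
sparse_weights = weights * mask
print("non-zeros per output neuron:", mask.sum(axis=1))
```

Because every row has the same number of non-zeros, the kept weights can be stored as a dense (n_out, fan_in) array plus an index array of the same shape, which is one way to read the "condensed representation" mentioned in the abstract.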

Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
Ziming Liu, Eric Gan, Max Tegmark
[ arXiv:2305.08746 ]

Abstract We introduce Brain-Inspired Modular Training (BIMT), a method for making neural networks more modular and interpretable. Inspired by brains, BIMT embeds neurons in a geometric space and augments the loss function with a cost proportional to the length of each neuron connection. We demonstrate that BIMT discovers useful modular neural networks for many simple tasks, revealing compositional structures in symbolic formulas, interpretable decision boundaries and features for classification, and mathematical structure in algorithmic datasets. The ability to directly see modules with the naked eye can complement current mechanistic interpretability strategies such as probes, interventions or staring at all weights.
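
Illustration (not from the paper): a "cost proportional to the length of each neuron connection" can be sketched as an L1 penalty weighted by the distance between the connected neurons. The two-dimensional layout, the absolute-value weighting, the coefficient `lam`, and the function name `connection_cost` are assumptions made for this sketch.

```python
import numpy as np

def connection_cost(weights, pos_in, pos_out, lam=1e-3):
    """Distance-weighted L1 penalty for one layer.

    weights has shape (n_out, n_in); pos_in and pos_out hold the 2D
    coordinates assigned to the input and output neurons.
    """
    dist = np.linalg.norm(pos_out[:, None, :] - pos_in[None, :, :], axis=-1)
    return lam * float(np.sum(np.abs(weights) * dist))

n = 6
xs = np.linspace(0.0, 1.0, n)
pos_in = np.stack([xs, np.zeros(n)], axis=1)   # input layer at height 0
pos_out = np.stack([xs, np.ones(n)], axis=1)   # output layer at height 1

local = np.eye(n)             # each neuron connects to the one directly above
crossed = np.eye(n)[::-1]     # same weight magnitudes, but long diagonal wires

cost_local = connection_cost(local, pos_in, pos_out)
cost_crossed = connection_cost(crossed, pos_in, pos_out)
print("local wiring cost:  ", cost_local)
print("crossed wiring cost:", cost_crossed)
```

Adding such a term to the task loss favours short connections, which is the pressure toward spatially localized modules that the abstract describes.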

Constraint of pionless EFT using two-nucleon spectra from lattice QCD
William Detmold, Fernando Romero-López, Phiala E. Shanahan
[ arXiv:2305.06313 ]

Abstract Finite-volume pionless effective field theory (FVEFT$_{\not\pi}$) at next-to-leading order (NLO) is used to analyze the two-nucleon lattice QCD spectrum of Ref. [Amarasinghe:2021lqa], performed at quark masses corresponding to a pion mass of approximately 800 MeV. Specifically, the effective theory is formulated in finite volume, and variational sets of wave functions are optimized using differential programming. Using these wave functions projected to the appropriate finite-volume symmetry group, variational bounds from FVEFT$_{\not\pi}$ are obtained for the ground state, as well as excited states. By comparison with the lattice QCD GEVP spectrum, different low energy constants (LECs) are constrained. Relativistic corrections are incorporated, allowing for the extractions of NLO LECs, as well as the leading s-d-wave mixing term in the deuteron channel.

GenPhys: From Physical Processes to Generative Models
Ziming Liu, Di Luo, Yilun Xu, Tommi Jaakkola, Max Tegmark
[ arXiv:2304.02637 ]

Abstract Since diffusion models (DM) and the more recent Poisson flow generative models (PFGM) are inspired by physical processes, it is reasonable to ask: Can physical processes offer additional new generative models? We show that the answer is yes. We introduce a general family, Generative Models from Physical Processes (GenPhys), where we translate partial differential equations (PDEs) describing physical processes to generative models. We show that generative models can be constructed from s-generative PDEs (s for smooth). GenPhys subsumes the two existing generative models (DM and PFGM) and even gives rise to new families of generative models, e.g., "Yukawa Generative Models" inspired by weak interactions. On the other hand, some physical processes by default do not belong to the GenPhys family, e.g., the wave equation and the Schrödinger equation, but could be made into the GenPhys family with some modifications. Our goal with GenPhys is to explore and expand the design space of generative models.

DribbleBot: Dynamic Legged Manipulation in the Wild
Yandong Ji, Gabriel B. Margolis, Pulkit Agrawal
[ arXiv:2304.01159 ]

Abstract DribbleBot (Dexterous Ball Manipulation with a Legged Robot) is a legged robotic system that can dribble a soccer ball under the same real-world conditions as humans (i.e., in-the-wild). We adopt the paradigm of training policies in simulation using reinforcement learning and transferring them into the real world. We overcome critical challenges of accounting for variable ball motion dynamics on different terrains and perceiving the ball using body-mounted cameras under the constraints of onboard computing. Our results provide evidence that current quadruped platforms are well-suited for studying dynamic whole-body control problems involving simultaneous locomotion and manipulation directly from sensory observations.

Neural Volumetric Memory for Visual Locomotion Control
Ruihan Yang, Ge Yang, Xiaolong Wang
[ arXiv:2304.01201 ]

Abstract Legged robots have the potential to expand the reach of autonomy beyond paved roads. In this work, we consider the difficult problem of locomotion on challenging terrains using a single forward-facing depth camera. Due to the partial observability of the problem, the robot has to rely on past observations to infer the terrain currently beneath it. To solve this problem, we follow the paradigm in computer vision that explicitly models the 3D geometry of the scene and propose Neural Volumetric Memory (NVM), a geometric memory architecture that explicitly accounts for the SE(3) equivariance of the 3D world. NVM aggregates feature volumes from multiple camera views by first bringing them back to the ego-centric frame of the robot. We test the learned visual-locomotion policy on a physical robot and show that our approach, which explicitly introduces geometric priors during training, outperforms more naïve methods. We also include ablation studies and show that the representations stored in the neural volumetric memory capture sufficient geometric information to reconstruct the scene.

Noisy dynamical systems evolve error correcting codes and modularity
Trevor McCourt, Ila R. Fiete, Isaac L. Chuang
[ arXiv:2303.14448 ]

Abstract There is an intimate connection between life (as we know it) and fault tolerance. Despite residing in the stochastic and complex physical world, biological systems execute functions that allow them to survive and thrive by maintaining their physical integrity. At the same time, biological systems are strikingly modular: parts responsible for different functions tend to be physically separate and easily distinguishable. In this work, through experiments in Boolean networks, we show that the simultaneous presence of fault tolerance and modularity in biological systems is no coincidence. Rather, it is a typical co-occurrence in dynamic systems undergoing adaptive evolution in noisy environments. From this, we deduce the principle of error correction enhanced evolvability: systems possessing error-correcting codes are more effectively improved by evolution than those without. Noise creates the evolutionary pressure to develop initial error-correcting codes, suggesting it plays a larger role in the evolution of complex structures than previously thought.

The Quantization Model of Neural Scaling
Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark
[ arXiv:2303.13506 ]

Abstract We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale. We derive this model from what we call the Quantization Hypothesis, where learned network capabilities are quantized into discrete chunks (quanta). We show that when quanta are learned in order of decreasing use frequency, then a power law in use frequencies explains observed power law scaling of loss. We validate this prediction on toy datasets, then study how scaling curves decompose for large language models. Using language model internals, we auto-discover diverse model capabilities (quanta) and find tentative evidence that the distribution over corresponding subproblems in the prediction of natural text is compatible with the power law that our theory predicts from the neural scaling exponent.
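
Illustration (not from the paper): the link between a power law in use frequencies and a power law in loss can be checked numerically. The sketch assumes that quantum k is used with frequency proportional to k^-(alpha+1), that a model has learned exactly the n most frequent quanta, and that every unlearned quantum contributes loss in proportion to its frequency. These choices, the value alpha = 0.5, and the function name are assumptions made for this toy calculation.

```python
import numpy as np

def loss_after_learning(n_learned, alpha, n_quanta=1_000_000):
    """Remaining loss when only the n_learned most frequent quanta are learned."""
    k = np.arange(1, n_quanta + 1)
    freq = k ** -(alpha + 1.0)   # Zipf-like use frequencies
    freq /= freq.sum()
    # Unlearned quanta contribute loss in proportion to how often they are used.
    return float(freq[n_learned:].sum())

alpha = 0.5
n_values = np.array([10, 30, 100, 300, 1000])
losses = np.array([loss_after_learning(int(n), alpha) for n in n_values])

# The log-log slope of loss against the number of learned quanta estimates -alpha.
slope = float(np.polyfit(np.log(n_values), np.log(losses), 1)[0])
print("fitted exponent:", -slope, "assumed alpha:", alpha)
```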

Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries
Charlotte Loh, Seungwook Han, Shivchander Sudalairaj, Rumen Dangovski, Kai Xu, Florian Wenzel, Marin Soljačić, Akash Srivastava
[ arXiv:2303.02484 ]

Abstract Deep ensembles (DE) have been successful in improving model performance by learning diverse members via the stochasticity of random initialization. While recent works have attempted to promote further diversity in DE via hyperparameters or regularizing loss functions, these methods primarily still rely on a stochastic approach to explore the hypothesis space. In this work, we present Multi-Symmetry Ensembles (MSE), a framework for constructing diverse ensembles by capturing the multiplicity of hypotheses along symmetry axes, which explore the hypothesis space beyond stochastic perturbations of model weights and hyperparameters. We leverage recent advances in contrastive representation learning to create models that separately capture opposing hypotheses of invariant and equivariant functional classes and present a simple ensembling approach to efficiently combine appropriate hypotheses for a given task. We show that MSE effectively captures the multiplicity of conflicting hypotheses that is often required in large, diverse datasets like ImageNet. As a result of their inherent diversity, MSE improves classification performance, uncertainty quantification, and generalization across a series of transfer tasks.

PFGM++: Unlocking the Potential of Physics-Inspired Generative Models
Yilun Xu, Ziming Liu, Yonglong Tian, Shangyuan Tong, Max Tegmark, Tommi Jaakkola
[ arXiv:2302.04265 ]

Abstract We introduce a new family of physics-inspired generative models termed PFGM++ that unifies diffusion models and Poisson Flow Generative Models (PFGM). These models realize generative trajectories for N dimensional data by embedding paths in N+D dimensional space while still controlling the progression with a simple scalar norm of the D additional variables. The new models reduce to PFGM when D=1 and to diffusion models when D→∞. The flexibility of choosing D allows us to trade off robustness against rigidity as increasing D results in more concentrated coupling between the data and the additional variable norms. We dispense with the biased large batch field targets used in PFGM and instead provide an unbiased perturbation-based objective similar to diffusion models. To explore different choices of D, we provide a direct alignment method for transferring well-tuned hyperparameters from diffusion models (D→∞) to any finite D values. Our experiments show that models with finite D can be superior to previous state-of-the-art diffusion models on CIFAR-10/FFHQ 64×64 datasets, with FID scores of 1.91/2.43 when D=2048/128. In addition, we demonstrate that models with smaller D exhibit improved robustness against modeling errors. Code is available.

Learning Silhouettes with Group Sparse Autoencoders
Emmanouil Theodosis, Demba Ba
Harvard CRISP Preprint

Abstract Sparse coding has been extensively used in neuroscience to model brain-like computation by drawing analogues between neurons’ firing activity and the nonzero elements of sparse vectors. Contemporary deep learning architectures have been used to model neural activity, inspired by signal processing algorithms; however sparse coding architectures are not able to explain the higher-order categorization that has been empirically observed at the neural level. In this work, we propose a novel model-based architecture, termed group-sparse autoencoder, that produces sparse activity patterns in line with neural modeling, but showcases a higher-level order in its activation maps. We evaluate a dense model of our architecture on MNIST and CIFAR-10 and show that it learns dictionaries that resemble silhouettes of the given class, while its activations have a significantly higher level order compared to sparse architectures.

Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior
Gabriel B. Margolis, Pulkit Agrawal
[ arXiv:2212.03238 ]

Abstract Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real-time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more.

Learning Integrable Dynamics with Action-Angle Networks
Ameya Daigavane, Arthur Kosmala, Miles Cranmer, Tess Smidt, Shirley Ho
[ arXiv:2211.15338 ]

Abstract Machine learning has become increasingly popular for efficiently modelling the dynamics of complex physical systems, demonstrating a capability to learn effective models for dynamics which ignore redundant degrees of freedom. Learned simulators typically predict the evolution of the system in a step-by-step manner with numerical integration techniques. However, such models often suffer from instability over long roll-outs due to the accumulation of both estimation and integration error at each prediction step. Here, we propose an alternative construction for learned physical simulators that are inspired by the concept of action-angle coordinates from classical mechanics for describing integrable systems. We propose Action-Angle Networks, which learn a nonlinear transformation from input coordinates to the action-angle space, where evolution of the system is linear. Unlike traditional learned simulators, Action-Angle Networks do not employ any higher-order numerical integration methods, making them extremely efficient at modelling the dynamics of integrable physical systems.
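
The idea of evolving a system linearly in action-angle space can be illustrated with a one-dimensional harmonic oscillator, where the transformation is known in closed form. This is a minimal sketch under that assumption; it uses a hand-written transformation rather than the learned network described in the abstract, and the function names and the parameters `m` and `omega` are hypothetical.

```python
import numpy as np

def to_action_angle(q, p, m=1.0, omega=1.0):
    """Map (q, p) to action-angle variables for H = p^2/(2m) + m*omega^2*q^2/2."""
    action = (p**2 / (2.0 * m) + 0.5 * m * omega**2 * q**2) / omega
    angle = np.arctan2(m * omega * q, p)
    return action, angle

def from_action_angle(action, angle, m=1.0, omega=1.0):
    """Inverse map from action-angle variables back to (q, p)."""
    q = np.sqrt(2.0 * action / (m * omega)) * np.sin(angle)
    p = np.sqrt(2.0 * action * m * omega) * np.cos(angle)
    return q, p

def evolve(q0, p0, t, m=1.0, omega=1.0):
    """Advance the state to time t in a single step.

    The action stays constant and the angle grows linearly in time,
    so no step-by-step numerical integration is needed.
    """
    action, angle = to_action_angle(q0, p0, m, omega)
    return from_action_angle(action, angle + omega * t, m, omega)

if __name__ == "__main__":
    print(evolve(1.0, 0.0, t=np.pi / 2))
```

Because the angle advances linearly, the cost of predicting the state at a late time is the same as for an early one, and no integration error accumulates along the way.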

A Solvable Model of Neural Scaling Laws
Alexander Maloney, Daniel A. Roberts, James Sully
[ arXiv:2210.16859 ]

Abstract Large language models with a huge number of parameters, when trained on near internet-sized number of tokens, have been empirically shown to obey neural scaling laws: specifically, their performance behaves predictably as a power law in either parameters or dataset size until bottlenecked by the other resource. To understand this better, we first identify the necessary properties allowing such scaling laws to arise and then propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology. By solving this model in the dual limit of large training set size and large number of parameters, we gain insight into (i) the statistical structure of datasets and tasks that lead to scaling laws, (ii) the way nonlinear feature maps, such as those provided by neural networks, enable scaling laws when trained on these datasets, (iii) the optimality of the equiparameterization scaling of training sets and parameters, and (iv) whether such scaling laws can break down and how they behave when they do. Key findings are the manner in which the power laws that occur in the statistics of natural datasets are extended by nonlinear random feature maps and then translated into power-law scalings of the test loss and how the finite extent of the data's spectral power law causes the model's performance to plateau.

Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu, Eric J. Michaud, Max Tegmark
[ arXiv:2210.01117 ]

Abstract Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the loss landscapes of neural networks, identifying the mismatch between training and test losses as the cause for grokking. We refer to this as the "LU mechanism" because training and test losses (against model weight norm) typically resemble "L" and "U", respectively. This simple mechanism can nicely explain many aspects of grokking: data size dependence, weight decay dependence, the emergence of representations, etc. Guided by the intuitive picture, we are able to induce grokking on tasks involving images, language and molecules. In the reverse direction, we are able to eliminate grokking for algorithmic datasets. We attribute the dramatic nature of grokking for algorithmic datasets to representation learning.

Poisson Flow Generative Models
Yilun Xu, Ziming Liu, Max Tegmark, Tommi Jaakkola
[ arXiv:2209.11178 | code ]

Abstract We propose a new "Poisson flow" generative model (PFGM) that maps a uniform distribution on a high-dimensional hemisphere into any data distribution. We interpret the data points as electrical charges on the z=0 hyperplane in a space augmented with an additional dimension z, generating a high-dimensional electric field (the gradient of the solution to Poisson equation). We prove that if these charges flow upward along electric field lines, their initial distribution in the z=0 plane transforms into a distribution on the hemisphere of radius r that becomes uniform in the r→∞ limit. To learn the bijective transformation, we estimate the normalized field in the augmented space. For sampling, we devise a backward ODE that is anchored by the physically meaningful additional dimension: the samples hit the unaugmented data manifold when the z reaches zero. Experimentally, PFGM achieves current state-of-the-art performance among the normalizing flow models on CIFAR-10, with an Inception score of 9.68 and a FID score of 2.48. It also performs on par with the state-of-the-art SDE approaches while offering 10× to 20× acceleration on image generation tasks. Additionally, PFGM appears more tolerant of estimation errors on a weaker network architecture and robust to the step size in the Euler method.

Bounding generalization error with input compression: An empirical study with infinite-width networks
Angus Galloway, Anna Golubeva, Mahmoud Salem, Mihai Nica, Yani Ioannou, Graham W. Taylor
[ arXiv:2207.09408 ]

Abstract Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles to reduce a reliance on trial-and-error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only few training samples. These results are promising given that input compression is broadly applicable where MI can be estimated with confidence.

Towards Understanding Grokking: An Effective Theory of Representation Learning
Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark, Mike Williams
[ arXiv:2205.10343 ]

Abstract We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations whose training dynamics and dependence on training set size can be predicted by our effective theory in a toy setting. We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion. We find representation learning to occur only in a 'Goldilocks zone' (including comprehension and grokking) between memorization and confusion. Compared to the comprehension phase, the grokking phase stays closer to the memorization phase, leading to delayed generalization. The Goldilocks phase is reminiscent of 'intelligence from starvation' in Darwinian evolution, where resource limitations drive discovery of more efficient solutions. This study not only provides intuitive explanations of the origin of grokking, but also highlights the usefulness of physics-inspired tools, e.g., effective theories and phase diagrams, for understanding deep learning.

Rapid Locomotion via Reinforcement Learning
Gabriel B. Margolis, Ge Yang, Kartik Paigwar, Tao Chen, Pulkit Agrawal
[ arXiv:2205.02824 ]

Abstract Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer leveraged from prior work. Videos of the robot’s behaviors are available at https://agility.csail.mit.edu/.

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings
Yung-Sung Chuang, Rumen Dangovski, Hongyin Luo, Yang Zhang, Shiyu Chang, Marin Soljačić, Shang-Wen Li, Wen-tau Yih, Yoon Kim, James Glass
[ arXiv:2204.10298 ]

Abstract We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings. DiffCSE learns sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence, where the edited sentence is obtained by stochastically masking out the original sentence and then sampling from a masked language model. We show that DiffCSE is an instance of equivariant contrastive learning (Dangovski et al., 2021), which generalizes contrastive learning and learns representations that are insensitive to certain types of augmentations and sensitive to other "harmful" types of augmentations. Our experiments show that DiffCSE achieves state-of-the-art results among unsupervised sentence representation learning methods, outperforming unsupervised SimCSE by 2.3 absolute points on semantic textual similarity tasks.

Unsupervised Semantic Segmentation by Distilling Feature Correspondences
Mark Hamilton, Zhoutong Zhang, Bharath Hariharan, Noah Snavely, William T. Freeman
[ arXiv:2203.08414 ]

Abstract Unsupervised semantic segmentation aims to discover and localize semantically meaningful categories within image corpora without any form of annotation. To solve this task, algorithms must produce features for every pixel that are both semantically meaningful and compact enough to form distinct clusters. Unlike previous works which achieve this with a single end-to-end framework, we propose to separate feature learning from cluster compactification. Empirically, we show that current unsupervised feature learning frameworks already generate dense features whose correlations are semantically consistent. This observation motivates us to design STEGO (Self-supervised Transformer with Energy-based Graph Optimization), a novel framework that distills unsupervised features into high-quality discrete semantic labels. At the core of STEGO is a novel contrastive loss function that encourages features to form compact clusters while preserving their relationships across the corpora. STEGO yields a significant improvement over the prior state of the art, on both the CocoStuff (+14 mIoU) and Cityscapes (+9 mIoU) semantic segmentation challenges.

Biological error correction codes generate fault-tolerant neural networks
Alexander Zlokapa, Andrew K. Tan, John M. Martyn, Max Tegmark, Isaac L. Chuang
[ arXiv:2202.12887 ]

Abstract It has been an open question in deep learning if fault-tolerant computation is possible: can arbitrarily reliable computation be achieved using only unreliable neurons? In the mammalian cortex, analog error correction codes known as grid codes have been observed to protect states against neural spiking noise, but their role in information processing is unclear. Here, we use these biological codes to show that a universal fault-tolerant neural network can be achieved if the faultiness of each neuron lies below a sharp threshold, which we find coincides in order of magnitude with noise observed in biological neurons. The discovery of a sharp phase transition from faulty to fault-tolerant neural computation opens a path towards understanding noisy analog systems in artificial intelligence and neuroscience.

Cracking the Quantum Scaling Limit with Machine Learned Electron Densities
Joshua A. Rackers, Lucas Tecot, Mario Geiger, Tess E. Smidt
[ arXiv:2201.03726 ]

Abstract A long-standing goal of science is to accurately solve the Schrödinger equation for large molecular systems. The poor scaling of current quantum chemistry algorithms on classical computers imposes an effective limit of about a few dozen atoms for which we can calculate molecular electronic structure. We present a machine learning (ML) method to break through this scaling limit and make quantum chemistry calculations of very large systems possible. We show that Euclidean Neural Networks can be trained to predict the electron density with high fidelity from limited data. Learning the electron density allows us to train a machine learning model on small systems and make accurate predictions on large ones. We show that this ML electron density model can break through the quantum scaling limit and calculate the electron density of systems of thousands of atoms with quantum accuracy.

Invariance Through Latent Alignment
Takuma Yoneda, Ge Yang, Matthew R. Walter, Bradly Stadie
[ arXiv:2112.08526 ]

Abstract A robot's deployment environment often involves perceptual changes that differ from what it has experienced during training. Standard practices such as data augmentation attempt to bridge this gap by augmenting source images in an effort to extend the support of the training distribution to better cover what the agent might experience at test time. In many cases, however, it is impossible to know test-time distribution-shift a priori, making these schemes infeasible. In this paper, we introduce a general approach, called Invariance Through Latent Alignment (ILA), that improves the test-time performance of a visuomotor control policy in deployment environments with unknown perceptual variations. ILA performs unsupervised adaptation at deployment-time by matching the distribution of latent features on the target domain to the agent's prior experience, without relying on paired data. Although simple, we show that this idea leads to surprising improvements on a variety of challenging adaptation scenarios, including changes in lighting conditions, the content in the scene, and camera poses. We present results on calibrated control benchmarks in simulation -- the distractor control suite -- and a physical robot under a sim-to-real setup.

Equivariant Contrastive Learning
Rumen Dangovski, Li Jing, Charlotte Loh, Seungwook Han, Akash Srivastava, Brian Cheung, Pulkit Agrawal, Marin Soljačić
[ arXiv:2111.00899 ]

Abstract In state-of-the-art self-supervised learning (SSL), pre-training produces semantically good representations by encouraging them to be invariant under meaningful transformations prescribed from human knowledge. In fact, the property of invariance is a trivial instance of a broader class called equivariance, which can be intuitively understood as the property that representations transform according to the way the inputs transform. Here, we show that rather than using only invariance, pre-training that encourages non-trivial equivariance to some transformations, while maintaining invariance to other transformations, can be used to improve the semantic quality of representations. Specifically, we extend popular SSL methods to a more general framework which we name Equivariant Self-Supervised Learning (E-SSL). In E-SSL, a simple additional pre-training objective encourages equivariance by predicting the transformations applied to the input. We demonstrate E-SSL's effectiveness empirically on several popular computer vision benchmarks. Furthermore, we demonstrate usefulness of E-SSL for applications beyond computer vision; in particular, we show its utility on regression problems in photonics science. We will release our code.

Physics-Augmented Learning: A New Paradigm Beyond Physics-Informed Learning
Ziming Liu, Yunyue Chen, Yuanqi Du, Max Tegmark
[ arXiv:2109.13901 ]

Abstract Integrating physical inductive biases into machine learning can improve model generalizability. We generalize the successful paradigm of physics-informed learning (PIL) into a more general framework that also includes what we term physics-augmented learning (PAL). PIL and PAL complement each other by handling discriminative and generative properties, respectively. In numerical experiments, we show that PAL performs well on examples where PIL is inapplicable or inefficient.

What You Can Learn by Staring at a Blank Wall
Prafull Sharma, Miika Aittala, Yoav Y. Schechner, Antonio Torralba, Gregory W. Wornell, William T. Freeman, Fredo Durand
[ arXiv:2108.13027 ]

Abstract We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room. Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene. We use this signal to classify between zero, one, or two moving people, or the activity of a person in the hidden scene. We train two convolutional neural networks using data collected from 20 different scenes, and achieve an accuracy of ≈94% for both tasks in unseen test environments and real-time online settings. Unlike other passive non-line-of-sight methods, the technique does not rely on known occluders or controllable light sources, and generalizes to unknown rooms with no re-calibration. We analyze the generalization and robustness of our method with both real and synthetic data, and study the effect of the scene parameters on the signal quality.

Toward Automatic Interpretation of 3D Plots
Laura E. Brandt, William T. Freeman
[ arXiv:2106.07627 ]

Abstract This paper explores the challenge of teaching a machine how to reverse-engineer the grid-marked surfaces used to represent data in 3D surface plots of two-variable functions. These are common in scientific and economic publications; and humans can often interpret them with ease, quickly gleaning general shape and curvature information from the simple collection of curves. While machines have no such visual intuition, they do have the potential to accurately extract the more detailed quantitative data that guided the surface's construction. We approach this problem by synthesizing a new dataset of 3D grid-marked surfaces (SurfaceGrid) and training a deep neural net to estimate their shape. Our algorithm successfully recovers shape information from synthetic 3D surface plots that have had axes and shading information removed, been rendered with a variety of grid types, and viewed from a range of viewpoints.

Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering
Vincent Sitzmann, Semon Rezchikov, William T. Freeman, Joshua B. Tenenbaum, Fredo Durand
[ arXiv:2106.02634 ]

Abstract Inferring representations of 3D scenes from 2D observations is a fundamental problem of computer graphics, computer vision, and artificial intelligence. Emerging 3D-structured neural scene representations are a promising approach to 3D scene understanding. In this work, we propose a novel neural scene representation, Light Field Networks or LFNs, which represent both geometry and appearance of the underlying 3D scene in a 360-degree, four-dimensional light field parameterized via a neural implicit representation. Rendering a ray from an LFN requires only a *single* network evaluation, as opposed to hundreds of evaluations per ray for ray-marching or volumetric based renderers in 3D-structured neural scene representations. In the setting of simple scenes, we leverage meta-learning to learn a prior over LFNs that enables multi-view consistent light field reconstruction from as little as a single image observation. This results in dramatic reductions in time and memory complexity, and enables real-time rendering. The cost of storing a 360-degree light field via an LFN is two orders of magnitude lower than conventional methods such as the Lumigraph. Utilizing the analytical differentiability of neural implicit representations and a novel parameterization of light space, we further demonstrate the extraction of sparse depth maps from LFNs.

Why is AI hard and Physics simple?
Daniel A. Roberts
[ arXiv:2104.00008 ]

Abstract We discuss why AI is hard and why physics is simple. We discuss how physical intuition and the approach of theoretical physics can be brought to bear on the field of artificial intelligence and specifically machine learning. We suggest that the underlying project of machine learning and the underlying project of physics are strongly coupled through the principle of sparsity, and we call upon theoretical physicists to work on AI as physicists. As a first step in that direction, we discuss an upcoming book on the principles of deep learning theory that attempts to realize this approach.

Deep learning: a statistical viewpoint
Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin
[ arXiv:2103.09177 ]

Abstract The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.

On the Minimal Error of Empirical Risk Minimization
Gil Kur, Alexander Rakhlin
[ arXiv:2102.12066 ]

Abstract We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression, both in the random and the fixed design settings. Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data. In the fixed design setting, we show that the error is governed by the global complexity of the entire class. In contrast, in random design, ERM may only adapt to simpler models if the local neighborhoods around the regression function are nearly as complex as the class itself, a somewhat counter-intuitive conclusion. We provide sharp lower bounds for performance of ERM for both Donsker and non-Donsker classes. We also discuss our results through the lens of recent studies on interpolation in overparameterized models.

On the convergence of group-sparse autoencoders
Emmanouil Theodosis, Bahareh Tolooshams, Pranay Tankala, Abiy Tasissa, Demba Ba
[ arXiv:2102.07003 ]

Abstract Recent approaches in the theoretical analysis of model-based deep learning architectures have studied the convergence of gradient descent in shallow ReLU networks that arise from generative models whose hidden layers are sparse. Motivated by the success of architectures that impose structured forms of sparsity, we introduce and study a group-sparse autoencoder that accounts for a variety of generative models, and utilizes a group-sparse ReLU activation function to force the non-zero units at a given layer to occur in blocks. For clustering models, inputs that result in the same group of active units belong to the same cluster. We proceed to analyze the gradient dynamics of a shallow instance of the proposed autoencoder, trained with data adhering to a group-sparse generative model. In this setting, we theoretically prove the convergence of the network parameters to a neighborhood of the generating matrix. We validate our model through numerical analysis and highlight the superior performance of networks with a group-sparse ReLU compared to networks that utilize traditional ReLUs, both in sparse coding and in parameter recovery tasks. We also provide real data experiments to corroborate the simulated results, and emphasize the clustering capabilities of structured sparsity models.

Published

Growing Brains in Recurrent Neural Networks for Multiple Cognitive Tasks
Ziming Liu, Mikail Khona, Ila Fiete, Max Tegmark
NeurIPS 2023 Workshop NeurReps

Abstract Recurrent neural networks (RNNs) trained on a diverse ensemble of cognitive tasks, as described by Yang et al. (2019) and Khona et al. (2023), have been shown to exhibit functional modularity, where neurons organize into discrete functional clusters, each specialized for specific shared computational subtasks. However, these RNNs do not demonstrate anatomical modularity, where these functionally specialized clusters also have a distinct spatial organization. This contrasts with the human brain which has both functional and anatomical modularity. Is there a way to train RNNs to make them more like brains in this regard? We apply a recent machine learning method, brain-inspired modular training (BIMT), to encourage neural connectivity to be local in space. Consequently, hidden neuron organization of the RNN forms spatial structures reminiscent of those of the brain: spatial clusters which correspond to functional clusters. Compared to standard regularization and absence of regularization, BIMT exhibits superior performance by optimally balancing between task performance and sparsity. This balance is quantified both in terms of the number of active neurons and the cumulative wiring length. In addition to achieving brain-like organization in RNNs, our findings also suggest that BIMT holds promise for applications in neuromorphic computing and enhancing the interpretability of neural network architectures.

Mitigating Confirmation Bias in Semi-supervised Learning via Efficient Bayesian Model Averaging
Charlotte Loh, Rumen Dangovski, Shivchander Sudalairaj, Seungwook Han, Ligong Han, Leonid Karlinsky, Marin Soljacic, Akash Srivastava
Transactions on Machine Learning Research 2023, Submission number 1013 [ code ]

Abstract State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data, often via self-training or pseudo-labeling. During pseudo-labeling, the model's predictions on unlabeled data are used for training and may result in confirmation bias, where the model reinforces its own mistakes. In this work, we show that SOTA SSL methods often suffer from confirmation bias and demonstrate that this is often a result of using a poorly calibrated classifier for pseudo-labeling. We introduce BaM-SSL, an efficient Bayesian Model averaging technique that improves uncertainty quantification in SSL methods with limited computational or memory overhead. We demonstrate that BaM-SSL mitigates confirmation bias in SOTA SSL methods across the standard vision benchmarks CIFAR-10 and CIFAR-100, giving up to 16% improvement in test accuracy on the CIFAR-100 with 400 labels benchmark. Furthermore, we also demonstrate its effectiveness in additional realistic and challenging problems, such as class-imbalanced datasets and in photonics science.

Machine Learning for Quantum-Enhanced Gravitational-Wave Observatories
Chris Whittle, Ge Yang, Matthew Evans, Lisa Barsotti
Physical Review D, Volume 108, Article 043034 [ arXiv:2305.13780 ]

Abstract Machine learning has become an effective tool for processing the extensive data sets produced by large physics experiments. Gravitational-wave detectors are now listening to the universe with quantum-enhanced sensitivity, accomplished with the injection of squeezed vacuum states. Squeezed state preparation and injection is operationally complicated, as well as highly sensitive to environmental fluctuations and variations in the interferometer state. Achieving and maintaining optimal squeezing levels is a challenging problem and will require development of new techniques to reach the lofty targets set by design goals for future observing runs and next-generation detectors. We use machine learning techniques to predict the squeezing level during the third observing run of the Laser Interferometer Gravitational-Wave Observatory (LIGO) based on auxiliary data streams, and offer interpretations of our models to identify and quantify salient sources of squeezing degradation. The development of these techniques lays the groundwork for future efforts to optimize squeezed state injection in gravitational-wave detectors, with the goal of enabling closed-loop control of the squeezer subsystem by an agent based on machine learning.

Visual Dexterity: In-Hand Reorientation of Novel and Complex Object Shapes
Tao Chen, Megha Tippur, Siyang Wu, Vikash Kumar, Edward Adelson, Pulkit Agrawal
Science Robotics, 2023, Volume 8, Issue 84 [ arXiv:2211.11744 ]

Abstract In-hand object reorientation is necessary for performing many dexterous manipulation tasks, such as tool use in less structured environments that remain beyond the reach of current robots. Prior works built reorientation systems assuming one or many of the following: reorienting only specific objects with simple shapes, limited range of reorientation, slow or quasistatic manipulation, simulation-only results, the need for specialized and costly sensor suites, and other constraints which make the system infeasible for real-world deployment. We present a general object reorientation controller that does not make these assumptions. It uses readings from a single commodity depth camera to dynamically reorient complex and new object shapes by any rotation in real-time, with the median reorientation time being close to seven seconds. The controller is trained using reinforcement learning in simulation and evaluated in the real world on new object shapes not used for training, including the most challenging scenario of reorienting objects held in the air by a downward-facing hand that must counteract gravity during reorientation. Our hardware platform only uses open-source components that cost less than five thousand dollars. Although we demonstrate the ability to overcome assumptions in prior work, there is ample scope for improving absolute performance. For instance, the challenging duck-shaped object not used for training was dropped in 56 percent of the trials. When it was not dropped, our controller reoriented the object within 0.4 radians (23 degrees) 75 percent of the time.

Precision Machine Learning
Eric J. Michaud, Ziming Liu, Max Tegmark
Entropy, 2023, 25(1) [ arXiv:2210.13447 ]

Abstract We explore unique considerations involved in fitting ML models to data with very high precision, as is often required for science applications. We empirically compare various function approximation methods and study how they scale with increasing parameters and data. We find that neural networks can often outperform classical approximation methods on high-dimensional examples, by auto-discovering and exploiting modular structures therein. However, neural networks trained with common optimizers are less powerful for low-dimensional cases, which motivates us to study the unique properties of neural network loss landscapes and the corresponding optimization challenges that arise in the high precision regime. To address the optimization issue in low dimensions, we develop training tricks which enable us to train neural networks to extremely low loss, close to the limits allowed by numerical precision.

Michael Zhang, Samuel Kim, Peter Y. Lu, Marin Soljačić
Ryan Abbott, Michael S. Albergo, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Betsy Tian, Julian M. Urban
IEEE Journals, 2023, PubMed ID 37721885 [ arXiv:2207.08945 ]

Abstract Symbolic regression is a machine learning technique that can learn the governing formulas of data and thus has the potential to transform scientific discovery. However, symbolic regression is still limited in the complexity and dimensionality of the systems that it can analyze. Deep learning on the other hand has transformed machine learning in its ability to analyze extremely complex and high-dimensional datasets. We propose a neural network architecture to extend symbolic regression to parametric systems where some coefficient may vary but the structure of the underlying governing equation remains constant. We demonstrate our method on various analytic expressions, ODEs, and PDEs with varying coefficients and show that it extrapolates well outside of the training domain. The neural network-based architecture can also integrate with other deep learning architectures so that it can analyze high-dimensional data while being trained end-to-end. To this end we integrate our architecture with convolutional neural networks to analyze 1D images of varying spring systems.

Toward a more accurate 3D atlas of C. elegans neurons
Michael Skuhersky, Tailin Wu, Eviatar Yemini, Amin Nejatbakhsh, Edward Boyden, Max Tegmark
BMC Bioinformatics, Volume 23, Article 195

Abstract Determining cell identity in volumetric images of tagged neuronal nuclei is an ongoing challenge in contemporary neuroscience. Frequently, cell identity is determined by aligning and matching tags to an “atlas” of labeled neuronal positions and other identifying characteristics. Previous analyses of such C. elegans datasets have been hampered by the limited accuracy of such atlases, especially for neurons present in the ventral nerve cord, and also by time-consuming manual elements of the alignment process.

Stable Object Reorientation using Contact Plane Registration
Richard Li, Carlos Esteves, Ameesh Makadia, Pulkit Agrawal
International Conference on Robotics and Automation 2022

Abstract We present a system for accurately predicting stable orientations for diverse rigid objects. We propose to overcome the critical issue of modelling multimodality in the space of rotations by using a conditional generative model to accurately classify contact surfaces. Our system is capable of operating from noisy and partially-observed pointcloud observations captured by real world depth cameras. Our method substantially outperforms the current state-of-the-art systems on a simulated stacking task requiring highly accurate rotations, and demonstrates strong sim2real zero-shot transfer results across a variety of unseen objects on a real world reorientation task.

Pareto-optimal clustering with the primal deterministic information bottleneck
Andrew K. Tan, Max Tegmark, Isaac L. Chuang
Entropy, 2022, 24(6) [ arXiv:2204.02489 ]

Abstract At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the Deterministic Information Bottleneck (DIB) formulation of lossy compression, which can be interpreted as a clustering problem. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied dual counterpart. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to most other two-objective clustering problems. We study general properties of the Pareto frontier, and give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory inspired dataset, revealing interesting features of the frontier, and demonstrating how the structure of the frontier can be used for model selection with a focus on points previously hidden by the cloak of the convex hull.
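To make the notion of a two-objective Pareto frontier concrete, the sketch below filters a finite set of candidate representations down to the non-dominated ones. It is only the textbook definition applied to a list of points, not the paper's search algorithm; the `(size, fidelity)` tuple convention and the function name `pareto_frontier` are assumptions for illustration.

```python
def pareto_frontier(points):
    """Return the non-dominated points from a list of (size, fidelity) pairs.

    Smaller size and larger fidelity are both preferred. A point is
    dominated if another point is at least as good in both objectives
    and strictly better in one.
    """
    # Sort by size ascending; among equal sizes, best fidelity first.
    ordered = sorted(set(points), key=lambda p: (p[0], -p[1]))
    frontier = []
    best_fidelity = float("-inf")
    for size, fidelity in ordered:
        # A point survives only if it beats every smaller-or-equal-size
        # point seen so far on fidelity.
        if fidelity > best_fidelity:
            frontier.append((size, fidelity))
            best_fidelity = fidelity
    return frontier
```

Because the filter keeps every non-dominated point rather than only those on the convex hull, it retains the kind of interior frontier points that the abstract describes as previously hidden.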

AI Poincaré 2.0: Machine Learning Conservation Laws from Differential Equations
Ziming Liu, Varun Madhavan, Max Tegmark
Physical Review E, 2022, Volume 106, Article 045307 [ arXiv:2203.12610 ]

Abstract We present a machine learning algorithm that discovers conservation laws from differential equations, both numerically (parametrized as neural networks) and symbolically, ensuring their functional independence (a non-linear generalization of linear independence). Our independence module can be viewed as a nonlinear generalization of singular value decomposition. Our method can readily handle inductive biases for conservation laws. We validate it with examples including the 3-body problem, the KdV equation and nonlinear Schrödinger equation.

Categorical Representation Learning and RG flow operators for algorithmic classifiers
Artan Sheshmani, Yizhuang You, Wenbo Fu, Ahmadreza Azizi
Machine Learning Science and Technology, Volume 4, Number 1, Article 015012 [ arXiv:2203.07975 ]

Abstract Following the earlier formalism of the categorical representation learning (arXiv:2103.14770) by the first two authors, we discuss the construction of the RG-flow based categorifier. Borrowing ideas from the theory of renormalization group (RG) flows in quantum field theory, holographic duality, and hyperbolic geometry, and mixing them with neural ODEs, we construct a new algorithmic natural language processing (NLP) architecture, called the RG-flow categorifier or, for short, the RG categorifier, which is capable of data classification and generation in all layers. We apply our algorithmic platform to biomedical data sets and show its performance in the field of sequence-to-function mapping. In particular, we apply the RG categorifier to particular genomic sequences of flu viruses and show how our technology is capable of extracting information from given genomic sequences, finding their hidden symmetries and dominant features, classifying them, and using the trained data to make stochastic predictions of new plausible generated sequences associated with a new set of viruses which could avoid the human immune system. The content of the current article is part of the recent US patent application submitted by the first two authors (U.S. Patent Application No.: 63/313.504).

Topogivity: A Machine-Learned Chemical Rule for Discovering Topological Materials
Andrew Ma, Yang Zhang, Thomas Christensen, Hoi Chun Po, Li Jing, Liang Fu, Marin Soljačić
American Chemical Society Publications [ arXiv:2202.05255 ]

Abstract Topological materials present unconventional electronic properties that make them attractive for both basic science and next-generation technological applications. The majority of currently-known topological materials have been discovered using methods that involve symmetry-based analysis of the quantum wavefunction. Here we use machine learning to develop a simple-to-use heuristic chemical rule that diagnoses with a high accuracy whether a material is topological using only its chemical formula. This heuristic rule is based on a notion that we term topogivity, a machine-learned numerical value for each element that loosely captures its tendency to form topological materials. We next implement a high-throughput strategy for discovering topological materials based on the heuristic topogivity-rule prediction followed by ab initio validation. This way, we discover new topological materials that are not diagnosable using symmetry indicators, including several that may be promising for experimental observation.

Neural Descriptor Fields: SE(3) Equivariant Object Representations for Manipulation
Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B. Tenenbaum, Alberto Rodriguez, Pulkit Agrawal, Vincent Sitzmann
International Conference on Robotics and Automation 2022 [ arXiv:2112.05124 | code ]

Abstract We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target (such as a robot gripper or a rack used for hanging) via category-level descriptors. We employ this representation for object manipulation, where given a task demonstration, we want to repeat the same task on a new object instance from the same category. We propose to achieve this objective by searching (via optimization) for the pose whose descriptor matches that observed in the demonstration. NDFs are conveniently trained in a self-supervised fashion via a 3D auto-encoding task that does not rely on expert-labeled keypoints. Further, NDFs are SE(3)-equivariant, guaranteeing performance that generalizes across all possible 3D object translations and rotations. We demonstrate learning of manipulation tasks from few (5-10) demonstrations both in simulation and on a real robot. Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.

Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, Pratul P. Srinivasan
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition [ arXiv:2112.03907 ]

Abstract Neural Radiance Fields (NeRF) is a popular view synthesis technique that represents a scene as a continuous volumetric function, parameterized by multilayer perceptrons that provide the volume density and view-dependent emitted radiance at each location. While NeRF-based techniques excel at representing fine geometric structures with smoothly varying view-dependent appearance, they often fail to accurately capture and reproduce the appearance of glossy surfaces. We address this limitation by introducing Ref-NeRF, which replaces NeRF's parameterization of view-dependent outgoing radiance with a representation of reflected radiance and structures this function using a collection of spatially-varying scene properties. We show that together with a regularizer on normal vectors, our model significantly improves the realism and accuracy of specular reflections. Furthermore, we show that our model's internal representation of outgoing radiance is interpretable and useful for scene editing.

Mixture Model Auto-Encoders: Deep Clustering through Dictionary Learning
Alexander Lin, Andrew H. Song, Demba Ba
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 3368-3372 [ arXiv:2110.04683 ]

Abstract State-of-the-art approaches for clustering high-dimensional data utilize deep auto-encoder architectures. Many of these networks require a large number of parameters and suffer from a lack of interpretability, due to the black-box nature of the auto-encoders. We introduce Mixture Model Auto-Encoders (MixMate), a novel architecture that clusters data by performing inference on a generative model. Derived from the perspective of sparse dictionary learning and mixture models, MixMate comprises several auto-encoders, each tasked with reconstructing data in a distinct cluster, while enforcing sparsity in the latent space. Through experiments on various image datasets, we show that MixMate achieves competitive performance compared to state-of-the-art deep clustering algorithms, while using orders of magnitude fewer parameters.

Overcoming the Spectral Bias of Neural Value Approximation
Ge Yang, Anurag Ajay, Pulkit Agrawal
ICLR 2022 Conference Proceedings [ arXiv:2206.04672 ]

Abstract Value approximation using deep neural networks is at the heart of off-policy deep reinforcement learning, and is often the primary module that provides learning signals to the rest of the algorithm. While multi-layer perceptron networks are universal function approximators, recent works in neural kernel regression suggest the presence of a spectral bias, where fitting high-frequency components of the value function requires exponentially more gradient update steps than the low-frequency ones. In this work, we re-examine off-policy reinforcement learning through the lens of kernel regression and propose to overcome such bias via a composite neural tangent kernel. With just a single-line change, our approach, Fourier feature networks (FFN), produces state-of-the-art performance on challenging continuous control domains with only a fraction of the compute. Faster convergence and better off-policy stability also make it possible to remove the target network without suffering catastrophic divergences, which further reduces TD(0)'s estimation bias on a few tasks. Code and analysis available at https://geyang.github.io/ffn.
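A minimal sketch of a Fourier feature encoding, the general idea of passing inputs through sinusoids of random projections before the rest of the network, is given below. It uses fixed Gaussian random frequencies, which is an assumption made for illustration; the FFN in the paper may parameterize or initialize its first layer differently, and `make_fourier_features`, `scale`, and `seed` are hypothetical names.

```python
import math
import random


def make_fourier_features(input_dim, num_features, scale, seed=0):
    """Build an encoder x -> [sin(2*pi*Bx), cos(2*pi*Bx)].

    B is a (num_features x input_dim) matrix of Gaussian frequencies with
    standard deviation `scale` (larger scale admits higher frequencies).
    """
    rng = random.Random(seed)
    freqs = [
        [rng.gauss(0.0, scale) for _ in range(input_dim)]
        for _ in range(num_features)
    ]

    def encode(x):
        # Project the input onto each random frequency vector.
        proj = [
            2.0 * math.pi * sum(b * xi for b, xi in zip(row, x))
            for row in freqs
        ]
        return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]

    return encode
```

In this sketch the encoder would replace the raw state input to a value network, so the downstream layers see features that already contain high-frequency components.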

Machine-Learning media bias
Samantha D’Alonzo, Max Tegmark
PLOS ONE, 2022, Volume 17, Issue 8, Article e0271947 [ arXiv:2109.00024 ]

Abstract We present an automated method for measuring media bias. Inferring which newspaper published a given article, based only on the frequencies with which it uses different phrases, leads to a conditional probability distribution whose analysis lets us automatically map newspapers and phrases into a bias space. By analyzing roughly a million articles from roughly a hundred newspapers for bias in dozens of news topics, our method maps newspapers into a two-dimensional bias landscape that agrees well with previous bias classifications based on human judgement. One dimension can be interpreted as traditional left-right bias, the other as establishment bias. This means that although news bias is inherently political, its measurement need not be.

Discovering Sparse Interpretable Dynamics from Partial Observations
Peter Y. Lu, Joan Ariño, Marin Soljačić
Communications Physics, 2022, Vol 5, Article 206 [ arXiv:2107.10879 ]

Abstract Identifying the governing equations of a nonlinear dynamical system is key to both understanding the physical features of the system and constructing an accurate model of the dynamics that generalizes well beyond the available data. We propose a machine learning framework for discovering these governing equations using only partial observations, combining an encoder for state reconstruction with a sparse symbolic model. Our tests show that this method can successfully reconstruct the full system state and identify the underlying dynamics for a variety of ODE and PDE systems.

QuanTaichi: A Compiler for Quantized Simulations
Yuanming Hu, Jiafeng Liu, Xuanda Yang, Mingkuan Xu, Ye Kuang, Weiwei Xu, Qiang Dai, William Freeman, Fredo Durand
ACM Transactions on Graphics, Volume 4, Article 182

Abstract High-resolution simulations can deliver great visual quality, but they are often limited by available memory, especially on GPUs. We present a compiler for physical simulation that can achieve both high performance and significantly reduced memory costs, by enabling flexible and aggressive quantization. Low-precision ("quantized") numerical data types are used and packed to represent simulation states, leading to reduced memory space and bandwidth consumption. Quantized simulation allows higher resolution simulation with less memory, which is especially attractive on GPUs. Implementing a quantized simulator that has high performance and packs the data tightly for aggressive storage reduction would be extremely labor-intensive and error-prone using a traditional programming language. To make the creation of quantized simulation practical, we have developed a new set of language abstractions and a compilation system. A suite of tailored domain-specific optimizations ensures quantized simulators often run as fast as the full-precision simulators, despite the overhead of encoding-decoding the packed quantized data types. Our programming language and compiler, based on Taichi, allow developers to effortlessly switch between different full-precision and quantized simulators, to explore the full design space of quantization schemes, and ultimately to achieve a good balance between space and precision. The creation of quantized simulation with our system has large benefits in terms of memory consumption and performance, on a variety of hardware, from mobile devices to workstations with high-end GPUs. We can simulate with levels of resolution that were previously only achievable on systems with much more memory, such as multiple GPUs. For example, on a single GPU, we can simulate a Game of Life with 20 billion cells (8x compression per pixel), an Eulerian fluid system with 421 million active voxels (1.6x compression per voxel), and a hybrid Eulerian-Lagrangian elastic object simulation with 235 million particles (1.7x compression per particle). At the same time, quantized simulations create physically plausible results. Our quantization techniques are complementary to existing acceleration approaches of physical simulation: they can be used in combination with these existing approaches, such as sparse data structures, for even higher scalability and performance.
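The 8x figure for the Game of Life is what one would expect from storing each cell in one bit rather than one byte. The sketch below shows that packing by hand in plain Python; it is an illustration of the storage idea only and does not use or reproduce the QuanTaichi or Taichi API. The helper names `pack_cells` and `get_cell` are hypothetical.

```python
def pack_cells(cells):
    """Pack a sequence of 0/1 cell states into a bytearray, 8 cells per byte."""
    packed = bytearray((len(cells) + 7) // 8)
    for i, alive in enumerate(cells):
        if alive:
            # Cell i lives in byte i // 8 at bit position i % 8.
            packed[i // 8] |= 1 << (i % 8)
    return packed


def get_cell(packed, i):
    """Read back the state (0 or 1) of cell i from the packed storage."""
    return (packed[i // 8] >> (i % 8)) & 1
```

Writing the shifts and masks manually for every field is the kind of error-prone work the abstract says the compiler is meant to automate.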

Learning Task Informed Abstractions
Xiang Fu, Ge Yang, Pulkit Agrawal, Tommi Jaakkola
Proceedings of the 38th International Conference on Machine Learning, 2021, PMLR 139 [ arXiv:2106.15612 | code ]

Abstract Current model-based reinforcement learning methods struggle when operating from complex visual scenes due to their inability to prioritize task-relevant features. To mitigate this problem, we propose learning Task Informed Abstractions (TIA) that explicitly separate reward-correlated visual features from distractors. For learning TIA, we introduce the formalism of Task Informed MDP (TiMDP) that is realized by training two models that learn visual features via cooperative reconstruction, but one model is adversarially dissociated from the reward signal. Empirical evaluation shows that TIA leads to significant performance gains over state-of-the-art methods on many visual control tasks where natural and unconstrained visual distractions pose a formidable challenge.

The Principles of Deep Learning Theory
Daniel A. Roberts, Sho Yaida, Boris Hanin
Cambridge University Press (Book), 2022 [ arXiv:2106.10165 ]

Abstract This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.

Covariance-Free Sparse Bayesian Learning
Alexander Lin, Andrew H. Song, Berkin Bilgic, Demba Ba
IEEE Transactions on Signal Processing, volume 70 [ arXiv:2105.10439 ]

Abstract Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem while also providing uncertainty quantification. The most popular inference algorithms for SBL exhibit prohibitively large computational costs for high-dimensional problems due to the need to maintain a large covariance matrix. To resolve this issue, we introduce a new method for accelerating SBL inference -- named covariance-free expectation maximization (CoFEM) -- that avoids explicit computation of the covariance matrix. CoFEM solves multiple linear systems to obtain unbiased estimates of the posterior statistics needed by SBL. This is accomplished by exploiting innovations from numerical linear algebra such as preconditioned conjugate gradient and a little-known diagonal estimation rule. For a large class of compressed sensing matrices, we provide theoretical justifications for why our method scales well in high-dimensional settings. Through simulations, we show that CoFEM can be up to thousands of times faster than existing baselines without sacrificing coding accuracy. Through applications to calcium imaging deconvolution and multi-contrast MRI reconstruction, we show that CoFEM enables SBL to tractably tackle high-dimensional sparse coding problems of practical interest.
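One way to estimate the diagonal of a matrix without ever forming it is to probe it with random sign vectors and average the elementwise products, as sketched below. Whether this is exactly the diagonal estimation rule used in CoFEM is an assumption; the sketch also takes a generic `matvec` callable as a stand-in for whatever linear solve (such as conjugate gradient) would supply the matrix-vector products in practice. The names `estimate_diagonal` and `dense_matvec` are hypothetical.

```python
import random


def estimate_diagonal(matvec, dim, num_probes, seed=0):
    """Estimate diag(A) using only products A @ z with random +/-1 probes.

    For Rademacher probes, the expectation of z * (A @ z) taken
    elementwise equals diag(A), so averaging over probes is unbiased.
    """
    rng = random.Random(seed)
    totals = [0.0] * dim
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)
        for i in range(dim):
            totals[i] += z[i] * az[i]
    return [t / num_probes for t in totals]


def dense_matvec(matrix):
    """Wrap an explicit matrix as a matvec callable (for demonstration only)."""
    def matvec(v):
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return matvec
```

The estimate is exact for a diagonal matrix and otherwise carries noise from the off-diagonal entries that shrinks as the number of probes grows.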

Scalable and Flexible Deep Bayesian Optimization with Auxiliary Information for Scientific Problems
Samuel Kim, Peter Y. Lu, Charlotte Loh, Jamie Smith, Jasper Snoek, Marin Soljačić
Transactions on Machine Learning Research, September 2022 [ arXiv:2104.11667 ]

Abstract Bayesian optimization (BO) is a popular paradigm for global optimization of expensive black-box functions, but there are many domains where the function is not completely black-box. The data may have some known structure, e.g. symmetries, and the data generation process can yield useful intermediate or auxiliary information in addition to the value of the optimization objective. However, surrogate models traditionally employed in BO, such as Gaussian Processes (GPs), scale poorly with dataset size and struggle to incorporate known structure or auxiliary information. Instead, we propose performing BO on complex, structured problems by using Bayesian Neural Networks (BNNs), a class of scalable surrogate models that have the representation power and flexibility to handle structured data and exploit auxiliary information. We demonstrate BO on a number of realistic problems in physics and chemistry, including topology optimization of photonic crystal materials using convolutional neural networks, and chemical property optimization of molecules using graph neural networks. On these complex tasks, we show that BNNs often outperform GPs as surrogate models for BO in terms of both sampling efficiency and computational cost.

Field of Junctions: Extracting Boundary Structure at Low SNR
Dor Verbin, Todd Zickler
IEEE/CVF International Conference on Computer Vision, 2021 [ arXiv:2011.13866 ]

Abstract We introduce a bottom-up model for simultaneously finding many boundary elements in an image, including contours, corners and junctions. The model explains boundary shape in each small patch using a 'generalized M-junction' comprising M angles and a freely-moving vertex. Images are analyzed using non-convex optimization to cooperatively find M+2 junction values at every location, with spatial consistency being enforced by a novel regularizer that reduces curvature while preserving corners and junctions. The resulting 'field of junctions' is simultaneously a contour detector, corner/junction detector, and boundary-aware smoothing of regional appearance. Notably, its unified analysis of contours, corners, junctions and uniform regions allows it to succeed at high noise levels, where other methods for segmentation and boundary detection fail.

AI Feynman: a Physics-Inspired Method for Symbolic Regression
Silviu-Marian Udrescu, Max Tegmark
Science Advances, 2020, 6:eaay2631 [ arXiv:1905.11481 ]

Abstract A core challenge for both physics and artificial intelligence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in principle, functions of practical interest often exhibit symmetries, separability, compositionality and other simplifying properties. In this spirit, we develop a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques. We apply it to 100 equations from the Feynman Lectures on Physics, and it discovers all of them, while previous publicly available software cracks only 71; for a more difficult test set, we improve the state of the art success rate from 15% to 90%.