IAIFI Papers

View high energy physics IAIFI papers on INSPIRE

Neural Embedding: Learning the Embedding of the Manifold of Physics Data
Sang Eon Park, Philip Harris, Bryan Ostdiek
[ arXiv:2208.05484 ]

Abstract In this paper, we present a method of embedding physics data manifolds with metric structure into lower dimensional spaces with simpler metrics, such as Euclidean and Hyperbolic spaces. We then demonstrate that it can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedding approach learns the underlying latent structure. With the notion of volume in Euclidean spaces, we provide for the first time a viable solution to quantifying the true search capability of model agnostic search algorithms in collider physics (i.e. anomaly detection). Finally, we discuss how the ideas presented in this paper can be employed to solve many practical challenges that require the extraction of physically meaningful representations from information in complex high dimensional datasets.

Bounding generalization error with input compression: An empirical study with infinite-width networks
Angus Galloway, Anna Golubeva, Mahmoud Salem, Mihai Nica, Yani Ioannou, Graham W. Taylor
[ arXiv:2207.09408 ]

Abstract Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles to reduce a reliance on trial-and-error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only few training samples. These results are promising given that input compression is broadly applicable where MI can be estimated with confidence.

Strong Lensing Source Reconstruction Using Continuous Neural Fields
Siddharth Mishra-Sharma, Ge Yang
[ arXiv:2206.14820 ]

Abstract From the nature of dark matter to the rate of expansion of our Universe, observations of distant galaxies distorted through strong gravitational lensing have the potential to answer some of the major open questions in astrophysics. Modeling galaxy-galaxy strong lensing observations presents a number of challenges as the exact configuration of both the background source and foreground lens galaxy is unknown. A timely call, prompted by a number of upcoming surveys anticipating high-resolution lensing images, demands methods that can efficiently model lenses at their full complexity. In this work, we introduce a method that uses continuous neural fields to non-parametrically reconstruct the complex morphology of a source galaxy while simultaneously inferring a distribution over foreground lens galaxy configurations. We demonstrate the efficacy of our method through experiments on simulated data targeting high-resolution lensing images similar to those anticipated in near-future astrophysical surveys.

The Dark Energy Camera Plane Survey 2 (DECaPS2): More Sky, Less Bias, and Better Uncertainties
A. K. Saydjari, E. F. Schlafly, D. Lang, A. M. Meisner, G. M. Green, C. Zucker, I. Zelko, J. S. Speagle, T. Daylan, A. Lee, F. Valdes, D. Schlegel, D. P. Finkbeiner
[ arXiv:2206.11909 ]

Abstract Deep optical and near-infrared imaging of the entire Galactic plane is essential for understanding our Galaxy's stars, gas, and dust. The second data release of the DECam Plane Survey (DECaPS2) extends the five-band optical and near-infrared survey of the southern Galactic plane to cover 6.5% of the sky, |b| < 10° and 6° > l > -124°, complementary to coverage by Pan-STARRS1. Typical single-exposure effective depths, including crowding effects and other complications, are 23.5, 22.6, 22.1, 21.6, and 20.8 mag in g, r, i, z, and Y bands, respectively, with around 1 arcsecond seeing. The survey comprises 3.32 billion objects built from 34 billion detections in 21.4 thousand exposures, totaling 260 hours open shutter time on the Dark Energy Camera (DECam) at Cerro Tololo. The data reduction pipeline features several improvements, including the addition of synthetic source injection tests to validate photometric solutions across the entire survey footprint. A convenient functional form for the detection bias in the faint limit was derived and leveraged to characterize the photometric pipeline performance. A new post-processing technique was applied to every detection to de-bias and improve uncertainty estimates of the flux in the presence of structured backgrounds, specifically targeting nebulosity. The images and source catalogs are publicly available at http://decaps.skymaps.info/

Simplifying Polylogarithms with Machine Learning
Aurélien Dersy, Matthew D. Schwartz, Xiaoyuan Zhang
[ arXiv:2206.04115 ]

Abstract Polylogarithmic functions, such as the logarithm or dilogarithm, satisfy a number of algebraic identities. For the logarithm, all the identities follow from the product rule. For the dilogarithm and higher-weight classical polylogarithms, the identities can involve five functions or more. In many calculations relevant to particle physics, complicated combinations of polylogarithms often arise from Feynman integrals. Although the initial expressions resulting from the integration usually simplify, it is often difficult to know which identities to apply and in what order. To address this bottleneck, we explore to what extent machine learning methods can help. We consider both a reinforcement learning approach, where the identities are analogous to moves in a game, and a transformer network approach, where the problem is viewed analogously to a language-translation task. While both methods are effective, the transformer network appears more powerful and holds promise for practical use in symbolic manipulation tasks in mathematical physics.

Stable Object Reorientation using Contact Plane Registration
Richard Li, Carlos Esteves, Ameesh Makadia, Pulkit Agrawal
International Conference on Robotics and Automation 2022

Abstract We present a system for accurately predicting stable orientations for diverse rigid objects. We propose to overcome the critical issue of modelling multimodality in the space of rotations by using a conditional generative model to accurately classify contact surfaces. Our system is capable of operating from noisy and partially-observed pointcloud observations captured by real world depth cameras. Our method substantially outperforms the current state-of-the-art systems on a simulated stacking task requiring highly accurate rotations, and demonstrates strong sim2real zero-shot transfer results across a variety of unseen objects on a real world reorientation task.

Revealing the Milky Way’s Most Recent Major Merger with a Gaia EDR3 Catalog of Machine-Learned Line-of-Sight Velocities
Adriana Dropulic, Hongwan Liu, Bryan Ostdiek, Mariangela Lisanti
[ arXiv:2205.12278 ]

Abstract Machine learning can play a powerful role in inferring missing line-of-sight velocities from astrometry in surveys such as Gaia. In this paper, we apply a neural network to Gaia Early Data Release 3 (EDR3) and obtain line-of-sight velocities and associated uncertainties for ~92 million stars. The network, which takes as input a star's parallax, angular coordinates, and proper motions, is trained and validated on ~6.4 million stars in Gaia with complete phase-space information. The network's uncertainty on its velocity prediction is a key aspect of its design; by properly convolving these uncertainties with the inferred velocities, we obtain accurate stellar kinematic distributions. As a first science application, we use the new network-completed catalog to identify candidate stars that belong to the Milky Way's most recent major merger, Gaia-Sausage-Enceladus (GSE). We present the kinematic, energy, angular momentum, and spatial distributions of the ~450,000 GSE candidates in this sample, and also study the chemical abundances of those with cross matches to GALAH and APOGEE. The network's predictive power will only continue to improve with future Gaia data releases as the training set of stars with complete phase-space information grows. This work provides a first demonstration of how to use machine learning to exploit high-dimensional correlations on data to infer line-of-sight velocities, and offers a template for how to train, validate and apply such a neural network when complete observational data is not available.

Towards Understanding Grokking: An Effective Theory of Representation Learning
Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark, Mike Williams
[ arXiv:2205.10343 ]

Abstract We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations whose training dynamics and dependence on training set size can be predicted by our effective theory in a toy setting. We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion. We find representation learning to occur only in a 'Goldilocks zone' (including comprehension and grokking) between memorization and confusion. Compared to the comprehension phase, the grokking phase stays closer to the memorization phase, leading to delayed generalization. The Goldilocks phase is reminiscent of 'intelligence from starvation' in Darwinian evolution, where resource limitations drive discovery of more efficient solutions. This study not only provides intuitive explanations of the origin of grokking, but also highlights the usefulness of physics-inspired tools, e.g., effective theories and phase diagrams, for understanding deep learning.

Power Counting Energy Flow Polynomials
Pedro Cal, Jesse Thaler, Wouter J. Waalewijn
[ arXiv:2205.06818 ]

Abstract Power counting is a systematic strategy for organizing collider observables and their associated theoretical calculations. In this paper, we use power counting to characterize a class of jet substructure observables called energy flow polynomials (EFPs). EFPs provide an overcomplete linear basis for infrared-and-collinear safe jet observables, but it is known that in practice, a small subset of EFPs is often sufficient for specific jet analysis tasks. By applying power counting arguments, we obtain linear relationships between EFPs that hold for quark and gluon jets to a specific order in the power counting. We test these relations in the parton shower generator Pythia, finding excellent agreement. Power counting allows us to truncate the basis of EFPs without affecting performance, which we corroborate through a study of quark-gluon tagging and regression.

Bias and Priors in Machine Learning Calibrations for High Energy Physics
Rikab Gambhir, Benjamin Nachman, Jesse Thaler
[ arXiv:2205.05084 ]

Abstract Machine learning offers an exciting opportunity to improve the calibration of nearly all reconstructed objects in high-energy physics detectors. However, machine learning approaches often depend on the spectra of examples used during training, an issue known as prior dependence. This is an undesirable property of a calibration, which needs to be applicable in a variety of environments. The purpose of this paper is to explicitly highlight the prior dependence of some machine learning-based calibration strategies. We demonstrate how some recent proposals for both simulation-based and data-based calibrations inherit properties of the sample used for training, which can result in biases for downstream analyses. In the case of simulation-based calibration, we argue that our recently proposed Gaussian Ansatz approach can avoid some of the pitfalls of prior dependence, whereas prior-independent data-based calibration remains an open problem.

Disentangling Quarks and Gluons with CMS Open Data
Patrick T. Komiske, Serhii Kryhin, Jesse Thaler
[ arXiv:2205.04459 ]

Abstract We study quark and gluon jets separately using public collider data from the CMS experiment. Our analysis is based on 2.3/fb of proton-proton collisions at 7 TeV, collected at the Large Hadron Collider in 2011. We define two non-overlapping samples via a pseudorapidity cut -- central jets with |eta| < 0.65 and forward jets with |eta| > 0.65 -- and employ jet topic modeling to extract individual distributions for the maximally separable categories. Under certain assumptions, such as sample independence and mutual irreducibility, these categories correspond to "quark" and "gluon" jets, as given by a recently proposed operational definition. We consider a number of different methods for extracting reducibility factors from the central and forward datasets, from which the fractions of quark jets in each sample can be determined. The greatest stability and robustness to statistical uncertainties is achieved by a novel method based on parametrizing the endpoints of a receiver operating characteristic (ROC) curve. To mitigate detector effects, which would otherwise induce unphysical differences between central and forward jets, we use the OmniFold method to perform central value unfolding. As a demonstration of the power of this method, we extract the intrinsic dimensionality of the quark and gluon jet samples, which exhibit Casimir scaling, as expected from the strongly-ordered limit. To our knowledge, this work is the first application of full phase space unfolding to real collider data, and one of the first applications of topic modeling to extract separate quark and gluon distributions at the LHC.

Learning Uncertainties the Frequentist Way: Calibration and Correlation in High Energy Physics
Rikab Gambhir, Benjamin Nachman, Jesse Thaler
[ arXiv:2205.03413 ]

Abstract Calibration is a common experimental physics problem, whose goal is to infer the value and uncertainty of an unobservable quantity Z given a measured quantity X. Additionally, one would like to quantify the extent to which X and Z are correlated. In this paper, we present a machine learning framework for performing frequentist maximum likelihood inference with Gaussian uncertainty estimation, which also quantifies the mutual information between the unobservable and measured quantities. This framework uses the Donsker-Varadhan representation of the Kullback-Leibler divergence -- parametrized with a novel Gaussian Ansatz -- to enable a simultaneous extraction of the maximum likelihood values, uncertainties, and mutual information in a single training. We demonstrate our framework by extracting jet energy corrections and resolution factors from a simulation of the CMS detector at the Large Hadron Collider. By leveraging the high-dimensional feature space inside jets, we improve upon the nominal CMS jet resolution by upwards of 15%.

Rapid Locomotion via Reinforcement Learning
Gabriel B. Margolis, Ge Yang, Kartik Paigwar, Tao Chen, Pulkit Agrawal
[ arXiv:2205.02824 ]

Abstract Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer leveraged from prior work. Videos of the robot’s behaviors are available at https://agility.csail.mit.edu/.

Going Beyond the Galaxy Power Spectrum: an Analysis of BOSS Data with Wavelet Scattering Transforms
Georgios Valogiannis, Cora Dvorkin
[ arXiv:2204.13717 ]

Abstract We perform the first application of the wavelet scattering transform (WST) on actual galaxy observations, through a WST analysis of the BOSS DR12 CMASS dataset. We lay out the detailed procedure on how to capture all necessary layers of realism for an application on data obtained from a spectroscopic survey, including the effects of redshift-space anisotropy, non-trivial survey geometry, the shortcomings of the dataset through a set of systematic weights and the Alcock-Paczynski distortion effect. In order to capture the cosmological dependence of the WST, we use galaxy mocks obtained from the state-of-the-art AbacusSummit simulations, tuned to match the anisotropic correlation function of the BOSS CMASS sample in the redshift range 0.46 < z < 0.60. Using our theory model for the WST coefficients, as well as for the first 2 multipoles of the galaxy power spectrum, which we use as reference, we perform a likelihood analysis of the CMASS data and obtain the posterior probability distributions of 4 cosmological parameters, {ω_b, ω_c, n_s, σ_8}, as well as the Hubble constant, derived from a fixed value of the angular size of the sound horizon at last scattering measured by the Planck satellite, all of which are marginalized over the 7 nuisance parameters of the Halo Occupation Distribution model. The WST is found to deliver a substantial improvement in the values of the predicted 1σ errors compared to the regular power spectrum, which are tighter by a factor in the range 3−6 in the case of flat and uninformative priors and by a factor of 4−28 when a Big Bang Nucleosynthesis prior is applied on the value of ω_b. Furthermore, in the latter case, we obtain a 0.6% measurement of the Hubble constant. Our results are investigative and subject to certain approximations in our analysis, which we discuss in the text.

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings
Yung-Sung Chuang, Rumen Dangovski, Hongyin Luo, Yang Zhang, Shiyu Chang, Marin Soljačić, Shang-Wen Li, Wen-tau Yih, Yoon Kim, James Glass
[ arXiv:2204.10298 ]

Abstract We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings. DiffCSE learns sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence, where the edited sentence is obtained by stochastically masking out the original sentence and then sampling from a masked language model. We show that DiffCSE is an instance of equivariant contrastive learning (Dangovski et al., 2021), which generalizes contrastive learning and learns representations that are insensitive to certain types of augmentations and sensitive to other "harmful" types of augmentations. Our experiments show that DiffCSE achieves state-of-the-art results among unsupervised sentence representation learning methods, outperforming unsupervised SimCSE by 2.3 absolute points on semantic textual similarity tasks.

Photometrically-Classified Superluminous Supernovae from the Pan-STARRS1 Medium Deep Survey: A Case Study for Science with Machine Learning-Based Classification
Brian Hsu, Griffin Hosseinzadeh, V. Ashley Villar, Edo Berger
[ arXiv:2204.09809 ]

Abstract With the upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), it is expected that only ∼0.1% of all transients will be classified spectroscopically. To conduct studies of rare transients, such as Type I superluminous supernovae (SLSNe), we must instead rely on photometric classification. In this vein, here we carry out a pilot study of SLSNe from the Pan-STARRS1 Medium-Deep Survey (PS1-MDS) classified photometrically with our SuperRAENN and Superphot algorithms. We first construct a sub-sample of the photometric sample using a list of simple selection metrics designed to minimize contamination and ensure sufficient data quality for modeling. We then fit the multi-band light curves with a magnetar spin-down model using the Modular Open-Source Fitter for Transients (MOSFiT). Comparing the magnetar engine and ejecta parameter distributions of the photometric sample to those of the PS1-MDS spectroscopic sample and a larger literature spectroscopic sample, we find that these samples are overall consistent, but that the photometric sample extends to slower spins and lower ejecta masses, which correspond to lower luminosity events, as expected for photometric selection. While our PS1-MDS photometric sample is still smaller than the overall SLSN spectroscopic sample, our methodology paves the way to an orders-of-magnitude increase in the SLSN sample in the LSST era through photometric selection and study.

Luminous Supernovae: Unveiling a Population Between Superluminous and Normal Core-collapse Supernovae
Sebastian Gomez, Edo Berger, Matt Nicholl, Peter K. Blanchard, Griffin Hosseinzadeh
[ arXiv:2204.08486 ]

Abstract Stripped-envelope core-collapse supernovae can be divided into two broad classes: the common Type Ib/c supernovae (SNe Ib/c), powered by the radioactive decay of 56Ni, and the rare superluminous supernovae (SLSNe), most likely powered by the spin-down of a magnetar central engine. Up to now, the intermediate regime between these two populations has remained mostly unexplored. Here, we present a comprehensive study of 40 luminous supernovae (LSNe), SNe with peak magnitudes of M_r = −19 to −20 mag, bound by SLSNe on the bright end and by SNe Ib/c on the dim end. Spectroscopically, LSNe appear to form a continuum between Type Ic SNe and SLSNe. Given their intermediate nature, we model the light curves of all LSNe using a combined magnetar plus radioactive decay model and find that they are indeed intermediate, not only in terms of their peak luminosity and spectra, but also in their rise times, power sources, and physical parameters. We sub-classify LSNe into distinct groups that are either as fast-evolving as SNe Ib/c or as slow-evolving as SLSNe, and appear to be either radioactively or magnetar powered, respectively. Our findings indicate that LSNe are powered by either an over-abundant production of 56Ni or by weak magnetar engines, and may serve as the missing link between the two populations.

Pareto-optimal clustering with the primal deterministic information bottleneck
Andrew K. Tan, Max Tegmark, Isaac L. Chuang
Entropy, 2022, 24(6) [ arXiv:2204.02489 ]

Abstract At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the Deterministic Information Bottleneck (DIB) formulation of lossy compression, which can be interpreted as a clustering problem. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied dual counterpart. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to most other two-objective clustering problems. We study general properties of the Pareto frontier, and give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory inspired dataset, revealing interesting features of the frontier, and demonstrating how the structure of the frontier can be used for model selection with a focus on points previously hidden by the cloak of the convex hull.

AI Poincaré 2.0: Machine Learning Conservation Laws from Differential Equations
Ziming Liu, Varun Madhavan, Max Tegmark
[ arXiv:2203.12610 ]

Abstract We present a machine learning algorithm that discovers conservation laws from differential equations, both numerically (parametrized as neural networks) and symbolically, ensuring their functional independence (a non-linear generalization of linear independence). Our independence module can be viewed as a nonlinear generalization of singular value decomposition. Our method can readily handle inductive biases for conservation laws. We validate it with examples including the 3-body problem, the KdV equation and nonlinear Schrödinger equation.

Unsupervised Semantic Segmentation by Distilling Feature Correspondences
Mark Hamilton, Zhoutong Zhang, Bharath Hariharan, Noah Snavely, William T. Freeman
[ arXiv:2203.08414 ]

Abstract Unsupervised semantic segmentation aims to discover and localize semantically meaningful categories within image corpora without any form of annotation. To solve this task, algorithms must produce features for every pixel that are both semantically meaningful and compact enough to form distinct clusters. Unlike previous works which achieve this with a single end-to-end framework, we propose to separate feature learning from cluster compactification. Empirically, we show that current unsupervised feature learning frameworks already generate dense features whose correlations are semantically consistent. This observation motivates us to design STEGO (Self-supervised Transformer with Energy-based Graph Optimization), a novel framework that distills unsupervised features into high-quality discrete semantic labels. At the core of STEGO is a novel contrastive loss function that encourages features to form compact clusters while preserving their relationships across the corpora. STEGO yields a significant improvement over the prior state of the art, on both the CocoStuff (+14 mIoU) and Cityscapes (+9 mIoU) semantic segmentation challenges.

Categorical Representation Learning and RG flow operators for algorithmic classifiers
Artan Sheshmani, Yizhuang You, Wenbo Fu, Ahmadreza Azizi
[ arXiv:2203.07975 ]

Abstract Following the earlier formalism of categorical representation learning (arXiv:2103.14770) by the first two authors, we discuss the construction of the RG-flow based categorifier. Borrowing ideas from the theory of renormalization group (RG) flows in quantum field theory, holographic duality, and hyperbolic geometry, and mixing them with neural ODEs, we construct a new algorithmic natural language processing (NLP) architecture, called the RG-flow categorifier or, for short, the RG categorifier, which is capable of data classification and generation in all layers. We apply our algorithmic platform to biomedical data sets and show its performance in the field of sequence-to-function mapping. In particular, we apply the RG categorifier to particular genomic sequences of flu viruses and show how our technology is capable of extracting the information from given genomic sequences, finding their hidden symmetries and dominant features, classifying them, and using the trained data to make stochastic predictions of new plausible generated sequences associated with a new set of viruses which could avoid the human immune system. The content of the current article is part of the recent US patent application submitted by the first two authors (U.S. Patent Application No.: 63/313.504).

Creating Simple, Interpretable Anomaly Detectors for New Physics in Jet Substructure
Layne Bradshaw, Spencer Chang, Bryan Ostdiek
[ arXiv:2203.01343 ]

Abstract Anomaly detection with convolutional autoencoders is a popular method to search for new physics in a model-agnostic manner. These techniques are powerful, but they are still a "black box," since we do not know what high-level physical observables determine how anomalous an event is. To address this, we adapt a recently proposed technique by Faucett et al., which maps out the physical observables learned by a neural network classifier, to the case of anomaly detection. We propose two different strategies that use a small number of high-level observables to mimic the decisions made by the autoencoder on background events. Despite the underlying differences in their approach, we find that both strategies have similar ordering performance as the autoencoder and independently use the same five high-level observables. From there, we compare the performance of these networks as anomaly detectors. We find that both strategies perform similarly to the autoencoder across a variety of signals, giving a nontrivial demonstration that learning to order background events transfers to ordering a variety of signal events.

Biological error correction codes generate fault-tolerant neural networks
Alexander Zlokapa, Andrew K. Tan, John M. Martyn, Max Tegmark, Isaac L. Chuang
[ arXiv:2202.12887 ]

Abstract It has been an open question in deep learning if fault-tolerant computation is possible: can arbitrarily reliable computation be achieved using only unreliable neurons? In the mammalian cortex, analog error correction codes known as grid codes have been observed to protect states against neural spiking noise, but their role in information processing is unclear. Here, we use these biological codes to show that a universal fault-tolerant neural network can be achieved if the faultiness of each neuron lies below a sharp threshold, which we find coincides in order of magnitude with noise observed in biological neurons. The discovery of a sharp phase transition from faulty to fault-tolerant neural computation opens a path towards understanding noisy analog systems in artificial intelligence and neuroscience.

Flow-based sampling in the lattice Schwinger model at criticality
Michael S. Albergo, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Julian M. Urban
[ arXiv:2202.11712 ]

Abstract Recent results suggest that flow-based algorithms may provide efficient sampling of field distributions for lattice field theory applications, such as studies of quantum chromodynamics and the Schwinger model. In this work, we provide a numerical demonstration of robust flow-based sampling in the Schwinger model at the critical value of the fermion mass. In contrast, at the same parameters, conventional methods fail to sample all parts of configuration space, leading to severely underestimated uncertainties.

Topogivity: A Machine-Learned Chemical Rule for Discovering Topological Materials
Andrew Ma, Yang Zhang, Thomas Christensen, Hoi Chun Po, Li Jing, Liang Fu, Marin Soljačić
[ arXiv:2202.05255 ]

Abstract Topological materials present unconventional electronic properties that make them attractive for both basic science and next-generation technological applications. The majority of currently-known topological materials have been discovered using methods that involve symmetry-based analysis of the quantum wavefunction. Here we use machine learning to develop a simple-to-use heuristic chemical rule that diagnoses with a high accuracy whether a material is topological using only its chemical formula. This heuristic rule is based on a notion that we term topogivity, a machine-learned numerical value for each element that loosely captures its tendency to form topological materials. We next implement a high-throughput strategy for discovering topological materials based on the heuristic topogivity-rule prediction followed by ab initio validation. This way, we discover new topological materials that are not diagnosable using symmetry indicators, including several that may be promising for experimental observation.

Finite-Volume Pionless Effective Field Theory for Few-Nucleon Systems with Differentiable Programming
Xiangkai Sun, William Detmold, Di Luo, Phiala E. Shanahan
[ arXiv:2202.03530 ]

Abstract Finite-volume pionless effective field theory provides an efficient framework for the extrapolation of nuclear spectra and matrix elements calculated at finite volume in lattice QCD to infinite volume, and to nuclei with larger atomic number. In this work, it is demonstrated how this framework may be implemented via a set of correlated Gaussian wavefunctions optimised using differentiable programming and via solution of a generalised eigenvalue problem. This approach is shown to be significantly more efficient than a stochastic implementation of the variational method based on the same form of correlated Gaussian wavefunctions, yielding comparably accurate representations of the ground-state wavefunctions with an order of magnitude fewer terms. The efficiency of representation allows such calculations to be extended to larger systems than in previous work. The method is demonstrated through calculations of the binding energies of nuclei with atomic number A∈{2,3,4} in finite volume, matched to lattice QCD calculations at quark masses corresponding to mπ=806 MeV, and infinite-volume effective field theory calculations of A∈{2,3,4,5,6} systems based on this matching.

Constraining the Time of Gravitational Wave Emission from Core-Collapse Supernovae
Kiranjyot Gill, Griffin Hosseinzadeh, Edo Berger, Michele Zanolin, Marek Szczepanczyk
The Astrophysical Journal, 2022, Volume 931, Number 2 [ arXiv:2201.03609 ]

Abstract The advent of sensitive gravitational wave (GW) detectors, coupled with wide-field, high-cadence optical time-domain surveys, raises the possibility of the first joint GW-electromagnetic (EM) detections of core-collapse supernovae (CCSNe). For targeted searches of GWs from CCSNe, optical observations can be used to increase the sensitivity of the search by restricting the relevant time interval, defined here as the GW search window (GSW). The extent of the GSW is a critical factor in determining the achievable false alarm probability (FAP) for a triggered CCSN search. The ability to constrain the GSW from optical observations depends on how early a CCSN is detected, as well as the ability to model the early optical emission. Here we present several approaches to constrain the GSW, ranging in complexity from model-independent analytical fits of the early light curve, model-dependent fits of the rising or entire light curve, and a new data-driven approach using existing well-sampled CCSN light curves from Kepler and the Transiting Exoplanet Survey Satellite (TESS). We use these approaches to determine the time of core-collapse and its associated uncertainty (i.e., the GSW). We apply our methods to two Type II SNe that occurred during LIGO/Virgo Observing Run 3: SN 2019fcn and SN 2019ejj (both in the same galaxy at d = 15.7 Mpc). Our approach shortens the duration of the GSW and improves the robustness of the GSW compared to techniques used in past GW CCSN searches.

Analyzing N-Point Energy Correlators Inside Jets with CMS Open Data
Patrick T. Komiske, Ian Moult, Jesse Thaler, Hua Xing Zhu
[ arXiv:2201.07800 | code ]

Abstract Jets of hadrons produced at high-energy colliders provide experimental access to the dynamics of asymptotically free quarks and gluons and their confinement into hadrons. In this paper, we show that the high energies of the Large Hadron Collider (LHC), together with the exceptional resolution of its detectors, allow multipoint correlation functions of energy flow operators to be directly measured within jets for the first time. Using Open Data from the CMS experiment, we show that reformulating jet substructure in terms of these correlators provides new ways of probing the dynamics of QCD jets, which enables direct imaging of the confining transition to free hadrons as well as precision measurements of the scaling properties and interactions of quarks and gluons. This opens a new era in our understanding of jet substructure and illustrates the immense unexploited potential of high-quality LHC data sets for elucidating the dynamics of QCD.

Photometry on Structured Backgrounds: Local Pixelwise Infilling by Regression
Andrew K. Saydjari, Douglas P. Finkbeiner
[ arXiv:2201.07246 ]

Abstract Photometric pipelines struggle to estimate both the flux and flux uncertainty for stars in the presence of structured backgrounds such as filaments or clouds. However, it is exactly stars in these complex regions that are critical to understanding star formation and the structure of the interstellar medium. We develop a method, similar to Gaussian process regression, which we term local pixelwise infilling (LPI). Using a local covariance estimate, we predict the background behind each star and the uncertainty on that prediction in order to improve estimates of flux and flux uncertainty. We show the validity of our model on synthetic data and real dust fields. We further demonstrate that the method is stable even in the crowded field limit. While we focus on optical-IR photometry, this method is not restricted to those wavelengths. We apply this technique to the 34 billion detections in the second data release of the Dark Energy Camera Plane Survey (DECaPS2). In addition to removing many >3σ outliers and improving uncertainty estimates by a factor of ∼2−3 on nebulous fields, we also show that our method is well-behaved on uncrowded fields. The entirely post-processing nature of our implementation of LPI photometry allows it to easily improve the flux and flux uncertainty estimates of past as well as future surveys.

Cracking the Quantum Scaling Limit with Machine Learned Electron Densities
Joshua A. Rackers, Lucas Tecot, Mario Geiger, Tess E. Smidt
[ arXiv:2201.03726 ]

Abstract A long-standing goal of science is to accurately solve the Schrödinger equation for large molecular systems. The poor scaling of current quantum chemistry algorithms on classical computers imposes an effective limit of about a few dozen atoms for which we can calculate molecular electronic structure. We present a machine learning (ML) method to break through this scaling limit and make quantum chemistry calculations of very large systems possible. We show that Euclidean Neural Networks can be trained to predict the electron density with high fidelity from limited data. Learning the electron density allows us to train a machine learning model on small systems and make accurate predictions on large ones. We show that this ML electron density model can break through the quantum scaling limit and calculate the electron density of systems of thousands of atoms with quantum accuracy.

Impact of Massive Binary Star and Cosmic Evolution on Gravitational Wave Observations II: Double Compact Object Rates and Properties
Floor S. Broekgaarden, Edo Berger, Simon Stevenson, Stephen Justham, Ilya Mandel, Martyna Chruślińska, Lieke A. C. van Son, Tom Wagg, Alejandro Vigna-Gómez, Selma E. de Mink, Debatri Chattopadhyay, Coenraad J. Neijssel
[ arXiv:2112.05763 ]

Abstract Making the most of the rapidly increasing population of gravitational-wave detections of black hole (BH) and neutron star (NS) mergers requires comparing observations with population synthesis predictions. In this work we investigate the combined impact from the key uncertainties in population synthesis modelling of the isolated binary evolution channel: the physical processes in massive binary-star evolution and the star formation history as a function of metallicity, Z, and redshift z, S(Z,z). Considering these uncertainties we create 560 different publicly available model realizations and calculate the rate and distribution characteristics of detectable BHBH, BHNS, and NSNS mergers. We find that our stellar evolution and S(Z,z) variations can impact the predicted intrinsic and detectable merger rates by factors of 10^2-10^4. We find that BHBH rates are dominantly impacted by S(Z,z) variations, NSNS rates by stellar evolution variations and BHNS rates by both. We then consider the combined impact from all uncertainties considered in this work on the detectable mass distribution shapes (chirp mass, individual masses and mass ratio). We find that the BHNS mass distributions are predominantly impacted by massive binary-star evolution changes. For BHBH and NSNS we find that both uncertainties are important. We also find that the shape of the delay time and birth metallicity distributions are typically dominated by the choice of S(Z,z) for BHBH, BHNS and NSNS. We identify several examples of robust features in the mass distributions predicted by all 560 models, such that we expect more than 95% of BHBH detections to contain a BH ≳8M⊙ and have mass ratios ≲4. Our work demonstrates that it is essential to consider a wide range of allowed models to study double compact object merger rates and properties.

SymmetryGAN: Symmetry Discovery with Deep Learning
Krish Desai, Benjamin Nachman, Jesse Thaler
Physical Review D, 2022, 105:096031 [ arXiv:2112.05722 ]

Abstract What are the symmetries of a dataset? Whereas the symmetries of an individual data element can be characterized by its invariance under various transformations, the symmetries of an ensemble of data elements are ambiguous due to Jacobian factors introduced while changing coordinates. In this paper, we provide a rigorous statistical definition of the symmetries of a dataset, which involves inertial reference densities, in analogy to inertial frames in classical mechanics. We then propose SymmetryGAN as a novel and powerful approach to automatically discover symmetries using a deep learning method based on generative adversarial networks (GANs). When applied to Gaussian examples, SymmetryGAN shows excellent empirical performance, in agreement with expectations from the analytic loss landscape. SymmetryGAN is then applied to simulated dijet events from the Large Hadron Collider (LHC) to demonstrate the potential utility of this method in high energy collider physics applications. Going beyond symmetry discovery, we consider procedures to infer the underlying symmetry group from empirical data.

Neural Descriptor Fields: SE(3) Equivariant Object Representations for Manipulation
Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B. Tenenbaum, Alberto Rodriguez, Pulkit Agrawal, Vincent Sitzmann
[ arXiv:2112.05124 | code ]

Abstract We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target (such as a robot gripper or a rack used for hanging) via category-level descriptors. We employ this representation for object manipulation, where given a task demonstration, we want to repeat the same task on a new object instance from the same category. We propose to achieve this objective by searching (via optimization) for the pose whose descriptor matches that observed in the demonstration. NDFs are conveniently trained in a self-supervised fashion via a 3D auto-encoding task that does not rely on expert-labeled keypoints. Further, NDFs are SE(3)-equivariant, guaranteeing performance that generalizes across all possible 3D object translations and rotations. We demonstrate learning of manipulation tasks from few (5-10) demonstrations both in simulation and on a real robot. Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.

Building Quantum Field Theories Out of Neurons
James Halverson
[ arXiv:2112.04527 ]

Abstract An approach to field theory is studied in which fields are comprised of N constituent random neurons. Gaussian theories arise in the infinite-N limit when neurons are independently distributed, via the Central Limit Theorem, while interactions arise due to finite-N effects or non-independently distributed neurons. Euclidean-invariant ensembles of neurons are engineered, with tunable two-point function, yielding families of Euclidean-invariant field theories. Some Gaussian, Euclidean invariant theories are reflection positive, which allows for analytic continuation to a Lorentz-invariant quantum field theory. Examples are presented that yield dual theories at infinite-N, but have different symmetries at finite-N. Landscapes of classical field configurations are determined by local maxima of parameter distributions. Predictions arise from mixed field-neuron correlators. Near-Gaussianity is exhibited at large-N, potentially explaining a feature of field theories in Nature.

Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, Pratul P. Srinivasan
[ arXiv:2112.03907 ]

Abstract Neural Radiance Fields (NeRF) is a popular view synthesis technique that represents a scene as a continuous volumetric function, parameterized by multilayer perceptrons that provide the volume density and view-dependent emitted radiance at each location. While NeRF-based techniques excel at representing fine geometric structures with smoothly varying view-dependent appearance, they often fail to accurately capture and reproduce the appearance of glossy surfaces. We address this limitation by introducing Ref-NeRF, which replaces NeRF's parameterization of view-dependent outgoing radiance with a representation of reflected radiance and structures this function using a collection of spatially-varying scene properties. We show that together with a regularizer on normal vectors, our model significantly improves the realism and accuracy of specular reflections. Furthermore, we show that our model's internal representation of outgoing radiance is interpretable and useful for scene editing.

Artificial Intelligence and Machine Learning in Nuclear Physics
Amber Boehnlein, Markus Diefenthaler, Cristiano Fanelli, Morten Hjorth-Jensen, Tanja Horn, Michelle P. Kuchera, Dean Lee, Witold Nazarewicz, Kostas Orginos, Peter Ostroumov, Long-Gang Pang, Alan Poon, Nobuo Sato, Malachi Schram, Alexander Scheinker, Michael S. Smith, Xin-Nian Wang, Veronique Ziegler
[ arXiv:2112.02309 ]

Abstract Advances in artificial intelligence/machine learning methods provide tools that have broad applicability in scientific research. These techniques are being applied across the diversity of nuclear physics research topics, leading to advances that will facilitate scientific discoveries and societal applications. This Review gives a snapshot of nuclear physics research which has been transformed by artificial intelligence and machine learning techniques.

Infinite Neural Network Quantum States
Di Luo, James Halverson
[ arXiv:2112.00723 ]

Abstract We study infinite limits of neural network quantum states (∞-NNQS), which exhibit representation power through ensemble statistics, and also tractable gradient descent dynamics. Ensemble averages of Renyi entropies are expressed in terms of neural network correlators, and architectures that exhibit volume-law entanglement are presented. A general framework is developed for studying the gradient descent dynamics of neural network quantum states (NNQS), using a quantum state neural tangent kernel (QS-NTK). For ∞-NNQS the training dynamics is simplified, since the QS-NTK becomes deterministic and constant. An analytic solution is derived for quantum state supervised learning, which allows an ∞-NNQS to recover any target wavefunction. Numerical experiments on finite and infinite NNQS in the transverse field Ising model and Fermi Hubbard model demonstrate excellent agreement with theory. ∞-NNQS opens up new opportunities for studying entanglement and training dynamics in other physics applications, such as in finding ground states.

Substructure Detection Reanalyzed: Dark Perturber shown to be a Line-of-Sight Halo
Atınç Çağan Şengül, Cora Dvorkin, Bryan Ostdiek, Arthur Tsang
[ arXiv:2112.00749 ]

Abstract Observations of structure at sub-galactic scales are crucial for probing the properties of dark matter, which is the dominant source of gravity in the universe. It will become increasingly important for future surveys to distinguish between line-of-sight halos and subhalos to avoid wrong inferences on the nature of dark matter. We reanalyze a sub-galactic structure (in lens JVAS B1938+666) that has been previously found using the gravitational imaging technique in galaxy-galaxy lensing systems. This structure has been assumed to be a satellite in the halo of the main lens galaxy. We fit the redshift of the perturber of the system as a free parameter, using the multi-plane thin-lens approximation, and find that the redshift of the perturber is z_int = 1.22^{+0.11}_{−0.11} (with a main lens redshift of z = 0.881). Our analysis indicates that this structure is more massive than the previous result by more than an order of magnitude. This constitutes the first dark perturber shown to be a line-of-sight halo with a gravitational lensing method.

Robust and Provably Monotonic Networks
Ouail Kitouni, Niklas Nolte, Mike Williams
[ arXiv:2112.00038 ]

Abstract The Lipschitz constant of the map between the input and output space represented by a neural network is a natural metric for assessing the robustness of the model. We present a new method to constrain the Lipschitz constant of dense deep learning models that can also be generalized to other architectures. The method relies on a simple weight normalization scheme during training that ensures the Lipschitz constant of every layer is below an upper limit specified by the analyst. A simple residual connection can then be used to make the model monotonic in any subset of its inputs, which is useful in scenarios where domain knowledge dictates such dependence. Examples can be found in algorithmic fairness requirements or, as presented here, in the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider. Our normalization is minimally constraining and allows the underlying architecture to maintain higher expressiveness compared to other techniques which aim to either control the Lipschitz constant of the model or ensure its monotonicity. We show how the algorithm was used to train a powerful, robust, and interpretable discriminator for heavy-flavor decays in the LHCb realtime data-processing system.
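
The two ingredients named in this abstract, per-layer weight normalization and a residual connection, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The choice of norm (maximum absolute row sum), the even split of the bound across layers, and all function names are assumptions made for illustration. If the network g has Lipschitz constant at most lam, each partial derivative of g is bounded by lam in magnitude, so adding lam times the sum of the selected inputs yields a function that is non-decreasing in those inputs.

```python
import numpy as np

def normalize_layer(W, max_norm):
    """Rescale W so its induced infinity-norm (max absolute row sum) is at most max_norm."""
    norm = np.abs(W).sum(axis=1).max()
    return W / max(1.0, norm / max_norm)

def lipschitz_mlp(x, weights, biases, lam):
    """ReLU MLP with scalar output whose Lipschitz constant is at most lam."""
    per_layer = lam ** (1.0 / len(weights))  # assumption: split the bound evenly
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = normalize_layer(W, per_layer) @ h + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU is 1-Lipschitz
    return h[0]

def monotonic_model(x, weights, biases, lam, monotone_idx):
    """Add a linear residual term so the output is non-decreasing in x[monotone_idx]."""
    return lipschitz_mlp(x, weights, biases, lam) + lam * x[monotone_idx].sum()

# Hypothetical example: 3 inputs, output required to be monotonic in input 0.
rng = np.random.default_rng(0)
sizes = [3, 16, 16, 1]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]
x = rng.normal(size=3)
y = monotonic_model(x, weights, biases, lam=1.0, monotone_idx=[0])
```

In a training loop the normalization would be applied on every forward pass, so the bound holds throughout optimization rather than only at the end.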

Quantum reservoir computing using arrays of Rydberg atoms
Rodrigo Araiza Bravo, Khadijeh Najafi, Xun Gao, Susanne F. Yelin
[ arXiv:2111.10956 ]

Abstract Quantum computing promises to provide machine learning with computational advantages. However, noisy intermediate-scale quantum (NISQ) devices pose engineering challenges to realizing quantum machine learning (QML) advantages. Recently, a series of QML computational models inspired by the noise-tolerant dynamics of the brain have emerged as a means to circumvent the hardware limitations of NISQ devices. In this article, we introduce a quantum version of a recurrent neural network (RNN), a well-known model for neural circuits in the brain. Our quantum RNN (qRNN) makes use of the natural Hamiltonian dynamics of an ensemble of interacting spin-1/2 particles as a means for computation. In the limit where the Hamiltonian is diagonal, the qRNN recovers the dynamics of the classical version. Beyond this limit, we observe that the quantum dynamics of the qRNN provide it quantum computational features that can aid it in computation. To this end, we study a qRNN based on arrays of Rydberg atoms, and show that the qRNN is indeed capable of replicating the learning of several cognitive tasks such as multitasking, decision making, and long-term memory by taking advantage of several key features of this platform such as interatomic species interactions, and quantum many-body scars.

New limits on light dark matter-proton cross section from the cosmic large-scale structure
Keir K. Rogers, Cora Dvorkin, Hiranya V. Peiris
[ arXiv:2111.10386 ]

Abstract We set the strongest limits to-date on the velocity-independent dark matter (DM) - proton cross section σ for DM masses m = 10 keV to 100 GeV, using large-scale structure traced by the Lyman-alpha forest: e.g., a 95% upper limit σ < 6×10^−30 cm^2, for m = 100 keV. Our results complement direct detection, which has limited sensitivity to sub-GeV DM. We use an emulator of cosmological simulations, combined with data from the smallest cosmological scales used to-date, to model and search for the imprint of primordial DM-proton collisions. Cosmological bounds are improved by up to a factor of 25.

Equivariant Contrastive Learning
Rumen Dangovski, Li Jing, Charlotte Loh, Seungwook Han, Akash Srivastava, Brian Cheung, Pulkit Agrawal, Marin Soljačić
[ arXiv:2111.00899 ]

Abstract In state-of-the-art self-supervised learning (SSL), pre-training produces semantically good representations by encouraging them to be invariant under meaningful transformations prescribed from human knowledge. In fact, the property of invariance is a trivial instance of a broader class called equivariance, which can be intuitively understood as the property that representations transform according to the way the inputs transform. Here, we show that rather than using only invariance, pre-training that encourages non-trivial equivariance to some transformations, while maintaining invariance to other transformations, can be used to improve the semantic quality of representations. Specifically, we extend popular SSL methods to a more general framework which we name Equivariant Self-Supervised Learning (E-SSL). In E-SSL, a simple additional pre-training objective encourages equivariance by predicting the transformations applied to the input. We demonstrate E-SSL's effectiveness empirically on several popular computer vision benchmarks. Furthermore, we demonstrate usefulness of E-SSL for applications beyond computer vision; in particular, we show its utility on regression problems in photonics science. We will release our code.
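
The idea of an auxiliary objective that predicts the transformation applied to the input can be sketched as a data-preparation step. The abstract does not say which transformations are used, so the four-fold rotations below are an assumption chosen for illustration, as are the function and variable names. Each image is rotated by a random multiple of 90 degrees and the rotation index becomes the label an auxiliary classification head would be trained to predict.

```python
import numpy as np

def make_rotation_batch(images, rng):
    """Rotate each square image by a random multiple of 90 degrees.

    Returns the rotated images and the integer labels (0..3) giving
    the number of quarter turns applied to each one.
    """
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k=int(k)) for img, k in zip(images, labels)])
    return rotated, labels

# Hypothetical batch of 8 single-channel 32x32 images.
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32))
rotated, labels = make_rotation_batch(images, rng)
```

A classifier on top of the encoder's representation of `rotated` would then be trained with a cross-entropy loss against `labels`, alongside the usual invariance-based SSL loss.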

Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science
Charlotte Loh, Thomas Christensen, Rumen Dangovski, Samuel Kim, Marin Soljačić
[ arXiv:2110.08406 ]

Abstract Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labelled data needed to train the model; this poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Here, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three "inexpensive" and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: 1) abundant unlabeled data, 2) prior knowledge of symmetries or invariances and 3) surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently results in orders of magnitude reduction in the number of labels needed to achieve the same network accuracies.

A neural simulation-based inference approach for characterizing the Galactic Center γ-ray excess
Siddharth Mishra-Sharma, Kyle Cranmer
Physical Review D, 2022, Volume 105, Article 063017 [ arXiv:2110.06931 ]

Abstract The nature of the Fermi gamma-ray Galactic Center Excess (GCE) has remained a persistent mystery for over a decade. Although the excess is broadly compatible with emission expected due to dark matter annihilation, an explanation in terms of a population of unresolved astrophysical point sources, e.g., millisecond pulsars, remains viable. The effort to uncover the origin of the GCE is hampered in particular by an incomplete understanding of diffuse emission of Galactic origin. This can lead to spurious features that make it difficult to robustly differentiate smooth emission, as expected for a dark matter origin, from more "clumpy" emission expected for a population of relatively bright, unresolved point sources. We use recent advancements in the field of simulation-based inference, in particular density estimation techniques using normalizing flows, in order to characterize the contribution of modeled components, including unresolved point source populations, to the GCE. Compared to traditional techniques based on the statistical distribution of photon counts, our machine learning-based method is able to utilize more of the information contained in a given model of the Galactic Center emission, and in particular can perform posterior parameter estimation while accounting for pixel-to-pixel spatial correlations in the gamma-ray map. This makes the method demonstrably more resilient to certain forms of model misspecification. On application to Fermi data, the method generically attributes a smaller fraction of the GCE flux to unresolved point sources when compared to traditional approaches. We nevertheless infer such a contribution to make up a non-negligible fraction of the GCE across all analysis variations considered, with at least 38^{+9}_{−19}% of the excess attributed to unresolved point sources in our baseline analysis.

Challenges for Unsupervised Anomaly Detection in Particle Physics
Katherine Fraser, Samuel Homiller, Rashmish K. Mishra, Bryan Ostdiek, Matthew D. Schwartz
Journal of High Energy Physics, 2022, Article 66 [ arXiv:2110.06948 ]

Abstract Anomaly detection relies on designing a score to determine whether a particular event is uncharacteristic of a given background distribution. One way to define a score is to use autoencoders, which rely on the ability to reconstruct certain types of data (background) but not others (signals). In this paper, we study some challenges associated with variational autoencoders, such as the dependence on hyperparameters and the metric used, in the context of anomalous signal (top and W) jets in a QCD background. We find that the hyperparameter choices strongly affect the network performance and that the optimal parameters for one signal are non-optimal for another. In exploring the networks, we uncover a connection between the latent space of a variational autoencoder trained using mean-squared-error and the optimal transport distances within the dataset. We then show that optimal transport distances to representative events in the background dataset can be used directly for anomaly detection, with performance comparable to the autoencoders. Whether using autoencoders or optimal transport distances for anomaly detection, we find that the choices that best represent the background are not necessarily best for signal identification. These challenges with unsupervised anomaly detection bolster the case for additional exploration of semi-supervised or alternative approaches.

Mixture Model Auto-Encoders: Deep Clustering through Dictionary Learning
Alexander Lin, Andrew H. Song, Demba Ba
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 3368-3372 [ arXiv:2110.04683 ]

Abstract State-of-the-art approaches for clustering high-dimensional data utilize deep auto-encoder architectures. Many of these networks require a large number of parameters and suffer from a lack of interpretability, due to the black-box nature of the auto-encoders. We introduce Mixture Model Auto-Encoders (MixMate), a novel architecture that clusters data by performing inference on a generative model. Derived from the perspective of sparse dictionary learning and mixture models, MixMate comprises several auto-encoders, each tasked with reconstructing data in a distinct cluster, while enforcing sparsity in the latent space. Through experiments on various image datasets, we show that MixMate achieves competitive performance compared to state-of-the-art deep clustering algorithms, while using orders of magnitude fewer parameters.

Pruning a restricted Boltzmann machine for quantum state reconstruction
Anna Golubeva, Roger G. Melko
Physical Review B, 2022, Volume 105, Article 125124 [ arXiv:2110.03676 ]

Abstract Restricted Boltzmann machines (RBMs) have proven to be a powerful tool for learning quantum wavefunction representations from qubit projective measurement data. Since the number of classical parameters needed to encode a quantum wavefunction scales rapidly with the number of qubits, the ability to learn efficient representations is of critical importance. In this paper we study magnitude-based pruning as a way to compress the wavefunction representation in an RBM, focusing on RBMs trained on data from the transverse-field Ising model in one dimension. We find that pruning can reduce the total number of RBM weights, but the threshold at which the reconstruction accuracy starts to degrade varies significantly depending on the phase of the model. In a gapped region of the phase diagram, the RBM admits pruning over half of the weights while still accurately reproducing relevant physical observables. At the quantum critical point however, even a small amount of pruning can lead to significant loss of accuracy in the physical properties of the reconstructed quantum state. Our results highlight the importance of tracking all relevant observables as their sensitivity varies strongly with pruning. Finally, we find that sparse RBMs are trainable and discuss how a successful sparsity pattern can be created without pruning.
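
Magnitude-based pruning, the compression technique studied here, removes the weights with the smallest absolute values. The NumPy sketch below shows the generic operation on a weight matrix. It is an illustration under assumed names and shapes, not the authors' code, and it omits the RBM training itself.

```python
import numpy as np

def magnitude_prune(W, fraction):
    """Zero out the given fraction of entries of W with the smallest absolute value.

    Returns the pruned matrix and the boolean mask of surviving weights.
    """
    k = int(round(fraction * W.size))
    mask = np.ones(W.shape, dtype=bool)
    if k > 0:
        smallest = np.argsort(np.abs(W), axis=None)[:k]  # flat indices
        mask.flat[smallest] = False
    return W * mask, mask

# Hypothetical 4x6 visible-to-hidden weight matrix, pruned by half.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
W_pruned, mask = magnitude_prune(W, fraction=0.5)
```

If training continues after pruning, the mask would typically be reapplied after each update so that removed weights stay at zero.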

Inferring dark matter substructure with astrometric lensing beyond the power spectrum
Siddharth Mishra-Sharma
[ arXiv:2110.01620 ]

Abstract Astrometry -- the precise measurement of positions and motions of celestial objects -- has emerged as a promising avenue for characterizing the dark matter population in our Galaxy. By leveraging recent advances in simulation-based inference and neural network architectures, we introduce a novel method to search for global dark matter-induced gravitational lensing signatures in astrometric datasets. Our method based on neural likelihood-ratio estimation shows significantly enhanced sensitivity to a cold dark matter population and more favorable scaling with measurement noise compared to existing approaches based on two-point correlation statistics, establishing machine learning as a powerful tool for characterizing dark matter using astrometric data.

Physics-Augmented Learning: A New Paradigm Beyond Physics-Informed Learning
Ziming Liu, Yunyue Chen, Yuanqi Du, Max Tegmark
[ arXiv:2109.13901 ]

Abstract Integrating physical inductive biases into machine learning can improve model generalizability. We generalize the successful paradigm of physics-informed learning (PIL) into a more general framework that also includes what we term physics-augmented learning (PAL). PIL and PAL complement each other by handling discriminative and generative properties, respectively. In numerical experiments, we show that PAL performs well on examples where PIL is inapplicable or inefficient.

Overcoming the Spectral Bias of Neural Value Approximation
Ge Yang, Anurag Ajay, Pulkit Agrawal
ICLR 2022 Conference Proceedings [ arXiv:2206.04672 ]

Abstract Value approximation using deep neural networks is at the heart of off-policy deep reinforcement learning, and is often the primary module that provides learning signals to the rest of the algorithm. While multi-layer perceptron networks are universal function approximators, recent works in neural kernel regression suggest the presence of a spectral bias, where fitting high-frequency components of the value function requires exponentially more gradient update steps than the low-frequency ones. In this work, we re-examine off-policy reinforcement learning through the lens of kernel regression and propose to overcome such bias via a composite neural tangent kernel. With just a single line-change, our approach, Fourier feature networks (FFN), produces state-of-the-art performance on challenging continuous control domains with only a fraction of the compute. Faster convergence and better off-policy stability also make it possible to remove the target network without suffering catastrophic divergences, which further reduces TD(0)'s estimation bias on a few tasks. Code and analysis available at https://geyang.github.io/ffn.
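
As a rough illustration of a Fourier feature input layer, the sketch below maps inputs through sines and cosines of random linear projections before they would reach an MLP. It is a generic random-Fourier-feature construction and not necessarily the parameterization used in the paper. The frequency matrix `B`, its scale, and the feature count are hypothetical choices.

```python
import numpy as np

def fourier_features(x, B):
    """Map inputs x of shape (n, d) to features of shape (n, 2m).

    B has shape (m, d); the output concatenates sin and cos of 2*pi*x@B.T.
    """
    proj = 2.0 * np.pi * (x @ B.T)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

# Hypothetical setup: 4-dimensional states, 64 random frequencies.
rng = np.random.default_rng(0)
B = rng.normal(scale=3.0, size=(64, 4))  # scale sets the frequency bandwidth (assumed value)
states = rng.random((10, 4))
phi = fourier_features(states, B)  # would replace the raw input to the value network
```

Larger values of the scale expose higher-frequency components to the downstream network, which is the lever such features offer against spectral bias.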

Machine-learning hidden symmetries
Ziming Liu, Max Tegmark
Physical Review Letters, 2022, 128, 180201 [ arXiv:2109.09721 ]

Abstract We present an automated method for finding hidden symmetries, defined as symmetries that become manifest only in a new coordinate system that must be discovered. Its core idea is to quantify asymmetry as violation of certain partial differential equations, and to numerically minimize such violation over the space of all invertible transformations, parametrized as invertible neural networks. For example, our method rediscovers the famous Gullstrand-Painlevé metric that manifests hidden translational symmetry in the Schwarzschild metric of non-rotating black holes, as well as Hamiltonicity, modularity and other simplifying traits not traditionally viewed as symmetries.

Deep Set Auto Encoders for Anomaly Detection in Particle Physics
Bryan Ostdiek
SciPost Physics, 2022, Vol. 12, Issue 1 [ arXiv:2109.01695 ]

Abstract There is an increased interest in model agnostic search strategies for physics beyond the standard model at the Large Hadron Collider. We introduce a Deep Set Variational Autoencoder and present results on the Dark Machines Anomaly Score Challenge. We find that the method attains the best anomaly detection ability when there is no decoding step for the network, and the anomaly score is based solely on the representation within the encoded latent space. This method was one of the top-performing models in the Dark Machines Challenge, both for the open data sets as well as the blinded data sets.

Machine-Learning media bias
Samantha D’Alonzo, Max Tegmark
[ arXiv:2109.00024 ]

Abstract We present an automated method for measuring media bias. Inferring which newspaper published a given article, based only on the frequencies with which it uses different phrases, leads to a conditional probability distribution whose analysis lets us automatically map newspapers and phrases into a bias space. By analyzing roughly a million articles from roughly a hundred newspapers for bias in dozens of news topics, our method maps newspapers into a two-dimensional bias landscape that agrees well with previous bias classifications based on human judgement. One dimension can be interpreted as traditional left-right bias, the other as establishment bias. This means that although news bias is inherently political, its measurement need not be.

Hardware-accelerated Inference for Real-Time Gravitational-Wave Astronomy
Alec Gunny, Dylan Rankin, Jeffrey Krupa, Muhammed Saleem, Tri Nguyen, Michael Coughlin, Philip Harris, Erik Katsavounidis, Steven Timm, Burt Holzman
[ arXiv:2108.12430 ]

Abstract The field of transient astronomy has seen a revolution with the first gravitational-wave detections and the arrival of multi-messenger observations they enabled. Transformed by the first detection of binary black hole and binary neutron star mergers, computational demands in gravitational-wave astronomy are expected to grow by at least a factor of two over the next five years as the global network of kilometer-scale interferometers is brought to design sensitivity. With the increase in detector sensitivity, real-time delivery of gravitational-wave alerts will become increasingly important as an enabler of multi-messenger followup. In this work, we report a novel implementation and deployment of deep learning inference for real-time gravitational-wave data denoising and astrophysical source identification. This is accomplished using a generic Inference-as-a-Service model that is capable of adapting to the future needs of gravitational-wave data analysis. Our implementation allows seamless incorporation of hardware accelerators and also enables the use of commercial or private (dedicated) as-a-service computing. Based on our results, we propose a paradigm shift in low-latency and offline computing in gravitational-wave astronomy. Such a shift can address key challenges in peak-usage, scalability and reliability, and provide a data analysis platform particularly optimized for deep learning applications. The achieved sub-millisecond scale latency will also be relevant for any machine learning-based real-time control systems that may be invoked in the operation of near-future and next generation ground-based laser interferometers, as well as the front-end collection, distribution and processing of data from such instruments.

Towards an Optimal Estimation of Cosmological Parameters with the Wavelet Scattering Transform
Georgios Valogiannis, Cora Dvorkin
Physical Review D, 2022, 105, 103534 [ arXiv:2108.07821 ]

Abstract Optimal extraction of the non-Gaussian information encoded in the Large-Scale Structure (LSS) of the universe lies at the forefront of modern precision cosmology. We propose achieving this task through the use of the Wavelet Scattering Transform (WST), which subjects an input field to a layer of non-linear transformations that are sensitive to non-Gaussianity in spatial density distributions through a generated set of WST coefficients. In order to assess its applicability in the context of LSS surveys, we apply the WST on the 3D overdensity field obtained by the Quijote simulations, out of which we extract the Fisher information in 6 cosmological parameters. It is subsequently found to deliver a large improvement in the marginalized errors on all parameters, ranging between 1.2−4× tighter than the corresponding ones obtained from the regular 3D cold dark matter + baryon power spectrum, as well as a 50% improvement over the neutrino mass constraint given by the marked power spectrum. Through this first application on 3D cosmological fields, we demonstrate the great promise held by this novel statistic and set the stage for its future application to actual galaxy observations.

Deep multi-task mining Calabi-Yau four-folds
Harold Erbin, Riccardo Finotello, Robin Schneider, Mohamed Tamaazousti
Machine Learning: Science and Technology, 2021, Volume 3, Number 1 [ arXiv:2108.02221 ]

Abstract We continue earlier efforts in computing the dimensions of tangent space cohomologies of Calabi-Yau manifolds using deep learning. In this paper, we consider the dataset of all Calabi-Yau four-folds constructed as complete intersections in products of projective spaces. Employing neural networks inspired by state-of-the-art computer vision architectures, we improve earlier benchmarks and demonstrate that all four non-trivial Hodge numbers can be learned at the same time using a multi-task architecture. With 30% (80%) training ratio, we reach an accuracy of 100% for h(1,1) and 97% for h(2,1) (100% for both), 81% (96%) for h(3,1), and 49% (83%) for h(2,2). Assuming that the Euler number is known, as it is easy to compute, and taking into account the linear constraint arising from index computations, we get 100% total accuracy.
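
To make the last sentence of the abstract concrete, the sketch below uses the two standard relations among the Hodge numbers of a Calabi-Yau four-fold, chi = 6(8 + h11 + h31 - h21) and h22 = 2(22 + 2 h11 + 2 h31 - h21), to recover two Hodge numbers from the Euler number and the other two. Which two Hodge numbers are taken as predicted inputs is an assumption for illustration (the abstract does not say), and the function name `complete_hodge_numbers` is hypothetical.

```python
def complete_hodge_numbers(euler, h11, h21):
    """Return (h31, h22) for a Calabi-Yau four-fold given chi, h11 and h21.

    Uses chi = 6 * (8 + h11 + h31 - h21) and
    h22 = 2 * (22 + 2*h11 + 2*h31 - h21).
    """
    total, remainder = divmod(euler, 6)
    if remainder != 0:
        raise ValueError("Euler number of a Calabi-Yau four-fold must be divisible by 6")
    h31 = total - 8 - h11 + h21
    h22 = 2 * (22 + 2 * h11 + 2 * h31 - h21)
    return h31, h22


# Sextic hypersurface in P^5: chi = 2610, h11 = 1, h21 = 0.
sextic = complete_hodge_numbers(euler=2610, h11=1, h21=0)
print(sextic)  # (426, 1752)
```

Under these relations, only two of the four non-trivial Hodge numbers need to be predicted correctly once the Euler number is supplied, which is consistent with the high accuracy reported for h(1,1) and h(2,1).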

Nonperturbative renormalization for the neural network–QFT correspondence
Harold Erbin, Vincent Lahoche, Dine Ousmane Samary
Machine Learning: Science and Technology, 2022, Volume 3, Number 1, Article 015027 [ arXiv:2108.01403 ]

Abstract In a recent work [1], Halverson, Maiti and Stoner proposed a description of neural networks in terms of a Wilsonian effective field theory. The infinite-width limit is mapped to a free field theory, while finite N corrections are taken into account by interactions (non-Gaussian terms in the action). In this paper, we study two related aspects of this correspondence. First, we comment on the concepts of locality and power-counting in this context. Indeed, these usual space-time notions may not hold for neural networks (since inputs can be arbitrary); however, the renormalization group provides natural notions of locality and scaling. Moreover, we comment on several subtleties, for example, that data components may not have a permutation symmetry: in that case, we argue that random tensor field theories could provide a natural generalization. Second, we improve the perturbative Wilsonian renormalization from [1] by providing an analysis in terms of the nonperturbative renormalization group using the Wetterich-Morris equation. An important difference with usual nonperturbative RG analysis is that only the effective (IR) 2-point function is known, which requires setting the problem with care. Our aim is to provide a useful formalism to investigate neural network behavior beyond the large-width limit (i.e. far from Gaussian limit) in a nonperturbative fashion. A major result of our analysis is that changing the standard deviation of the neural network weight distribution can be interpreted as a renormalization flow in the space of networks. We focus on translation-invariant kernels and provide preliminary numerical results.

Discovering Sparse Interpretable Dynamics from Partial Observations
Peter Y. Lu, Joan Ariño, Marin Soljačić
[ arXiv:2107.10879 ]

Abstract Identifying the governing equations of a nonlinear dynamical system is key to both understanding the physical features of the system and constructing an accurate model of the dynamics that generalizes well beyond the available data. We propose a machine learning framework for discovering these governing equations using only partial observations, combining an encoder for state reconstruction with a sparse symbolic model. Our tests show that this method can successfully reconstruct the full system state and identify the underlying dynamics for a variety of ODE and PDE systems.

Flow-based sampling for multimodal distributions in lattice field theory
Daniel C. Hackett, Chung-Chun Hsieh, Michael S. Albergo, Denis Boyda, Jiunn-Wei Chen, Kai-Feng Chen, Kyle Cranmer, Gurtej Kanwar, Phiala E. Shanahan
[ arXiv:2107.00734 ]

Abstract Recent results have demonstrated that samplers constructed with flow-based generative models are a promising new approach for configuration generation in lattice field theory. In this paper, we present a set of methods to construct flow models for targets with multiple separated modes (i.e. theories with multiple vacua). We demonstrate the application of these methods to modeling two-dimensional real scalar field theory in its symmetry-broken phase. In this context we investigate the performance of different flow-based sampling algorithms, including a composite sampling algorithm where flow-based proposals are occasionally augmented by applying updates using traditional algorithms like HMC.

Learning Task Informed Abstractions
Xiang Fu, Ge Yang, Pulkit Agrawal, Tommi Jaakkola
[ arXiv:2106.15612 | code ]

Abstract Current model-based reinforcement learning methods struggle when operating from complex visual scenes due to their inability to prioritize task-relevant features. To mitigate this problem, we propose learning Task Informed Abstractions (TIA) that explicitly separates reward-correlated visual features from distractors. For learning TIA, we introduce the formalism of Task Informed MDP (TiMDP) that is realized by training two models that learn visual features via cooperative reconstruction, but one model is adversarially dissociated from the reward signal. Empirical evaluation shows that TIA leads to significant performance gains over state-of-the-art methods on many visual control tasks where natural and unconstrained visual distractions pose a formidable challenge.

The Principles of Deep Learning Theory
Daniel A. Roberts, Sho Yaida, Boris Hanin
Cambridge University Press (Book), 2022 [ arXiv:2106.10165 ]

Abstract This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.

Single electrons on solid neon as a solid-state qubit platform
Xianjing Zhou, Gerwin Koolstra, Xufeng Zhang, Ge Yang, Xu Han, Brennan Dizdar, Divan Ralu, Wei Guo, Kater W. Murch, David I. Schuster, Dafei Jin
Nature, 2022, 605, 46-50 [ arXiv:2106.10326 ]

Abstract Progress toward the realization of quantum computers requires persistent advances in their constituent building blocks - qubits. Novel qubit platforms that simultaneously embody long coherence, fast operation, and large scalability offer compelling advantages in the construction of quantum computers and many other quantum information systems. Electrons, ubiquitous elementary particles of nonzero charge, spin, and mass, have commonly been perceived as paradigmatic local quantum information carriers. Despite superior controllability and configurability, their practical performance as qubits via either motional or spin states depends critically on their material environment. Here we report our experimental realization of a new qubit platform based upon isolated single electrons trapped on an ultraclean solid neon surface in vacuum. By integrating an electron trap in a circuit quantum electrodynamics architecture, we achieve strong coupling between the motional states of a single electron and a single microwave photon in an on-chip superconducting resonator. Qubit gate operations and dispersive readout are implemented to measure the energy relaxation time T1 of 15 μs and phase coherence time T2 over 200 ns. These results indicate that the electron-on-solid-neon qubit already performs near the state of the art as a charge qubit.

Flow-based sampling for fermionic lattice field theories
Michael S. Albergo, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Julian M. Urban, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Phiala E. Shanahan
Physical Review D, 2021, Vol. 104, Iss. 11 – 1 [ arXiv:2106.05934 ]

Abstract Algorithms based on normalizing flows are emerging as promising machine learning approaches to sampling complicated probability distributions in a way that can be made asymptotically exact. In the context of lattice field theory, proof-of-principle studies have demonstrated the effectiveness of this approach for scalar theories, gauge theories, and statistical systems. This work develops approaches that enable flow-based sampling of theories with dynamical fermions, which is necessary for the technique to be applied to lattice field theory studies of the Standard Model of particle physics and many condensed matter systems. As a practical demonstration, these methods are applied to the sampling of field configurations for a two-dimensional theory of massless staggered fermions coupled to a scalar field via a Yukawa interaction.

Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering
Vincent Sitzmann, Semon Rezchikov, William T. Freeman, Joshua B. Tenenbaum, Fredo Durand
[ arXiv:2106.02634 ]

Abstract Inferring representations of 3D scenes from 2D observations is a fundamental problem of computer graphics, computer vision, and artificial intelligence. Emerging 3D-structured neural scene representations are a promising approach to 3D scene understanding. In this work, we propose a novel neural scene representation, Light Field Networks or LFNs, which represent both geometry and appearance of the underlying 3D scene in a 360-degree, four-dimensional light field parameterized via a neural implicit representation. Rendering a ray from an LFN requires only a *single* network evaluation, as opposed to hundreds of evaluations per ray for ray-marching or volumetric based renderers in 3D-structured neural scene representations. In the setting of simple scenes, we leverage meta-learning to learn a prior over LFNs that enables multi-view consistent light field reconstruction from as little as a single image observation. This results in dramatic reductions in time and memory complexity, and enables real-time rendering. The cost of storing a 360-degree light field via an LFN is two orders of magnitude lower than conventional methods such as the Lumigraph. Utilizing the analytical differentiability of neural implicit representations and a novel parameterization of light space, we further demonstrate the extraction of sparse depth maps from LFNs.

Symmetry-via-Duality: Invariant Neural Network Densities from Parameter-Space Correlators
Anindita Maiti, Keegan Stoner, James Halverson
[ arXiv:2106.00694 ]

Abstract Parameter-space and function-space provide two different duality frames in which to study neural networks. We demonstrate that symmetries of network densities may be determined via dual computations of network correlation functions, even when the density is unknown and the network is not equivariant. Symmetry-via-duality relies on invariance properties of the correlation functions, which stem from the choice of network parameter distributions. Input and output symmetries of neural network densities are determined, which recover known Gaussian process results in the infinite width limit. The mechanism may also be utilized to determine symmetries during training, when parameters are correlated, as well as symmetries of the Neural Tangent Kernel. We demonstrate that the amount of symmetry in the initialization density affects the accuracy of networks trained on Fashion-MNIST, and that symmetry breaking helps only when it is in the direction of ground truth.

Machine-Learning Non-Conservative Dynamics for New-Physics Detection
Ziming Liu, Bohan Wang, Qi Meng, Wei Chen, Max Tegmark, Tie-Yan Liu
Physical Review E, 2021, Vol. 104, Article 055302 [ arXiv:2106.00026 ]

Abstract Energy conservation is a basic physics principle, the breakdown of which often implies new physics. This paper presents a method for data-driven "new physics" discovery. Specifically, given a trajectory governed by unknown forces, our Neural New-Physics Detector (NNPhD) aims to detect new physics by decomposing the force field into conservative and non-conservative components, which are represented by a Lagrangian Neural Network (LNN) and a universal approximator network (UAN), respectively, trained to minimize the force recovery error plus a constant λ times the magnitude of the predicted non-conservative force. We show that a phase transition occurs at λ=1, universally for arbitrary forces. We demonstrate that NNPhD successfully discovers new physics in toy numerical experiments, rediscovering friction (1493) from a damped double pendulum, Neptune from Uranus' orbit (1846) and gravitational waves (2017) from an inspiraling orbit. We also show how NNPhD coupled with an integrator outperforms previous methods for predicting the future of a damped double pendulum.
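
A short worked argument suggests why a threshold at λ=1 is plausible. It is a simplified sketch, not the paper's derivation, and it rests on three assumptions: both loss terms use the same unsquared norm, the conservative component is fit exactly, and the predicted non-conservative force is restricted to a fraction α of the true non-conservative force g (the symbols g and α are introduced here for illustration only).

```latex
\mathcal{L}(\alpha)
  = \underbrace{\lVert g - \alpha g \rVert}_{\text{force recovery error}}
  + \lambda \underbrace{\lVert \alpha g \rVert}_{\text{non-conservative magnitude}}
  = \bigl[\, 1 + (\lambda - 1)\,\alpha \,\bigr] \lVert g \rVert ,
  \qquad 0 \le \alpha \le 1 .
```

Under these assumptions the loss is linear in α with slope (λ − 1)‖g‖, so the minimum sits at α = 1 (the full non-conservative force is recovered) for λ < 1 and jumps to α = 0 (it is suppressed entirely) for λ > 1, independent of the form of g.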

The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider
T. Aarrestad, M. Van Beekveld, M. Bona, A. Bovenin, S. Caron, J. Davies, A. De Simone, C. Doglioni, J.M. Duarte, A. Farbin, H. Gupta, L. Hendriks, L. Heinrich, J. Howarth, P. Jawahar, A. Jueid, J. Lastow, A. Leinweber, J. Mamuzic, E. Merényi, A. Morandini, P. Moskvitina, C. Nellist, J. Ngadiuba, B. Ostdiek, M. Pierini, B. Ravina, R. Ruiz de Austri, S. Sekmen, M. Touranakou, M. Vaškevičiūte, R. Vilalta, J.-R. Vlimant, R. Verheyen, M. White, E. Wulff, E. Wallin, K.A. Wozniak, Z. Zhang
SciPost Physics, 2022, Volume 12, Issue 1, Page 43 [ arXiv:2105.14027 | code ]

Abstract We describe the outcome of a data challenge conducted as part of the Dark Machines initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims at detecting signals of new physics at the LHC using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of > 1 Billion simulated LHC events corresponding to 10 fb^-1 of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.

Scaffolding Simulations with Deep Learning for High-dimensional Deconvolution
Anders Andreassen, Patrick T. Komiske, Eric M. Metodiev, Benjamin Nachman, Adi Suresh, and Jesse Thaler
Workshop paper at ICLR 2021 SimDL Workshop [ arXiv:2105.04448 ]

Abstract A common setting for scientific inference is the ability to sample from a high-fidelity forward model (simulation) without having an explicit probability density of the data. We propose a simulation-based maximum likelihood deconvolution approach in this setting called OmniFold. Deep learning enables this approach to be naturally unbinned and (variable-, and) high-dimensional. In contrast to model parameter estimation, the goal of deconvolution is to remove detector distortions in order to enable a variety of down-stream inference tasks. Our approach is the deep learning generalization of the common Richardson-Lucy approach that is also called Iterative Bayesian Unfolding in particle physics. We show how OmniFold can not only remove detector distortions, but it can also account for noise processes and acceptance effects.

A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC
Giuseppe Di Guglielmo, Farah Fahim, Christian Herwig, Manuel Blanco Valentin, Javier Duarte, Cristian Gingu, Philip Harris, James Hirschauer, Martin Kwok, Vladimir Loncar, Yingyi Luo, Llovizna Miranda, Jennifer Ngadiuba, Daniel Noonan, Seda Ogrenci-Memik, Maurizio Pierini, Sioni Summers, Nhan Tran
IEEE Transactions on Nuclear Science, 2021, Vol. 68, Issue 8 [ arXiv:2105.01683 ]

Abstract Despite advances in the programmable logic capabilities of modern trigger systems, a significant bottleneck remains in the amount of data to be transported from the detector to off-detector logic where trigger decisions are made. We demonstrate that a neural network autoencoder model can be implemented in a radiation tolerant ASIC to perform lossy data compression alleviating the data transmission problem while preserving critical information of the detector energy profile. For our application, we consider the high-granularity calorimeter from the CMS experiment at the CERN Large Hadron Collider. The advantage of the machine learning approach is in the flexibility and configurability of the algorithm. By changing the neural network weights, a unique data compression algorithm can be deployed for each sensor in different detector regions, and changing detector or collider conditions. To meet area, performance, and power constraints, we perform a quantization-aware training to create an optimized neural network hardware implementation. The design is achieved through the use of high-level synthesis tools and the hls4ml framework, and was processed through synthesis and physical layout flows based on a LP CMOS 65 nm technology node. The flow anticipates 200 Mrad of ionizing radiation to select gates, and reports a total area of 3.6 mm^2 and consumes 95 mW of power. The simulated energy consumption per inference is 2.4 nJ. This is the first radiation tolerant on-detector ASIC implementation of a neural network that has been designed for particle physics applications.

Towards Designing and Exploiting Generative Networks for Neutrino Physics Experiments using Liquid Argon Time Projection Chambers
Paul Lutkus, Taritree Wongjirad, Shuchin Aeron
Conference paper at ICLR 2021 [ code ]

Abstract In this paper, we show that a hybrid approach to generative modeling via combining the decoder from an autoencoder together with an explicit generative model for the latent space is a promising method for producing images of particle trajectories in a liquid argon time projection chamber (LArTPC). LArTPCs are a type of particle physics detector used by several current and future experiments focused on studies of the neutrino. We implement a Vector-Quantized Variational Autoencoder (VQ-VAE) and PixelCNN which produces images with LArTPC-like features and introduce a method to evaluate the quality of the images using a semantic segmentation that identifies important physics-based features.

Scalable and Flexible Deep Bayesian Optimization with Auxiliary Information for Scientific Problems
Samuel Kim, Peter Y. Lu, Charlotte Loh, Jamie Smith, Jasper Snoek, Marin Soljačić
[ arXiv:2104.11667 ]

Abstract Bayesian optimization (BO) is a popular paradigm for global optimization of expensive black-box functions, but there are many domains where the function is not completely black-box. The data may have some known structure, e.g. symmetries, and the data generation process can yield useful intermediate or auxiliary information in addition to the value of the optimization objective. However, surrogate models traditionally employed in BO, such as Gaussian Processes (GPs), scale poorly with dataset size and struggle to incorporate known structure or auxiliary information. Instead, we propose performing BO on complex, structured problems by using Bayesian Neural Networks (BNNs), a class of scalable surrogate models that have the representation power and flexibility to handle structured data and exploit auxiliary information. We demonstrate BO on a number of realistic problems in physics and chemistry, including topology optimization of photonic crystal materials using convolutional neural networks, and chemical property optimization of molecules using graph neural networks. On these complex tasks, we show that BNNs often outperform GPs as surrogate models for BO in terms of both sampling efficiency and computational cost.

A Compound Poisson Generator approach to Point-Source Inference in Astrophysics
Gabriel H. Collin, Nicholas L. Rodd, Tyler Erjavec, Kerstin Perez
The Astrophysical Journal, 2022, Volume 260, Number 2 [ arXiv:2104.04529 | code ]

Abstract The identification and description of point sources is one of the oldest problems in astronomy; yet, even today the correct statistical treatment for point sources remains one of the field's hardest problems. For dim or crowded sources, likelihood based inference methods are required to estimate the uncertainty on the characteristics of the source population. In this work, a new parametric likelihood is constructed for this problem using Compound Poisson Generator (CPG) functionals which incorporate instrumental effects from first principles. We demonstrate that the CPG approach exhibits a number of advantages over Non-Poissonian Template Fitting (NPTF) - an existing parametric likelihood method - in a series of test scenarios in the context of X-ray astronomy. These demonstrations show that the effect of the point-spread function, effective area, and choice of point-source spatial distribution cannot, in general, be factorised as they are in the NPTF construction, while the new CPG construction is validated in these scenarios. Separately, an examination of the diffuse-flux emission limit is used to show that most simple choices of priors on the standard parameterisation of the population model can result in unexpected biases: when a model comprising both a point-source population and diffuse component is applied to this limit, nearly all observed flux will be assigned to either the population or to the diffuse component. A new parametrisation is presented for these priors which is demonstrated to properly estimate the uncertainties in this limit. In this choice of priors, the CPG correctly identifies that the fraction of flux assigned to the population model cannot be constrained by the data.

Why is AI hard and Physics simple?
Daniel A. Roberts
[ arXiv:2104.00008 ]

Abstract We discuss why AI is hard and why physics is simple. We discuss how physical intuition and the approach of theoretical physics can be brought to bear on the field of artificial intelligence and specifically machine learning. We suggest that the underlying project of machine learning and the underlying project of physics are strongly coupled through the principle of sparsity, and we call upon theoretical physicists to work on AI as physicists. As a first step in that direction, we discuss an upcoming book on the principles of deep learning theory that attempts to realize this approach.

Machine Learning the 6th Dimension: Stellar Radial Velocities from 5D Phase-Space Correlations
Adriana Dropulic, Bryan Ostdiek, Laura J. Chang, Hongwan Liu, Timothy Cohen, and Mariangela Lisanti
The Astrophysical Journal Letters, 2021, 915, L14 [ arXiv:2103.14039 ]

Abstract The Gaia satellite will observe the positions and velocities of over a billion Milky Way stars. In the early data releases, the majority of observed stars do not have complete 6D phase-space information. In this Letter, we demonstrate the ability to infer the missing line-of-sight velocities until more spectroscopic observations become available. We utilize a novel neural network architecture that, after being trained on a subset of data with complete phase-space information, takes in a star's 5D astrometry (angular coordinates, proper motions, and parallax) and outputs a predicted line-of-sight velocity with an associated uncertainty. Working with a mock Gaia catalog, we show that the network can successfully recover the distributions and correlations of each velocity component for stars that fall within ∼5 kpc of the Sun. We also demonstrate that the network can accurately reconstruct the velocity distribution of a kinematic substructure in the stellar halo that is spatially uniform, even when it comprises a small fraction of the total star count.

Modern Machine Learning and Particle Physics
Matthew D. Schwartz
Harvard Data Science Review, 2021, Issue 3.2, 13 May [ arXiv:2103.12226 ]

Abstract Over the past five years, modern machine learning has been quietly revolutionizing particle physics. Old methodology is being outdated and entirely new ways of thinking about data are becoming commonplace. This article will review some aspects of the natural synergy between modern machine learning and particle physics, focusing on applications at the Large Hadron Collider. A sampling of examples is given, from signal/background discrimination tasks using supervised learning to direct data-driven approaches. Some comments on persistent challenges and possible future directions for the field are included at the end.

Deep learning: a statistical viewpoint
Peter L. Bartlett, Andrea Montanari, and Alexander Rakhlin
[ arXiv:2103.09177 ]

Abstract The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
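
As an illustrative aside to the abstract's remark that gradient methods can lead to minimal norm functions that perfectly fit the training data, here is a minimal sketch for overparametrized linear least squares. It is not taken from the paper: the function name, problem sizes, step-size rule, and random data are all assumptions chosen for illustration. It relies on the standard fact that gradient descent started from zero stays in the row space of the design matrix, and so converges to the minimum-L2-norm interpolating solution.

```python
import numpy as np

def gradient_descent_least_squares(X, y, lr=None, steps=5000):
    """Plain gradient descent on 0.5 * ||X w - y||^2, started from w = 0.

    Hypothetical helper for illustration; sizes and step count are assumptions.
    """
    n, d = X.shape
    if lr is None:
        # Step size 1/L, where L is the largest eigenvalue of X^T X.
        lr = 1.0 / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))  # 20 samples, 100 parameters: overparametrized
y = rng.normal(size=20)

w_gd = gradient_descent_least_squares(X, y)
w_min_norm = np.linalg.pinv(X) @ y  # minimum-L2-norm interpolating solution

train_residual = np.linalg.norm(X @ w_gd - y)        # how well the data are fit
gap_to_min_norm = np.linalg.norm(w_gd - w_min_norm)  # distance to the min-norm fit
```

In this toy setting both `train_residual` and `gap_to_min_norm` should be close to zero: the iterate interpolates the data and coincides with the pseudoinverse solution, even though no explicit regularizer was used.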

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
Farah Fahim, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo Jindariani, Nhan Tran, Luca P. Carloni, Giuseppe Di Guglielmo, Philip Harris, Jeffrey Krupa, Dylan Rankin, Manuel Blanco Valentin, Josiah Hester, Yingyi Luo, John Mamish, Seda Ogrenci-Memik, Thea Aarrestad, Hamza Javed, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, Javier Duarte, Scott Hauck, Shih-Chieh Hsu, Jennifer Ngadiuba, Mia Liu, Duc Hoang, Edward Kreinar, Zhenbin Wu
[ arXiv:2103.05579 ]

Abstract Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends including an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.

The Luminous and Double-Peaked Type Ic Supernova 2019stc: Evidence for Multiple Energy Sources
Sebastian Gomez, Edo Berger, Griffin Hosseinzadeh, Peter K. Blanchard, Matt Nicholl, V. Ashley Villar
The Astrophysical Journal, 2021, Vol. 913, Article 143 [ arXiv:2103.02611 ]

Abstract

On the Minimal Error of Empirical Risk Minimization
Gil Kur, Alexander Rakhlin
[ arXiv:2102.12066 ]

Abstract We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression, both in the random and the fixed design settings. Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data. In the fixed design setting, we show that the error is governed by the global complexity of the entire class. In contrast, in random design, ERM may only adapt to simpler models if the local neighborhoods around the regression function are nearly as complex as the class itself, a somewhat counter-intuitive conclusion. We provide sharp lower bounds for performance of ERM for both Donsker and non-Donsker classes. We also discuss our results through the lens of recent studies on interpolation in overparameterized models.

Topological obstructions to autoencoding
Joshua Batson, C. Grace Haaf, Yonatan Kahn, Daniel A. Roberts
Journal of High Energy Physics, 2021, Issue 4, Article 280 [ arXiv:2102.08380 ]

Abstract Autoencoders have been proposed as a powerful tool for model-independent anomaly detection in high-energy physics. The operating principle is that events which do not belong to the space of training data will be reconstructed poorly, thus flagging them as anomalies. We point out that in a variety of examples of interest, the connection between large reconstruction error and anomalies is not so clear. In particular, for data sets with nontrivial topology, there will always be points that erroneously seem anomalous due to global issues. Conversely, neural networks typically have an inductive bias or prior to locally interpolate such that undersampled or rare events may be reconstructed with small error, despite actually being the desired anomalies. Taken together, these facts are in tension with the simple picture of the autoencoder as an anomaly detector. Using a series of illustrative low-dimensional examples, we show explicitly how the intrinsic and extrinsic topology of the dataset affects the behavior of an autoencoder and how this topology is manifested in the latent space representation during training. We ground this analysis in the discussion of a mock "bump hunt" in which the autoencoder fails to identify an anomalous "signal" for reasons tied to the intrinsic topology of n-particle phase space.

On the convergence of group-sparse autoencoders
Emmanouil Theodosis, Bahareh Tolooshams, Pranay Tankala, Abiy Tasissa, Demba Ba
[ arXiv:2102.07003 ]

Abstract Recent approaches in the theoretical analysis of model-based deep learning architectures have studied the convergence of gradient descent in shallow ReLU networks that arise from generative models whose hidden layers are sparse. Motivated by the success of architectures that impose structured forms of sparsity, we introduce and study a group-sparse autoencoder that accounts for a variety of generative models, and utilizes a group-sparse ReLU activation function to force the non-zero units at a given layer to occur in blocks. For clustering models, inputs that result in the same group of active units belong to the same cluster. We proceed to analyze the gradient dynamics of a shallow instance of the proposed autoencoder, trained with data adhering to a group-sparse generative model. In this setting, we theoretically prove the convergence of the network parameters to a neighborhood of the generating matrix. We validate our model through numerical analysis and highlight the superior performance of networks with a group-sparse ReLU compared to networks that utilize traditional ReLUs, both in sparse coding and in parameter recovery tasks. We also provide real data experiments to corroborate the simulated results, and emphasize the clustering capabilities of structured sparsity models.

Path integral contour deformations for observables in SU(N) gauge theory
William Detmold, Gurtej Kanwar, Henry Lamm, Michael L. Wagman, Neill C. Warrington
Physical Review D, 2021, Vol. 103, Issue 9, Article 094517 [ arXiv:2101.12668 ]

Abstract Path integral contour deformations have been shown to mitigate sign and signal-to-noise problems associated with phase fluctuations in lattice field theories. We define a family of contour deformations applicable to SU(N) lattice gauge theory that can reduce sign and signal-to-noise problems associated with complex actions and complex observables. For observables, these contours can be used to define deformed observables with identical expectation value but different variance. As a proof-of-principle, we apply machine learning techniques to optimize the deformed observables associated with Wilson loops in two dimensional SU(2) and SU(3) gauge theory. We study loops consisting of up to 64 plaquettes and achieve variance reduction of up to 4 orders of magnitude.

The LHC Olympics 2020: A Community Challenge for Anomaly Detection in High Energy Physics
Gregor Kasieczka (ed), Benjamin Nachman (ed), David Shih (ed), Oz Amram, Anders Andreassen, Kees Benkendorfer, Blaz Bortolato, Gustaaf Brooijmans, Florencia Canelli, Jack H. Collins, Biwei Dai, Felipe F. De Freitas, Barry M. Dillon, Ioan-Mihail Dinu, Zhongtian Dong, Julien Donini, Javier Duarte, D. A. Faroughy, Julia Gonski, Philip Harris, Alan Kahn, Jernej F. Kamenik, Charanjit K. Khosa, Patrick Komiske, Luc Le Pottier, Pablo Martín-Ramiro, Andrej Matevc, Eric Metodiev, Vinicius Mikuni, Inês Ochoa, Sang Eon Park, Maurizio Pierini, Dylan Rankin, Veronica Sanz, Nilai Sarda, Uros Seljak, Aleks Smolkovic, George Stein, Cristina Mantilla Suarez, Manuel Szewc, Jesse Thaler, Steven Tsan, Silviu-Marian Udrescu, Louis Vaslin, Jean-Roch Vlimant, Daniel Williams, Mikaeel Yunus
Reports on Progress in Physics, 2021, Volume 84, Number 12 [ arXiv:2101.08320 ]

Abstract A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging, and aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a set of simulated collider events. Participants in these Olympics have developed their methods using an R&D dataset and then tested them on black boxes: datasets with an unknown anomaly (or not). This paper will review the LHC Olympics 2020 challenge, including an overview of the competition, a description of methods deployed in the competition, lessons learned from the experience, and implications for data analyses with future datasets as well as future colliders.

Introduction to Normalizing Flows for Lattice Field Theory
Michael S. Albergo, Denis Boyda, Daniel C. Hackett, Gurtej Kanwar, Kyle Cranmer, Sébastien Racanière, Danilo Jimenez Rezende, and Phiala E. Shanahan
[ arXiv:2101.08176 ]

Abstract This notebook tutorial demonstrates a method for sampling Boltzmann distributions of lattice field theories using a class of machine learning models known as normalizing flows. The ideas and approaches proposed in arXiv:1904.12072, arXiv:2002.02428, and arXiv:2003.06413 are reviewed and a concrete implementation of the framework is presented. We apply this framework to a lattice scalar field theory and to U(1) gauge theory, explicitly encoding gauge symmetries in the flow-based approach to the latter. This presentation is intended to be interactive and working with the attached Jupyter notebook is recommended.

E Pluribus Unum Ex Machina: Learning from Many Collider Events at Once
Benjamin Nachman and Jesse Thaler
Physical Review D, 2021, Vol. 103, Issue 11, Article 116013 [ arXiv:2101.07263 | code ]

Abstract There have been a number of recent proposals to enhance the performance of machine learning strategies for collider physics by combining many distinct events into a single ensemble feature. To evaluate the efficacy of these proposals, we study the connection between single-event classifiers and multi-event classifiers under the assumption that collider events are independent and identically distributed (IID). We show how one can build optimal multi-event classifiers from single-event classifiers, and we also show how to construct multi-event classifiers such that they produce optimal single-event classifiers. This is illustrated for a Gaussian example as well as for classification tasks relevant for searches and measurements at the Large Hadron Collider. We extend our discussion to regression tasks by showing how they can be phrased in terms of parametrized classifiers. Empirically, we find that training a single-event (per-instance) classifier is more effective than training a multi-event (per-ensemble) classifier, at least for the cases we studied, and we relate this fact to properties of the loss function gradient in the two cases. While we did not identify a clear benefit from using multi-event classifiers in the collider context, we speculate on the potential value of these methods in cases involving only approximate independence, as relevant for jet substructure studies.
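
To make the IID construction mentioned in this abstract concrete, the sketch below builds a multi-event classifier from a single-event one for a toy Gaussian problem. It is an illustration only, not the authors' linked code: the function names, the two unit-width Gaussians with means ±0.5, the equal class priors, and the ensemble size of 10 are all assumptions. The only ingredient used is that, for independent events, the ensemble likelihood ratio is the product of per-event likelihood ratios, so the per-event logits add.

```python
import numpy as np

def single_event_classifier(x, mu_a=0.5, mu_b=-0.5):
    """Optimal single-event classifier p_A / (p_A + p_B) for two unit-width
    Gaussians with equal priors (toy assumption, not from the paper)."""
    log_ratio = -0.5 * (x - mu_a) ** 2 + 0.5 * (x - mu_b) ** 2
    return 1.0 / (1.0 + np.exp(-log_ratio))

def multi_event_classifier(events, single=single_event_classifier):
    """Combine per-event outputs along the last axis.

    Under the IID assumption the ensemble log-likelihood ratio is the sum of
    the per-event log-likelihood ratios, i.e. the sum of per-event logits.
    """
    f = single(events)
    logits = np.log(f) - np.log1p(-f)
    return 1.0 / (1.0 + np.exp(-logits.sum(axis=-1)))

rng = np.random.default_rng(1)
n_ensembles, n_events = 2000, 10
from_a = rng.normal(0.5, 1.0, size=(n_ensembles, n_events))
from_b = rng.normal(-0.5, 1.0, size=(n_ensembles, n_events))

# Accuracy when classifying one event at a time.
single_acc = 0.5 * ((single_event_classifier(from_a) > 0.5).mean()
                    + (single_event_classifier(from_b) <= 0.5).mean())

# Accuracy when classifying whole ensembles of n_events events.
multi_acc = 0.5 * ((multi_event_classifier(from_a) > 0.5).mean()
                   + (multi_event_classifier(from_b) <= 0.5).mean())
```

In this toy setup `multi_acc` should exceed `single_acc`, since each ensemble pools the evidence of ten events. Note that no separate per-ensemble training is involved; the ensemble decision is assembled entirely from the single-event classifier.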

Fast convolutional neural networks on FPGAs with hls4ml
Thea Aarrestad, Vladimir Loncar, Nicolò Ghielmetti, Maurizio Pierini, Sioni Summers, Jennifer Ngadiuba, Christoffer Petersson, Hampus Linander, Yutaro Iiyama, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Dylan Rankin, Sergo Jindariani, Kevin Pedro, Nhan Tran, Mia Liu, Edward Kreinar, Zhenbin Wu, Duc Hoang
Machine Learning: Science and Technology, 2021, Volume 2, Issue 4, Article 045015 [ arXiv:2101.05108 ]

Abstract We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of 5μs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.

Detection and Parameter Estimation of Gravitational Waves from Binary Neutron-Star Mergers in Real LIGO Data using Deep Learning
Plamen G. Krastev, Kiranjyot Gill, V. Ashley Villar, Edo Berger
Physics Letters B, 2021, Vol. 815, Article 136161 [ arXiv:2012.13101 ]

Abstract One of the key challenges of real-time detection and parameter estimation of gravitational waves from compact binary mergers is the computational cost of conventional matched-filtering and Bayesian inference approaches. In particular, the application of these methods to the full signal parameter space available to the gravitational-wave detectors, and/or real-time parameter estimation is computationally prohibitive. On the other hand, rapid detection and inference are critical for prompt follow-up of the electromagnetic and astro-particle counterparts accompanying important transients, such as binary neutron-star and black-hole neutron-star mergers. Training deep neural networks to identify specific signals and learn a computationally efficient representation of the mapping between gravitational-wave signals and their parameters allows both detection and inference to be done quickly and reliably, with high sensitivity and accuracy. In this work we apply a deep-learning approach to rapidly identify and characterize transient gravitational-wave signals from binary neutron-star mergers in real LIGO data. We show for the first time that artificial neural networks can promptly detect and characterize binary neutron star gravitational-wave signals in real LIGO data, and distinguish them from noise and signals from coalescing black-hole binaries. We illustrate this key result by demonstrating that our deep-learning framework classifies correctly all gravitational-wave events from the Gravitational-Wave Transient Catalog, GWTC-1 [Phys. Rev. X 9 (2019), 031040]. These results emphasize the importance of using realistic gravitational-wave detector data in machine learning approaches, and represent a step towards achieving real-time detection and inference of gravitational waves.

Field of Junctions: Extracting Boundary Structure at Low SNR
Dor Verbin, Todd Zickler
[ arXiv:2011.13866 ]

Abstract We introduce a bottom-up model for simultaneously finding many boundary elements in an image, including contours, corners and junctions. The model explains boundary shape in each small patch using a 'generalized M-junction' comprising M angles and a freely-moving vertex. Images are analyzed using non-convex optimization to cooperatively find M+2 junction values at every location, with spatial consistency being enforced by a novel regularizer that reduces curvature while preserving corners and junctions. The resulting 'field of junctions' is simultaneously a contour detector, corner/junction detector, and boundary-aware smoothing of regional appearance. Notably, its unified analysis of contours, corners, junctions and uniform regions allows it to succeed at high noise levels, where other methods for segmentation and boundary detection fail.

AI Poincaré: Machine Learning Conservation Laws from Trajectories
Ziming Liu and Max Tegmark
Physical Review Letters, 2021, Volume 126, Issue 18, Article 180604 [ arXiv:2011.04698 ]

Abstract We present AI Poincaré, a machine learning algorithm for auto-discovering conserved quantities using trajectory data from unknown dynamical systems. We test it on five Hamiltonian systems, including the gravitational 3-body problem, and find that it discovers not only all exactly conserved quantities, but also periodic orbits, phase transitions and breakdown timescales for approximate conservation laws.

Parameter Inference from Event Ensembles and the Top-Quark Mass
Forrest Flesher, Katherine Fraser, Charles Hutchison, Bryan Ostdiek, Matthew D. Schwartz
Journal of High Energy Physics, 2021, Article 58 [ arXiv:2011.04666 ]

Abstract One of the key tasks of any particle collider is measurement. In practice, this is often done by fitting data to a simulation, which depends on many parameters. Sometimes, when the effects of varying different parameters are highly correlated, a large ensemble of data may be needed to resolve parameter-space degeneracies. An important example is measuring the top-quark mass, where other physical and unphysical parameters in the simulation must be marginalized over when fitting the top-quark mass parameter. We compare three different methodologies for top-quark mass measurement: a classical histogram fitting procedure, similar to one commonly used in experiment optionally augmented with soft-drop jet grooming; a machine-learning method called DCTR; and a linear regression approach, either using a least-squares fit or with a dense linearly-activated neural network. Despite the fact that individual events are totally uncorrelated, we find that the linear regression methods work most effectively when we input an ensemble of events sorted by mass, rather than training them on individual events. Although all methods provide robust extraction of the top-quark mass parameter, the linear network does marginally best and is remarkably simple. For the top study, we conclude that the Monte-Carlo-based uncertainty on current extractions of the top-quark mass from LHC data can be reduced significantly (by perhaps a factor of 2) using networks trained on sorted event ensembles. More generally, machine learning from ensembles for parameter estimation has broad potential for collider physics measurements.

Quasi Anomalous Knowledge: Searching for new physics with embedded knowledge
Sang Eon Park, Dylan Rankin, Silviu-Marian Udrescu, Mikaeel Yunus, Philip Harris
Journal of High Energy Physics, 2021, Article 30 [ arXiv:2011.03550 | code ]

Abstract Discoveries of new phenomena often involve a dedicated search for a hypothetical physics signature. Recently, novel deep learning techniques have emerged for anomaly detection in the absence of a signal prior. However, by ignoring signal priors, the sensitivity of these approaches is significantly reduced. We present a new strategy dubbed Quasi Anomalous Knowledge (QUAK), whereby we introduce alternative signal priors that capture some of the salient features of new physics signatures, allowing for the recovery of sensitivity even when the alternative signal is incorrect. This approach can be applied to a broad range of physics models and neural network architectures. In this paper, we apply QUAK to anomaly detection of new physics events at the CERN Large Hadron Collider utilizing variational autoencoders with normalizing flow.

Learning to Unknot
Sergei Gukov, James Halverson, Fabian Ruehle, and Piotr Sułkowski
Machine Learning: Science and Technology, 2021, Volume 2, Number 2, Article 025035 [ arXiv:2010.16263 ]

Abstract We introduce natural language processing into the study of knot theory, as made natural by the braid word representation of knots. We study the UNKNOT problem of determining whether or not a given knot is the unknot. After describing an algorithm to randomly generate $N$-crossing braids and their knot closures and discussing the induced prior on the distribution of knots, we apply binary classification to the UNKNOT decision problem. We find that the Reformer and shared-QK Transformer network architectures outperform fully-connected networks, though all perform well. Perhaps surprisingly, we find that accuracy increases with the length of the braid word, and that the networks learn a direct correlation between the confidence of their predictions and the degree of the Jones polynomial. Finally, we utilize reinforcement learning (RL) to find sequences of Markov moves and braid relations that simplify knots and can identify unknots by explicitly giving the sequence of unknotting actions. Trust region policy optimization (TRPO) performs consistently well for a wide range of crossing numbers and thoroughly outperformed other RL algorithms and random walkers. Studying these actions, we find that braid relations are more useful in simplifying to the unknot than one of the Markov moves.

Enhancing searches for resonances with machine learning and moment decomposition
Ouail Kitouni, Benjamin Nachman, Constantin Weisser, and Mike Williams
Journal of High Energy Physics, 2021, Article 70 [ arXiv:2010.09745 | code ]

Abstract A key challenge in searches for resonant new physics is that classifiers trained to enhance potential signals must not induce localized structures. Such structures could result in a false signal when the background is estimated from data using sideband methods. A variety of techniques have been developed to construct classifiers which are independent from the resonant feature (often a mass). Such strategies are sufficient to avoid localized structures, but are not necessary. We develop a new set of tools using a novel moment loss function (Moment Decomposition or MoDe) which relax the assumption of independence without creating structures in the background. By allowing classifiers to be more flexible, we enhance the sensitivity to new physics without compromising the fidelity of the background estimation.