Applied Algebra Seminar Fall 2024: Difference between revisions

When: 11am Central, Thursdays

Where: Van Vleck 901 Hall

Email List: subscribe to our mailing list, or email JIR to add you.

Contacts: Jose Israel Rodriguez, Shamgar Gurevich, Thomas Yahl,

How: Talks are in-person talks and 40-50 minutes long.

Inclusivity: We welcome participants with applied algebra interests from other institutions and departments. We have social time 5-10 minutes before the talk starts.

Fall 2024 Schedule

Contact jRodriguez43@wisc.edu to suggest speakers.
Football weekends have dates *

date	Location	speaker	title
July 1 (Monday)	VV 901	Julia Lindberg	Density estimation for Gaussian mixture models
Aug 22	VV 901	Zehua Lai	ReLU transformers and piecewise polynomials
Aug 29*		No Seminar
Sept 5*	VV 901	Taylor Brysiewicz	The Area-Length System and the Heron Variety
Sept 12*	VV901	Emily King	Old is New Again: Group Actions and PCA in Data Science
Sept 26	VV 901	Tim Duff	Algebra & Geometry of Camera Resectioning
October 3*	VV 901	No seminar
October 10	VV 901	Fabian Maximilian Faulstich	Algebraic varieties in coupled cluster theory for quantum chemistry
October 17	VV 901	Oskar Henriksson	Moment identifiability of mixture distributions: the gamma and inverse Gaussian case
October 24*		No seminar --- SIAM MDS (Atlanta)
Oct 31 (2:25pm)	B139	Chuanqi Zhang	Faster isomorphism testing of p-groups of Frattini class-2 (joint with Number Theory and Representation Theory)
Nov 7	VV 901	Yulia Alexandr	Mixtures of Discrete Decomposable Graphical Models
Nov 14*	VV 901	Laura Balzano	Efficient Low-Dimensional Compression for Deep Overparameterized Learning and Fine-Tuning
Nov 21	VV 901	Eitan Levin	Any-dimensional optimization
Spring 2025		Aviva Englander	Identifiability of Level-2 Networks
Spring 2025		Hannah Friedman	TBD
Spring 2025		Yingying Wu	TBD

Abstracts (2024 Fall)

Yulia Alexandr

Mixtures of Discrete Decomposable Graphical Models

Graphical models capture relationships between random variables, with nodes representing variables and edges indicating dependencies. Their natural graph-based parametrization makes them useful in statistics, biology, and phylogenetics. In practice, one often encounters latent variables that are impossible or infeasible to measure directly, which can lead to geometric challenges. Mixture models, featuring only one hidden variable, are among the most accessible latent variable models. I will talk about mixtures of special families of discrete graphical models, called decomposable. I will focus on their algebraic properties, such as polynomial constraints that distributions in such models must satisfy. This talk is based on joint work with Jane Ivy Coons and Nils Sturma.

Laura Balzano

Efficient Low-Dimensional Compression for Deep Overparameterized Learning and Fine-Tuning

While overparameterization in machine learning models offers great benefits in terms of optimization and generalization, it also leads to increased computational requirements as model sizes grow. In this work, we show that leveraging inherent low-dimensional structure within the model parameter updates, we can reap the benefits of overparameterization without the computational burden. We study two problems: deep low-rank matrix completion and low-rank fine-tuning of deep networks. First, we develop theory showing that when training the parameters of a deep linear network to fit a low-rank or wide matrix, the gradient dynamics of each weight matrix are confined to a fixed low-dimensional subspace. Consequently, we can construct and train compact, highly compressed factorizations possessing the same benefits as their overparameterized counterparts. For language model fine-tuning, we introduce a method called “Deep LoRA”, which improves the existing low-rank adaptation (LoRA) technique. While this technique does not arise directly from our theory, it involves only a minor modification that is surprisingly effective and of great interest for future theoretical study. The effectiveness of Deep LoRA is validated through its performance on natural language understanding tasks, particularly when fine-tuning with a limited number of samples.

Emily King

Old is New Again: PCA and Group Actions in Data Science"

Although neural networks yield powerful results in many applications, there are drawbacks to their use. For example, there are many fields where the amount of labeled data needed to train neural networks is insufficient (i.e., in the realm of un- and semi-supervised learning). Further, the internal decision-making process of a neural network is often opaque, leading to a desire for more interpretable techniques. These concerns have led some to take another look at algorithms built on more classical mathematics, often with algebra at their core. To that end, we will learn about two applications of principal component analysis, one in the context of subspace geometry and applied to computer vision and the other as a post-processing step to topological data analysis applied to data from atmospheric science. Then, invariant and equivariant learning will be introduced. If time remains, some new results in this field will also be presented.

Fabian Maximilian Faulstich

Algebraic geometry and varieties in quantum chemistry

``Linear algebra was great, but non-linear algebra is the future." In the spirit of this quote, I will introduce the coupled cluster theory – a non-linear ansatz to approximate solutions to the electronic Schrödinger equation – which is arguably one of the most widely used wavefunction methods in computational quantum chemistry. I will illustrate its theoretical underpinnings and elaborate on some of the most recent advances that include insights from algebraic geometry.

Tim Duff

Algebraic Stories about Camera Resectioning

Abstract: The problem of absolute camera pose estimation, also known as camera resectioning, may be stated as follows: determine the location and orientation of an unknown pinhole camera, given some set of correspondences between known 3D entities in the scene and their 2D projections (and possibly some of the camera’s internal parameters like focal length.) This problem has received an enormous amount of attention in the case of 3D-2D point correspondences. The first part of this talk concerns the uncalibrated case. I will describe results (joint with Connelly and Loucks-Tavitas) that characterize the constraints on 2D image projections that must be satisfied if the corresponding 3D points are fixed and sufficiently general. We call the set of all such 2D image projections the resectioning variety associated to the 3D point arrangement. I will state our conjecture concerning the Euclidean distance degree of the generic resectioning variety. The second part deals with calibrated cameras, and considers also 3D-2D line correspondences. I will report on recent work (joint with Hruby and Pollefeys) where we used Galois theory to find more efficient solutions to the calibrated camera resectioning problem using mixed point and line features in the so-called "minimal cases", where exact solutions exist even for noisy data. Time-permitting, I will discuss intriguing constraints on a camera's location that arise from such hybrid point-line correspondences.

Oskar Henriksson

Moment identifiability of mixture distributions: the gamma and inverse Gaussian case

The method of moments is a classical technique for parameter estimation in statistics, and leads to a fundamental problem of identifiability: How many moments do we expect to need, in order to determine the parameters? This can be studied from the point of view of algebraic geometry, through the so-called moment varieties associated to the distribution, which in the case of mixture distributions are highly structured secant varieties. In this talk, I will discuss some new results on algebraic and rational identifiability for mixture distributions built from the gamma and inverse Gaussian distribution. The techniques we use build on previous work on Gaussian mixtures, as well as the theory of determinantal varieties and the intersection theory of secant varieties. This is ongoing work with Kristian Ranestad, Lisa Seccia and Teresa Yu.

Zehua Lai

ReLU transformers and piecewise polynomials

The attention mechanism, as used in a transformer, underlies many modern AI technologies in the current news cycle, such as ChatGPT and other large language models. In this talk, we will give an introduction to the mathematical structure of transformers and discuss its basic structural properties. We highlight a perhaps important but hitherto unobserved insight: The Re-LU attention module in a transformer is a cubic spline. Viewed in this manner, this mysterious but critical component of a transformer becomes a natural development of an old notion deeply entrenched in classical approximation theory. Conversely, if we assume an old unsolved conjecture by Pierce--Birkhoff in real algebraic geometry, then every spline is also a ReLU-transformer. This could help us understand the approximation/interpolation power of current AI models.

Julia Lindberg

Density estimation for Gaussian mixture models

Density estimation for Gaussian mixture models is a classical problem in statistics that has applications in a variety of disciplines. Two solution techniques are commonly used for this problem: the method of moments and maximum likelihood estimation. This talk will discuss both methods by focusing on the underlying geometry of each problem.

Taylor Brysiewicz

The Area-Length System and the Heron Variety

Discretizing spacetime involves triangulating a 4-dimensional Lorentzian manifold into many 4-simplices. In several physical theories, the fundamental degrees of freedom involve area variables instead of length variables, and hence identifying a 4-simplex by its 10 triangular areas becomes desirable. A natural question emerges: ``Is a 4-simplex, up to rigid motions, determined by its 10 triangular areas?"

The answer to this identifiability question is "no". We quantify the extent to which identifiability fails by studying the degree of a corresponding branched cover and its Galois group. We show that there is no general formula in terms of radicals expressing the lengths in terms of the areas. Finally, we extend these results to other recovery problems on volumes of simplices, encoded by the algebraic matroid of the Heron variety.

Eitan Levin (Caltech)

Any-dimensinal optimization

Optimization problems in many applications arise as sequences indexed by dimension. Indeed, graph problems are defined for graphs of all sizes, regularizers for inverse problems are defined for signals and images of any size, and (quantum) information problems are defined for distributions over any number of (qu)bits. We present a framework to systematically study such sequences of optimization problems, fit tractable data-driven relaxations, and derive lower bounds on their limiting values. We do so by observing that these problems are described in a “free” way, which makes it obvious how to instantiate them in any dimension. We explain how such free descriptions arise from the phenomenon of representation stability, and study their properties by considering the relations between problems across dimensions and their symmetries.

Earlier Abstracts

Zinan Wang (Wisconsin)

Applied algebra in kinematics of a spatial serial robot

Algebra plays a crucial role in kinematics, providing the mathematical framework to analyze and describe motion. A spatial serial robot with 6 revolute joints gathers interest in both academics and industry. In this talk, we first give a brief introduction to Denavit-Hartenberg convention and screw theory, which are different frames which describe the motion of such robot. Second, we use techniques in ideals to give an algebraic classification about singularity of the robot and their closed form, verifying the classification from engineering viewpoint. Third, we introduce a novel path-following method to find all alternative solutions to inverse kinematic problem by encircling a path around isolated or even coincident singularities at a constant distance. This method formalizes Huitfeldt's folklore.

Joseph Cummings (Notre Dame)

The Pfaffian Structure of CFN Phylogenetic Networks

Algebraic techniques in phylogenetics have historically been successful at proving identifiability results and have also led to novel reconstruction algorithms. In this talk, we look at the ideal of phylogenetic invariants of the Cavender-Farris-Neyman (CFN) model on a level-1 phylogenetic network with the goal of providing a description of the invariants which is useful for network inference. Geometrically, the CFN model is a complex variety intersected with the real probability simplex, and the ideal of phylogenetic invariants is the vanishing ideal of this variety. We will provide an explicit Gröbner basis for the vanishing ideal of the model. This is done by reparameterizing the model through the space of skew-symmetric matrices via Pfaffians. Under this reparameterization, the ideal is cut out by certain minors of a generic skew-symmetric matrix. Lastly, we will talk about some identifiability results and numerical simulations which hint at a method for network reconstruction. If there is time, we will discuss a statistical interpretation of the reparameterization. In broad strokes, it says that level-1 CFN phylogenetic networks are completely characterized by the covariances of the random variables associated with the leaves of the network.

Margaret Regan (College of the Holy Cross)

Using numerical algebraic geometry for problems in computer vision

Many problems in computer vision can be formulated using a parameterized system of polynomials which must be solved for given instances of the parameters. Solutions and behavior over the real numbers are those that provide meaningful information for these applications. By using homotopy continuation within numerical algebraic geometry, one can solve these parameterized polynomial systems and analyze their structure. First, I propose a new approach which uses locally adaptive methods and sparse matrix calculations to solve parameterized overdetermined systems in projective space. Examples are provided in 2D image reconstruction to compare the new methods with traditional approaches in numerical algebraic geometry. Second, I discuss new homotopy continuation methods for solving two minimal trifocal calibrated relative pose problems defined by point and line correspondences, which appear together, e.g., in urban scenes or observing curves. Experiments are shown using real and synthetic data to demonstrate that challenging scenes can be reconstructed where standard methods fail. Third, I present a use of Galois/monodromy groups to uncover symmetries and structure in the polynomial systems, as well as finding their corresponding equations using multivariate rational function interpolation. Practical examples will be shown to demonstrate the identification of the symmetries and computing their formulas.

Guido Montufar (UCLA)

Geometry of Linear Convolutional Networks

We consider linear convolutional networks, described in terms of polynomials that admit certain factorizations. Based on this, we investigate the impact of the network architecture on the set of representable functions and obtain a detailed description of the parameter optimization problem, including critical points and dynamical invariants of the gradient flow. In contrast to fully connected linear networks, here the function space is subject to polynomial inequality constraints and the results reveal a variety of new phenomena, including the existence spurious minima introduced solely by the parametrization, and a sparsity bias whereby the networks preferentially learn convolutional maps with repeated filters. The talk is based on joint works with Kathlén Kohn, Thomas Merkh, Vahid Shahverdi, Matthew Trager.

Joshua Cape (UW-Madison)

On Procrustes-type problems and dimensionality reduction

We will briefly revisit the classical orthogonal Procrustes problem before discussing recent related developments in the study of multivariate statistics and data dimensionality reduction. This ~~talk~~ discussion will be interactive. Audience members might even be called upon to speak!

Michela Mancini (Georgia Tech)

Crater projection in pushbroom cameras: application to spacecraft state estimation

Pushbroom cameras are scanning sensors widely used to take pictures of planetary surfaces. These cameras are equipped with a linear sensor array that captures a one-dimensional image. As the camera moves, successive 1D images are stacked together and the final two-dimensional image is con-structed. Under linear motion, the effect of this projection geometry is to transform lines on the ground to hyperbolas in the pushbroom image. In this talk, we will analyze the projection of a planetary crater rim when imaged by a pushbroom camera. For spacecraft optical navigation purposes, craters are often approximated as ellipses, but conics do not project to conics in this framework unless special geometric conditions are met. Starting from the description of the ellipse in terms of its canonical parametric representation, we will develop the analytic expression of the projected curve in the image plane.

Thanks to the polynomial description of the projected rim, we will finally see how parameter homotopy continuation allows to estimate the position and velocity of a camera onboard a spacecraft using the pushbroom image of a known crater.

Thomas Yahl (UW-Madison)

The Unbalanced Procrustes Problem and Algebraic Optimization

The Procrustes problem looks to find the orientation for which one data set most closely resembles another data set and appears frequently in applications to data science, image reconstruction, and satellite positioning. The balanced Procrustes problem, of when these data sets have the same dimension, can be understood as a distance minimization problem and the optimal solutions can be expressed in a closed form. However, the unbalanced Procrustes problem, of orthogonally projecting a higher dimensional data set to model a lower dimensional data set, is less understood and there is no known closed form for its optimal solutions.

We study the unbalanced Procrustes problem as an algebraic optimization problem – we analyze defining equations for its set of critical points. The algebraic degree of the problem is the number of critical points for general parameters, and is a measure of the complexity of computing the set of critical points numerically. Using numerical and symbolic techniques, we provide formulas for the algebraic degree of the unbalanced Procrustes problem in special cases and conjecture general results on the degree. We detail how these theoretical results may be used in applications to develop methods of computing numerical solutions to the unbalanced Procrustes problem.

Micheline Soley (UW-Madison)

Tensor Networks and Quantum Computing for Highly Multidimensional Molecular Simulations

Today, exact quantum dynamics simulations face the “curse of dimensionality,” in which computational cost grows exponentially as the dimensionality of molecular systems increases. This limits exact grid-based quantum dynamics simulations to the smallest molecular systems. Low-rank tensor-network approaches provide a way to surmount this curse for many chemical systems, as they can exponentially reduce computational cost for weakly coupled molecular systems. Our work capitalizes on the native advantages of tensor networks and the high degree of entanglement possible on quantum computers to develop new approaches to simulate molecular systems.

Tung Nguyen (Western University)

Join algebra and applications to dynamics of the Kuramoto model

Network theory plays a fundamental role in the study and understanding of collective behavior in several systems. For example, it has found applications in neuroscience, ecology, neural dynamics, biology, and many physical systems. In this domain, the Kuramoto model has emerged as the central mathematical model for studying various synchronization phenomena in nature. Typically, we often consider these systems as a single network. However, many real-world systems are multi-leveled; namely, the system can have two levels of connections. On the first level, there are connections within each sub-network, and on the higher level, there are connections between the sub-networks. In my talk, I will introduce the join algebra as a way to model these multi-level systems. I will then discuss some basic structures of this algebra as well as various applications to the Kuramoto model.

Jiayi Li (UCLA)

Geometry of Polynomial Neural Networks

Polynomial neural networks achieved remarkable performance in practical tasks such as face detection, image generation, and forecasting trading signals. Besides its empirical success, the underlying theory remains an open field of research. We study the expressivity and the learning process of polynomial neural networks from an algebraic perspective. The functional space parameterized by the weights of a network is a semi-algebraic set. We refer to the functional space as the neuromanifold, and its Zariski closure as the neurovariety. We studied the dimension, defectiveness, and the learning degree of different network architectures, providing a theoretical understanding of the expressivity and optimization complexity of these architectures. Our results are accompanied by experiments. This is joint work with Kaie Kubjas and Maximilian Wiesmann.

Felix Rydell (KTH)

Euclidean Distance Degree Polynomials Associated to Families of Rational Maps

Reconstructing 3D objects such as points and lines based on their images onto several views is a fundamental problem in computer vision. Given an element in a multiview variety, defined informally as the set of all object matches for a given set of views, we seek to find the 3D object that projects onto them. However, in practice data comes with noise and therefore doesn't strictly lie on the multiview variety. We remedy this by first solving the Euclidean distance problem, i.e. fitting the data onto the multiview variety by finding the closest point on it. The complexity of this problem is often measured via the Euclidean distance degree (EDD), which is the number of complex critical points of this optimization problem. A recent example is the computation of the EDD of the point multiview variety, which turns out to be a polynomial in the number of factors of degree 3.

In our work, we investigate the following more general version of the problem: Consider a finite dimensional family of rational maps with fixed domain and codomain. The map sending a point in the domain to the image tuple defined by a number of generic maps in this family defines a sequence of varieties, one for each number of generic maps. We aim to show that under reasonable assumptions, the EDD of such sequences of varieties are polynomials in the number of maps of degree equal to the dimension of the domain.

Aravind Baskar (Notre Dame)

Optimal synthesis of robotic mechanisms using homotopy continuation

Kinematic synthesis aims to find the dimensions of a mechanism after desired constraints have been posed on its motion. For exact synthesis, the number of constraints and dimensional design variables are equal. For approximate synthesis, the former exceeds the latter. The current research tackles approximate synthesis via an optimization formulation that is invariant to the number of constraint specifications. The objective function, which is polynomial in nature, is a sum of squares of the error in the kinematic equations at the task specifications. Using the first-order necessary conditions of optimality, solving the optimization problem boils down to that of finding the roots of a system of polynomial equations. Although these systems are of a larger total degree than their respective exact synthesis systems, homotopy continuation techniques based on multi-homogeneous structure and monodromy-based methods can be used to solve these systems completely. The roots of such systems comprise all the critical points including minima, saddles, and maxima which characterize the design space. Since secondary considerations further dictate whether a minimum is feasible for a practical design, some low index saddles are also found to lead to suitable designs. Investigating gradient descent paths starting from these low index saddles into the adjoining minima leads to useful families of solutions. This provides the designer with parameterized sets of design candidates enabled by the computation of all possible critical points of the objective. Two design case studies, namely, the design of a humanoid finger and the design of a robot leg for energetic hops are presented.

Kaitlyn Phillipson (UW Madison)

What can algebra and geometry tell us about a stimulus space?

Neural activity can be stored as collections of binary strings, called neural codes, which can provide information about the stimulus that the brain is receiving. In 1971, O'Keefe and Dostrovksy [OD71] found hippocampal neurons, called place cells, which fire when a neuron is in a specific location in a stimulus space, called the receptive field of the corresponding place cell. When the animal leaves the receptive field, the neuron goes silent, and when the animal returns to the receptive field, the neuron begins firing again, thus making a memory map of the space. Furthermore, it was discovered that these receptive fields were roughly convex sets in the stimulus space. A natural mathematical question, then, is: can we determine whether or not a general neural code can represent a convex receptive field diagram from the combinatorial structure of the code itself? While this question is still open in general, this talk will introduce techniques from algebra and geometry that, when applied to this problem, give some results that partially answer this question.

Arthur Bik (IAS)

Strength of (infinite) polynomials

Usually, when presented with a huge set of data in the form of a tensor, one seeks to decompose this tensor as a sum of a low number of pure tensors in order to find hidden structure. While this often works in practice, it is not clear why such a thing should be possible. Define the strength of a tensor T as the minimal number r such that T can be written as a sum of r tensors with a rank-1 flattening. For this notion of rank, we have the following result: Any nontrivial property of tensors that is preserved under changing coordinates in each direction implies bounded strength (independent of the dimensions of the tensor). In other words, given a huge tensor with structure we can expect it to be have low strength. This motivates us to study this rank function. In this talk, we focus on the strength of homogeneous polynomials (i.e. symmetric tensors).

Ananyan and Hochster defined the strength (before that known as Schmidt rank) of a polynomial in their paper proving Stillman's conjecture and it has appeared in various works since. I will state what is known and not known about it. Afterwards, we go to the infinite setting, where I will discuss how strength and other notions derived from it can be used to understand when infinite polynomial series degenerate to each other.

Colby Long

Algebraic Invariants in Phylogenetics.

Phylogenetics is the study of the evolutionary relationships among organisms. A central challenge in the field is to use biological data for a given set of organisms in order to determine the evolutionary tree that describes their history. One approach for doing so is to build tree-based models of DNA sequence evolution and then to select the model that best fits the data according to some criteria. Once a model is selected, the tree parameter of the model is assumed to describe the history of the taxa under consideration. However, in order for such a method to be reliable, the tree parameter of the model must be identifiable. In this talk, we will discuss how phylogenetic invariants, algebraic relationships among DNA site-pattern frequencies, can be used to establish model identifiability. We’ll also discuss some invariants-based methods for phylogenetic inference.

Zvi Rosen

Neural Codes and Oriented Matroids

In the 1970's, neuroscientists discovered that the hippocampus of a rat contains a virtual map of its environment. When the rat is in region 1, neuron 1 fires an electric impulse, when it moves to region 2, neuron 2 fires, and in the intersection, both neurons fire. In the decades since, algebraists have begun to model this situation abstractly as follows: Fix n convex subsets of Euclidean space, representing stimuli; then, record all possible subsets of [n] whose intersection is nonempty and not covered by other sets. This combinatorial object is called a "convex neural code."

In this talk, we relate the emerging theory of convex neural codes to the established theory of oriented matroids, which generalize systems of signed linear relations. By way of this connection, we prove that all convex codes are related to some representable oriented matroid, and we show that deciding whether a neural code is convex is NP-hard.

Jiaxin Hu

Beyond matrices: Statistical learning with tensor data

Tensors efficiently represent the multiway data and serve as the foundation of higher-order data analysis in various fields such as social networks, neuroscience, and genetics studies. Identifying the low-dimensional representation from noisy tensor observations is the primary goal in many statistical learning problems including interpretable tensor decomposition, tensor regression, and multiway clustering. Though people may unfold tensors to matrices and apply the matrix methods, the unfolding strategy ignores the higher-order structure and leads to sub-optimal results in several tensor problems. The gap between matrix and tensor analyses motivates the development of tensor tools.

In this talk, I will focus on the statistical methods with higher-order tensors. First, I will introduce the basic tensor concepts and overview the low-dimensional tensor methods with various applications. Second, I will present my works on generalized tensor decomposition with multiple features and degree-corrected tensor block model for multiway clustering. We will see the difference between the matrix and tensor analyses from both modelling and theoretical aspects.

Hanbaek Lyu

Phase transition in contingency tables with two linear margins

Contingency tables are matrices of nonnegative integer entries with prescribed row sums and columns sums (margins). They are fundamental objects in statistics for studying dependence structure between two or more variables, and they also correspond to bipartite multi-graphs with given degrees and play an important role in combinatorics and graph theory. Both counting the number of contingency tables and sampling one uniformly at random are fundamental problems concerning contingency tables with many connections and applications to other fields (e.g., testing hypothesis on co-occurrence of species in ecology).

A historic guiding principle to these problems is the independence heuristic, which was introduced by I. J. Good as far back as in 1950. The heuristic states that the constraints for the rows and columns of the table are asymptotically independent as the size of the table grows to infinity. This yields a simple yet surprisingly accurate formula for the count of contingency tables as well as a sampling heuristic using the hypergeometric (or Fisher-Yates) distribution. In this talk, we will discuss that the independence heuristic captures only one part of the picture. Specifically, for n by n contingency tables with two linear margins $BCn$ and $Cn$, we establish that contingency tables exhibit a sharp phase transition at $B = B_{c} = 1+\sqrt{1+1/C}. This yields that the independence heuristic may capture the structure of contingency tables well for near-homogeneous margins (when $B<B_{c}$), but in if the margins are not quite homogeneous ($B>B_{c}$) positive correlation between rows and columns may emerge and the heuristic fails dramatically.

Kathryn Heal

A Lighting-Invariant Point Processor for Shape from Shading

Under the conventional diffuse shading model with unknown directional lighting, the set of quadratic surface shapes that are consistent with the spatial derivatives of intensity at a single image point is a two-dimensional algebraic variety embedded in the five-dimensional space of quadratic shapes. In this talk will describe the geometry of this variety, and introduce a concise feedforward model that computes an explicit, differentiable approximation of the variety from the intensity and its derivatives at any single image point. The result is a parallelizable processor that operates at each image point and produces a lighting-invariant descriptor of the continuous set of compatible surface shapes at the point. I will describe two applications of this processor: two-shot uncalibrated photometric stereo and quadratic-surface shape from shading. This talk will touch on both theory and application.

Abeer Al Ahmadieh

Determinantal Representations and the Image of the Principal Minor Map

The principal minor map takes an n by n square matrix to the length $2^n$-vector of its principal minors. A basic question is to give necessary and sufficient conditions that characterize the image of various spaces of matrices under this map. In this talk I will describe the image of the space of complex matrices using a characterization of determinantal representations of multiaffine polynomials, based on the factorization of their Rayleigh differences. Using these techniques I will give equations and inequalities characterizing the images of the spaces of real and complex symmetric and Hermitian matrices. For complex symmetric matrices this recovers a result of Oeding from 2011. This is based on a joint work with Cynthia Vinzant.

Daniel Pimentel Alarcon

Intersecting Schubert Varieties for Subspace Clustering

Subspace clustering aims to assign a collection of data points to the smallest possible union of subspaces. This is particularly challenging when data is missing. In this talk I will describe a new approach to overcome missing data. The main idea is to search for subspaces in the Grassmannian that simultaneously fit multiple incomplete-data points. This boils down to identifying the intersection of the Schubert varieties defined by the observed data, which is easier said than done. I will tell you what we have achieved so far, and the remaining challenges.

Chung Pang Mok

Pseudorandom Vectors Generation Using Elliptic Curves And Applications to Wiener Processes

Using the arithmetic of elliptic curves over finite fields, we present an algorithm for the efficient generation of sequence of uniform pseudorandom vectors in high dimension with long period, that simulates sample sequence of a sequence of independent identically distributed random variables, with values in the hypercube $[0,1]^d$ with uniform distribution. As an application, we obtain, in the discrete time simulation, an efficient algorithm to simulate, uniformly distributed sample path sequence of a sequence of independent standard Wiener processes.

Ola Sobieska

Identifiability of Linear Compartmental Models: The Impact of Removing Leaks and Edges

In biology, pharmacokinetics, and various other fields, linear compartmental models are used to represent the transfer of substances between compartments in a system, e.g. the interchange of a drug between organs in the body. Such a model is identifiable if we can recover the flow parameters from generic input and output data. In this talk, I analyze changes to identifiability resulting from deleting a pathway between two compartments or from adding or deleting an internal leak. In particular, I will present some criteria for identifiability dependant on the combinatorics of the parent model, as well as a restatement and partial answer to a conjecture of Gross, Harrington, Meshkat, and Shu. This is joint work with Patrick Chan, Katherine Johnston, Anne Shiu, and Clare Spinner.

Diego Cifuentes

Polynomial time guarantees for the Burer-Monteiro method

The Burer-Monteiro method is one of the most widely used techniques for solving large-scale semidefinite programs (SDP). The basic idea is to solve a nonconvex program in Y, where Y is an n×p matrix such that X = YYT. In this paper, we show that this method can solve SDPs in polynomial time in a smoothed analysis setting. More precisely, we consider an SDP whose domain satisfies some compactness and smoothness assumptions, and slightly perturb the cost matrix and the constraints. We show that if p ≳ √(2+2η)m, where m is the number of constraints and η>0 is any fixed constant, then the Burer-Monteiro method can solve SDPs to any desired accuracy in polynomial time, in the setting of smooth analysis. Our bound on p approaches the celebrated Barvinok-Pataki bound in the limit as η goes to zero, beneath which it is known that the nonconvex program can be suboptimal.

Papri Dey

Hyperbolic Polynomials, and Helton-Vinnikov curves

In this talk, I shall give a panoramic view of my research work. I shall introduce the notion of hyperbolic polynomials and discuss an algebraic method to test hyperbolicity of a multivariate polynomial w.r.t some fixed point via sum-of-squares relaxation, proposed in my research work. An important class of hyperbolic polynomials are definite determinantal polynomials. Helton-Vinnikov curves in the projective plane are cut-out by hyperbolic polynomials in three variables. This leads to the computational problem of explicitly producing a symmetric positive definite linear determinantal representation for a given curve. I shall focus on two approaches to this problem proposed in my research work: an algebraic approach via solving polynomial equations, and a geometric-combinatorial approach via scalar product representations of the coefficients and its connection with polytope membership problem. The algorithms to solve determinantal representation problems are implemented in Macaulay2 as a software package DeterminantalRepresentations. m2.

Angélica Torres

The line multi-view variety in computer vision

Given an arrangement of cameras, the line multi-view variety is the Zariski closure of the image of world lines in the cameras. This is a similar definition to the multi-view variety studied extensively in computer vision. The goal of this talk is to present a description of the ideal of the line multi-view variety, and discuss some geometric properties of this variety. This is joint work with Felix Rydell, Elima Shehu, Paul Breiding, and Kathlén Kohn.

Kevin Shu

Nonnegative Quadratics over Stanley Reisner Varieties

Nonnegative polynomials are of fundamental interest in the field of real algebraic geometry. We will discuss the convex geometry of nonnegative polynomials over an interesting class of algebraic varieties known as Stanley-Reisner varieties. Stanley-Reisner varieties are fairly simple, as they are unions of coordinate planes, but they have surprising connections to algebraic topology that we can use to obtain a detailed understanding of the convex geometry of these nonnegative quadratics. We will be particularly interested in the extreme nonnegative quadratic forms over such varieties. Though this will not be emphasized in the talk, this subject also has some direct applications to optimization theory.

Jesús A. De Loera

A Combinatorial Geometry View of Inference and Learning Problems

In statistical inference we wish to learn the properties or parameters of a distribution through sufficiently many samples. A famous example is logistic regression, a popular non-linear model in multivariate statistics and supervised learning. Users often rely on optimizing of maximum likelihood estimation, but how much training data do we need, as a function of the dimension of the covariates of the data, before we expect an MLE to exist with high probability? Similarly, for unsupervised learning and non-parametric statistics, one wishes to uncover the shape and patterns from samples of a measure or measures. We use only the intrinsic geometry and topology of the sample. A famous example of this type of method is the $k$-means clustering algorithm. A fascinating challenge is to explain the variability of behavior of $k$-means algorithms with distinct random initializations and the shapes of the clusters.

In this talk we present new stochastic combinatorial theorems, that give bounds on the probability of existence of maximum likelihood estimators in multinomial logistic regression and also relate to the variability of clustering initializations. Along the way we will see fascinating connections to the coupon collector problem, topological data analysis, and to the computation of Tukey centerpoints of data clouds (a high-dimensional generalization of median). This is joint work with T. Hogan, R. D. Oliveros, E. Jaramillo-Rodriguez, and A. Torres-Hernandez.

Timo de Wolff

Certificates of Nonnegativity and Maximal Mediated Sets

In science and engineering, we regularly face polynomial optimization problems, that is: minimize a real, multivariate polynomial under polynomial constraints. Solving these problems is essentially equivalent to certifying of nonnegativity of real polynomials -- a key problem in real algebraic geometry since the 19th century. Since this problem is notoriously hard to solve, one is interested in certificates that imply nonnegativity and are easier to check than nonnegativity itself. In particular, a polynomial is nonnegative if it is a sums of squares (SOS) of other polynomials. In 2014, Iliman and I introduced a new certificate of nonnegativity based on sums of nonnegative circuit polynomials (SONC), which I have developed further since then both in theory and practice joint with different coauthors.

A canonical question is: When is a nonnegative circuit polynomial also a sums of squares. It turns out that this question is purely combinatorial and is governed by sets of particular (integral) lattice points called maximal mediated sets, which where first introduced by Reznick. In this talk, I will give a brief introduction to nonnegativity, SOS, and SONC, and then discuss recent results on maximal mediated sets.

Jeroen Zuiddam

Geometric rank of tensors and applications

There are several important problems in computational complexity and extremal combinatorics that can naturally be phrased in terms of tensors. This includes the matrix multiplication problem, the cap set problem and the sunflower problem. We present a new geometric method in this area, called Geometric Rank (Kopparty–Moshkovitz–Zuiddam, 2020). Geometric Rank is a natural extension of matrix rank to tensors. As an application, we use Geometric Rank to solve a problem of Strassen about matrix multiplication tensors.

Anna Seigal

Invariant Theory for Maximum Likelihood Estimation

Maximum likelihood estimation is an optimization problem over a statistical model, to obtain the parameters that best fit observed data. I will focus on two settings: log-linear models and Gaussian group models. I will describe connections between maximum likelihood estimation and notions of stability from invariant theory. This talk is based on joint work with Carlos Améndola, Kathlén Kohn and Philipp Reichenbach.

Hamza Fawzi

Semidefinite representations, and the set of separable states

The convex set of separable states plays a fundamental role in quantum information theory as it corresponds to the set of non-entangled states. In this talk I will discuss the question of semidefinite programming representations for this convex set. Using connections with nonnegative polynomials and sums of squares I will show that except in the low-dimensions, this semialgebraic set has no semidefinite representation. As a consequence this set furnishes a counter-example to the Helton-Nie conjecture. The proof I will present is elementary and relies only on basic results about semialgebraic sets and functions.

James Saunderson

Lifting For Simplicity: Concise Descriptions of Convex Sets

This talk will give a selective overview of the theory and applications of lifts of convex sets. A lift of a convex set is a higher-dimensional convex set that projects onto the original set. Many interesting convex sets have lifts that are dramatically simpler to describe than the original set. Finding such simple lifts has significant algorithmic implications, particularly for associated optimization problems. We will consider both the classical case of polyhedral lifts, which are described by linear inequalities, as well as spectrahedral lifts, which are defined by linear matrix inequalities. The talk will include discussion of ways to construct lifts, ways to find obstructions to the existence of lifts, and a number of interesting examples from a variety of mathematical contexts. (Based on joint work with H. Fawzi, J. Gouveia, P. Parrilo, and R. Thomas).

Greg Blekherman

Locally Positive Semidefinite Matrices

The cone of positive semidefinite matrices plays a prominent role in optimization, and many hard computational problems have well-performing semidefinite relaxations. In practice, enforcing the constraint that a large matrix is positive semidefinite can be expensive. We introduce the cone of k-locally posiitive semidefinite matrices, which consists of matrices all of whose k by k principal submatrices are positive semidefinite. We consider the distance between the cones of positive and locally positive semidefinite matrices, and possible eigenvalues of locally positive semidefinite matrices. Hyperbolic polynomials play a role in some of the proofs. Joint work with Santanu Dey, Marco Molinaro, Kevin Shu and Shengding Sun.

Carla Michini

Short simplex paths in lattice polytopes

We consider the problem of optimizing a linear function over a lattice polytope P contained in [0,k]^n and defined via m linear inequalities. We design a simplex algorithm that, given an initial vertex, reaches an optimal vertex by tracing a path along the edges of P of length at most O(n^6 k log k). The length of this path is independent on m and is the best possible up to a polynomial function, since it is only polynomially far from the worst case diameter. The number of arithmetic operations needed to compute the next vertex in the path is polynomial in n, m and log k. If k is polynomially bounded by n and m, the algorithm runs in strongly polynomial time. This is a joint work with Alberto Del Pia.

Alisha Zachariah

Efficiently Estimating a Sparse Delay-Doppler Channel

Multiple wireless sensing tasks, e.g., radar detection for driver safety, involve estimating the ”channel” or relationship between signal transmitted and received. In this talk, I will focus on a certain type of channel known as the delay-doppler channel. This channel model starts to be applicable in high frequency carrier settings, which are increasingly common with recent developments in mmWave technology. Moreover, in this setting, both the channel model and existing technologies are amenable to working with signals of large bandwidth, and using such signals is a standard approach to achieving high resolution channel estimation. However, when high resolution is desirable, this approach creates a tension with the desire for efficiency because, in particular, it immediately implies that the signals in play live in a space of very high dimension N (e.g., ~10^6 in some applications), as per the Shannon-Nyquist sampling theorem.

To address this, I will propose a randomized algorithm for channel estimation in the k-sparse setting (e.g., k objects in radar detection), with sampling and space complexity both on the order of k(log N)^2, and arithmetic complexity on the order of k(log N)^3+k^2, for N sufficiently large.

While this algorithm seems to be extremely efficient -- to the best of our knowledge, the first of this nature in terms of complexity -- it is just a simple combination of three ingredients, two of which are well-known and widely used, namely digital chirp signals and discrete Gaussian filter functions, and the third being recent developments in Sparse Fast Fourier Transform algorithms.

Ruhui Jin (UW Madison)

Scalable symmetric Tucker tensor decomposition

In this talk, we will study the best low-rank Tucker decomposition of symmetric tensors. The main motivation is in decomposing higher-order multivariate moments, which has crucial applications in data science.

We will show the scalable adaptations of the projected gradient descent (PGD) and the higher-order eigenvalue decomposition (HOEVD) methods to decompose sample moment tensors. With the help of implicit and streaming techniques, we can evade the overhead cost of building and storing the moment tensor. Such reductions make computing the Tucker decomposition realizable for large data instances in high dimensions.

We will also demonstrate the efficiency of the algorithms and the applicability to real-world datasets. For convergence guarantee, the update sequence derived by the PGD solver achieves first and second-order criticality.

Max Hill (UW Madison)

An algebraic approach to understanding long branch attraction in phylogenetics

Reconstructing and understanding the Tree of Life is a central question biology, and while genetic data has proven to be an essential tool in studying the evolutionary relationships between living organisms, the "best" way to use such data to evolutionary trees is a major area of study. One commonly-used approach is maximum likelihood: one first assumes that the gene sequences evolved according to some model of evolution parameterized by an evolutionary tree parameter (representing a hypothesis about the species' evolutionary histories). Then, given some observed data, one can estimate the tree which maximizes the likelihood of the data under the model. However, sometimes this method fails catastrophically in ways that are not yet fully understood.

In this talk, I will present ongoing work with Jose Rodriguez investigating Long Branch Attraction (LBA), a phenomenon in which maximum likelihood incorrectly infers distantly related taxa to be much more closely related than they really are. We present an approach using phylogenetic invariants and homotopy continuation to numerically solve for the MLE in the simplest case of a four leaf tree, and use this to establish criteria for LBA susceptibility using Likelihood Ratios.

Fred Sala (UW Madison, CS)

Moment Techniques for Weakly-Supervised Learning

Manually labeling data is a major bottleneck for building large-scale training sets used for modern machine learning models. Weak supervision methods instead assemble a collection of cheap-but-noisy sources to estimate labels. These sources have unknown qualities and interactions, and therefore learning latent variable models to estimate their reliabilities is a useful goal. I will start by describing techniques that rely on algebraic relationships between moments to learn such models, focusing on identifiability, consistency, and finite-sample bounds. In the second part of the talk, I will discuss how to generalize such techniques to structured prediction cases where the label is an element of a structured space (for example, a ranking or a graph). I will introduce an approach to estimation that combines pseudo-Euclidean embeddings and tensor decomposition algorithms. Finally, I will describe some opportunities and outstanding challenges.

Marina Garrote López (UBC)

Using Algebraic and Semi-Algebraic conditions for Phylogenetic Reconstruction

Phylogenetic reconstruction aims to estimate the evolutionary relationships among different biological entities using only information from their genomes. Typically, DNA sequences are assumed to evolve according to a Markov process on a phylogenetic tree governed by a model of nucleotide substitutions. The distribution of nucleotide patterns in the leaves of a phylogenetic tree is represented by a vector whose entries satisfy certain algebraic relationships and can be expressed as polynomials over the model parameters. The study of these relationships and of the geometry of the algebraic varieties defined by them has provided successful insight into the problem of phylogenetic reconstruction.

However, not all points in the algebraic varieties have biological or probabilistic sense, and it is essential to study the subset of varieties that has biological meaning. In this talk we will emphasize the importance of studying this subsets, characterize them and prove that, in some cases, it is fundamental for phylogenetic reconstruction. We will discuss the use of maximum likelihood estimation, Euclidean distance minimization over the corresponding subset of the variety, and a new phylogenetic reconstruction method called ASAQ, based on some semi-algebraic conditions.

We will also present some results on the maximum likelihood degree and the Euclidean distance degree for phylogenetic models, which measure the complexity of optimization problems, and discuss their relationship and differences. Ultimately, this talk aims to provide insights into the problem of phylogenetic reconstruction and to show how algebraic and semi-algebraic methods can help in the development of more accurate and effective methods for this task.

Serkan Hosten (SFSU)

Solving Polynomial Systems for Optimization and Neural Networks

This talk has two parts. In the first part, I will tell about computing the degree of the central curve in linear, quadratic, and semidefinite programming problems arising in the context of interior point algorithms. I will present a new proof for this degree in the linear programming case based on counting the number of solutions to zero dimensional polynomial systems using the BKK bound. In the case of quadratic programming, a very similar system of polynomials gives an upper bound to the degree of the corresponding central curve.

In the second part, I will tell you about training deep linear neural networks and how solving systems of polynomials arise in this context. For networks with a single hidden layer trained on a single data point I will give an improved bound on the number of complex critical points of the loss function for the network. I will also show that for any number of hidden layers complex critical points with zero coordinates arise in certain patterns which we completely classify for networks with one hidden layer. I report the results of computational experiments with varying network architectures defining small deep linear networks using HomotopyContinuation.jl.

Elizabeth W. (Eliza) O'Reilly (Caltech)

Spectrahedral Regression

Convex regression is the problem of fitting a convex function to a collection of input-output pairs, and arises naturally in applications such as economics, engineering and computed tomography. We present a new approach to this problem called spectrahedral regression, in which we fit a spectrahedral function to the data, i.e. a function that is the maximum eigenvalue of an affine matrix expression of the input. This method generalizes polyhedral (also called max-affine) regression, in which a maximum of a fixed number of affine functions is fit to the data. We first provide bounds on how well spectrahedral functions can approximate arbitrary convex functions via statistical risk analysis. Second, we analyze an alternating minimization algorithm for the non-convex optimization problem of fitting a spectrahedral function to data. Finally, we demonstrate the utility of our approach with experiments on synthetic and real data sets. This talk is based on joint work with Venkat Chandrasekaran.

Alberto Del Pia (UW Madison, I&SE)

Minimizing quadratics over integers

Mixed integer quadratic programming is the problem of minimizing a quadratic polynomial over points in a polyhedral region with some integer components. It is a natural extension of mixed integer linear programming and it has a wide array of applications. In this talk, I will survey some recent theoretical developments in mixed integer quadratic programming, with a focus on complexity, algorithms, and fundamental properties.

Related events to note

date	event/title	location/speaker	info
Friday at 2:30pm	Algebra+Algebraic Geometry Seminar	Van Vleck	Mailing list
Sporadically	Machine Assistance for Algebra	UW Madison
Biweekly Mondays	Algebraic Statistics Online Seminar (ASOS)	Virtual: Recordings	Mailing list sign-up for Zoom-links
Monthly	Fellowship of the Ring	Online (Utah)

@@ Line 11: / Line 11: @@
 '''How''':  Talks are in-person talks and 40-50 minutes long.
-'''Inclusiveness''':  We  welcome participants with applied algebra interests from other institutions and departments. We have social time 5-10 minutes before the talk starts.
+'''Inclusivity''':  We  welcome participants with applied algebra interests from other institutions and departments. We have social time 5-10 minutes before the talk starts.
 == Fall 2024 Schedule ==
@@ Line 81: / Line 81: @@
 |VV 901
 |[https://yuliaalexandr.github.io/ Yulia Alexandr]
-|Mixtures of Discrete Decomposable Graphical Models
+|[[Applied Algebra Seminar Fall 2024#Yulia Alexandr|Mixtures of Discrete Decomposable Graphical Models]]
 |-
 |Nov 14*
 |VV 901
 |[https://web.eecs.umich.edu/~girasole/ Laura Balzano]
-|TBD (Likely related to low rank compression and varieties)
+|Efficient Low-Dimensional Compression for Deep Overparameterized Learning and Fine-Tuning
 |-
 |Nov 21
 |VV 901
-|Reserved for Jordan's visitor
+|Eitan Levin
+|Any-dimensional optimization
+|-
+|Spring 2025
 |
-|-
-|Dec 12
-|VV 901
 |Aviva Englander
 |Identifiability of Level-2 Networks
@@ Line 101: / Line 101: @@
 |
 |[https://www.hannahyfriedman.com/ Hannah Friedman]
+|TBD
+|-
+|Spring 2025
 |
+|[https://www.yingyingwu.io/ Yingying Wu]
+|TBD
 |}
 ==Abstracts (2024 Fall)==
+===Yulia Alexandr===
+'''Mixtures of Discrete Decomposable Graphical Models'''
+Graphical models capture relationships between random variables, with nodes representing variables and edges indicating dependencies. Their natural graph-based parametrization makes them useful in statistics, biology, and phylogenetics. In practice, one often encounters latent variables that are impossible or infeasible to measure directly, which can lead to geometric challenges. Mixture models, featuring only one hidden variable, are among the most accessible latent variable models. I will talk about mixtures of special families of discrete graphical models, called decomposable. I will focus on their algebraic properties, such as polynomial constraints that distributions in such models must satisfy. This talk is based on joint work with Jane Ivy Coons and Nils Sturma.
+=== Laura Balzano ===
+'''Efficient Low-Dimensional Compression for Deep Overparameterized Learning and Fine-Tuning'''
+While overparameterization in machine learning models offers great benefits in terms of optimization and generalization, it also leads to increased computational requirements as model sizes grow. In this work, we show that leveraging inherent low-dimensional structure within the model parameter updates, we can reap the benefits of overparameterization without the computational burden. We study two problems: deep low-rank matrix completion and low-rank fine-tuning of deep networks. First, we develop theory showing that when training the parameters of a deep linear network to fit a low-rank or wide matrix, the gradient dynamics of each weight matrix are confined to a fixed low-dimensional subspace. Consequently, we can construct and train compact, highly compressed factorizations possessing the same benefits as their overparameterized counterparts. For language model fine-tuning, we introduce a method called “Deep LoRA”, which improves the existing low-rank adaptation (LoRA) technique. While this technique does not arise directly from our theory, it involves only a minor modification that is surprisingly effective and of great interest for future theoretical study. The effectiveness of Deep LoRA is validated through its performance on natural language understanding tasks, particularly when fine-tuning with a limited number of samples.
 === Emily King ===
@@ Line 142: / Line 156: @@
 The answer to this identifiability question is "no". We quantify the extent to which identifiability fails by studying the degree of a corresponding branched cover and its Galois group. We show that there is no general formula in terms of radicals expressing the lengths in terms of the areas. Finally, we extend these results to other recovery problems on volumes of simplices, encoded by the algebraic matroid of the Heron variety.
+=== Eitan Levin (Caltech) ===
+'''Any-dimensinal optimization'''
+Optimization problems in many applications arise as sequences indexed by dimension. Indeed, graph problems are defined for graphs of all sizes, regularizers for inverse problems are defined for signals and images of any size, and (quantum) information problems are defined for distributions over any number of (qu)bits. We present a framework to systematically study such sequences of optimization problems, fit tractable data-driven relaxations, and derive lower bounds on their limiting values. We do so by observing that these problems are described in a “free” way, which makes it obvious how to instantiate them in any dimension. We explain how such free descriptions arise from the phenomenon of representation stability, and study their properties by considering the relations between problems across dimensions and their symmetries.
 ==Earlier Abstracts==