# Difference between revisions of "Probability Seminar"

(145 intermediate revisions by 8 users not shown) | |||

Line 1: | Line 1: | ||

__NOTOC__ | __NOTOC__ | ||

+ | [[Probability | Back to Probability Group]] | ||

− | = Fall | + | = Fall 2022 = |

− | <b>Thursdays in 901 Van Vleck Hall | + | <b>Thursdays at 2:30 PM either in 901 Van Vleck Hall or on Zoom</b> |

− | |||

− | + | We usually end for questions at 3:20 PM. | |

+ | |||

+ | [https://uwmadison.zoom.us/j/91828707031?pwd=YUJXMUJkMDlPR0VRdkRCQVJtVndIdz09 ZOOM LINK. Valid only for online seminars.] | ||

If you would like to sign up for the email list to receive seminar announcements then please join [https://groups.google.com/a/g-groups.wisc.edu/forum/#!forum/probsem our group]. | If you would like to sign up for the email list to receive seminar announcements then please join [https://groups.google.com/a/g-groups.wisc.edu/forum/#!forum/probsem our group]. | ||

+ | |||

+ | |||

+ | == September 22, 2022, in person: [https://sites.google.com/site/pierreyvesgl/home Pierre Yves Gaudreau Lamarre] (University of Chicago) == | ||

+ | |||

+ | '''Moments of the Parabolic Anderson Model with Asymptotically Singular Noise''' | ||

+ | |||

+ | The Parabolic Anderson Model (PAM) is a stochastic partial differential equation that describes the time-evolution of a particle system with the following dynamics: Each particle in the system undergoes a diffusion in space, and as they are moving through space, the particles can either multiply or get killed at a rate that depends on a random environment. | ||

+ | |||

+ | One of the fundamental problems in the theory of the PAM is to understand its behavior at large times. More specifically, the solution of the PAM at large times tends to be intermittent, meaning that most of the particles concentrate in small regions where the environment is most favorable for particle multiplication. | ||

− | == September | + | In this talk, we discuss a new technique to study intermittency in the PAM with a singular random environment. In short, the technique consists of approximating the singular PAM with a regularized version that becomes increasingly singular as time goes to infinity. |

+ | |||

+ | This talk is based on joint work with Promit Ghosal and Yuchen Liao. | ||

+ | |||

+ | == September 29, 2022, in person: Christian Gorski (Northwestern University) == | ||

+ | |||

+ | '''Strict monotonicity for first passage percolation on graphs of polynomial growth and quasi-trees''' | ||

+ | |||

+ | I'll present strict monotonicity results for first passage percolation (FPP) on bounded degree graphs which either have strict polynomial growth (uniform upper and lower volume growth bounds of the same polynomial degree) or are quasi-isometric to a tree; the case of the standard Cayley graph of Z^d is due to van den Berg and Kesten (1993). Roughly speaking, if we use two different weight distributions to perform FPP on a fixed graph, and one of the distributions is "larger" than the other and "subcritical" in some appropriate sense, then the expected passage times with respect to that distribution exceed those of the other distribution by an amount proportional to the graph distance. | ||

+ | If "larger" here refers to stochastic domination of measures, this result is closely related to "absolute continuity with respect to the expected empirical measure," that is, the fact that long geodesics "use all possible weights". If "larger" here refers to variability (another ordering on measures), then a strict monotonicity theorem holds if and only if the graph also satisfies a condition we call "admitting detours". I intend to sketch the proof of absolute continuity, and, if time allows, give some indication of the difficulties that arise when proving strict monotonicity with respect to variability. | ||

+ | |||

+ | == October 6, 2022, in person: [https://danielslonim.github.io/ Daniel Slonim] (University of Virginia) == | ||

+ | |||

+ | '''Random Walks in (Dirichlet) Random Environments with Jumps on Z''' | ||

+ | |||

+ | We introduce the model of random walks in random environments (RWRE), which are random Markov chains on the integer lattice. These random walks are well understood in the nearest-neighbor, one-dimensional case due to reversibility of almost every Markov chain. For example, directional transience and limiting speed can be characterized in terms of simple expectations involving the transition probabilities at a single site. The reversibility is lost, however, if we go up to higher dimensions or relax the nearest-neighbor assumption by allowing jumps, and therefore much less is known in these models. Despite this non-reversibility, certain special cases have proven to be more tractable. Random Walks in Dirichlet environments (RWDE), where the transition probability vectors are drawn according to a Dirichlet distribution, have been fruitfully studied in the nearest-neighbor, higher dimensional setting. We look at RWDE in one dimension with jumps and characterize when the walk is ballistic: that is, when it has non-zero limiting velocity. It turns out that in this model, there are two factors which can cause a directionally transient walk to have zero limiting speed: finite trapping and large-scale backtracking. Finite trapping involves finite subsets of the graph where the walk is liable to get trapped for a long time. It is a highly local phenomenon that depends heavily on the structure of the underlying graph. Large-scale backtracking is a more global and one-dimensional phenomenon. The two operate "independently" in the sense that either can occur with or without the other. Moreover, if neither factor on its own is enough to cause zero speed, then the walk is ballistic, so the two factors cannot conspire together to slow a walk down to zero speed if neither is sufficient to do so on its own. This appearance of two independent factors affecting ballisticity is a new feature not seen in any previously studied RWRE models. | ||

+ | |||

+ | == October 13, 2022, [https://uwmadison.zoom.us/j/91828707031?pwd=YUJXMUJkMDlPR0VRdkRCQVJtVndIdz09 ZOOM]: [https://www.maths.univ-evry.fr/pages_perso/loukianova/ Dasha Loukianova] (Université d'Évry Val d'Essonne) == | ||

+ | |||

+ | '''"In law" ergodic theorem for the environment viewed from Sinaï's walk''' | ||

+ | |||

+ | For Sinaï's walk <math>\scriptsize(X_k)</math> we show that the empirical measure of the environment seen from the particle converges in law to some random measure. This limit measure is explicitly given in terms of the infinite valley, whose construction goes back to Golosov. As a consequence an "in law" ergodic theorem holds for additive functionals of the environment's chain. When the limit in this theorem is deterministic, it holds in probability. This allows some extensions to the recurrent case of the ballistic "environment's method" dating back to Kozlov and Molchanov. In particular, we show an LLN and a mixed CLT for the sums <math>\scriptsize\sum_{k=1}^nf(\Delta X_k)</math> where <math>\scriptsize f</math> is bounded and | ||

+ | depends on the steps <math>\scriptsize\Delta X_k:=X_{k+1}-X_k</math>. | ||

− | ''' | + | == October 20, 2022, '''4pm, VV911''', in person: [https://tavarelab.cancerdynamics.columbia.edu/ Simon Tavaré] (Columbia University) == |

+ | ''Note the unusual time and room!'' | ||

− | ''' | + | '''An introduction to counts-of-counts data''' |

− | + | Counts-of-counts data arise in many areas of biology and medicine, and have been studied by statisticians since the 1940s. One of the first examples, discussed by R. A. Fisher and collaborators in 1943 [1], concerns estimation of the number of unobserved species based on summary counts of the number of species observed once, twice, … in a sample of specimens. The data are summarized by the numbers ''C<sub>1</sub>, C<sub>2</sub>, …'' of species represented once, twice, … in a sample of size | |

− | ''' | + | ''N = C<sub>1</sub> + 2 C<sub>2</sub> + 3 C<sub>3</sub> + <sup>….</sup>'' containing ''S = C<sub>1</sub> + C<sub>2</sub> + <sup>…</sup>'' species; the vector ''C ='' ''(C<sub>1</sub>, C<sub>2</sub>, …)'' gives the counts-of-counts. Other examples include the frequencies of the distinct alleles in a human genetics sample, the counts of distinct variants of the SARS-CoV-2 S protein obtained from consensus sequencing experiments, counts of sizes of components in certain combinatorial structures [2], and counts of the numbers of SNVs arising in one cell, two cells, … in a cancer sequencing experiment. |

− | ''' | + | In this talk I will outline some of the stochastic models used to model the distribution of ''C,'' and some of the inferential issues that come from estimating the parameters of these models. I will touch on the celebrated Ewens Sampling Formula [3] and Fisher’s multiple sampling problem concerning the variance expected between values of ''S'' in samples taken from the same population [3]. Variants of birth-death-immigration processes can be used, for example when different variants grow at different rates. Some of these models are mechanistic in spirit, others more statistical. For example, a non-mechanistic model is useful for describing the arrival of covid sequences at a database. Sequences arrive one at a time, and are either a new variant, or a copy of a variant that has appeared before. The classical Yule process with immigration provides a starting point to model this process, as I will illustrate. |

− | |||

− | + | ''References'' | |

− | + | [1] Fisher RA, Corbet AS & Williams CB. J Animal Ecology, 12, 1943 | |

− | + | [2] Arratia R, Barbour AD & Tavaré S. ''Logarithmic Combinatorial Structures,'' EMS, 2002 | |

− | + | [3] Ewens WJ. Theoret Popul Biol, 3, 1972 | |

− | + | [4] Da Silva P, Jamshidpey A, McCullagh P & Tavaré S. Bernoulli Journal, in press, 2022 (online) | |

− | + | == October 27, 2022, [https://uwmadison.zoom.us/j/91828707031?pwd=YUJXMUJkMDlPR0VRdkRCQVJtVndIdz09 ZOOM]: [https://www-users.cse.umn.edu/~arnab/ Arnab Sen] (University of Minnesota, Twin Cities) == | |

− | + | '''Maximum weight matching on sparse graphs''' | |

− | + | We consider the maximum weight matching of a finite bounded degree graph whose edges have i.i.d. random weights. It is natural to ask whether the weight of the maximum weight matching follows a central limit theorem. We obtain an affirmative answer to the above question in the case when the weight distribution is exponential and the graphs are locally tree-like. The key component of the proof involves a cavity analysis on arbitrary bounded degree trees which yields a correlation decay for the maximum weight matching. The central limit theorem holds if we take the underlying graph to also be random, with an i.i.d. degree distribution (configuration model). |

− | + | This is joint work with Wai-Kit Lam. | |

− | == | + | == November 3, 2022, in person: [https://www.ias.edu/scholars/sky-yang-cao Sky Cao] (Institute for Advanced Study) == |

− | + | '''Exponential decay of correlations in finite gauge group lattice gauge theories''' | |

− | + | Lattice gauge theories with finite gauge groups are statistical mechanical models, very much akin to the Ising model, but with some twists. In this talk, I will describe how to show exponential decay of correlations for these models at low temperatures. This is based on joint work with Arka Adhikari. | |

− | == | + | == November 10, 2022, in person: [https://ifds.info/david-clancy/ David Clancy] (UW-Madison) == |

− | + | '''Component Sizes of the degree corrected stochastic blockmodel''' | |

− | + | The stochastic blockmodel (SBM) is a simple probabilistic model for graphs which exhibit clustering and is used to test algorithms for detecting these clusters. Each vertex is assigned a type ''i = 1, 2, ..., m'' and edges are included independently with probability depending on the types of the two incident vertices. The degree corrected SBM (DCSBM) exhibits similar clustering behavior but allows for inhomogeneous degree distributions. The sizes of connected components for these graph models are not well understood unless ''m = 1'' or the SBM is a random bipartite graph. We show that under fairly general conditions, the asymptotic sizes of connected components in the DCSBM can be precisely described in terms of a multiparameter and multidimensional random field. Not only that, but we describe the asymptotic proportion of vertices of each type in each of the macroscopic connected components. This talk is based on joint work with Vitalii Konarovskyi and Vlada Limic. | |

− | == November | + | == November 17, 2022, [https://uwmadison.zoom.us/j/91828707031?pwd=YUJXMUJkMDlPR0VRdkRCQVJtVndIdz09 ZOOM]: [https://sites.google.com/site/leandroprpimentel/ Leandro Pimentel] (Federal University of Rio de Janeiro) == |

− | + | '''Integration by Parts and the KPZ Two-Point Function''' | |

− | + | In this talk we will consider two models within the Kardar-Parisi-Zhang (KPZ) universality class, and apply the integration by parts formula from Malliavin calculus to establish a key relation between the two-point correlation function, the polymer end-point distribution and the second derivative of the variance of the associated height function. Besides that, we will further develop an adaptation of the Malliavin-Stein method that quantifies asymptotic independence with respect to the initial data. |

− | == | + | == December 1, 2022, in person: [https://cims.nyu.edu/~ajd594/ Alex Dunlap] (Courant Institute) == |

− | |||

− | + | == December 8, 2022, in person: [https://sites.northwestern.edu/juliagaudio/ Julia Gaudio] (Northwestern University) == | |

+ | |||

+ | '''Finding Communities in Networks''' | ||

+ | |||

+ | Networks are used to represent physical, biological, and social systems. Many networks exhibit community structure, meaning that there are two or more groups of nodes which are densely connected. Identifying these communities gives valuable insights about the latent features of the nodes. Community detection has been used in a wide array of applications including online advertising, recommender systems (e.g., Netflix), webpage sorting, fraud detection, and neurobiology. | ||

+ | |||

+ | I will present my work on efficient algorithms for community detection in three contexts. <br> | ||

+ | (1) Censored networks: How can we identify communities when some connectivity information is missing? <br> | ||

+ | (2) Higher-order networks: Beyond pairwise relationships <br> | ||

+ | (3) Multiple correlated networks: How can we effectively combine data from multiple networks? <br> | ||

+ | |||

+ | Joint work with: Souvik Dhara, Nirmit Joshi, Elchanan Mossel, Miklós Rácz, Colin Sandon, and Anirudh Sridhar | ||

[[Past Seminars]] | [[Past Seminars]] |

## Latest revision as of 09:46, 25 November 2022

# Fall 2022

**Thursdays at 2:30 PM either in 901 Van Vleck Hall or on Zoom**

We usually end for questions at 3:20 PM.

ZOOM LINK. Valid only for online seminars.

If you would like to sign up for the email list to receive seminar announcements then please join our group.

## September 22, 2022, in person: Pierre Yves Gaudreau Lamarre (University of Chicago)

**Moments of the Parabolic Anderson Model with Asymptotically Singular Noise**

The Parabolic Anderson Model (PAM) is a stochastic partial differential equation that describes the time-evolution of a particle system with the following dynamics: Each particle in the system undergoes a diffusion in space, and as they are moving through space, the particles can either multiply or get killed at a rate that depends on a random environment.

One of the fundamental problems in the theory of the PAM is to understand its behavior at large times. More specifically, the solution of the PAM at large times tends to be intermittent, meaning that most of the particles concentrate in small regions where the environment is most favorable for particle multiplication.

In this talk, we discuss a new technique to study intermittency in the PAM with a singular random environment. In short, the technique consists of approximating the singular PAM with a regularized version that becomes increasingly singular as time goes to infinity.

This talk is based on joint work with Promit Ghosal and Yuchen Liao.

## September 29, 2022, in person: Christian Gorski (Northwestern University)

**Strict monotonicity for first passage percolation on graphs of polynomial growth and quasi-trees**

I'll present strict monotonicity results for first passage percolation (FPP) on bounded degree graphs which either have strict polynomial growth (uniform upper and lower volume growth bounds of the same polynomial degree) or are quasi-isometric to a tree; the case of the standard Cayley graph of Z^d is due to van den Berg and Kesten (1993). Roughly speaking, if we use two different weight distributions to perform FPP on a fixed graph, and one of the distributions is "larger" than the other and "subcritical" in some appropriate sense, then the expected passage times with respect to that distribution exceed those of the other distribution by an amount proportional to the graph distance. If "larger" here refers to stochastic domination of measures, this result is closely related to "absolute continuity with respect to the expected empirical measure," that is, the fact that long geodesics "use all possible weights". If "larger" here refers to variability (another ordering on measures), then a strict monotonicity theorem holds if and only if the graph also satisfies a condition we call "admitting detours". I intend to sketch the proof of absolute continuity, and, if time allows, give some indication of the difficulties that arise when proving strict monotonicity with respect to variability.

## October 6, 2022, in person: Daniel Slonim (University of Virginia)

**Random Walks in (Dirichlet) Random Environments with Jumps on Z**

We introduce the model of random walks in random environments (RWRE), which are random Markov chains on the integer lattice. These random walks are well understood in the nearest-neighbor, one-dimensional case due to reversibility of almost every Markov chain. For example, directional transience and limiting speed can be characterized in terms of simple expectations involving the transition probabilities at a single site. The reversibility is lost, however, if we go up to higher dimensions or relax the nearest-neighbor assumption by allowing jumps, and therefore much less is known in these models. Despite this non-reversibility, certain special cases have proven to be more tractable. Random Walks in Dirichlet environments (RWDE), where the transition probability vectors are drawn according to a Dirichlet distribution, have been fruitfully studied in the nearest-neighbor, higher dimensional setting. We look at RWDE in one dimension with jumps and characterize when the walk is ballistic: that is, when it has non-zero limiting velocity. It turns out that in this model, there are two factors which can cause a directionally transient walk to have zero limiting speed: finite trapping and large-scale backtracking. Finite trapping involves finite subsets of the graph where the walk is liable to get trapped for a long time. It is a highly local phenomenon that depends heavily on the structure of the underlying graph. Large-scale backtracking is a more global and one-dimensional phenomenon. The two operate "independently" in the sense that either can occur with or without the other. Moreover, if neither factor on its own is enough to cause zero speed, then the walk is ballistic, so the two factors cannot conspire together to slow a walk down to zero speed if neither is sufficient to do so on its own. This appearance of two independent factors affecting ballisticity is a new feature not seen in any previously studied RWRE models.

## October 13, 2022, ZOOM: Dasha Loukianova (Université d'Évry Val d'Essonne)

**"In law" ergodic theorem for the environment viewed from Sinaï's walk**

For Sinaï's walk $(X_k)$ we show that the empirical measure of the environment seen from the particle converges in law to some random measure. This limit measure is explicitly given in terms of the infinite valley, whose construction goes back to Golosov. As a consequence an "in law" ergodic theorem holds for additive functionals of the environment's chain. When the limit in this theorem is deterministic, it holds in probability. This allows some extensions to the recurrent case of the ballistic "environment's method" dating back to Kozlov and Molchanov. In particular, we show an LLN and a mixed CLT for the sums $\sum_{k=1}^n f(\Delta X_k)$ where $f$ is bounded and depends on the steps $\Delta X_k := X_{k+1} - X_k$.

## October 20, 2022, **4pm, VV911**, in person: Simon Tavaré (Columbia University)

*Note the unusual time and room!*

**An introduction to counts-of-counts data**

Counts-of-counts data arise in many areas of biology and medicine, and have been studied by statisticians since the 1940s. One of the first examples, discussed by R. A. Fisher and collaborators in 1943 [1], concerns estimation of the number of unobserved species based on summary counts of the number of species observed once, twice, … in a sample of specimens. The data are summarized by the numbers $C_1, C_2, \ldots$ of species represented once, twice, … in a sample of size $N = C_1 + 2C_2 + 3C_3 + \cdots$ containing $S = C_1 + C_2 + \cdots$ species; the vector $C = (C_1, C_2, \ldots)$ gives the counts-of-counts. Other examples include the frequencies of the distinct alleles in a human genetics sample, the counts of distinct variants of the SARS-CoV-2 S protein obtained from consensus sequencing experiments, counts of sizes of components in certain combinatorial structures [2], and counts of the numbers of SNVs arising in one cell, two cells, … in a cancer sequencing experiment.

In this talk I will outline some of the stochastic models used to model the distribution of $C$, and some of the inferential issues that come from estimating the parameters of these models. I will touch on the celebrated Ewens Sampling Formula [3] and Fisher’s multiple sampling problem concerning the variance expected between values of $S$ in samples taken from the same population [3]. Variants of birth-death-immigration processes can be used, for example when different variants grow at different rates. Some of these models are mechanistic in spirit, others more statistical. For example, a non-mechanistic model is useful for describing the arrival of covid sequences at a database. Sequences arrive one at a time, and are either a new variant, or a copy of a variant that has appeared before. The classical Yule process with immigration provides a starting point to model this process, as I will illustrate.
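As an illustration of the counts-of-counts summary described in this abstract, here is a minimal Python sketch. It is not taken from the talk: the function name `counts_of_counts` and the toy sample are hypothetical. It tallies how many species appear exactly once, twice, and so on, then recovers the sample size N and the number of species S from those tallies.

```python
from collections import Counter

def counts_of_counts(sample):
    """Return {j: C_j}, where C_j is the number of species seen exactly j times."""
    per_species = Counter(sample)                 # species -> number of specimens
    return dict(Counter(per_species.values()))    # multiplicity j -> C_j

# Hypothetical sample of specimens, each labelled by its species.
sample = ["a", "b", "a", "c", "d", "a", "b", "e"]

C = counts_of_counts(sample)
N = sum(j * c for j, c in C.items())  # N = C_1 + 2 C_2 + 3 C_3 + ...
S = sum(C.values())                   # S = C_1 + C_2 + ...
```

By construction, N equals the number of specimens in the sample and S equals the number of distinct species observed.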

*References*

[1] Fisher RA, Corbet AS & Williams CB. J Animal Ecology, 12, 1943

[2] Arratia R, Barbour AD & Tavaré S. *Logarithmic Combinatorial Structures,* EMS, 2002

[3] Ewens WJ. Theoret Popul Biol, 3, 1972

[4] Da Silva P, Jamshidpey A, McCullagh P & Tavaré S. Bernoulli Journal, in press, 2022 (online)

## October 27, 2022, ZOOM: Arnab Sen (University of Minnesota, Twin Cities)

**Maximum weight matching on sparse graphs**

We consider the maximum weight matching of a finite bounded degree graph whose edges have i.i.d. random weights. It is natural to ask whether the weight of the maximum weight matching follows a central limit theorem. We obtain an affirmative answer to the above question in the case when the weight distribution is exponential and the graphs are locally tree-like. The key component of the proof involves a cavity analysis on arbitrary bounded degree trees which yields a correlation decay for the maximum weight matching. The central limit theorem holds if we take the underlying graph to also be random, with an i.i.d. degree distribution (configuration model).

This is joint work with Wai-Kit Lam.

## November 3, 2022, in person: Sky Cao (Institute for Advanced Study)

**Exponential decay of correlations in finite gauge group lattice gauge theories**

Lattice gauge theories with finite gauge groups are statistical mechanical models, very much akin to the Ising model, but with some twists. In this talk, I will describe how to show exponential decay of correlations for these models at low temperatures. This is based on joint work with Arka Adhikari.

## November 10, 2022, in person: David Clancy (UW-Madison)

**Component Sizes of the degree corrected stochastic blockmodel**

The stochastic blockmodel (SBM) is a simple probabilistic model for graphs which exhibit clustering and is used to test algorithms for detecting these clusters. Each vertex is assigned a type *i = 1, 2, ..., m* and edges are included independently with probability depending on the types of the two incident vertices. The degree corrected SBM (DCSBM) exhibits similar clustering behavior but allows for inhomogeneous degree distributions. The sizes of connected components for these graph models are not well understood unless *m = 1* or the SBM is a random bipartite graph. We show that under fairly general conditions, the asymptotic sizes of connected components in the DCSBM can be precisely described in terms of a multiparameter and multidimensional random field. Not only that, but we describe the asymptotic proportion of vertices of each type in each of the macroscopic connected components. This talk is based on joint work with Vitalii Konarovskyi and Vlada Limic.
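For readers unfamiliar with the model, the following Python sketch samples a plain SBM as defined in this abstract and lists its connected component sizes. It is an illustration only, not the speaker's method: it does not include the degree correction, the helper names `sample_sbm` and `component_sizes` are hypothetical, types are indexed from 0 rather than 1, and the two-type edge probabilities are arbitrary choices.

```python
import random
from collections import Counter

def sample_sbm(types, p, rng):
    """Include each pair (u, v) independently with probability p[types[u]][types[v]]."""
    n = len(types)
    return [(u, v)
            for u in range(n)
            for v in range(u + 1, n)
            if rng.random() < p[types[u]][types[v]]]

def component_sizes(n, edges):
    """Sizes of the connected components (largest first), computed with union-find."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return sorted(Counter(find(x) for x in range(n)).values(), reverse=True)

# Hypothetical example: 200 vertices, two types, denser within types than across.
rng = random.Random(0)
types = [i % 2 for i in range(200)]
p = [[0.02, 0.001],
     [0.001, 0.02]]
edges = sample_sbm(types, p, rng)
sizes = component_sizes(len(types), edges)
```

The pairwise sampling loop costs on the order of n² operations, which is fine for small experiments but would need a sparser sampling scheme for large graphs.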

## November 17, 2022, ZOOM: Leandro Pimentel (Federal University of Rio de Janeiro)

**Integration by Parts and the KPZ Two-Point Function**

In this talk we will consider two models within the Kardar-Parisi-Zhang (KPZ) universality class, and apply the integration by parts formula from Malliavin calculus to establish a key relation between the two-point correlation function, the polymer end-point distribution and the second derivative of the variance of the associated height function. Besides that, we will further develop an adaptation of the Malliavin-Stein method that quantifies asymptotic independence with respect to the initial data.

## December 1, 2022, in person: Alex Dunlap (Courant Institute)

## December 8, 2022, in person: Julia Gaudio (Northwestern University)

**Finding Communities in Networks**

Networks are used to represent physical, biological, and social systems. Many networks exhibit community structure, meaning that there are two or more groups of nodes which are densely connected. Identifying these communities gives valuable insights about the latent features of the nodes. Community detection has been used in a wide array of applications including online advertising, recommender systems (e.g., Netflix), webpage sorting, fraud detection, and neurobiology.

I will present my work on efficient algorithms for community detection in three contexts.

(1) Censored networks: How can we identify communities when some connectivity information is missing?

(2) Higher-order networks: Beyond pairwise relationships

(3) Multiple correlated networks: How can we effectively combine data from multiple networks?

Joint work with: Souvik Dhara, Nirmit Joshi, Elchanan Mossel, Miklós Rácz, Colin Sandon, and Anirudh Sridhar