Algebraic Statistics Seminar, Fall 2013
Date & Time Location Speaker/Title
E1 102 The seminar/reading group usually meets Thursdays 4:35pm.
The Fall 2013 seminar will begin the week September 9th.
Monday, Sept 9,
4:40 pm
LS 152 Mustafa Bilgic, IIT Computer Science
Active Learning: Past, Present, and Future
Applied Mathematics Colloquium

A fundamental task of machine learning is prediction, where a model is built using existing input-output pairs, and then it is applied to future instances where the input is known but output is not. Examples include spam detection, sentiment analysis, and movie recommendation. Constructing enough training data for predictive models is a tedious and costly process where expert and user feedback is needed: emails need to be classified as spam/ham, phrases in reviews need to be tagged as positive/negative, and movies need to be rated. Active learning is the subfield of machine learning that aims to train an accurate model with minimal expert and user feedback.

Active learning has been studied in the past two decades and many methods have been developed. In this talk, I will provide a survey of the most-frequently utilized active learning strategies. In addition to providing theoretical background, I will discuss results of an extensive empirical study highlighting strengths and weaknesses of these strategies. I will conclude with current and future research trends with an example applied to homophilic networks.

Thursday, Sept 12,
4:35 pm
E1 102 Sonja Petrović, IIT Applied Math
Introduction to hypothesis testing for contingency tables. (Based on the material from sections 1.1 and 1.2 from ``Lectures on Algebraic Statistics"; book available electronically through the library).

Thursday, Sept 19,
4:35 pm
E1 102 Sonja Petrovic, IIT Applied Math
How do we actually do hypothesis testing for large (sparse) crossclassified data?
Continuation: section 1.2: Higher-order contingency tables. Conditional distributions. Markov bases : definition and uses.
Suggested background reading: Fisher's exact test.

Thursday, Sept 26,
4:35 pm
E1 102 Despina Stasi, IIT Applied Math & Penn State
Reading: Algebraic Algorithms from sampling from conditional distributions: Diaconis and Sturmfels, 1998.

Thursday, Oct 3,
4:35 pm
E1 102 Dane Wilburne, IIT Applied Math
The many bases of an integer lattice. Intro to Groebner bases, and an algorithm for Markov bases.
Thursday, Oct 10,
4:35 pm
E1 102 Aleksandra Slavkovic, Penn State Statistics Department
Differentially private graphical degree sequences and synthetic graphs

Increasing volumes of personal and sensitive data are collected and archived by health networks, government agencies, search engines, social networking websites, and other organizations. The social networks, in particular, are a prominent source of data for researchers in economics, epidemiology, sociology and many other disciplines and have sparked a flurry of research in statistical methodology for network analysis. While the social benefits of analyzing these data are significant, their release can be devastating to the privacy of individuals and organizations. In this talk, we give a brief overview of challenges associated with protecting confidential data, and the problem of releasing summary statistics of graphs needed to build statistical models for networks while preserving privacy of individual relations. We present an algorithm for releasing graphical degree sequences of simple undirected graphs under the framework of differential privacy. The algorithm is designed to provide utility for statistical inference in random graph models whose sufficient statistics are functions of degree sequences. Specifically, we focus on the tasks of existence of maximum likelihood estimates, parameter estimation and goodness-of-fit testing for the beta model of random graphs. We show the usefulness of our algorithm by evaluating it empirically on simulated and real-life datasets. As the released degree sequence is graphical, our algorithm can also be used to release synthetic graphs under the beta model.
(Joint work with Vishesh Karwa)
**Tuesday**, Oct 15,
4:35 pm
E1 102 Elizabeth Gross, NCSU/SJSU
Determining phylogenetic invariants

One research thread in algebraic statistics is focused on hidden Markov models, particularly in the context of phylogenetics. Hidden Markov models are used for inferring trees that best describe the evolutionary history of a collection of living species. The algebraic method for tree reconstruction relies on phylogenetic invariants, polynomials that vanish on the variety defined by the model. For the general Markov model, these phylogenetic invariants have been challenging to understand, and, in fact, the defining ideal of the variety associated to the 4-state general Markov model on the 3-leaf claw tree is still unknown. In this talk, we will describe the general Markov model, its defining polynomials, and its connection to tensors of bounded border rank. Specifically, we will explain how the variety associated to the 4-state general Markov model on the 3-leaf claw tree is the zero set of polynomials of degrees 5, 6, and 9.
**Wednesday**, Oct 23
4:40 pm
LS 152 Jesus De Loera, Univeristy of California, Davis
Algebraic and Geometric Ideas in Linear Optimization
Applied Mathematics Colloquium

Linear programming is undeniably a central software tool of applied mathematics and a source of many fascinating mathematical problems. In this talk I will present several advances from the past 5 years in the theory of algorithms in linear optimization. These results include new results on the complexity of the simplex method, the structure of central paths of interior point methods, and about the geometry of some less well-known iterative techniques. One interesting feature of these advances is that they connect this very applied algorithmic field with topics that are often not thought as applied such as algebraic geometry and combinatorial topology.
Thursday, Oct 31,
**3:15 pm**
**E1 119** Vishesh Karwa, Penn State graduate student
Differentially Private Degree Sequences and Synthetic Graphs

In this talk, I will describe a privacy problem of sharing network data while providing rigorous guarantees. The focus will be on sharing degree sequences privately, while allowing for statistical utility in the form of estimation and hypothesis testing for the $\beta$ model of random graphs. To provide utility, one needs to solve an optimization problem over the polytope of degree sequences for which I will present an efficient algorithm. I will also describe asymptotic results and possible directions for future research. The talk will assume no background on networks and privacy.
Thursday, Nov 7,
4:35 pm
E1 102 Despina Stasi, Penn State Statistics and IIT Applied Math
Dynamic Markov Bases Through Parameter Hypergraphs

Social networks and other large sparse datasets pose significant challenges for statistical inference, as many standard statistical methods for testing model/data fit are not applicable in such settings. Algebraic statistics offers a theoretically justified approach to goodness-of-fit testing that relies on the theory of Markov bases and is intimately connected with the geometry of the model as described by its fibers.
Current practices require the computation of the entire basis, which is infeasible in many practical settings. We present a dynamic approach to explore the fiber of a model, which bypasses this issue. Our algorithm is based on the toric geometry of hypergraphs. The running example is the p1 model for social networks, a statistical model of random directed graphs with reciprocation.
Thursday, Nov 14,
4:35 pm
E1 102 No seminar.
Thursday, Nov 21,
4:35 pm
E1 102 Informal/reading meeting.
SPRING PLANS: Visits by Bernd Sturmfels, Maria Angelica Cueto, and more! Reading seminar topic will be announced later.

For More Information Contact: Sonja Petrović or Despina Stasi