This talk gives an introduction to the R package TDA, which provides
some tools for Topological Data Analysis.
Topological Data Analysis generally refers to extracting and using
topological features of data. In this talk, I will focus on persistent
homology. The R package TDA provides functions to sample from various
geometric objects. It also provides
functions that, given some data, provide topological information about
the underlying space, such as distance functions and density
functions. The salient topological features of data can be quantified
with persistent homology. The R package TDA provides an R interface
for the efficient algorithms of the C++ libraries GUDHI, Dionysus, and
PHAT for computing the persistent homology. Specifically, the R
package TDA includes functions for computing the persistent homology
of the Rips complex, alpha complex, and alpha shape complex, and a function
for the persistent homology of sublevel sets (or superlevel sets) of
arbitrary functions evaluated over a grid of points or on data points.
The R package TDA also provides functions for functional summaries of
the persistent homology, such as the landscape function and the
silhouette function. The R package TDA also provides a function for
computing a confidence band that helps assess the significance of the
features in the resulting persistence diagrams.
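To make the quantities above concrete, here is a minimal Python sketch of what a 0-dimensional Rips persistence diagram and a landscape value look like. It is an illustration only and does not use or reproduce the R package TDA or its C++ backends. The function names `rips_h0_persistence` and `landscape` are hypothetical, and the sketch assumes the convention that an edge enters the Rips filtration at a scale equal to its length.

```python
import math
from itertools import combinations


def rips_h0_persistence(points):
    """Return (birth, death) pairs for H0 of a Rips filtration.

    Assumption: an edge enters the filtration at its Euclidean length.
    Every point is born at scale 0; a component dies when an edge merges
    it into another component, and one component never dies.
    """
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by the scale at which they appear.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj              # merge the two components
            pairs.append((0.0, length))  # one component dies at this scale
    pairs.append((0.0, math.inf))        # the surviving component
    return pairs


def landscape(pairs, k, t):
    """k-th persistence landscape value at t (k = 1 is the largest tent).

    Uses the tent function max(0, min(t - b, d - t)) for each finite pair.
    """
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in pairs if math.isfinite(d)),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0


pairs = rips_h0_persistence([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
print(pairs)                     # [(0.0, 1.0), (0.0, 4.0), (0.0, inf)]
print(landscape(pairs, 1, 2.0))  # 2.0
```

In this toy example the two nearby points merge at scale 1 and the distant point joins at scale 4, so the long-lived pair (0, 4) dominates the first landscape. Higher-dimensional features such as loops require the full boundary-matrix reduction, which is what the libraries named above provide.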
As topological data analysis is increasing in popularity,
there is growing excitement in the mathematical modeling and complex
systems communities about leveraging topological techniques for
data-driven modeling. I will lead a discussion on the intersection of
mathematical modeling and topological tools. As a jumping-off point, I
will present some problems from (or related to) my own work.
An observable for a collection of datasets is said to be stable if a small change in the dataset results in a small change in the observable. This notion of stability is formalized by a stability inequality, which involves suitable metrics on the datasets as well as on the observables. One of the best-known examples in TDA concerns persistence diagrams, where the metrics on topological spaces and on persistence diagrams are the Gromov-Hausdorff distance and the bottleneck distance, respectively. In this talk, we review stability inequalities for some probabilistic network observables, such as the homomorphism density and conditional homomorphism density profiles, where the cut metric is used to measure the distance between networks. The essential idea behind their proofs can be viewed as a version of the famous 'Lindeberg replacement trick'.
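As a small illustration of one observable named above, the following Python sketch computes the homomorphism density t(F, G) = hom(F, G) / |V(G)|^|V(F)| by brute force. It assumes simple, unweighted, undirected graphs with nodes labeled 0, 1, ..., and the function name `homomorphism_density` is hypothetical. It does not cover the conditional homomorphism density profiles or the weighted networks that the talk may consider.

```python
from itertools import product


def homomorphism_density(pattern_edges, k, graph_edges, n):
    """Return t(F, G) for a pattern F on k nodes and a graph G on n nodes.

    A map phi from the nodes of F to the nodes of G is a homomorphism if
    every edge of F is sent to an edge of G. The density is the fraction
    of all n**k maps that are homomorphisms.
    """
    # Store both orientations so that adjacency is symmetric.
    adjacent = set()
    for u, v in graph_edges:
        adjacent.add((u, v))
        adjacent.add((v, u))

    hom_count = 0
    for phi in product(range(n), repeat=k):
        if all((phi[a], phi[b]) in adjacent for a, b in pattern_edges):
            hom_count += 1
    return hom_count / n ** k


triangle = [(0, 1), (1, 2), (0, 2)]
edge_density = homomorphism_density([(0, 1)], 2, triangle, 3)      # 6 / 9
triangle_density = homomorphism_density(triangle, 3, triangle, 3)  # 6 / 27
print(edge_density, triangle_density)
```

The enumeration costs n**k map evaluations, so it is only practical for small patterns. A stability inequality of the kind discussed above would bound how much such a density can change when G is perturbed slightly in the cut metric.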
Persistent Homology (PH) has been used to study the topological
characteristics of data across a variety of scales. In this talk, we will
focus on a variety of spatial applications, as the geometric and
topological features of PH are well suited to exploring data sets which are
embedded in space. We will introduce two novel constructions for
transforming network-based data into simplicial complexes suitable for PH
computations and compare these constructions to the state of the art.
Additionally, we will discuss some preliminary results from applying these
constructions to a variety of geographic and spatial applications,
including voting data, cities and urban networks, and biological networks
(i.e. spiders under the influence). We will highlight the computational
performance of our constructions and discuss the implications of the PH
computations for identifying and classifying certain features in our
various data sets. In particular, we will talk about spatial patterns which
emerge in each case, and how those patterns relate to existing scholarship
in the relevant area.