California Innocence Project
Founded in 1999, the California Innocence Project (CIP) is a law school clinical program dedicated to freeing the wrongfully convicted from California's prisons, educating future criminal attorneys, and improving laws and policies. Forensic science capabilities have improved dramatically since 2009, producing new evidence in cases that previously would not have been appealed. When the evidence in a case now points to wrongful conviction, CIP takes the case to court to fight for the client's exoneration. This project will help CIP handle data from two primary sources: mail and archived case reports. Both types of data are multi-modal and temporally varying, making them well represented in tensor format. The project will develop novel mathematical techniques for analyzing tensor data, in collaboration with CIP so that the methods see real application, which in turn informs the science behind them while directly serving an important cause with wide societal impact. The main goal is to help CIP streamline its review process through automation, highlighting cases that might otherwise be overlooked, so that CIP can focus more time and energy on quality claims of innocence and, hopefully, free more innocent people from prison.
Genesis and evolution of scientific fields
This project uses machine learning to understand the genesis and evolution of scientific areas of research. The students will work with data from scientific publications and develop techniques for representing the structure of knowledge and collaboration such that the emergence of new research areas can be identified and understood. Students will use machine learning tools such as natural language processing, knowledge graphs, semantic embeddings, and graph embeddings. We are especially interested in techniques that can address multimodal dynamic data (e.g., text, graph structure, and time).
We plan to analyze data collected by the City of Los Angeles as part of its gang reduction program. The data cover both a youth program and a crime reduction program. Recent work in this area by REU students includes natural language processing of text data and dynamic mode decomposition applied to survey data to study the evolution of the program.
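For intuition on the dynamic mode decomposition mentioned above, here is a minimal sketch of exact DMD on snapshot data; the helper name and the toy dynamics are our own illustration, not the program's data or code. Given snapshot matrices X and Y with Y ≈ A X, DMD estimates the leading eigenvalues and modes of the best-fit linear operator A:

```python
import numpy as np

def dmd(X, Y, r):
    """Exact DMD sketch: given snapshot matrices with Y ~ A X, estimate
    the leading r eigenvalues/modes of the best-fit linear operator A."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s   # projected r x r operator
    evals, W = np.linalg.eig(A_tilde)
    modes = (Y @ Vh.conj().T / s) @ W            # exact DMD modes
    return evals, modes

# Toy linear dynamics with known eigenvalues 0.9 and 0.8.
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
snaps = [np.array([1.0, 1.0])]
for _ in range(20):
    snaps.append(A_true @ snaps[-1])
S = np.array(snaps).T
evals, modes = dmd(S[:, :-1], S[:, 1:], r=2)
print(sorted(evals.real))  # recovers 0.8 and 0.9 (up to rounding)
```

On exact linear data the recovered eigenvalues match the true ones; on noisy survey or program data they instead summarize the dominant temporal trends.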
Large Linear Systems in Data Science
Large-scale systems of linear equations arise in many areas of data science, including in machine learning and as subroutines of several optimization methods. When the systems are very large and cannot be read into working memory in their entirety, iterative methods which use a small portion of the data in each iteration are typically employed. These methods can offer a small memory footprint and good convergence guarantees. Kaczmarz methods, a classical example of these types of methods, consist of sequential orthogonal projections towards the solution set of a single equation (or subsystem). There are many variants within this family of methods, often using randomized or greedy strategies to select the row (subsystem) used in each iteration.
There is a large body of work on Kaczmarz-type methods: some proves convergence results for different variants, some applies Kaczmarz methods to specific problems from signal processing, network science, and machine learning, and some develops strategies for systems with adversarial corruption. In this project, we will explore, both theoretically and experimentally, potential research questions drawn from these areas of Kaczmarz-related study.
Students working on this project will develop skills in literature review, code development, numerical experiment design, theoretical analysis, technical writing, and technical presentation. We will build on prior work to understand these methods both theoretically and empirically on synthetic and real-world linear systems.
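As a concrete illustration of the projections described above, here is a minimal sketch of the randomized Kaczmarz variant that samples each row with probability proportional to its squared norm; the function name and test system are our own, not part of the project:

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Randomized Kaczmarz: each iteration projects the current iterate
    onto the solution set of one equation a_i^T x = b_i, sampling row i
    with probability proportional to ||a_i||^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    row_norms2 = (A ** 2).sum(axis=1)
    probs = row_norms2 / row_norms2.sum()
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]  # project onto row i
    return x

# Consistent overdetermined system: iterates converge to the solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
b = A @ x_true
x = randomized_kaczmarz(A, b)
print(np.linalg.norm(x - x_true))  # small error
```

Note that each iteration touches only one row of A, which is what gives these methods their small memory footprint on systems too large to hold in working memory.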
Active learning combines two ideas. The first is a general method for semi-supervised learning (SSL); the second is a strategy for choosing a small amount of unlabeled data to send to a "human in the loop" for ground-truth classification. This project will involve graph-based multi-class SSL classifiers for high-dimensional data. Students will develop rigorous theory, along with code, for generalized graph-based Bayesian models covering sequential active learning, batch-mode active learning, and multiple classes. Students will work on real-world data with asymmetric group sizes. Specific types of data we plan to study include hyperspectral and multimodal imagery and video. We plan to use data in the public domain.
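As a small illustration of the graph-based SSL component (a toy example of our own, not the project's data or models), Laplace learning propagates a few known labels over a similarity graph by extending them harmonically to the unlabeled nodes:

```python
import numpy as np

def laplace_learning(W, labeled_idx, labels, n_classes):
    """Graph-based SSL sketch: fix one-hot labels on labeled nodes,
    extend them harmonically (solving on the unlabeled block of the
    graph Laplacian), and classify each node by the argmax coordinate."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                      # graph Laplacian
    unlabeled = np.setdiff1d(np.arange(n), labeled_idx)
    U = np.zeros((n, n_classes))
    U[labeled_idx] = np.eye(n_classes)[labels]
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    L_ul = L[np.ix_(unlabeled, labeled_idx)]
    U[unlabeled] = np.linalg.solve(L_uu, -L_ul @ U[labeled_idx])
    return U.argmax(axis=1)

# Two triangles joined by one weak edge; label one node in each cluster.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
pred = laplace_learning(W, np.array([0, 5]), np.array([0, 1]), 2)
print(pred)  # -> [0 0 0 1 1 1]
```

In the active-learning setting, the solution values themselves (not just the argmax) can be used to score which unlabeled node is most uncertain and should be sent to the human in the loop next.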
Knowledge graphs are data structures in which information is organized along the nodes and edges of a graph, with an underlying hierarchical structure defined through an "ontology". This project will build knowledge graphs from datasets involving narratives and social media data. Students will develop machine learning methods for these data structures.
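To make the data structure concrete, here is a tiny sketch (the entities, relations, and types are invented for illustration): facts are stored as subject-relation-object triples, and the ontology maps each entity to a type:

```python
# Hypothetical toy knowledge graph; all entities/relations are invented.
ontology = {"Alice": "Person", "UCLA": "Organization", "Los Angeles": "City"}
triples = [
    ("Alice", "works_at", "UCLA"),
    ("UCLA", "located_in", "Los Angeles"),
]

def outgoing(entity):
    """Edges leaving `entity`, as (relation, object, object_type) tuples."""
    return [(r, o, ontology[o]) for s, r, o in triples if s == entity]

print(outgoing("Alice"))  # -> [('works_at', 'UCLA', 'Organization')]
```

Machine learning methods for such structures typically embed the nodes and relations into vector spaces so that plausible missing edges can be predicted.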
Differentiable Physical Simulations
Numerical methods for partial differential equations play an important role in forward simulations of continuum solids and fluids. Conversely, differentiating a nonlinear spatial-temporal discretization opens up inverse problems such as shape optimization, structural parameter estimation, artistic control of animated 3D objects, and, more recently, the design of embodied AI systems. This project will investigate advanced numerical differentiation schemes for finite-element and particle-based simulations and study their connections to physics-informed machine learning.
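As a minimal sketch of the differentiation idea (a hand-rolled adjoint on a toy problem, not any particular framework or the project's methods), one can differentiate a forward-Euler mass-spring simulation with respect to the stiffness k and check the result against finite differences:

```python
def simulate(k, x0=1.0, v0=0.0, h=0.01, steps=200):
    """Forward Euler for a unit-mass spring: x' = v, v' = -k x."""
    xs, vs = [x0], [v0]
    for _ in range(steps):
        xs.append(xs[-1] + h * vs[-1])
        vs.append(vs[-1] - h * k * xs[-2])   # xs[-2] is the old position
    return xs, vs

def grad_loss_wrt_k(k, h=0.01, steps=200):
    """Reverse-mode (adjoint) derivative of loss = x_T**2 with respect
    to the stiffness k, propagated backward through the time loop."""
    xs, vs = simulate(k, h=h, steps=steps)
    gx, gv, gk = 2.0 * xs[-1], 0.0, 0.0      # seed: d(loss)/dx_T
    for t in range(steps - 1, -1, -1):
        gk += gv * (-h * xs[t])              # v_{t+1} depends on k via -h k x_t
        gx, gv = gx - gv * h * k, gx * h + gv
    return gk

# Check the adjoint gradient against central finite differences.
k, eps = 4.0, 1e-6
loss = lambda kk: simulate(kk)[0][-1] ** 2
fd = (loss(k + eps) - loss(k - eps)) / (2 * eps)
g = grad_loss_wrt_k(k)
print(g, fd)  # the two values agree closely
```

The same backward-through-the-time-loop pattern, automated by a differentiable-programming framework, is what enables gradient-based shape optimization and parameter estimation through full simulations.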