Photo: Sylvain Crouzet
About Me
I am an Assistant Professor
in the
Department of Mathematics
at the
University of California, Los Angeles,
where I am a member of the
Probability
and Mathematical Physics Group.
I am also affiliated with the
UCLA Bioinformatics Program.
I work at the intersection of probability, statistics and
theoretical computer science, with an emphasis on biological applications.
More details on my research interests and
publications can be found
here.
The schedule of the UCLA Probability Seminar is
here.
Past committees: SODA 2009.
Current Teaching (for previous courses,
see here)
Winter 2010: MATH275B (Probability Theory)
Spring 2010: MATH285K (Seminar in Probability:
Stochastic Processes in Evolution and Genetics)
Recent Preprints (for a full list,
see here)
Alignment-Free Phylogenetic Reconstruction
Preprint, 2009.
With C. Daskalakis.
We introduce the first polynomial-time phylogenetic
reconstruction algorithm under a model of sequence evolution
allowing insertions
and deletions---or indels. Given appropriate
assumptions, our algorithm requires sequence lengths growing
polynomially in the number of leaf taxa.
Our techniques are distance-based
and largely bypass the problem of multiple alignment.
Global Alignment of Molecular Sequences via Ancestral State Reconstruction
Preprint, 2009.
With A. Andoni, C. Daskalakis, and A. Hassidim.
We consider the trace reconstruction problem on a tree (TRPT):
a binary sequence is broadcast through a tree channel where
we allow substitutions, deletions, and insertions; we seek
to reconstruct the original sequence from the sequences
received at the leaves. The TRPT is motivated by the
multiple sequence alignment problem in computational biology.
We give a simple recursive procedure giving strong reconstruction
guarantees at low mutation rates.
We begin with a random sequence $x_1, \ldots, x_k$ which is located at the root.
If vertex $v$ has the sequence $y_1, \ldots y_{k_{v}}$, then each one of its $d$
children will have a sequence which is generated from $y_1, \ldots y_{k_{v}}$ by
flipping three biased coins for each bit. The first coin has probability $p_s$
for Heads, and determines whether this bit will be substituted or not.
The second coin has probability $p_d$, and determines whether this bit
will be deleted, and the third coin has probability $p_i$ and determines
whether a new random bit will be inserted. The input to the procedure is
the sequences of the $n$ leaves of the tree, as well as the tree structure
(but not the sequences of the inner vertices) and the goal is to reconstruct
an approximation to the sequence of the root (the DNA of the ancestral father).
For every $\epsilon > 0$, we present a deterministic algorithm which outputs an
approximation of $x_1, \ldots, x_k$ with probability $1 - \epsilon$, if
$p_i + p_d < O(1/k^{2/3} \log n)$ and $(1 - 2p_s)^2 > O(d^{-1} \log d)$,
where the probability is on the coin flips which determine the mutation process.
To our knowledge, this is the first rigorous trace reconstruction result on a tree in the presence of indels.
Reconstruction on Trees: Exponential Moment Bounds for Linear Estimators
Preprint, 2009.
With Y. Peres.
Consider a Markov chain $(\xi_v)_{v \in V} \in [k]^V$ on the infinite
$b$-ary tree $T = (V,E)$ with irreducible edge transition
matrix $M$, where $b \geq 2$, $k \geq 2$ and $[k] = \{1,\ldots,k\}$.
We denote by $L_n$ the level-$n$ vertices of $T$.
Assume $M$ has a real second-largest (in absolute value) eigenvalue $\lambda$
with corresponding real eigenvector $\nu \neq 0$.
Letting $\sigma_v = \nu_{\xi_v}$, we consider the following root-state
estimator,
which was introduced by
Mossel and Peres (2003) in the context of the ``recontruction problem''
on trees:
\begin{equation*}
S_n = (b\lambda)^{-n} \sum_{x\in L_n} \sigma_x.
\end{equation*}
As noted by Mossel and Peres, when $b\lambda^2 > 1$ (the so-called Kesten-Stigum reconstruction phase)
the quantity $S_n$ has uniformly bounded variance. Here,
we give bounds on the moment-generating
functions of $S_n$ and $S_n^2$ when $b\lambda^2 > 1$. Our results have implications for the
inference of evolutionary trees.
Evolutionary Trees and the Ising Model on the Bethe Lattice: A Proof of Steel's Conjecture
To appear in PTRF, 2009.
With C. Daskalakis, E. Mossel.
Also in Proceedings of ACM STOC 2006, 159-168.
A major task of evolutionary biology is the
reconstruction of phylogenetic trees from molecular data. The
evolutionary model is given by a Markov chain on
a tree. Given samples from the leaves of the Markov chain,
the goal is to reconstruct the
leaf-labelled tree.
It is well known that in order to
reconstruct a tree on $n$ leaves,
sample sequences
of length $\Omega(\log
n)$ are needed. It was conjectured by M.~Steel that for the CFN/Ising
evolutionary model, if the mutation probability on all edges of
the tree is less than $p^{\ast} = (\sqrt{2}-1)/2^{3/2}$,
then the
tree can be recovered from sequences of length $O(\log n)$. The value
$p^{\ast}$ is given by the transition point for the extremality of the
free Gibbs measure for the Ising model on the binary tree. Steel's
conjecture was proven by the second author in the special case where the tree
is ``balanced.'' The second author also proved that if all edges
have mutation probability larger than $p^{\ast}$ then the length
needed is $n^{\Omega(1)}$.
Here we show that Steel's conjecture holds true for general trees by giving a
reconstruction algorithm that recovers the tree from $O(\log n)$-length sequences
when the mutation probabilities are discretized and less than $p^\ast$.
Our proof and results demonstrate that extremality of the free
Gibbs measure on the infinite binary tree, which has been studied
before in probability, statistical physics and computer science,
determines how distinguishable are Gibbs measures on finite binary
trees.
Incomplete Lineage Sorting: Consistent Phylogeny Estimation from Multiple Loci
To appear in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009.
With E. Mossel.
We introduce a simple algorithm for reconstructing phylogenies
from multiple gene trees in the presence of incomplete lineage
sorting, that is, when the topology of the gene trees may differ
from that of the species tree. We show that our technique is
statistically consistent under standard stochastic assumptions,
that is, it returns the correct tree given sufficiently many
unlinked loci. We also show that it can tolerate moderate estimation errors.
Network Delay Inference from Additive Metrics
To appear in Random Structures and Algorithms, 2009.
With S. Bhamidi and R. Rajagopal.
We demonstrate the use of computational phylogenetic techniques
to solve a central problem in inferential
network monitoring. More precisely, we design a novel algorithm
for multicast-based delay inference, i.e.
the problem of reconstructing the topology and delay characteristics
of a network from end-to-end delay
measurements on network paths. Our inference algorithm is based on
additive metric techniques widely used in phylogenetics. It runs in
polynomial time and requires a sample of size only $\poly(\log n)$.
Contact
If you would like to reach me, here is my contact information.
Office: MS 6228
Phone: (310) 825-4980
Fax: (310) 206-6673
lastname[at]math[dot]ucla[dot]edu
UCLA
Department of Mathematics
520 Portola Plaza
Los Angeles, CA 90095-1555