\documentclass{article}
\usepackage{amsmath,amssymb}
\usepackage{xspace}
\setlength{\oddsidemargin}{0.25 in}
\setlength{\evensidemargin}{-0.25 in}
\setlength{\topmargin}{-0.6 in}
\setlength{\textwidth}{6.5 in}
\setlength{\textheight}{8.5 in}
\setlength{\headsep}{0.75 in}
\setlength{\parindent}{0 in}
\setlength{\parskip}{0.1 in}
\newcounter{lecnum}
%
% The following macro is used to generate the header.
%
\newcommand{\lecture}[4]{
  \pagestyle{myheadings}
  \thispagestyle{plain}
  \newpage
  \setcounter{lecnum}{#1}
  \setcounter{page}{1}
  \noindent
  \begin{center}
    \framebox{
      \vbox{\vspace{2mm}
        \hbox to 6.28in { {{\bf 18.317~Combinatorics, Probability, and
  Computations on Groups} \hfill #2} }
        \vspace{4mm}
        \hbox to 6.28in { {\Large \hfill Lecture #1  \hfill} }
        \vspace{2mm}
        \hbox to 6.28in { {\it Lecturer: #3 \hfill Scribe: #4} }
        \vspace{2mm}}}
  \end{center}
  \markboth{Lecture #1: #2}{Lecture #1: #2}
  \vspace*{4mm}
}

\input{epsf}
%usage: \fig{LABEL}{FIGURE-HEIGHT}{CAPTION}{FILENAME}
\newcommand{\fig}[4]{
        \begin{figure}
        \setlength{\epsfysize}{#2}
        \centerline{\epsfbox{#4}}
        \caption{#3} \label{#1}
        \end{figure}
        }

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}
\newenvironment{proof}{{\bf Proof:}}{\hfill\rule{2mm}{2mm}}
\newenvironment{mproof}[1]{{\bf #1}}{\hfill\rule{2mm}{2mm}}

\def\E#1{{\rm E}[#1]}
\def\i{\hspace*{5mm}}
\newcounter{lineno}
\setcounter{lineno}{0}
\def\n{\addtocounter{lineno}{1} \hbox to 7mm{\hfill \arabic{lineno}:}\hspace{2mm}}
\def\resetn{\setcounter{lineno}{0}}
\def\ep{\epsilon}
\def\vep{\varepsilon}
\def\erd{Erd\H{o}s}

\renewcommand{\l}{\ensuremath{\ell}}

\begin{document}
\lecture{8}{26 September 2001}{Igor Pak}{Bo-Yin Yang}



\newcommand{\erdren}{Erd\H{o}s--R\'{e}nyi\xspace}
\section*{The Plot}
Our general plan for these few lectures 
is to prove the following will ``usually'' (with probability
arbitrarily close to $1$) happen:
\begin{enumerate}
\item $O(\log |G|)$ random group elements will generate the group $G$.
\item $O(\log |G|)$ random group elements will generate the group $G$
  with diameter $O(\log |G|)$, and will do so even with random
  products only.  (Note: this is essentially what the \erdren Theorem says.)
\item Random walks (more strongly, ``lazy'' random walks) on $G$ with
  $O(\log|G|)$ random elements as generators mix in $O(\log|G|)$
  time.
\item The Cayley graph of $G$ with $O(\log|G|)$ random generators is an
  expander (we will get to this later).
\end{enumerate}

At this point the \erdren Theorem is not yet very useful, since it is
not so easy to obtain a uniformly random sample of group elements in
the first place.  We aim to remedy this in the next two lectures.

\section*{Extending \erdren to Words}

Assume that $w=w_1 w_2 \cdots w_m$ is a ``word'' made up of
``letters'' $w_j$, each of which belongs to the ``alphabet'' $\{g_1,
g_2,\ldots, g_k\}$.  Further assume that each of $g_1, g_2,\ldots,
g_k$ appears at least once within the word.

Let $Q^w_{\bar g}(h) \equiv \textrm{Pr}_{\bar\ep}(w_1^{\ep_1}\,
w_2^{\ep_2}\, \cdots w_m^{\ep_m}= h)$, where the exponents
$\ep_1,\ldots,\ep_m$ are independent and uniform in $\{0,1\}$.  We aim
to show that this $Q$ has the same nice properties as the plain
\erdren distribution $P$.
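
Before turning to the proof, here is a minimal Monte Carlo sketch of
$Q^w_{\bar g}$ (our own illustration in Python, not from the lecture;
all names are ours) in the small permutation group $S_4$:

\begin{verbatim}
# Sketch: estimate Q^w by sampling random subproducts in S_4.
import random
from itertools import permutations

n = 4
G = list(permutations(range(n)))          # all of S_4, |G| = 24

def mul(a, b):
    """Compose permutations: (a*b)(i) = a(b(i))."""
    return tuple(a[b[i]] for i in range(n))

def subproduct(word, eps):
    """w_1^{e_1} ... w_m^{e_m} with each e_i in {0,1}."""
    out = tuple(range(n))
    for w_i, e_i in zip(word, eps):
        if e_i:
            out = mul(out, w_i)
    return out

k, trials = 12, 200000
gens = [random.choice(G) for _ in range(k)]
word = gens + [random.choice(gens) for _ in range(8)]  # each g_i appears

counts = {g: 0 for g in G}
for _ in range(trials):
    eps = [random.getrandbits(1) for _ in word]
    counts[subproduct(word, eps)] += 1

dev = max(abs(c / trials - 1 / len(G)) for c in counts.values())
print("max_h |Q(h) - 1/|G|| ~", dev, " (1/|G| =", 1 / len(G), ")")
\end{verbatim}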

\begin{proposition}
  $$\textrm{Pr}_{\bar g} \left(\max_{h\in G} \left|Q^w_{\bar g}(h) -
      \frac{1}{|G|}\right| < \frac\ep{|G|}\right) > 1-\delta,$$
  where $ \displaystyle k > 2 \log_2 |G| + 2 \log_2 \frac1\ep +
  \log_2 \frac1\delta$ (as previously in \erdren).
\end{proposition}

\begin{proof}
  We first define the ``conjugation'' notation for $a,x\in G$ (Igor:
  ``Sorry for the slightly confusing notation, but this is how
  algebraists actually write it.''):
  $$
  a^x \equiv x\,a\,x^{-1}.$$

  \begin{claim}[``obvious''] ~
    \begin{enumerate}
    \item $(a^x)^y = a^{yx}$ 
    \item $(a^x)^\ep = (a^\ep)^x$ for $\ep=0$ or $1$.
    \item if $a$ is uniformly random in $G$, then so is $a^x$ for any
      fixed $x\in G$.
    \end{enumerate}
  \end{claim}
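
  These identities are easy to check by hand; here is a tiny sanity
  check of the first one on random permutations (our own sketch in
  Python, not from the lecture):

\begin{verbatim}
# Check (a^x)^y = a^{yx} with a^x = x a x^{-1}, on random permutations.
import random

n = 5
def mul(a, b):            # (a*b)(i) = a(b(i))
    return tuple(a[b[i]] for i in range(n))
def inv(a):
    out = [0] * n
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)
def conj(a, x):           # a^x = x a x^{-1}
    return mul(mul(x, a), inv(x))
def rand_perm():
    p = list(range(n)); random.shuffle(p); return tuple(p)

for _ in range(1000):
    a, x, y = rand_perm(), rand_perm(), rand_perm()
    assert conj(conj(a, x), y) == conj(a, mul(y, x))
\end{verbatim}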
  
  Now we look at the word $w$ and ``restructure'' it by looking at
  repeated appearances of each $g_i$.  For example, if
  $$
  w =g_1\, g_2\, g_3\, g_4\, g_1\, g_5\, g_3\, g_7 \cdots $$

  then we proceed as follows:

  \begin{eqnarray*}
    w &=&  \overbrace{g_1\, g_2\, g_3\, g_4\, g_1}^{\textrm{first}}\,
            g_5\, g_3\, g_7 \cdots \\
      &=& g_1 g_2 \underbrace{g_3 g_4 
        (g_5)^{g_1} (g_3)^{g_1}}_{\textrm{second}} (g_7)^{g_1} \cdots g_1\\
      &=& g_1 g_2 g_3 g_4 (g_5)^{g_1} (g_7)^{\left(g_1 g_3\right)
        } \cdots g_1 g_3 \\
      &=& \cdots  \\
  \end{eqnarray*}
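
  As a numerical sanity check of these two rewriting steps (our own
  sketch in Python, not from the lecture), one can truncate the
  example word at $g_7$ and verify both identities on random
  permutations:

\begin{verbatim}
# Check the rewriting of w = g1 g2 g3 g4 g1 g5 g3 g7 in S_5.
import random
from functools import reduce

n = 5
def mul(a, b): return tuple(a[b[i]] for i in range(n))
def inv(a):
    out = [0] * n
    for i, ai in enumerate(a): out[ai] = i
    return tuple(out)
def conj(a, x): return mul(mul(x, a), inv(x))   # a^x = x a x^{-1}
def prod(ps): return reduce(mul, ps, tuple(range(n)))
def rand_perm():
    p = list(range(n)); random.shuffle(p); return tuple(p)

for _ in range(100):
    g1, g2, g3, g4, g5, g7 = (rand_perm() for _ in range(6))
    w = prod([g1, g2, g3, g4, g1, g5, g3, g7])
    step1 = prod([g1, g2, g3, g4,
                  conj(g5, g1), conj(g3, g1), conj(g7, g1), g1])
    step2 = prod([g1, g2, g3, g4,
                  conj(g5, g1), conj(g7, mul(g1, g3)), g1, g3])
    assert w == step1 == step2
\end{verbatim}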
  
  We proceed in the same way whether or not the $\ep_i$ powers are
  present.  What will we end up with?  Let $h_1, h_2, \ldots, h_k$ be
  the $g_j$'s permuted into the order of first appearance in $w$.
  Then we will eventually end up with
  
  $$
  w = \left( h_1\, (h_2)^{\phi_2(h_1)}\,
    (h_3)^{\phi_3(h_1,h_2)}\, \cdots 
    (h_k)^{\phi_k(h_1,\ldots, h_{k-1})} \right) 
  \cdot \left(
    (\l_1)^{\psi_1(h_1,\ldots,h_k)} \,
    (\l_2)^{\psi_2(h_1,\ldots,h_k)} \, \cdots \,
    (\l_{m-k})^{\psi_{m-k}(h_1,\ldots,h_k)}
  \right), $$
  
  where $\l_1,\ldots,\l_{m-k}$ are the ``leftovers'' (the remaining
  occurrences of the $g_i$'s, with repetition), and the $\phi_i$'s and
  $\psi_i$'s are fixed products of the letters $h_j$.
  
  Let $\vep_j$ be the exponent $\ep_i$ attached to $h_j$ at its first
  appearance.  It is then straightforward to verify that
  $$
  w^{\bar \ep} = \left( h_1^{\vep_1}\, 
    \left( (h_2)^{\phi'_2} \right)^{\vep_2}\,
    \left( (h_3)^{\phi'_3} \right)^{\vep_3} \, \cdots 
    \left( (h_k)^{\phi'_k} \right)^{\vep_k} \right)
  \cdot \left( \prod_{i=1}^{m-k}
    \left((\l_i)^{\psi'_i}\right) ^ {\vep'_i} \right)
  $$
  
  True, the $\phi'_i$'s now depend not only on the $h_j$ with $j<i$,
  but possibly also on the random exponents $\vep'_i\in \{0,1\}$; and
  each $\psi'_i$ depends not only on all the $h_j$ but possibly also
  on the $\vep'_j$ with $j>i$.  However, we can verify that
  \begin{itemize}
  \item for each given $\vep'_1,\,\vep'_2,\ldots,\vep'_{m-k}$, the
    ``junk tail'' ${\cal J} \equiv \left( \prod_{i=1}^{m-k}
      \left((\l_i)^{\psi'_i}\right) ^ {\vep'_i} \right)$ is a fixed
    element of $G$.
  \item for each given $\vep'_1,\,\vep'_2,\ldots,\vep'_{m-k}$ and
    given $h_1,\, h_2,\ldots, h_{i-1}$, the element $\phi'_i$ is fixed
    in $G$; hence $\left( (h_i)^{\phi'_i} \right)$ is uniformly random
    in $G$ if $h_i$ itself is.  That is, computing probabilities in
    terms of the $\left( (h_i)^{\phi'_i} \right)$ is the same as
    computing them in terms of the $h_i$ themselves.
  \item We see from the above that $Q_{\bar g}^w(h)$ is an average of
    values $P_{\bar h}(d)$, $d\in G$, i.e.\ of translates of the plain
    \erdren distribution over the (conjugated) letters $h_i$; hence
    $$ \max_{h \in G} \left|Q_{\bar g}^w(h) - \frac1{|G|} \right|
    \le \max_{h \in G} \left|P_{\bar h}(h)  - \frac1{|G|} \right|.$$
    Since the conjugated letters are uniform and independent whenever
    the $g_i$ are (by part 3 of the claim), the \erdren bound applies
    to the right-hand side, and the proposition is proved.
  \end{itemize}
\end{proof}
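
To see the conclusion in action, here is a small exact computation
(our own sketch in Python, not from the lecture) comparing the word
distribution $Q^w$ with the plain distribution $P$ over the same
generators in $S_4$; on random instances one typically observes that
$Q^w$ is at least as flat as $P$:

\begin{verbatim}
# Exact distributions of random subproducts in S_4, by enumerating
# all epsilon vectors in {0,1}^m.
import random
from itertools import product, permutations

n = 4
G = list(permutations(range(n)))
def mul(a, b): return tuple(a[b[i]] for i in range(n))

def dist(word):
    counts = {g: 0 for g in G}
    for eps in product((0, 1), repeat=len(word)):
        out = tuple(range(n))
        for w_i, e_i in zip(word, eps):
            if e_i: out = mul(out, w_i)
        counts[out] += 1
    total = 2 ** len(word)
    return {g: c / total for g, c in counts.items()}

gens = [random.choice(G) for _ in range(8)]
word = gens + [random.choice(gens) for _ in range(6)]  # with repeats

maxdev = lambda Q: max(abs(q - 1 / len(G)) for q in Q.values())
print("P:", maxdev(dist(gens)), "  Q^w:", maxdev(dist(word)))
\end{verbatim}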

\section*{Not Quite Uniform Distributions}
Now we turn our attention to the situation where we have a way of
sampling from a group that is not uniformly random.  We can measure
how ``non-uniform'' the sampling is in a few different ways.  Assume
that $P$ is a probability distribution on $G$ and $U$ is the uniform
distribution, then we could use the ``total variation''
$$ \| P -U \|  \equiv \max_{B\subset G} \left| P(B)-{|B| \over |G|} \right|
= \frac12 \sum_{g\in G} \left| P(\{g\})-{1 \over |G|} \right|;$$
we could also use the ``separation'', defined as (note! no absolute value):  
$$
\textrm{sep}(P)\equiv |G| \; \max_{g\in G} \left( {1 \over |G|} -
  P(\{g\})\right).$$
The two distances satisfy $0\le \| P -U \| \le
\textrm{sep}(P) \le 1$. Note also that $\textrm{sep}(P)$ is
essentially an $\l_\infty$ norm: since $\textrm{sep}(Q^w_{\bar g})
\le |G|\,\max_{h} |Q^w_{\bar g}(h) - 1/|G||$, the proposition above
yields
$$
\textrm{Pr}_{\bar g} \left(\max_{h\in G} \left|Q^w_{\bar g}(h) -
    \frac{1}{|G|}\right| > \frac\ep{|G|}\right) < \delta
\;\Longrightarrow\;
\textrm{Pr}_{\bar g}\left(\textrm{sep}(Q^w_{\bar g})>\ep\right)<\delta.$$
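
For a concrete feel (our own toy example in Python, not from the
lecture), both distances are one-liners, and the chain of
inequalities can be checked directly:

\begin{verbatim}
# Total variation and separation for a toy distribution, |G| = 6.
def tv_and_sep(P):
    N = len(P)
    u = 1.0 / N
    tv = 0.5 * sum(abs(p - u) for p in P)
    sep = N * max(u - p for p in P)
    return tv, sep

P = [0.10, 0.15, 0.20, 0.25, 0.20, 0.10]   # sums to 1
tv, sep = tv_and_sep(P)
assert 0 <= tv <= sep <= 1
print(tv, sep)                             # about 0.15 and 0.4 here
\end{verbatim}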

The separation is useful because when $\textrm{sep}(P)<\ep$, we can
find a distribution $N$ such that $$P= (1-\ep)\,U + \ep\, N.$$
Figuratively, to sample from $P$, we first pick a uniform random
number in $[0,1]$; if it is less than $\ep$ we sample from the
``noise'' distribution $N$, and otherwise we sample from the uniform
distribution $U$.
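
Here is a short sketch of that recipe (ours, not from the lecture; it
assumes $s=\textrm{sep}(P)>0$):

\begin{verbatim}
# Decompose P = (1-s) U + s N with s = sep(P), then sample from P
# by first flipping an s-biased coin.
import random

P = [0.10, 0.15, 0.20, 0.25, 0.20, 0.10]
m = len(P)
u = 1.0 / m
s = m * max(u - p for p in P)               # separation of P (here 0.4)
noise = [(p - (1 - s) * u) / s for p in P]  # the "noise" distribution N
assert all(q >= -1e-12 for q in noise) and abs(sum(noise) - 1) < 1e-9

def sample():
    if random.random() < s:                 # with probability s: noise
        return random.choices(range(m), weights=noise)[0]
    return random.randrange(m)              # otherwise: uniform

counts = [0] * m
for _ in range(100000):
    counts[sample()] += 1
print([c / 100000.0 for c in counts], "vs", P)
\end{verbatim}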

In other words, if $\textrm{sep}(P)=s$, then we can let $\tilde{k}=
k/(1-s)$, where $k$ is the requisite number from \erdren.  If we take
somewhat more than $\tilde{k}$ samples from $P$, enough of them --
``usually'' at least $k$ -- will be sampled from the ``uniform'' part
$U$, hence the ``lazy'' random walk will again be probabilistically
almost uniform.  In the next lecture we will make this rigorous by
repeating the trick used today, showing that we can approximate
uniform sampling of the group from nonrandomly generated random
products.
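
How strongly ``usually'' holds is just a binomial tail; a quick
sketch (ours, not from the lecture) shows how oversampling beyond
$\tilde k$ drives the failure probability down:

\begin{verbatim}
# P(at least k of m samples from P = (1-s) U + s N come from U).
from math import comb

def at_least_k_uniform(m, k, s):
    return sum(comb(m, j) * (1 - s)**j * s**(m - j)
               for j in range(k, m + 1))

k, s = 40, 0.25
for m in [54, 60, 70, 80]:                  # k/(1-s) is about 54
    print(m, at_least_k_uniform(m, k, s))
\end{verbatim}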

\end{document}

