Vision Modeling
and Computation
Vision, whether biological or computational, is the science and technology of generating perception from observed optical signals (popularly referred to as images). It allows a human being or an intelligent robot to sense, interpret, communicate with, and react to the environment (i.e., space and time).
From quantum mechanics and Einstein's relativity theories to nanotechnology and photonics, the role of light, or photons, has always been critical. Light (electromagnetic waves), traveling through space and time, allows us to see the images of distant galaxies and stars millions of light-years away (or rather, ago), to sense the ghosts of black holes through their gravitational images, to watch the functioning organs of our own bodies (e.g., the 2003 Nobel Prize in Physiology or Medicine was awarded to Paul Lauterbur and Peter Mansfield for magnetic resonance imaging (MRI)), and to take pictures of individual molecules at the nano- or meso-scale.
Beneath all these sciences and technologies lie questions that are fundamental, counterintuitively technical, and often even more profoundly philosophical: What do we mean by seeing? What do we mean by seeing some pattern (as Scott Russell saw the first soliton on the Edinburgh-Glasgow canal in 1834, following it on horseback)? How much trust shall we put in our own seeing? Does seeing faithfully reflect existence and reality, or does it arise only from the biased believing (or computation) of our minds? But what on earth is reality or existence? If even light cannot escape the core of a black hole, so that it can neither be seen nor measured, do you still believe it is meaningful to discuss its existence, time, or other features? Traveling near the speed of light, would we still see the same world (Einstein claimed that nothing in this universe can exceed the speed of light)? ...
This digital and information age pushes those questions further to the frontier. What is the chance that a natural area on the surface of Mars bears the pattern of a human face? How much trust shall we invest in a doctor's words when s/he observes abnormalities in CT/MRI/PET images? Could a new bright spot in an astronomical image be a star missed by all previous observations (e.g., Sedna, the newly discovered candidate tenth planet of our solar system)?
Naturally, vision modeling has to be interdisciplinary, cutting across vision psychology, cognitive science, computational neuroscience, learning theory, pattern theory, image processing, computer vision, artificial intelligence, and so on. As applied mathematicians, our goals are to develop models based on all the experimental results and data, to analyze the models (existence, uniqueness, well-posedness, stability, etc.), to compute them efficiently, and to validate and improve them.
We closely follow and are deeply inspired by the pioneering work of Brown's Pattern Theory Group.
- On the Foundations of Vision Modeling V. Noncommutative
Monoids of Occlusive Preimages
From the abstract:
A significant cue for visual perception is the occlusion pattern in 2-D images projected onto biological or digital retinas, which allows humans or robots to successfully sense and navigate 3-D environments. There has been much work on modeling and studying the role of occlusion in image analysis and visual perception, mostly from analytical or statistical points of view. The current paper presents a new theory of occlusion based on simple topological definitions of preimages and a binary operation on them called ``occlu." We study many topological as well as algebraic structures of the resultant preimage monoids (a monoid is a semigroup with identity). The current paper is intended to foster the connection between mathematical ways of abstract thinking and realistic modeling of human and computer vision. (UCLA CAM Tech. Report 04-22, April 2004, by Jianhong Shen.)
[Keywords: depth, occlusion, preimages, segmentation, monoids (semigroups), topology, invariants, knot theory]
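The ``occlu" operation can be illustrated with a toy model. The sketch below is our own set-based illustration (the pixel-dictionary representation and the exact semantics of `occlu` here are assumptions, not the report's topological definitions): a preimage is a partial labeling of pixels, and the front image's labels win on the overlap.

```python
# Toy illustration of an "occlu" operation on preimages (our own sketch,
# not necessarily the report's exact definition).  A preimage is modeled
# as a dict mapping pixel coordinates to an object label; `a occlu b`
# means a is in front of b, so a's labels win on the overlap.

def occlu(a, b):
    """Compose two preimages with a occluding b."""
    out = dict(b)   # start with the back layer
    out.update(a)   # the front layer overwrites on the overlap
    return out

# The identity element is the empty preimage (nothing drawn):
empty = {}

disk   = {(0, 0): "disk", (0, 1): "disk"}
square = {(0, 1): "square", (1, 1): "square"}

front = occlu(disk, square)   # disk hides part of the square
back  = occlu(square, disk)   # square hides part of the disk

# The operation is associative with identity `empty` (a monoid) but
# noncommutative: which object is visible at (0, 1) depends on depth order.
print(front[(0, 1)])  # -> disk
print(back[(0, 1)])   # -> square
```

The empty preimage acts as the identity and composition is associative, yet the operation is noncommutative; this is the kind of monoid structure the abstract alludes to.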
- On the Foundations of Vision Modeling IV. Weberized
Mumford-Shah Model with Bose-Einstein Photon Noise: Light Adapted
Segmentation Inspired by Vision Psychology, Retinal Physiology, and
Quantum Statistics
From the abstract: Human vision works equally well over a large dynamic range of light intensities, from only a few photons to typical midday sunlight. Contributing to such remarkable flexibility is a famous law in perceptual (both visual and aural) psychology and psychophysics known as Weber's Law. There has also been a great deal of effort in mathematical biology to simulate and interpret the law at the cellular and molecular levels, using linear and nonlinear system modelling tools. In terms of image and vision analysis, it is the first author who has emphasized the significance of the law in faithfully modelling both human and computer vision, and attempted to integrate it into visual processors such as image denoising (Physica D, 175, pp. 241-251, 2003).
The current paper develops a new segmentation model based on the integration of Weber's Law and the celebrated Mumford-Shah segmentation model (Comm. Pure Appl. Math., 42, pp. 577-685, 1989). Explained in detail are the reasons why the classical Mumford-Shah model lacks light adaptivity, and why its ``weberized" version can more faithfully reflect human vision's superior segmentation capability in a variety of illuminance conditions from dawn to dusk. It is also argued that the popular Gaussian noise model is physically inappropriate for the weberization procedure. As a result, the intrinsic thermal noise of photon ensembles is introduced based on Bose and Einstein's distribution in quantum statistics, which turns out to be compatible with weberization both analytically and computationally.
The current paper then focuses on both the theory and computation of the weberized Mumford-Shah model with Bose-Einstein noise. In particular, Ambrosio-Tortorelli's \Gamma-convergence approximation theory is adapted (Boll. Un. Mat. Ital., 6-B, pp. 105-123, 1992), and stable numerical algorithms are developed for the associated pair of nonlinear Euler-Lagrange PDEs. Numerical results confirm and highlight the light adaptivity feature of the new model. (IMA Tech. Preprint No. 1949, December 2003, by Jianhong Shen and Yoon-Mo Jung.)
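For orientation, the two standard ingredients named in the abstract can be recalled explicitly. The classical Mumford-Shah energy (for an observed image g on a domain \Omega with edge set \Gamma) and the Bose-Einstein mean photon occupation number are textbook formulas; the exact weberized functional and noise model are in the preprint itself and are not reproduced here.

```latex
% Classical Mumford-Shah segmentation energy (observed image g, edge set Gamma):
E[u, \Gamma] = \alpha \int_\Omega (u - g)^2 \, dx
             + \beta \int_{\Omega \setminus \Gamma} |\nabla u|^2 \, dx
             + \gamma \, \mathrm{length}(\Gamma)

% Bose-Einstein statistics: mean photon number per mode at temperature T:
\langle n \rangle = \frac{1}{e^{h \nu / (k_B T)} - 1}
```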
- On the Foundations of Vision Modeling III.
Pattern-Theoretic Analysis of Hopf and Turing's Reaction-Diffusion
Patterns
From the abstract: After Turing's ingenious work on the chemical basis of morphogenesis fifty years ago, reaction-diffusion patterns have been extensively studied in terms of the modelling and analysis of pattern formation (both in chemistry and biology), pattern growth in complex laboratory environments, and novel applications in computer graphics. But one of the most fundamental elements has still been missing from the literature. That is, what exactly do we mean by (reaction-diffusion) patterns? When presented to human vision, the patterns usually look deceptively simple and are often tagged with household names like spots or stripes. But are such split-second pattern identification and classification equally simple for a computer vision system? An affirmative answer does not seem so obvious, just as in the case of face recognition, one of the greatest challenges in contemporary A.I. and computer vision research.
Inspired and fuelled by recent advances in mathematical image and vision analysis (Miva), as well as modern pattern theory, the current paper develops both statistical and geometrical tools and frameworks for identifying, classifying, and characterizing common reaction-diffusion patterns and pattern formations. In essence, it presents a data mining theory for the scientific simulations of reaction-diffusion patterns. (CAM Tech. Report 03-19, by Jianhong Shen and Yoon-Mo Jung, May 2003.)
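The spot and stripe patterns mentioned above can be generated, for instance, by the Gray-Scott reaction-diffusion system. The pure-Python sketch below only illustrates where such patterns come from; the model, parameters, and grid are standard textbook choices and are not taken from the report.

```python
# Minimal Gray-Scott reaction-diffusion simulation in pure Python
# (a standard pattern-forming model; parameters are illustrative).

def step(u, v, Du=0.16, Dv=0.08, F=0.035, k=0.065, dt=1.0):
    """One explicit Euler step on periodic n x n grids u, v."""
    n = len(u)
    def lap(g, i, j):  # 4-neighbor Laplacian with wraparound
        return (g[(i - 1) % n][j] + g[(i + 1) % n][j]
                + g[i][(j - 1) % n] + g[i][(j + 1) % n] - 4.0 * g[i][j])
    nu = [row[:] for row in u]
    nv = [row[:] for row in v]
    for i in range(n):
        for j in range(n):
            uvv = u[i][j] * v[i][j] ** 2          # reaction term u + 2v -> 3v
            nu[i][j] = u[i][j] + dt * (Du * lap(u, i, j) - uvv + F * (1.0 - u[i][j]))
            nv[i][j] = v[i][j] + dt * (Dv * lap(v, i, j) + uvv - (F + k) * v[i][j])
    return nu, nv

n = 32
u = [[1.0] * n for _ in range(n)]    # feed chemical, initially full
v = [[0.0] * n for _ in range(n)]    # reacting chemical, initially absent
for i in range(14, 18):              # seed a small square perturbation
    for j in range(14, 18):
        u[i][j], v[i][j] = 0.50, 0.25

for _ in range(200):
    u, v = step(u, v)

print(max(max(row) for row in v))    # the seed has evolved, not died out
```

Explicit Euler with dt = 1 is stable here since dt < dx^2 / (4 Du); much longer runs in this parameter regime develop the familiar spot patterns.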
- On the Foundations of Vision Modeling II. Mining of
Mirror Symmetry of 2-D Shapes
From the abstract: Vision can be considered a feature mining problem. Visually meaningful features are often geometrical, e.g., boundaries (or edges), corners, T-junctions, and symmetries. Mirror symmetry, or near mirror symmetry, is one of the most common and useful symmetry types in image and vision analysis. The current paper proposes several different approaches for studying 2-dimensional (2-D) mirror-symmetric shapes. Proper mirror symmetry metrics are introduced based on Lebesgue measures, the Hausdorff distance, and lower-dimensional feature sets. The theory and computation of these approaches and measures are developed. (CAM Tech. Report 02-62, by Jianhong Shen, 2003.)
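One ingredient the abstract names, the Hausdorff distance, lends itself to a simple symmetry score for point-set shapes. The sketch below (with an assumed, fixed vertical mirror axis; choosing the axis is itself part of the real problem) is our own illustration, not the report's metric: reflect the shape across the axis and measure how far the reflection is from the original.

```python
# Toy mirror-symmetry score for a 2-D point set (illustrative sketch,
# not the report's exact metric).  Reflect across the vertical axis
# x = axis_x and take the Hausdorff distance between shape and mirror.
import math

def hausdorff(A, B):
    """Symmetric Hausdorff distance between finite point sets A, B."""
    def d(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    def directed(X, Y):
        return max(min(d(p, q) for q in Y) for p in X)
    return max(directed(A, B), directed(B, A))

def mirror_asymmetry(shape, axis_x):
    """0 for a shape perfectly mirror-symmetric about x = axis_x."""
    mirrored = [(2 * axis_x - x, y) for (x, y) in shape]
    return hausdorff(shape, mirrored)

square = [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # symmetric about x = 0
skewed = [(-1, -1), (-1, 1), (2, -1), (1, 1)]   # one corner pushed out

print(mirror_asymmetry(square, 0.0))  # -> 0.0
print(mirror_asymmetry(skewed, 0.0))  # -> 1.0
```

A score of exactly 0 certifies mirror symmetry; small positive values quantify near mirror symmetry, in the spirit of the metrics the abstract describes.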
- On the Foundations of Vision Modeling I. Weber's Law and
Weberized TV Restoration
From the abstract: Most conventional image processors take little account of human vision psychology. Weber's Law in psychology and psychophysics states that human perception of, and response to, an intensity fluctuation du of a visual signal is weighted by the background stimulus u, rather than being plainly uniform. This paper attempts to integrate this well-known perceptual law into the classical total variation (TV) image restoration model of Rudin, Osher, and Fatemi [Physica D, 60:259-268, 1992]. We study the existence and uniqueness of solutions to the proposed Weberized nonlinear TV restoration model, making use of the direct method in the space of functions of bounded variation. We also propose an iterative algorithm based on a linearization technique for the associated nonlinear Euler-Lagrange equation. (CAM Tech. Report 02-20, by Jianhong Shen. Physica D, 175(3/4), pp. 241-251, 2003.)
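The law and the model the abstract refers to can be written out. Weber's Law and the Rudin-Osher-Fatemi TV energy are standard; the weberized term shown last is our hedged reading of the abstract (weighting the gradient by the background stimulus), not necessarily the paper's exact functional.

```latex
% Weber's Law: the just-noticeable intensity increment scales with the background.
\frac{\Delta u}{u} \approx \text{const.}

% Classical Rudin-Osher-Fatemi TV restoration of an observed image f:
\min_u \; \int_\Omega |\nabla u| \, dx \;+\; \frac{\lambda}{2} \int_\Omega (u - f)^2 \, dx

% A plausible weberized TV term (gradient weighted by the background stimulus u):
\int_\Omega \frac{|\nabla u|}{u} \, dx
```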
First created in December 2002. Last modified in June 2003. Link here to the UCLA Imagers.