Tettelin et al. proposed a new method for closing the gaps in whole
genome shotgun sequencing projects. The method uses a multiplex PCR
strategy in order to minimize the time and effort required to sequence
the DNA in the missing gaps. This procedure has been used in a number
of microbial sequencing projects including Streptococcus
pneumoniae and other bacteria. We describe a theoretical
framework for this problem and present nearly optimal algorithms for
its solution.
N. Alon, R. Beigel, S. Kasif, S. Rudich,
and B. Sudakov. ``Learning a Hidden Matching,'' in the proceedings of the 43rd IEEE Symposium on Foundations of Computer Science, pp. 197-206, 2002.
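The abstraction behind the framework can be made concrete. In the hidden-matching model, contig ends are vertices, gaps are the edges of an unknown matching, and one multiplex PCR reaction on a pool of primers reveals only whether the pool spans at least one gap. The sketch below (the function names are ours) implements that query model together with the naive pairwise strategy, which needs O(n^2) reactions; the point of the algorithms in the paper is to get by with far fewer pooled queries.

```python
from itertools import combinations

def make_query(matching):
    """Query model from the hidden-matching abstraction: one pooled PCR
    reaction on a set of primers reveals only whether the pool spans at
    least one gap, i.e. whether it contains a matched pair."""
    edges = {frozenset(e) for e in matching}
    def query(pool):
        return any(frozenset(p) in edges for p in combinations(pool, 2))
    return query

def learn_matching_pairwise(vertices, query):
    """Naive baseline: probe every pair in its own reaction, O(n^2)
    queries in all. Pooling strategies need far fewer reactions."""
    return {frozenset(p) for p in combinations(vertices, 2) if query(p)}
```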
We have also developed a computational framework for the synthesis of oligonucleotide microarrays.
S. Kasif, Z. Weng, R. Beigel, and C. DeLisi.
``A Computational Framework for Optimal Masking in the Synthesis of Oligonucleotide Microarrays,'' Nucleic Acids Research, 2002, Vol. 30, No. 20 e106.
Dynamic query interfaces (DQIs) are a novel database interface
developed at the University of Maryland's Human-Computer Interaction
Lab (HCIL). They provide continuous real-time feedback to the
user during the query formulation process. To see DQIs in action,
visit one of HCIL's commercial spinoffs: IVEE. Experiments at HCIL show that
DQIs are an elegant and powerful interface to small databases.
However, previous DQI algorithms were too slow for large databases.
Egemen Tanin, Ben Shneiderman, and I developed and implemented a new
approach to DQIs that works well with large databases.
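One way to see why incrementality matters: a hypothetical sketch (the names and structure are ours, not the published data structure) in which one numeric attribute is pre-bucketed so that dragging a range slider by one bucket updates the hit count in O(1) instead of rescanning the database.

```python
class SliderCounter:
    """Illustrative only: per-bucket counts let a slider move update the
    query-result count incrementally, which is the kind of continuous
    feedback a DQI must sustain on a large database."""

    def __init__(self, values, lo, hi, nbuckets):
        self.width = (hi - lo) / nbuckets
        self.counts = [0] * nbuckets
        for v in values:
            b = min(int((v - lo) / self.width), nbuckets - 1)
            self.counts[b] += 1
        self.left = 0                  # current slider range: buckets [left, right)
        self.right = nbuckets
        self.hits = len(values)

    def shrink_left(self):
        """The slider's left handle moves right by one bucket."""
        self.hits -= self.counts[self.left]
        self.left += 1
        return self.hits

    def grow_left(self):
        """The slider's left handle moves back left by one bucket."""
        self.left -= 1
        self.hits += self.counts[self.left]
        return self.hits
```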
E. Tanin, R. Beigel, and B. Shneiderman.
``Incremental Data Structures and Algorithms for Dynamic Query Interfaces,''
in the proceedings of the
Workshop on New Paradigms in Information Visualization and Manipulation,
5th ACM International Conference on Information and Knowledge Management (CIKM),
pp. 12-15, 1996.
Also in SIGMOD Record (unrefereed) 25(4), pp. 21-24, 1996.
The online version contains revisions.
Query previews are a user-interface paradigm that facilitates
browsing a huge collection. They were developed by HCIL for the NASA
EOSDIS project. My contribution to the research applies geometric
techniques to make query previews algorithmically feasible.
R. Beigel and E. Tanin.
``The Geometry of Browsing,''
in the proceedings of the 3rd
Latin American Symposium on Theoretical Informatics
(invited lecture), LNCS 1380, pp. 331-340, 1998.
With its exponential parallelism, molecular computing offers the hope
of solving important problems that have resisted solution on
conventional computers. Various molecular operations have been
proposed as feasible; because it is unclear which operations will be
usefully implementable, algorithms that use the simplest operations
are to be preferred. In particular, communication between processes
may be impossible. Parallelism and running time are resources to be
exploited efficiently.
It is known that molecular computers with limited operation sets,
polynomial time, and exponential parallelism can solve any problem in
the class NP. By further limiting the parallelism, various
bounded-nondeterminism subclasses of NP are obtained. Bin Fu and I
are investigating the relationships among molecular computation
classes and these NP subclasses.
Many molecular algorithms for important problems use 2^{n}
parallelism, which severely limits the size of the problems those
algorithms can solve. In the last two decades we have seen an
improvement from 2^{n} to 2^{cn} for various c < 1
in the running time of classical serial algorithms for some
NP-complete problems, such as 3-SAT, graph coloring, and independent set.
Using bounded nondeterminism as an algorithm-design technique, we are
developing more efficient molecular algorithms for those
and other important NP-complete problems.
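For context, here is the flavor of those 2^{cn}-time serial algorithms, in a hedged sketch (a simplified Monien-Speckenmeyer-style branching for 3-SAT, not the molecular algorithms themselves): branching on the literals of one clause fixes 1, 2, or 3 variables per branch, giving T(n) <= T(n-1) + T(n-2) + T(n-3), roughly 1.84^n rather than 2^n.

```python
def solve(clauses, assignment=None):
    """Simplified clause-branching 3-SAT search. A formula is a list of
    clauses; a clause is a list of nonzero ints, negative = negated
    variable. Returns a satisfying (partial) assignment dict or None."""
    if assignment is None:
        assignment = {}
    # Simplify: drop satisfied clauses and false literals.
    simplified = []
    for clause in clauses:
        lits, sat = [], False
        for lit in clause:
            v = assignment.get(abs(lit))
            if v is None:
                lits.append(lit)
            elif (lit > 0) == v:
                sat = True
                break
        if sat:
            continue
        if not lits:
            return None              # empty clause: dead branch
        simplified.append(lits)
    if not simplified:
        return assignment            # every clause satisfied
    clause = simplified[0]
    # Branch: first literal true; or first false and second true; etc.
    # Each branch fixes 1, 2, then 3 variables, so
    # T(n) <= T(n-1) + T(n-2) + T(n-3), about 1.84^n instead of 2^n.
    for i, lit in enumerate(clause):
        branch = dict(assignment)
        for earlier in clause[:i]:
            branch[abs(earlier)] = (earlier < 0)   # earlier literals false
        branch[abs(lit)] = (lit > 0)               # this literal true
        result = solve(simplified, branch)
        if result is not None:
            return result
    return None
```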
R. Beigel and B. Fu.
``Molecular Computing, Bounded Nondeterminism, and Efficient Recursion,''
in the proceedings of the
24th International Colloquium on Automata, Languages, and Programming (ICALP),
LNCS 1256, pp. 816-826, 1997.
Also in Algorithmica 25, 1999.
B. Fu and R. Beigel.
``A Comparison of Resource-Bounded Molecular Computation Models,''
in the proceedings of the
5th Israeli Symposium on Theory of Computing and Systems,
pp. 6-11, 1997.
Also in Algorithmica 24(2), pp. 86-95, 1999.
R. Beigel and B. Fu.
``Solving Intractable Problems with DNA Computing,''
in proceedings of the 13th Annual
Conference on Computational Complexity (invited lecture), pp. 154-168, 1998.
B. Fu and R. Beigel. ``Length Bounded Molecular
Computing,'' in the 4th International Meeting on
DNA Based Computers, pp. 201-211, 1998.
This research is supported by the National Science Foundation under
grant CCR-9700417, ``Connections between Space-Bounded Molecular
Computation and Classical Complexity Theory.''
Imagine a robot surveying a nuclear accident. As it collects data in
radioactive hot spots, its processors tend to get fried. Before too
many processors fail, the robot retreats to a safe spot in order to
heal. During the healing process, the robot's processors test each
other and communicate test results (some correct, some incorrect) to
an external controller in a very safe location as far as possible from
the accident. The controller determines which processors are really
faulty and sends instructions to deactivate them. Then the robot may
return to its mission.
The robot's ``healing'' process is called system level fault
diagnosis, as proposed initially by Preparata, Metze, and Chien. When
a good processor tests another processor it reports the testee's
status correctly to the controller. But when a faulty processor tests
another processor it may report the testee's status correctly or
incorrectly. Such testing protocols have been usefully implemented
by Bianchini et al.
More formally, we assume that each processor is either ``good'' or
``faulty,'' and that processors can test each other. (A good tester
will tell whether the testee is good or faulty. A faulty tester may
lie.) Tests may be performed in parallel, provided that each
processor participates in at most one test per round. My student Will
Hurwood's dissertation answers major questions on parallel
system-level fault diagnosis algorithms in static, dynamic, and
distributed models.
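The model is easy to simulate. In this toy sketch of ours (not the cited algorithms), good testers report truthfully, faulty testers may say anything, and a naive everyone-tests-everyone majority vote diagnoses correctly provided good processors outnumber faulty ones by at least two; the algorithms below achieve the same guarantee in a constant number of parallel rounds with far fewer tests.

```python
import random

def run_tests(status, lie=lambda: random.random() < 0.5):
    """PMC-style test outcomes: status[i] is True iff processor i is good.
    A good tester reports the testee's true status; a faulty tester's
    report is arbitrary (here drawn from `lie`)."""
    n = len(status)
    return {(i, j): status[j] if status[i] else lie()
            for i in range(n) for j in range(n) if i != j}

def diagnose(reports, n):
    """Naive diagnosis: each processor's status is the majority of the
    n-1 reports about it. Sound whenever good processors outnumber
    faulty ones by at least two, so every processor is tested by a
    strict majority of good testers."""
    verdict = []
    for j in range(n):
        votes = [reports[(i, j)] for i in range(n) if i != j]
        verdict.append(sum(votes) > len(votes) / 2)
    return verdict
```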
In static fault diagnosis, we assume that a strict majority of the
processors are good and that no processors fail during testing; the
objective is to determine the status (good or faulty) of every
processor. The best known algorithm takes only 10 rounds, independent
of the number of processors.
In dynamic diagnosis, we assume that at most t processors fail per
round and the controller can order at most t repairs per round; the
objective is to maintain a dynamic equilibrium in which the number of
faulty processors is bounded. The best algorithm limits the number of
faulty processors to O(t log t) at all times.
In distributed diagnosis, there is no controller. Instead each good
processor must determine the status of all other processors. This
problem is equivalent to the collect problem in distributed computing.
The best known algorithm for this problem is randomized and completes
diagnosis after O(n log^{3} n)
tests with high probability, independent of which processors actually
fail.
R. Beigel, G. Margulis, and D. Spielman.
``Fault Diagnosis in 32 Parallel Testing Rounds,''
in the proceedings of the 5th Annual ACM Symposium on Parallel
Algorithms and Architectures, pp. 21-29, 1993.
R. Beigel, W. Hurwood, and N. Kahale.
``Fault Diagnosis in a Flash,''
in the proceedings of the 36th IEEE Symposium on Foundations of
Computer Science, pp. 571-580, 1995.
W. Hurwood. ``Ongoing diagnosis,'' in the proceedings of the
15th IEEE Symposium on Reliable Distributed Systems, 1996. A
preliminary version is available as
``Dynamic Fault Diagnosis,''
YALEU/DCS/TR-1056, Yale University, Dept. of Computer Science, 1995.
J. Aspnes and W. Hurwood.
``Spreading
rumors rapidly despite an adversary,'' in the proceedings of the
15th Annual ACM Symposium on Principles of Distributed Computing,
pp. 143-151, 1996.
Also in Journal of Algorithms, 26(2), pp. 386-411, 1998.
The goal of complexity theory is to classify computational problems
according to their computational complexity, that is, the minimum
amount of resources needed to solve the problem. The classical
computing devices are the Boolean circuit, the Turing machine, and the
random-access machine. The classical computing mode is deterministic,
and the classical resources are circuit size and depth, and
time and space. Because of the present difficulty in proving lower
bounds in those models, huge classes of important problems have
resisted classification.
In order to classify more problems, new computing devices, modes of
computation, and resources have been proposed. The new devices include
threshold circuits, neural nets, DNA computers, and quantum computers;
the modes include nondeterministic, alternating, probabilistic,
counting, and relativized computation; and the resources include
number of DNA strands, nondeterministic bits, alternations, random
bits, and oracle queries.
A complexity class is the set of problems solvable by a particular
device in a particular computing mode with a particular amount of
certain resources. One paradigm in complexity theory is proving that
one complexity class C_{1} is contained in another
complexity class C_{2}. Such a containment yields upper bounds for
all problems in C_{1} and lower bounds for the problems that
are hard for C_{2}.
Low-degree polynomials over various rings are a very powerful tool for
proving upper and lower bounds on circuits and counting classes.
Although we were not the first to use them in this way, we
systematized their use and demonstrated their usefulness in several
contexts. Our most notable result uses polynomials and rational
functions to show that the unbounded-error probabilistic class PP is
closed under union and intersection, answering a question that had
been open for 17 years. An extension of those techniques shows that
small AND-OR circuits with polylog(n) majority gates are
equivalent to small AND-OR circuits with only 1 majority gate.
R. Beigel, N. Reingold, and D. Spielman.
``PP is closed under intersection,''
Journal of Computer and System Sciences,
50(2), 191-202, 1995. Special issue devoted to STOC '91.
R. Beigel.
``Perceptrons, PP, and the Polynomial Hierarchy,''
Computational Complexity, 4, 339-349, 1994.
Special issue devoted to the 4th Annual McGill Workshop on Complexity Theory.
(Also in Structures '92.)
R. Beigel and J. Tarui.
``On ACC,''
Computational Complexity, 4, 350-366, 1994.
Special issue devoted to the 4th Annual McGill Workshop on Complexity Theory.
(Also in FOCS '91.)
R. Beigel, N. Reingold, and D. Spielman.
``The Perceptron Strikes Back,''
in the proceedings of the 6th Annual Conference on Structure in
Complexity Theory, pp. 286-291, 1991.
R. Beigel.
``The Polynomial Method in Circuit Complexity,'' in the proceedings of the
8th Annual Conference on Structure in Complexity Theory
(invited lecture),
pp. 82-95, 1993.
R. Beigel and A. Maciel.
``Upper and Lower Bounds for Some Depth-3 Circuit Classes,''
Computational Complexity, 6(3): 235-255, 1997.
Also in the 12th Annual IEEE Conference on Computational Complexity, pp. 149--157, 1997.
R. Beigel and B. Fu.
``Circuits over PP and PL,''
in the proceedings of the 12th Annual IEEE Conference on
Computational Complexity, pp. 24-35, 1997.
R. Beigel and A. Maciel.
``Circuit Lower Bounds Collapse Relativized Complexity Classes,''
in the 14th Annual
Conference on Computational Complexity, pp. 222-226, 1999.
N. Alon and R. Beigel.
``Lower Bounds for Approximations by Low Degree Polynomials Over Z_{m},''
in the 16th Annual Conference on Computational Complexity,
2001.
A question to an oracle is called a ``query.'' Oracle queries are
called ``parallel'' if all of them must be formulated before any of them is
asked, i.e., if no question depends on the answer to another of the
questions. Oracle queries are called ``serial'' if you learn the answer
to each question before asking the next one, i.e., if the questions are
adaptive.
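The difference between the two access patterns shows up already in a toy setting of our own (not the SAT results below): pinning down a hidden value in {0,...,7} by threshold queries takes 7 parallel queries, but only 3 serial ones, because each serial query can depend on the answers so far.

```python
def make_oracle(x):
    """Toy oracle for a hidden value x: answers whether x >= t."""
    return lambda t: x >= t

def parallel_queries(oracle):
    """Parallel (nonadaptive): all queries are formulated before any
    answer is seen, so every threshold must be asked -- 7 queries to
    pin down a value in {0,...,7}."""
    questions = list(range(1, 8))       # fixed in advance
    answers = [oracle(t) for t in questions]
    return sum(answers)                 # x = number of thresholds it meets

def serial_queries(oracle):
    """Serial (adaptive): each query may depend on earlier answers, so
    binary search pins down the value in only 3 queries."""
    lo, hi = 0, 7
    for _ in range(3):
        mid = (lo + hi + 1) // 2
        if oracle(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```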
If you can solve a decision problem in polynomial time with 3 parallel
queries to SAT, can you solve it in polynomial time with 2 serial
queries to SAT? Yes, you can. If you can compute a function in
polynomial time with 3 parallel queries to SAT, can you compute it in
polynomial time with 2 serial queries to SAT? No, you can't in
general compute it with 2 serial queries to any set unless P = NP.
Sets like SAT are called p-terse because they don't leak information.
Although non-p-terse sets need not belong to P, non-p-terse sets must
be recognized by polynomial-size circuits.
The study of bounded queries in the complexity-theoretic framework consists of
comparing the power of polynomial-time computation with k queries of
one type versus the power of polynomial-time computation with j
queries of another type.
R. Beigel.
Query-limited Reducibilities,
Ph.D. Dissertation. Stanford University Dept. of Computer Science, 1987.
R. Beigel.
``A Structural Theorem that Depends Quantitatively on the Complexity of SAT,''
in the proceedings of the 2nd Annual Conference on Structure
in Complexity Theory, pp. 28-34, 1987.
R. Beigel, W. Gasarch, and E. Kinber.
``Frequency Computation and Bounded Queries,''
Theoretical Computer Science 163(1&2), pp. 177-192, 1996.
Also in the 10th Annual Conference on
Structure in Complexity Theory, 125-132, 1995.
R. Beigel, M. Kummer, and F. Stephan.
``Approximable Sets,''
Information and Computation, 120(2), pp. 304-314, 1995.
(Also in Structures '94.)
M. Agrawal, R. Beigel, T. Thierauf.
``Modulo Information from Nonadaptive Queries to NP,''
In FST&TCS, LNCS 1180, pp. 322-334, 1996.
Also TR96-001, Electronic Colloquium on Computational Complexity, 1996.
R. Beigel and R. Chang.
``Commutative Queries,''
Information and Computation, 166, pp. 71-91, 2001.
An earlier version appeared in the proceedings of the
5th Israeli Symposium on Theory of Computing and Systems,
pp. 159-165, 1997.
R. Beigel.
``Gaps in Bounded Query Hierarchies,''
in the 14th Annual
Conference on Computational Complexity, pp. 124-141, 1999. Also invited
to the JCSS special issue devoted to Complexity '99.
Counting classes are defined by taking nondeterministic Turing
machines and saying they accept if the number A of accepting
computations satisfies a particular predicate. Using the predicate A
> 0, we see that NP is a counting class. Using the predicate A
> T/2 where T is the total number of computations, we obtain the
class PP. Using the predicate A = 1 (mod 2), we obtain the class
⊕P (parity P). The low-degree polynomial method suffices to prove many
relationships among counting classes.
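These definitions can be checked by brute force on a toy machine whose nondeterministic choices form a bit string; the sketch below (our illustration) enumerates all computation paths and applies the three predicates.

```python
from itertools import product

def count_accepting(accepts, nbits):
    """Enumerate all 2^nbits computation paths of a toy nondeterministic
    machine; `accepts` decides each path. Returns (A, T): the number of
    accepting paths and the total number of paths."""
    total = 2 ** nbits
    a = sum(1 for path in product([0, 1], repeat=nbits) if accepts(path))
    return a, total

# The predicates from the text, applied to (A, T):
def np_style(a, t):
    return a > 0        # NP: some accepting path exists

def pp_style(a, t):
    return a > t / 2    # PP: a majority of paths accept

def parity_style(a, t):
    return a % 2 == 1   # parity P: an odd number of paths accept
                        # (the even predicate defines the same class,
                        # since the class is closed under complement)
```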
R. Beigel
``Closure Properties of GapP and #P,''
in the proceedings of the
5th Israeli Symposium on Theory of Computing and Systems, pp. 144-146, 1997.
R. Beigel and J. Feigenbaum.
``On being incoherent without being very hard,''
Computational Complexity, 2(1), 1-17, 1992.
(A version co-authored with Bellare and Goldwasser appeared in FOCS '91.)
R. Beigel and H. Straubing.
``The Power of Local Self-Reductions,''
in the proceedings of the 10th Annual Conference on
Structure in Complexity Theory, 277-285, 1995.
R. Beigel and J. Goldsmith.
``Downward Separation Fails Catastrophically for Limited Nondeterminism Classes,''
SIAM Journal on Computing, 17(5), 1420-1429, 1998.
Also in the 9th Annual Conference on
Structure in Complexity Theory, pp. 134-138, 1994.
Can you solve 3 instances of the halting problem in a finite amount of
time, by asking only two questions to the halting set K? Yes,
you can. (Try to figure it out yourself. The second question depends
on the answer to the first question.) Sets like K are called
verbose because they ``leak'' information. Complete sets for
higher levels of the arithmetical hierarchy are, however, terse.
Can you solve 4 instances of the halting problem in a finite amount of
time, by asking only two questions to the halting set K? No,
in fact you can't solve 4 instances of any nonrecursive problem by
asking only two questions to any set, even a much harder one.
We have made a general study of when k+1 queries are more
powerful than k in recursion theory.
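The two-query trick for three instances can be sketched in code under our own toy simulation (programs modeled as (halts, steps) pairs): two queries to K binary-search the exact number k of the three programs that halt, since "do at least c of them halt?" is itself a question reducible to K; the dovetailing below then finds which ones with no further oracle help.

```python
def solve_three_with_count(programs, k):
    """Dovetailing half of the two-query trick: given the exact count k
    of these programs that halt, run all of them in parallel until k
    have halted; the rest provably never will. A program is simulated
    as a (halts, steps_until_halt) pair."""
    halted = set()
    step = 0
    while len(halted) < k:
        step += 1                    # advance every program by one step
        for i, (halts, steps) in enumerate(programs):
            if halts and steps == step:
                halted.add(i)
    return [i in halted for i in range(len(programs))]
```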
R. Beigel.
Query-limited Reducibilities,
Ph.D. Dissertation. Stanford University Dept. of Computer Science, 1987.
R. Beigel and W. I. Gasarch.
``The mapmaker's dilemma,''
Discrete Applied Mathematics, 34(1-3), 37-48, 1991.
Also in Annals of Discrete Mathematics.
Special issues devoted to the Capital City Conference on
Combinatorics and Theoretical Computer Science.
R. Beigel, M. Kummer, and F. Stephan.
``Quantifying the Amount of Verboseness,''
Information and Computation 118(1), pp. 304-314, 1995.
(Also in LFCS '92.)
R. Beigel, W. Gasarch, and E. Kinber.
``Frequency Computation and Bounded Queries,''
Theoretical Computer Science 163(1&2), pp. 177-192, 1996.
Also in the 10th Annual Conference on
Structure in Complexity Theory, pp. 125-132, 1995.
R. Beigel, W. Gasarch, M. Kummer, G. Martin, T. McNicholl, and F. Stephan,
``The Complexity of ODD_{n}^{A},''
Journal of Symbolic Logic 65, pp. 1-18, 2000.
A prior version entitled ``On the Query Complexity of Sets,'' appeared in the
21st Symposium on Mathematical Foundations of Computer Science, 1996.
We have developed efficient algorithms for sorting with
special-purpose hardware and for searching an infinite sorted list,
the first nontrivial algorithm for searching a file that you are
editing, and the fastest known algorithm for 3-coloring.
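For the infinite sorted list, a classic doubling strategy (our sketch of the standard Bentley-Yao-style approach, not necessarily the algorithm of the work above) brackets the target with O(log p) probes, where p is its position, and finishes with binary search inside the bracket.

```python
def unbounded_search(f, target):
    """Search an infinite sorted list given by f(i) (nondecreasing,
    unbounded). Returns an index i with f(i) == target, or -1."""
    if f(0) >= target:
        return 0 if f(0) == target else -1
    hi = 1
    while f(hi) < target:        # gallop: probe positions 1, 2, 4, 8, ...
        hi *= 2
    lo = hi // 2                 # target position lies in (lo, hi]
    while lo + 1 < hi:           # binary search the bracket
        mid = (lo + hi) // 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi if f(hi) == target else -1
```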
R. Beigel.
``Finding Maximum Independent Sets in Sparse and General Graphs,''
in the proceedings of the 10th Annual ACM-SIAM Symposium on
Discrete Algorithms, 1999.
R. Beigel and C. P. Kruskal.
``Processor Networks and Interconnection Networks Without Long Wires,''
in the proceedings of the ACM Symposium on Parallel Algorithms and
Architectures, pp. 42-51, 1989.
Also in Computer Architecture News 19(1), 15-24, 1991.
Special issue (unrefereed) devoted to SPAA '89.
Our algorithm for learning oblique decision trees beats previous
methods in practice on star/galaxy discrimination, iris
classification, and breast cancer diagnosis.
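What makes a decision tree ``oblique'' is the node test: a linear combination of features rather than a single feature, so splits need not be axis-parallel. A tiny illustration with made-up weights (not the learned trees from those experiments):

```python
def axis_parallel_node(x):
    """Classic decision-tree node: tests a single feature."""
    return x[0] > 0.5

def oblique_node(x, w=(0.7, -0.3), b=0.1):
    """Oblique node: tests a linear combination w . x > b, so the
    split is a hyperplane of any orientation. Weights here are
    illustrative, not learned."""
    return sum(wi * xi for wi, xi in zip(w, x)) > b
```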
An Introduction to Computability and Formal Languages
Robert W Floyd, Stanford University
Richard Beigel, Yale University
An up-to-date, authoritative text for courses in theory of
computability and languages. Redefining the building blocks of
automata theory, the authors offer a single unified model encompassing
all traditional types of computing machines and ``real-world''
electronic computers. This reformulation of computability and formal
language theory provides a framework for building a rich and enduring
body of knowledge.
An instructor's manual containing lecture outlines, solutions to
selected exercises, new exercises, a final exam, and a list of errata
is available from the publisher.
Download the latest errata (updated
March 30, 1999).