Stochastic Context-Free Grammar Induction with a Genetic Algorithm Using Local Search Thomas E. Kammeyer Richard K. Belew Cognitive Computer Science Research Group Computer Science and Engineering Dept. (0114) University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0114 {tkammeye,rik}@cs.ucsd.edu ABSTRACT We have previously used grammars as a formalism to structure a GA's search for simple programs called sorting networks (SNets) [KBW95]. In this paper we restrict ourselves to stochastic context-free grammars which, while more analytically tractable than our SNet grammars, are more difficult than others previously considered by the GA community. In our approach, the production rules of a grammar are encoded as genes of a genome; this grammar is used as a recognizer of strings and assigned a fitness measure that reflects the probability that it captures the structure of a restricted sample of strings generated by a stochastic target language. Our GA introduces a novel encoding of grammars as genotypic strings, and uses a local search component to aid in learning rule probabilities. Both fitness evaluation and the local search algorithm depend on a chart parser. We give results for two grammars whose nonstochastic equivalents have been used in previous studies. We also present arguments about the degree of testing needed for GA-based grammar induction.