Busch Student Center, Rutgers University, Busch Campus, Piscataway, NJ

**Yoav Freund**, AT&T Labs-Research, and **Rakesh Vohra**, Northwestern University

1. Introduction and overview
Yoav Freund, AT&T Labs-Research

In this introductory talk I will try to give a roadmap for some of the main results that will be covered in this workshop. My emphasis will be on algorithms and their performance bounds. I will overview the work in the following directions and some of their interrelationships:

- Universal compression, MDL, Bayesian prediction and universal portfolios.
- Learning to play repeated matrix games.
- Competitive analysis of online algorithms.
- The aggregating strategy of Vovk.
- The metric based analysis of online learning of Kivinen and Warmuth.

2. Learning in Games
Rakesh Vohra, Northwestern University, and David Levine, University of California - Los Angeles

In this talk I will provide an overview of what the `learning programme' in Game Theory is supposed to accomplish, as well as highlight some of its conclusions, many of which will be described in the talks to come. I will also describe some of the connections to computational learning theory. What makes learning in games interesting is that, in addition to the usual exploration/exploitation tradeoff common in computational learning theory, there is the veiled issue of `teaching': actions taken on the basis of what one has learned affect the rival's hypothesis about one's own future behavior.

3. An overview of the competitive analysis of online algorithms and its relation to game theory
Allan Borodin, University of Toronto, and Ran El-Yaniv, The Technion, Israel

In this talk we will provide an overview of Online Computation and Competitive Analysis. In particular, we will discuss the competitive ratio measure (i.e., the multiplicative regret in the context of repeated decision making) from the perspective of worst-case complexity analysis.
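The competitive ratio just mentioned can be made concrete with the textbook ski-rental example (our illustration, not from the talk): an online algorithm's cost is compared multiplicatively against the optimal offline cost.

```python
# Ski rental: rent for 1 per day, or buy once for cost B. The break-even
# online rule (rent until the rent paid equals B, then buy) is
# 2-competitive against the optimal offline decision.

def online_cost(days_skied, buy_price):
    """Break-even rule: rent until total rent paid equals buy_price, then buy."""
    rent_days = min(days_skied, buy_price)
    return rent_days + (buy_price if days_skied > buy_price else 0)

def offline_cost(days_skied, buy_price):
    """Optimal offline: rent the whole time or buy on day one, whichever is cheaper."""
    return min(days_skied, buy_price)

B = 10
worst_ratio = max(online_cost(d, B) / offline_cost(d, B) for d in range(1, 100))
print(worst_ratio)  # 2.0: the break-even rule is 2-competitive
```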
We will emphasize general (upper and lower bound) analysis techniques, models for online problems and the relation between various problems, the role of randomization, and the relation to more classic fields such as decision theory, game theory, and online learning. We will also discuss the most significant contributions to date, and major open problems.

4. The Prequential Approach to Probability and Statistics
Philip Dawid, University College London

"Prequential" means "predictive sequential". In this talk I will give an outline of the fundamental ideas of prequential Probability and Statistics. Consider a sequential game between two players, Forecaster and Nature. A conventional probability model can be constructed by ascribing suitable strategies to each player. The prequential approach concentrates, instead, on the sequences of choices made by each player, and attempts to see what can be said under weak, or absent, assumptions about underlying strategies. By considering suitable criteria for the "agreement" between forecasts and outcomes, we obtain a new game-theoretic construction of Probability Theory. Similarly, a standard statistical model can be considered as a strategy for a parametrised collection of "experts". Prequential Statistics focuses attention on the actual forecasts issued, eschewing, so far as possible, further assumptions about underlying strategies. I shall argue that little of value is lost, and much insight is gained, by imposing such constraints.

5. Decision Theory of Regret for Universal Coding, Gambling, and Prediction
Andrew Barron, Yale University

This talk presents some new aspects of an information-theoretic game and its implications for universal coding, gambling, and for prediction using the log loss. We are given a parameterized family of strategies which is used to define the target level of performance, associated in hindsight with the shortest code length, greatest wealth, and best cumulative log loss.
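The hindsight target and regret in this log-loss game can be illustrated numerically (a hypothetical two-expert example of ours, with the learner playing the uniform Bayes mixture):

```python
import math

# Two hypothetical experts each assign a fixed probability to the bit "1";
# the learner plays the Bayes mixture with a uniform prior over them.
outcomes = [1, 1, 0, 1, 1, 0, 1, 1]
expert_p1 = [0.8, 0.5]          # each expert's P(next bit = 1)

weights = [0.5, 0.5]            # uniform prior over the two experts
mixture_loss = 0.0
expert_losses = [0.0, 0.0]

for y in outcomes:
    # probability each expert assigns to the realized outcome y
    probs = [p if y == 1 else 1.0 - p for p in expert_p1]
    # Bayes mixture prediction, then its log loss
    p_mix = sum(w * p for w, p in zip(weights, probs))
    mixture_loss += -math.log(p_mix)
    # each expert's own cumulative log loss (the hindsight targets)
    for i, p in enumerate(probs):
        expert_losses[i] += -math.log(p)
    # posterior update of the mixture weights
    weights = [w * p for w, p in zip(weights, probs)]
    total = sum(weights)
    weights = [w / total for w in weights]

regret = mixture_loss - min(expert_losses)
print(regret)  # at most log(2): the price of not knowing the best expert
```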
The regret is the difference between what we achieve by an on-line strategy and the ideal target level, on the cumulative log scale. In this talk we examine to what extent we should restrict attention to Bayes strategies when examining expected regret or minimax point-wise regret criteria. In the finite sample expected regret setting, a Pythagorean inequality of information theory is used to show that strategies that are not information limits of Bayes strategies are beaten by a constant amount, uniformly over the parameter space. In an asymptotic minimax point-wise regret setting, we consider exponential and certain non-exponential families. In exponential families certain Bayes mixtures are asymptotically minimax. However, for non-exponential families, joint work with Jun-ichi Takeuchi shows that to obtain an asymptotically minimax strategy the mixture cannot be restricted to the given family; yet asymptotic minimaxity can be achieved by mixtures over a certain enlargement of the given family. In the point-wise regret setting, asymptotically maximin and minimax strategies have information divergence from the Shtarkov normalized maximum likelihood distribution that tends to zero as the sample size gets large, as shown in joint work with Trent Xie.

6. Reinforcement learning as a useful approximation: accuracy of prediction in experimental games
Alvin Roth, Harvard University

One of the goals of experimental economics is to develop models of behavior that can accurately predict the behavior of human players in strategic environments. This is a different task than developing models that learn quickly to play well. And, to the extent that models are viewed as useful approximations (rather than as precisely true descriptions), this raises some questions about how to evaluate models (when they can be rejected by conventional statistical techniques).
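A minimal version of the kind of simple reinforcement model studied in this line of work can be sketched as follows (a Roth-Erev-style propensity update; the specific form and payoffs are our assumption, not necessarily the talk's exact model):

```python
import random

# Each action keeps a propensity; the realized payoff is added to the
# chosen action's propensity, and actions are chosen with probability
# proportional to their propensities ("law of effect").

def choose(propensities, rng):
    r = rng.random() * sum(propensities)
    acc = 0.0
    for action, q in enumerate(propensities):
        acc += q
        if r <= acc:
            return action
    return len(propensities) - 1

rng = random.Random(0)
payoff = [0.2, 1.0]        # hypothetical stage-game payoffs per action
prop = [1.0, 1.0]          # initial propensities
counts = [0, 0]

for _ in range(2000):
    a = choose(prop, rng)
    counts[a] += 1
    prop[a] += payoff[a]   # reinforce the chosen action by its realized payoff

print(prop, counts)        # play gradually concentrates on the higher-payoff action
```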
Preliminary results suggest that simple reinforcement models are quite useful for predicting human behavior in a wide variety of rather simple strategic environments. These results also give suggestions for how to proceed in exploring more complex strategic environments.

7. Adaptive Strategies, Regret and Approachability
Sergiu Hart, Hebrew University, Jerusalem (joint work with Andreu Mas-Colell, Universitat Pompeu Fabra, Barcelona)

Consider strategies for the repeated play of a game (or a decision problem) which, while simple to implement, generate desirable outcomes (like "no-regret", "calibration", and "correlated equilibria"). An entire class of such adaptive strategies is characterized and exhibited. It includes as particular cases "smooth fictitious play" and "regret-matching". The basic tool for the analysis is a generalization of Blackwell's approachability strategy for games with vector payoffs.

8. Information geometry and reconstruction problems
Imre Csiszar, Hungarian Academy of Sciences, Budapest

Consider the problem of reconstructing a real valued function f(x) when the available information specifies only a feasible set F of functions that f belongs to. Then, if a default model g is also given, the closest element of F to g is accepted as the reconstruction; different measures of distance may lead to different reconstructions. If f is known to be non-negative, e.g., a probability density, an intensity, or an attenuation function, the Kullback-Leibler distance (information divergence) is a favored choice, while for arbitrary real-valued functions the mean-squared distance is most preferred. These choices have also been distinguished by an axiomatic approach. Geometric properties of information divergence similar to those of Euclidean (or mean-squared) distance, particularly the Pythagorean identity, are very useful for reconstruction. A larger class, called Bregman distances, also shares these properties.
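The Bregman family just mentioned can be sketched concretely (our illustration): D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>; taking F to be negative entropy recovers the Kullback-Leibler divergence, and taking F to be the squared norm recovers squared Euclidean distance.

```python
import math

def bregman(F, gradF, p, q):
    """Bregman distance D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(gradF(q), p, q))
    return F(p) - F(q) - inner

# F = negative entropy on the simplex -> Kullback-Leibler divergence
neg_entropy = lambda x: sum(xi * math.log(xi) for xi in x)
neg_entropy_grad = lambda x: [math.log(xi) + 1.0 for xi in x]

# F = squared Euclidean norm -> squared Euclidean distance
sq_norm = lambda x: sum(xi * xi for xi in x)
sq_norm_grad = lambda x: [2.0 * xi for xi in x]

p, q = [0.2, 0.8], [0.5, 0.5]

kl = bregman(neg_entropy, neg_entropy_grad, p, q)
kl_direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
sq = bregman(sq_norm, sq_norm_grad, p, q)
sq_direct = sum((pi - qi) ** 2 for pi, qi in zip(p, q))

print(kl, sq)  # both Bregman constructions agree with the direct formulas
```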
In this talk, results in this direction will be surveyed, covering also iterative algorithms such as iterative scaling, generalized iterative scaling (or SMART), and the EM algorithm.

9. Learning to play games using multiplicative weights
Robert Schapire, AT&T Labs

We study the problem of learning to play a repeated game against any adversary. We present a simple algorithm which is guaranteed to come close to the best payoff achievable with respect to the sequence of plays taken by the opponent. The bounds we obtain are not asymptotic and hold for any finite number of rounds. We also show that the bounds are optimal in a particular sense. The algorithm and its amortized analysis are based directly on the ``on-line prediction'' methods first studied by Littlestone and Warmuth. We also study a variant of this problem in which the learner receives very limited feedback about the game that is being played; in essence, this is an adversarial version of the multi-armed bandit problem. We present an effective algorithm for this problem, and also prove lower bounds on the best possible performance. This talk includes joint work with Peter Auer, Nicolò Cesa-Bianchi and Yoav Freund.

10. Worst-case analyses of the exploration/exploitation tradeoff
Phil Long, National University of Singapore

In many natural settings, an on-line learning algorithm must trade off between taking actions that its current hypothesis suggests are good, and taking actions to obtain information that will potentially improve its hypothesis. I will survey some worst-case analyses of these types of problems.

11. Tracking the Best Predictor
Mark Herbster, Royal Holloway, University of London (joint work with Manfred Warmuth)

In this talk I will present on-line learning algorithms which are designed for combining classifiers or ``experts'' in nonstationary environments.
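A minimal sketch of the multiplicative-weights idea common to these expert-combination methods (the losses and learning rate below are illustrative, not taken from any of the talks):

```python
import math

def hedge(loss_rounds, eta=0.5):
    """Exponential weights over experts; loss_rounds[t][i] in [0, 1]."""
    n = len(loss_rounds[0])
    w = [1.0] * n
    total = 0.0
    for losses in loss_rounds:
        z = sum(w)
        p = [wi / z for wi in w]                        # current mixture over experts
        total += sum(pi * li for pi, li in zip(p, losses))
        # multiplicatively shrink the weight of experts that did badly
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
    return total

# Two hypothetical experts; the second is better in hindsight.
rounds = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]] * 10
alg_loss = hedge(rounds)
best_loss = min(sum(r[i] for r in rounds) for i in range(2))
print(alg_loss - best_loss)  # regret, bounded by ln(n)/eta + eta*T/8
```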
The motivating assumption of this work is as follows: given a set of classifiers and a data sequence whose underlying generative process is changing over time, one expects that different classifiers will perform well over different segments of the sequence. I give an algorithm whose performance guarantee is an additive term larger than the performance of the ``best'' sequence of classifiers; the additive term measures the degree of nonstationarity in that sequence of classifiers.

12. Rational Learning
Ehud Kalai, Northwestern University

Rational learning is learning guided by expected utility maximization. It implies Bayesian updating. The lecture will survey concepts and some of the results in both single person and multi-person learning. Specific topics may include notions of merging, where Bayesian updating of subjective beliefs leads a player to the true probabilities; being calibrated, where empirical tests of observed data confirm the subjective beliefs; and related notions of equilibrium.

13. Rational Bayesian Learning in Repeated Games
John Nachbar, Washington University, St. Louis

For a large set of 2-player discounted infinitely repeated games of complete information (stage game payoff functions are common knowledge), there does not exist any Bayesian learning theory in which player beliefs are weakly cautious, symmetric, and consistent. Loosely, weak caution means that if a repeated game strategy is in the support of a player's belief then so are computationally trivial variants of that strategy; symmetry means that the supports of player beliefs contain repeated game strategies of comparable strategic complexity; and consistency means that the support of each player's belief contains one of his opponent's epsilon best responses. A similar impossibility theorem holds for repeated games of incomplete information (stage game payoff functions are private information).
The impossibility theorems cast doubt on whether one can construct a convincing theory of rational Bayesian learning in repeated games.

14. "When rational learning fails"
Dean Foster, Wharton School, University of Pennsylvania, and Peyton Young, Johns Hopkins University

Consider an infinitely repeated 2-person game played by Bayesian rational players. Before the game begins, the payoffs of each player are perturbed by iid random errors drawn from a distribution whose support is a small interval containing the game matching pennies. Then, with probability one, some player almost surely fails to learn how to predict the behavioral strategies of the other player. Further, with probability one, the players' strategies will almost surely not come close to any Nash equilibrium of the repeated game. A particular consequence is that, for some games of incomplete information, the Kalai-Lehrer hypothesis about beliefs holds with probability zero.

15. `Impossibility' of Absolute Continuity
Chris Sanchirico, Columbia University

Two agents with different priors watch a sequence unfold over time, updating their priors about the future course of the sequence with each new observation. Blackwell and Dubins (1962) show that the agents' opinions about the future will converge if, roughly speaking, their priors are absolutely continuous: i.e., if both agents agree about which events are possible. We show that the probability that two priors drawn at random have any chance of weakly merging (a fortiori, merging) is zero. As a corollary, we establish that two randomly drawn priors will almost surely be singular (a fortiori, not absolutely continuous).

16. Minimizing regret: the general case
Aldo Rustichini, Tilburg University

In repeated games with differential information on one side, the labelling ``general case'' refers to games in which the action of the informed player is not known to the uninformed player, who can only observe a signal which is the random outcome of his and his opponent's actions. Here we consider the problem of minimizing regret (in the sense first formulated by Hannan) when the information available is of this type. We give a simple condition describing the approachable set.

17. Reasonable Learning in Non-cooperative Games and the Internet
Eric Friedman, Rutgers University, Department of Economics

In this talk I will propose a definition of what makes a model of learning in games "reasonable." Such definitions are clearly context dependent, so I will focus on the Internet as a particular case study, but will also consider several other settings of interest to economists and computer scientists. My definition of reasonableness involves computational complexity, rate (and definition) of convergence, and a strong notion of robustness.

18. Bayesian representations of stochastic processes under learning - de Finetti revisited
Matt Jackson, CalTech; Rann Smorodinsky, Technion; and Ehud Kalai, Northwestern University

What are the Bayesian representations (decompositions) of a stochastic process that are learnable and are not further refined by observation of the process? Such a ``maximally'' learnable representation is equivalent to conditioning on the tail field.

19. Fast Universal Portfolios For Parameterized Target Classes
Jason Cross, Yale University

This talk generalizes and extends some of the concepts of universal portfolios as introduced by Cover (1991). We begin in an abstract framework, generalizing the concept of a target class beyond constant rebalanced portfolios to include parameterized target classes that may depend on side information of a continuous form.
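Cover's construction can be sketched numerically for two assets (our toy example with hypothetical price relatives; the talk's generalization goes well beyond this):

```python
# Wealth of a constant rebalanced portfolio (CRP) versus the universal
# portfolio, which is a uniform average over CRPs (approximated on a grid).

price_relatives = [(1.1, 0.9), (0.8, 1.2), (1.05, 0.95), (0.9, 1.15)]

def crp_wealth(b, relatives):
    """Final wealth of the CRP that keeps fraction b in asset 1, rebalancing daily."""
    wealth = 1.0
    for x1, x2 in relatives:
        wealth *= b * x1 + (1.0 - b) * x2
    return wealth

grid = [i / 100.0 for i in range(101)]
universal = sum(crp_wealth(b, price_relatives) for b in grid) / len(grid)
best = max(crp_wealth(b, price_relatives) for b in grid)
print(universal, best)  # universal wealth trails the best CRP only polynomially in n
```

Cover's bound for two assets guarantees the universal wealth is at least the best CRP wealth divided by n + 1 over n trading periods.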
An analogue of Cover's universal portfolio algorithm in discrete time is presented and is shown to be universal with respect to the generalized classes. The problem of computing universal portfolios in discrete time is addressed by extending results to continuous time, where we show the existence of an easily computable, continuously updated, universal procedure for linearly parameterized classes of portfolios. Finally, to reconcile ease of computation with applicability, we discretize the above procedure and analyze it in near-continuous time. Given an appropriate schedule of increasingly frequent rebalances, we are able to show that this discretized portfolio remains universal with respect to its continuously traded target class. Furthermore, if $d$ is the dimension of the associated parameter space, we conclude that the proposed portfolio is computable in on the order of $d^2$ steps per iteration. This is an improvement over previous universal procedures, for which calculations grow exponentially with $d$.

20. On-line Learning and the Metrical Task System Problem
Avrim Blum, Carnegie Mellon University

In this talk I will discuss connections between work in the competitive analysis of online algorithms and work in online machine learning. In particular, I will focus on the Metrical Task System problem, studied in the area of online algorithms, and the problem of combining expert advice, studied in computational learning theory. These problems share several interesting similarities, and we will see that algorithms designed for each can be used to achieve good bounds and new approaches for solving the other. I will also talk about some intermediate problems, such as the problem of combining online algorithms online.

21. On-line load balancing and routing
Yossi Azar, Tel-Aviv University

We survey some new and old results on on-line load balancing and virtual circuit routing.
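For the simplest special case of on-line load balancing (identical machines, permanent tasks, scalar loads), the classical greedy rule can be sketched as follows (our illustration; the models surveyed in the talk are far more general):

```python
# Graham's greedy List Scheduling: assign each arriving job to the
# currently least-loaded machine. Greedy is (2 - 1/m)-competitive for
# minimizing the maximum load (makespan) on m identical machines.

def greedy_assign(job_loads, m):
    machines = [0.0] * m
    for load in job_loads:
        i = min(range(m), key=lambda k: machines[k])  # least-loaded machine
        machines[i] += load
    return max(machines)

jobs = [2, 3, 4, 6, 2, 2]   # hypothetical job loads, arriving online
makespan = greedy_assign(jobs, m=3)
print(makespan)  # 8, versus an offline optimum of 7 for this instance
```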
In the load balancing problems there are m parallel machines and a number of independent tasks (jobs); the tasks arrive at arbitrary times, and each task has an associated load vector and duration. A task has to be assigned immediately to one of the machines, thereby increasing the load on this machine by the amount specified by the corresponding coordinate of the load vector, for the duration of the task. The decisions are made on-line, i.e., without knowing the tasks that will arrive in the future. The goal is to minimize the maximum load over the machines (other goals are plausible). Various online load balancing problems arise naturally in many applications involving allocation of resources; in particular, they are useful for on-line virtual circuit routing.

22. Average case and worst case sample complexity of learning distributions
Manfred Opper, Sheffield University

This work analyses the on-line regret of optimal algorithms for the problem of predicting a sequence under cumulative log loss, both in the case in which the data are generated by a distribution in the class and in the case in which there is no assumption about how the sequence is generated. Surprisingly, the gap between the two bounds is very small, even in cases where the metric dimension grows exponentially; there is, however, a breaking point at which the bounds diverge.

23. On prediction of individual sequences relative to a set of experts
N. Cesa-Bianchi, University of Milan, Italy, and Gabor Lugosi, Pompeu Fabra University, Spain

We study a prediction model where the task is to sequentially guess the bits of an arbitrary sequence, making almost as few mistakes as the best predictor (or ``expert'') in a given class. All predictions are allowed to depend on the outcome of biased coin flips.
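The role of the biased coin flips can be sketched with a randomized weighted-majority style predictor (our toy example; the experts and parameters are hypothetical):

```python
import math
import random

def predict_bits(bits, expert_preds, eta=0.5, seed=0):
    """Guess each bit with a coin biased by the experts' current weights."""
    rng = random.Random(seed)
    w = [1.0] * len(expert_preds)
    mistakes = 0
    for t, y in enumerate(bits):
        mass_one = sum(wi for wi, e in zip(w, expert_preds) if e[t] == 1)
        guess = 1 if rng.random() < mass_one / sum(w) else 0  # biased coin flip
        mistakes += int(guess != y)
        # multiplicatively penalize experts that guessed wrong
        w = [wi * math.exp(-eta * int(e[t] != y)) for wi, e in zip(w, expert_preds)]
    return mistakes

bits = [1, 1, 0, 1, 0, 1, 1, 1] * 5
experts = [list(bits),                  # a perfect expert (hypothetical)
           [1 - b for b in bits]]       # an always-wrong expert
excess = predict_bits(bits, experts)    # the perfect expert makes 0 mistakes
print(excess)                           # typically only a few excess mistakes
```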
Applying tools from empirical process theory, we show upper and lower bounds on the minimax regret (number of excess mistakes with respect to the mistakes of the best expert) in terms of the geometry of the class of experts. For expert classes such as Markov experts and autoregressive linear experts, our bounds determine the exact order of magnitude of the minimax regret. We then discuss some recent results in a related model, where each randomized prediction is scored according to the log of the probability assigned to the actual outcome. This has applications to problems in data compression and portfolio selection.



Document last modified on July 7, 1999.