IEEE International Parallel and Distributed Processing Symposium, April 2005
In this paper we explore a new clustering approach for reducing the complexity of wide issue in-order processors based on EPIC architectures. Complexity effectiveness is achieved by heavily clustering the pipeline from decode to commit stage without the need for any direct bypass between clusters. This is made possible by assuming support for executing compiler-constructed traces. One trace is executed at a time by executing its coarse-grained dependency chains (DCs) in different in-order clusters. Since the DCs of a trace are mutually data independent of each other they can be executed in different clusters without any direct communication between them. To execute DCs in narrower clusters without compromising ILP, a compiler algorithm that splits large DCs by duplicating instructions is proposed.
Through cycle accurate simulations we show that a DC processor with one 3-wide, one 2-wide and one 1-wide in-order pipeline, could achieve performance equivalent to a 6-wide in-order superscalar processor. Since a clustered DC microarchitecture is complexity efficient, it is amenable to higher clock frequencies and will also be easier to design and validate than a 6-wide monolithic design.