A. Hossain,∗ D. Pease,∗∗ and A. El Kateeb∗


  1. [1] E. Rotenberg & S. Bennett, A trace cache microarchitecture and evaluation, IEEE Transactions on Computers, 48(2), February 1999. doi:10.1109/12.752652
  2. [2] S.J. Patel, Trace Cache design for wide-issue superscalar processors, Ph.D. dissertation, Department of Computer Science and Engineering, University of Michigan, 1999.
  3. [3] E. Rotenberg, et al., Trace processors, Proc. 30th IEEE/ACM International Symposium on Microarchitecture, 1997.
  4. [4] Q. Jacobson, E. Rotenberg, & J. Smith, Path-based next trace prediction, Proc. 30th International Symposium on Microarchitecture, December 1997.
  5. [5] T. Sato, Evaluating trace cache on moderate-scale processors, IEE Proceedings on Computers and Digital Techniques, 147(6), November 2000, 369–374. doi:10.1049/ip-cdt:20000889
  6. [6] T. Conte, K. Menezes, P. Mills, & B. Patel, Optimization of instruction fetch mechanisms for high issue rates, Proc. 22nd International Symposium on Computer Architecture, June 1995.
  7. [7] J.A. Fisher, Trace scheduling: A technique for global microcode compaction, IEEE Transactions on Computers, C-30(7), July 1981.
  8. [8] R.E. Hank, S.A. Mahlke, R.A. Bringmann, J.C. Gyllenhaal, & W.W. Hwu, Superblock formation using static program analysis, Proc. 26th Annual ACM/IEEE International Symposium on Microarchitecture, 1993.
  9. [9] S.A. Mahlke, D.C. Lin, W.Y. Chen, R.E. Hank et al., Effective compiler support for predicated execution using hyperblocks, Proc. 25th Annual Symposium on Microarchitecture, 1992.
  10. [10] S. Wallace & N. Bagherzadeh, Modeled and measured instruction fetching performance for superscalar microprocessors, IEEE Transactions on Parallel and Distributed Systems, 9(6), June 1998. doi:10.1109/71.689444
  11. [11] A. Seznec, S. Jourdan, P. Sainrat, & P. Michaud, Multiple-block ahead branch predictors, Proc. 7th International Conf. on Architectural Support for Programming Languages and Operating Systems, October 1996.
  12. [12] S. Reches & S. Weiss, Implementation and analysis of path history in dynamic branch prediction schemes, IEEE Transactions on Computers, 47(8), August 1998. doi:10.1109/12.707596
  13. [13] S. McFarling, Combining branch predictors, Technical Report TN-36, Digital Western Research Laboratory, June 1993.
  14. [14] T.-Y. Yeh & Y.N. Patt, Alternative implementations of two-level adaptive branch prediction, 124–134, May 19–21, Gold Coast, Australia, ISCA 1992.
  15. [15] M. Behar, A. Mendelson, & A. Kolodny, Trace Cache sampling filter, 14th International Conference on Parallel Architectures and Compilation Techniques, 2005, PACT 2005, September 2005, 255–266. doi:10.1109/PACT.2005.38
  16. [16] J.S. Hu, N. Vijaykrishnan, A. Kandemir, & A. Irwin, Power-efficient trace caches, Proceedings of the Conference and Exhibition on Design, Automation and Test in Europe, March 2002, 1091. doi:10.1109/DATE.2002.999209
  17. [17] B. Black, B. Rychlik, & J.P. Shen, The block-based trace cache, Proceedings of the 26th International Symposium on Computer Architecture, May 1999, 196–207.
  18. [18] A. Agarwal, Performance tradeoffs in multithreaded processors, IEEE Transactions on Parallel and Distributed Systems, 3(5), September 1992. doi:10.1109/71.159037
  19. [19] D. Thiebaut, On the fractal dimension of computer programs and its application to the prediction of cache miss ratio, IEEE Transactions on Computers, 38(7), July 1989. doi:10.1109/12.30852
  20. [20] J.S. Harper, D.J. Kerbyson, & G.R. Nudd, Analytical modeling of set-associative cache behavior, IEEE Transactions on Computers, 48(10), October 1999.
  21. [21] M. Vachharajani, Microarchitecture modeling for design-space exploration, Ph.D. Dissertation, Princeton University, 2004.
  22. [22] S.J. Eggers, Simultaneous multithreading: A platform for next-generation processors, IEEE Micro, September/October 1997.
  23. [23] J.S. Burns, Parallel on-chip simultaneous multithreading, dissertation, The University of Southern California, Los Angeles, May 2000.
  24. [24] C.-Y. Cher, Exploring and evaluating control-flow and thread-level parallelism, Ph.D. Dissertation, Purdue University, 2004.
  25. [25] M. Chaudhuri, Architectural extensions for executing coherence protocols on multi-threaded processors with integrated memory controllers, Ph.D. Dissertation, Cornell University, 2004.
  26. [26] J.D. Collins, Data prefetching via speculative precomputation on a simultaneous multithreaded processor, Ph.D. Dissertation, University of California, San Diego, 2004.
  27. [27] Multicore ties programmers in KNOTS, EE Times, Monday, October 24, 2005.
  28. [28] L.A. Belady & C.J. Kuehner, Dynamic space-sharing in computer systems, Communications of the ACM, 12(5), May 1969. doi:10.1145/362946.363002
  29. [29] A. Agarwal, J. Hennessy, & M. Horowitz, Cache performance of operating systems and multiprogramming, ACM Transactions on Computer Systems, 6(4), November 1988. doi:10.1145/48012.48037
  30. [30] M. Kobayashi & M.H. MacDougall, The stack growth function: Cache line reference models, IEEE Transactions on Computers, 38(6), June 1989. doi:10.1109/12.24288
  31. [31] P.J. Denning, The working-set model for program behavior, Communications of the ACM, 11(5), May 1968. doi:10.1145/363095.363141
  32. [32] The Standard Performance Evaluation Corporation. Available at:
  33. [33] A. Hossain, Simultaneous multithreading with Trace Cache, Ph.D. dissertation, Syracuse University, May 2002.
  34. [34] E.J. Dudewicz & S.N. Mishra, Modern Mathematical Statistics (New York: Wiley & Sons, Inc., 1988).
  35. [35] D. Kang, Speculation-aware thread scheduling for simultaneous multithreading, Ph.D. Dissertation, University of Southern California, 2004.
