"Under the hood - Why is the K7 better than the P3"

The K7, or more properly the Athlon, is Advanced Micro Devices' (AMD) latest venture into the microprocessor market and yet another attempt at tackling the microprocessor giant that is Intel. AMD made 2 attempts at the lower-end segment of the industry with the K5 and later the K6; neither CPU, however, was capable of going head to head with Intel's offerings, especially in FPU-intensive tasks like games. AMD tried to overcome that weakness by adding SIMD (Single Instruction, Multiple Data) instructions to the K6 line and calling the resulting CPUs K6-2. SIMD itself was not new: MMX, which Intel introduced in its Pentium line, was SIMD. But 3DNow!, as AMD called its SIMD instructions, was the first to support floating-point operations, whereas MMX handled only integers. It was a bold move, and as benchmarks showed, the performance gained by optimizing game code and 3D card drivers for 3DNow! was substantial. Still, the K6-2 was hindered by its weak FPU and never really caught on as a viable alternative to the Pentium 2, and when Intel launched its Celeron line, that really hurt AMD's market share in the low-end segment, where the K6 and K6-2 were selling quite well.

Since then, AMD has decided to stop following Intel's platform designs, as it did with all its previous CPUs, and to design an architecture of its own, hopefully beating Intel on its own turf, so to speak. So, whereas the K5, K6 and K6-2 all used Socket 7 (or Super7 in the K6-2's case), this time around the K7/Athlon uses a variant of Intel's Slot 1 design, dubbed Slot A. Early reports on the Athlon started to appear, the overwhelming majority of them positive, touting the Athlon as a viable alternative to Intel's offerings. That much we expected, but when AMD finally started shipping final versions of the Athlon, everyone was blown away. I'll explain.
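Before getting into the Athlon itself, the SIMD idea mentioned above can be made concrete with a small sketch. This is only a conceptual model in Python, not real 3DNow! code: it assumes a 64-bit register holding two single-precision floats, so that one packed add produces two results at once. The function names (`scalar_add`, `packed_add`, and so on) are made up for illustration; on real hardware the packed add would be a single instruction rather than a Python function call.

```python
import struct


def to_f32(x):
    """Round a Python float to single precision, as a 32-bit lane would hold it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def scalar_add(a, b):
    """One 'instruction', one result: a plain single-precision add."""
    return to_f32(to_f32(a) + to_f32(b))


def packed_add(a_pair, b_pair):
    """Model of one packed add: both lanes are added by a single 'instruction'."""
    if len(a_pair) != 2 or len(b_pair) != 2:
        raise ValueError("a packed register holds exactly two floats in this model")
    return [scalar_add(a_pair[0], b_pair[0]), scalar_add(a_pair[1], b_pair[1])]


def add_arrays_scalar(xs, ys):
    """Add two arrays one element at a time; returns (results, instruction count)."""
    out = [scalar_add(x, y) for x, y in zip(xs, ys)]
    return out, len(out)


def add_arrays_packed(xs, ys):
    """Add two arrays two elements at a time; returns (results, instruction count)."""
    if len(xs) != len(ys) or len(xs) % 2 != 0:
        raise ValueError("this sketch assumes equal, even-length arrays")
    out = []
    count = 0
    for i in range(0, len(xs), 2):
        out.extend(packed_add(xs[i:i + 2], ys[i:i + 2]))
        count += 1  # one packed add handled two elements
    return out, count


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [0.5, 0.5, 0.5, 0.5]
    print(add_arrays_scalar(xs, ys))  # same results, 4 adds
    print(add_arrays_packed(xs, ys))  # same results, 2 adds
```

Both versions give the same answers; the packed one simply needs half as many add "instructions", which is where the speedup in optimized game code and drivers comes from.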
The FPU, once AMD's weak point, is now the Athlon's strong point, finally beating Intel's CPUs. Before I get into that, though, I'll give an overview of the Athlon's features compared to the P3's.

The Athlon is said to support a 200MHz bus, something not seen as achievable with current RAM, since current RAM is only rated up to 133MHz. That may hold true for Intel's CPUs, since their bus must be synchronised with system memory: a P3 with a 100MHz bus always has its memory running at 100MHz, as dictated by the GTL+ bus Intel's CPUs use. AMD, however, chose the Alpha's EV6 bus, which can run asynchronously to system memory. So AMD can have a 200MHz bus between the chipset and the CPU(s) while system memory runs at whatever speed the chipset sets it to, scaling up, of course, as memory speeds increase. I wrote CPU(s) because with the Athlon AMD finally supports SMP (Symmetric MultiProcessing), meaning multiple CPUs in one system. In fact, the EV6 bus gives each CPU its own separate 200MHz link to the chipset, while Intel's CPUs have to share a single 100MHz bus. Obviously AMD has the advantage here.

With the K6-III, AMD had on-die L2 cache running at full clock speed, very similar to Intel's Celeron A range of CPUs. The Athlon, like the P2/P3, has for the moment a 512KB L2 cache running at 1/2 the clock rate of the CPU, with plans for on-die cache and for as much as 8MB(!) of L2 cache running at 1/2, 1/3 or 1/4 the clock rate. On top of that, the Athlon has 4x the L1 cache of the P3: 128KB versus 32KB.

In the FPU department, the P3 has 2 FPUs (Floating Point Units), one of which is fully pipelined while the other is only partially pipelined. The Athlon, on the other hand, not only has fully pipelined FPUs, it has 3 of them! A dramatic increase indeed. AMD pulled out all the stops, extending its 3DNow! SIMD instructions with features like integer operation support and DSP instructions, mainly useful for decoding/encoding sound (e.g. MP3) and video streams, among others, going from 21 3DNow! instructions on the K6-2 to 45 on the Athlon. Of course, Intel's P3 line has similar capabilities in the form of SSE (Streaming SIMD Extensions). Which is better remains to be seen, but it appears that Intel has gained more developer support for SSE than AMD has for 3DNow!.

The combination of the Athlon's large 128KB L1 cache and its 9-issue superscalar design (meaning that it can use multiple execution units to process operations in parallel) results in pipelines that are constantly fed with instructions, almost always busy and not wasting CPU cycles doing nothing. That 9-issue design consists of the 3 FPUs I mentioned above, along with 3 pipelines for integer operations and 3 address units. The P3 has a 12-stage, 2-issue pipeline, which means that each execution unit is split into 12 stages, each stage working on a different instruction as it moves through the pipeline. Again, remember that one FPU is only partially pipelined. Those 12 stages are a real source of latency, so much so that a single multiply takes 2 CPU cycles to execute (of course, the CPU tries to keep the pipelines filled with other instructions most of the time, so as to make use of those 2 cycles that it has to spend anyway). The Athlon is not hindered by such latency problems, as its execution units are not split into stages; being a 9-issue design, it can handle plenty of instructions per clock while still executing a single instruction without the latency hit the P3 takes.

All these architectural improvements show up in real-world benchmarks, e.g. a single Athlon at 600MHz outperforms dual P3 550s in Quake3Test! So, undoubtedly, the Athlon is, clock for clock, faster than the P3.
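The point about latency and keeping pipelines filled can be illustrated with a toy model. This is a rough sketch under simplifying assumptions, not a description of how either CPU really schedules work: every operation is identical, the 2-cycle latency is the multiply figure quoted above, and the 3-wide machine is purely hypothetical. The function name `cycles_needed` and its parameters are made up for this example.

```python
import math


def cycles_needed(n_ops, latency, issue_width, independent=True):
    """Cycles to finish n_ops identical operations on a toy pipelined machine.

    latency      -- cycles from issuing one operation to getting its result
    issue_width  -- how many operations can be started per cycle
    independent  -- if False, each operation needs the previous result,
                    so nothing can overlap (a dependency chain)
    """
    if n_ops == 0:
        return 0
    if not independent:
        # Each operation must wait for the one before it to finish.
        return n_ops * latency
    # Independent operations overlap: a new batch starts every cycle,
    # and the last batch still needs `latency` cycles to complete.
    issue_cycles = math.ceil(n_ops / issue_width)
    return issue_cycles + latency - 1


if __name__ == "__main__":
    print(cycles_needed(1, latency=2, issue_width=1))    # 2
    print(cycles_needed(100, latency=2, issue_width=1))  # 101
    print(cycles_needed(100, latency=2, issue_width=3))  # 35
    print(cycles_needed(100, latency=2, issue_width=1, independent=False))  # 200
```

A lone operation pays the full latency, but 100 independent ones finish in about 1 cycle each because the pipeline stays full, and a wider machine divides that further. When each operation depends on the previous one, the latency is paid every time, which is why feeding the execution units with independent work matters so much.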
Apart from that, though, the question arises whether AMD can keep to its schedule and ship Athlons in bulk. AMD's new fab is being used to pump out 0.18-micron Athlons, while the only Athlons available now are 0.25-micron ones, and those are not in retail, only available to manufacturers. The Athlon's retail release date has already slipped from the mid-July/early August that was promised at first. When we will be able to go buy one at our local computer hardware store is as yet unknown.

Let's hope their fab can keep up with demand. On top of that, let's hope AMD can keep up with Intel's staggering price cuts on the P3 line (40% off the P3 550!!), not to mention win the all-important clock rate war. I mean, how many of you would buy an Athlon 650 when a P3 700 is available? Most people just look at the raw clock rate, even though an Athlon at 650 would just barely outperform a P3 700 (judging by a comparison I saw of an Athlon 600 and a P3 650). I'd go so far as to say that the Athlon would most probably outperform even the awaited Coppermine CPU, a new revision of the P3 with a bigger L1 cache and a 0.18-micron process, among other things. Of course, Intel's next big thing is the Merced, due to be released sometime next year (see EPIC: What's It All About).

Price wars and market share wars only translate to better, faster technology available to us, sooner than expected and cheaper too! :) Good news for us. Long live the competition then :)