Intel says Penryn will deal 45 percent speed boost
And opens up on Nehalem
By Martin Veitch: Thursday 29 March 2007, 01:57

INTEL SAID today that it will deliver a speed boost of up to 45 percent when it delivers the Penryn shrink later this year.

On a conference call, Stephen Smith, Intel’s Digital Enterprise Group director, said that the company’s latest 45nm High-k process technology will help it achieve 20 percent faster transistor switching, clock speeds north of 3GHz and more instructions per clock for Penryn. As per Intel’s recent mantra, all of this will be delivered without breaking the current ceiling on power and thermals.

Penryn uses an enhanced version of the current Core microarchitecture, introducing SSE4, a new generation of Intel’s extensions that used to generate a joke about Screaming Cindy. Other additions include what Intel calls the “Radix-16 Divider” to accelerate mathematical and geometric calculations, faster virtualisation task switching and “deep power down” power management.

At the system level, Penryn will also be boosted by a new front-side bus that runs at 1600MHz rather than the current 1333MHz, and bigger caches -- 6MB for dual-cores and 12MB for quad-cores.

The improvements will feed into uniprocessor, dual-processor and multiprocessor Xeon dual-core and quad-core servers and workstations; a Core 2 dual-core and quad-core desktop, and a quad-core Core 2 Extreme Edition desktop; and a Core 2 dual-core mobile chip.

In terms of performance, Intel projects up to a 45 percent gain on workstation or high-performance computing “bandwidth-intensive applications”, up to 40 percent improved media application performance for tasks such as video encoding, and up to 20 percent faster gaming speeds.

Intel also previewed its next-gen microarchitecture, codenamed Nehalem and due to go into chip production in 2008.
The new news here was that Nehalem will support up to 16 or more threads and eight or more cores and that there will be support for optional integrated graphics. However, Smith said that high-end users will still be using slot-in graphics cards.

One area still up in the air for Nehalem is memory support. Smith said buffered and unbuffered memory types will be supported for different usage needs.
Intel talks specifically about Penryn
More architectural tidbits, but few numbers
By Charlie Demerjian: Thursday 29 March 2007, 01:47

INTEL IS FINALLY talking specifics on Penryn, and it looks to be a nice family of chips. A lot of people think it is just a dumb shrink of Merom, no slouch itself, but it is far more than that.

One of the biggest problems with any chip launch is cutting through the buzzwords and marketspeak. With that in mind, please feel free to ignore things like calling a bump up in cache from 4MB to 6MB "Intel Advanced Smart Cache", their bold, I do. That said, there is a lot of good stuff here.

The 10,000 foot view is this is a shrink of Merom from 65 nanometres to 45 nanometres, but it is heavily massaged on top of that. You can read more about the 45nm process here. Basically, you get about half the die area for the same transistor count, 20 per cent faster switching speeds and vastly lower leakage. These numbers are what is possible, not necessarily what you will see.

The architectural changes are more important. The one you probably have heard of is SSE4, a new instruction set that makes multimedia faster and happier if your DRM infection deems you worthy of viewing your purchase.

The instruction set is all fine and dandy, you can just slap it in microcode if you want to take the easy way out. Intel didn't and added in a bunch of plumbing to avoid bottlenecking the new instructions. One of those is called the Super Shuffle Engine (S3), and it does what it says, shuffles bits around. S3 is very useful for things like data interleaving, and it can do all of this in (mostly) one cycle. The old way could do it, but it took longer and quite possibly involved multiple operations. S3 at one cycle avoids bottlenecking SSE4 and destroying throughput.

S3 is also useful for packing and unpacking, something that you never notice, but everything you use a computer for probably uses.
Penryn has the ability to do most of the S3 ops in one clock, down from the 2-5 of the Merom family. Only one of the ops in the family, extract, takes three clocks, but that is down from the five it used to take.

The other thing that they added in is called the Radix-16 divider. With Merom, the adder and multiplier were revamped, but the divider was more or less the same Radix-4 unit that traced its lineage back to the Pentium, the chip, not the horse. The new one can do 4 bits per cycle instead of 2, more or less doubling the throughput.

It also is engineered to run at much higher clocks, so it will again not throttle the architecture. They also got much more aggressive with the way it works. Penryn will do better at ignoring leading zeros and early exits than Merom, building on many of the advances of the earlier chip. This aggressiveness also carries back to other FP and even Int ops revamped earlier.

That brings us to the first big thing that the shrink gets us, bigger caches. The caches are upped from 4MB to 6MB, and associativity was upped from 16-way to 24-way. In addition the most interesting bit is the advances on loads and stores. The old way of dispatching speculative stores would stall when the data would cross cache lines.

Penryn removes this limit. You can cross cache lines without introducing a long wait, very handy for a lot of multimedia apps that don't use regular accesses to memory. Intel touts motion estimation, basically the core op of video on PCs, as a prime example of this.

Toss in the 1600FSB, and you have a new buzzword, Intel Advanced Smart Cache. That one encompasses the cache size, the associativity and the rest. Combined with the divider, S3 and various other improvements, Intel is expecting a 45% improvement in bandwidth and FP heavy apps.

There are other important bits that raise performance, but not necessarily everywhere. One of the big ones is vastly improved VMEntry and VMExit in VT.
You can read more here, but basically Penryn improves this by a claimed 25-75%.

That brings us to power, or its lack. Intel is officially not changing the TDP specs of Penryn, keeping the current 50/80/120W for quads and 40/65/80W for duals. This is a bit misleading, but in a good way. If you noticed the earlier claims from the 45nm process, you saw both speed and power benefits.

Word on the street is that Intel is fudging the TDP numbers up; they could be pulling the 65W parts down to 45-50W and still not be fibbing. The rest of the TDPs would go down accordingly. Basically, these chips are a lot lighter on real world power use than Intel is claiming.

There are also no special enhancements to quad core power use. The comms protocol for QC power management has not been changed much. The big change is a new C6 sleep state, i.e. deeper than C5. As opposed to C4, it turns the core voltage way down, and turns off the L1 and L2 caches entirely.

This means when you wake up, you have to refill the caches to some degree, greatly increasing wake up time. There is a middle ground, aka magic pixie dust, in the CPU, so when you wake up, you don't have to refill the entire cache, just a few lines. Additionally, Intel built in a lag when going into and out of this mode to prevent thrashing.

So, how fast is it? One thing Intel didn't mention today was the half clock dividers that Penryn is capable of. Instead of 266/333MHz steps, they can now do 133/166/200MHz steps, allowing for a tighter spread of parts.

Intel kept repeating the >3GHz mantra, but with performance numbers let a few things slip. For desktop, they mentioned 3.2GHz, and the demo today was done on a 3.33GHz part. With the performance lead Intel has, it doesn't make sense to increase clocks a lot until AMD puts out Barcelona. Expect modest clock boosts.

That in a nutshell is the Penryn Family. It has a lot of things changed from Merom, and a lot of things that stayed the same.
It is far from a dumb shrink, but nothing near a full redesign, think of it as a massaged Merom with a bunch of extra goodies. µ
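The half clock dividers mentioned above can be worked through numerically. What follows is only a rough sketch, not anything Intel published: it assumes the quad-pumped bus of current Core 2 parts (base clock = FSB rating / 4) and that core clock = base clock x multiplier. The names `BASE_CLOCK_MHZ`, `step_mhz` and `clocks_between` are made up for illustration.

```python
from fractions import Fraction

# Base (un-pumped) bus clocks in MHz for each FSB rating.
# Assumption: quad-pumped bus, so base clock = FSB rating / 4.
BASE_CLOCK_MHZ = {
    1066: Fraction(800, 3),   # ~266.67 MHz
    1333: Fraction(1000, 3),  # ~333.33 MHz
    1600: Fraction(400),      # 400 MHz
}

def step_mhz(fsb, half_multipliers):
    """Smallest core-clock increment available on a given FSB."""
    base = BASE_CLOCK_MHZ[fsb]
    # Half multipliers (e.g. 9.5x) halve the gap between adjacent parts.
    return base / 2 if half_multipliers else base

def clocks_between(fsb, low_mhz, high_mhz, half_multipliers):
    """List core clocks (rounded to whole MHz) reachable in [low_mhz, high_mhz]."""
    step = step_mhz(fsb, half_multipliers)
    clocks = []
    n = 1
    while n * step <= high_mhz:
        if n * step >= low_mhz:
            clocks.append(round(float(n * step)))
        n += 1
    return clocks

if __name__ == "__main__":
    for fsb in sorted(BASE_CLOCK_MHZ):
        whole = float(step_mhz(fsb, False))
        half = float(step_mhz(fsb, True))
        print(f"FSB{fsb}: {whole:.0f}MHz steps, {half:.0f}MHz with half multipliers")
    print(clocks_between(1333, 3000, 3500, half_multipliers=False))
    print(clocks_between(1333, 3000, 3500, half_multipliers=True))
```

Under these assumptions, a 1333 bus offers only two speed grades between 3.0GHz and 3.5GHz with whole multipliers, and four with half multipliers, which is the tighter spread of parts described above. A 3.33GHz part would fit 10 x 333MHz and a 3.2GHz part 8 x 400MHz, though Intel did not state the multipliers.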
Intel Penryn set to despoil pretty streets of AMD Barcelona
Analysis Toreador, matador, picador, knocking on a door
By Nebojsa Novakovic: Thursday 29 March 2007, 08:12

WHEN AMD gets its Mediterranean journo bash going on the hot beaches of Tunisia on 22 April, you'll see many R600 GPUs there.

Too bad I won't since, well, I'm an Asian journo and those supposedly aren't covered. A birdie from the company told me, though, that more than one system on that demo day will run those mighty GPUs on some shiny new CPUs, nicknamed not Tunis, but a place near it - Barcelona.

A good idea to get some extra "brand new CPU" coverage before its actual launch a month later or so? Yes, surely. Will Intel stand still? No way, surely. The brand new naughty chippery from Satan Clara has more than just the process shrink to spoil AMD's game and mess up Barcelona's pretty vista. Not the bloated Microsoft malware, but the actual city view.

Intel's update now specifies more clearly what the Penryn family brings across the whole spectrum (mobile, desktop, server). Yes, the 45 nm high-K process gives speeds well above 3 GHz on desktop and server parts, and close to 3 GHz on mobile. Also, the FSB1600 move is now official - together with better chipsets on both single-FSB and dual-FSB fronts. You know already about X38, the preferred high-end chipset for desktop Penryn - Intel has also confirmed today the launch of a brand new dual-FSB chipset for the Penryn flavour of Xeons this year, which they used to get the claimed 45% improvement in "bandwidth intensive" apps. Besides the FSB1600, it has much lower latency memory as well as 2 x 16 PCI-E capability for parallel graphics or massive I/O - helping it on both 3-D workstation and server platforms.
This is sorely needed to handle Barcelona's memory advantage.

Beyond this, the "Dynamic Acceleration" stuff - in single-threaded apps when other cores are idle, allowing the active core to auto-overclock to use the spare power headroom - is now available across the complete family, including the mobile parts. A potentially deadly issue of massive hot spots on the die when one core overclocks while the others sleep has been supposedly resolved by Intel, according to Steve Smith - so, overclockers need not worry.

The CPU voltages are expected to be just slightly below those of current Core 2 parts, so Intel expects no major issues in support by many current mainboards via BIOS updates. They, in fact, used the existing 975X uni-FSB and Greencreek dual-FSB chipsets for initial Penryn runs (prior to benchmarks, where X38 and 'unnamed new dual-FSB platform' were the ones to depend on).

Frequency-wise, Intel didn't want to say the precise clocks for the first iteration (which, remember, comes soon after Barcelona), but was adamant about going above 3 GHz - including quad core parts.

With these additions, I believe Penryn will be able to match Barcelona clock-for-clock on FP tasks too, and have quite a bit of integer headroom - of course, for end-2007, its clocks will probably also be some 20% above Barcelona entries (3.33 GHz Penryn dual-die QC vs ~ 2.7 GHz Barcelona single-die QC). The faster FSB, with Xeon-graded parts probably able to do over 2 GHz FSB production use on good mobos (water-cooled Asus Striker Extreme?), will help alleviate the memory throughput competition too.

Interestingly, the TDP limits are the same as for current 65 nm products - 130 W for QC desktop "Extreme Edition", 120 W for QC server part, and up to 80W for single-die DC part.
While the official 45 nm talk is of "before year end" I wouldn't be surprised at all if some of these parts arrive quite a bit earlier: Intel was definite about the shipments happening this year from the first two 45 nm fabs, so they must be having some buffer in place there - prior to X'mas season? In the meantime, the new Core 2 Extreme QX6800, right now in my hands awaiting torture, and its sibling, Xeon X3240 - will face the first Barcelona entries.

Oh by the way, I particularly liked these deeply philosophical explanations for the non-techie hacks from Intel's press release - "Thanks to our high-k metal transistor invention, think of 820 million more power efficient light bulbs going on and off at light-speeds.", and, "Imagine a shower with two powerful water shower heads, when one shower head is turned off, the other has increased water pressure" - uuuh, what kind of dimwits are mingling among IT journos these days, that need this kind of explaining?
Nehalem bits bubble up
Some new ones too!
By Charlie Demerjian: Thursday 29 March 2007, 01:49

INTEL IS FINALLY starting to talk about Nehalem, oh happy day. In the talk today about Penryn, Pat Gelsinger followed it up with some bits about Nehalem.

While it is nothing as detailed as this, this, this, or this, it is still a good first step. For the basics, Nehalem isn't the massively new ground up architecture they want you to think it is, but it definitely is a heavily tweaked core. The biggest advance comes in FP capability, so expect big gains there.

The part that will change heavily is the IO, they will bring CSI and IMCs to the table. We told you about the IMC situation earlier, and not much has changed there. The only noteworthy part is that Intel is copying AMD's micro-buffer strategy on 2S.

The CSI component is the interesting part, with the lower end Gainestown having 2 connections and Beckton having 4. All run at 6.4GT/S unless they go for the longer length option, then it will drop speeds to 4.8GT/S. The electricals will be similar to PCIe2, giving less off-chip voltages to deal with, avoiding one of the major scaling headaches AMD ran into.

For memory, the situation is well known, 4S gets FBD2, 2S gets DDR3 with the micro-buffers, and normal parts get plain old DDR3 with external controllers. The more things change, the more, well, we tell you about.

HT makes a comeback with two threads per core, no shock there. The core count itself goes up quite a bit, with Beckton being listed as having four, six and eight core with TDPs ranging from 90W to 130W. This means the current TDP envelopes will stay the same with double the core count and an IMC. No easy feat there.

The biggest new thing they dropped on us was integrated graphics as an option. Since this is not set to be pulled on die till Gesher, I can only take a semi-educated guess that they mean on package graphics, not on die.
The heavy use of 'optional' only reinforces this.

Last up, we get to performance. More FP, no FSB bottlenecks, and an IMC, that does not sound slow. It isn't. We would guesstimate that Thurley will be better than its Penryn predecessor by a tad more than 1.5x and 2x on Int and FP respectively. Stoutland will be closer to 2.5x the performance of Dunnington, the platform advances will pay off here, especially on the FP side.

It is on track for a late 2008 launch, and with Intel executing well, we have no reason to doubt that. Had a lot of this tech come out in 2006 with Whitefield it would have been a monster. Three years later, it is just a big step in the right direction. µ
lalalalal we go paint de tong blueeee we go turn de whole world upside dong bluueeeeeeee
we go paint de tong blueeee
rahhhhhhhhh
i goin an talk to tribe an see if i could get ah section name penryn yes rahhh
jumbie jumbie dem rahhh
mm that blue lookin good boy winny
yu should make more posts in blue
hoss just now folding at home might finish (assuming that competition leads to affordable multi core CPUs)
ACK!!! BLINDING!!!