[BUGS] BUGS Digest, Vol 16, Issue 2

..I'd rather be coding ASM! uridium at deviate.fi
Fri Jan 16 00:05:40 EST 2009


Andrew wrote:
[..snip..]

> Besides being a bit too hot to go into a laptop, I can't think
> of *any* respect in which the G5 (AKA PPC970) is inferior to the
> G4 (AKA Freescale MPC7455).  What angle were you all discussing
> on the day?  Mind you, I like that the G4 works nicely in
> laptops: my G4 power book is still working nicely.

Well, actually the VMX extensions in the PPC970 were a tack-on by IBM
purely to meet Apple's compatibility demands. It's nowhere near as
efficient as any 74xx, even the Max, clock for clock. Rule of thumb: if
your code was tight enough to fit in cache and the vector file was
prefilled, a PPC970 requires about 2.5x the raw clock rate to compete
with a G4, irrespective of single or double precision arguments, which
are moot with both implementations being 128 bits.

I had some code doing pattern chaffing and winnowing from raw sample data
back in my thesis days, and it was merrily running on a 650MHz 7410
"nitro" in a B&W Mac with 4x 256MB RAM. I "blind" upgraded to a 1.8GHz
PPC970 and made the rude discovery that even with the increased cache
sizes and L2 now being on-die, the DDR PPC970 Mac was still being outrun
by the older 7410. This was further confirmed with my laptop, a 12" iBook
with a 1.07GHz apollo7 (7447a). Turns out the Freescale-designed VMX
implementation can theoretically execute two separate vector ops per
clock cycle if your code uses both the VMX-ALU and the permute unit (PU).
The PPC970 cannot: some operations take 4-5 cycles, and you can only do
ALU or PU but not both at the same time. The code was hand-built so that
it would sit neatly aligned in L1 cache, originally on a Max and later on
a nitro.

In addition to the AltiVec issues, the PPC970 is actually a hack of the
POWER4 CPU and inherited a couple of "here free" features. The most
notable of these is that whenever it has a branch misprediction, it
stalls and flushes the entire instruction pipeline, not just the TLB's
back-out buffer, which I think (please correct me if I'm wrong) is only
6 slots at worst. This can leave the poor thing starved for up to 24
cycles or, in the case of the G5 Power Mac (not the iMac), up to 48
cycles. So what you gain in brute integer speed you lose in appalling VMX
performance and in a lot of stally code. There are also some lovely FPU
crimes if you're bored and want to look them up.
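
For a feel of what those flush penalties add up to, a back-of-envelope
sketch in Python follows. The 24 and 48 cycle figures are the "up to"
numbers from above; the branch count and the 5% misprediction rate are
made-up assumptions purely for illustration, and the function name is
hypothetical.

```python
def worst_case_stall_cycles(branches, mispredicts_per_thousand, penalty_cycles):
    """Upper bound on cycles lost to pipeline flushes after mispredicted branches."""
    mispredicted = branches * mispredicts_per_thousand // 1000
    return mispredicted * penalty_cycles


BRANCHES = 1_000_000           # assumed number of branches executed
MISPREDICTS_PER_THOUSAND = 50  # assumed 5% misprediction rate

for label, penalty in (("24-cycle flush", 24), ("48-cycle flush", 48)):
    lost = worst_case_stall_cycles(BRANCHES, MISPREDICTS_PER_THOUSAND, penalty)
    print(label, lost)
```

Under those assumptions the worst case works out to 1.2 million lost
cycles per million branches at a 24-cycle penalty, and double that at 48.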

As far as G4s go, in my opinion they only start to get interesting with
the 7447a/b. The 7448b is pretty much the bad boy of the G4 era and has
some weird tricks and refinements that give it an incredible performance
boost over all other G4s. I can easily notice it comparing a pair of
7448b's @1.8GHz vs a dual 2.0GHz PPC970 vs an Intel X7800 Extreme Ed
encoding the same 350MB DivX to DVD using the same application/version,
which is a universal binary. The Intel is thrown in purely for giggles at
how much worse SSE3 is. The 7448b's are the e600 core. What the 7448b did
lose, though, was the ability to use L3 cache as a place to dump L1/L2
contents for when a thread/process next gets its quantum on the other
CPU. That's faster than dumping the cache and letting it decide what
should be placed in cache, and there's less cache thrashing this way. The
7455 was the same in this regard, not supporting L3; afair the 7450
"voyager" was the last to use it. Still no word on e700.

Dual 2gb G5 Dual core with shared L2 Cache:        27m09s
Dual 1.5gb G4 7448b with non-shared L2 and no L3:   9m31s
Core2Duo X7800 Extreme Ed with unified L2:         49m17s
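
For anyone who wants the ratios rather than raw times, a small Python
sketch that converts the three encode times to seconds and normalises
them against the G4 run. The dictionary keys are shorthand labels of my
own, and the helper name is hypothetical; only the times come from the
list above.

```python
import re


def to_seconds(stamp):
    """Convert a time written like '27m09s' into whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


timings = {
    "G5": "27m09s",
    "G4 7448b": "9m31s",
    "Core2 X7800": "49m17s",
}

baseline = to_seconds(timings["G4 7448b"])
ratios = {name: to_seconds(stamp) / baseline for name, stamp in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the G4 encode time")
```

That puts the G5 at roughly 2.85x and the Intel at roughly 5.18x the G4's
encode time for this one job.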

It's not pretty. It still makes me boggle at what the Intels are actually
5x faster than. Sadly, also, there's G4s and there's G4s. Below your DMD's
the FSB was a common-rail approach, so you had bus traffic down a
logically partitioned bus. When the DDR models of the Macs came out there
was not a great deal of performance increase, other than that there were
two "rails": one for IO and the expansion bus, and the other for memory
and interrupt servicing signals. So essentially they halved the bandwidth
for the DDR G4 Macs. So if you're banging your RAM and FSB, and your
machine is not ticking over on the IO bus, the IO rail sits idle and it
somehow uses a TDM-style division of resources. A great disappointment.

It's a pity Apple dumped the e700 core and Freescale hasn't shown much
hurry since in releasing it. Give me a 74xx over a PPC970 any day.

Al.

-- 
  --
  Al Boyanich
  adb -w -P "world> " -k /dev/meta/galaxy/ksyms /dev/god/brain

