[BUGS] BUGS Digest, Vol 16, Issue 2
..I'd rather be coding ASM!
uridium at deviate.fi
Fri Jan 16 00:05:40 EST 2009
Andrew wrote:
[..snip..]
> Besides being a bit too hot to go into a laptop, I can't think
> of *any* respect in which the G5 (AKA PPC970) is inferior to the
> G4 (AKA Freescale MPC7455). What angle were you all discussing
> on the day? Mind you, I like that the G4 works nicely in
> laptops: my G4 power book is still working nicely.
Well, actually the VMX extensions in the PPC970 were tacked on by IBM
purely because Apple demanded compatibility. It's nowhere near as
efficient as any 74xx, even the Max, clock for clock. As a rule of thumb,
if your code was tight enough to fit in cache and the vector register file
was prefilled, a PPC970 needs about 2.5x the raw clock rate to compete with
a G4, irrespective of single- or double-precision arguments, which are moot
since both implementations are 128 bits wide.
I had some code doing pattern chaffing and winnowing from raw sample data
back in my thesis days, and it was merrily running on a 650 MHz 7410
"Nitro" in a B&W Mac with 4x 256 MB RAM. I "blind" upgraded to a 1.8 GHz
PPC970 and was rudely surprised to find that, even with the larger caches
and the L2 now on-die, the DDR PPC970 Mac was still being outrun by the
older 7410. This was further confirmed by my laptop, a 12" iBook with a
1.07 GHz Apollo 7 (7447a). It turns out the Freescale-designed VMX
implementation can theoretically execute two separate vector ops per clock
cycle if your code uses both the VMX ALU and the permute unit. The PPC970
cannot: some operations take 4-5 cycles, and you can use either the ALU or
the PU but not both at the same time. The code was hand-built to sit
neatly aligned in L1 cache, originally on a Max and later on a Nitro.
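To see why mixing the two units matters, here is a toy issue model. It only encodes the issue rules as described above (one ALU op plus one permute op per cycle on the 74xx, one or the other on the PPC970); it ignores latencies, dependencies and the 4-5 cycle ops, and the function name and op counts are hypothetical, chosen for illustration.

```python
def vector_issue_cycles(alu_ops: int, perm_ops: int, can_dual_issue: bool) -> int:
    """Lower bound on issue cycles for a run of independent vector ops.

    Toy model of the issue rules described in this post, not of either
    real core: a dual-issue core feeds one VMX-ALU op and one permute op
    per cycle; a single-issue core feeds one vector op per cycle to either
    unit. Latencies, dependencies and multi-cycle ops are ignored.
    """
    if can_dual_issue:
        # ALU and permute streams overlap, so the longer stream sets the pace.
        return max(alu_ops, perm_ops)
    # Only one unit can be fed per cycle, so the two streams serialize.
    return alu_ops + perm_ops


# Hypothetical inner loop: 100 ALU ops and 100 permutes, all independent.
g4_cycles = vector_issue_cycles(100, 100, can_dual_issue=True)
g5_cycles = vector_issue_cycles(100, 100, can_dual_issue=False)
print(f"74xx-style: {g4_cycles} cycles, PPC970-style: {g5_cycles} cycles")
```

With a balanced mix the dual-issue rule halves the issue time (100 vs 200 cycles); code that only touches one unit gets no benefit, which is why the hand-built code had to interleave ALU and permute work deliberately.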
In addition to the AltiVec issues, the PPC970 is actually a hack of the
POWER4 CPU and inherited a couple of "here, free" features. The most
notable is that whenever it has a branch misprediction, it stalls and
flushes the entire instruction pipeline, not just the TLB's back-out
buffer, which I think (please correct me if I'm wrong) is only 6 slots at
worst. This can leave the poor thing starved for up to 24 cycles, or, in
the case of the G5 Power Mac (not the iMac), up to 48 cycles. So what you
gain in brute integer speed you lose in appalling VMX performance and in a
lot of stall-prone code. There are also some lovely FPU crimes if you're
bored and want to look them up.
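A rough way to put those stall figures in perspective is the usual back-of-envelope estimate: lost cycles ≈ branches × misprediction rate × flush penalty. This sketch plugs in the 24- and 48-cycle worst-case figures from above; the branch count and 5% miss rate are hypothetical, not measurements.

```python
def mispredict_stall_cycles(branches: int, mispredict_rate: float, penalty: int) -> float:
    """Expected cycles lost to flushes: branches x misprediction rate x penalty."""
    return branches * mispredict_rate * penalty


# Hypothetical workload: one million branches with a 5% misprediction rate.
# Penalties are the worst-case figures quoted above, not measured values.
for label, penalty in [("24-cycle flush", 24), ("48-cycle flush (G5 Power Mac)", 48)]:
    lost = mispredict_stall_cycles(1_000_000, 0.05, penalty)
    print(f"{label}: about {lost:,.0f} cycles lost")
```

The cost scales linearly with the penalty, so the same branchy code loses twice as many cycles on the 48-cycle machine.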
As far as G4s go, in my opinion they only start to get interesting with
the 7447a/b and 7447b's. The 7448b is pretty much the bad boy of the G4
era and has some weird tricks and refinements that give it an incredible
performance boost over all other G4s. I can easily notice it when
comparing a pair of 7448b's @ 1.8 GHz vs a dual 2.0 GHz PPC970 vs an Intel
T7800 Extreme Ed, encoding the same 350 MB DivX file to DVD using the same
application and version, which is a universal binary. The Intel is thrown
in purely for giggles at how much worse SSE3 is. The 7448b is an e600
core. What the 7448b did lose, though, was the ability to dump L1/L2
contents into L3 cache when a thread/process next gets its quantum on the
other CPU; that is faster than flushing the cache and letting it decide
what should be placed in cache, and means less cache thrashing. The 7455
was the same in this regard, not supporting L3; AFAIR the 7450 "Voyager"
was the last to use it. Still no word on e700.
Dual 2 GHz G5 dual-core with shared L2 cache:      27m09s
Dual 1.5 GHz G4 7448b with non-shared L2, no L3:    9m31s
Core 2 Duo X7800 Extreme Ed with unified L2:       49m17s
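Converting the timings above to seconds makes the ratios explicit: the G5 takes roughly 2.85x as long as the dual 7448b, and the Intel roughly 5.18x. A small sketch of the arithmetic (the labels are shortened from the list above):

```python
def to_seconds(t: str) -> int:
    """Convert an encode time written like '27m09s' into seconds."""
    minutes, rest = t.rstrip("s").split("m")
    return int(minutes) * 60 + int(rest)


timings = {
    "Dual G5": "27m09s",
    "Dual G4 7448b": "9m31s",
    "Core 2 X7800": "49m17s",
}
seconds = {name: to_seconds(t) for name, t in timings.items()}
baseline = seconds["Dual G4 7448b"]  # fastest machine as the reference
for name, s in seconds.items():
    print(f"{name}: {s} s, {s / baseline:.2f}x the G4 time")
```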
It's not pretty. It still boggles my mind what the Intels are actually 5x
faster than. Sadly, also, there are G4s and there are G4s. Below your
DMD's the FSB used a common-rail approach, so bus traffic went down a
logically partitioned bus. When the DDR Mac models came out there was not
a great deal of performance increase; the main difference was that there
were now two "rails", one for I/O and the expansion bus and the other for
memory and interrupt-servicing signals. So essentially they halved the
bandwidth for the DDR G4 Macs. If you're hammering your RAM and FSB while
the machine is barely ticking over on the I/O bus, the I/O rail sits idle,
because it somehow uses a TDM-style division of resources. A great
disappointment.
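The complaint is essentially static versus dynamic allocation of a bus. A toy model, assuming an idealized bus of normalized bandwidth 1.0 and a fixed 50/50 time-division split between the two rails (the equal split and the load figures are assumptions for illustration; no real Mac chipset is modeled):

```python
def memory_bandwidth(total_bw: float, io_demand: float, tdm: bool) -> float:
    """Bandwidth left for memory traffic on a bus shared with I/O.

    tdm=True models a fixed time-division split into two equal rails, so
    memory is capped at half the bus even when I/O is idle. tdm=False models
    one shared bus, where memory gets whatever I/O does not use.
    """
    if tdm:
        return total_bw / 2
    return total_bw - min(io_demand, total_bw)


# Normalized bus bandwidth of 1.0; the I/O loads are hypothetical.
for io in (0.0, 0.25, 0.5):
    shared = memory_bandwidth(1.0, io, tdm=False)
    split = memory_bandwidth(1.0, io, tdm=True)
    print(f"I/O load {io:.2f}: shared bus {shared:.2f}, TDM rails {split:.2f}")
```

Only when I/O demand reaches half the bus do the two schemes give memory the same share; below that, the fixed split leaves bandwidth sitting idle.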
It's a pity Apple dumped the e700 core, and Freescale hasn't been in much
of a hurry to release it since. Give me a 74xx over a PPC970 any day.
Al.
--
Al Boyanich
adb -w -P "world> " -k /dev/meta/galaxy/ksyms /dev/god/brain