3  Gekko CPU Overview


3.1  Registers

spr 920    4    r/w    HID2

 31    24 23    16 15     8 7      0

bit(s)  description
2       PSE - Paired-Single mode enabled
1
0       LSQE - Paired-Single load and store (quantization) instructions enabled
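
As a rough illustration of how these two bits might be tested in an emulator or debugger script, the sketch below builds masks for PSE and LSQE and checks a 32-bit HID2 value. It assumes IBM PowerPC bit numbering, where bit 0 is the most significant bit; the table does not state which end bit 0 is, so treat the masks as an assumption. The function names are hypothetical.

```python
# Bit positions as listed in the HID2 table.
HID2_LSQE_BIT = 0
HID2_PSE_BIT = 2


def ibm_bit_mask(bit, width=32):
    """Mask for `bit`, assuming bit 0 is the most significant bit."""
    return 1 << (width - 1 - bit)


def ps_instructions_enabled(hid2):
    """True when both PSE and LSQE are set in a 32-bit HID2 value."""
    required = ibm_bit_mask(HID2_PSE_BIT) | ibm_bit_mask(HID2_LSQE_BIT)
    return (hid2 & required) == required
```

Under that numbering, a HID2 value of 0xA0000000 has both bits set, while 0x80000000 has only LSQE.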
 

3.2  Calling conventions

Parameters are passed in registers: r3 holds the 1st, r4 the 2nd, r5 the 3rd, and so on up to r10 (8th). Further parameters are passed on the stack.
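
The register assignment can be sketched as a small mapping from argument position to location. This is only a model for integer and pointer arguments, and it assumes the eight-register window r3..r10 of the PowerPC EABI; the function name and the "stack[n]" slot labels are hypothetical and do not describe the real stack frame layout.

```python
FIRST_ARG_REG = 3
LAST_ARG_REG = 10  # assumption: eight-register window of the PowerPC EABI
NUM_ARG_REGS = LAST_ARG_REG - FIRST_ARG_REG + 1


def assign_integer_args(count):
    """Return where each of `count` integer arguments would be placed."""
    locations = []
    for index in range(count):
        if index < NUM_ARG_REGS:
            locations.append("r%d" % (FIRST_ARG_REG + index))
        else:
            # Overflow arguments go to the stack; the slot index is illustrative.
            locations.append("stack[%d]" % (index - NUM_ARG_REGS))
    return locations
```

For a three-argument function this yields r3, r4 and r5; only the ninth argument and beyond spill to the stack.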

3.3  PPC Instructions


3.3.1  Integer Instructions

Mnemonic  Opcode  Description
addi  
addis  
add  
addo  
subf  
subfo  
addic  
subfic  
addc  
addco  
subfc  
subfco  
adde  
addeo  
subfe  
subfeo  
addme  
addmeo  
subfme  
subfmeo  
addze  
addzeo  
subfze  
subfzeo  
neg  
nego  
mulli  
mullw  
mullwo  
mulhw  
mulhwu  
divw  
divwo  
divwu  
divwuo  
cmpi  
cmp  
cmpli  
cmpl  
andi  
andis  
ori  
oris  
xori  
xoris  
and  
or  
xor  
nand  
nor  
eqv  
andc  
orc  
extsb  
extsh  
cntlzw  
rlwinm  
rlwnm  
rlwimi  
slw  
srw  
srawi  
sraw  
 

3.3.2  Floating-Point Instructions

Mnemonic  Opcode  Description
fadd  
fadds (*)  
fsub  
fsubs (*)  
fmul  
fmuls (*)  
fdiv  
fdivs (*)  
fres (*)  
frsqrte  
fsel (*)  
fmadd  
fmadds (*)  
fmsub  
fmsubs (*)  
fnmadd  
fnmadds (*)  
fnmsub  
fnmsubs (*)  
frsp (*)  
fctiw  
fctiwz  
fcmpu  
fcmpo  
mffs  
mcrfs  
mtfsfi  
mtfsf  
mtfsb0  
mtfsb1  
fmr (*)  
fneg  
fabs  
fnabs  


(*) - modified for paired singles

3.3.3  Integer Load and Store Instructions

Mnemonic  Opcode  Description
lbz  
lbzx  
lbzu  
lbzux  
lhz  
lhzx  
lhzu  
lhzux  
lha  
lhax  
lhau  
lhaux  
lwz  
lwzx  
lwzu  
lwzux  
stb  
stbx  
stbu  
stbux  
sth  
sthx  
sthu  
sthux  
stw  
stwx  
stwu  
stwux  
lhbrx  
lwbrx  
sthbrx  
stwbrx  
lmw  
stmw  
lswi  
lswx  
stswi  
stswx  

3.3.4  Floating-Point Load and Store Instructions

Mnemonic  Opcode  Description
lfs  
lfsx  
lfsu  
lfsux  
lfd  
lfdx  
lfdu  
lfdux  
stfs  
stfsx  
stfsu  
stfsux  
stfd  
stfdx  
stfdu  
stfdux  
stfiwx  

3.3.5  Branch Instructions

Mnemonic  Opcode  Description
b                 unconditional jump
ba
bl                branch and link
bla  
bc  
bca  
bcl  
bcla  
bclr  
bclrl  
bcctr  
bcctrl  

3.3.6  Condition Register Logical Instructions

Mnemonic  Opcode  Description
crand  
cror  
crxor  
crnand  
crnor  
creqv  
crandc  
crorc  
mcrf  

3.3.7  Misc Instructions

Mnemonic  Opcode  Description
twi  
tw  
sc  
rfi  
mtcrf  
mcrxr  
mfcr  
mtmsr  
mfmsr  
mtspr  
mfspr  
lwarx  
stwcx.  
sync  
mftb  
eieio  
isync  
dcbt  
dcbtst  
dcbz  
dcbz_l  
dcbst  
dcbf  
dcbi  
icbi  
eciwx  
ecowx  
mtsr  
mtsrin  
mfsr  
mfsrin  
tlbie  
tlbsync  

3.4  Additional Gekko Instructions

In paired-single mode the Gekko provides some additional instructions, and modifies some existing ones, that are useful for fast vector and matrix calculations. The idea is analogous to the Streaming SIMD Extensions (SSE) of Intel and other x86 processors. This extension is unique to the Gekko processor and is used to operate on two single-precision numbers ("floats" in C) in one clock cycle.

The floating-point registers (FPRs) of the Gekko are modified in the following way: one half holds the first single-precision number and the other half holds the second. These halves are named "PS0" and "PS1".

The PS instruction set is divided into two parts: Load and Store Quantization instructions and Paired-Single Arithmetic instructions. The Load and Store Quantization instructions are used for fast integer-to-float type casting and some specific memory operations on the PS0 and PS1 parts of an FPR.

If you try to execute any PS instruction without the HID2[PSE] and HID2[LSQE] bits set, an illegal instruction exception is generated.

3.4.1  FPR format in paired-single mode

 63    56 55    48 47    40 39    32
 11111111 11111111 11111111 11111111
 31    24 23    16 15     8 7      0
 00000000 00000000 00000000 00000000

bit(s)  field  description
32-63   1      PS1
0-31    0      PS0
 

3.4.2  Arithmetic Instructions

Mnemonic    Opcode                                  Description
ps_abs      000100 DDDDD 00000 BBBBB 01000 01000 R  absolute value
ps_add      000100 DDDDD AAAAA BBBBB 00000 10101 R  add
ps_cmpo0    000100 DDD00 AAAAA BBBBB 00001 00000 0  compare ordered high
ps_cmpo1    000100 DDD00 AAAAA BBBBB 00011 00000 0  compare ordered low
ps_cmpu0    000100 DDD00 AAAAA BBBBB 00000 00000 0  compare unordered high
ps_cmpu1    000100 DDD00 AAAAA BBBBB 00010 00000 0  compare unordered low
ps_div      000100 DDDDD AAAAA BBBBB 00000 10010 R  divide
ps_merge00  000100 DDDDD AAAAA BBBBB 10000 10000 R  merge high
ps_merge01  000100 DDDDD AAAAA BBBBB 10001 10000 R  merge direct
ps_merge10  000100 DDDDD AAAAA BBBBB 10010 10000 R  merge swapped
ps_merge11  000100 DDDDD AAAAA BBBBB 10011 10000 R  merge low
ps_mr       000100 DDDDD 00000 BBBBB 00010 01000 R  move register
ps_nabs     000100 DDDDD 00000 BBBBB 00100 01000 R  negate absolute value
ps_neg      000100 DDDDD 00000 BBBBB 00001 01000 R  negate
ps_res      000100 DDDDD 00000 BBBBB 00000 11000 R  reciprocal estimate
ps_rsqrte   000100 DDDDD 00000 BBBBB 00000 11010 R  reciprocal square root estimate
ps_sub      000100 DDDDD AAAAA BBBBB 00000 10100 R  subtract
ps_madd     000100 DDDDD AAAAA BBBBB CCCCC 11101 R  multiply and add
ps_madds0   000100 DDDDD AAAAA BBBBB CCCCC 01110 R  multiply and add scalar high
ps_madds1   000100 DDDDD AAAAA BBBBB CCCCC 01111 R  multiply and add scalar low
ps_msub     000100 DDDDD AAAAA BBBBB CCCCC 11100 R  multiply and subtract
ps_mul      000100 DDDDD AAAAA 00000 CCCCC 11001 R  multiply
ps_muls0    000100 DDDDD AAAAA 00000 CCCCC 01100 R  multiply scalar high
ps_muls1    000100 DDDDD AAAAA 00000 CCCCC 01101 R  multiply scalar low
ps_nmadd    000100 DDDDD AAAAA BBBBB CCCCC 11111 R  negative multiply and add
ps_nmsub    000100 DDDDD AAAAA BBBBB CCCCC 11110 R  negative multiply and subtract
ps_sel      000100 DDDDD AAAAA BBBBB CCCCC 10111 R  select
ps_sum0     000100 DDDDD AAAAA BBBBB CCCCC 01010 R  vector sum high
ps_sum1     000100 DDDDD AAAAA BBBBB CCCCC 01011 R  vector sum low


Note: the R opcode field (comparison of the result with zero) is unused (= 0).
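
To show how the bit patterns in the table turn into a 32-bit instruction word, here is a minimal encoder sketch for the rows that end in a 5-bit opcode extension (D, A, B, C fields followed by five bits and R). The function names are hypothetical and the sketch simply packs the fields left to right as the table lists them.

```python
PS_PRIMARY_OPCODE = 0b000100


def encode_ps_a_form(d, a, b, c, xo5, rc=0):
    """Pack 6-bit opcode, four 5-bit register fields, 5-bit extension and R."""
    for field in (d, a, b, c, xo5):
        if not 0 <= field <= 31:
            raise ValueError("5-bit field out of range: %r" % (field,))
    return ((PS_PRIMARY_OPCODE << 26) | (d << 21) | (a << 16) |
            (b << 11) | (c << 6) | (xo5 << 1) | (rc & 1))


def encode_ps_add(d, a, b):
    # ps_add: the C field is 00000 and the extension is 10101.
    return encode_ps_a_form(d, a, b, 0, 0b10101)


def encode_ps_madd(d, a, b, c):
    # ps_madd: all four register fields are used, extension 11101.
    return encode_ps_a_form(d, a, b, c, 0b11101)
```

With D=1, A=2, B=3 the ps_add word comes out as 0x1022182A.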
3.4.2.1   PS_ABS  
absolute value
Clear bit 0 of PS0[B] and copy result to PS0[D]
Clear bit 0 of PS1[B] and copy result to PS1[D]
3.4.2.2   PS_ADD  
add
PS0[D] = PS0[A] + PS0[B]
PS1[D] = PS1[A] + PS1[B]
3.4.2.3   PS_CMPO0  
compare ordered high
"c" holds the result of the comparison
If (PS0[A] is NaN or PS0[B] is NaN) then c = 0001b
Else if (PS0[A] < PS0[B]) then c = 1000b
Else if (PS0[A] > PS0[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
3.4.2.4   PS_CMPO1  
compare ordered low
"c" holds the result of the comparison
If (PS1[A] is NaN or PS1[B] is NaN) then c = 0001b
Else if (PS1[A] < PS1[B]) then c = 1000b
Else if (PS1[A] > PS1[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
3.4.2.5   PS_CMPU0  
compare unordered high
"c" holds the result of the comparison
If (PS0[A] is NaN or PS0[B] is NaN) then c = 0001b
Else if (PS0[A] < PS0[B]) then c = 1000b
Else if (PS0[A] > PS0[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
3.4.2.6   PS_CMPU1  
compare unordered low
"c" holds the result of the comparison
If (PS1[A] is NaN or PS1[B] is NaN) then c = 0001b
Else if (PS1[A] < PS1[B]) then c = 1000b
Else if (PS1[A] > PS1[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
These four compare instructions look the same here because the FPSCR
side effects that distinguish them have been omitted.
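
The compare rules above can be modelled in a few lines. In this sketch a paired-single register is a (ps0, ps1) tuple and the condition register is a list of eight 4-bit fields; both representations, and the function names, are assumptions made for illustration. As in the descriptions, FPSCR effects are left out, so one helper serves both the ordered and unordered variants.

```python
import math

LT, GT, EQ, UN = 0b1000, 0b0100, 0b0010, 0b0001


def compare_singles(x, y):
    """Return the 4-bit result c for one pair of operands."""
    if math.isnan(x) or math.isnan(y):
        return UN
    if x < y:
        return LT
    if x > y:
        return GT
    return EQ


def ps_cmp0(cr, crf_d, a, b):
    """Model of ps_cmpo0 / ps_cmpu0: compare the PS0 halves."""
    cr[crf_d] = compare_singles(a[0], b[0])


def ps_cmp1(cr, crf_d, a, b):
    """Model of ps_cmpo1 / ps_cmpu1: compare the PS1 halves."""
    cr[crf_d] = compare_singles(a[1], b[1])
```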
3.4.2.7   PS_DIV  
divide
PS0[D] = PS0[A] / PS0[B]
PS1[D] = PS1[A] / PS1[B]
3.4.2.8   PS_MERGE00  
merge high
PS0[D] = PS0[A]
PS1[D] = PS0[B]
3.4.2.9   PS_MERGE01  
merge direct
PS0[D] = PS0[A]
PS1[D] = PS1[B]
3.4.2.10   PS_MERGE10  
merge swapped
PS0[D] = PS1[A]
PS1[D] = PS0[B]
3.4.2.11   PS_MERGE11  
merge low
PS0[D] = PS1[A]
PS1[D] = PS1[B]
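
The four merge variants differ only in which half of A and which half of B they pick. A minimal model, again treating a register as a hypothetical (ps0, ps1) tuple:

```python
def ps_merge00(a, b):
    return (a[0], b[0])


def ps_merge01(a, b):
    return (a[0], b[1])


def ps_merge10(a, b):
    return (a[1], b[0])


def ps_merge11(a, b):
    return (a[1], b[1])
```

One consequence of these definitions is that ps_merge10 with the same register for A and B swaps the two halves, and ps_merge00 with the same register broadcasts PS0 into both halves.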
3.4.2.12   PS_MR  
move register
PS0[D] = PS0[B]
PS1[D] = PS1[B]
3.4.2.13   PS_NABS  
negate absolute value
Set bit 0 of PS0[B] and copy result to PS0[D]
Set bit 0 of PS1[B] and copy result to PS1[D]
3.4.2.14   PS_NEG  
negate
Invert bit 0 of PS0[B] and copy result to PS0[D]
Invert bit 0 of PS1[B] and copy result to PS1[D]
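
PS_ABS, PS_NABS and PS_NEG only touch "bit 0", which is the sign bit when bit 0 is read as the most significant bit of the 32-bit single (an assumption about the numbering). The sketch below models them on raw IEEE-754 bit patterns; the helper names and the (ps0, ps1) tuple representation are hypothetical.

```python
import struct

SIGN_BIT = 0x80000000


def f32_to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_f32(bits):
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def ps_abs(b):
    """Clear the sign bit of both halves."""
    return tuple(bits_to_f32(f32_to_bits(x) & 0x7FFFFFFF) for x in b)


def ps_nabs(b):
    """Set the sign bit of both halves."""
    return tuple(bits_to_f32(f32_to_bits(x) | SIGN_BIT) for x in b)


def ps_neg(b):
    """Invert the sign bit of both halves."""
    return tuple(bits_to_f32(f32_to_bits(x) ^ SIGN_BIT) for x in b)
```

Working on the bit pattern rather than calling an arithmetic negate keeps the behaviour well defined for zeros and NaNs, which matches the bit-level wording of the descriptions.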
3.4.2.15   PS_RES  
reciprocal estimate
PS0[D] = 1 / PS0[B]
PS1[D] = 1 / PS1[B]
3.4.2.16   PS_RSQRTE  
reciprocal square root estimate
PS0[D] = 1 / SQRT(PS0[B])
PS1[D] = 1 / SQRT(PS1[B])
3.4.2.17   PS_SUB  
subtract
PS0[D] = PS0[A] - PS0[B]
PS1[D] = PS1[A] - PS1[B]
3.4.2.18   PS_MADD  
multiply-add
PS0[D] = PS0[A] * PS0[C] + PS0[B]
PS1[D] = PS1[A] * PS1[C] + PS1[B]
3.4.2.19   PS_MADDS0  
multiply-add scalar high
PS0[D] = PS0[A] * PS0[C] + PS0[B]
PS1[D] = PS1[A] * PS0[C] + PS1[B]
3.4.2.20   PS_MADDS1  
multiply-add scalar low
PS0[D] = PS0[A] * PS1[C] + PS0[B]
PS1[D] = PS1[A] * PS1[C] + PS1[B]
3.4.2.21   PS_MSUB  
multiply-subtract
PS0[D] = PS0[A] * PS0[C] - PS0[B]
PS1[D] = PS1[A] * PS1[C] - PS1[B]
3.4.2.22   PS_MUL  
multiply
PS0[D] = PS0[A] * PS0[C]
PS1[D] = PS1[A] * PS1[C]
3.4.2.23   PS_MULS0  
multiply scalar high
PS0[D] = PS0[A] * PS0[C]
PS1[D] = PS1[A] * PS0[C]
3.4.2.24   PS_MULS1  
multiply scalar low
PS0[D] = PS0[A] * PS1[C]
PS1[D] = PS1[A] * PS1[C]
3.4.2.25   PS_NMADD  
negative multiply-add
PS0[D] = - (PS0[A] * PS0[C] + PS0[B])
PS1[D] = - (PS1[A] * PS1[C] + PS1[B])
3.4.2.26   PS_NMSUB  
negative multiply-subtract
PS0[D] = - (PS0[A] * PS0[C] - PS0[B])
PS1[D] = - (PS1[A] * PS1[C] - PS1[B])
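
The multiply-add family can be modelled directly from the formulas above. This sketch uses (ps0, ps1) tuples and Python floats, so it does not reproduce single-precision rounding; the parameter order (a, b, c) follows the field names in the formulas, not necessarily the assembler operand order, and the function names are hypothetical.

```python
def ps_madd(a, b, c):
    return (a[0] * c[0] + b[0], a[1] * c[1] + b[1])


def ps_madds0(a, b, c):
    # Both halves are scaled by the PS0 half of C.
    return (a[0] * c[0] + b[0], a[1] * c[0] + b[1])


def ps_madds1(a, b, c):
    # Both halves are scaled by the PS1 half of C.
    return (a[0] * c[1] + b[0], a[1] * c[1] + b[1])


def ps_msub(a, b, c):
    return (a[0] * c[0] - b[0], a[1] * c[1] - b[1])


def ps_nmadd(a, b, c):
    r = ps_madd(a, b, c)
    return (-r[0], -r[1])


def ps_nmsub(a, b, c):
    r = ps_msub(a, b, c)
    return (-r[0], -r[1])
```

The scalar variants are what make matrix-by-vector products convenient: one half of C acts as a scalar applied to both halves of A.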
3.4.2.27   PS_SEL  
select
If (PS0[A] >= 0) then PS0[D] = PS0[C] else PS0[D] = PS0[B]
If (PS1[A] >= 0) then PS1[D] = PS1[C] else PS1[D] = PS1[B]
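
As one possible use of PS_SEL, the sketch below models the instruction and builds a branch-free element-wise maximum from a subtract followed by a select. The tuple representation and the names ps_sub_model and ps_max are hypothetical, and NaN handling is simply whatever the ">= 0" test gives in Python (a NaN in A selects B).

```python
def ps_sel(a, b, c):
    """Per half: take C when A >= 0, otherwise take B."""
    return (c[0] if a[0] >= 0 else b[0],
            c[1] if a[1] >= 0 else b[1])


def ps_sub_model(a, b):
    return (a[0] - b[0], a[1] - b[1])


def ps_max(x, y):
    """Element-wise maximum: x - y >= 0 selects x, otherwise y."""
    return ps_sel(ps_sub_model(x, y), y, x)
```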
3.4.2.28   PS_SUM0  
vector sum high
PS0[D] = PS0[A] + PS1[B]
PS1[D] = PS1[C]
3.4.2.29   PS_SUM1  
vector sum low
PS0[D] = PS0[C]
PS1[D] = PS0[A] + PS1[B]
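
The vector sum instructions are typically paired with a multiply to reduce the two halves into one value. The sketch below models PS_SUM0 and PS_SUM1 and shows a two-element dot product as a multiply followed by a sum; the ps_mul model (element-wise product), the tuple representation and the dot2 helper are assumptions for illustration.

```python
def ps_mul(a, c):
    return (a[0] * c[0], a[1] * c[1])


def ps_sum0(a, b, c):
    # The sum lands in PS0; PS1 is passed through from C.
    return (a[0] + b[1], c[1])


def ps_sum1(a, b, c):
    # The sum lands in PS1; PS0 is passed through from C.
    return (c[0], a[0] + b[1])


def dot2(x, y):
    """x0*y0 + x1*y1, using one multiply and one vector sum."""
    p = ps_mul(x, y)
    return ps_sum0(p, p, p)[0]
```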

3.4.3  Load and Store Instructions

Mnemonic  Opcode                                 Description
psq_lx    000100 DDDDD AAAAA BBBBB WIII 000110 0  Paired Singles Quantized Load indexed
psq_lux   000100 DDDDD AAAAA BBBBB WIII 100110 0  Paired Singles Quantized Load with Update indexed
psq_stx   000100 SSSSS AAAAA BBBBB WIII 000111 0  Paired Singles Quantized Store indexed
psq_stux  000100 SSSSS AAAAA BBBBB WIII 100111 0  Paired Singles Quantized Store with Update indexed


Mnemonic  Opcode                                 Description
psq_l     111000 DDDDD AAAAA WIII dddddddddddd    Paired Singles Quantized Load
psq_lu    111001 DDDDD AAAAA WIII dddddddddddd    Paired Singles Quantized Load with Update
psq_st    111100 SSSSS AAAAA WIII dddddddddddd    Paired Singles Quantized Store
psq_stu   111101 SSSSS AAAAA WIII dddddddddddd    Paired Singles Quantized Store with Update
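
The non-indexed forms pack a 6-bit opcode, two 5-bit register fields, the 1-bit W field, the 3-bit I field and a 12-bit d field. A minimal encoder sketch follows; the function name is hypothetical, and the d field is taken as an already-encoded 12-bit value because the table does not say how a displacement maps onto it.

```python
PSQ_L, PSQ_LU, PSQ_ST, PSQ_STU = 0b111000, 0b111001, 0b111100, 0b111101


def encode_psq_d_form(primary, reg, a, w, i, d12):
    """Pack opcode, D/S register, A register, W, I and the raw 12-bit d field."""
    if not 0 <= reg <= 31 or not 0 <= a <= 31:
        raise ValueError("register field out of range")
    if w not in (0, 1) or not 0 <= i <= 7:
        raise ValueError("W must be 0..1 and I must be 0..7")
    if not 0 <= d12 <= 0xFFF:
        raise ValueError("d field must fit in 12 bits")
    return ((primary << 26) | (reg << 21) | (a << 16) |
            (w << 15) | (i << 12) | d12)
```

For example, psq_l with D=1, A=3, W=0, I=0 and a d field of 8 encodes to 0xE0230008.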

3.4.3.1   psq_lx  
Paired Singles Quantized Load indexed
3.4.3.2   psq_lux  
Paired Singles Quantized Load with Update indexed
3.4.3.3   psq_stx  
Paired Singles Quantized Store indexed
3.4.3.4   psq_stux  
Paired Singles Quantized Store with Update indexed
3.4.3.5   psq_l  
Paired Singles Quantized Load
3.4.3.6   psq_lu  
Paired Singles Quantized Load with Update
3.4.3.7   psq_st  
Paired Singles Quantized Store
3.4.3.8   psq_stu  
Paired Singles Quantized Store with Update

3.4.4  Modified Floating-Point Instructions

In paired single mode (HID2[PSE] = 1), all the double-precision floating point instructions are still valid, and execute as in non-paired single mode. All single-precision floating-point instructions (fadds, fsubs, fmuls, fdivs, fmadds, fmsubs, fnmadds, fnmsubs, fres, frsp) switch their meaning and operate on the ps0 operand.

Mnemonic  Opcode  Description
fadds  
fsubs  
fmuls  
fdivs  
fmadds  
fmsubs  
fnmadds  
fnmsubs  
fres  
frsp  
fsel  
fmr  

3.4.4.1   fadds  
3.4.4.2   fsubs  
3.4.4.3   fmuls  
3.4.4.4   fdivs  
3.4.4.5   fmadds  
3.4.4.6   fmsubs  
3.4.4.7   fnmadds  
3.4.4.8   fnmsubs  
3.4.4.9   fres  
3.4.4.10   frsp  
3.4.4.11   fsel  
3.4.4.12   fmr  

3.5  Programming Tips and additional information


3.5.1  Machine State Register

to do

3.5.2  Caches

to do

3.5.3  Branch Unit

To flush the branch unit's dynamic prediction logic, execute three branches in sequence:

        .... 
        b label1 
label1: b label2 
label2: b label3 
label3: 
        .... 
 