
Div_Sub:
;E = DE/C (truncated), assuming DE < C*256 and C > 127
    ld a,d
    sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
    sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
    sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
    sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
    sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
    sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
    sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
    sla e \ adc a,a \ jr c,$+5 \ ret p \ cp c \ ret c \ inc e \ ret
FloatDiv_80:
; 1 bit sign + 15 bits signed exponent (16384 is exp = 0) (little endian)
; 64-bit mantissa (big endian)
;Inputs:
; HL points to dividend
; DE points to divisor
    ex de,hl
    call LoadFPOPs          ;fpOP1 = divisor, fpOP2 = dividend
    ld hl,(fpOP2)           ;dividend's exponent
    ld de,(fpOP1)           ;divisor's exponent
    ld a,h
    xor d                   ;result sign = sign(dividend) xor sign(divisor); also clears carry
    push af
    res 7,d
    res 7,h
    sbc hl,de               ;dividend exponent - divisor exponent (the biases cancel)
    ld bc,16384
    add hl,bc               ;re-apply the bias
    pop af
    and $80
    or h
    ld h,a
    ld (fpOP3),hl
;Now perform the division of fpOP2/fpOP1
;The algo works like this:
; Take the first byte of fpOP2, compare against that of fpOP1.
; If it is bigger, since fpOP1 should have bit 7 set (normalized numbers),
; it divides at most once. So the first digit is 1; subtract fpOP2-fpOP1->fpOP2.
; After this, we repeatedly compare the upper two bytes of fpOP2 to the first byte
; of fpOP1. This is to estimate how many times fpOP1 goes into fpOP2.
; This is just a guesstimate, but each digit is an overestimate by at most 1!
;
; Example with smaller numbers. Take 8AD176/AC0980
; First digit = 0 ('digits' are 8-bit ints, so on [0,255])
; Now 8AD1/AC = CE, so 8AD176.00 - AC0980*0.CE = 8AD176-8A6FA5 = 61D1
; Now 61D1/AC = 91, so 61D1.0000 - AC0980*.0091 = 61D1.0-6171.6180 = 5F.9E80
; Now 5F9E/AC = 8E, so 5F.9E80 - AC0980*.00008E = 5F.9E8000-5F.6D4500 = .313B
; In this case, there were no overestimates. We would have known if the subtraction
; step yielded a negative output. To adjust for this, decrement the new digit by 1
; and add AC0980 back to the remainder.
; So the example gives 8AD176/AC0980 = 0.CE918E, or in base 10, 9097590/11274624=.806908488274
;fpOP1+2 has denom
;fpOP2+2 has num
;Move the dividend's mantissa (plus its 4 zero pad bytes) down to numer
    ld de,fpOP2-2
    ld hl,fpOP2+2
    ldi \ ldi \ ldi
    ldi \ ldi \ ldi
    ldi \ ldi \ ldi
    ldi \ ldi \ ldi
denom = fpOP1+2
numer = fpOP2-2
outp = numer-1
    ld hl,denom
    ld de,numer
    call cp_64b             ;carry set if numer < denom
    ld hl,numer-1
    ld (hl),0               ;integer digit of the quotient
    jr c,noadjust
    inc (hl)                ;numer >= denom, so the integer digit is 1
    ex de,hl
    inc de
    ld hl,denom
    call sub_64b            ;numer -= denom
    ex de,hl \ dec hl
noadjust:
    inc hl                  ;HL = numer
    ld de,numer+8
    call div_sub_1          ;one call per quotient byte
    call div_sub_1
    call div_sub_1
    call div_sub_1
    call div_sub_1
    call div_sub_1
    call div_sub_1
    call div_sub_1
    ld de,fpOP3+2           ;mantissa of the result
    ld hl,outp              ;integer digit, followed by the 8 quotient digits
    ld a,(hl)
    rra                     ;carry = integer digit
    jr nc,directcopy
;Quotient is in [1,2): shift it right one bit into the result mantissa
    inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
    inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
    inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
    inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
    inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
    inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
    inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
    inc hl \ ld a,(hl) \ rra \ ld (de),a \ ret
directcopy:
;Quotient is in (1/2,1): copy the digits as-is and decrement the exponent
    inc hl
    ldi \ ldi \ ldi \ ldi
    ldi \ ldi \ ldi \ ldi
    ld hl,(fpOP3) \ dec hl \ ld (fpOP3),hl \ ret
div_sub_1:
;HL -> top byte of the current remainder window, DE -> its last byte
    ld bc,(denom)           ;C = top byte of the divisor
    ld a,(hl)
    inc hl
    push hl
    ld l,(hl)
    ld h,a                  ;HL = top two bytes of the remainder
    ex de,hl
    call Div_Sub            ;E = estimated quotient digit
    ld c,e
    ex de,hl
    call fused_mul_sub
    ld hl,9
    add hl,de
    ex de,hl                ;DE -> last byte of the next window
    pop hl                  ;HL -> top byte of the next window
    ret
fused_mul_sub:
;multiply denominator*C and subtract from the numerator (C = quotient digit)
;Each block forms one byte product denom[i]*C (i = 7 down to 0) by shift-and-add,
;adds the carry from the previous block, and subtracts it from the window.
;The window's top byte is then overwritten with the quotient digit.
    xor a
    ld hl,(denom+6) \ ld b,a \ ld l,b
    sla h \ jr nc,$+3 \ ld l,c
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    ld a,(de) \ sub l \ ld (de),a \ dec de
    ld a,h \ adc a,b

    ld hl,(denom+5) \ ld l,b
    sla h \ jr nc,$+3 \ ld l,c
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add a,l \ jr nc,$+3 \ inc h \ ld l,a
    ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

    ld hl,(denom+4) \ ld l,b
    sla h \ jr nc,$+3 \ ld l,c
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add a,l \ jr nc,$+3 \ inc h \ ld l,a
    ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

    ld hl,(denom+3) \ ld l,b
    sla h \ jr nc,$+3 \ ld l,c
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add a,l \ jr nc,$+3 \ inc h \ ld l,a
    ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

    ld hl,(denom+2) \ ld l,b
    sla h \ jr nc,$+3 \ ld l,c
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add a,l \ jr nc,$+3 \ inc h \ ld l,a
    ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

    ld hl,(denom+1) \ ld l,b
    sla h \ jr nc,$+3 \ ld l,c
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add a,l \ jr nc,$+3 \ inc h \ ld l,a
    ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

    ld hl,(denom) \ ld l,b
    sla h \ jr nc,$+3 \ ld l,c
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add a,l \ jr nc,$+3 \ inc h \ ld l,a
    ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

    ld hl,(denom-1) \ ld l,b
    sla h \ jr nc,$+3 \ ld l,c
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add a,l \ jr nc,$+3 \ inc h \ ld l,a
    ld a,(de) \ sub l \ ld (de),a \ ld a,h \ dec de
    ld l,a
    ld a,(de)
    sbc a,l
;if c flag is set, overestimate
    ld a,c \ ld (de),a
    ret nc
;Add the denominator back and decrement the digit
    ld hl,8
    add hl,de
    ex de,hl
    ld hl,denom+7
    ld a,(de) \ add a,(hl) \ ld (de),a \ dec hl \ dec de
    ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
    ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
    ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
    ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
    ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
    ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
    ld a,(de) \ adc a,(hl) \ ld (de),a \ dec de
    ex de,hl \ dec (hl) \ ex de,hl
    ret
sub_64b:
;(de) = (de)-(hl), big-endian 64-bit ints; works from byte 7 (the LSB) up
    ld bc,7
    add hl,bc
    ex de,hl
    add hl,bc
    ex de,hl
    ld a,(de) \ sub (hl) \ ld (de),a \ dec de \ dec hl
    ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
    ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
    ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
    ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
    ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
    ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
    ld a,(de) \ sbc a,(hl) \ ld (de),a \ ret
cp_64b:
;compares (de) to (hl), big-endian 64-bit ints; carry set if (de) < (hl)
    ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
    ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
    ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
    ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
    ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
    ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
    ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
    ld a,(de) \ cp (hl) \ ret
LoadFPOPs:
;HL points to the first, DE points to the second
;Each 10-byte float is copied and padded with 4 zero bytes
    push de
    ld de,fpOP1
    xor a
    ldi \ ldi \ ldi \ ldi \ ldi
    ldi \ ldi \ ldi \ ldi \ ldi
    ld (de),a \ inc de
    ld (de),a \ inc de
    ld (de),a \ inc de
    ld (de),a \ inc de
    pop hl
    ldi \ ldi \ ldi \ ldi \ ldi
    ldi \ ldi \ ldi \ ldi \ ldi
    ld (de),a \ inc de
    ld (de),a \ inc de
    ld (de),a \ inc de
    ld (de),a \ inc de
    ret
.echo "Size:",$-Div_Sub
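To make the digit-by-digit scheme in the comments easier to check, here is a small Python model of it. It is a sketch, not a translation of the routine: `div_sub` and `divide_mantissas` are hypothetical names, and Python integers stand in for the byte buffers. `div_sub` mirrors Div_Sub's shift-and-subtract loop; `divide_mantissas` estimates each base-256 digit from the top two remainder bytes and the divisor's top byte, multiplies and subtracts, and adds the divisor back while the remainder is negative. Two details follow Knuth's Algorithm D (TAOCP Vol. 2, §4.3.1) rather than the assembly: the estimate is clamped to FF when the top two remainder bytes reach C*256 (outside Div_Sub's stated precondition), and the add-back is a loop, because with a normalized divisor Knuth's analysis only bounds the estimate to at most 2 too large. For instance, for 7F8000/80FFFF the first fractional digit is estimated as 7F80/80 = FF, while the true digit is FD.

```python
def div_sub(de: int, c: int) -> int:
    """Restoring 16-by-8-bit division, one quotient bit per step (like Div_Sub).

    Requires c > 127 and de < c * 256, so the quotient fits in 8 bits.
    """
    assert 127 < c < 256 and 0 <= de < c * 256
    a, e = de >> 8, de & 0xFF      # A = high byte, E = low byte
    q = 0
    for _ in range(8):
        a = (a << 1) | (e >> 7)    # shift the next dividend bit into A (up to 9 bits)
        e = (e << 1) & 0xFF
        q <<= 1
        if a >= c:                 # subtract the divisor whenever it fits
            a -= c
            q |= 1
    return q


def divide_mantissas(num: list[int], den: list[int]) -> tuple[int, list[int], int]:
    """Base-256 long division of two equal-length big-endian byte lists.

    Returns (integer digit, fraction digits, final remainder). den[0] must be
    >= 0x80 (normalized); one fraction digit is produced per byte of den.
    """
    n = len(den)
    assert len(num) == n and den[0] >= 0x80
    d = int.from_bytes(bytes(den), "big")
    r = int.from_bytes(bytes(num), "big")
    first = 1 if r >= d else 0     # normalized divisor: it fits at most once
    r -= first * d
    digits = []
    for _ in range(n):
        r <<= 8                    # bring down the next (zero) byte
        top2 = r >> (8 * (n - 1))  # top two bytes of the (n+1)-byte remainder
        q = div_sub(top2, den[0]) if top2 < den[0] * 256 else 0xFF
        r -= q * d                 # the fused multiply-subtract step
        while r < 0:               # overestimate: step the digit back
            q -= 1
            r += d
        digits.append(q)
    return first, digits, r


first, digits, rem = divide_mantissas([0x8A, 0xD1, 0x76], [0xAC, 0x09, 0x80])
print(first, [f"{x:02X}" for x in digits], f"{rem:06X}")  # 0 ['CE', '91', '8E'] 313B00
```

The printed digits reproduce the worked example (0.CE918E with remainder .313B).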

    ld hl,float_e
    ld de,float_pi
    jp FloatDiv_80          ;e/pi = 0.dd816a76547ca9910802972996d4e3
float_pi:
    .dw 16384+1 \ .db $c9,$0f,$da,$a2,$21,$68,$c2,$34   ;pi, not rounded up
float_e:
    .dw 16384+1 \ .db $ad,$f8,$54,$58,$a2,$bb,$4a,$9A   ;e, not rounded up
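The two constants, and the exponent and normalization steps of FloatDiv_80, can be checked with a few lines of Python. This is a sketch under stated assumptions: it reads the top mantissa bit as the units bit (value = mantissa/2^63 * 2^(exponent - 16384)), which is what makes the 16384+1 exponents of float_pi and float_e come out to pi and e; `decode` and `float80_div` are hypothetical names; and the quotient comes from a single Python integer division rather than the byte-digit loop. The result exponent is the dividend's minus the divisor's, re-biased; a quotient in [1,2) is shifted right one bit, otherwise the exponent is decremented, mirroring the rra and directcopy branches.

```python
from fractions import Fraction

BIAS = 16384  # "16384 is exp = 0" in the FloatDiv_80 header comment


def decode(exp_word: int, mant: int) -> Fraction:
    """Exact value of a Float80 (assumption: top mantissa bit is the units bit)."""
    sign = -1 if exp_word & 0x8000 else 1
    return sign * Fraction(mant, 1 << 63) * Fraction(2) ** ((exp_word & 0x7FFF) - BIAS)


def float80_div(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """a / b for (exp_word, 64-bit mantissa) pairs, truncating the quotient."""
    (ea, ma), (eb, mb) = a, b
    sign = (ea ^ eb) & 0x8000
    exp = (ea & 0x7FFF) - (eb & 0x7FFF) + BIAS   # dividend minus divisor, re-biased
    q = (ma << 64) // mb        # integer bit + 64 fraction bits of ma/mb
    if q >> 64:                 # quotient in [1, 2): shift right one bit (rra path)
        mant = q >> 1
    else:                       # quotient in (1/2, 1): keep the bytes, exponent - 1
        mant = q
        exp -= 1
    return sign | exp, mant


FLOAT_PI = (BIAS + 1, 0xC90FDAA22168C234)
FLOAT_E = (BIAS + 1, 0xADF85458A2BB4A9A)

exp_word, mant = float80_div(FLOAT_E, FLOAT_PI)
print(exp_word - BIAS, f"{mant:016x}", float(decode(exp_word, mant)))
```

For e/pi this yields exponent -1 and a mantissa beginning dd816a76547c, which agrees with the expansion in the comment of the snippet above.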

Operation       Clock cycles   Ops/sec at 6 MHz
Add/Sub              1200 cc               5000
Multiplication      13000 cc                461
Division            19000 cc                315
Sqrt               108000 cc                 55
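The ops-per-second column is the clock rate divided by the cycle count, rounded down. A one-line check, assuming the 6 MHz figure from the table header (the names `CPU_HZ`, `CYCLES` and `ops_per_second` are illustrative):

```python
CPU_HZ = 6_000_000  # 6 MHz, as in the table header

CYCLES = {"Add/Sub": 1200, "Multiplication": 13000, "Division": 19000, "Sqrt": 108000}


def ops_per_second(cycles: int, hz: int = CPU_HZ) -> int:
    """Whole operations per second at the given clock rate."""
    return hz // cycles


for name, cc in CYCLES.items():
    print(f"{name:15} {cc:>7} cc {ops_per_second(cc):>5}")
```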


Well, here are the square-root timings I got from WabbitEmu: 86825 cc for the OS and 46831 cc for mine. So it isn't quite twice as fast, but it is close. I am also working on a routine to cut out another 16000 cc or so, which would make it almost 3 times faster. Timings for all four operations:

Args used: 1.570796326794897 and 57.29577951308232
For example, 57.29577951308232/1.570796326794897

Operation     TI-OS (cc)  Float80 (cc)    diff   ratio   analysis
add/subtract        2758          3166    +408  1.1479   Add/sub is a bit slower, possibly noticeably.
multiply           35587         10851  -24736  0.3049   Multiplication is significantly faster. Noticeable.
divide             40521         18538  -21983  0.4575   Division is significantly faster. Noticeable.
square root        86825         46831  -39994  0.5394   Square roots are significantly faster. Noticeable.

diff = Float80 - TI-OS; ratio = Float80/TI-OS
Notes: TI floats have approximately 47 bits of precision. Float80 uses 64 bits of precision (that is, 14 digits versus 19).
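The diff and ratio columns are Float80 minus TI-OS and Float80 divided by TI-OS, so a ratio below 1 means Float80 is faster; the digit counts in the notes are bits * log10(2), rounded down. A short sketch that recomputes them from the cycle counts in the table, plus the projected square-root speedup if the planned routine really saves about 16000 cc (that saving is the post's estimate, not a measurement). `TIMINGS`, `compare` and `decimal_digits` are illustrative names:

```python
import math

# (TI-OS cycles, Float80 cycles), copied from the WabbitEmu table
TIMINGS = {
    "add/subtract": (2758, 3166),
    "multiply": (35587, 10851),
    "divide": (40521, 18538),
    "square root": (86825, 46831),
}


def compare(ti_os: int, float80: int) -> tuple[int, float]:
    """Return (diff, ratio): diff = Float80 - TI-OS, ratio = Float80 / TI-OS."""
    return float80 - ti_os, round(float80 / ti_os, 4)


def decimal_digits(bits: int) -> int:
    """Whole decimal digits carried by a binary significand of the given width."""
    return int(bits * math.log10(2))


for op, (ti, f80) in TIMINGS.items():
    diff, ratio = compare(ti, f80)
    print(f"{op:13} {diff:+7d} {ratio:.4f}")

# Projected square-root speedup if another ~16000 cc is removed
ti, f80 = TIMINGS["square root"]
print(f"{ti / (f80 - 16000):.2f}x")
print(decimal_digits(47), decimal_digits(64))  # 14 19
```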