Using MAT to initialize an Array?

Started by Theo Gottwald, September 05, 2010, 08:45:43 AM


Theo Gottwald

Suppose you have used an array and now want to clear it, setting every element to zero.

Here is the array:

DIM a(1e4,1e3) AS LONG   ' subscripts 0..10000 and 0..1000

To set all elements to zero, we can do it "by hand":

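REGISTER R01 AS LONG, R02 AS LONG   ' loop counters kept in CPU registers
FOR R01=0 TO 1E4                    ' first subscript (outer loop)
  FOR R02=0 TO 1E3                  ' second subscript (inner loop)
    a(R01,R02)=0
  NEXT
NEXT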



The resulting DISASM looks like this:

FLD SINGLE PTR [0040824C]
FISTP LONG PTR [EBP+FFFFFF5C]
MOV ESI, DWORD 00000000
JMP L40295F
FLD SINGLE PTR [00408250]
FISTP LONG PTR [EBP+FFFFFF54]
MOV EDI, DWORD 00000000
JMP L402953
MOV EAX, EDI
IMUL DWORD PTR [EBP+FFFFFF00]
ADD EAX, ESI
MOV EBX, DWORD PTR [EBP+FFFFFEE0]
MOV DWORD PTR [EBX+4*EAX], DWORD 00000000
INC EDI
MOV EAX, EDI
CMP EAX, DWORD PTR [EBP+FFFFFF54]
JLE SHORT L40293A
INC ESI
MOV EAX, ESI
CMP EAX, DWORD PTR [EBP+FFFFFF5C]
JLE SHORT L402923
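The address arithmetic in this listing (EAX = EDI * stride + ESI, that is R02 * stride + R01) means the first subscript is stored contiguously, i.e. column-major order. A minimal C sketch of the same access pattern, assuming that layout with n1 = 10001 elements per column for our a(1e4,1e3); the function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of element (r01, r02) when the first
   subscript varies fastest (column-major), matching R02 * stride + R01. */
static size_t col_major_index(size_t r01, size_t r02, size_t n1)
{
    return r02 * n1 + r01;   /* stride of the second subscript is n1 */
}

/* Same loop order as the PowerBASIC code above: r01 outer, r02 inner.
   Every inner step jumps n1 elements ahead in memory. */
static void clear_outer_r01(int32_t *a, size_t n1, size_t n2)
{
    assert(a != NULL);
    for (size_t r01 = 0; r01 < n1; ++r01)
        for (size_t r02 = 0; r02 < n2; ++r02)
            a[col_major_index(r01, r02, n1)] = 0;
}
```

With n1 = 10001 and 4-byte LONGs, each inner step of clear_outer_r01 moves 10001 * 4 = 40004 bytes ahead, so every store in the inner loop lands on a different cache line.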


My speed-test index shows 380302 (smaller is better).
Let me add that compiling with #OPTIMIZE SIZE is slightly faster here (by about 7000 units) than with #OPTIMIZE SPEED.
Such details depend on code alignment and the CPU architecture.

Now let's not use REGISTER variables here. How much slower will it be?
My speed index shows 435000, which is roughly 14% slower: that is the price of ignoring the power of REGISTER variables.

Now let's try something else!
We replace the floating-point literals 1E4 and 1E3 with the integer literals 10000 and 1000:

FOR R01=0 TO 10000
FOR R02=0 TO 1000
   a(R01,R02)=0
NEXT
NEXT


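Surprised?
The resulting disassembly looks different: the float-to-integer conversions (FLD/FISTP) are gone, and the loop counters are compared directly against constants.
The speed index, however, stays at around 390000, practically the same as the first version.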

40290C MOV ESI, DWORD 00000000
402912 MOV EDI, DWORD 00000000
402918 MOV EAX, EDI
40291A IMUL DWORD PTR [EBP+FFFFFEFC]
402920 MOV ECX, ESI
402922 ADD EAX, ECX
402924 MOV EBX, DWORD PTR [EBP+FFFFFEDC]
40292A MOV DWORD PTR [EBX+4*EAX], DWORD 00000000
402931 INC EDI
402933 CMP EDI, DWORD 000003E8
402939 JBE SHORT L402918
40293B INC ESI
40293D CMP ESI, DWORD 00002710
402943 JBE SHORT L402912


Now we simply swap the loops: instead of running the 1000-step inner loop 10000 times, we run the 10000-step loop 1000 times.

FOR R02=0 TO 1000
FOR R01=0 TO 10000
   a(R01,R02)=0
NEXT
NEXT


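Did you see it coming?
The speed index is now around 111000, more than three times faster!
The reason is the memory layout: the listings above compute the element address as R02 * stride + R01, so the first subscript is stored contiguously. With R01 in the inner loop, the writes go to consecutive addresses instead of jumping through memory.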
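Swapping the loops turns the walk into a purely sequential one. A C sketch with a hypothetical helper, again assuming the column-major layout (first subscript contiguous) seen in the listings:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swapped loop order: r02 outer, r01 inner. Because the first subscript
   varies fastest, the inner loop writes one 4-byte LONG after another. */
static void clear_outer_r02(int32_t *a, size_t n1, size_t n2)
{
    assert(a != NULL);
    for (size_t r02 = 0; r02 < n2; ++r02) {
        int32_t *column = a + r02 * n1;   /* start of this contiguous run */
        for (size_t r01 = 0; r01 < n1; ++r01)
            column[r01] = 0;
    }
}
```

Each inner loop now covers one contiguous block of n1 elements, which is what caches and hardware prefetchers handle best.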

Now let's try something else:

MAT a()= ZER

Surprisingly, the speed index is only around 100000.
We are faster using MAT than with hand-written loops!

LEA EBX, DWORD PTR [EBP+FFFFFEE0]
CALL L404417
...
TEST BYTE PTR [EBX+04], BYTE 04
40441B JZ  SHORT L404437
40441D PUSH EDI
40441E CLD
40441F XOR EAX, EAX
404421 MOV ECX, DWORD PTR [EBX+08]
404424 IMUL ECX, DWORD PTR [EBX+14]
404428 MOV EDI, DWORD PTR [EBX]
40442A PUSH ECX
40442B SHR ECX, BYTE 02
40442E REPE: STOSD
404430 POP ECX
404431 AND ECX, BYTE 03
404434 REPE: STOSB
404436 POP EDI
404437 RET NEAR
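As far as the listing shows, the runtime routine behind MAT a() = ZER multiplies two descriptor fields ([EBX+08] and [EBX+14]; reading them as element count and element size is my assumption), clears that many bytes with REP STOSD (this disassembler prints the REP prefix as "REPE:"), and finishes the remaining 0 to 3 bytes with REP STOSB. A C sketch of that logic, with made-up parameter names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the zero fill in the listing: total = count * size bytes,
   cleared as total / 4 dwords, then total % 4 single bytes. */
static void zero_fill(void *data, size_t count, size_t elem_size)
{
    assert(data != NULL);
    size_t total = count * elem_size;        /* IMUL ECX, [EBX+14]     */
    unsigned char *p = data;

    const uint32_t zero = 0;
    for (size_t i = 0; i < total / 4; ++i)   /* SHR ECX, 2 ; REP STOSD */
        memcpy(p + 4 * i, &zero, sizeof zero);
    p += 4 * (total / 4);
    for (size_t i = 0; i < total % 4; ++i)   /* AND ECX, 3 ; REP STOSB */
        p[i] = 0;
}
```

In plain C you would just call memset(data, 0, count * elem_size); the sketch only spells out the dword-then-byte split that the listing performs.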


But how about:

RESET a()

This is even faster: 98000!

LEA EBX, DWORD PTR [EBP+FFFFFEE0]
MOV EAX, DWORD 00000004
CALL L403627
...
403627 CALL L403634
40362C JB  SHORT L40362F
40362E RET NEAR
...
403634 PUSH ESI
403635 PUSH EDI
403636 MOV AH, BYTE PTR [EBX+04]
403639 TEST AH, BYTE 04
40363C JZ  SHORT L40369B
40363E TEST AL, BYTE 01
403640 JZ  SHORT L403649
403642 TEST AH, BYTE 01
403645 JNZ SHORT L403697
403647 MOV AL, BYTE 08
403649 TEST AH, BYTE 02
40364C JNZ SHORT L403666
40364E TEST AL, BYTE 02
403650 JZ  SHORT L403657
403652 TEST AH, BYTE 01
403655 JNZ SHORT L403666
403657 PUSH EAX
403658 MOV AL, BYTE PTR [EBX+06]
40365B MOV ESI, DWORD PTR [EBX+08]
40365E MOV EDI, DWORD PTR [EBX]
403660 CALL L40369F
403665 POP EAX
403666 TEST AL, BYTE 04
403668 JZ  SHORT L403687
40366A MOV AL, BYTE PTR [EBX+06]
40366D CMP AL, BYTE 2E
40366F JZ  SHORT L40369B
403671 CMP AL, BYTE 24
403673 JZ  SHORT L40369B
403675 CMP AL, BYTE 22
403677 JZ  SHORT L40369B
403679 XOR EAX, EAX
40367B MOV EDX, DWORD PTR [EBX+0C]
40367E MOV EDI, DWORD PTR [EBX]
403680 CALL L4035C1
403685 JMP SHORT L40369B
403687 TEST AH, BYTE 01
40368A JNZ SHORT L403697
40368C XOR EAX, EAX
40368E XCHG EAX, DWORD PTR [EBX]
403690 CALL L402EE1
403695 JB  SHORT L40369C
403697 AND BYTE PTR [EBX+04], BYTE FA
40369B CLC
40369C POP EDI
40369D POP ESI
40369E RET NEAR


We won't go into the details here, but RESET is just a bit faster.

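Can it get even faster?
Let's try PowerBASIC's special DIM ... AT statement, which places an array at a given memory address. The idea is to lay a one-dimensional array b() over the memory of a():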


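REGISTER R01 AS DWORD, R02 AS DWORD
LOCAL T01 AS LONG
DIM a(1e4,1e3) AS LONG      ' 10001 * 1001 = 10011001 elements
T01=VARPTR(a())             ' address used as the base of b()
DIM b(1e7) AS LONG AT T01   ' flat, one-dimensional view of that memory
FOR R01=0 TO 1E7-1          ' note: clears 10000000 elements, 11001 fewer than a() holds
  b(R01)=0
NEXT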


Our speed index shows 52000, about half of the MAT a() = ZER result. This is a truly optimized loop: a single index and no multiplication in the address calculation.

402910 LEA EBX, DWORD PTR [EBP+FFFFFEE4]
402916 MOV EAX, EBX
402918 MOV DWORD PTR [EBP+FFFFFF60], EAX
40291E PUSH BYTE 00
402920 FLD SINGLE PTR [00408254]
402926 FISTP QUAD PTR [EBP-6C]
402929 MOV EAX, DWORD PTR [EBP-6C]
40292C PUSH EAX
40292D PUSH BYTE 04
40292F PUSH DWORD 010A0001
402934 MOV EAX, DWORD PTR [EBP+FFFFFF60]
40293A PUSH EAX
40293B LEA EBX, DWORD PTR [EBP+FFFFFE70]
402941 CALL L4034CD
402946 FLD SINGLE PTR [00408254]
40294C CALL L4055C6
' ------------------------- LOOP
402951 MOV DWORD PTR [EBP+FFFFFF58], EAX
402957 MOV ESI, DWORD 00000000
40295D JMP L402973
402962 MOV EAX, ESI
402964 MOV EBX, DWORD PTR [EBP+FFFFFE70]
40296A MOV DWORD PTR [EBX+4*EAX], DWORD 00000000
402971 INC ESI
402973 MOV EAX, ESI
402975 CMP EAX, DWORD PTR [EBP+FFFFFF58]
40297B JBE SHORT L402962

' ------------------------- LOOP
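The DIM ... AT trick replaces two subscripts with one flat index. A C sketch of the same idea with hypothetical helper names: subscripts 0..10000 and 0..1000 give 10001 * 1001 = 10011001 elements, so a flat loop has to cover all of them to clear a() completely, while the loop to 1E7-1 above leaves the last 11001 elements untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements for subscripts 0..u1 and 0..u2
   (PowerBASIC arrays start at subscript 0 by default). */
static size_t element_count(size_t u1, size_t u2)
{
    return (u1 + 1) * (u2 + 1);
}

/* Clear the storage as one flat run: one index, no multiply per element. */
static void clear_flat(int32_t *first, size_t count)
{
    assert(first != NULL);
    for (size_t i = 0; i < count; ++i)
        first[i] = 0;
}
```

For our array the call would be clear_flat(pointer_to_first_element, element_count(10000, 1000)).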

This is the end; it won't get any faster here.
As a result, we can say that an integer array can be initialized in PowerBASIC with just one line:

MAT a() = ZER

And the speed penalty is moderate: about 100000 against 52000 for the DIM ... AT loop, and far better than the 380000 of the badly ordered nested loops.
We can also initialize the array with any other number:

MAT a() = CON(expr)  

For example, to fill the Array with "8", we would use:

MAT a() = CON(8)  
The speed index for this is 126000, a bit slower than MAT a() = ZER.

And here is the last disassembly for today:


40290D LEA EBX, DWORD PTR [EBP+FFFFFEE4]
402913 MOV EDX, EBX
402915 MOV EBX, DWORD PTR [EBX]
402917 PUSH EDX
402918 MOV EAX, DWORD 00000008  ' <--- here is our "8"
40291D POP EDX
40291E MOV DWORD PTR [EBX], EAX
402920 CALL L4042B6
...

4042B6 TEST BYTE PTR [EDX+04], BYTE 04
4042BA JZ  SHORT L4042D2
4042BC PUSH ESI
4042BD PUSH EDI
4042BE CLD
4042BF MOV EAX, DWORD PTR [EDX+14]
4042C2 MOV ECX, DWORD PTR [EDX+08]
4042C5 MOV ESI, DWORD PTR [EDX]
4042C7 LEA EDI, DWORD PTR [ESI+EAX]
4042CA DEC  ECX
4042CB IMUL ECX, EAX
4042CE REPE: CMPSB
4042D0 POP EDI
4042D1 POP ESI
4042D2 RET NEAR
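The CON(8) listing stores the value in the first element and then calls a routine with ESI pointing to the data, EDI to the data plus one element size, and a count of (element count - 1) * element size bytes. Because source and destination overlap, every byte read has just been written, so the first element is replicated through the whole array. Note that the listing shows REPE: CMPSB at that spot, which only compares; a forward copy such as REP MOVSB is presumably what actually runs, so that line is likely a disassembler or transcription slip. A C sketch of the overlapping-copy fill (names hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replicate element 0 across the array with a forward byte copy from
   data to data + elem_size. This must be a byte-by-byte forward copy:
   memcpy is undefined for overlapping regions, and memmove would copy the
   old contents instead of repeating the pattern. */
static void replicate_first(void *data, size_t count, size_t elem_size)
{
    assert(data != NULL && count > 0);
    unsigned char *src = data;                 /* ESI = data               */
    unsigned char *dst = src + elem_size;      /* EDI = data + size        */
    size_t n = (count - 1) * elem_size;        /* ECX = (count - 1) * size */
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

static void fill_con(int32_t *a, size_t count, int32_t value)
{
    assert(a != NULL && count > 0);
    a[0] = value;                              /* MOV DWORD PTR [EBX], EAX */
    replicate_first(a, count, sizeof a[0]);
}
```

The byte-wise copy explains why CON(8) is slower than ZER: it moves one byte per step, where the zero fill stores four bytes per step.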