An ASM Exercise In Number Crunching

Started by Charles Pegge, June 13, 2007, 02:00:34 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Charles Pegge

How To Do Pythagoras In Bulk

You can gain considerable performance by processing arrays in assembler, by taking advantage of being able to do most of the work in the registers. And the annoted may look very bulky but this assembles to very few byte of code.

The Floating point unit (FPU) has a stack of 8 registers and is loaded by pushing stuff onto the top of the stack and popping it off again when no longer required. It is important to leave the FPU clean of any debris.
The pop instruction  is part of many operations such as faddp and fstp.

The most significant differences between FreeBasic and PowerBasic are that FB does its multidimensional  arrays (major,minor) whereas PB does its arrays: (minor,major). Also when passing basic variables to registers, FB requires indirection thus: mov eax,[ a] whereas PB does it directly: mov eax, a.

PowerBasic Version

' PYTHAGORAS 3D
' Charles E V Pegge
' 13 June 2007

' PowerBasic PBWin Version

#COMPILE EXE
#DIM ALL
' PYTHAGORAS 3D
' does the equivalent of r=sqr(x^2+y^2+z^2) for arrays of floating point data

' parameter a:  the pointer to an x y z array of coordinates in 64 bit floating point.
' parameter r:  the pointer to an array for the result r.
' parameter c:  the number of points to process

FUNCTION pythagoras3d( BYVAL a AS DOUBLE PTR, BYVAL r AS DOUBLE PTR, BYVAL c AS LONG) AS LONG
'ASM
'----------------------------'--------------------------------------------------------------------
! mov eax, a                 ' eax will be our source index
! mov edx, r                 ' edx will be our destination index
! mov ecx, c                 ' ecx will be our down-counter
  repeats:                   ' DO
!   dec ecx                  ' if ecx was initially 0 then the loop is completely skipped
!   jl exits                 ' when ecx is -1 or less
!   fld  QWORD PTR [eax]     ' push the x coordinate onto the FPU register stack
!   add eax,8                ' advance eax for the next coordinate (CPU and FPU work in parallel)
!   fmul st(0),st(0)         ' square the x coordinate
!   fld QWORD PTR [eax]      ' push the y coordinate onto the fpu stack
!   add eax,8                ' advance the eax pointer again
!   fmul st(0),st(0)         ' square the y
!   faddp st(1),st(0)        ' add this to the squared x then pop the y to discard it
!   fld QWORD PTR [eax]      ' load the z coordinate onto the fpu stack
!   add eax,8                ' advance the pointer ready for the next xyz point
!   fmul st(0),st(0)         ' square the z
!   faddp st(1),st(0)        ' add this value to the others and pop to discard it.
!   fsqrt                    ' take the quare root of this sum of squares x,y,z
!   fstp QWORD PTR [edx]     ' store the result at the r pointer and pop it from the stack
!   add edx,8                ' advance the r pointer edx ready for the next result.
!  jmp repeats               ' repeat the loop
exits:                      ' exit point
! mov function, ecx          ' the value in our down counter should always be -1
'END ASM                     ' done
'----------------------------'--------------------------------------------------------------------
END FUNCTION


FUNCTION PBMAIN () AS LONG

DIM xyz(2,1000) AS DOUBLE
DIM rads(1000) AS DOUBLE
DIM a AS LONG
xyz(0,2)=3
xyz(1,2)=4
xyz(2,2)=5
a=pythagoras3d(VARPTR(xyz(0,0)),VARPTR(rads(0)),10)
MSGBOX STR$( rads(2))
' dim xx as double ptr =varptr(xyz(0,0)): print xx[5] ' test: arrays in freebasic are major,minor
' whereas in PowerBasic the order id minor,major.
' answer should be 7.071067811865476
'PRINT "end"


'-------------------------------------

END FUNCTION


FreeBasic Version

' PYTHAGORAS 3D

' Charles E V Pegge
' 13 June 2007

'FreeBasic Version



' does the equivalent of r=sqr(x^2+y^2+z^2) for arrays of floating point data

' parameter a:  the pointer to an x y z array of coordinates in 64 bit floating point.
' parameter r:  the pointer to an array for the result r.
' parameter c:  the number of points to process

function pythagoras3d( byval a as double ptr, byval r as double ptr, byval c as long) as long
asm
'---------------------------'--------------------------------------------------------------------
mov eax,[a]                ' eax will be our source index
mov edx,[r]                ' edx will be our destination index
mov ecx,[c]                ' ecx will be our down-counter
repeats:                   ' DO
   dec ecx                  ' if ecx was initially 0 then the loop is completely skipped
   jl exits                 ' when ecx is -1 or less
   fld  qword ptr [eax]     ' push the x coordinate onto the FPU register stack
   add eax,8                ' advance eax for the next coordinate (CPU and FPU work in parallel)
   fmul st(0),st(0)         ' square the x coordinate
   fld qword ptr [eax]      ' push the y coordinate onto the fpu stack
   add eax,8                ' advance the eax pointer again
   fmul st(0),st(0)         ' square the y
   faddp st(1),st(0)        ' add this to the squared x then pop the y to discard it
   fld qword ptr [eax]      ' load the z coordinate onto the fpu stack
   add eax,8                ' advance the pointer ready for the next xyz point
   fmul st(0),st(0)         ' square the z
   faddp st(1),st(0)        ' add this value to the others and pop to discard it.
   fsqrt                    ' take the quare root of this sum of squares x,y,z
   fstp qword ptr [edx]     ' store the result at the r pointer and pop it from the stack
   add edx,8                ' advance the r pointer edx ready for the next result.
  jmp repeats               ' repeat the loop
exits:                     ' exit point
mov [function], ecx        ' the value in our down counter should always be -1
end asm                     ' done
'---------------------------'--------------------------------------------------------------------
end function

dim xyz(1000,2) as double
dim radius(1000) as double
xyz(2,0)=3
xyz(2,1)=4
xyz(2,2)=5
print pythagoras3d(varptr(xyz(0,0)),varptr(radius(0)),10)
print radius(2)
' dim xx as double ptr =varptr(xyz(0,0)): print xx[5] ' test: arrays in freebasic are major,minor
' answer should be 7.071067811865476
print "end"


'-------------------------------------