Technique with a FASTPROC to use SSE2 Instructions

Started by Theo Gottwald, February 19, 2024, 07:05:33 PM


Theo Gottwald

Use the 128-bit XMM registers with the 64-bit scalar double (SD) instructions. The "loadsd" macro loads an immediate floating-point value by way of a DOUBLE variable. This is needed because an XMM register cannot be loaded with an immediate value: movsd only accepts another XMM register or a 64-bit DOUBLE memory operand as its source.


' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    MACRO loadsd(sse2reg,immval)    ' load a 64 bit immediate into an SSE register
    MACROTEMP valdbl
      LOCAL valdbl as DOUBLE
      valdbl = immval
      ! movsd sse2reg, valdbl
    END MACRO

    MACRO FUNCTION SseRtn           ' load an SSE register into a DOUBLE value
    MACROTEMP valdbl
      LOCAL valdbl as DOUBLE
      ! movsd valdbl, xmm0
    END MACRO = valdbl

' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

 FUNCTION PBmain as LONG

    loadsd(xmm0,1000.0)             ' load an immediate FP value via a DOUBLE into xmm0
    loadsd(xmm1,10.0)               ' load another FP DOUBLE immediate into xmm1
    ! call sse2                     ' call the FASTPROC

    Msgbox format$(SseRtn)          ' display value as string via FORMAT$

 End FUNCTION

' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

FASTPROC sse2

    PREFIX "!"

    mulsd xmm0, xmm1                ' xmm0 is the return value

    END PREFIX

END FASTPROC

' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

Charles Pegge

In o2, operators can be defined for almost any type by using operator macros, which may themselves contain assembly code:

This demo treats each SIMD register as a set of four 32-bit floats for simple arithmetic. However, I have not found a suitable use case yet. Graphics hardware, with its massive parallel processing capability, has mostly superseded SSE2 technology.

'PARTIAL DEMO / 32bit ASSEMBLER CODING

type simd
  float w,x,y,z
end type

  macro simd_"move" (a)
    movups xmm0,a
  end macro
  macro simd_"save" (a)
    movups a,xmm0
  end macro
  macro simd_"+" (a)
    movups xmm1,a
    addps xmm0,xmm1
  end macro
  macro simd_"-" (a)
    movups xmm1,a
    subps xmm0,xmm1
  end macro
  macro simd_"*" (a)
    movups xmm1,a
    mulps xmm0,xmm1
  end macro
  macro simd_"/" (a)
    movups xmm1,a
    divps xmm0,xmm1
  end macro


function str (simd*a) as string
  return a.w ", " a.x ", " a.y ", " a.z
end function
'

'TESTS
'#recordof simd_op
dim simd A={1,2,3,4}
dim simd B={10,20,30,40}
dim simd C[4]={100,200,300,400}
'
'a=b
'print str(A)
'print str(B)
print str(C)
print str(A+B)
print str(C*(A+B))
.\demos\Basics\OperatorsAsmSimd.o2bas

Theo Gottwald

As I said, before you add CUDA to O2, I would personally prefer a "PB compatibility mode".

Charles Pegge

Those macros were used to implement operator overloading, which does not seem to be possible in PowerBasic.

In general, o2 macros are quite similar to PowerBasic macros. Instead of MACROTEMP, the private macro symbols are listed after the parameters, and in macro functions the return symbol is the first parameter.

PowerBasic:
    MACRO loadsd(sse2reg,immval)    ' load a 64 bit immediate into an SSE register
    MACROTEMP valdbl
      LOCAL valdbl as DOUBLE
      valdbl = immval
      ! movsd sse2reg, valdbl
    END MACRO

translates to o2 as:
    MACRO loadsd(sse2reg,immval,  valdbl )    ' load a 64 bit immediate into an SSE register
      LOCAL valdbl as DOUBLE
      valdbl = immval
      movsd sse2reg, valdbl
    END MACRO

and PowerBasic:
MACRO FUNCTION SseRtn           ' load an SSE register into a DOUBLE value
    MACROTEMP valdbl
      LOCAL valdbl as DOUBLE
      ! movsd valdbl, xmm0
    END MACRO = valdbl

becomes, in o2:
MACRO SseRtn double(valdbl)  ' load an SSE register into a DOUBLE value
      movsd valdbl, xmm0
    END MACRO