Technique with a FASTPROC to use SSE2 Instructions

Theo Gottwald · February 19, 2024, 07:05:33 PM

Use the 128-bit XMM registers with the 64-bit scalar double instructions. The "loadsd" macro works around the lack of an immediate form: movsd can load an XMM register only from another XMM register or from a 64-bit DOUBLE memory operand, so the immediate FP value is first assigned to a DOUBLE variable and then loaded from there.


' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    MACRO loadsd(sse2reg,immval)    ' load a 64 bit immediate into an SSE register
    MACROTEMP valdbl
      LOCAL valdbl as DOUBLE
      valdbl = immval
      ! movsd sse2reg, valdbl
    END MACRO

    MACRO FUNCTION SseRtn           ' load an SSE register into a DOUBLE value
    MACROTEMP valdbl
      LOCAL valdbl as DOUBLE
      ! movsd valdbl, xmm0
    END MACRO = valdbl

' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

 FUNCTION PBmain as LONG

    loadsd(xmm0,1000.0)             ' load an immediate FP value via a DOUBLE into xmm0
    loadsd(xmm1,10.0)               ' load another FP DOUBLE immediate into xmm1
    ! call sse2                     ' call the FASTPROC

    Msgbox format$(SseRtn)          ' display value as string via FORMAT$

 End FUNCTION

' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

FASTPROC sse2

    PREFIX "!"

    mulsd xmm0, xmm1                ' xmm0 is the return value

    END PREFIX

END FASTPROC

' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
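
For readers without a PowerBASIC compiler at hand, roughly the same scalar double computation can be sketched in C with the SSE2 intrinsics from `<emmintrin.h>`. This is only an illustrative equivalent, not a translation of the FASTPROC mechanism itself. The function name `mul_scalar_double` is made up for this sketch, and it assumes an x86 target with SSE2 enabled.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (x86 only) */

/* Hypothetical helper, not part of the original post:
   mirrors the loadsd / mulsd / SseRtn sequence above. */
static double mul_scalar_double(double a, double b)
{
    __m128d x0 = _mm_set_sd(a);   /* low 64 bits = a, upper 64 bits zeroed */
    __m128d x1 = _mm_set_sd(b);   /* low 64 bits = b, upper 64 bits zeroed */
    x0 = _mm_mul_sd(x0, x1);      /* scalar multiply of the low lanes (mulsd) */
    return _mm_cvtsd_f64(x0);     /* copy the low lane back out as a double */
}
```

Calling `mul_scalar_double(1000.0, 10.0)` returns 10000.0, the value the PowerBASIC example passes to FORMAT$.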

Charles Pegge · February 21, 2024, 12:06:31 AM

In o2, operators can be implemented for just about any type by using operator macros, including Asm:

This demo uses the SIMD registers as sets of four 32-bit floats for simple arithmetic, but I have not found a suitable use case yet. Graphics hardware, with its massive parallel processing capability, has mostly superseded the SSE2 technology.


'PARTIAL DEMO / 32bit ASSEMBLER CODING

type simd
  float w,x,y,z
end type

  macro simd_"move" (a)
    movups xmm0,a
  end macro
  macro simd_"save" (a)
    movups a,xmm0
  end macro
  macro simd_"+" (a)
    movups xmm1,a
    addps xmm0,xmm1
  end macro
  macro simd_"-" (a)
    movups xmm1,a
    subps xmm0,xmm1
  end macro
  macro simd_"*" (a)
    movups xmm1,a
    mulps xmm0,xmm1
  end macro
  macro simd_"/" (a)
    movups xmm1,a
    divps xmm0,xmm1
  end macro


function str (simd*a) as string
  return a.w ", " a.x ", " a.y ", " a.z
end function
'

'TESTS
'#recordof simd_op
dim simd A={1,2,3,4}
dim simd B={10,20,30,40}
dim simd C[4]={100,200,300,400}
'
'a=b
'print str(A)
'print str(B)
print str(C)
print str(A+B)
print str(C*(A+B))

.\demos\Basics\OperatorsAsmSimd.o2bas
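
As a cross-check of what the last test line should produce, here is a rough C sketch of the same four-lane arithmetic using the packed single intrinsics from `<xmmintrin.h>`. It is not the o2 operator-macro mechanism, only the underlying movups/addps/mulps sequence. The names `simd4`, `simd4_add` and `simd4_mul` are invented for this sketch, and an x86 target is assumed.

```c
#include <xmmintrin.h>  /* SSE packed-single intrinsics (x86 only) */

/* Hypothetical type: f[0..3] stand in for w, x, y, z of the o2 'simd' type. */
typedef struct { float f[4]; } simd4;

/* Lane-wise a + b: unaligned loads (movups), then addps. */
static simd4 simd4_add(simd4 a, simd4 b)
{
    simd4 r;
    _mm_storeu_ps(r.f, _mm_add_ps(_mm_loadu_ps(a.f), _mm_loadu_ps(b.f)));
    return r;
}

/* Lane-wise a * b: unaligned loads (movups), then mulps. */
static simd4 simd4_mul(simd4 a, simd4 b)
{
    simd4 r;
    _mm_storeu_ps(r.f, _mm_mul_ps(_mm_loadu_ps(a.f), _mm_loadu_ps(b.f)));
    return r;
}
```

With A = {1,2,3,4}, B = {10,20,30,40} and C = {100,200,300,400}, `simd4_mul(C, simd4_add(A, B))` gives {1100, 4400, 9900, 17600}, since each lane is computed independently.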

Theo Gottwald · February 21, 2024, 10:42:37 AM

As I said, before you include CUDA in O2, I personally would prefer a "PB Compatibility Mode".

Charles Pegge · February 21, 2024, 03:56:43 PM

Those macros were used to implement 'operator overloading', which does not seem to be possible in PowerBasic.

In general o2 macros are quite similar to those in PowerBasic. Instead of MacroTemp, the private macro symbols are listed after the parameters, and in macro functions, the return symbol is the first parameter.

PowerBasic:


    MACRO loadsd(sse2reg,immval)    ' load a 64 bit immediate into an SSE register
    MACROTEMP valdbl
      LOCAL valdbl as DOUBLE
      valdbl = immval
      ! movsd sse2reg, valdbl
    END MACRO

translates to o2:


    MACRO loadsd(sse2reg,immval,  valdbl )    ' load a 64 bit immediate into an SSE register
      LOCAL valdbl as DOUBLE
      valdbl = immval
      movsd sse2reg, valdbl
    END MACRO

and PowerBasic:


    MACRO FUNCTION SseRtn           ' load an SSE register into a DOUBLE value
    MACROTEMP valdbl
      LOCAL valdbl as DOUBLE
      ! movsd valdbl, xmm0
    END MACRO = valdbl

becomes o2:


    MACRO SseRtn double(valdbl)  ' load an SSE register into a DOUBLE value
      movsd valdbl, xmm0
    END MACRO
