Use the 128-bit XMM registers with the 64-bit scalar double instructions. The "loadsd" macro handles loading an immediate FP value by way of a DOUBLE variable, since an XMM register can only be loaded from another XMM register or from a 64-bit DOUBLE memory operand; there is no immediate form.
' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
MACRO loadsd(sse2reg,immval) ' load a 64 bit immediate into an SSE register
MACROTEMP valdbl
LOCAL valdbl as DOUBLE
valdbl = immval
! movsd sse2reg, valdbl
END MACRO
MACRO FUNCTION SseRtn ' load an SSE register into a DOUBLE value
MACROTEMP valdbl
LOCAL valdbl as DOUBLE
! movsd valdbl, xmm0
END MACRO = valdbl
' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
FUNCTION PBmain as LONG
loadsd(xmm0,1000.0) ' load an immediate FP value via a DOUBLE into xmm0
loadsd(xmm1,10.0) ' load another FP DOUBLE immediate into xmm1
! call sse2 ' call the FASTPROC
Msgbox format$(SseRtn) ' display value as string via FORMAT$
End FUNCTION
' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
FASTPROC sse2
PREFIX "!"
mulsd xmm0, xmm1 ' xmm0 is the return value
END PREFIX
END FASTPROC
' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
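For readers without PowerBasic at hand, the same computation can be sketched in C with the SSE2 intrinsics from `<emmintrin.h>`. This is only an illustration, not part of the listing above: the function name `mul_sd` is made up here, and the plain C fallback is an assumption added so the sketch still compiles on targets without SSE2. `_mm_set_sd` plays the role of the `loadsd` macro, `_mm_mul_sd` corresponds to `mulsd`, and `_mm_cvtsd_f64` does the job of `SseRtn`.

```c
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: multiply two doubles in the low lane of XMM registers. */
double mul_sd(double a, double b)
{
    __m128d x0 = _mm_set_sd(a);   /* like loadsd(xmm0, a): low lane = a, high lane = 0 */
    __m128d x1 = _mm_set_sd(b);   /* like loadsd(xmm1, b) */
    x0 = _mm_mul_sd(x0, x1);      /* mulsd xmm0, xmm1 */
    return _mm_cvtsd_f64(x0);     /* like SseRtn: copy the low lane back to a double */
}
#else
/* Assumed fallback for targets without SSE2: ordinary scalar multiply. */
double mul_sd(double a, double b)
{
    return a * b;
}
#endif
```

Calling `mul_sd(1000.0, 10.0)` mirrors the PBmain example and returns 10000.0.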
In o2, operators can be implemented for just about any type by using operator macros, including Asm:
This demo uses the SIMD registers as sets of four 32-bit floats for simple arithmetic, but I have not found a suitable use case yet. Graphics hardware, with its massive parallel processing capability, has mostly superseded the SSE2 technology.
'PARTIAL DEMO / 32bit ASSEMBLER CODING
type simd
float w,x,y,z
end type
macro simd_"move" (a)
movups xmm0,a
end macro
macro simd_"save" (a)
movups a,xmm0
end macro
macro simd_"+" (a)
movups xmm1,a
addps xmm0,xmm1
end macro
macro simd_"-" (a)
movups xmm1,a
subps xmm0,xmm1
end macro
macro simd_"*" (a)
movups xmm1,a
mulps xmm0,xmm1
end macro
macro simd_"/" (a)
movups xmm1,a
divps xmm0,xmm1
end macro
function str (simd*a) as string
return a.w ", " a.x ", " a.y ", " a.z
end function
'
'TESTS
'#recordof simd_op
dim simd A={1,2,3,4}
dim simd B={10,20,30,40}
dim simd C[4]={100,200,300,400}
'
'a=b
'print str(A)
'print str(B)
print str(C)
print str(A+B)
print str(C*(A+B))
.\demos\Basics\OperatorsAsmSimd.o2bas
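As a cross-check on what the tests should print, the lane-wise arithmetic can be modelled in plain C. This is a scalar sketch of the semantics of `addps` and `mulps` only, not of how o2 expands the operator macros. The type `simd4` and the functions `simd_add` and `simd_mul` are names invented for this illustration, and it assumes the first element of `C` holds {100,200,300,400}.

```c
/* Hypothetical scalar model of a value holding four 32-bit floats. */
typedef struct { float w, x, y, z; } simd4;

/* Lane-wise add: each lane is added independently, as addps does. */
simd4 simd_add(simd4 a, simd4 b)
{
    simd4 r = { a.w + b.w, a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

/* Lane-wise multiply: each lane is multiplied independently, as mulps does. */
simd4 simd_mul(simd4 a, simd4 b)
{
    simd4 r = { a.w * b.w, a.x * b.x, a.y * b.y, a.z * b.z };
    return r;
}
```

With A={1,2,3,4} and B={10,20,30,40}, `simd_add(A, B)` gives 11, 22, 33, 44, and multiplying that by {100,200,300,400} gives 1100, 4400, 9900, 17600.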
As said, before you include CUDA in O2, I personally would prefer a "PB Compatibility Mode".
Those macros were used to implement 'operator overloading', which does not seem to be possible in PowerBasic.
In general, o2 macros are quite similar to those in PowerBasic. Instead of MACROTEMP, the private macro symbols are listed after the parameters, and in macro functions the return symbol is the first parameter.
PowerBasic:
MACRO loadsd(sse2reg,immval) ' load a 64 bit immediate into an SSE register
MACROTEMP valdbl
LOCAL valdbl as DOUBLE
valdbl = immval
! movsd sse2reg, valdbl
END MACRO
translates to o2:
MACRO loadsd(sse2reg,immval, valdbl ) ' load a 64 bit immediate into an SSE register
LOCAL valdbl as DOUBLE
valdbl = immval
movsd sse2reg, valdbl
END MACRO
and PowerBasic:
MACRO FUNCTION SseRtn ' load an SSE register into a DOUBLE value
MACROTEMP valdbl
LOCAL valdbl as DOUBLE
! movsd valdbl, xmm0
END MACRO = valdbl
becomes o2:
MACRO SseRtn double(valdbl) ' load an SSE register into a DOUBLE value
movsd valdbl, xmm0
END MACRO