(Optimization) Compare DWORDS and others...

Started by Theo Gottwald, January 02, 2007, 09:45:29 AM

Theo Gottwald

Taking a look at my DisASM output, I realized that there was still some floating-point code inside where I did not expect it and did not need it.

This code for example:


DisASM shows what the compiler makes out of it:

40872F 0FB600                 MOVZX EAX,BYTE PTR [EAX]
408732 89C7                   MOV EDI, EAX
408734 C7C663000000           MOV ESI, DWORD 00000063
40873A 8975A4                 MOV DWORD PTR [EBP-5C], ESI
40873D C745A800000000         MOV DWORD PTR [EBP-58], DWORD 00000000
408744 DF6DA4                 FILD QUAD PTR [EBP-5C]
408747 897DA4                 MOV DWORD PTR [EBP-5C], EDI
40874A C745A800000000         MOV DWORD PTR [EBP-58], DWORD 00000000
408751 DF6DA4                 FILD QUAD PTR [EBP-5C]
408754 DED9                   FCOMPP
408756 DFE0                   FNSTSW AX
408758 9E                     SAHF
408759 7306                   JNB SHORT L408761
40875B FF8570FFFFFF           INC DWORD PTR [EBP+FFFFFF70]

Knowing that the result from the PEEK() Command is a Byte, I had expected something else.
Let me make a small change here to get rid of the "FILD" and the "FCOMPP".

I change the two variables from DWORD to LONG. As a result we have exactly the same source code, except that our register variables are declared AS LONG instead of AS DWORD.


The result of the small change looks promising:

40872F 0FB600                 MOVZX EAX,BYTE PTR [EAX]
408732 89C7                   MOV EDI, EAX
408734 C7C663000000           MOV ESI, DWORD 00000063
40873A 3BF7                   CMP ESI, EDI
40873C 7E06                   JLE SHORT L408744
40873E FF8570FFFFFF           INC DWORD PTR [EBP+FFFFFF70]

Just a small change - big change in result.
Maybe I'll declare more variables "AS LONG" where possible (where unsigned/signed is not important).
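
Why the signedness does not matter in this particular case can be checked with a small sketch. It is written in C rather than PowerBASIC, and it assumes the test is "byte value below 99 (hex 63)", which is read off the two listings above since the BASIC source is not shown. The function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed condition, read off the disassembly: count when value < 99. */

/* DWORD-style (unsigned) comparison of a zero-extended byte. */
bool hit_unsigned(uint8_t b)
{
    uint32_t value = b;          /* like MOVZX: upper bits are zero */
    uint32_t limit = 99u;
    return value < limit;
}

/* LONG-style (signed) comparison of the same zero-extended byte. */
bool hit_signed(uint8_t b)
{
    int32_t value = b;           /* 0..255 always fits, never negative */
    int32_t limit = 99;
    return value < limit;
}

/* Counts byte values for which the two comparisons disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (unsigned v = 0; v <= 255u; ++v) {
        if (hit_unsigned((uint8_t)v) != hit_signed((uint8_t)v)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Because a byte from PEEK() is always in the range 0 to 255, the sign bit of the 32-bit value is never set, so both comparisons give the same answer for every possible input.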

Donald Darden

The PowerBasic Compilers are currently optimized for handling LONG Integer math. It is frequently suggested that you use LONG variables for the most efficient results.  This also agrees with VB, which generally passes integer variables as LONGs (VB does not support DWORD as a type).

However, in ASM (the low-level language that native-code compilers ultimately produce), there are distinct differences in determining the result of comparing a LONG with another LONG, a LONG with a DWORD, a DWORD with a LONG, or a DWORD with another DWORD. Conditional jump instructions (all starting with "J") act on the flags set by a preceding compare and can branch on equality, non-equality, greater than or above, less than or below, or a number of combinations, such as Greater Than or Equal.

The trick is that Greater Than does not mean the same thing as Above, and Less Than does not mean the same as Below. Greater Than and Less Than are signed comparisons: if the highest-order bit of the integer is set, the value is considered a negative number, and that is taken into account when deciding which operand is larger. Above and Below skip the sign interpretation, so they treat the operands as unsigned (byte, word, or dword) integers during the compare.
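
The difference between Less Than and Below can be illustrated with a minimal C sketch (C stands in for PowerBASIC here, and the function names are hypothetical). The same 32-bit pattern is compared once as signed and once as unsigned; a compiler would typically pick JL/JG-style jumps for the first function and JB/JA-style jumps for the second.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed view: the top bit marks a negative number (LONG-style). */
bool less_signed(int32_t a, int32_t b)
{
    return a < b;
}

/* Unsigned view of the same bits (DWORD-style).
   Converting int32_t to uint32_t keeps the bit pattern. */
bool below_unsigned(int32_t a, int32_t b)
{
    return (uint32_t)a < (uint32_t)b;
}
```

With the bit pattern FFFFFFFF hex, less_signed(-1, 99) is true because the value is read as -1, while below_unsigned(-1, 99) is false because the same bits are read as 4294967295.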

The comparison of a DWORD and a LONG, in either order, then becomes an issue, and this takes some effort to resolve. First, your code would have to determine that one is a DWORD and the other is a LONG. Then, if the DWORD has the upper bit set, it is automatically greater than the LONG. If the upper bit of the LONG is set, it is automatically less than the DWORD. Only when neither upper bit is set do both values lie in the common range, where an ordinary compare gives the right answer. This is because about half the range of values that each type supports is beyond the range of the other, and whether you are in that range is determined by the sign (uppermost) bit of each.
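
The sign-bit checks described above could be sketched like this in C (again a stand-in for PowerBASIC; compare_dword_long is a made-up name, not something the compiler provides).

```c
#include <stdint.h>

/* Compares an unsigned 32-bit value d (DWORD) with a signed 32-bit
   value l (LONG) using only integer tests.
   Returns 1 if d > l, 0 if equal, -1 if d < l. */
int compare_dword_long(uint32_t d, int32_t l)
{
    if (l < 0) {
        return 1;                 /* LONG sign bit set: l is below every DWORD */
    }
    if (d > (uint32_t)INT32_MAX) {
        return 1;                 /* DWORD top bit set: d is above every LONG */
    }
    /* Both values are now in the shared range 0..INT32_MAX. */
    uint32_t u = (uint32_t)l;
    return (d > u) - (d < u);
}
```

The two early returns cover the halves of each range that the other type cannot represent; what remains is a plain unsigned compare.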

However, there is another way to make such comparisons, and that is to translate both into floating point, which encompasses both ranges completely, and then perform a floating point compare to determine which is the greater or lesser. Floating point math is slower, though the built-in FPUs in modern PCs are quite fast and help keep the time penalty from being excessive.
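
A minimal C sketch of this widening approach follows (not the code PowerBASIC generates, just the same idea; compare_via_double is a hypothetical name). A double has a 53-bit mantissa, so every 32-bit integer, signed or unsigned, converts to it exactly.

```c
#include <stdint.h>

/* Compares a DWORD-style value with a LONG-style value by converting
   both to double, which can hold either 32-bit range without loss.
   Returns 1 if d > l, 0 if equal, -1 if d < l. */
int compare_via_double(uint32_t d, int32_t l)
{
    double x = (double)d;   /* exact: 0 .. 4294967295 */
    double y = (double)l;   /* exact: -2147483648 .. 2147483647 */
    return (x > y) - (x < y);
}
```

No special cases are needed, which is the attraction of the method; the cost is the conversion and the floating point compare, as seen in the FILD and FCOMPP instructions in the first listing.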

Keep in mind that an optimizing compiler is any compiler that attempts to improve on the code that you write. However, the compiler can only optimize to the extent possible within other constraints. The first is time - PowerBasic's compile process is super-fast, meaning you can stop and compile at any time to make sure your code will compile, and test-drive your code with or without the debugger. Spending too much time overanalyzing every statement would seriously slow the compile process.

The other constraint is the size of the finished program.  PowerBasic also boasts the ability to make programs with small footprints, but that means finding ways to handle special cases in ways that are not only expeditious, but give reasonably tight code.

As a consequence, you will likely find ways to further optimize the code by doing exactly this: check the finished Assembler code. But this requires quite a bit of familiarity with certain tools and with Assembler instructions, registers, use of memory, and other aspects of the PC architecture. In other words, it takes time and effort to achieve significant results. There is a general rule that is often quoted in the industry: about 90 percent of a program's execution time is taken up by just 10 percent of the code. You find the ten percent of the code which is hogging so many CPU cycles, and you focus on trying to make it faster. Or you advise the customer to buy a faster computer - that's what Microsoft generally does.