FreeBASIC CWstr

Started by Juergen Kuehlwein, April 09, 2018, 11:39:00 PM

Previous topic - Next topic

0 Members and 3 Guests are viewing this topic.

Juergen Kuehlwein

#135
@Marc,


Quotei have cleaned some mistakes from it

please could you explain which mistakes and what was the problem - thanks. I plan to take your version as a basis for a universal ustring.inc, while José´s WINFBX should be a (preferred) Windows alternative.


@all,


here is, what i have right now - it is work in progress. I removed everything from José´s version, which isn´t absolutely necessary and named it "ustring.inc", the pragma now is"ustring" too, i added it to "ustring.inc".

There is a #define (#define ustring jk.DWSTR) making it work with "USTRING" as data type for dynamic wide strings. José could add a similar #define (#define ustring afx.CWSTR) to make his framework work with "USTRING" as data type for dynamic wide strings as well. So it is possible to have the same code for different implementations, if you use "USTRING" as data type (dim u as ustring ...). Of course the original name(s) (afx.cwstr, jk.dwstr, etc. as hardcoded into the compiler) may be used without any restrictions. So nobody (especially José) will have to change existing code.

I had to make two decisive changes to José´s code:
- in OPERATOR DWSTR.[] the line "nIndex -= 1" must be removed to make it consistent with wstring, otherwise u[1] <> w[1]
- m_pBuffer must be the first member variable of the type in order to make "STRPTR" work for ustrings

Please test (maybe add own tests to test.bas) and try to find all the bugs i missed so far, obviously this is for Windows only so far. The compiler should work without having to prepend "**" to ustrings anymore.


@Jeff,


as i´m fairly new to GIT, i need your help. Initially i cloned from the GIT repository without creating a fork, then i created my own branch on my local repo. You wanted it on an extra fork at GitHub - how should i proceed ?

Looking at the code, where the decision is made, which overloaded procedure should be taken, i see the compiler just throws an error (ambiguous ...) if it finds more than one possibly matching procedure. You want it to take the best matching or (if given) the "default" (see previous discussion) procedure. And you only want an error message, if there is no matching procedure at all - right?


Attached: compiler.zip (the compiler´s code + ustring.zip, which includes ustring.inc and test.bas)


  •  

José Roca

Quote
- in OPERATOR DWSTR.[] the line "nIndex -= 1" must be removed to make it consistent with wstring, otherwise u[1] <> w[1]

You're wrong. FreeBsic's [] operator is zero based.
  •  

Juergen Kuehlwein

José,


please run this code (for console):

#include "afx\afxstr.inc"


dim i as integer
dim u as cwstr = "1234"
dim w as wstring * 16 = "1234"


  for i as integer = 1 to len(u)
    print i
    print u[i]
    print w[i]

    if(u[i] <> w[i]) then
      print "error"
      exit for
    end if
  next i


  sleep


end



it fails on my IDE as well as on Paul´s. If you remove or outcomment the line i mentioned - it runs. Please re-check.


JK
  •  

José Roca

Your code is wrong. Try:


'#CONSOLE ON
#define UNICODE
#INCLUDE ONCE "windows.bi"
#INCLUDE ONCE "Afx/cwstr.inc"

dim u as cwstr = "1234"
dim w as wstring * 16 = "1234"

for i as integer = 0 to len(w) - 1
   print i
   print chr(u[i]); "***"
   print chr(w[i]); "---"
next

PRINT
PRINT "Press any key..."
SLEEP

  •  

Juergen Kuehlwein

#139
Sorry José,


this what i get with your code, see Jose.png

when i change one line of it to this: print chr(u[i+1]); "***", it delivers what i would call the expected result, see JK.png


i used paul´s IDE for both and this is the code i have from WINFBX:

PRIVATE OPERATOR CWStr.[] (BYVAL nIndex AS UINT) AS USHORT
   IF nIndex < 1 OR nIndex > m_BufferLen \ 2 THEN EXIT OPERATOR
   ' Get the numeric character code at position nIndex
   nIndex -= 1
   OPERATOR = PEEK(USHORT, m_pBuffer + (nIndex * 2))
END OPERATOR


So for nIndex = 0, it returns nothing - as you can see in the screenshot...


This is what i use:

PRIVATE OPERATOR DWSTR.[] (BYVAL nIndex AS ulong) AS USHORT         
'***********************************************************************************************
' Returns the corresponding ASCII or Unicode integer representation of the character at
' the position specified by the nIndex parameter. allows to use the [] syntax, e.g. value = cws[1].
' Can't be used to change a value!
'***********************************************************************************************
  IF nIndex > m_BufferLen \ 2 THEN EXIT OPERATOR
  OPERATOR = PEEK(USHORT, m_pBuffer + (nIndex * 2))
END OPERATOR



Please tell me, what´s wrong


Attached are two screenshots of the two runs with different code (marked bold above)


JK
  •  

Juergen Kuehlwein

José,


put the blame on me! I just downloaded your latest version of WINFBX and there the offending line has been removed. My version of your WINFBX i used for testing was outdated. So we have the same opinion here - it´s zerobased.


JK
  •  

José Roca

  •  

Juergen Kuehlwein

@Jeff,


forget the question about GIT, i think i managed it myself. My fork is at: https://github.com/jklwn/fbc, the latest code is in "JK-USTRING" branch. I didn´t make a pull request yet, because i want to be sure, that bugs have been found and fixed as far as possible before.


@all,


no bugs ? I´m going to add some more tests and string handling functions to ustring.inc. Next i will try to get it running for LINUX and then will be the time for a pull request, i think.



JK
  •  

Jeff Marshall

Hey Juergen, that's correct for git, putting your changes on a separate branch.

I'm probably not going to be able to keep up with your posts. =)

So in earlier posts, I was talking about a variety of issues, and I know, some are not specific to what you are focused on.  Basically, we don't want to break user code now or later; sometimes it happens, and if it does, we try to have a good reason.  It's OK to have a temporary fix, but we should at least have an idea where the end goal is in the future, and that the goal is a possibility, even if it doesn't get done right away.  Really just leaving options open for ourselves for the eventual compiler feature or fix, whatever that is.   And hopefully we don't need to break user code now or in future.  The 3 issues I think are, adding the built-in dynamic wstring type, fixing UDT implicit casting for built in [w]string rtlib functions, solving ambiguous implicit UDT casting in general.  All pretty much independent of each other for implementation, but have to work together eventually in user code.

---

I only briefly looked at your branch:

You should generalize your test:

     if env.clopt.ustring then
       if (*jk__zz = "JK_CWSTR") or (*jk__zz = "CWSTR") or (*jk__zz = "CBSTR") or (*jk__zz = "DWSTR")then

to something like if( astMaybeUDTisaWstring( expr ) ) then or something similar, then just modify the AST expr.  Avoid the GOTO's, some of them are skipping over DIM statements.

Better would be to use #pragma push/pop, and attach the "UDT-is-a-Wstring" flag to the UDT's type-def.  That way any UDT NAME could work as a "wstring" for any UDT, now or future, other users can write their own classes if they want.  And if the type is declared in a namespace, or is a define, or a type alias, etc, it won't matter because the information that the UDT should be handled as a wstring is attached to the UDT typedef itself.

There seems to be a few extra files, not sure what's going on there in the ./contrib directory, the .jkp file, uppercaseing filenames in .gitignore.  If you really mean to make changes to build process, etc, that should be a separate pull request.  In otherwords, just include what is needed for the change you are proposing.

Writing tests means adding automated tests to the ./tests directory in the test-suite.

Sorry, that's all I have time for just now. =)
  •  

Juergen Kuehlwein

Jeff,


regarding GIT, i (at least intentionally) didn´t change anything else than some files in the compiler folder. The .jkp file is the project file for my IDE i obviously forgot to add to the ignore list. So please turn down all other changes.


QuoteAnd hopefully we don't need to break user code now or in future.
I absolutely agree that this should not happen, this is one reason why i want to be able to use the already existing syntax for the new dynamic wide string type - make (keep) it consistent.


I also agree on your listing of issues. Currently we have #1 solved externally, having it built-in would require a lot of extra work. #2 works specifically for the class names we have, generalizing it would be a next possible step. I could care for #3, if you want, if all we have now is tested and implemented.

 
Quotethen just modify the AST expr.
I actually don´t change the expr, i just redirect the code flow, based on the fact that one of our UDTs is recognized (by it´s name for now). The runtime library functions can deal with it without any changes. But in case of an UDT the code flow never reaches them.


QuoteAvoid the GOTO's, some of them are skipping over DIM statements.
A compiler is a very complicated piece of software, and as long as i don´t understand in depth, what happens, and why exactly it happens, i´m extremely careful with changes and try to isolate my new code as much as possible. In other words i know that "goto" is often considered to be bad coding style. But the alternative would be to make major changes in code structure and thus potentially breaking things, i even don´t know of. I took care that skipping DIM statements is safe, in that the skipped variables are not needed anymore in places i´m jumping to.


QuoteBetter would be to use #pragma push/pop
I understand the concept of #pragma, but what for is #pragma push/pop? As already explained above, it isn´t really necessary, i currently use it to isolate my new code from what´s already there. So, just in case there is unexpected behavior of the compiler, i have a means of switching my changes off, and i can test if this behavior persists or not. It´s more or less a debugging thing.


Quoteattach the "UDT-is-a-Wstring" flag to the UDT's type-def
Which would require some new syntax for specifying this flag. Why not just check for "can-cast-as-wstring", which would have the same effect without the need for any syntax change/addition.


For a schedule i would propose these steps:

1.) make sure, that, what we have now, really works FLAWLESSLY for Windows, LINUX and maybe other targets, which can deal with wide strings.

2.) integrate this into the next official release, maybe with #2 of your list, maybe without it. This is a timing thing: when will be the next release, and how much time will it need to get #2 tested and implemented?

3.) solve #2 and #3 of your list.

4.) make USTRING a built-in type (#1 of your list). Personally i can live with fact of having to include an extra file for dynamic wide strings, so it is last in my personal priority and my schedule. What do you think?


All of this doesn´t establish anything which breaks existing code or sets standards we must keep later on except for one single new reseved word: "USTRING" as new name for a dynamic wide string type.


I´m a great fan of doing things step by step (especially in coding and especially when applying changes to an otherwise working software), make sure that the current step really works before taking the next step. Everything else will lead to a big mess sooner or later.


JK

  •  

Juergen Kuehlwein

#145
Just made a push to my fork at: https://github.com/jklwn/fbc

This is still work in progress, so the code doesn´t look very tidy in various places. As far as i can tell, there seem to be no errors anymore. I did tests on 32 bit and 64 bit with a test app (test.bas - in attached ustring.zip) i wrote.

- ustrings now work with all fb intrinsic string functions without the need for "**"
- "strptr" returns a zstring ptr for zstring and string, and it now returns a wstring ptr for wstring and ustring
  (this requires a small change to José s code - see below)
- i added string (helper) functions (included in ustring.inc) similar to those available in PowerBASIC, this is based on José´s work in WINFBX, but i made it work with the same syntax as in PB. E.g. u2 = Extract_(u, any u1), u1 = Pathname_(path, u)


Changing the behavior of "strptr" shouldn´t break existing code: previously strptr returned a zstring ptr for a wstring, which was in effect useless unless casted to a different ptr. Now strptr returns a wstring ptr, and casting a wstring ptr to another ptr (or itself) doesn´t raise problems - so existing code should run anyway.

To get this to work with José´s code a small change is necessary: for "strptr" the compiler relies on the fact that for the "string" type (which is an UDT internally too) the data buffer is the first member variable. So for ustring i changed it like this:

TYPE CWSTR
    m_pBuffer AS UBYTE PTR        ' Pointer to the buffer, moved in first place

   Private:
      m_Capacity AS UINT            ' The total size of the buffer
      m_GrowSize AS LONG = 260 * 2  ' How much to grow the buffer by when required

   Public:
'      m_pBuffer AS UBYTE PTR        ' Pointer to the buffer (removed here)
      m_BufferLen AS UINT           ' Length in bytes of the current string in the buffer




José, do you see any problems here ? As m_pBuffer is public anyway, moving it in first position shouldn´t change or break anything else. "Strptr" doesn´t work for your CBstr yet, but i think i could fix this too. This is still somehow experimental, but would you change it for a final (release) version of all of this ?


I think there is a problem with the overloaded functions "CLNGINT" and "CULNGINT", please run this code with your code and then again use mine (ustring.inc):


  dim w as wstring * 50 = wstr(70000000001)
  dim u as ustring = wstr(70000000001)

  print "Test: CLNGINT"
  print "  u: -" & CLNGINT(u) & "  -  " & u
  print "  w: -" & CLNGINT(w) & "  -  " & w

  if CLNGINT(w) <> CLNGINT(u) then
    print
    Print "--- ERROR ---"
  else
    print
    print "---  OK  ---"
  end if


  print
  print "Test: CULNGINT"
  print "  u: -" & CULNGINT(u) & "  -  " & u
  print "  w: -" & CULNGINT(w) & "  -  " & w

  if CULNGINT(w) <> CULNGINT(u) then
    print
    Print "--- ERROR ---"
  else
    print
    print "---  OK  ---"
  end if


  sleep


end




I might be offline for a few days - Merry Christmas to all !


JK



PS:
@Jeff,

Quotefixing UDT implicit casting for built in [w]string rtlib functions

I think i have a quite simple solution for this, just have a look at "Parser-Compound-Select.bas", line 119. We could make can-cast-to-(w)string functions out of it and insert tests, in all places i currently check for the UDT´s name. This would be a generic approach then.
  •  

José Roca

Quote
José, do you see any problems here ? As m_pBuffer is public anyway, moving it in first position shouldn´t change or break anything else. "Strptr" doesn´t work for your CBstr yet, but i think i could fix this too. This is still somehow experimental, but would you change it for a final (release) version of all of this ?

I don't see any need to add support for STRPTR. Why to use the verbose STRPTR(cws) if you can use *cws, or simply cws if you're passing it to a procedure? Besides, I don't think that the FB team is going to accept a change that uses a hack.

  •  

Juergen Kuehlwein

Hi José,


I primarily wanted to know, if you see technical problems (codewise) with that change. You are the expert for your work and you proved me wrong more than one time in the past - so i don´t see a problem here, but nevertheless i´m just asking, if you see a problem.


I didn´t ask so much, if you actually would want to apply this change, but let´s discuss this topic too.


This is the way i see it:

I´m not working against you! I want to make it easier for other people to benefit from your work. José, your WINFBX is outstanding, and you did your very best to integrate it with the compiler. My goal is to adapt the compiler for a seamless integration without breaking your work.

And i know, you don´t need it for yourself, but if you want to have other people using your work, it should integrate seamlessly, which it currently doesn´t. This is not a shortcoming of your work, but a problem with the compiler. You found workarounds doing the job perfectly well, but nevertheless people will find workarounds difficult and discouraging. So in order to encourage people a seamless integration would be big step forward.

The more, you told me that you are not interested in Linux, but others are. They could benefit from your work as well, at least for the dynamic wide string part.

So the idea (as long as we don´t have an intrinsic dynamic wide string type) is to have an include file (ustring.inc) for this. This include file enables dynamic wide strings in Windows and Linux (and maybe others). Obviously your Framework works only for Windows. In Windows adding "ustring.inc" adds it´s dynamic wide string type, unless your CWstr is included. In other words: "ustring.inc" makes your type (CWstr) the standard dynamic wide string, if present. If CWstr is not present for whatever reason, it offers a fallback for a dynamic wide string. For Linux this fallback would be standard then.


Of course there is no need to use "strptr" (you supplied an even shorter workaround) and nobody forces you or everyone else to do so. But the task here is to make it consistent, so not being able to use "strptr" just like with the other (instrinsic) string types, would break consistency. Maybe there are other´s wanting to use strptr, because it´s the regular way for getting a pointer to the string´s data. FB´s intrinsic "STRING" dynamic string type can do that!

And in fact, what i do is not a hack, it is exactly how the compiler does it for it´s intrinsic dynamic string ("STRING") type, which internally is an UDT too. This UDT holds 3 members: the first one is a pointer to the data (just like m_pBuffer in CWstr), the second one holds the current length of the data, and the third one holds the the maximum size of the current buffer.

Your layout is similar but different in order. By making m_pBuffer the first member variable of CWstr you get the same layout in that the first member variable points to the data. So in fact a pointer to the type is a pointer to the pointer of the data. You only need to dereference the pointer to the type in order to get a pointer to the data. This means the same code in the compiler can be used to get a valid pointer to the data of CWstr (aka "strptr"). This simple change (and of course some small changes in the codeflow, to account for an UDT being passed to it) enable the compiler to work with it, just like with an intrinsic string type.

The very best of it is: It doesn´t cost nobody anything! It doesn´t break existing code, but it makes "Strptr" possible for CWstr without a hack!

Besides i think you didn´t write code, which depends on the order of members in CWstr, you always use operators or functions to access tha data, that´s what they are for. You always use it´s name, you never use the offset for retrieving the data pointer. So far for the theory, in practice, as far as i can tell, moving m_pBuffer into first position has no side effect on my applications depending on your WINFBX (amongst others a 32/64bit debugger for FB).


Why would you refuse to apply a change that doesn´t cost you anything, but will be a gain for others ? Please think about it. Of course such a change would become necessary only, if the FB team accepted my compiler changes - but this is my problem.


Did you run the test code i posted (CLNGINT ...)? When run with CWstr there is a difference in result between WSTRING and CWstr.


In the meantime i made "ustring.inc" independent of my IDE and independent of additional include files. The current state of the compiler is at: https://github.com/jklwn/fbc, and attached is what i currently have for "ustring.inc" + test code


JK
  •  

José Roca

I'm not refusing to apply that change in my header. What I'm saying is that I don't think that the PB team will accept your hack, but I may be wrong.

> Did you run the test code i posted (CLNGINT ...)? When run with CWstr there is a difference in result between WSTRING and CWstr.

An also if you use ustring. They will work correctly if you use **.

> The more, you told me that you are not interested in Linux, but others are. They could benefit from your work as well, at least for the dynamic wide string part.

You may need to change the code because on Linux wstrings are encoded in UCS-4 and a character takes up 4 bytes.

> but nevertheless people will find workarounds difficult and discouraging

In fact, there are only a couple of Chinese guys that use CWSTR, and these doesn't seem to be easily discouraged. Most of the other users use STRING and rely on automatic conversions.

My goal was to provide a framework that works both with ansi and unicode strings without having to provide "A" and "W" versions.

To guys wanting to work with unicode with FreeBasic, dynamic strings aren't its only problem. None of the FB intrinsic functions that deal with files accept and unicode string for the file name. My framework includes several classes to deal with files.

  •  

Juergen Kuehlwein

QuoteAn also if you use ustring. They will work correctly if you use **.

When using the new compiler version, i get  a correct result for ustring and i get a correct result for **CWstr, but the result is wrong for CWstr without "**"! Why is that?

Prepending "**" makes the compiler see a wstring while it is in fact an UDT. For a wstring it uses an internal wstring conversion function, for an UDT is must use an overloaded function. These overloaded functions differ.

ustring.inc:

PRIVATE FUNCTION Valint OVERLOAD (BYREF cws AS jk.DWSTR) AS long
   RETURN .VALINT(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION

PRIVATE FUNCTION ValLNG OVERLOAD (BYREF cws AS jk.DWSTR) AS longint
   RETURN .VALLNG(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION

PRIVATE FUNCTION ValUint OVERLOAD (BYREF cws as jk.DWSTR) AS ulong
   RETURN .VALUINT(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION

PRIVATE FUNCTION ValULNG OVERLOAD (BYREF cws AS jk.DWSTR) AS ulongint
   RETURN .VALULNG(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION



CWstr.inc


' =====================================================================================
' Converts the string to a 32bit integer
' =====================================================================================
PRIVATE FUNCTION ValLng OVERLOAD (BYREF cws AS CWSTR) AS LONG
   RETURN .ValLng(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================
' =====================================================================================
PRIVATE FUNCTION ValInt OVERLOAD (BYREF cws AS CWSTR) AS LONG
   RETURN .ValInt(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================

' =====================================================================================
' Converts the string to an unsigned 32bit integer
' =====================================================================================
PRIVATE FUNCTION ValULng OVERLOAD (BYREF cws AS CWSTR) AS ULONG
   RETURN .ValULng(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================
PRIVATE FUNCTION ValUInt OVERLOAD (BYREF cws AS CWSTR) AS ULONG
   RETURN .ValUInt(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================

' =====================================================================================
' Converts the string to a 64bit integer
' =====================================================================================
PRIVATE FUNCTION ValLongInt OVERLOAD (BYREF cws AS CWSTR) AS LONGINT
   RETURN .ValLng(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================

' =====================================================================================
' Converts the string to an unsigned 64bit integer
' =====================================================================================
PRIVATE FUNCTION ValULongInt OVERLOAD (BYREF cws AS CWSTR) AS ULONGINT
   RETURN .ValULng(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================



in fact the compiler implements these overloaded functions for the corresponding "c.." conversion functions for UDTs. As long as you prepend "**" the compiler sees a wstring and everything will be fine, because it is masked - the overloaded functions will never be used.

In the new compiler it is passed as what it is, an UDT, then the overloaded functions come into play and the difference becomes obvious.

According to the help file "VAL(U)LNG" should return a "(U)LONGINT". Please re-check this.


...and yes, there still was an error in "test.bas" not revealing this for "VAL(U)LNG as well. The number was not big enough to show the difference it should have been "wstr(12345678900)" to catch it.


JK
  •