"IsLike" Patternmatcher in 2 Versions

Started by Theo Gottwald, April 15, 2024, 01:39:53 PM

Theo Gottwald

Original "IsLike" Function.

The PowerBasic function Is_Like checks whether a string (a) matches a pattern (b). The pattern may contain the wildcards * (any run of characters), ? (any single character) and # (a single digit), plus character sets and ranges written between [ and ], which a leading ! negates. The lCase parameter selects the comparison mode: a nonzero value compares case-sensitively, while zero uppercases both strings first so that case is ignored.

While the function appears to be well-written and covers most of the edge cases, there are a few minor issues that need to be addressed:

The # wildcard matches only the numeric digits 0-9 (character codes 48-57). That is the usual meaning of # in BASIC-style Like operators, where ? is the wildcard for any single character. If you want # to accept a wider class, change the range check from (48-57) to (32-126) or any other desired range.
The [ handler does not cope with a closing bracket ] that is the first character after the opening bracket. For example, the pattern []abc] should match any one of ], a, b or c, but the function sees an empty set and returns FALSE. The same early exit also affects negated sets: the scan loop returns FALSE whenever it reaches ] without finding the character, so a pattern such as [!abc] can never match. To fix this, treat a leading ] as a literal set member and let a negated set succeed when the closing bracket is reached.
The function offers no way to escape wildcard characters, so a literal * or ? cannot be matched. Consider adding an escape character such as a backslash \ that makes the following pattern character literal.
Once these issues are addressed, the function should work as expected for most cases. However, it is always a good idea to thoroughly test the function with various input strings and patterns to ensure it behaves correctly.
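
As a quick way to check the behaviour described above, here is a minimal test sketch. It assumes the original Is_Like listing below sits earlier in the same source file and that %TRUE and %FALSE are defined, for example through Win32API.inc. LikeReport is a hypothetical helper that is not part of the original post, and MSGBOX assumes PB/Win (use PRINT under PB/CC). The comments give the result the original listing should produce for each call.

' Hypothetical helper: formats one test result as a line of text.
FUNCTION LikeReport(BYVAL s AS STRING, BYVAL pat AS STRING, BYVAL cs AS LONG) AS STRING
  LOCAL verdict AS STRING
  IF Is_Like(s, pat, cs) THEN
    verdict = "match"
  ELSE
    verdict = "no match"
  END IF
  FUNCTION = $DQ + s + $DQ + " vs " + $DQ + pat + $DQ + ": " + verdict
END FUNCTION

FUNCTION PBMAIN() AS LONG
  LOCAL r AS STRING
  r = LikeReport("Report2024.txt", "report####.*", 0)   ' match: case ignored, # = digit, * = rest
  r = r + $CRLF + LikeReport("cat", "c?t", 1)           ' match: ? = any single char
  r = r + $CRLF + LikeReport("Cat", "c?t", 1)           ' no match: case-sensitive compare
  r = r + $CRLF + LikeReport("b7", "[a-c]#", 1)         ' match: b is in a-c, 7 is a digit
  r = r + $CRLF + LikeReport("x7", "[a-c]#", 1)         ' no match: x is outside a-c
  r = r + $CRLF + LikeReport("b", "[!a]", 1)            ' no match: the original never accepts a negated set
  MSGBOX r
END FUNCTION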



FUNCTION Is_Like(BYVAL a AS STRING, BYVAL b AS STRING, BYVAL lCase AS LONG) AS LONG

  DIM x        AS BYTE PTR
  DIM y        AS BYTE PTR
  DIM matc    AS LONG
  DIM PrevChar AS BYTE
  DIM NextChar AS BYTE

  IF lCase THEN
    a        = a + CHR$(0)
    b        = b + CHR$(0)
  ELSE
    a        = UCASE$(a + CHR$(0))
    b        = UCASE$(b + CHR$(0))
  END IF

  x        = STRPTR(a)
  y        = STRPTR(b)

  FUNCTION = %FALSE

  DO

    IF @x = 0 THEN
      IF @y = 0 THEN
        FUNCTION = %TRUE
      END IF
      EXIT FUNCTION
    END IF

    SELECT CASE @y

      CASE 0  'NUL  pre-mature end
        EXIT FUNCTION

      CASE 35 '#    match a single numeric digit
        IF (@x < 48) OR (@x > 57) THEN
          EXIT FUNCTION
        END IF

      CASE 42 '*
        INCR y                 ' next char in expression
        DO
          IF @x = @y THEN      ' do they match?
            EXIT DO            ' yes exit
          ELSEIF @x = 0 THEN   ' end of source string?
            EXIT DO            ' yes exit
          END IF
          INCR x               ' next char in source string
        LOOP
        IF @x = 0 THEN         ' end of source string?
          IF @y = 0 THEN       ' also end of expression?
            FUNCTION = %TRUE
          END IF
          EXIT FUNCTION
        END IF

      CASE 63 '?    match any single char
        ' nothing, it's a match

      CASE 91 '[
        matc = %TRUE          ' assume we have to match chars
        INCR y                 ' next char in expression

        IF @y = 33 THEN        ' ! indicates do not match
          matc = %FALSE
          INCR y
        END IF

        DO
          IF @y = 93 THEN      ' ]
            EXIT FUNCTION
          ELSEIF @y = 0 THEN   ' NUL
            EXIT FUNCTION
          ELSEIF @y = 45 THEN  ' -
            DECR y             ' move to previous char in expression
            PrevChar = @y      ' save previous char
            y = y + 2          ' move to next char in expression
            NextChar = @y      ' save next char
            DECR y             ' restore current char in expression
            IF (PrevChar = 91) OR (PrevChar = 33) OR (NextChar = 93) THEN
              IF @y = @x THEN
                IF matc = %FALSE THEN  'if matching is false, exit
                  EXIT FUNCTION
                ELSE
                    EXIT DO
                END IF
              END IF
            ELSE
              IF (@x >= PrevChar) AND (@x =< NextChar) THEN
                IF matc = %FALSE THEN
                  EXIT FUNCTION
                ELSE
                  EXIT DO
                END IF
              ELSE
                INCR y
              END IF
            END IF
          ELSEIF @y = @x THEN  ' do they match?
            IF matc = %FALSE THEN  'if matching is false, exit
              EXIT FUNCTION
            ELSE
              EXIT DO
            END IF
          END IF
          INCR y               'next char in expression
        LOOP

        DO                     'find the closing bracket
          IF @y = 93 THEN
            EXIT DO
          ELSEIF @y = 0 THEN
            EXIT FUNCTION
          END IF
          INCR y
        LOOP

      CASE ELSE
        IF @x <> @y THEN
          EXIT DO
        END IF

    END SELECT

    INCR x    ' next char in source string
    INCR y    ' next char in expression

  LOOP

END FUNCTION


Here's an improved version of the Is_Like function that addresses the issues mentioned earlier:

The ? wildcard still matches any single character.
The # wildcard still matches a single numeric digit (0-9).
The [ handler now accepts a closing bracket ] as a literal when it is the first character after the opening bracket.
A backslash \ now escapes the following pattern character, so wildcards such as * and ? can be matched literally.
Here's the updated PowerBasic function:


FUNCTION Is_Like(BYVAL a AS STRING, BYVAL b AS STRING, BYVAL lCase AS LONG) AS LONG

  DIM x        AS BYTE PTR
  DIM y        AS BYTE PTR
  DIM matc    AS LONG
  DIM PrevChar AS BYTE
  DIM NextChar AS BYTE
  DIM result   AS LONG

  IF lCase THEN
    a = a + CHR$(0)
    b = b + CHR$(0)
  ELSE
    a = UCASE$(a + CHR$(0))
    b = UCASE$(b + CHR$(0))
  END IF

  x = STRPTR(a)
  y = STRPTR(b)

  result = %FALSE

  DO

    IF @x = 0 THEN             ' end of source string
      IF @y = 0 THEN           ' also end of expression: full match
        result = %TRUE
      END IF
      GOTO ExitFunction
    END IF

    SELECT CASE @y

      CASE 0  'NUL  pre-mature end
        GOTO ExitFunction

      CASE 92 '\    escape character
        IF @y[1] = 0 THEN      ' peek at the char after the backslash
          GOTO ExitFunction ' invalid escape at the end of the pattern
        END IF
        y = y + 1
        IF @x <> @y THEN
          GOTO ExitFunction
        END IF
        INCR x
        INCR y
        ITERATE LOOP

      CASE 35 '#    match a single digit
        IF (@x < 48) OR (@x > 57) THEN
          GOTO ExitFunction
        END IF

      CASE 42 '*
        INCR y                 ' next char in expression
        DO
          IF @x = @y THEN      ' do they match?
            EXIT DO            ' yes exit
          ELSEIF @x = 0 THEN   ' end of source string?
            EXIT DO            ' yes exit
          END IF
          INCR x               ' next char in source string
        LOOP
        IF @x = 0 THEN         ' end of source string?
          IF @y = 0 THEN       ' also end of expression?
            result = %TRUE
          END IF
          GOTO ExitFunction
        END IF

      CASE 63 '?    match any single char
        ' nothing to do: the end of the source string was already
        ' handled at the top of the loop, so @x is a real character

      CASE 91 '[
        matc = %TRUE           ' assume we have to match chars
        INCR y                 ' next char in expression

        IF @y = 33 THEN        ' ! indicates do not match
          matc = %FALSE
          INCR y
        END IF

        IF @y = 93 THEN        ' ] right after [ or [! is a literal set member
          IF @x = 93 THEN      ' source char is that literal ]
            IF matc = %FALSE THEN
              GOTO ExitFunction
            END IF
            DO                 ' matched: skip ahead to the closing ]
              INCR y
              IF @y = 0 THEN GOTO ExitFunction
            LOOP UNTIL @y = 93
            INCR x
            INCR y
            ITERATE LOOP
          END IF
          INCR y               ' no match on ]: scan the rest of the set
        END IF

        DO
          IF @y = 93 THEN      ' ] reached: source char is not in the set
            IF matc = %FALSE THEN
              EXIT DO          ' negated set: that is a match
            ELSE
              GOTO ExitFunction
            END IF
          ELSEIF @y = 0 THEN   ' NUL
            GOTO ExitFunction
          ELSEIF @y = 45 THEN  ' -
            DECR y             ' move to previous char in expression
            PrevChar = @y      ' save previous char
            y = y + 2          ' move to next char in expression
            NextChar = @y      ' save next char in expression
            DECR y             ' restore current char in expression
            IF (PrevChar = 91) OR (PrevChar = 33) OR (NextChar = 93) THEN
              IF @y = @x THEN  ' - at the edge of the set is a literal
                IF matc = %FALSE THEN  'if matching is false, exit
                  GOTO ExitFunction
                ELSE
                  EXIT DO
                END IF
              END IF
            ELSE
              IF (@x >= PrevChar) AND (@x <= NextChar) THEN
                IF matc = %FALSE THEN
                  GOTO ExitFunction
                ELSE
                  EXIT DO
                END IF
              ELSE
                INCR y         ' skip the upper bound of the range
              END IF
            END IF
          ELSEIF @y = @x THEN  ' do they match?
            IF matc = %FALSE THEN  'if matching is false, exit
              GOTO ExitFunction
            ELSE
              EXIT DO
            END IF
          END IF
          INCR y               'next char in expression
        LOOP

        DO                     'find the closing bracket
          IF @y = 93 THEN
            EXIT DO
          ELSEIF @y = 0 THEN
            GOTO ExitFunction
          END IF
          INCR y
        LOOP

      CASE ELSE
        IF @x <> @y THEN
          GOTO ExitFunction
        END IF

    END SELECT

    INCR x    ' next char in source string
    INCR y    ' next char in expression

  LOOP

ExitFunction:
  FUNCTION = result

END FUNCTION


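This updated function is intended to handle the issues listed above. One limitation remains in both versions: the * wildcard does not backtrack. It stops at the first occurrence of the character that follows it in the pattern, so a*bc does not match abbc, and a wildcard placed directly after * is compared as a literal character. Test the function against your own patterns before relying on it.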
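
The backslash escape can be exercised with a few direct calls. This is a minimal sketch, assuming the updated Is_Like listing above is compiled into the same program and %TRUE and %FALSE are defined. ShowEscapeTests is a hypothetical name, and MSGBOX assumes PB/Win (use PRINT under PB/CC). PowerBasic string literals do not treat the backslash specially, so each \ below reaches Is_Like unchanged.

' Hypothetical test routine, not part of the original post.
SUB ShowEscapeTests()
  LOCAL r AS STRING
  ' expect %TRUE: \* matches a literal asterisk
  r = "a*b / a\*b:" + STR$(Is_Like("a*b", "a\*b", 1)) + $CRLF
  ' expect %FALSE: x is not a literal asterisk
  r = r + "axb / a\*b:" + STR$(Is_Like("axb", "a\*b", 1)) + $CRLF
  ' expect %TRUE: \? matches a literal question mark
  r = r + "what? / what\?:" + STR$(Is_Like("what?", "what\?", 1)) + $CRLF
  ' expect %TRUE: \\ matches a single literal backslash (case ignored)
  r = r + "C:\temp / C:\\temp:" + STR$(Is_Like("C:\temp", "C:\\temp", 0))
  MSGBOX r
END SUB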