Levenshtein Distance and "find best Match" in Purebasic

Started by Theo Gottwald, September 16, 2023, 05:02:03 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Theo Gottwald

🌟 FindBestMatch Subprogram: Your Voice-Activated Command Center 🌟

📝 Description
The FindBestMatch subprogram is a powerful tool designed to find the best match for a given string within a list of strings. It uses the Levenshtein distance algorithm for string comparison, ensuring high accuracy. This is particularly useful when you're working with voice commands recognized by Whisper. The subprogram returns the number corresponding to the best match, making it easier to trigger specific actions or commands in your project. 🎯

📋 Parameters
U01: This is a comma or semicolon-separated list of strings. Each string can optionally start with a numeric value between 0 and 9999, followed by either a . or a - and any number of spaces. The numeric value can be missing. 📊

U02: This is the string you want to compare against all the cleaned-up elements in U01. 📝

📤 Returns
The subprogram returns the number corresponding to the best match in the list. If an error occurs, it returns -1. 🚀

📚 Important Variables
Ele(): This array holds the split elements from U01. 📦

Numbers(): This array holds the numbers associated with each element. 🔢

Element: This is an individual element from the list. 📜

CleanElement: This is the element after removing the numeric value, ".", "-", and spaces. 🧹

Number: This is the numeric value in front of each element, if present. 🎰

MinDistance: This is the minimum Levenshtein distance found so far. 📏

CurrentDistance: This is the Levenshtein distance for the current element. 📐

BestMatchNumber: This is the number corresponding to the best match. 🏆

ErrorCode: This is the error code for handling invalid inputs. ⚠️

🛠 Code Structure
The subprogram starts by initializing key variables like MinDistance, BestMatchNumber, and ErrorCode. It then proceeds to split the elements from U01 and store them in the Ele() array. The Levenshtein distance algorithm is applied to find the best match, and the corresponding number is returned.

🤔 Did You Notice?
If you're thinking, "Hey, this coding style looks like it's from ChatGPT," you're absolutely right! 🕵��♀️

🎉 Conclusion
The FindBestMatch subprogram is an invaluable tool for making your projects smarter and more interactive. Whether you're using voice commands or text-based inputs, this subprogram ensures that the most relevant action is triggered. 🌈

Happy Coding! 🎉👩�💻👨�💻

;--------------------------------------------------------------------
;           
;--------------------------------------------------------------------

ProcedureDLL.l LevenshteinDistance(string1.s, string2.s)
  Protected length1.l
  Protected length2.l 
  Protected i.l, j.l
 
  length1 = Len(string1)
  length2 = Len(string2)
 
  Dim matrix.l(length1, length2)
 
  For i = 0 To length1
    matrix(i, 0) = i
  Next i
 
  For j = 0 To length2
    matrix(0, j) = j
  Next j
 
  For j = 1 To length2
    For i = 1 To length1
      If Mid(string1, i, 1) = Mid(string2, j, 1)
        matrix(i, j) = matrix(i-1, j-1)
      Else
        matrix(i, j) = Min3(matrix(i-1, j) + 1, matrix(i, j-1) + 1, matrix(i-1, j-1) + 1)
      EndIf
    Next i
  Next j
  i=matrix(length1, length2)
  ProcedureReturn i
EndProcedure

;--------------------------------------------------------------------
; ----------------------------------------------------------------------------
; Procedure: FindBestMatch
; Description: Finds the best match for a given string in a list of strings.
;              Uses Levenshtein distance for string comparison.
;
; Parameters:
;  U01 - Comma or semicolon-separated list of strings. Each string can optionally
;        start with a numeric value between 0 and 9999, followed by either a "."
;        or a "-" and any number of spaces. The numeric value can be missing.
;  U02 - String to compare against all the cleaned-up elements in U01.
;
; Returns:
;  Number corresponding to the best match in the list. Returns -1 if an error occurs.
;
; Important Variables:
;  Ele() - Array to hold the split elements from U01.
;  Numbers() - Array to hold the numbers associated with each element.
;  Element - Individual element from the list.
;  CleanElement - Element after removing numeric value, ".", "-", and spaces.
;  Number - Numeric value in front of each element, if present.
;  MinDistance - Minimum Levenshtein distance found so far.
;  CurrentDistance - Levenshtein distance for the current element.
;  BestMatchNumber - Number corresponding to the best match.
;  ErrorCode - Error code for handling invalid inputs.
; ----------------------------------------------------------------------------

ProcedureDLL.l FindBestMatch(U01.s, U02.s)
  Protected Element.s, CleanElement.s, Number.s, Temp.s
  Protected MinDistance.i, CurrentDistance.i, BestMatchNumber.i
  Protected i.i, j.i, ErrorCode.l, EleCount.i, Pos.i
 
  ; Initialize
  MinDistance = 9999
  BestMatchNumber = 0
  ErrorCode = 0
  EleCount = 0
 
  ; Validate input
  If U01 = "" Or U02 = ""
    Set_Error("Invalid input: Both U01 and U02 must be non-empty.")
    ErrorCode = 1
    Goto ExitPoint
  EndIf
 
  ; Count elements to initialize array size
  EleCount = CountString(U01, ",") + CountString(U01, ";") + 1
 
  ; Initialize the arrays
  Dim Ele.s(EleCount)
  Dim Numbers.i(EleCount)
 
  ; Manually split the list based on comma or semicolon
  i = 0
  While Len(U01) > 0
    Pos = FindString(U01, ",", 1)
    If Pos = 0
      Pos = FindString(U01, ";", 1)
    EndIf
   
    If Pos > 0
      Temp = Left(U01, Pos - 1)
      U01 = Mid(U01, Pos + 1)
    Else
      Temp = U01
      U01 = ""
    EndIf
   
    Ele(i) = Trim(Temp)
    i+1
  Wend
 
  ; Loop through each element in the list
  For i = 0 To ArraySize(Ele()) - 1
    Element = Trim(Ele(i))
   
    ; Skip empty elements
    If Element = ""
      Continue
    EndIf
   
; Extract the numeric value if present
For j = 1 To Len(Element)
  If Asc(Mid(Element, j, 1)) >= 48 And Asc(Mid(Element, j, 1)) <= 57
    Number + Mid(Element, j, 1)
  Else
    Break
  EndIf
Next j
   
    ; Clean the element
    CleanElement = LTrim(Element, Number + ".- ")
   
    ; Assign a number if not present
    If Number = ""
      Number = Str(i + 1)
    EndIf
   
    ; Store the number
    Numbers(i) = Val(Number)
   
    ; Compare using LevenshteinDistance
    CurrentDistance = LevenshteinDistance(LCase(CleanElement), LCase(U02))
   
    ; Update the best match
    If CurrentDistance < MinDistance
      MinDistance = CurrentDistance
      BestMatchNumber = Numbers(i)
    EndIf
   
    ; Reset for the next iteration
    Number = ""
  Next i
 
  ; Single exit point
ExitPoint:
  If ErrorCode = 0
    ProcedureReturn BestMatchNumber
  Else
    ProcedureReturn -1
  EndIf
EndProcedure