NEW TEXT-Commands for the SPR.

Started by Theo Gottwald, October 18, 2020, 09:00:25 AM

Theo Gottwald

Sometimes it's all about processing LOG-Files, or other Text-Files. Maybe large files with a Million lines. Maybe you are searching for something, or comparing many files.

How it was before:
Originally you would simply have used the LFF. (Line from File) Command
LFF.(Linenumber)|(Variable for Result)
and just get  "Line by Line" from any Textfile.

However, internally the LFF. command reads the Textfile from the start until it reaches the Line you have specified, and then returns that Line.
This means that if you want Line 500, LFF. reads all Lines up to 500 and then gives you Line 500.
Going through a file like this works fine until the file grows beyond a few hundred Lines.
Then it just gets too slow, because by the time you reach Line 1000 the Robot has internally read about half a Million Lines (1 + 2 + ... + 1000 = 500,500) from the harddisk (Buffers).
That's because the LFF. command was designed for getting single Lines, not for going through a file.
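
To put a number on this, here is a small Python sketch (not SPR code). It only counts line reads under the assumption described above, namely that every request for Line k re-reads Lines 1 to k. The function names are made up for this illustration.

```python
def lines_read_by_rescanning(n_lines: int) -> int:
    """Total lines read when every request for line k re-reads lines 1..k."""
    total = 0
    for k in range(1, n_lines + 1):
        total += k  # fetching line k costs k line reads
    return total


def lines_read_single_pass(n_lines: int) -> int:
    """Total lines read when the file is walked once from top to bottom."""
    return n_lines


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, lines_read_by_rescanning(n), lines_read_single_pass(n))
```

The first count grows with the square of the file length, the second only with the length itself. That is the whole difference between "getting single Lines" and "going through a file".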

Also there is the FEL.-Command. FOR EACH LINE. This was designed for "Going through a file".
It will hand you out "Line by Line" from any file.

However, if you want to jump between Lines in the Textfile, you cannot use it and you are back to LFF.

So here we had a gap, and this is now filled by the new TXT./LOF./LFT. commands.

First there are LOF. (Load Text File)  and LFT. (Line from Text-File).
These two commands serve for that purpose. They work "in memory".
The textfile is "loaded into Memory" and LFT. will give you any line from there.
So no harddisk is involved. Maximum speed. There is no measurable difference whether you get Line 10 or Line 100000.

Generally here you will Load the Textfile with LOF. directly into Memory.
And then with LFT. you will get each Line "by number" directly "from memory" so this is maximum speed.
Processing Text Files in this way, the Script can handle several thousand Lines per second.
I have tested that with a Python indentation checker that was written with the SPR.
It processed 6000 lines of code in 4 seconds, correcting any indentation errors.

The LOF. buffer can store up to 128 different files, which you reference with a number (from 1 to 128),
so you can compare the content of these files with line-based algorithms.

The LFT. command is used together with the LOF. command to process Text files faster.
It has two Working-Modes:

1. Get Line from File
LFT. with a number from 1 to (Number of Lines), will return the Line with that number in $$RET:
LFT.$$IND|$$NUM|$$RET

2. Get number of Lines in File
LFT. with -1 as the number returns the number of Lines stored for the file with that Index-Number.
LFT.$$IND|-1|$$RET

If you call LFT. with a Number larger than the number of Lines in that File, the returned Variable will be empty.

Currently you can load up to 128 Textfiles into the Buffer at the same time, each having its own Index-Number from 1 to 128. This is very good if you need to work on multiple Text-Files at the same time, but you do not want to access the harddrive so often. Using LOF. the file is buffered and all further Operations using LFT. are done in memory.
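
For readers who think better in code, the behaviour described above can be modelled in a few lines of Python. This is only a toy model, not how the SPR is implemented. The class and method names are invented, the text is passed in as a string instead of being read from disk, and two details are assumptions the post does not state: the lowest free Index-Number is handed out, and a full buffer raises an error.

```python
class LineBuffers:
    """Toy model of the LOF./LFT. buffers: up to 128 texts, indexed 1..128."""

    MAX_FILES = 128

    def __init__(self):
        self._slots = {}  # index number -> list of lines

    def load_text(self, text: str) -> int:
        """Like LOF.: store the lines and hand back an index number.

        Assumption: the lowest free index is used; the real allocation
        rule is not described in the post.
        """
        for index in range(1, self.MAX_FILES + 1):
            if index not in self._slots:
                self._slots[index] = text.splitlines()
                return index
        # Assumption: the post does not say what happens when all are used.
        raise RuntimeError("all 128 buffers are in use")

    def line_from_text(self, index: int, number: int) -> str:
        """Like LFT.: -1 gives the line count, 1..count gives that line."""
        lines = self._slots[index]
        if number == -1:
            return str(len(lines))
        if 1 <= number <= len(lines):
            return lines[number - 1]
        return ""  # past the end: empty result, as described above


if __name__ == "__main__":
    buffers = LineBuffers()
    ind = buffers.load_text("first\nsecond\nthird")
    count = int(buffers.line_from_text(ind, -1))
    for number in range(1, count + 1):
        print(buffers.line_from_text(ind, number))
```

Because the lines sit in a list, fetching Line 10 or Line 100000 is the same single lookup.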

Usage is simple:

VAR.$$FIL=?path\Test.txt
' Read the File into Memory
LOF.$$IND|$$FIL
'The Index Number for this file is now in $$IND
LFT.$$IND|-1|$$RET
'Using LFT. with " -1" as Linenumber, we get the Number of
'Lines in return (here in $$RET).
' Now we enumerate through the File without accessing
' the harddrive, at maximum Speed
FOR.$$LOP|1|$$RET
  LFT.$$IND|$$LOP|$$LIN
  PRT.$$LIN
NEX.
MBX.Ready
ENR.


But that's just the beginning of this topic.
Then we have the new TXT.-Commands. These Commands are specially designed for fast analyzing, changing and otherwise processing a Textfile,
in case there is more to do than just loading a file "Line by line".

These commands are the easy way to work through Textfiles. What you can do:


  • Load and Save Textfiles
  • Process Textfiles from an internal Cache without need to access the Filesystem
  • Store Textfile in Variable (with or without Filename)
  • Restore Textfile from Variable (with or without Filename)
  • Get the Linenumber of a Byte-Position in a textfile
  • Get the complete Line of a specified Linenumber
  • Get all Lines from a specific line to the start or end of text
  • Get the start or end of a Line from a specified Byte-Position
  • Change specified Lines (replace with a specified line)
  • Get Textfile from the LOF./LFT.-Command or
  • Move a textfile to the LOF./LFT.-Command
  • several Replace Operations
  • and more.

Here is a Quick Overview on the Commands.
Unlike the LOF.-Command, the TXT.-Commands work with just ONE Textfile at a time.
Therefore you do not need to specify an "Index" Value like with LFT.
If you want to switch to another file this can easily be done in several ways.
You can just transfer the file into a variable, or get another file from a variable.
Or you can exchange the files with LOF. and this way access all files that are stored in the LOF.-Buffers.

  "ltf","load_textfile"
' Load a Textfile into the internal Buffer for Processing
  ' $$FIL - Filename
  ' $$RES (optional) Returns Number of loaded lines
  TXT.Load_Textfile|$$FIL[|$$RES]

  "clr","clear"
' Clear the internal Buffer
  TXT.Clear
 
  "gfn","get_filename"
' Get Filename of File in Buffer
  TXT.gfn|$$FIL

  "gtl","get_lenght"
' Get length of loaded Text in Bytes
  TXT.gtl|$$LAC

  "tov","to_var" 
' Move Textfile including  Filename to Variable
  TXT.To_Var|$$VAR

  "toc","to_var_and_clear"
' Move Textfile including  Filename to Variable and clear internal Buffer
  TXT.To_Var_And_Clear|$$VAR

  "frv","from_var"
' Get Textfile including Filename from Variable
  TXT.From_Var|$$VAR

  "lpt","load_pure-Text"
' Load Pure-Textfile and Filename from 2 Variables
  TXT.lpt|$$TXT[|$$FIL]
 
  "gpt","get_pure_text"
' "Get Pure Text". Only Textfile into $$TXT. If available, $$FIL returns the Filename
  TXT.gpt|$$TXT[|$$FIL]

  "glc","get_line_count"
' "Get Line Count". Get number of loaded lines.
  TXT.glc|$$CNT

"gel","get_line"
' Get Line Number $$NUM into $$LIN
  TXT.gel|$$NUM|$$LIN

  "gll","get_line_lefttrim"
' Return Line number $$LIN in $$RES, like "gel" but left side trimmed (all ASC 0-32 removed)
  TXT.gll|$$LIN|$$RES

  "glt","get_line_trimmed"
' Return Line number $$LIN in $$RES, like "gel" but both sides trimmed (all ASC 0-32 removed)
  TXT.glt|$$LIN|$$RES
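
What "all ASC 0-32 removed" means at the ends of a line can be shown in Python. This is a sketch of the idea only; the helper names are invented and it says nothing about how the SPR does it internally.

```python
# Characters with codes 0..32: the control characters plus the space.
LOW_CHARS = "".join(chr(code) for code in range(33))


def left_trim(line: str) -> str:
    """Like "gll": drop characters 0..32 from the left side only."""
    return line.lstrip(LOW_CHARS)


def full_trim(line: str) -> str:
    """Like "glt": drop characters 0..32 from both sides."""
    return line.strip(LOW_CHARS)


if __name__ == "__main__":
    sample = "\t   if x > 0:  \r"
    print(repr(left_trim(sample)))
    print(repr(full_trim(sample)))
```

Characters inside the line are never touched, so the space in "x > 0" stays where it is.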

"slt","set_line_to"
' Overwrite Line $$NUM with new Line $$NEW (Replace Line)
  TXT.slt|$$NUM|$$NEW

  "glp","get_line_position"
' Get position of the first character of line $$NUM into $$POS and of the last character into $$POE
  TXT.glp|$$NUM|$$POS|$$POE

  "lnp","linenumber_from_position"
' Get linenumber from Byte-Position $$POS into $$LIN
  TXT.lnp|$$POS|$$LIN

  "lsp","line_Start_end_by_position"
' Get start/end of the Line that contains Byte-Position $$POS in Text
  TXT.lsp|$$POS|$$STR|$$END
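
The three position commands ("glp", "lnp", "lsp") all rest on the same mapping between Byte-Positions and Linenumbers. A hedged Python sketch of that mapping follows. The assumptions are mine, not from the post: positions and Linenumbers start at 1, one character is one byte (ANSI), lines are separated by a single "\n", and the line break itself does not count as part of the line. All function names are invented.

```python
from bisect import bisect_right


def line_starts(text: str) -> list:
    """1-based start position of every line (lines separated by one "\\n")."""
    starts = [1]
    for offset, char in enumerate(text, start=1):
        if char == "\n":
            starts.append(offset + 1)
    return starts


def line_position(text: str, number: int) -> tuple:
    """Like "glp": first and last position of line `number`.

    For an empty line the returned end is one less than the start.
    """
    starts = line_starts(text)
    start = starts[number - 1]
    if number < len(starts):
        end = starts[number] - 2  # the position just before the line break
    else:
        end = len(text)
    return start, end


def line_number_from_position(text: str, pos: int) -> int:
    """Like "lnp": number of the line that contains position `pos`."""
    return bisect_right(line_starts(text), pos)


def line_bounds_from_position(text: str, pos: int) -> tuple:
    """Like "lsp": start and end of the line that contains position `pos`."""
    return line_position(text, line_number_from_position(text, pos))


if __name__ == "__main__":
    text = "ab\ncd\nefg"
    print(line_position(text, 2))
    print(line_number_from_position(text, 8))
    print(line_bounds_from_position(text, 5))
```

The list of line starts is sorted, so a binary search finds the line for any position without walking the text again.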

  "flw","find_line_with"
' Find Line with String. Search for the first occurrence of $$SEA in the Textfile and return the number of the Line that contains it in $$LIN.
' If specified, the search starts at (and includes) Line $$STA
  TXT.flw|$$SEA|$$LIN[|$$STA]

  "fit","Find_Text"
' $$SEA - Searchstring
' $$FRO - Search from (B)yte or (L)ine
' Returns:
' |$$POT - Position in Text
' |$$POL - Position in Line
' |$$LIN - Complete Line
' |$$LNN - Linenumber
  TXT.fit|$$SEA|$$FRO[|$$POT][|$$LIN][|$$LNN]


  "gbl","get_before_line"
' "Get Before Line". Get all Text that comes before the specified line (the line itself is not included) into $$BLT.
  TXT.gbl|$$LIN|$$BLT

  "gal","get_after_line"
' "Get After Line". Get all Text that comes after the specified line (the line itself is not included) into $$ALT.
  TXT.gal|$$LIN|$$ALT

  "gts","get_to_start"
' Get all Text from and including the specified line until the start of the text
  TXT.gts|$$LIN|$$TXT
 
  "gte","get_to_end"
' Get all Text from and including the specified line until the end of the text
  TXT.gte|$$LIN|$$TXT
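
The four "get" commands above differ only in which side they take and whether the specified line is included. In Python terms they are four list slices. This is a sketch with invented function names; it assumes lines are separated by "\n" and that the result is joined with the same separator, which the post does not specify.

```python
def get_before_line(text: str, number: int) -> str:
    """Like "gbl": everything before line `number`, without that line."""
    return "\n".join(text.split("\n")[:number - 1])


def get_after_line(text: str, number: int) -> str:
    """Like "gal": everything after line `number`, without that line."""
    return "\n".join(text.split("\n")[number:])


def get_to_start(text: str, number: int) -> str:
    """Like "gts": from the start of the text up to and including the line."""
    return "\n".join(text.split("\n")[:number])


def get_to_end(text: str, number: int) -> str:
    """Like "gte": from the line (included) to the end of the text."""
    return "\n".join(text.split("\n")[number - 1:])


if __name__ == "__main__":
    text = "a\nb\nc\nd"
    print(repr(get_before_line(text, 2)))
    print(repr(get_after_line(text, 2)))
    print(repr(get_to_start(text, 2)))
    print(repr(get_to_end(text, 2)))
```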

  "sav","save_file"
' Save Text under the Filename from which it was loaded or, if a Filename is specified, under that Filename
' Leaves a result on the TOS, NTFSD-Enabled
  TXT.Save[|$$FIL]

  "gfl","get_from_lof"
' Get from LOF # will load Textfile from LOF. Index Number $$IND
  TXT.get_from_lof|$$IND

  "tol","to_lof"
' To lof cache # will put Textfile into LOF. Cache Index Number $$IND
  TXT.To_Lof|$$IND

  "rit","replace_in_text"
' "replace_in_text". Do a replace over the complete text using an Equalcase-Replace Algo.
  TXT.rit|$$OLD|$$NEW

  "ril","replace_in_line"
' "replace_in_line". Do a replace only in the specified Line using an Equalcase-Replace Algo.
  TXT.ril|$$LIN|$$OLD|$$NEW[|$$STA]

"rep","Replace"
' Search for a Text in the TXT.-internal Buffer and replace it with another given sequence. Replaces all occurrences.

  "tra","Translate_Chars"
' With TXT.tra there is a 1:1 relation between the characters in $$OLD and the characters in $$NEW.
' If a character from $$OLD is found, it is replaced with the corresponding character in $$NEW.
' Therefore both - $$OLD and $$NEW - MUST have the same length, else there will be no changes and you will get the Timeout Flag set.
  TXT.tra|$$OLD|$$NEW[|$$STA][|$$END]
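
The 1:1 character mapping is the same idea as a translation table in Python. The sketch below is only an illustration: the function name is invented, the optional $$STA/$$END range is left out, and the returned boolean merely stands in for the Timeout Flag.

```python
def translate_chars(text: str, old: str, new: str) -> tuple:
    """Return (result, ok); ok is False when old and new differ in length."""
    if len(old) != len(new):
        return text, False  # no changes; stands in for the Timeout Flag
    table = str.maketrans(old, new)  # old[i] is mapped to new[i]
    return text.translate(table), True


if __name__ == "__main__":
    print(translate_chars("hello", "el", "ip"))
    print(translate_chars("hello", "abc", "x"))
```

Every "e" becomes "i" and every "l" becomes "p" in one pass, so the first call gives "hippo". The second call has strings of different length and leaves the text untouched.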


  "luf","load_unicode_file"
' Load and convert Unicode-File to ANSI
  TXT.luf|$$FIL

  "sau","save_as_unicode"
' Convert ANSI-File to Unicode and save (using default codepage)
  TXT.sau[|$$FIL]

  "ctu","convert_to_unicode"
' Convert ANSI-String to Unicode with default Codepage
  TXT.ctu|$$ANS|$$UNI

  "cta","convert_to_ansi"
' Convert Unicode with default Codepage to ANSI-String
  TXT.cta|$$UNI|$$ANS
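
What such an ANSI/Unicode conversion does to the bytes can be sketched in Python. Two assumptions here are mine and not from the post: "Unicode" is taken to mean Windows-style UTF-16 (little endian), and the default codepage is taken to be Windows-1252. The function names simply mirror the long command names.

```python
ANSI_CODEPAGE = "cp1252"        # assumption: Western European default codepage
UNICODE_ENCODING = "utf-16-le"  # assumption: Windows-style 16-bit Unicode


def convert_to_unicode(ansi_bytes: bytes) -> bytes:
    """ANSI bytes -> Unicode bytes (two bytes per character here)."""
    return ansi_bytes.decode(ANSI_CODEPAGE).encode(UNICODE_ENCODING)


def convert_to_ansi(unicode_bytes: bytes) -> bytes:
    """Unicode bytes -> ANSI bytes.

    Characters without an ANSI equivalent become "?" in this sketch.
    """
    text = unicode_bytes.decode(UNICODE_ENCODING)
    return text.encode(ANSI_CODEPAGE, errors="replace")


if __name__ == "__main__":
    ansi = b"Gr\xfc\xdfe"  # "Grüße" in Windows-1252
    uni = convert_to_unicode(ansi)
    print(len(ansi), len(uni))
    print(convert_to_ansi(uni) == ansi)
```

Going from ANSI to Unicode never loses anything; the other direction can, which is why the codepage matters.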