An RPN Language

Started by Charles Pegge, September 14, 2007, 09:25:35 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Charles Pegge

Over the past few weeks following our discussion of Reverse Polish Notation, I have been looking at the logistics of using it in a dynamic language. After tracking through the '$' source code, which is geared to fairly conventional syntax, I decided that an RPN project would be better implemented by coding from scratch.

Very briefly, This is a notation that will accomodate both high and low-level syntax, supporting TYPEs and direct DLL calls as well as Object-Oriented and Markup expressions.

Disregarding many of the coding conventions used by contemporary  languages, this is definitely on the radical side of programming.

I have been modelling the language Using the simplest of script interpreters, (without any tokenising), and the results so far have been very promising with a minimum of complexity. The language adheres strictly to left-to-right execution and a stack for all variables, which are therefore all local.

Typically the notation uses 1/3 less words or lexical elements than BASIC.

here are some examples and the BASIC equivalent:

"Hello World!"?
PRINT "Hello World!"

"Hello World" len ?
PRINT LEN("Hello World")

long 42 : a
DIM a AS LONG: a=42

a 1 + 2 * to a

long 0 : d 100 array
DIM d(100) as long

a 5 idx to d

There is no punctuation / statement delimiters.

Only one kind of bracket is used. (), marking the start and end of a DO block and
defining the scope of any variables created within it.

A simple loop counting down from 5 to 1:
dec i if GE repeat endif


As a fuller example, here is a Find-and-Replace.
The 'ref' word allows variables to be passed onto the stack by reference.

def find_and_replace            | sub find_and_replace _
3 args                         | ( _
:s                             | s as string, _
:f                             | f as string, _
:r                             | r as string )
f long  len    : lf            | dim lf as long  :  lf=len(f)
r long  len    : lr            | dim lr as long  :  lr=len(r)
long    1      : i             | dim i as long   :  i=1
(                              | do
ref s f i pos to i             |  i=instr(i,s,f)
if null exit endif             |  if i=0 then exit do
  ref s r i i lf + insert       |  s=insert (s,r,i,i+lf)
  i lr + to i                   |  i=i+lr
  repeat                        |
)                              | loop
end                            | end sub
""? "find and replace mk3 "?   | print: print "find and replace"
string "ab cd e fg" : s        | dim s as string :  s="ab cd e fg"
?                              | print s
ref s " " ", " find_and_replace| find_and_replace(s,",",", ")
s?                             | print s

Kent Sarikaya

I never really used RPN as I didn't have an HP Calculator but Texas Instruments, I did get an HP palmtop though later, but used it for the built in DOS and ran turbo C in it.
Anyways in my looking at languages recently, I have seen languages that had this feel... Forth being one for sure.
It is efficient for sure, very clever, thanks!!

Eros Olmi


to me that syntax is quite hard to understand and to deal with even if I can see advantages. It seems hard to write also even if more simple from the parser point of view to manage. To me computers should evolve into a more human direction and not the other way round.

Meybe it is just me but I would like to see more high level languages rather than languages close to the hardware way of thinking.
Sorry about this post. I'm not against this particular one but against the idea my little brain should spend time in trying to understand something so complex for so trivial tasks.

thinBasic Script Interpreter - |
Win7Pro 64bit - 8GB Ram - Intel i7 M620 2.67GHz - NVIDIA Quadro FX1800M 1GB

Petr Schreiber

Hi Charles,

this is interesting experiment, but it is a bit unnatural for both RPN addict and people used to high level programming.

For example IF stuff. On my calc I am used to:

LBL 01  ' ~ DO
  RCL i ' put I variable on stack
  X=0?    ' test to zero, if yes, continue on next line, if no skip next line
  GTO 02  ' this is emulation of EXIT LOOP
  ... here continues code of loop
GTO 01  ' ~LOOP; to perform DO/LOOP we jump like a mad to label 01 again

LBL 02
... here continues proggie

This I would like to see as this, but no more changes :D:

  RCL i ' put I variable on stack
  X=0?    ' test to zero, if yes, continue on next line, if no skip next line
  EXIT  ' this is emulation of EXIT LOOP
  ... here continues code of loop

... here continues proggie

I think using RPN on PC could be useful as form of higher level assembly, but not yet for higher level language.
Maybe some command providing evaluation of variable using RPN, but I cannot imagine writing something more complex using RPN ( on PC, on calc I had working 3D engine rendering points ;D )

AMD Sempron 3400+ | 1GB RAM @ 533MHz | GeForce 6200 / GeForce 9500GT | 32bit Windows XP SP3

Charles Pegge

Yes, this language is certainly not what we are used to. RPN languages are not widely used because they do not address the need to interface a complex operating system, nor are they geared to handling complex data structures, despite being inherently extensible.

With this experimental language, I will attempt to explore the fusion of Forth-like concepts with Basic, C++ and XML. It's going to be an exotic hybrid. I'm keeping the source code very simple at the expense of efficiency to make it easy to remodel at this stage.

Programming is a feat of mental gymnastics, and some of the things you have to do in coding may look more natural but make the task more complex than it needs to be. Roman numerals are fine for tallying legions but no good for multiplying and dividing. Similarly the rules of syntax and operator precedence used in BASIC etc often get in the way of clear coding.

Writing code with this notation feels very strange at first, but because the syntax rules are so simple, there are less sources of error and tracing bugs is also much easier.

Bridging the gap between high level and low level operations is one of the biggest challenges. The ultimate test of competence in low-level coding will be to handle COM events!

Thanks for your comments.

Charles Pegge

The High level syntax

The high level syntax supports strings and numbers. These do not have to be 'Typed'. All numbers are handled as double-precision floating point, while all strings are dynamic.

41 : v
v 1 + ?
"hello world" : greet
greet " hello sky!" + upper ?

Strings can contain anything including objects with their properties and methods. Even the non-OOP procedural code is accessible in a string and can (if so desired) modify itself.

| example of self modifying code   |
"Hello " : greeting                | set up variable
@0 " greeting 'World' + ?" + to @0 | append code to the end of this code which is contained in @0.

Objects are expressed in a simplified Markup scheme, which imposes no practical limitations on the number of elements or the hierarchy of the structure.

| Example of an object  | containing a property and a method
"                      | start string containing the object
  <greeting> Hello </>  | set up a property
  <greet>!              | set up a method (using '!' to denote executable code)
   1 args : s           | this method takes 1 string argument from the stack
   this.greeting s + ?  | print the greeting defined in this object with the argument
  </>                   | end of method. (In this case, nothing is returned on the stack)
"                      | end of string
  : I                   | assign to variable 'I'
  "Eros" I.greet        | invoke the method

A kernel of BASIC-like functions are used to manipulate all strings including executable code, and Markup expressions.

Parameters can be passed to functions by reference or by value. Parental variables are visible to functions called from them when their names have not been redefined. This is possible because the stack also carries the variable name in parallel with the variable value. (variable names may be left undefined). This shows how variables are scoped on the stack:

| Variable scoping     |
1 : a                 | define a
2 : b                 | define b
3 : c                 | define c
4 : c                 | define a new c
(c ?)                 | result 4
pop                   | dispose of the new c
(c ?)                 | now shows the original c (result 3)
| Referencing          |
ref a : d             | d references a
d ?                   | result 1
42 to d               | whatever is done to d is done to a
a ?                   | result 42

A function may return zero or more parameters, leaving them on the stack.

| Example taking 3 arguments and |
| leaving 3 results on the stack |
def addxyz                      |
  6 args :x :y :z :a :b :c       |
  x a + to x                     |
  y b + to y                     |
  z b + to z                     |
  3 rets                         |
end                             |
1 2 3 4 5 6 addxyz              | results 5 7 9

Unlike many languages, nested functions, or functions within functions are supported. These are local to their parent functions and go out of scope when the parent function terminates. Functions, may be passed like normal parameters to other functions.

def main
   def sub
     def subsub
      "Hello World"?

Petr Schreiber

Hi Charles,

will you publish some interpeter / compiler for this soon ?

AMD Sempron 3400+ | 1GB RAM @ 533MHz | GeForce 6200 / GeForce 9500GT | 32bit Windows XP SP3

Charles Pegge

Yes Petr, the kernel which implements the high level syntax is just about ready, but I need to add a few more functions to make it usable for trying out with a complete program. Hopefully next week.

The low level syntax is more of a challenge but I see light at the end of the tunnel. Description to follow:

Charles Pegge

The Lowest:

To make it possible to code absolutely anything with R$ I am putting in machine code expressions. This will allow functions to call machine code, in the form of hexadecimal and octal opcodes directly,  much as we used to do  in pre-PC days.

For those familiar with x86 assembler, the interface is really simple with the call address in EAX, the numeric stack pointer in ESI and the string pointer to the last param in EDI. This will enable functions to be developed which are 100% machine code.

I hope that using raw opcodes for very short code sequences will be a benign programming experience and avoid the need to embed an inline assembler.

Here is how a function for returning a Square Root might be written:

machine  dd 06  d9 fa dd 1e  c3   : sqrt

|    then to use sqrt:

2 sqrt ?

| result:  1.41....

DD 06  loads the current param into the FPU
D9 FA  does the square root
DD 1E  stores the result  and pops the FPU (indexed by the ESI register)
C3  returns.

Petr Schreiber


thanks for new info.

This is definitely powerful feature, on other side it is brutal jump out of world of user friendlyness.
I would be probably very unhappy to revise code who somebody else wrote in machine language, one line syntax makes it little bit difficult to place remarks.

As an option it is perfect, but as total replacement for inline assembly it seems to be to be just for supermansTM :)
Is there any special reason you dislike inline assembly ?

Nevertheless please continue, I know my reactions seem to be quite negative in last days, but this is because I am trying to understand this new thing completely :)

AMD Sempron 3400+ | 1GB RAM @ 533MHz | GeForce 6200 / GeForce 9500GT | 32bit Windows XP SP3

Charles Pegge

I've got nothing against assembler, but its a completely different language and when I looked at the syntax in detail yesterday, I realised this would have to be treated as a whole new project. So to get things going quickly, I have reverted to my old hardware habits.

Actually for small amounts of code, which one would use as building-blocks, opcodes are really very friendly and can always be relied upon  :) . To really appreciate x86 opcodes, they need to be viewed in octal and you will see how the instructions addressing modes and registers are encoded.

I see this facility as a way of building libraries of functions, and not generally used for regular programming, but when the need arises a collection of well annotated examples, will assist in deriving new functions.

Donald Darden

Speaking of RPN Languages, there is RPL, devised by HP for some of their programmable calculators.  A few related links:

According to the discussions, RPN is a stack-oriented language, a fact that I knew, but in the HP world, portions of the stack also are recognized by names, jst as registers are.  You have X, Y, Z, S, and T if I recall correctly, each positioned one step higher on the stack.  So you can explain or discuss actions on the stack by the effects on each pseudo register.  For instance, 1 2 + could be explained this way:

    1                       1 -> X
    2                       X -> Y : 2 -> X
    +                       X + Y -> X : Z -> Y

Since we never assigned anything specific to the stack at startup, the calculator design was that each location had a zero value.  And though all the registers are automatically pushed up as new quantities are added, you would actually see this happening:

                              X -> Y -> Z  -> S -> T ->

In other words, each pseudo register appears to move up one place, to the limits allowed by the stack size.  Or actaully, the pointer reference for the stack would be deflected downwards one place as a new value is inserted at the bottom.

Any computation would normally be with regards to X alone, or be between X and Y.  However, HP allowed the ability to push and pop the contents of the stack, or to swap the contents between X and Y.  Those capabilities could be extended further.

RPL is described as a combination of RPN, FORTH, and LISP.  Writing an interpreter or compiler version would mean not having to devise a whole syntax from scratch.  But some people prefer trying to invent something even rounder than a wheel.

Charles, your idea of a persistant pre-value for C (C is now 4, but discard C and it reverts to 3) looks very unsettling from my perspective.  Imagine seeing C encoded somewhere, but not knowing how many times it has been reassigned or discarded, and then trying to contemplate its current value.  If you want persistency, you can always create an array for C(), then use the index to define which value of C is current.


Charles Pegge

3 : c  ( 4 : c )

Code contained between brackets behaves like a sort of inline function, so anything defined there has a limited scope. The C that is defined inside the brackets masks the c that was defined before the brackets because R$ always searches backwards for a variable on the stack.

On the closing bracket, the stack pointer reverts to the value it had when the bracket opened. (There is a separate stack for brackets.)

This allows chunks of code to be appended to a programme or inserted anywhere, without having to change the names of variables to avoid a name clash. You could achieve the same thing by putting the blocks of code into functions and calling them but this is more efficient and makes inventing new function names unnecessary for one-off operations.

But if you need them, This scoping mechanism will also support local functions, which are disallocated on the closing bracket.

def myfunction
  "My General Function" ?
def myfunction
   "My Local Function" ?
myfunction    | result: My Local Function
myfunction   | result: My General Function

The bracket also serves as a block control structure which serves as as DO..LOOP and SELECT CASE.
This means that things do not pile up on the stack during a loop, and persistent variables must be defined outside the block

Charles Pegge

The Proposed Low Level Syntax

To support DLL calls etc, variables have to be passed in specific formats, which the processor uses directly. The high level code implicitly uses only 2 types of variable: Double precision (8 byte) floating point and Dynamic strings, so these, usually have to be converted before making low-level calls to a function. Similarly returned values have to be turned back into an appropriate format for further high level operations.

Structures composed of multiple intrinsic types must also be supported, including the ability to pass their addresses to the function.

Distilling down the syntax to specify these structures:

intrinsic types:


n ( a number to denote a string with a fxed number of bytes)

Defining Types and variables

To define a variable of specified type:

int4 : a 
real8 : b

To define any new type in terms of previously defined types

type int4 : long
type real4 : single
type real8 : double

then build further:

type single r single g  single b single a  : vector4d

and define a variable with this structure (with optional values) :

vector4d  0.5 0.2  0.6 1.0  : myvector

Accessing Variables

accessing individual elements:

myvector.g myvector.a

obtaining the address of the variable:


Calling DLL / SO functions



"opengl32.dll"  libload :  opengl | obtains handle for this library

opengl "glColor4fv"  libfun  : glColor4fv

glColor4fv  ?  | displays the call address for this function

glColor4fv &myvector call1 | possible calling method

opengl libfree | frees the library

Kent Sarikaya

Charles this would be awesome for cell phone and hand held device development as it seems very efficient. Will the compiler allow you to work in those environments?