Token Stream Technology

From NHI1

What is Token-Stream?

Token-Stream is a technology that allows a compiler to be written for any computer language that meets the following conditions:

  • The interpreter calls the procedure (p-unknown) for each undefined function.
  • The interpreter calls the procedure (v-unknown) for each undefined variable.

Scripting languages in particular meet these conditions. The hook belongs at the point in the interpreter where the "unknown function" or "unknown variable" error would normally be raised.

  • Key principle: The Tcl-Compiler is nothing more than a slightly modified Tcl-Interpreter
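The two conditions can be illustrated with a toy interpreter. This is a minimal sketch in Python, not the NHI1 implementation: the class MiniInterp and its methods are hypothetical names, and the handlers are spelled p_unknown and v_unknown because Python identifiers cannot contain a hyphen.

```python
class MiniInterp:
    """Toy interpreter: known commands and variables live in two tables."""

    def __init__(self, p_unknown, v_unknown):
        self.commands = {}          # name -> callable
        self.variables = {}         # name -> value
        self.p_unknown = p_unknown  # hook for undefined functions
        self.v_unknown = v_unknown  # hook for undefined variables

    def call(self, name, *args):
        if name in self.commands:
            return self.commands[name](*args)
        # This is the place where an "unknown command" error would be raised.
        return self.p_unknown(name, *args)

    def get(self, name):
        if name in self.variables:
            return self.variables[name]
        # This is the place where an "unknown variable" error would be raised.
        return self.v_unknown(name)


seen = []
interp = MiniInterp(
    p_unknown=lambda name, *args: seen.append(("proc", name, args)),
    v_unknown=lambda name: seen.append(("var", name)),
)
interp.call("while", "expr", "code")
interp.get("counter")
```

Because both tables are empty, every call and every variable lookup ends up in one of the two hooks, which is the situation a Token-Stream compiler relies on.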

What does the compiler do?

The script compiler works in three phases:

  • parsing : converts the input into a compiler-internal META-Code
  • analyzing : recognizes and optimizes the code fragments
  • writing : translates the META-Code into a target language
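The three phases can be pictured as a simple pipeline. The following Python sketch only shows the shape of that pipeline under strong assumptions: the META-Code is reduced to a list of word lists, and the "optimization" is a stand-in that drops a hypothetical noop command. None of the function names come from the original compiler.

```python
def parse(source):
    """Phase 1: turn each non-empty line into a META record (a list of words)."""
    return [line.split() for line in source.splitlines() if line.strip()]


def analyze(meta):
    """Phase 2: stand-in optimization that drops the hypothetical 'noop' command."""
    return [record for record in meta if record[0] != "noop"]


def write(meta):
    """Phase 3: emit target code, one command per line."""
    return "\n".join(" ".join(record) for record in meta)


def compile_script(source):
    return write(analyze(parse(source)))


result = compile_script("set a 1\n\nnoop\nputs   hello")
```

Each phase only depends on the META records, so a different writer can be swapped in without touching the parser.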

What is "parsing"?

Take an interpreter for the desired language (tcl, perl, python, ...) and eliminate all commands, functions and definitions:

  • At this point only the two error handlers, p-unknown and v-unknown, exist.

Then read the source code into the interpreter. Every command and every variable reference now produces an error, but each of these errors is caught by one of the two error handling procedures above.

  • Key principle : A program is compiled by resolving all errors

This very simple logic applied to computer languages leads to the Token-Stream-Compiler.
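The parsing step described above can be sketched in Python as follows. This is a simplified emulation, not a real Tcl parser: it assumes one command per line, balanced braces, and no quoting or substitution, and the names split_words and run_empty_interp are invented for illustration.

```python
def split_words(command):
    """Split one Tcl-like command into words.

    A {...} group may nest and is returned without its outer braces.
    Assumes the braces are balanced.
    """
    words, i, n = [], 0, len(command)
    while i < n:
        if command[i].isspace():
            i += 1
        elif command[i] == "{":
            depth, start = 1, i + 1
            i += 1
            while i < n and depth:
                depth += {"{": 1, "}": -1}.get(command[i], 0)
                i += 1
            words.append(command[start:i - 1])
        else:
            start = i
            while i < n and not command[i].isspace():
                i += 1
            words.append(command[start:i])
    return words


def run_empty_interp(source, p_unknown):
    """Feed a script to an interpreter in which no command is defined."""
    for line in source.splitlines():
        words = split_words(line)
        if words:
            # Nothing is defined, so every command goes to the handler.
            p_unknown(words[0], *words[1:])


calls = []
run_empty_interp(
    "set i 0\nwhile {$i < 10} {incr i}",
    lambda name, *args: calls.append((name, args)),
)
```

After the run, calls holds one entry per command of the script, with the braced arguments still unevaluated.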

Example: compile Tcl-Code

The following Tcl-Code:

while {expr} {code}; ## TCL

creates a call to the p-unknown function in the Tcl-Compiler:

p-unknown while {expr} {code}

and p-unknown will create the META code:

while §Link-Expr§ §Link-Code§

What are §Link-Expr§ and §Link-Code§?

§Link-Expr§ and §Link-Code§ are tokens generated to stand for the code fragments expr and code.

  • The compiler analyzes the code fragments and writes the result into an internal data structure.

For example, the code behind §Link-Code§ is itself analyzed again by the Tcl-Compiler.
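One way to picture the link tokens is a table that maps each token to the fragment it stands for. The Python sketch below is an assumption about how such a structure could look: the LinkTable class is hypothetical, and the numeric suffix in the token names is added here only to keep tokens unique.

```python
class LinkTable:
    """Internal data structure: link token -> code fragment."""

    def __init__(self):
        self.fragments = {}

    def link(self, kind, text):
        # The running number is an illustrative addition, not part of the source.
        token = "§Link-%s-%d§" % (kind, len(self.fragments))
        self.fragments[token] = text
        return token


def p_unknown_while(table, expr, code):
    """Build the META record for 'while', replacing both fragments by links."""
    return ["while", table.link("Expr", expr), table.link("Code", code)]


table = LinkTable()
meta = p_unknown_while(table, "$i < 10", "incr i")
```

The fragment stored under the code link could then be handed to the same compiler again, which is how nested code would be analyzed level by level.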

How does the compiler know that a while is followed by an expression and then code?

At first it does not, because the Tcl-Interp is empty and the default handling for anything unknown is:

  • Don't do anything, write back the tokens as strings.

A side effect is that all comments are deleted, because the Tcl-Interp knows the Tcl syntax, including its comment rules.

  • Comments are not forwarded to the p-unknown function and are deleted by the Tcl-Interp

Even this admittedly very simple treatment yields good target code that is "simpler" than the input.

  • The aim is to train the Tcl-Compiler with so-called prototypes
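The default handling described in this section (write the tokens back as strings, lose the comments) can be sketched in Python. This is a rough emulation under stated simplifications: one command per line, words split on whitespace only, and no braced or quoted words with significant inner spacing. The function names are hypothetical.

```python
def default_unknown(name, *args):
    """Default handling: do nothing, write the tokens back as strings."""
    return " ".join((name,) + args)


def passthrough(source):
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            # Comments never reach the handler, so they are not written back.
            continue
        out.append(default_unknown(*stripped.split()))
    return "\n".join(out)


compact = passthrough("# set up the counter\nset   i    0\n\nputs $i")
```

Even without any knowledge of the commands, the output is shorter than the input because comments, blank lines and redundant spacing are gone.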

How is the Tcl-Compiler trained?

First, the existing standard commands (e.g. while) are implemented as prototypes.

  • The command's logic is not of interest, only the kinds of its arguments.
  • This is similar to the declaration of functions in C header files.

The more prototypes there are, the better the Tcl-Compiler can analyze the source code.
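A prototype in this sense can be as small as a list of argument kinds per command. The Python sketch below assumes a hypothetical PROTOTYPES table and kind names (expr, code, name, value, string); the real prototype format is not given in this page.

```python
# Hypothetical prototype table: command name -> kinds of its arguments.
PROTOTYPES = {
    "while": ("expr", "code"),
    "set": ("name", "value"),
}


def classify(name, *args):
    """Label each argument with the kind its prototype declares.

    Arguments without a declared kind fall back to plain strings,
    which matches the default handling for unknown commands.
    """
    kinds = PROTOTYPES.get(name, ())
    return [
        (kinds[i] if i < len(kinds) else "string", arg)
        for i, arg in enumerate(args)
    ]


trained = classify("while", "$i < 10", "incr i")
untrained = classify("puts", "hello")
```

With the prototype in place, the compiler knows that the first argument of while is an expression and the second is code that should be analyzed again.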

What does this achieve?

Every command that has a prototype can be checked for the correct number and types of arguments without implementing the command's logic.

  • The Tcl-Compiler recognizes many errors that are otherwise only noticed at runtime
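Such a check can be sketched in Python as follows. The prototype table, the kind names and the command repeat are hypothetical and chosen only to show an arity check and one type check; they do not describe real Tcl commands or the actual compiler.

```python
# Hypothetical prototypes: 'repeat' is an invented command taking a count and code.
PROTOTYPES = {
    "while": ("expr", "code"),
    "repeat": ("int", "code"),
}


def check_call(name, *args):
    """Return a list of error messages; an empty list means the call is fine."""
    proto = PROTOTYPES.get(name)
    if proto is None:
        return []  # no prototype, nothing to check
    errors = []
    if len(args) != len(proto):
        errors.append(
            "%s: expected %d arguments, got %d" % (name, len(proto), len(args))
        )
    for kind, arg in zip(proto, args):
        if kind == "int" and not arg.lstrip("-").isdigit():
            errors.append("%s: expected integer, got %r" % (name, arg))
    return errors


ok = check_call("repeat", "3", "puts hi")
bad_type = check_call("repeat", "three", "puts hi")
bad_count = check_call("while", "$i < 10")
```

Both faulty calls are reported without ever running the script, which is the kind of error that would otherwise only show up at runtime.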

After processing the source code with the Tcl-Compiler, the resulting code is much more compact.

  • This reduces the execution time (less memory => fewer malloc calls) and the bandwidth needed to deliver the code (WWW).

Finally, there is the option of generating a completely different target code.

  • For example, the C-Back-End creates C code from Tcl
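The idea of exchangeable back ends can be sketched with two writers over the same META record. Everything in this Python example is an assumption made for illustration: the tuple-based META tree, the node kinds while, lt and incr, and both writer functions. It does not show how the real C-Back-End works.

```python
def write_tcl(node):
    """Write a hypothetical META node as Tcl text."""
    kind = node[0]
    if kind == "while":
        body = "; ".join(write_tcl(stmt) for stmt in node[2])
        return "while {" + write_tcl(node[1]) + "} {" + body + "}"
    if kind == "lt":
        return "$" + node[1] + " < " + node[2]
    if kind == "incr":
        return "incr " + node[1]
    raise ValueError("unknown META node: " + kind)


def write_c(node):
    """Write the same META node as C text."""
    kind = node[0]
    if kind == "while":
        body = " ".join(write_c(stmt) for stmt in node[2])
        return "while (" + write_c(node[1]) + ") { " + body + " }"
    if kind == "lt":
        return node[1] + " < " + node[2]
    if kind == "incr":
        return node[1] + "++;"
    raise ValueError("unknown META node: " + kind)


meta = ("while", ("lt", "i", "10"), [("incr", "i")])
tcl_text = write_tcl(meta)
c_text = write_c(meta)
```

Only the writing phase differs between the two outputs; parsing and analysis would stay the same.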