MikeOS Forth Handbook

For kernel version 4.4, 20 March 2014, or newer - (C) MikeOS Developers

Forth version 1.5.3

If you have any questions, see the MikeOS website for contact details and mailing list information.



Introduction

In the late 1960s a computer programmer named Charles Moore developed what he considered to be a fourth-generation programming language (4GL), Forth. Forth can be considered an application, a compiler, a language or an operating system, depending upon how the program is being used at any particular time.

A Forth system consists of a dictionary of words that can be compiled or executed depending on the machine's current operating state. The dictionary is further divided into vocabularies. The same word may be present in more than one vocabulary and perform differently in each one. The machine's current context determines which definition is used. The primary vocabulary where most words are linked is FORTH.

Forth is modeled as a two-stack virtual machine: one stack for passing parameters and one to follow the execution of a program (a list of pointers to the currently executing 'words'). It is tempting to think of each word as a computer subroutine and a program as a collection of subroutines; however, Forth is defined as a direct threaded language. Languages that are subroutine threaded may be considered to be Forth-like, but not true Forth.

Over the years several standards have been developed that describe the meaning of the words that comprise the kernel of a Forth system. This document will rely heavily on three of these standards: Forth-79, Forth Interest Group (FIG) and Forth-83. In 1994 a national standards group developed an all-encompassing standards document; it will be referenced occasionally. The standards are very similar, but there are significant differences, in spots, among them. The Forth-79 standard will be used preferentially to resolve most conflicts.

Forth has the reputation of being 'write-only' code, i.e. code that cannot be understood, changed and debugged once it has been written. This is the original programmer's problem and not inherent in the language. If the code is laid out well and documented using the built-in tools, it will be no harder to change than any other well-written code.

This document is a brief description of a Forth system that has been developed to work with MikeOS and is not meant to 'stand alone.' Much information concerning Forth is available online. Two books that are very helpful are Starting Forth by Leo Brodie (a tutorial that takes one from basic concepts to advanced use of the language) and FORTH: A Text and Reference by Mahlon Kelly and Nicholas Spies (a textbook in a tutorial style with an exceptional glossary of most common Forth words).



Features

  • Based on a 16-bit virtual architecture.
  • Two stacks: parameter stack, S, and return stack, R.
  • Data is divided into bytes (8 bits), cells (16 bits) and double words (32 bits).
  • This implementation is limited to 16-bit CPU operations and one 64k Intel segment.
  • Self-contained: a program is built by extending the Forth system with new words.
  • Fixed point math. (Words that manipulate floating point numbers may be defined, but this kernel only supports fixed point.)

This version of Forth is set up to use 7-bit ASCII characters for input and output.

Although strings are input as null terminated (ASCII-Z), they are stored in the dictionary as 'counted' strings: a one-byte count followed by that number of characters. The maximum size of a counted string is 255 characters. Names are further limited to 31 characters to keep the word list from becoming unwieldy.

Logic values: false is numerically equal to zero. Any non-zero value tests as true. Logic functions return 1 for true in Forth-79.
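
The flag convention can be checked at the keyboard. This is a minimal sketch using only words from the vocabulary list; TEST is a hypothetical name, and the printed values assume the Forth-79 behavior of returning 1 for true.

```forth
5 3 > .          \ 5 is greater than 3: prints 1 (true)
3 5 > .          \ prints 0 (false)
: TEST ( n -- )  IF ." true" ELSE ." false" THEN ;
7 TEST           \ any non-zero value tests as true
0 TEST           \ zero tests as false
```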

Forth uses post-fix (sometimes called 'reverse Polish') notation, e.g. '2 3 *' leaves 6 on the stack. Note that the operator comes after its operands, not before or between them.
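
A few lines to try at the keyboard show how post-fix order replaces parentheses. Each line ends with 'dot' to print the result.

```forth
2 3 * .          \ 2 times 3: prints 6
2 3 + 4 * .      \ (2 + 3) * 4: prints 20
2 3 4 * + .      \ 2 + (3 * 4): prints 14
```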

The primary vocabulary is called FORTH. This is usually where a program is developed; however, other vocabularies, including custom ones, are possible. Other common vocabularies are ASSEMBLER and EDITOR. Because the host OS normally handles these functions, they are not included in this implementation.

[Custom] Forth usually divides a disk into logical blocks of 1k bytes; except at the driver level, the hardware disk structure is ignored. This implementation uses the host built-in file functions to access information from the host file system rather than read/write raw disk data.

[Custom] For faster searching during compilation the dictionary is split into 16 hash chains. The current vocabulary modifies the hash function so that only words in the vocabulary(ies) of interest are found. Typically, a dictionary search starts in the current vocabulary and continues in the FORTH vocabulary if the word is not found in the current one.



Dictionary

Traditionally, Forth words are documented by showing the before and after parameter stack condition inside a comment. In-line comments start with '(' and end with ')'; before and after are separated by two dashes. E.g. addition could be documented as '( n1 n2 -- n3 )'. 'n' represents a signed 16-bit value, 'd' a 32-bit value, 'c' an 8-bit character or byte value, 'u' unsigned value, 'a' a 16-bit address and 'f' a logic flag. Additional information may be placed after the final stack value to describe the operation in more detail. The format of the additional information is highly dependent on the actual programmer. Consult the recommended references and the included source code for details of the behavior of individual words.
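
As an illustration of the stack comment convention, here are two small definitions. SQUARE and CUBE are hypothetical names chosen for this sketch and are not part of the kernel.

```forth
: SQUARE ( n -- n*n )  DUP * ;
: CUBE   ( n -- n*n*n )  DUP SQUARE * ;
7 SQUARE .       \ prints 49
3 CUBE .         \ prints 27
```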

Note: the author's grouping of words is somewhat different from that of either the standards or the textbooks mentioned above.



Forth Vocabulary

Math functions

Word          Standard          Description
*             Forth-79          'times'
*/            Forth-79          'times divide'
*/MOD         Forth-79          (star slant) 'times divide mod'
+             Forth-79          'plus'
+!            Forth-79          'plus store'
-             Forth-79          'minus'
/             Forth-79          'divide'
/MOD          Forth-79          (slant) 'divide mod'
1+            Forth-79
1+!           79 uncontrolled   Increment memory contents
1-            Forth-79
1-!           79 uncontrolled   Decrement memory contents
2*            79 reserved       'two times'
2**           Custom            2^n
2+            Forth-79
2-            Forth-79
2/            Forth-83
4*            Custom
4/            Custom
@+            Custom            ( n1 a -- n )
@-            Custom            ( n1 a -- n )
ABS           Forth-79          (absolute)
AND           Forth-79
C+!           Custom            ( c a -- )
COM           79 uncontrolled   One's complement
D+            Forth-79
D+!           Custom            ( d a -- )
D-            79 double
DABS          79 double         'd-abs'
DMAX          79 double
DMIN          79 double
DNEGATE       Forth-79          Two's complement
LSHIFT        Forth-94          ( n c -- n<<c )
M*            Forth-94          ( n1 n2 -- d )
M*/           94 double         ( d1 n1 n2 -- d )
M+            94 double         ( d1 n -- d )
M/            Custom            ( d n1 -- n )
MAX           Forth-79
MIN           Forth-79
MOD           Forth-79
NEGATE        Forth-79          Two's complement
OR            Forth-79
RSHIFT        Forth-94          ( n c -- n>>c )
S>D           Forth-94          's to d' ( n -- d ) sign extend
SQRT          Custom            ( d -- n )
T*            Custom            ( d n -- t )
T/            Custom            ( t n -- d )
U*            Forth-79
U/MOD         Forth-79
XOR           Forth-79

Use U* and U/MOD in place of Forth-83 UM* and UM/MOD.

Data and stack manipulation functions

Word          Standard          Description
!             Forth-79          'store'
+@            Custom            ( a1 a2 -- n ) 'a' may be offset
-LEADING      Custom            ( addr cnt -- addr' cnt' )
-ROT          Custom            ( n1 n2 n3 -- n3 n1 n2 )
-TEXT         79 uncontrolled   ( a1 n a2 -- f,t=different )
-TRAILING     Forth-79
2!            79 double         'two store'
2@            79 double         'two fetch'
2>R           Forth-94          'two to r'
2DROP         79 double         'two drop'
2DUP          79 double         'two dup' (duplicate)
2OVER         79 double         'two over'
2R>           Forth-94          'two r from'
2SWAP         79 double         'two swap'
3DUP          Custom            ( n1 n2 n3 -- n1 n2 n3 n1 n2 n3 )
<CMOVE        79 reserved       (backwards) 'reverse c-move'
><            79 uncontrolled   'interchange bytes'
>R            Forth-79          'to r'
?DUP          Forth-79          'question dup'
@             Forth-79          'fetch'
BLANK         79 reserved
C!            Forth-79          (byte) 'c-store'
C@            Forth-79          'c-fetch'
CMOVE         Forth-79          'c-move'
COUNT         Forth-79
DEPTH         Forth-79          Parameter stack depth
DROP          Forth-79
DUP           Forth-79          (duplicate)
ERASE         Forth-79
FILL          Forth-79
L!            Custom            ( n seg off -- ) long, intersegment
L@            Custom            ( seg off -- n )
LC!           Custom            ( c seg off -- )
LC@           Custom            ( seg off -- c >> zero extended byte )
LOWER>UPPER   Custom            ( c -- c' )
OVER          Forth-79
PICK          Forth-79
R>            Forth-79          'r from'
ROLL          Forth-79
ROT           Forth-79          (rotate)
S0            79 uncontrolled   Report TOS
SEGMOVE       Custom            ( fs fa ts ta #byte -- )
SP@           79 reserved
SWAP          Forth-79
XFER          Custom            ( a1 a2 -- >> transfers contents of 1 to 2 )

2ROT, MOVE (use CMOVE) and R@ (use I) are not included in this implementation.

Keyboard input and CRT output

Word          Standard          Description
#             Forth-79          'sharp'
#>            Forth-79          End number conversion
#S            Forth-79          Convert numbers
#TIB          Forth-83          System variable => characters left in current input stream
$             Custom            Temporary (next number input only) base 16
%             Custom            Temporary base 2
(D.)          Custom            Format a signed double
.             Forth-79          'dot'
.BASE         Custom            'dot base' = BASE @ .
.R            79 reserved       Right-justified number
.S            Forth-94          Show parameter stack
0             Custom            System constant for speed and size
0.            Custom            System constant
1             Custom            System constant
1.            Custom            System constant
2             Custom            System constant
<#            Forth-79          Begin number conversion
>IN           Forth-79          System variable => offset into input stream
?             Forth-79          'question' = @ .
BASE          Forth-79          System variable
BASE!         Custom            'base store'
BINARY        Custom
BELL          79 uncontrolled
BL            79 reserved       System constant = space = 32
CONVERT       Forth-79
CR            Forth-79
D.            79 double         'd-dot'
D.R           79 double         'd dot r'
DECIMAL       Forth-79
EMIT          Forth-79
EXPECT        Forth-79
HEX           79 reserved
HLD           Custom            System variable; address for HOLD
HOLD          Forth-79
KEY           Forth-79
NUMBER        79 reserved       Counted string to double
OCTAL         79 reserved
OK            Custom            Say 'ok'
Q             Custom            Temporary base 8
SIGN          Forth-79
SPACE         Forth-79
SPACES        Forth-79
SPAN          Forth-83          System variable
TIB           Forth-83          Returns address of input buffer (text or disk)
TYPE          Forth-79
U.            Forth-79          (unsigned) 'u dot'
U.R           79 reserved       'u dot r'

Conditional structures

IF ... ELSE ... THEN

C@SWITCH ... ENDSWITCH

SWITCH ... ENDSWITCH

Word          Standard          Description
0<            Forth-79          'zero less than'
0=            Forth-79          'zero equal'
0>            Forth-79          'zero greater than'
<             Forth-79
=             Forth-79          'equal'
>             Forth-79
?CELL         Custom            ( n -- n f,t=word )
?PRINTABLE    Custom            ( c -- f,t=printable )
D0=           79 double
D<            Forth-79
D=            79 double
DU<           79 double         'd u less than'
FALSE         Forth-94          ( -- 0 )
FALSE!        Custom            ( a -- >> stores 0 in address )
NOT           Forth-79          Alias for 0=
STAY          Custom            ( f -- >> exit if false )
TRUE          Forth-94          ( -- t )
U<            Forth-79
WITHIN        Forth-94          ( n n2 n3 -- f >> true if n2 <= n < n3 )
{             Custom            Start option compile
}             Custom            End option compile

NOT may also be used for 0= .

Loop structures

BEGIN ... AGAIN [Custom] Only an abort terminates the loop.

BEGIN ... UNTIL

BEGIN ... WHILE ... REPEAT

DO ... LOOP

DO ... +LOOP

DO ... /LOOP [Custom] Unsigned limit test.

Word          Standard          Description
2LEAVE-EXIT   Custom            Leave 2 loops and exit word
I             Forth-79          'eye'
I'            79 reserved       'I-prime'
J             Forth-79          'jay'
J'            Custom            'j-prime'
K             79 reserved       'kay'
LEAVE         Forth-79
LEAVE-EXIT    Custom            Leave loop and exit word

Defining (dictionary) words

Word          Standard          Description
'             Modified 79       'tick'; returns CFA, state smart
(             Forth-79          Start comment
)             Not actual word   End comment
,             Forth-79          'comma'
."            Modified 79       'dot quote', state smart
2CONSTANT     79 double
2VARIABLE     79 double
:             Forth-79
;             Forth-79
;CODE         79 assembler
;code         Forth-94          Run-time header for development
>BODY         Forth-83          CFA → PFA
ABORT"        Forth-83          'abort quote', state smart
ALLOT         Forth-79
ARRAY         Custom            Array of bytes
ASSEMBLER     79 assembler      vocabulary
C,            79 reserved       'c comma' (compile)
CFA           FIG               PFA → CFA
CVARIABLE     Custom            Byte variable
COMPILE       Forth-79
CONSTANT      Forth-79
CONTEXT       Forth-79          System [double] variable
CREATE        Forth-79
CURRENT       Forth-79          System [double] variable
DCLIT         Custom            ( c1 c2 -- )
DEFINITIONS   Forth-79
DOES>         Forth-79          'does'
EDITOR        79 reserved       vocabulary
EMPTY         Custom            Go back to last protected dictionary
FENCE         Custom            System variable
FORGET        Forth-79
FORTH         Forth-79
H             Custom            System variable contains 'here'
H-LIST        Custom            Print dictionary hash chain
HEADS         Custom            System array of hash pointers
HERE          Forth-79
ID.           Custom            ( lfa -- >> prints name of word at link addr )
IMMEDIATE     Forth-79
L>CFA         Custom            LFA → CFA
L>NFA         Custom            LFA → NFA
L>PFA         Custom            LFA → PFA
LAST          79 uncontrolled   System variable = last word created
LITERAL       Forth-79
PAD           Forth-79
PROTECT       Custom
SMUDGE        FIG
STATE         Forth-79          System variable
UNSMUDGE      FIG
VARIABLE      Forth-79
VLIST         FIG               (vocabulary) 'v list'
VOCABULARY    Forth-79
[             Forth-79          'left bracket' stop compiling
]             Forth-79          'right bracket' restart compiling

Since ' and ." are state smart, ['] and .( are not needed.

VLIST replaces Forth-83 WORDS.

Execution control words

Word          Standard          Description
'ABORT        Custom            Vectored abort address
@EXECUTE      Custom            For vectored execute
ABORT         Forth-79
EXECUTE       Forth-79
EXIT          Forth-79
INTERPRET     79 uncontrolled
QUERY         Forth-79
QUIT          Forth-79
WORD          Forth-79


Miscellaneous

Word          Standard          Description
!CURSOR       Custom            (set) 'store cursor'
.AZ           Custom            Print a null terminated string (ASCII-Z)
/0            Custom            Divide-by-zero interrupt
?MEM          Custom            Amount of memory left for new dictionary entries
@CURSOR       Custom            (get) 'fetch cursor'
ASCII         79 uncontrolled   Numerical value of next word; state smart
CLS           Custom            Clear screen
FNAME         Custom            System byte array: the file name used for file access
FORTHSEG      Custom            ( -- seg ), Intel segment system currently resides in
FIRSTSEG      Custom            ( -- seg ), first available full segment
ROWS          Custom            Rows available in display
SYSTEM        Custom            Return to host system
VERSION       Custom            Print version string
\             Custom            Comment to end of line



Getting started

Forth is a very 'lean' system. There is no terminal prompt, only the default blinking cursor. When the system starts, the version string is displayed, followed by 'ok', the message that indicates a command completed satisfactorily.

Start the system and press the <Enter> key a couple of times. Each time the system should say 'ok' and the cursor should move to the next line for more input. To return to the host system at any time type SYSTEM<Enter>.

The system can do much in its interpretive mode. Try the following, paying particular attention to the spaces between the 'words':

5 2 + .<Enter>

Remember, Forth uses post-fix notation. The system should have responded with '7 ok'. The 'dot' tells Forth to print the top number on the parameter stack.

Now, let's create the traditional first program; type:

: HW ." Hello World!" ;<Enter>

Execute the program (word) by typing HW<Enter>. ':' creates a new dictionary entry. 'HW' is the name we gave this word that was compiled into the Forth vocabulary; you could have used any other legal name. ' ." ' compiles a literal string that will be displayed when the word is executed. Finally, ';' completes the dictionary entry and makes it findable.

Here's another short 'program' to try.

: LP 5 0 DO I . LOOP ; <Enter>

Execute the word by typing LP<Enter>. Did the system respond with '0 1 2 3 4 ok'?

Alternatively you could use : LP [ 5 0 ] DCLIT DO I . LOOP ; <Enter>

Forth uses a 2 stack virtual machine model. 'I' retrieves the working loop counter of the innermost loop and places it on the parameter stack, and 'dot' prints it as a signed number. Loop counters and limits are stored on the return stack so that the parameter stack can still be easily accessed inside of loops.

DCLIT requires that the 2 literals (constants) each fit into a signed byte. The '[' stops compilation and allows the numbers to be placed on the parameter stack. The ']' resumes compilation and the definition completes as before. This construct is a little quicker and saves a little space in the dictionary.
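
When loops are nested, standard Forth-79 usage is that 'I' gives the index of the innermost loop and 'J' the index of the next outer loop. The following is a small sketch of that idea; TABLE is a hypothetical name, and '.R' is assumed to take a number and a field width as in other Forth systems.

```forth
\ TABLE prints a 3 by 3 multiplication table.
\ J is the row (outer index), I is the column (inner index).
: TABLE ( -- )  4 1 DO  4 1 DO  J I * 3 .R  LOOP  CR  LOOP ;
TABLE
```

The outer loop runs J from 1 to 3 and the inner loop runs I from 1 to 3, so each row shows the products of one value of J with 1, 2 and 3.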

Experiment and enjoy.



Technical

Register usage (generally):

  • ax general purpose arithmetic
  • bx general purpose and BIOS access
  • cx counter and general purpose
  • dx general purpose arithmetic
  • bp return stack (R)
  • sp parameter stack (S)
  • si execution pointer
  • di (temporary) pfa pointer when processing a colon definition

The system follows the Forth-79 standard as much as possible. Although it is permissible to describe the system as a "FORTH-79 Standard Subset," the author has chosen not to do so.

As signed values, bytes have a numeric range of -128 to +127, cells (16-bit words) a range of -32768 to +32767, and double words (32 bits) a range of -2,147,483,648 to +2,147,483,647.
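
The cell range can be seen directly at the keyboard. This sketch assumes ordinary 16-bit two's complement arithmetic, which is what the ranges above imply.

```forth
32767 1+ .       \ one past the largest cell value wraps around: prints -32768
-1 .             \ read as a signed cell: prints -1
-1 U.            \ the same 16 bits read as unsigned: prints 65535
```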

Fixed point math can be remarkably precise. E.g. 355 113 */ is an excellent approximation of multiplying by π.
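
A short sketch of the scaling idea follows. PI* is a hypothetical name, and the example assumes that */ keeps a double-length (32-bit) intermediate product, as Forth-79 specifies, so that 10000 times 355 does not overflow a cell.

```forth
\ PI* scales n by 355/113, truncating the result to an integer
: PI* ( n -- n' )  355 113 */ ;
100 PI* .        \ 35500 / 113: prints 314
10000 PI* .      \ 3550000 / 113: prints 31415
```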

Forth words may perform 2 different operations: one during word compilation and one during execution. In the "Getting Started" section it was seen how ' ." ' compiled a literal string into the dictionary during compilation and then printed that string during execution.
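
LITERAL is another word with distinct compile-time and run-time behavior. In the sketch below (MINS/DAY is a hypothetical name) the multiplication is carried out once, while the definition is being compiled, and only the result is stored in the dictionary.

```forth
\ [ stops compiling, 60 24 * runs immediately, ] resumes compiling,
\ and LITERAL compiles the value left on the parameter stack.
: MINS/DAY ( -- n )  [ 60 24 * ] LITERAL ;
MINS/DAY .       \ prints 1440
```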

This system uses the following dictionary entry format:

LFA = Link Field Address: a pointer to the previous definition in this hash list

NFA = Name Field Address: a counted string that represents the name and flags. The maximum number of characters in a name is 31 (bits 4 to 0). Bit 5 is used to 'smudge' an entry so that later definitions can replace earlier ones. Bit 6 is reserved. And bit 7 indicates immediate execution rather than compilation. The name field may be 2 to 32 bytes long.

CFA = Code Field Address: pointer to actual code to execute. For assembler (code) definitions this is usually the beginning of the next cell. (The mathematical operators at the start of the dictionary are typical.) For colon definitions this points to the colon run-time code.

PFA = Parameter Field Address: the parameters needed by this definition. For code definitions this would be machine code. For colon definitions this would be a list of CFAs that make up this word's definition (description). For constants and variables this would be actual data.
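
The bit layout of the name field's first byte can be expressed as three small words. These are hypothetical helpers written for illustration, not words from the kernel; each expects the address of the count byte (the NFA) on the stack.

```forth
\ 0= 0= turns any non-zero result into a 1/0 flag
: NAME-LENGTH ( nfa -- n )  C@ 31 AND ;          \ bits 4 to 0: character count
: SMUDGED?    ( nfa -- f )  C@ 32 AND 0= 0= ;    \ bit 5: smudge flag
: IMMEDIATE?  ( nfa -- f )  C@ 128 AND 0= 0= ;   \ bit 7: immediate flag
```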

When using an external compiler the link and name fields may be omitted (entry is considered headerless) to conserve space. Words using these headerless entries will still execute, but the name will not be found for use in future definitions. Alternatively, to save space, the headers may be placed in a separate data segment during compilation: only the final word needs to be able to be found to execute an intricate program.

In the original specification of Forth higher level definitions consisted of either code or colon types. There are Forth-like systems that provide for inline code in colon definitions. This construct has rarely been advantageous in a true Forth system; the increased size and complexity outweighs any speed gains.

CONTEXT and CURRENT are double variables (32-bits) that contain lists of VOCABULARIES. A vocabulary is designated by a nibble, 1-15, with null being 'none.' A 32-bit variable has 8 nibbles and thus may designate up to 8 vocabularies. The dictionary hash function uses the vocabulary designator and the 7-bit ASCII value of the first letter of the word to reduce the search to only one of the 16 chains. The least significant nibble of CURRENT specifies where new words are to be compiled. CONTEXT specifies which vocabularies and in what order they are to be searched for words making up the current word being compiled.

This implementation follows the extended FIG model for preventing inadvertent tampering with the kernel. There are two arrays that describe the dictionary. One, GOLDEN, contains the variables that map the protected portion of the dictionary. The second array, HEADS, contains the variables that map the working dictionary. The dictionary may be returned to its golden state by using the word EMPTY. Alternatively, the FENCE can be moved to the current working position with the word PROTECT.

This implementation was written to take advantage of many of the input and output functions available in an IBM compatible BIOS. All keyboard entry and screen output, specifically, goes through the BIOS. The main OS (PC DOS or MikeOS) is used to gain access to operating files on the disk.

Two of the most basic and important operations in the kernel are separating a word from the input stream and finding a word in the dictionary. Many other operations cannot be completed unless these two function properly. To separate a word the system needs a delimiter. The most common delimiter is space (or BL). Generally, with the input stream pointer set, 'BL WORD' will separate the next word from the stream and transfer the string to HERE + 2. This sets it up to compile a new word -- link field goes at HERE -- or place a literal string in the dictionary -- CFA of defining word goes HERE. Although there are standards, many programmers (as does the author) prefer non-standard stack conditions for dictionary searches. In this implementation 'FIND' is left headerless to prevent confusion.

Most macro processors would have difficulty building the dictionary hash lists during the assembly of the code. This system uses a slightly different approach: the NASM macro processor builds one long chain during the assembly, then the Forth start-up code splits the single chain into the desired hash lists. This can take a significant amount of time on an 8-bit, 1 MHz microprocessor, but is not noticeable on modern processors. All words in the initial code must be in the FORTH vocabulary. The start-up penalty can be avoided by saving the code, as modified by the start-up process, as an appropriate '.bin' or '.com' executable. The Forth word 'write_exec <file-name>' will do this for the user. Note that 'write_exec' is one of the few Forth words in lower case; this helps prevent inadvertent writes to the disk.

To do any serious work with a language it should be possible to develop source as a text file and load it into the system. Preferably, a way to store the updated information would also be available. The 'write_exec' word of this implementation provides a unique way to do the latter; once the source is loaded and compiled by the Forth kernel a new executable can be written to disk, which includes the newly compiled code. To 'seal' the code it is only necessary to tell the start-up code to go to a word that will neither exit nor abort. The former desire is met by the word 'INCLUDE <file name>'. As an example GEN.4TH is included in this package. At the Forth blinking cursor type INCLUDE GEN.4TH <Enter>. GEN.4TH looks like a normal text file and may be opened with any text processor. The 'write_exec' word is contained within this file so that the new executable is generated, as well. [INCLUDE cannot currently use nested disk access, i.e. the first INCLUDEd file cannot have an INCLUDE in its script.]
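
As a rough sketch of what a source file for INCLUDE might contain, consider the following. The file name SQUARES.4TH and the words in it are hypothetical and are not part of the package; GEN.4TH remains the reference example.

```forth
\ SQUARES.4TH - hypothetical example source file
: SQUARE  ( n -- n*n )  DUP * ;
: SQUARES ( -- )  6 1 DO I SQUARE . LOOP ;
```

With such a file on the disk, typing INCLUDE SQUARES.4TH <Enter> should compile both words, after which SQUARES <Enter> prints the squares of 1 through 5.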



Extra

Help

If you have any questions about MikeOS, or you're developing a similar OS and want to share code and ideas, go to the MikeOS website and join the mailing list as described.


License

MikeOS is open source and released under a BSD-like license (see doc/LICENSE.TXT in the MikeOS .zip file). Essentially, it means you can do anything you like with the code, including basing your own project on it, providing you retain the license file and give credit to the MikeOS developers for their work.