The TRF House of Software

symboltable blurb

The idea here is to create a set of routines for managing a symbol table for a compiler. A C compiler, for example, might be implemented with several such symbol tables, one for each name space (a structure and a variable can both have the same name, for example).

Symbols in most programming language have "scope rules", which define when the symbol is available to the programmer and when it is not. Typically, "nested scoping" is used, in which new symbols are defined in nested syntactic constructs. When the code ends a particular nesting, those symbols go away, and any symbols of the same name in the outer containing construct become visible again.

These routines implement such a scheme, by creating a separate symbol table for each nesting level, and maintaining these symbol tables on a stack, with the symbol table for the current level at the top. When a nesting level is exited, it is then only necessary to remove the symbol table from the top of the stack.

When writing routines of this type, the programmer is faced with a number of decisions such as "how many levels deep can the stack become?"; "how many symbols can there be in a symbol table?", and so forth. It can be very frustrating, as a user, to hit one of these limits, particularly in a situation where, without sources to the compiler, you're forced into some potentially major reworking of your code.

These routines, therefore, answer all such questions with "no limit". only when memory allocation fails will these routines fail to carry out their functions.

In order to provide maximum flexibility, these routines do not determine what information is kept in a symbol table entry, beyond the symbol name itself. The remainder of the information is provided by a per-symbol user-defined structure which these routines merely carry around on behalf of the user.

Each nesting level's symbol table is kept as a binary tree, keyed on the symbol name, for rapid entry and retrieval. Symbol deletion is rather more difficult, as a consequence, but most programming languages do not even require a deletion facility, and I suspect it's rather infrequently done even in languages that do (I know of no examples).

These routines maintain a stack of symbol tables containing the symbol name and a pointer to a user-supplied value; this value will generally be a pointer to the user's own per-symbol data structure.