A nested-procedure programming language to Java translator using object-oriented design patterns.
Abstract
Whereas some programming languages allow procedures (functions, methods)
to be declared and nested within procedures, the Java programming language
does not. This causes complications when constructing a tool to
automatically translate source code written in such languages to Java.
This demonstration shows how a language translator/compiler can
be created using object-oriented design patterns and how the
inner classes feature of the Java
programming language can be used to translate nested procedures in other
languages.
Acknowledgements
This project was initiated at the instruction of
Iwona Wojciechowska
as an assignment for the "Compiler Construction" course at
West Virginia University.
The "Mini" programming language grammar was provided as
part of the course materials.
The demonstration was built using the
CUP Parser Generator for Java
and
JLex: A Lexical Analyzer Generator for Java.
The software is available as licensed under the terms of the
Mozilla Public License.
The distribution file, mini.zip, includes
the Java source code, compiled bytecode, and documentation.
Any further updates to the source code will be released via the
CroftSoft Code Library.
To save the translated Java source code to a file, add an
additional output filename argument to the command:
The
JLex tool
converts
MiniScanner.jlex
to
MiniScanner.java.
This is used to scan the original Mini source code for tokens.
The
CUP tool
converts
MiniParser.jcup
to
MiniParser.java.
This is used to parse the tokens retrieved from MiniScanner
into an abstract syntax tree of
MiniNode objects.
See the
MiniNode package documentation for a listing of the abstract syntax
grammar objects.
Note that in the
"Mini" programming language syntax, a "block" may declare
a "procedure" and a "procedure" contains a "block". Thus,
procedures may be of nested scope.
The final step of creating the abstract syntax tree is to construct
a ProgramMiniNode object.
Within its constructor, the ProgramMiniNode object calls the
checkSemantics(parentMiniNodeStack) method upon itself. Using the
"Composite"
design pattern, semantics are checked throughout the entire tree
by propagating the call to each of the child nodes.
In checking semantics, each MiniNode object must be able to access its
ancestor MiniNodes in the object graph. For example, to determine
whether a variable NameMiniNode has been declared before use, it must
traverse its lineage to determine whether there is a DeclarationMiniNode
or ParameterMiniNode which initially declares it.
To potentially support the use of the "Flyweight" or
Conservator design pattern in the future, MiniNode objects do not
reference their parent MiniNodes. That is, instead of a tree data
structure which is currently used, a directed acyclic graph (DAG) may
be used in the future where each MiniNode, or just its intrinsic data,
may have multiple parents. Thus, a Stack object is passed as the argument
to the checkSemantics() method. Prior to calling this method on any
children, a MiniNode first pushes itself onto the parentMiniNodeStack.
After the method has been called on all of its children with this Stack
as the argument, the MiniNode then pops itself off of the Stack.
In this manner, a MiniNode receives and passes the lineage during the
semantics checking phase without requiring that explicit references be
to the parents be stored within the MiniNode objects themselves.
If there is a semantic error, a SemanticErrorException will be thrown
and parsing will abort. At this time, a SemanticErrorException may be
thrown for any one of the following reasons:
The "Composite" and "Visitor" design patterns are used to generate
code from the abstract syntax tree intermediate representation (IR).
Each MiniNode object implements the method generate(MiniNodeCodeVisitor).
When this method is called the abstract syntax node generates code
for itself including its descendents. Within the generate() method
of a parent MiniNode object, calls are made to generate() method of the
children MiniNodes in the style of the Composite design pattern used
above during semantic checking.
To allow for the translation of the abstrax syntax tree IR to any
arbitrary language of choice, with the decision being made at runtime
without the need to redesign or recompile the IR classes, the "Visitor"
design pattern is exploited.
As an argument to the generate() method, an object that implements
the MiniNodeCodeVisitor
interface is provided. The MiniNodeCodeVisitor interface defines
methods which are capable of generating code for each of the different
MiniNode types. The single argument to each of the methods is an
instance of the specified type of MiniNode which contains the data
and accessor methods required to generate the code.
True to the style of the Visitor design pattern, whenever the
generate(MiniNodeCodeVisitor) method of a MiniNode object is called,
it in turns calls the appropriate generate<MiniNode-type>() method
on the MiniNodeCodeVisitor with itself as an argument.
A concrete implementation of a MiniNodeCodeVisitor is created
to generate the source code for a particular programming language
for each MiniNode class, such as a
JavaSourceMiniNodeCodeVisitor. Since the IR is fairly abstract,
it should be feasible to implement without redesign a
C source code MiniNodeCodeVisitor, a Java bytecode MiniCodeVisitor, or
a platform-specific tuple, assembly, or machine language
MiniNodeCodeVisitor. Thus, by using the Visitor design pattern,
the IR MiniNode classes do not need to be changed to support translation
to a novel programming language nor ported to support code generation
for a new machine. Instead, only the creation and addition of a single
new class, a concrete MiniNodeCodeVisitor implementation for a
particular new language or platform, is needed.
The
JavaSourceMiniNodeCodeVisitor.java implementation translates
a MiniNode IR to Java source code. The constructor takes a
PrintStream argument so that the output may be directed to the
console or a file.
The Mini grammar defines a "to E do S end" loop structure which
loops the statement sequence S a fixed number of times determined
by the evaluation of the expression E determined at initiation.
Since Java does not support this type of loop structure, it is
converted to a "for" loop within the generateDefiniteLoopStatement()
implementation. As the initial evaluation of the expression and
assignment to a "count" variable requires the creation of a new
variable which was not originally in the Mini source code, a new
variable must be created. It must be ensured that this new variable
not conflict with any variable currently in the namespace of the
Mini source code. For this reason, such new variables use underscore
characters which are not used within the Mini syntax for variable
and procedure names. Additionally, as any new variable introduced
in the Java source code must not conflict with any other new variable
introduced, the nextTemp() method is used to generate a new integer
which is guaranteed to be unique to be used as part of the new variable
name.
The Mini grammar defines "read()" and "write()" statements which
read and print a variable number of integers from and to the standard
input and output respectively. As Java does not support a variable
number of arguments to its methods, the generateInputStatement() and
generateOutputStatement() implementations simply call the standard
i/o methods within a loop. I have chosen to inline code in place
for each read() and write() method rather than redirect to a static
Java library for the Mini equivalents to avoid the necessity to
import such a class with each translation or to add additional
methods that did not exist as procedures within the original Mini
source code.
Example translations:
Download
Installation
Example:
mkdir C:\mini
copy C:\windows\desktop\mini.zip C:\mini\mini.zip
C:
cd C:\mini
unzip mini.zip
Execution
Example:
C:
cd C:\mini
java -jar Mini.jar test/Test.mini
The source code, as translated to Java, will be displayed
on your console.
java -jar Mini.jar test/Test.mini test/Test.java
Scanning
Parsing
Generation
Java
Ex1.mini | ==> | Ex1.java |
Ex2.mini | ==> | Ex2.java |
Ex3.mini | ==> | Ex3.java |
NestTest.mini | ==> | NestTest.java |
Note that in the third example, the Mini source code defines a procedure which is then translated into a method in the Java source code. This is fairly simple so long as the Mini source code does not use nested procedures. Note that this version of Mini does not check to see if any of the Mini procedure or variable names are reserved words in Java and will thus need to be renamed.
The fourth example, NestTest.mini, has nested procedures. Procedure "sum3()" is nested within procedure "sum4()" which itself is nested with procedure "average4()". Note that statements within the innermost procedure "sum3()" can call any procedures and access any variables in the encompassing scopes.
The fourth translation, NestTest.java, shows how the semantics of nested procedures can be emulated by using inner classes. Each procedure which contains a nested procedure in translated into an inner class. Where the call to the nested procedure would be, a new instance of the inner class is created and the inner class procedure is then executed.
Java inner classes must not have the same name as any encompassing class. The current version of this translator assigns the inner class the name of the Mini procedure it is translated from which may lead to namespace conflicts. An improvement would be to assign the inner classes new names which were guaranteed to be unique, perhaps using the nextTemp() method described above, and using the procedure names for the inner class methods names without change.
This project demonstrates how a translator/compiler written in
Java may be created using object-oriented design patterns.
Through the run-time selection of a concrete MiniNodeCodeVisitor
implementation, it is suggested that a single abstract syntax IR
for a given language may be used to translate one language into
many other possible languages or different platform-specific
machine code operations without the need to redesign the IR.
Finally, a method of automatically translating the semantics of
nested procedures found in some languages to the Java programming
language, which does not permit nested methods, by using inner
classes was presented.
Conclusion
Feedback
Send your questions and comments regarding this program to
David Wallace Croft.