For most languages, implementing a language interpreter in Java is perhaps the most straightforward method of porting that language to the JVM. This approach was used by the Tcl and Python ports. Since there are a number of different compilers that can convert Java source into JVM bytecode, an interpreter written in Java can run easily on the JVM.
When a program from the source language needs to be run on the JVM, this new interpreter must take as input the source program, as well as the input that the source program is expecting. An eval construct using these two input sets is then invoked, and in that manner, the source program is run.
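As a minimal sketch of this arrangement, the following Java class interprets a hypothetical stack-based toy language. The language, its instruction names, and the `ToyInterpreter` class are all assumptions made for illustration; they are not taken from the Tcl or Python ports. The point is only that the `eval` method receives two inputs: the source program and the input that program expects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Scanner;

/** Interpreter for a hypothetical stack-based toy language (illustration only). */
public class ToyInterpreter {

    /**
     * Runs the source program against the given program input and
     * returns everything the program printed.
     */
    public static String eval(String program, String input) {
        Deque<Long> stack = new ArrayDeque<>();
        Scanner in = new Scanner(input);        // the input the source program expects
        StringBuilder out = new StringBuilder();

        for (String line : program.split("\n")) {
            String[] words = line.trim().split("\\s+");
            switch (words[0]) {
                case "":      break;                                      // blank line
                case "push":  stack.push(Long.parseLong(words[1])); break; // literal operand
                case "read":  stack.push(in.nextLong()); break;            // next input value
                case "add":   stack.push(stack.pop() + stack.pop()); break;
                case "mul":   stack.push(stack.pop() * stack.pop()); break;
                case "print": out.append(stack.peek()).append('\n'); break;
                default:
                    throw new IllegalArgumentException("unknown instruction: " + words[0]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy program: read two numbers, add them, double the sum, print it.
        String program = "read\nread\nadd\npush 2\nmul\nprint";
        System.out.print(eval(program, "3 4"));   // prints 14
    }
}
```

Because the interpreter is ordinary Java, any Java compiler can turn it into JVM bytecode, and the toy program itself is never compiled at all.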
This approach has a number of advantages. First, if the source language has a well-written specification, or is a language with few constructs based around a single paradigm (e.g., a relatively pure object-oriented or functional paradigm), then implementing an interpreter for the language is often a simple matter of implementing the specification. Design issues are often already decided by the specification or by the paradigm, greatly easing the burden on the implementor.
A second advantage is that real-time, on-the-fly code evaluation (i.e., eval($string)) is always available. The Java program that implements the interpreter simply needs to instantiate a new instance of the interpreter, and feed it $string as input.
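The sketch below illustrates one way such an eval construct might look. It assumes a hypothetical line-oriented toy language with `set`, `print`, and `eval` statements; the `EvalInterpreter` class and the `evalString` helper are invented for this example. When the interpreter meets `eval`, it instantiates a new interpreter and feeds it the string held in the named variable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy language with an eval statement (illustration only). */
public class EvalInterpreter {
    private final Map<String, String> vars = new HashMap<>();
    private final StringBuilder out;

    public EvalInterpreter(StringBuilder out) {
        this.out = out;
    }

    /** Interprets the source text one line at a time. */
    public void run(String source) {
        for (String line : source.split("\n")) {
            String[] parts = line.trim().split(" ", 2);
            String rest = parts.length > 1 ? parts[1] : "";
            switch (parts[0]) {
                case "":
                    break;                                   // blank line
                case "set": {                                // set <name> <text>
                    String[] nameAndValue = rest.split(" ", 2);
                    vars.put(nameAndValue[0], nameAndValue[1]);
                    break;
                }
                case "print":                                // print <text>
                    out.append(rest).append('\n');
                    break;
                case "eval": {                               // eval <name>
                    String code = vars.get(rest);
                    if (code == null) {
                        throw new IllegalArgumentException("undefined variable: " + rest);
                    }
                    // On-the-fly evaluation: a new interpreter instance is fed the string.
                    new EvalInterpreter(out).run(code);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown statement: " + parts[0]);
            }
        }
    }

    /** Convenience helper: runs a program and returns what it printed. */
    public static String evalString(String source) {
        StringBuilder out = new StringBuilder();
        new EvalInterpreter(out).run(source);
        return out.toString();
    }

    public static void main(String[] args) {
        // The code to evaluate exists only as a string at run time.
        System.out.print(evalString("set code print hello from eval\neval code"));
    }
}
```

In this simplified sketch the nested interpreter starts with an empty variable table and shares only the output buffer. An eval that must see the enclosing program's variables would need to share more state than is shown here.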
However, this approach has two disadvantages, one of which is particularly problematic for a Perl port. The first disadvantage is speed. Since hardware devices that have JVMs on a chip are only a subset of the useful deployments of the JVM, software implementations of the JVM must also be considered. When a JVM is implemented in software, JVM bytecodes are typically interpreted by that software. Thus, as Per Bothner notes, ``if your interpreter for language X is written in Java, which is in turn interpreted by a Java VM, then you get double interpretation overhead''. Such a situation is unacceptable for Perl, which has always prided itself on speed.
Another disadvantage, one that might be acceptable for some languages but is completely unacceptable for Perl, is code divergence. If a language has a well-defined specification that describes precisely the syntax and semantics of the language, code divergence is not an issue, because every implementation must adhere to the specification. However, it has often been noted in the Perl community that ``the specification is the implementation''. The community cannot tolerate divergent implementations. Indeed, much work was done in the mid-1990s to stop the divergence of the Microsoft and Unix-like Perl implementations.
While the Perl community has plans to change this approach, by developing a language specification for newer versions of Perl, this work is still speculative. In addition, a good port of Perl to the JVM should support older versions as well as newer ones.
Therefore, if this interpreter approach were to be taken for current versions of Perl without diverging from the existing implementation, it would require compiling perl, the existing C implementation of Perl, with a C compiler targeted to the JVM. Experimental compilers of this nature do exist, but they are far from ready for production. In addition, such a port of Perl would undoubtedly be slower than any of the other approaches. Indeed, given the relatively large size of perl, such a port would most likely be completely inappropriate for JVM implementations embedded in small hardware devices or those embedded in larger software programs.
Therefore, simply waiting for a C compiler to be targeted to the JVM is not a reasonable approach for porting Perl to the JVM. Other methods must be investigated and attempted.
Copyright © 2000, 2001 Bradley M. Kuhn.
Verbatim copying and distribution of this entire thesis is permitted in any medium, provided this notice is preserved.