Static type-safety of Lava, related work

Before we can proceed to the outline of a proof of the static type-safety of Lava we have to introduce a number of terms, to clarify several pertinent notions, and to explain a number of general restrictions that we impose on the admissibility of assignments in order to achieve static type-safety for Lava.

1. Semantics of "type conformance"

In the presence of multiform types and type parameters the semantics of the statement "object x conforms to type Y" has to be reconsidered.

For instance, if x has been conceived and created as a List[Car], is it then admissible to assign x to a variable vList whose type is List[Vehicle]?

On the one hand, if you look only at the structure of x then x could clearly be re-interpreted as a List[Vehicle]. In this sense x conforms to the type List[Vehicle].

On the other hand we want to be able to apply all methods of a List[Vehicle] to vList . E.g., we would like to append any Vehicle to a List[Vehicle]. But if we would append a Bicycle to vList (after assigning x to vList) then the value of vList would no longer be a List[Car]: an obvious violation of the originally intended type of the object x.

So we may opt between two possibilities:

  1. Either we insist that assignment of an object x to a variable vList should imply that everything that we may do with a variable of vList's type (here List[Vehicle]) may also be applied to vList after the assignment of x,
  2. or we allow the assignment of a List[Car] or any other object of a type derived from List[Vehicle] to vList but prevent every potentially dangerous usage of vList by rigorous static type-checks.

In the first case we could not assign a List[Car] to vList, but the message send "vList.add(aBicycle)" would pass the static type-check.

In the second case a List[Car] could be assigned to vList, "vList.add(aBicycle)" would be rejected by the static check, but other interesting message sends would become possible and statically acceptable, for instance "vList.f(a1,...,an)", where "f" is any member function whose formal parameter types do not depend on any virtual types.

Lava does not make a one-sided decision in favor of one of these two alternative views but leaves the choice to the programmer for every single variable and VT declaration. "Subtype-substitution forbidden" is the default in Lava (if the sub-type isn't derived in the classical sense but by specialization of virtual types). It may be allowed explicitly for individual occurrences of a type T. The name of the type is enclosed in braces: {T}, and we speak of the "open" or "substitutable type" {T} corresponding to the "closed" type T. {T} stands in a sense for the set of types that can be c- or p-derived from T. Cf. our substitutable type samples.

The consequences of this refined notion of subtype-substitutability are considerable: On the one hand it enables more fine-grained and effective static type-checks, since it prohibits potential sources of type errors, like "vList.add (aBicycle)" above, that otherwise could be detected only at run time.

On the other hand it complicates the recognition of the admissible cases: In the presence of virtual or multiform types and in the absence of full subtype-substitutability you typically have to make sure that the same "effective pattern context" (EPC, see below) applies to the source and target side of assignments involving virtual or other multiform types (cf. section 4.1). So you have to determine and to compare the EPCs of the affected variables or expressions. In some cases you have to check whether an EPC is "static". The following sections deal with these problems and end up in a more detailed specification of assignment compatibility in Lava.

2. Classic vs. pattern-induced derivations

Pattern-induced derivation:

If vt1 is a virtual type of pattern P1 and vt2 overrides it in pattern P2 then vt2 is said to be "p-derived" from vt1. Likewise, if the value of vt1 is a concrete class A1 and A2 is the corresponding derived class in P2 then A2 is said to be p-derived from A1. We speak of a pattern-induced derivation or p-derivation in this case.

Classic derivation:

In all other cases, particularly in the absence of virtual types, or in the case of class derivations within the same pattern, we speak off "classic", or class, or c-derivations. (A single class is also considered to be c-derived from itself.)

3. Static vs. effective types, pattern contexts

First, we have to clarify which type restrictions have to be obeyed by run time objects in the presence of multiform types:

If an object is the value of a constant then its type is determined by the type of the constant and the implementation of the language guarantees that such an object has the correct type.

If the object is the output of the built-in object creation expression "new <type>" then the pattern context of the "receiving" variable to which the new object is assigned is applied on object creation.

In all other cases the object can be viewed as the value of some variable var (having some declared type):

If the declared type T of a variable var is multiform and depends on virtual types then the actual run time meaning of T depends on the actual pattern context that has to be applied at run time at this place where var is referenced.

Therefore, we have to lay down which pattern context shall take effect in such cases. Let us define several auxiliary notions before:

Static type ST:

The static type ST(expr) of an expression occurring in executable Lava code is defined as follows:

Note 1: In all cases the ST of an expression is reduced to the ST of a variable, viz. the (only) formal output parameter of the top-level function or operator of the expression. Therefore it is sufficient to consider the static (and also the effective) type of variables in the following. The same applies to ET, LPC, EPC, ... to be defined below.

Final value FV and final virtual value FVV:

The value of a virtual type may be another virtual type, etc..

FV(T) = if T is non-virtual then T else FV(VAL(T)) endif.

FVV(T) = if T is non-virtual then T
elsif VAL(T) is non-virtual then T
else FVV(VAL(T))
endif.

NULLPC and lexical pattern context LPC:

The special pattern context NULLPC is used to express that we are outside any pattern.

The LPC of a class interface ITF or of an initiator INI is the pattern that contains the declaration of ITF or INI (and possibly comprises ITF), or it is the "empty" context NULLPC if ITF or INI is not contained in any pattern).

The LPC of any piece of executable code that is contained in a member function or invariant of a class interface ITF is the pattern that contains (and possibly comprises) the declaration of ITF (or the "empty" context NULLPC if ITF is not contained in any pattern).

For an initiator INI the LPC is the pattern containing the initiators declaration (or the "empty" context NULLPC if INI is not contained in any pattern).

Self context SELFPC of a member function:

SELFPC is a special symbolic pattern context that stands for the pattern context of the "current call object", i.e., the (run time) self/this object of the respective member function to which SELFPC pertains. Although it can be effectively determined only at run time it plays an important role as a means to express that the types of two variables are interpreted using the same pattern context, viz. the SELFPC.

Effective pattern context EPC:

A multiform type is a type whose declaration is contained in a pattern. It may (but need not) depend on the virtual types of the pattern and thus will become a real, effective type only from the perspective of a concrete pattern context that is effective in the context of a concrete variable reference. We speak of a "static" EPC if it is the LPC of some variable or the NULLPC and thus can be determined already at check time (rather than run time).

(L) EPC of local variables:

We distinguish three cases as to the nature of the static type ST(var) of a local variable var:

(L1) ST(var) is non-multiform.

Then EPC(var) = NULLPC ("EPC is static").

(L2) ST(var) is multiform and contained in the LPC(var).

Then if var is referenced from within a local or member function or invariant
then EPC(var) = SELFPC
else EPC(var) = LPC(var) ("EPC is static").

In particular: EPC(self) = SELFPC.

Note: If var is a local variable of a member function having the "callable only from this VT context" attribute then EPC(var) = SELFPC = LPC(var) ("EPC is static").

(L3) ST(var) is multiform and its declaration is not contained in the LPC(var).

Then EPC(var) = NULLPC ("EPC is static").

(F) EPC of formal parameters:

We distinguish four cases as to the nature of the formal parameter var:

(F1) var is a formal parameter of a member function of some class interface ITF
or of a local function of the implementation of ITF
and its declared type ST(var) is contained in LPC(ITF).

Then EPC(var) = SELFPC.

Note 1: From the "interior" perspective of a function, we cannot say any more about SELFPC. However, from the caller's point of view EPC(var) = SELFPC is the same as the EPC(callvar.mem) of any member variable mem of the variable callvar with that the function is called.

Note 2: If var is a formal parameter of a member function having the "callable only from this VT context" attribute then SELFPC = EPC(var) = LPC(var) ("EPC is static").

(F2) ST(var) is not contained in LPC(ITF).

Then EPC(var) = NULLPC ("EPC is static").

(F3) var is a formal parameter of an initiator INI
and its declared type ST(var) is contained in LPC(INI).

Then EPC(var) = LPC(INI) ("EPC is static").

(F4) ST(var) is not contained in LPC(INI)

Then EPC(var) = NULLPC ("EPC is static").

(M) EPC of member variables:

Assume that var is a member variable a1.a2. ... .an (with n>1). Then a1 is a local variable or a formal parameter. (Note that a member x of the "self/this" object is designated by "self.x" in Lava.) The EPC of such a variable is then defined recursively as follows:

If EPC(a1. ... .an-1) = NULLPC
then EPC(a1. ... .an) = LPC(ST(an-1)) ("EPC is static")
else if ST(an) is contained in EPC(a1. ... .an-1)
then EPC(a1. ... .an) = EPC(a1. ... .an-1),
and EPC(a1. ... .an) is static if EPC(a1. ... .an-1) is static
else EPC(a1. ... .an) = NULLPC ("EPC is static").

Mapped virtual type MVT:

The function MVT maps a given virtual type vt1 contained in some pattern dp1 to its overridden counterpart vt2 in a pattern dp2 that has been derived from dp1:
MVT(vt1,dp2) = vt2,
provided that vt1 has been overridden by vt2 in dp2; otherwise we have
MVT(vt1,dp2) = vt1.

If the multiform but non-virtual type pt is contained in pattern dp1 then MVT(pt,dp2) causes pt and the types of all its member variables to be interpreted in the context of pattern dp2: The effect on the type of a member variable is the same as if we had a variable a1 of type pt in pattern context EPC(a1) = dp2 and would interpret the type of a member a1. ... .an in the context EPC(a1. ... .an-1). Note that MVT(pt,dp2) need not be declared explicitly in dp2 (by overriding pt) but may be an unnamed, implicit type.

For formal reasons we define MVT also for uniform types npt and arbitrary patterns p as follows:
MVT(npt,p) = npt.

Note: If the value of a virtual type vt1 is again a virtual type vt2 and vt1 has been overridden by vt1m in a derived pattern dp then the Lava override mechanism automatically guarantees that

This in turn enforces the validity of the following equation in this case:

MVT(VAL(vt1),dp) = MVT(VAL(MVT(vt1,dp)),dp).

In other words: The order in which you perform the following operations does not matter:

(A) You first proceed to the value of vt and then map the latter to the derived pattern dp.

(B) You first map vt to the derived pattern dp, then proceed to the value of the mapped VT, and finally you map it to the derived pattern, if it has also been overridden. (If not, then the final MVT operation has no effect.)

The effective type ET:

If the static type ST(var) of a variable var is not multiform then ET(var) = ST(var).

If ST(var) is multiform but non-virtual then ET(var) = MVT(ST(var),EPC(var)).

If ST(var) is virtual then we have to compute its final virtual value FVV, which may require a further mapping into EPC(var), before you can take the final value of the result. These three operations can be combined into a single formula as follows:

ET(var) = FV(MVT(FVV(ST(var)),EPC(var)))

which also applies to the two before-mentioned cases since MVT, FVV, and FV have been properly defined for the "degenerate" cases of uniform and non-virtual types.

ET(self) is the effective type of the variable "callvar" with that the function containing the "self" reference has been called.

Note 1: It can be concluded from sections 2.3 and 3.2.2 that the effective type ET(self) of "self" can in fact be computed from ST(self) in the same way as described above for all other kinds of variables. (The decisive argument is that if ST(self) is multiform then in pattern EPC(self) there is essentially no other type than MVT(ST(self),EPC(self)) that qualifies as the type of callvar).

Note 2: If EPC(var) = NULLPC then ET(var) = FV(ST(var)), i.e., the declared values of the virtual types apply.

Note 3: If a concrete object obj is created using the "new <type>" expression then ET(obj) = type. If obj is the value of a constant then ET(obj) is the type of the constant, for instance Integer for the constant 123.

4. Type-Checking Rules for the Admissibility of Assignments

Type-checking has to do with assignments. Parameter passing to a function or from a function must also be considered as a special kind of assignment: An actual input parameter is assigned to the corresponding formal input parameter, a formal output parameter is assigned to a receiving variable.

In the following section 4.1 we assume that all types are "closed", non-substitutable types. The case of "open", substitutable types is treated in section 4.2.

The value of a virtual type may be another virtual type, which again has a value, etc.. In the following the term "indirect value" designates any value that one encounters when backtracking this sequence of values.

4.1 Type-Checking in the Absence of Subtype-Substitutability

An expression "expr" may be assigned to a variable "var" if

  1. Both ST(var) and ST(expr) are non-virtual

    and ST(expr) is c-derived from ST(var)
    and if both are multiform then EPC(var ) = EPC(expr),
  2. or both are virtual types

    and EPC(var ) = EPC(expr)
    and they are equal,
    or one of them overrides the other (directly or indirectly),
    or one is the immediate
    or an indirect value of the other
    or of an override of the other,
  3. or one of ST(var) and ST(expr) is virtual and the other is non-virtual

    and the EPC of the virtual side is "static" (see the definition of EPC above)
    and (if the ST of the non-virtual side is multiform then EPC(var ) = EPC(expr) )
    and ET(expr) is c-derived from ET(var).

In the third case ET(expr) and ET(var) are statically computable (at "check time") since the EPCs of both sides are static then, and thus we can check statically whether ET(expr) is c-derived from ET(var). Taking this into account, we may state that all of the foregoing three conditions can be evaluated statically.

The third case covers in particular situations where a virtual type with "closed bound" ([16] p. 189) would be definitely undesirable whereas the "callable only from this VT context" mechanism of Lava is still applicable and acceptable.

For simplicity's sake we have omitted the trivial rule that every type is assignment compatible with the common base type "Object" of all types.

The Lava programming environment LavaPE makes it very easy for the programmer to obey these rules since references to types are not keyed in as textual identifiers in Lava but selected from proper type-selection combo-boxes whose contents would depend on the current insertion point. These combo-boxes simply will not offer inadmissible types for selection in the foregoing situations.

4.2 Type-Checking in the Presence of "Substitutable" Types

Again we consider the question whether an expression expr may be assigned to a variable var, but this time we assume that the type of at least one of these is declared "substitutable" explicitly.

An expression expr may be assigned to a variable var if

  1. ST(var) = {Tvar} is substitutable
    and ST(expr) is some Texpr or {Texpr}
    and Texpr is c-derived or p-derived from Tvar.

The safeness of this kind of assignments is obvious. The actual problem is,

  1. The first problem may cause a premature termination of the context-tracking process for var, and thus EPC(var) counts as "undefined". In this case our static checks have to reject all assignments whose left-hand or right-hand side is var, and for a message send "var.func(ap1,...,apn)" the assignment compatibility of actual and formal parameters cannot be assessed, either, and must yield a negative result therefore, except if actual and formal parameter types are declared outside any pattern context: then the rules of section 4.1 may be applied.
  2. If the first problem does not occur but var's type {Tvar} is substitutable then it is still impossible to statically determine a fixed call context for the above message send. Since we insist on purely static type-checking, we simply reject such a message send if its actual or formal parameter types are contained in the pattern context P of Tvar or in a context derived from P. Otherwise we attempt to apply rule (a) above or the rules of section 4.1.

5. Semantics of "type-safety"

Type-safety has to do with assignments. An assignment has a left-hand side, which is a variable, and a right-hand side, which may be a variable, a constant, or an operator or function expression.

Now the static type-safety of Lava can be stated as follows:

6. Proof outline: Lava is statically type-safe

The places where the run time system injects objects into the program must be made type-safe by the run time system: Constants, built-in creation, copy, and cloning functions, other built-in operators and functions, component interfaces. Likewise, all cases of missing initialization of variables or exceptional operations leading to undefined results have to be detected by the run time system and will never cause type violations.

Now assume that we had a "bad", non-type-safe Lava program that performs a "bad" assignment A. If the right-hand side of A is ok, i.e., the object delivered by the right-hand side expression conforms to the type of that expression, then we stay with A. Otherwise we can prove that there yet must exist a "bad" assignment whose right-hand side is ok:

We backtrack the data flow, beginning with A, by walking from the left-hand side (lhs) to the right-hand side (rhs) of A and of all predecessor assignments until we arrive at a "bad" assignment whose rhs is ok.

This is guaranteed to occur: Either we arrive at a "good" assignment after a "bad" assignment; then both of its sides are ok and its lhs is the rhs of its predecessor assignment in that sequence, which thus would be a "bad" assignment whose rhs is ok.

Or we never arrive at a "good" assignment. But then we must finally end up after finitely many steps at a "bad" assignment whose rhs is a constant, built-in creation function, etc. (see above), which we have assumed to be ok.

So in any case we can find a "bad" assignment A from an expression expr to a variable var whose right-hand side expr is ok. The run time type of the result object obj of expr conforms to the effective type ET(expr) since the rhs of our "bad" assignment is ok. In the following it suffices to prove that ET(expr) is c-derived from ET(var). We may conclude then that the run time type RT(obj) conforms also to ET(var).

Since A must nevertheless conform to the assignment compatibility rules of section 4.1 we can distinguish the following cases:

Case 1: Both ST(var) and ST(expr) are non-virtual
and ST(expr) is c-derived from ST(var)
and if both are multiform then EPC(var ) = EPC(expr).

(A) Assume that not both of ST(var) and ST(expr) are multiform.

If ST(var) were multiform then ST(expr) would be multiform, too, since c-derivation from a multiform base type is only allowed within the containing pattern and thus yields a multiform derived type. So we may conclude that ST(var) is uniform. Hence ET(var) = ST(var).

If ST(expr) is uniform, too, then ET(expr) = ST(expr) and thus ET(expr) is c-derived from ET(var).

If ST(expr) is multiform then ET(expr) arises from ST(expr) by replacing the values of certain virtual types on which ST(expr) depends. Nevertheless the result ET(expr) of this mapping still "contains" ST(var) as a base type. Hence ET(expr) is c-derived from ST(var) = ET(var), what had to be proven.

(B) Assume both ST(var) and ST(expr) are multiform (but non-virtual).

Then EPC(var) = EPC(expr). ST(expr) is c-derived from ST(var) and therefore just adds a number of additional members to ST(var). ET(var) arises from ST(var) by replacing the values of certain virtual types on which ST(var) depends. Transition from ST(expr) to ET(expr) applies the same replacements to the ST(var) base and retains the additional features of ST(expr). So ET(expr) is still c-derived from ET(var),.

Case 2: Both ST(var) and ST(expr) are virtual
and EPC(var ) = EPC(expr)
and they are equal, or one of them overrides the other (directly or indirectly), or one is the immediate or an indirect value of the other or of an override of the other.

(A) Assume ST(var) = ST(expr):

In this case EPC(var ) = EPC(expr) implies that ET(var) = ET(expr), and consequently ET(expr) is c-derived from ET(var).

(B) Assume that ST(var) overrides ST(expr) directly or indirectly:

(An analogous argumentation applies to the opposite case.)

Then we may conclude from EPC(var ) = EPC(expr) that MVT(ST(var),EPC(var)) = MVT(ST(expr),EPC(expr)) is true, i.e., ST(var) and ST(expr) are mapped to the same virtual type in EPC(var). Consequently their effective types ET(var) and ET(expr) are equal, too.

(C) Assume that ST(var) is the immediate or an indirect value of ST(expr) or of an override of ST(expr):

(An analogous argumentation applies to the opposite case.)

A quite similar argumentation as in (B) leads us to the conclusion that MVT(FVV(ST(var)),EPC(var)) = MVT(FVV(ST(expr)),EPC(expr)), and hence ET(var) = ET(expr),.

Case 3: One of ST(var) and ST(expr) is virtual and the other is non-virtual
and the EPC of the virtual side is "static" (see the definition of EPC above)
and if the non-virtual side is multiform then EPC(var ) = EPC(expr)
and ET(expr) is c-derived from ET(var).

In this case nothing remains to be proven, in fact, but what matters is the observation is that both ET(var) and ET(expr) can be computed at check time under these conditions.

7. Related Work

Since a few years there is a broader discussion on "generic and virtual types" and particularly on desirable Java enhancements concerning genericity. A particular outcome of this discussion was a notion of "virtual type" that is based on the following idea: A virtual type is described by a special kind of member variables of classes. The value of such a virtual type member is a concrete (conventional) type or defined (in more recent approaches like [16] below) by a recursive type expression that may involve this same or other virtual types.

In order to reference such a virtual type in variable declarations you need an object of the respective class containing this virtual type field. (In most cases the "this" object is used, which may be omitted usually, or the examples make use of "inner classes" in the specific sense of Java and thus inherit the virtual type fields of the containing "outer" class.)

Lava takes a rather different approach, primarily for the following reasons:

  1. We felt that the type of a variable should be declared in a more static way. It is a rather crude disruption of tradition that should be avoided if you have to provide some executable code and to produce some object having virtual type fields before you can declare the types of further variables of your actual application.
  2. We preferred to maintain the strict traditional view on "type conformance" (type = class describing the mandatory attributes and methods that may be applied to every object conforming to that type), while other proposals, for instance [16], seem to allow assignments from List[Car] to List[Vehicle]; cf. section 1.
  3. The special relation of inner classes to their container classes in Java has been invented in the likewise special context of the revised event handling / callback concept of Java 1.1. We have a different callback concept in Lava that doesn't need inner classes in the sense of Java.

    Packages with virtual type parameters appeared to be more appropriate for the representation of patterns consisting of several interrelated classes (like model/view frameworks).
  4. It is a very undesirable restriction if you have to declare a virtual type's bound to be "closed" (= non-overridable) in order to insert a constant or a newly created object into a structure that expects this object to be of the respective virtual type.

    Example: You want to append a "new String" object to a List[String]. The List[String] pattern should nevertheless remain open for further specializations of the String parameter.

The most recent developments w.r.t. Java genericity enhancements (cf. [3], [6]) seems to center exclusively around modernized versions of "parametric polymorphism" (a successor concept to C++ templates in a sense), which, unlike virtual types, is not particularly suited for coping with collective specialization of families of interrelated classes.

In our section on counter-counterexamples we show that the examples in reference [4] below that are presented there in order to demonstrate inherent flaws of virtual types don't cause any problems in Lava. Therefore we are still convinced that virtual types represent the more advantageous, promising and modern approach to genericity in object-oriented languages.

References

  1. Agesen, O., Freund, S., Mitchell, J.: Adding parameterized types to Java. Proceedings OOPSLA'97.
  2. Bank, J.A., Myers, A.C., Liskov, B.: Parameterized Types for Java, Proceedings POPL.97, p.132-145.
  3. Bracha, G., Odersky, M., Stoutamire, D., Wadler, Ph.: Making the future safe for the past: Adding Genericity to the JavaTM Programming Language. Proceedings OOPSLA'98.
  4. Bruce, K.B., Odersky, M., Wadler, Ph.: A Statically Safe Alternative to Virtual Types. Proceedings ECOOP.98, p.523-549.
  5. Bruce, K.B., Vanderwaart, J.: Semantics-Driven Language Design: Statically Type-Safe Virtual Types in Object-Oriented Languages. Proceedings MFPS'99.
  6. Cartwright, R., Steele, G.J.: Compatible Genericity with Run-time Types for the JavaTM Programming Language. Proceedings OOPSLA'98.
  7. Chambers, C., Harrison, B., Vlissides, J.: A Debate on Language and Tool Support for Design Patterns. Proceedings POPL.00, p.277.
  8. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995, ISBN 0201633612.
  9. Gosling, J., Joy, B., Steele, G.: Java Language Specification. Sun Microsystems, 1996, ISBN 0201634511.
  10. Java: http://www.javasoft.com.
  11. Lava: http://lavape.sourceforge.net/Lava.
  12. Meyer, B.: Eiffel: The Language. Prentice Hall Europe, 1992, ISBN 0132479257.
  13. Odersky, M., Wadler, Ph.: Pizza into Java: Translating Theory into Praxis. Proceedings POPL.97, p.146-159.
  14. Stroustrup, B.: The C++ Programming Language, Special Edition. Addison-Wesley, 2000, ISBN 0201700735.
  15. Thorup, K.K.: Genericity in Java with Virtual Types. Proceedings ECOOP.97, p.44-471.
  16. Thorup, K.K., Torgersen, M.: Unifying Genericity (Combining the Benefits of Virtual Types and Parameterized Classes). Proceedings ECOOP.99, p.186-204.
  17. Torgersen, M.: Virtual Types are Statically Safe. 5th Workshop on Foundations of Object-Oriented Languages, San Diego, CA, January 1998.