Complete static initialization checks

Inadvertent use of uninitialized variables or propagation of null values (Ø) to mandatory variables frequently causes programs to crash in ways that are hard and time-consuming to disentangle, which in turn gives rise to immense costs. Preventing such situations has therefore been a fundamental design goal of Lava (see also Principal orientation of Lava).

More precisely, our goal was to prevent missing initialization and inadvertent use of null values by static checks (i.e., at programming time) wherever possible. The Lava releases prior to 0.9 didn't fully achieve this. The introduction of closed objects in release 0.9, together with new expressive means (the ifdef conditional statement and the else expression), has turned Lava into an object-oriented programming language (the first one?) that completely prevents missing initialization and inadvertent use of null values by purely static checks, i.e., already at programming time.

This could be achieved only

  1. by imposing a much more stringent discipline on Lava programs as to where the mandatory member variables of Lava objects are initialized: they must be initialized in the body of an initializer method of the respective Lava class, and
  2. by a major revision of the expressive means for executable code, as compared to more traditional, widespread procedural and object-oriented programming languages.

In particular, we have

A) Missing initialization

Missing initialization has to be prevented with respect to

  1. local variables,
  2. member variables of classes, and
  3. output parameters of function calls.

(Note that global variables have been abolished anyway in Lava in favor of a purely local, explicit data flow, in order to facilitate the analysis and comprehension of data flow in complex programs and to avoid the obfuscating and confusing effects of "far-distance dependencies" within complex applications.)

Before you proceed, you should understand how Lava replaces traditional loop constructs and that this implies the following:

For every read access to a Lava variable, you can easily find the origin of its current value by following the enclosing program branch upward (unless the variable is an input variable and has therefore already been assigned a value outside this function).

Now let's return to the three kinds of initialization problems. Note that LavaPE (1) shows executable code in separate windows, one window for every function body, every initializer body, and every invariant, and (2) performs a static check on all the Lava code currently presented in any open window after every single editing step in the LavaPE structure editors.

1. Local variables

Local variables may be introduced by several different constructs. All of them except the declare construct assign completely initialized objects to the local variables they introduce, so we need to consider only declared variables.

For every read access to a declared local variable, LavaPE checks statically (i.e., at programming time) whether a value has been assigned to this variable above the place of reference in the same program branch, and reports an error otherwise. A somewhat unusual consequence is that a newly declared local variable is initially always displayed in bold red, marking an error, since no value has been assigned to it yet.
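The rule just described can be illustrated by a small Java sketch that checks a single program branch. This is a hypothetical model, not LavaPE's actual implementation: a branch is reduced to a list of declare, assign, and read steps, and the names AssignBeforeReadCheck, Kind, Stmt, and check are illustrative assumptions. A read is flagged unless an assignment to the same variable occurs above it in the branch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AssignBeforeReadCheck {
    // Kinds of steps in one program branch (hypothetical, simplified model).
    enum Kind { DECLARE, ASSIGN, READ }

    record Stmt(Kind kind, String var) {}

    // Walks the branch from top to bottom. A READ is reported unless an ASSIGN
    // to the same variable appears above it, which is equivalent to searching
    // upward from the read within the same branch.
    static List<String> check(List<Stmt> branch) {
        Set<String> assigned = new HashSet<>();
        List<String> errors = new ArrayList<>();
        for (Stmt s : branch) {
            switch (s.kind()) {
                case DECLARE -> assigned.remove(s.var()); // freshly declared: no value yet
                case ASSIGN -> assigned.add(s.var());
                case READ -> {
                    if (!assigned.contains(s.var())) {
                        errors.add("unassigned: " + s.var());
                    }
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Stmt> branch = List.of(
                new Stmt(Kind.DECLARE, "x"),
                new Stmt(Kind.READ, "x"),    // flagged: nothing assigned above
                new Stmt(Kind.ASSIGN, "x"),
                new Stmt(Kind.READ, "x"));   // accepted
        System.out.println(check(branch));  // prints [unassigned: x]
    }
}
```

In this simplified model a single pass per branch suffices because a branch has no back edges; the sketch says nothing about how LavaPE represents branches internally.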

2. Member variables

In order to guarantee that all non-optional member variables of Lava objects are properly initialized, Lava enforces an unusually stringent initialization discipline:

  1. New Lava objects are created using the new expression, which requires the specification of an "initializer" associated with the new object's class.
  2. LavaPE makes sure (by purely static checks) that every initializer assigns a value to every non-optional member variable of this class in each of its branches.
  3. You can modify these values within the optional but clause of the new expression and thereby customize the object before it is completed.
  4. When a new object leaves the new construct, Lava marks it as finished. Lava makes sure (by run-time checks) that you cannot pass an unfinished object (from within an initializer or a but clause) as an input parameter to any function.

    This essentially enforces a strict bottom-up construction of new objects from member objects that have to be constructed first.
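Java does not enforce this discipline, but points 1 to 4 can be approximated in a hypothetical sketch. The names Point, newPoint, sum, and the finished flag are illustrative assumptions: the private constructor plays the role of the initializer, the Consumer argument plays the role of the but clause, and the check in sum mirrors the run-time check described in point 4.

```java
import java.util.function.Consumer;

public class InitDiscipline {
    // Hypothetical Java stand-in for a Lava class with two mandatory members.
    static final class Point {
        private int x;
        private int y;
        private boolean finished = false; // set only when "new" completes

        // Plays the role of a Lava initializer: every mandatory member is assigned.
        private Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        void setX(int x) { this.x = x; }
        int x() { return x; }
        int y() { return y; }
    }

    // Models "new Point(...) but { ... }": the but clause may customize the
    // still unfinished object; afterwards the object is marked finished.
    static Point newPoint(int x, int y, Consumer<Point> butClause) {
        Point p = new Point(x, y);
        butClause.accept(p);
        p.finished = true;
        return p;
    }

    // A function that rejects unfinished objects passed as input parameters.
    static int sum(Point p) {
        if (!p.finished) {
            throw new IllegalStateException("unfinished object passed as input");
        }
        return p.x() + p.y();
    }

    public static void main(String[] args) {
        Point p = newPoint(1, 2, q -> q.setX(10)); // but clause customizes x
        System.out.println(sum(p));                // prints 12
    }
}
```

Calling sum from inside the but clause (for example newPoint(1, 2, q -> sum(q))) throws, which corresponds to passing an unfinished object as an input parameter.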

Two concessions will (to some degree) soften the "pain" caused by such a strict bottom-up construction discipline:

From within an initializer

In more detail, point 2 above means that, starting from every "return point" of the function body, the check follows the respective branch upward and makes sure that a value is assigned to every non-optional output parameter in this branch.

A "return point" may be the end of the function body or a succeed or fail/throw statement. The throw statement specifies an expression whose value designates an exception that is to be thrown.

If a function has output parameters (see below), it must not fail without throwing an exception (checked at programming time). This applies also to initializers of classes.

An exception-throwing throw statement is the only way to exit from a class method or initializer if you cannot assign a value to every non-optional output parameter. (Checked at programming time.)
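The return-point rule can be sketched in Java as follows. This is an illustrative model only: the names ReturnPointCheck, Exit, Branch, and ok are assumptions, and each branch is reduced to the set of outputs assigned on it plus the way it is left. Following the rule stated above, only a branch that throws an exception may leave mandatory outputs unassigned.

```java
import java.util.List;
import java.util.Set;

public class ReturnPointCheck {
    // Ways a branch may be left (hypothetical model of Lava return points);
    // FAIL here means a fail without an exception.
    enum Exit { END_OF_BODY, SUCCEED, FAIL, THROW }

    // A branch, read upward from its return point: the outputs assigned on it.
    record Branch(Set<String> assigned, Exit exit) {}

    // Every return point except an exception-throwing one must have
    // assigned all mandatory outputs on its branch.
    static boolean ok(Set<String> mandatoryOutputs, List<Branch> branches) {
        for (Branch b : branches) {
            if (b.exit() != Exit.THROW && !b.assigned().containsAll(mandatoryOutputs)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> mandatory = Set.of("x", "y");
        // All outputs assigned on one branch; the other leaves via an exception.
        System.out.println(ok(mandatory, List.of(
                new Branch(Set.of("x", "y"), Exit.END_OF_BODY),
                new Branch(Set.of("x"), Exit.THROW))));  // prints true
        // y is never assigned on a normally terminating branch.
        System.out.println(ok(mandatory, List.of(
                new Branch(Set.of("x"), Exit.SUCCEED)))); // prints false
    }
}
```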

3. Output parameters

First, note that Lava doesn't support parameter passing "by reference". In Lava, function parameters are either input or output parameters. For reference parameters it wouldn't be clear whether or not the function assigns a value to them, whereas a Lava function must assign a value to every non-optional output parameter, and this is checked already at programming time (i.e., statically).

This is an essential prerequisite for all kinds of Lava initialization checks, since only then can they also cover cases where a variable is initialized by being an actual output parameter of a function call.

(Note: Although function parameters aren't passed "by reference" in Lava, they are nevertheless passed "by address" and never "by value" (i.e., copied).)

Much like the member initialization checks (see above), the initialization checks for output parameters are performed at all return points of the function body.

An exception-throwing throw statement is the only way to exit from a function if you cannot assign a value to every non-optional output parameter. (Checked at programming time.)

In this way Lava makes sure that the caller of a failing function cannot use undefined outputs inadvertently: this can happen only if the caller catches the resulting exception and nevertheless uses the respective output parameters thereafter.

(The worst that can happen then is either that such an output is Ø, in which case a null-pointer exception will be thrown in turn, or that you use the output although its value may be meaningless in this case. That is your own risk and should be avoided unless you have reliable knowledge of the function's implementation.)
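The remaining run-time risk can be reproduced in Java. Java has no output parameters, so this sketch assumes a hypothetical holder class Out whose null value plays the role of Ø; parsePositive and run are illustrative names. The function leaves via an exception without assigning its output, the caller catches the exception, and the later use of the output then fails with a null-pointer exception.

```java
public class CaughtFailure {
    // Hypothetical holder standing in for a Lava output parameter; null plays the role of Ø.
    static final class Out<T> {
        T value;
    }

    // Stand-in for a function that cannot produce its output and therefore
    // leaves via an exception without assigning it.
    static void parsePositive(String s, Out<Integer> result) {
        int n = Integer.parseInt(s); // may itself throw NumberFormatException
        if (n <= 0) {
            throw new IllegalArgumentException("not positive: " + n);
        }
        result.value = n;
    }

    static String run(String input) {
        Out<Integer> r = new Out<>();
        try {
            parsePositive(input, r);
        } catch (IllegalArgumentException e) {
            // The exception is caught, but the output is used anyway below.
        }
        try {
            return String.valueOf(r.value * 2); // unboxing null throws NullPointerException
        } catch (NullPointerException e) {
            return "output was never assigned";
        }
    }

    public static void main(String[] args) {
        System.out.println(run("4"));  // prints 8
        System.out.println(run("-3")); // prints output was never assigned
    }
}
```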

B) Preventing Ø-to-mandatory assignments

In Lava, an expression that yields an optional result (i.e., one that may assume the special value Ø) cannot be assigned unconditionally to a mandatory variable. Instead, you must either enclose the assignment in the then clause of an ifdef statement (which tests whether the optional variable in the ifdef condition has a non-null value), or use an else expression or a chain of else expressions that finally provides an alternative expression yielding a non-optional value.

Cf. the option types of the Nice language, a Java derivative. Note, however, that in Lava we prefer to attach the optional attribute to variables rather than to types: types actually are object types, and objects aren't optional, whereas variables may be optional, which means that the special null value (Ø) may be assigned to them.
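For readers coming from Java, java.util.Optional offers a rough, type-based analogue (closer to Nice's option types than to Lava's optional variables). In this sketch, ifPresent corresponds loosely to an ifdef statement, and a chain of or(...) calls ending in orElse(...) corresponds to a chain of else expressions whose last alternative is non-optional. The lookup functions fromConfig and fromEnvironment are hypothetical stubs. Unlike Lava, Java does not stop you from calling get() on an empty Optional; that mistake is only caught at run time.

```java
import java.util.Optional;

public class IfdefElseAnalogy {
    // Hypothetical optional lookups; an empty Optional plays the role of Ø.
    static Optional<String> fromConfig(String key) {
        return key.equals("name") ? Optional.of("configured") : Optional.empty();
    }

    static Optional<String> fromEnvironment(String key) {
        return Optional.empty(); // always empty in this sketch
    }

    // Analogue of a chain of else expressions: the final alternative is
    // non-optional, so the result may be stored in a "mandatory" variable.
    static String resolve(String key) {
        String name = fromConfig(key)
                .or(() -> fromEnvironment(key))
                .orElse("default");
        return name;
    }

    public static void main(String[] args) {
        // Analogue of ifdef: the value is used only inside the "then" branch.
        fromConfig("name").ifPresent(v -> System.out.println("then: " + v));
        System.out.println(resolve("name"));  // prints configured
        System.out.println(resolve("other")); // prints default
    }
}
```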


Initialization in Lava compared to C++ / Java / C#

Like Lava's checks for local variables (see above), the definite assignment checks of Java and C# ensure that local variables are assigned before they are used. But that is all they ensure: without an optional/mandatory distinction and a clear distinction between input and output parameters, the analytical capabilities of the compiler cannot be exploited any further. For example, consider the following small Java program:

public class Hello {
    public static void main(String[] args) {
        A a, a2;

        a = new A();
        a2 = a.func();
        System.out.println(a2.toString());
    }
}

class A {
    public int x, y;

    public A func() {
        return null;
    }
}

"a2 = a.func();" assigns null to a2, so "a2.toString()" in the next line will throw a NullPointerException at run time. Yet the Java compiler neither recognizes nor reports this error.

In contrast to this, LavaPE will allow the assignment of Ø (null) to an output parameter of a function only if the corresponding formal parameter is declared optional. The output of func would therefore have to be declared optional, and as a consequence the assignment "a2 <== a.func()" in the main program would be rejected at programming time, since the assignment target a2 is mandatory. If a2 were declared optional instead, then the call "a2.toString()" would be rejected, since the use of the optional variable a2 isn't secured by an ifdef statement or else expression here.

The constructor notions of C++ / Java / C# can only guarantee that an object is initialized in some way: in Java and C#, member variables that aren't explicitly assigned a more meaningful value are set to default values (numbers to 0 or 0.0, references to null, etc.), and in C++ members of built-in types may even be left indeterminate. In most cases this won't prevent the application from crashing or, even worse, from silently delivering faulty results.
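The following Java sketch (the class Account and the method describe are illustrative names) shows that a constructor may leave every field unassigned without any complaint from the compiler; the fields silently receive their default values, and the problem surfaces only when the null field is first used. Declaring the fields final would make javac require an assignment in every constructor, which is the closest Java comes to Lava's initializer rule, but even then nothing prevents assigning null.

```java
public class DefaultValues {
    // Java accepts this constructor although it initializes nothing.
    static class Account {
        int balance;  // defaults to 0
        double rate;  // defaults to 0.0
        String owner; // reference type: defaults to null

        Account() { }
    }

    static String describe(Account a) {
        return a.balance + " " + a.rate + " " + a.owner;
    }

    public static void main(String[] args) {
        Account a = new Account();
        System.out.println(describe(a)); // prints 0 0.0 null
        try {
            System.out.println(a.owner.length());
        } catch (NullPointerException e) {
            System.out.println("failure deferred to first use of owner");
        }
    }
}
```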

In contrast to this, Lava informs the programmer already at programming time if an initializer is missing or an existing initializer fails to initialize all mandatory member variables. The enforced distinction between optional and mandatory variables is of crucial importance in this context.

Moreover, Lava provides an additional way to prevent meaningless/faulty object initialization: You can add "invariants" to class interfaces and class implementations, i.e., logical conditions that must hold true for newly created objects of the respective class (and also after any non-read-only method has been applied to such an object).
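Lava expresses invariants declaratively in class interfaces and implementations. The following Java sketch only imitates their effect by hand, under the assumption of a hypothetical class Range whose invariant is low <= high: the invariant is checked at the end of the constructor and after every modifying method.

```java
public class InvariantSketch {
    // Hypothetical class whose invariant is low <= high.
    static final class Range {
        private int low;
        private int high;

        Range(int low, int high) {
            this.low = low;
            this.high = high;
            checkInvariant(); // must hold for every newly created object
        }

        // Non-read-only methods re-check the invariant after modifying the object.
        void shift(int delta) {
            low += delta;
            high += delta;
            checkInvariant();
        }

        void setHigh(int newHigh) {
            high = newHigh;
            checkInvariant();
        }

        int low() { return low; }
        int high() { return high; }

        private void checkInvariant() {
            if (low > high) {
                throw new IllegalStateException("invariant low <= high violated");
            }
        }
    }

    public static void main(String[] args) {
        Range r = new Range(1, 5);
        r.shift(2);
        System.out.println(r.low() + ".." + r.high()); // prints 3..7
        try {
            r.setHigh(0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints invariant low <= high violated
        }
    }
}
```

In this hand-written version a violated invariant is detected only after the field has already been changed; the sketch does not attempt to roll the object back.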

The "merciless" object initialization discipline of Lava ("objects must be explicitly and completely and solely initialized within initializers") has another highly desirable and beneficial consequence that is perhaps even more important than the complete prevention of hard-to-trace run-time crashes due to missing (or meaningless automatic) initialization:

Initialization of object member variables cannot be scattered any longer in obscure, arbitrary, and inextricable ways over large portions of code.

This makes it much easier to quickly locate the place where a value is assigned to a member or other variable, and thus to understand the data flow and the entire structure of applications: an enormous advantage that cannot be overestimated.

In some cases this may, indeed, force the programmer to redesign the class structure of the respective application. E.g., it may be necessary to subdivide objects into parts corresponding to several very different stages of the application run.

Example:

Consider a tree structure of actual application data objects which is perhaps completely constructed in a first stage of the application. A corresponding widget tree for an external GUI representation might be completely constructed in a second stage, before the sizes and positions (i.e., a layout tree) for the widget tree can be computed in a final third stage.

In a traditional language without a stringent initialization discipline, you would perhaps combine all three pieces of information (actual application data + widget data + layout data), or at least the last two (widget + layout data), in a single object. But you would still need those three stages to complete the object structure successively. The constructor used in the initial object creation would produce only a very incomplete, preliminary object, and the programmer alone would be responsible for making sure that the still uninitialized member variables aren't referenced prematurely.

In Lava this wouldn't work, except perhaps if you declare all those member variables that are to be initialized during the second and third stage to be "optional", although they are not really optional. This would not only be an abuse of the "optional" notion but would entail the rather unpleasant "penalty" that you would have to protect all references to those (not really optional) variables by ifdef or else constructs.

The only appropriate Lava solution would be to separate the application data from the widget data and the latter from the layout data, and to construct an application object tree in the first stage, then a widget tree in the second stage, and finally a layout object tree.

This would by no means be a disadvantage; quite the contrary. We expect that the stringent object initialization discipline of Lava will in many cases lead to a very desirable and potentially advantageous separation of concerns, for instance in our example, if you intend to present the same application objects in two or more different views, either by different widgets or by the same widget tree laid out according to different strategies, e.g., in one view as a left-to-right tree and in another view as a top-to-bottom tree.
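The three-stage separation can be sketched in Java with immutable records, one tree per stage, where every node is complete when it is constructed. All names (AppNode, WidgetNode, LayoutNode, toWidget, layoutTopDown) and the sizing rule of 10 units per label character with a fixed height of 20 are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class StagedTrees {
    // Stage 1: application data; every node is complete when constructed.
    record AppNode(String label, List<AppNode> children) {}

    // Stage 2: widgets, built only from finished application nodes.
    record WidgetNode(AppNode data, int width, int height, List<WidgetNode> children) {}

    // Stage 3: layout, computed only from finished widgets.
    record LayoutNode(WidgetNode widget, int x, int y, List<LayoutNode> children) {}

    // Hypothetical sizing rule: 10 units per label character, fixed height 20.
    static WidgetNode toWidget(AppNode n) {
        List<WidgetNode> kids = n.children().stream().map(StagedTrees::toWidget).toList();
        return new WidgetNode(n, 10 * n.label().length(), 20, kids);
    }

    // One layout strategy (top to bottom): children form a row below their parent.
    static LayoutNode layoutTopDown(WidgetNode w, int x, int y) {
        List<LayoutNode> kids = new ArrayList<>();
        int childX = x;
        for (WidgetNode c : w.children()) {
            kids.add(layoutTopDown(c, childX, y + w.height()));
            childX += c.width();
        }
        return new LayoutNode(w, x, y, List.copyOf(kids));
    }

    public static void main(String[] args) {
        AppNode root = new AppNode("root", List.of(
                new AppNode("a", List.of()),
                new AppNode("bc", List.of())));
        LayoutNode layout = layoutTopDown(toWidget(root), 0, 0);
        LayoutNode second = layout.children().get(1);
        System.out.println(second.x() + "," + second.y()); // prints 10,20
    }
}
```

Because the widget tree never contains layout fields, a second strategy (say, a left-to-right layout) could be added as another function over the same finished WidgetNode tree, without any field having to start out unassigned.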


Summary

The most fundamental differences between Lava and other languages w.r.t. initialization are:

  1. Lava enforces complete initialization of objects within their respective initializers.
  2. Lava provides a way to safely deal with still unfinished objects in the initialization phase.
  3. Lava makes sure that optional-to-mandatory assignments are always secured by the specific constructs ifdef and else.

This stringent and seamless initialization discipline of Lava completely and reliably prevents all kinds of inadvertent access to uninitialized variables or null objects by purely static checks, i.e., already at programming time.

Two further kinds of erroneous data access that can be recognized only at run time remain in Lava:

  1. You may catch an exception thrown by a class method and subsequently ignore the undefined, null state of output parameters that would normally have well-defined non-null values if the exception had not occurred. Access to such an output parameter will in turn trigger a null-object exception.
  2. If you (by mistake / prematurely) finalize/zombify objects (see the base class Object of all Lava classes) and then try to access these, a specific access-to-zombie exception will be thrown.

See also

Unfinished/closed/opaque/quarantined objects

Multi-phase and recursive initialization

Principal orientation of Lava