Index: /reasoner/performance.tex
===================================================================
--- /reasoner/performance.tex	(revision 227)
+++ /reasoner/performance.tex	(revision 228)
@@ -1,36 +1,42 @@
 \section{Performance Considerations}\label{sectPerformance}
 
-In this section, we discuss some observations made during the revision of the reasoner. We document these observations here as a general explanation of the implementation, but also to avoid that future revisions accidentally violate undocumented performance improvements. Baseline for this discussion is again the version from January 2018. Although the considerations may at a glance appear rather specific, they typically stem from more general Java performance optimization knowledge typically hidden behind constraints, reasoner or model data structures.
+In this section, we discuss some observations made during the revision of the reasoner and the preparation of this report. We document these observations here as a justification for implementation decisions, but also to prevent future revisions from accidentally undoing undocumented performance improvements. The baseline for this discussion is the version from January 2018. Although some considerations may appear rather specific, they typically stem from more general Java performance optimization knowledge that is hidden by the context of the reasoner, e.g., in constraints or model data structures.
+
+The performance figures reported in this section are illustrative only, i.e., we did not perform systematic measurements to isolate the individual effects while revising the reasoner.
 
 \begin{itemize}
-    \item Typically, array-based lists shall be initialized with (an estimate of) the required size~\cite{Shirazi02}. For lists of unknown size, we use \emph{linked lists} instead of array-based lists. Although some initial sizes could be obtained, this appears infeasible as it would require a further traversal of the input model. Main operations required for reasoning are adding elements, iterating over all elements, lookup of elements and removing elements to clean up memory incrementally. Here, linked lists can avoid performance-impacting resizing operations, although linked lists in Java typically require explicit use of iterators that shall be avoided~\cite{Shirazi02}. 
+    \item Create/translate a \emph{constraint in one step}, in particular, avoid multiple variable substitutions for the same constraint. Initially, some constraint types, such as default value assignments or derived type constraints, were first substituted partially, collected and, finally, substituted again. This implies iterating multiple times over the same constraint base as well as repeatedly creating and discarding constraint instances. We managed to avoid such multiple substitutions and, moreover, improved the variable substitution so that new constraint instances are only created if variable substitutions are actually applied.
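The copy-on-change idea behind the improved substitution can be sketched as follows. This is a hypothetical simplification (constraints as plain strings, illustrative names, not the reasoner's real API): a new instance is allocated only if a binding actually changes the expression.

```java
import java.util.Map;

// Hypothetical sketch: a substitution that allocates a new constraint
// instance only if a variable is actually replaced. Names are illustrative.
final class Constraint {
    final String expression;            // simplified: constraints as strings

    Constraint(String expression) { this.expression = expression; }

    /** Returns this very instance if no binding applies to the expression. */
    Constraint substitute(Map<String, String> bindings) {
        String result = expression;
        for (Map.Entry<String, String> binding : bindings.entrySet()) {
            result = result.replace(binding.getKey(), binding.getValue());
        }
        // allocate a new instance only if the substitution changed something
        return result.equals(expression) ? this : new Constraint(result);
    }
}
```

For unaffected constraints, the caller receives the identical instance back, so no garbage is produced for substitutions that do not apply.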
 
-   \item For the central list-based data structures, namely constraint lists, index-based access is not needed. Constraints are added to (temporary) lists and a single iteration is needed to transfer the constraints into the (initial) constraint base. To \emph{avoid iterators} here, we created an own container class (\class{ConstraintList}), which performs typical traversal operations such as element transfers internally, using basic loops~\cite{Shirazi02}. Moreover, we equipped these operations with the ability to directly transfer the internal list elements instead of only the payload to avoid unnecessary (peak) memory allocation during the transfer operations. 
+    \item Determine whether a certain \emph{constraint (type) is needed} for reasoning and avoid its translation/creation wherever feasible. While this is easy for some constraint types, e.g., annotations in incremental reasoning mode, checking for frozen constraints would require tests in various places of the constraint translation. So far, we kept these tests in a central place for better maintenance (the $add$ function), but considering such properties earlier would probably allow for better performance.
 
-   \item The \emph{constraint base} is used for storing the constraints to be processed. Initially, during reasoning a single project, a pointer-based iterator traversed the array-based constraint list and the algorithm kept all processed constraints in memory. Constraint re-scheduling just added more constraints. Basically, using a linked list avoids re-allocation of big memory blocks. Moreover, using the list as a queue enables dynamic memory usage according to the needs, i.e., the actual constraint can be obtained by removing the head element and the main reasoning loop in Algorithm \ref{algEvaluateConstraints} can just operate until the constraint base is empty (if no further re-scheduled constraints are added). To also speed up lookup and removal for re-scheduling, we extended the \class{ConstraintList} and integrated it with a hashtable (\class{ConstraintBase}). This avoids the initially quadratic complexity in constraint re-scheduling for forward reasoning. On large models, the difference in ad ho performance comparisons was roughly 600 ms, i.e., around 25\% of the reasoning time, even if we do not use JIT supported data structures.
+    \item Usually, array-based lists shall be initialized with (an estimate of) the required size~\cite{Shirazi02}. For lists of unknown size, we now use \emph{linked lists} instead of array-based lists. Although some initial sizes could be estimated by analyzing the model, this is infeasible as it would require another traversal of the IVML input model. Important model operations used in reasoning are adding elements, iterating over all elements, looking up elements and removing elements to clean up memory incrementally. Here, linked lists avoid expensive resizing operations, although Java linked lists typically require explicit use of iterators, which also shall be avoided~\cite{Shirazi02}.
 
-    \item In general, maintaining unneeded \emph{partitions of the constraint base} shall be avoided. Although we were able to speed up the creation of the constraint base by special transfer operations, it is better to directly add constraints to the constraint base where possible. Thus, we removed several unneeded temporary lists. However, adding constraints directly is not always possible due to the incremental construction while traversing the model and the prioritization of certain constraints, such as default values or constraints in eval-blocks. As an alternative, we could use a numeric prioritization stored along with the individual constraints (like the attached type information) and use a sorted data structure. So far, we tried to avoid additional information stored in the constraints or to investigate specific management records for each individual constraint that would be required when avoiding constraint translation by changing variable contexts as discussed in Section \ref{sectApproach}.
+   \item For the central list-based data structures, namely constraint sequences, index-based access is not needed. Constraints are added to (temporary) lists and a single iteration is needed to transfer the constraints into the (initial) constraint base. To \emph{avoid iterators} here, we created our own linked container (\class{ConstraintList}), which performs typical traversal operations, such as element transfers, internally using loops~\cite{Shirazi02}. Moreover, we equipped these operations with the ability to directly transfer the internal list elements instead of only the payload, avoiding peak memory allocation during the transfer.
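The node-transfer idea can be illustrated with a minimal generic linked list. This is a simplified sketch, not the actual \class{ConstraintList}: moving the internal nodes makes the transfer an O(1) relinking operation without copying payloads or allocating new nodes.

```java
// Minimal sketch of a linked container that transfers its internal list
// nodes instead of copying each payload; the real ConstraintList offers
// richer operations, this simplified generic version is illustrative.
final class SimpleLinkedList<T> {
    private static final class Node<T> {
        final T payload;
        Node<T> next;
        Node(T payload) { this.payload = payload; }
    }

    private Node<T> head, tail;
    private int size;

    void add(T payload) {                 // append without any resizing
        Node<T> node = new Node<T>(payload);
        if (tail == null) { head = node; } else { tail.next = node; }
        tail = node;
        size++;
    }

    /** Moves all nodes of {@code source} to the end of this list in O(1). */
    void transferFrom(SimpleLinkedList<T> source) {
        if (source.head == null) { return; }
        if (tail == null) { head = source.head; } else { tail.next = source.head; }
        tail = source.tail;
        size += source.size;
        source.head = source.tail = null; // source is emptied, nothing copied
        source.size = 0;
    }

    int size() { return size; }
}
```

As no intermediate node or array copies are created, the transfer causes no peak memory allocation, in contrast to copying a temporary list element by element.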
 
-    \item Determine whether a certain \emph{constraint (type) is actually needed} for reasoning and avoid its translation/creation wherever feasible. While this is easy for some constraint types, e.g., annotations in incremental reasoning mode, checking for frozen constraints would require tests in various places of the constraint translation. So far, we opted for keeping these tests in a central place for better maintenance ($add$ function), but probably testing constraint properties more early would allow for better performance.
+   \item The \emph{constraint base} stores the constraints scheduled for evaluation. In the initial version, two nested index iterators traversed the array-based constraint base. Using a linked queue facilitates dynamic memory usage, i.e., constraints can be obtained incrementally without memory re-allocations by removing the head element until the constraint base is empty. To also speed up lookup and removal for re-scheduling and to avoid the initially quadratic complexity, we extended the \class{ConstraintList} by integrating a hashtable (\class{ConstraintBase}). On large models, the difference in ad hoc performance comparisons in Eclipse was about 25\% of the reasoning time.
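The combination of queue and hashtable can be sketched as follows. This is an illustrative stand-in for \class{ConstraintBase} (which fuses both structures into one class) using standard collections: the queue determines the evaluation order, while the hash set answers re-scheduling lookups in O(1) instead of scanning the list.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Queue;
import java.util.Set;

// Sketch of the idea behind ConstraintBase: a linked queue determines the
// evaluation order, a hash set provides O(1) membership tests for
// re-scheduling. The real class integrates both; names are illustrative.
final class ScheduledConstraints<T> {
    private final Queue<T> queue = new LinkedList<T>();
    private final Set<T> scheduled = new HashSet<T>();

    /** Schedules a constraint unless it is already pending. */
    boolean schedule(T constraint) {
        if (scheduled.add(constraint)) {   // O(1) instead of a list scan
            queue.add(constraint);
            return true;
        }
        return false;
    }

    /** Removes and returns the next constraint, freeing memory incrementally. */
    T next() {
        T head = queue.poll();
        if (head != null) {
            scheduled.remove(head);
        }
        return head;
    }

    boolean isEmpty() { return queue.isEmpty(); }
}
```

The main reasoning loop then simply consumes `next()` until `isEmpty()` holds, while re-scheduling attempts of already pending constraints are rejected in constant time.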
 
-    \item Create/translate a \emph{constraint in one step}, in particular, avoid multiple variable substitutions for the same constraint. Initially, some constraint types such as  those for default values and derived types were substituted partially, then collected and finally substituted again. This implies iterating multiple times over the same constraint base as well as re-creating and throwing away constraint instances. We managed to avoid such multiple substitutions and, moreover, improved the variable substitution so that new constraint instances are only created if variable substitutions are actually applied.
+    \item Maintaining unneeded \emph{partitions of the constraint base}, i.e., constraint sequences, shall be avoided. Although we were able to speed up the creation of the constraint base by special transfer operations, it is better to directly add constraints to the constraint base where possible. Thus, we removed several temporary lists from the initial version. However, adding constraints directly is not always possible due to the incremental transformation and the prioritization of certain constraints. As an alternative, we could apply a numeric prioritization stored in the constraints and use a sorted data structure. So far, our strategy was to avoid storing additional information in constraints wherever possible due to the additional memory consumption.
 
-    \item \emph{Avoid output} during reasoning. The individual functions of the reasoner are called repeatedly for larger models and so emitting output, e.g., to the console, may sum up quickly to large runtime amounts. We experienced this problem for some models where the overall reasoning was considerably slow, while intuitively complexer models were processed significantly faster (approx. factor 7.5). The difference between these models was that the respective test case accepted for the slower models a huge number of reasoning errors (around 2300) while the faster ones did not cause reasoning errors. As the reasoner was emitting reasoning errors to the console, the difference in response time was spent on emitting the errors. In other words, while translation and constraint evaluation times for the slower models was significantly faster (approx. factor 13), the default output inverted the result. 
+    \item Incremental reasoning using a \textbf{stored constraint base} may speed up reasoning, but only at the cost of significant memory use. Thus, where possible, irrelevant constraints such as default values and frozen constraints shall be avoided. Moreover, we experienced a ramp-up time of around 40 ms in contrast to a few milliseconds of evaluation time. This is probably due to the required memory transfer operations on linked lists, i.e., for leaving the stored constraint base untouched and for copying the stored constraints to the active constraint base.
 
-To prevent such behavior, the reasoner configuration contains a logger instance, that can be overridden to prevent output during reasoning. This logger instance shall define specific logging methods for complex types, such as reasoning messages, to prevent that independent of the concrete logging instance, complex types are turned by default into Strings (consuming superfluous reasoning time) and emitted through the basic logging method. Further, we moved the default output code from the reasoner to the reasoning result class allowing that such time consuming output is definitively performed outside the reasoner and that only executed if intended by the caller. 
+    \item \textbf{Reuse visitor instances} if the same visitor is applied more than once (a specific case of instance reuse~\cite{Shirazi02}). This is particularly beneficial if the visitor maintains complex internal data structures, such as maps or lists, that are allocated when creating a visitor instance. If possible, even avoid internal storage structures that are consulted infrequently, e.g., just for transferring the results of a traversal into another data structure. Here, our approach relies on callbacks for the result values, i.e., results are directly handed over to the caller when they become available instead of utilizing intermediary data structures. Informal performance observations in Eclipse showed that replacing a single visitor (of five different visitors used in the initial implementation) improved the reasoning time on large models by roughly 3--4\%.
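The callback style can be sketched with a stateless, reusable visitor. The element representation and all names are illustrative, not the IVML model classes: the visitor holds no per-traversal data structures and hands each result directly to the caller.

```java
// Sketch of the callback idea: a reusable visitor without internal result
// storage, handing each match directly to the caller. The element type and
// names are illustrative, not the IVML model classes.
interface ResultCallback<T> {
    void found(T result);
}

final class DeclarationCollector {
    /** Reusable across traversals: no per-call data structures allocated. */
    void visit(String[] modelElements, ResultCallback<String> callback) {
        for (String element : modelElements) {
            if (element.startsWith("var:")) {      // simplified match rule
                callback.found(element.substring(4));
            }
        }
    }
}
```

The caller decides whether to collect the results, count them, or process them immediately, so intermediary lists are created only when they are actually needed.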
 
-    \item \textbf{Reuse visitor instances} if the same visitor applied more than once (as also for other types of instances~\cite{Shirazi02}) and the visitor maintains complex internal data structures such as maps or lists. If possible, even avoid internal storage structures that are consulted only once, e.g., for transferring the results of a traversal into another data structure. To avoid this, we created an abstract visitor which leaves the handing of the results open. A concrete visitor replaces the original visitor still storing the results as before in an internal datastructure. In contrast, the visitor using by the reasoner stores the results directly in the target datastructure. Informal performance observations showed that replacing a single visitor (out of 5 used by the implementation) improved the performance on large models by roughly 80 - 100 ms, i.e., 3-4\% of the reasoning time.
+    \item \textbf{Use instance pooling.} If temporary data structures are used, pooling rather than instance re-allocation can be beneficial~\cite{EichelbergerSassSchmid16, Shirazi02}. However, pooling requires explicit memory management, i.e., obtaining and releasing instances. The revised version pools frequently used sets of compounds in the translation of compounds, which saves about 1\% of the reasoning time on larger models. For the same reason, the expression evaluator (already in the initial version) utilizes pools of frequently used internal data structures.
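A minimal pool for temporary sets can be sketched as follows. This is an illustrative simplification (the allocation counter exists only for demonstration; names do not match the reasoner's actual pool classes), showing the obtain/release contract that constitutes the explicit memory management mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of an instance pool for temporary sets as used during
// compound translation; the allocation counter is for illustration only
// and the names do not match the reasoner's real pool classes.
final class SetPool {
    private final Deque<Set<String>> free = new ArrayDeque<Set<String>>();
    int allocations;                       // counts actual new HashSet calls

    /** Obtains a cleared set, reusing a released instance if available. */
    Set<String> obtain() {
        Set<String> set = free.poll();
        if (set == null) {
            allocations++;
            set = new HashSet<String>();
        }
        return set;
    }

    /** Explicit memory management: returns the instance to the pool. */
    void release(Set<String> set) {
        set.clear();
        free.push(set);
    }
}
```

The price of pooling is that every `obtain()` must be matched by a `release()`; a forgotten release silently degrades the pool back to plain allocation.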
 
-    \item \textbf{Use instance pooling} if temporary use of datastructures is required, pooling of instances can be rather helpful~\cite{EichelbergerSassSchmid16, Shirazi02}, but requires explicit memory management, i.e., obtaining and releasing. For example, we pool frequently used sets of compounds in the translation of compounds, which helps saving 20-30 ms on larger models. Moreover, we pool frequently used internal structures of the expression evaluation (but don't have measurements available as this was part of the original implementation).
+    \item In general, \textbf{avoid temporary instances}, e.g., iterators, temporary constraints, or temporary data structures wherever possible. Possible strategies are maintaining a singleton instance in single-thread processing, using instance pools or just using index-based loops. Sometimes, applying a (reusable or constant) functor to a collection is more efficient than an explicit loop. %If permitted by the compilation settings (EASy-Producer usually is compiled against JDK 1.6), also a lambda-function instead of a heavy-weight functor class may be appropriate.
 
-    \item In general, \textbf{avoid temporary instances}, e.g., iterators, temporary constraint creation, or temporary data structures wherever possible. Possible strategies are maintaining a singleton instance in single-thread processing, using instance pools or just using different types of loops. Sometimes, also applying a (reusable) functor to a collection is more efficient than an explicit loop. If permitted by the compilation settings (EASy-Producer usually is compiled against JDK 1.6), also a lambda-function instead of a heavy-weight functor class may be appropriate.
+    \item \textbf{Avoid instanceof.} We experienced that a (reusable) visitor can be significantly faster than a series of alternatives with \IVML{instanceof} comparisons (or similar operations provided by the IVML type system). Moreover, the performance of the Java \IVML{instanceof} operation is known to be rather unpredictable. However, implementing a visitor class may interrupt the code reading flow.
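The replacement of an instanceof cascade by double dispatch can be sketched as follows. The element classes are simplified stand-ins for the IVML model types: each element answers one virtual `accept` call instead of being probed by a chain of type tests.

```java
// Sketch contrasting an instanceof cascade with double dispatch through a
// reusable visitor; the element classes are simplified stand-ins for the
// IVML model types.
interface TypeVisitor<R> {
    R visitVariable(VariableElement element);
    R visitConstant(ConstantElement element);
}

interface TypedElement {
    <R> R accept(TypeVisitor<R> visitor);
}

final class VariableElement implements TypedElement {
    public <R> R accept(TypeVisitor<R> visitor) { return visitor.visitVariable(this); }
}

final class ConstantElement implements TypedElement {
    public <R> R accept(TypeVisitor<R> visitor) { return visitor.visitConstant(this); }
}

// one virtual dispatch per element instead of a chain of instanceof tests
final class KindVisitor implements TypeVisitor<String> {
    public String visitVariable(VariableElement element) { return "variable"; }
    public String visitConstant(ConstantElement element) { return "constant"; }
}
```

As the visitor is stateless, a single instance can be reused for all elements, combining this guideline with the visitor-reuse guideline above.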
 
-    \item Avoid \textbf{tests with instanceof}. We experienced that a (reusable) visitor can be significantly faster than a series of alternatives with instanceof comparisons (or similar operations of the IVML type system). However, implementing a visitor class can interrupt the code reading flow.
+    \item During the revision, we replaced the originally \emph{flat hash table} realizing $\variableMapping$ by a stack of hash tables (cf. Section \ref{sectVarMapping}) to support nested IVML types and to enable incremental quantification. Although the new data structure may require more memory, the overall reasoning speed after this modification was roughly the same.
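The stacked structure can be sketched with a deque of hash maps. This is an illustrative simplification (the real mapping stores reasoner-specific variable objects rather than strings): entering a nested scope pushes a fresh table, and lookups search from the innermost scope outwards.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of a stacked variable mapping supporting nested scopes; the real
// mapping stores reasoner-specific variable objects, the names here are
// illustrative.
final class ScopedMapping {
    private final Deque<Map<String, String>> scopes =
        new ArrayDeque<Map<String, String>>();

    ScopedMapping() { pushScope(); }          // start with a global scope
    void pushScope() { scopes.push(new HashMap<String, String>()); }
    void popScope() { scopes.pop(); }
    void bind(String variable, String value) { scopes.peek().put(variable, value); }

    /** The innermost binding wins; outer scopes remain visible. */
    String resolve(String variable) {
        for (Map<String, String> scope : scopes) {  // iterates innermost first
            String value = scope.get(variable);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

Popping a scope discards all its bindings at once, which is what a flat table cannot do without tracking which entries belong to which nesting level.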
 
-    \item Incremental reasoning using a \textbf{stored constraint base} may speed up reasoning, but only achieves this through significant memory use. Thus, where possible, irrelevant constraints such as default values and frozen constraints shall be avoided. Moreover, we experience a ramp-up time of around 40 ms when using the stored constraint base, which is probalby due to the required constraint transfer operations, i.e., for keeping the stored constraint base and copying the stored constraints to the active constraint base.
+    \item \emph{Avoid output} during reasoning. The reasoner algorithms are called repeatedly for larger models, so even simple output in one algorithm, e.g., to the console, can quickly add up to a large share of the runtime. We experienced this for models where the overall reasoning was considerably slow, while intuitively more complex models were processed significantly faster (approx. factor 7.5). The problem was shadowed by a test case, which intentionally accepted a huge number of reasoning errors for the slower models (around 2300), while the faster models did not cause any reasoning error. As the reasoner emitted the messages to the console, it consumed additional time whenever reasoning errors occurred. While constraint translation and evaluation for the seemingly slower models was actually significantly faster (approx. factor 13), the default output reversed the result for inconsistent models.
 
-    \item During the revision, we replaced an originally \emph{flat hash table} realizing $\variableMapping$ by a stack of hash tables (cf. Section \ref{sectVarMapping}) due to correctness issues and to enable the incremental creation of nested container constraints. Although the stacked structure may require more memory, the overall reasoning speed was roughly the same after the exchange of the data structures.
+To prevent such behavior, the reasoner configuration defines a logger instance, which can be specialized to customize the output during reasoning. However, such a logger shall not consist only of a String-based logging method, which forces complex types such as reasoner messages (including the causing constraints) to be turned into a String; instead, it shall provide specific methods for logging (parts of) complex types. If possible, such output may also be deferred to an adequate point in time so that the caller can decide whether and how to communicate the information, in particular if the underlying information is stored in memory anyway, as is the case for reasoner messages. We addressed this by moving such output functionality to the reasoning result class.
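The effect of typed logging methods can be sketched as follows. All names and the conversion counter are illustrative, not the reasoner's actual logging interface: because the log method receives the message object rather than a String, a silent logger never triggers the expensive conversion.

```java
// Sketch of the logging guideline: a typed log method lets a silent logger
// skip the expensive message-to-String conversion entirely. The counter and
// all names are illustrative, not the reasoner's actual logging interface.
final class ReasonerMessage {
    static int conversions;                 // counts expensive toString calls
    private final String text;

    ReasonerMessage(String text) { this.text = text; }

    @Override public String toString() {
        conversions++;                      // stands in for costly formatting
        return text;
    }
}

interface ReasonerLogger {
    void log(ReasonerMessage message);      // typed: conversion is deferred
}

final class ConsoleLogger implements ReasonerLogger {
    public void log(ReasonerMessage message) {
        System.out.println(message);        // converts only when emitting
    }
}

final class SilentLogger implements ReasonerLogger {
    public void log(ReasonerMessage message) {
        // no output and, crucially, no String conversion either
    }
}
```

With a String-based method such as `log(String)`, the conversion would already happen at every call site, regardless of which logger is configured.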
+
+Moreover, this problem was further shadowed by the continuous integration, because parts of the logging were automatically removed in release code. Although such an approach can be helpful and convenient if applied consistently, it also hides causes of performance issues, as the reasoner behaves differently in production than in testing.
 
 \end{itemize}
 
-Although we improved the performance of the reasoner significantly, there are still opportunities for further speeding up the reasoning operations. \TBD{On an Intel core i7, the reasoner reaches around 15-43 constraint evaluations in average per millisecond - runtime vs. full reasoning on the latest QualiMasster test model, currently reasoning time includes translation and evaluation time. This is already an improvement over the initial version, which operated at around 10 evaluations per millisecond in average (also with a higher number of constraint evaluations due to unneeded and uneffective constraints).} Potential improvements are filtering out irrelevant constraints as early as possible, evaluating certain constraints immediately and storing them only in the constraint base in case of failures, or even inefficient implementation of several functions and structures in the IVML object model.
+The considerations summarized in this section allowed us to significantly speed up the implementation. As a rough indication, on an Intel Core i7 laptop with 16 GBytes main memory, the revised IVML reasoner executes around 23 constraint evaluations per millisecond on average on our largest models, while the initial version operated at around 10 evaluations per millisecond. Also the overall number of constraints to be evaluated for these models dropped, as a large number of unneeded and ineffective constraints is avoided by the revised version.
+
+Further improvements not yet considered in the revised version include filtering out irrelevant constraints earlier, evaluating (default) constraints immediately, using a type-based cache for accessor expressions, or optimizing functions and data structures of the IVML object model.
