Index: /reasoner/evaluation.tex
===================================================================
--- /reasoner/evaluation.tex	(revision 248)
+++ /reasoner/evaluation.tex	(revision 249)
@@ -66,5 +66,7 @@
 \subsubsection{Analysis}\label{sectEvalSetupAnalysis}
 
-After collecting the runtime measurements, we execute an R script which combines the results of all test executions for one complete run. The script produces various plots, all relating runtime and model complexity (cf. Section \ref{sectModelComplexity}). Some plots are inteded to provide an overview on the reasoning time for different modes, some how the reasoning time is composed (in terms of model translation time and constraint evaluation time) and some indicating the deviations across all series of test suite execution.
+After collecting the runtime measurements, we execute an R script which combines the results of all test executions of one complete run, calculates statistical measures and draws summarizing diagrams. To analyze the initial runs, the script reads the individual files created for the test suites from Section \ref{sectEvalSetupTreatments}, discards the first three (warm-up) runs per intra-experiment repetition of each test case, calculates mean, variance and confidence interval per test case (cf. \cite{BulejHorkTuma17} for validity) and joins all results into one large table. Although the inter-experiment repetitions can be seen as individual experiments, they are executed in direct sequence on the same machine. Thus, we also follow \cite{BulejHorkTuma17} here and create a statistical summary table over all runs, resulting in a similarly large table.
+
+Finally, the script produces various plots, all relating runtime (reasoning time, translation time or evaluation time) to model complexity (cf. Section \ref{sectModelComplexity}). Some plots are intended to provide an overview of the reasoning time for the different modes, some show how the reasoning time is composed (in terms of model translation time and constraint evaluation time), and some indicate the deviations across all series of test suite executions.
 
 \subsection{Model Ranking and Complexity}\label{sectModelComplexity}
Index: /reasoner/reasoner.bib
===================================================================
--- /reasoner/reasoner.bib	(revision 248)
+++ /reasoner/reasoner.bib	(revision 249)
@@ -493,9 +493,27 @@
 }
 
+@inproceedings{BulejHorkTuma17,
+ author = {Bulej, Lubom\'{\i}r and Hork\'{y}, Vojtech and T\r{u}ma, Petr},
+ title = {Do We Teach Useful Statistics for Performance Evaluation?},
+ booktitle = {Proceedings of the ACM/SPEC International Conference on Performance Engineering Companion (ICPE '17 Companion)},
+ _series = {ICPE '17 Companion},
+ year = {2017},
+ isbn = {978-1-4503-4899-7},
+ _location = {L'Aquila, Italy},
+ pages = {185--189},
+ numpages = {5},
+ url = {http://doi.acm.org/10.1145/3053600.3053638},
+ doi = {10.1145/3053600.3053638},
+ acmid = {3053638},
+ _publisher = {ACM},
+ _address = {New York, NY, USA},
+ keywords = {confidence interval, performance evaluation, performance evaluation education, statistical testing},
+}
+
 @inproceedings{KnocheEichelberger18,
  author = {Knoche, Holger and Eichelberger, Holger},
  title = {Using the Raspberry Pi and Docker for Replicable Performance Experiments: Experience Paper},
- booktitle = {Proceedings of the ACM/SPEC International Conference on Performance Engineering},
- series = {ICPE '18},
+ booktitle = {Proceedings of the ACM/SPEC International Conference on Performance Engineering (ICPE'18)},
+ _series = {ICPE '18},
  year = {2018},
  isbn = {978-1-4503-5095-2},
