Empirical O (Oemp) and the Knuth-Statistician debate
Consider a triply nested loop with an empty body, each loop executing n times. The complexity is theoretically O(n³). However, when n is small, the observed running time can be represented by a polynomial of second degree, because the innermost loop executes very fast (as if it were independent of n). In classical n×n matrix multiplication (A·B = C, all matrices n×n), this innermost loop is not empty but contains the compound operation c(i,k) = c(i,k) + a(i,j)·b(j,k).
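As an illustration, here is a minimal sketch of this kind of computer experiment (in Python, assuming numpy is available; it is not the original code of the papers cited below): time the empty triple loop for several values of n and compare quadratic and cubic least-squares fits.

```python
# Minimal sketch (not the authors' original experiment): time an empty
# triple loop for several n and compare quadratic vs. cubic fits.
import time
import numpy as np

def empty_triple_loop(n):
    for _ in range(n):
        for _ in range(n):
            for _ in range(n):
                pass              # empty body: theoretically O(n^3) iterations

ns = np.array([50, 100, 150, 200, 250])
times = []
for n in ns:
    t0 = time.perf_counter()
    empty_triple_loop(n)
    times.append(time.perf_counter() - t0)
times = np.array(times)

# Least-squares polynomial fits of degree 2 and 3 to the observed run time.
quad = np.polyfit(ns, times, 2)
cubic = np.polyfit(ns, times, 3)
print("quadratic residual:", ((np.polyval(quad, ns) - times) ** 2).sum())
print("cubic residual:   ", ((np.polyval(cubic, ns) - times) ** 2).sum())
```

For small n the two residuals are typically close, which is the empirical point being made above.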
Chakraborty and Choudhury (1999) showed that it is possible to work with a second-degree polynomial by sacrificing a tolerable amount of predictive power, and hence that for small-sized matrices they had “achieved” an empirical O(n²) complexity. The quadratic fit would obviously not work for higher n, as pointed out by Prof. Donald Knuth (Stanford University) in a private communication to the corresponding author (Choudhury). Yet the paper by Chakraborty and Choudhury (1999) is important because it shows that multiplication, which is one of the operations in the compound operation mentioned above and which computer scientists regard as the dominant operation in classical matrix multiplication, is perhaps not as dominant on implementation as theoretically presumed, because modern computers are fast. This means that if classical matrix multiplication is modified and another operation is introduced which is more dominant than multiplication, and further if the complexity of this new, more dominant operation is O(n²), it will be interesting to see where the bound goes! Such an operation, more dominant than multiplication, is comparison, and a matrix multiplication algorithm with exactly n² comparisons is Amir Schoor’s algorithm [Schoor (1982)].
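For intuition, here is a simplified sketch of the comparison-driven structure just described (a rendering of the idea, not the original presentation in Schoor (1982)): every entry of the pre-factor matrix A is compared with zero, giving exactly n² comparisons, and a row update involving multiplications is performed only for nonzero entries.

```python
# Illustrative sketch of the structure of Schoor's algorithm (Schoor, 1982):
# each of the n^2 entries of the pre-factor A is compared with zero, and a
# row update (n multiplications) is done only for nonzero entries.
import numpy as np

def schoor_multiply(A, B):
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if A[i, j] != 0:                  # one of exactly n^2 comparisons
                C[i, :] += A[i, j] * B[j, :]  # skipped entirely for zeroes
    return C
```

A sanity check such as np.allclose(schoor_multiply(A, B), A @ B) confirms it computes the ordinary product; the sparser A is, the fewer multiplications are executed, which is why the algorithm was developed for sparse matrices.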
Chakraborty, being a statistician, accepted Knuth’s challenge and, in an aggressively written paper [Chakraborty and Sourabh (2007a)], successfully defended the claim that an empirical O(n²) complexity is achievable even for higher n in n×n matrix multiplication, using Amir Schoor’s algorithm and keeping a sufficient number of zeroes in the pre-factor matrix to reduce the dominance of multiplication. The n² comparisons did the rest by pulling the complexity of the algorithm towards their own complexity, i.e., O(n²). The cubic fit collapsed onto the quadratic fit and became an overfit. Overfits are not allowed in statistics: one cannot include a single extra term in the model unless it has something to contribute. Chakraborty claimed that he had finally achieved an empirical O(n²) complexity. His programmer and co-author (also his PhD scholar) Suman Kumar Sourabh was instructed to smash the cubic fit over a range as wide as possible using dynamic memory, thus throwing Knuth’s criticism, that the quadratic fit won’t work for higher n, out of gear. In the process, two new concepts were thrown in: a statistical complexity bound and its empirical estimate, which Chakraborty called “empirical O”. Chakraborty was working directly on time and mixing computing operations of different types. Assuming the time consumed by a computing operation to be its weight (philosophy 1 in Chakraborty and Sourabh (2007a)), he claimed that he had in fact estimated a non-trivial weight-based statistical bound (which is different from a count-based mathematical bound, which is operation specific and does not allow mixing of different operations) over a finite range, by supplying numerical values to the weights, which he and his co-author obtained by running computer experiments with program run time as the response variable of interest. As mentioned earlier, Chakraborty called this estimate empirical O and introduced the notation Oemp.
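The statistical content of “the cubic fit collapsed onto the quadratic fit” can be checked with a standard partial F-test (one way to do it, not necessarily the test used in the cited paper): fit both models to the timing data and ask whether the cubic term contributes anything significant. A minimal sketch, assuming Python with numpy and scipy; the function name is illustrative.

```python
# Hedged sketch of the overfit check: a partial F-test asking whether the
# cubic term adds anything beyond the quadratic fit to timing data (ns, times).
import numpy as np
from scipy import stats

def cubic_term_contributes(ns, times, alpha=0.05):
    rss2 = ((np.polyval(np.polyfit(ns, times, 2), ns) - times) ** 2).sum()
    rss3 = ((np.polyval(np.polyfit(ns, times, 3), ns) - times) ** 2).sum()
    df3 = len(ns) - 4                    # residual d.o.f. of the cubic model
    F = (rss2 - rss3) / (rss3 / df3)     # one extra parameter in the cubic
    p_value = stats.f.sf(F, 1, df3)
    return p_value < alpha               # False: cubic collapses onto quadratic
```

If the test fails to reject, the cubic term “has nothing to contribute” in the paper’s language, and the quadratic fit stands.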
Applications of Empirical O:
- The statistical bound was initially created to make average complexity a more meaningful science. While average complexity is important, since many algorithms with bad worst-case complexity perform better on average (e.g., Quicksort), the way average complexity is assessed makes it a brittle science. Average complexity is determined as the mathematical expectation of the dominant operation. If the following questions are raised, this science falls flat:
Q1: In a complex code, how do you know which operation is dominant? You need to know that before applying the mathematical expectation!
Observe that Amir Schoor applied mathematical expectation to the wrong operation (multiplication), taking it to be dominant. The dominant operation in Schoor’s algorithm is comparison, provided there is a sufficient number of zeroes to reduce the dominance of multiplications, even if multiplications are greater in number; herein lies the concept of a statistical bound! [Chakraborty and Sourabh (2007a)] Don’t forget the algorithm was developed for sparse matrices.
Q2: Expectation is a probabilistic concept. How do you know that the probability distribution over which the expectation is taken is realistic over the problem domain?
Observe that one of Donald Knuth’s proofs, on the average number of interchanges in selection sort, was rejected in [Chakraborty and Sourabh (2007b)] for all continuous inputs, where ties are absent; this includes the Normal distribution, which is realistic. (UPSHOT: if the Normal distribution were unrealistic, the entire literature on parametric statistical inference would vanish!)
Q3: What will be the average complexity of an algorithm if the input comes from a distribution for which the expectation does not exist (e.g., the Cauchy distribution)?
Each of these questions can be effectively tackled using empirical O. For the first question, empirical O can be used to detect the dominant operation, and once this is identified, mathematical expectation can be applied accordingly. For the second question, one can use empirical O to cross-check whether the average complexity measure, which is generally derived over the uniform distribution, holds for both discrete and continuous cases (where ties are present and absent, respectively), and also whether the measure holds for non-uniform inputs, both discrete and continuous (a sketch of such a cross-check appears after this list).
For the third, empirical O is the only remedy, as theoretical analysis is simply clueless (a sketch for this case, too, appears after this list).
- Later on, as indicated in section 7.2 of [Chakraborty et al. (2023), chap. 7], it was found that the statistical bound has several other applications. In worst-case complexity for sequential computing, empirical O gives a certificate on the level of conservativeness of the guarantee-giving mathematical bound (quite often the algorithm performs better than the mathematical bound says, making the bound conservative).
- The statistical bound can also refute a tall mathematical claim statistically in the best case. For example, the best-case complexity of bubble sort is O(n) for an already sorted array, since you have the provision to stop the algorithm abruptly, declaring the array sorted, if a single pass produces no interchange. But have you ever checked what percentage of the array should arrive already sorted for a linear fit to be a good fit and not an underfit? (A sketch of such an experiment appears after this list.)
- In parallel computing, especially in a distributed system, the statistical bound is arguably the ideal bound. This is because the constants associated with mathematical bounds depend on the processors on implementation. In parallel computing there are multiple processors; in a distributed system, even the types of the processors are different. It is only when the bound itself is based on weights that it will take care of a change of processors, including a change of their type, and hence be meaningful. The desired bound is our statistical bound.
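For Q2 above, a minimal cross-checking sketch (illustrative Python, not the experiments of Chakraborty and Sourabh (2007b)): empirically compare the average number of interchanges in selection sort under a discrete uniform input, where ties are present, and a continuous Normal input, where ties are absent.

```python
# Hedged sketch for Q2: does an average-case measure derived over one
# distribution hold over others? Compare average selection-sort
# interchanges for discrete uniform (ties) vs. continuous Normal (no ties).
import numpy as np

def selection_sort_interchanges(a):
    a = list(a)
    swaps = 0
    for i in range(len(a)):
        m = min(range(i, len(a)), key=a.__getitem__)  # index of the minimum
        if m != i:
            a[i], a[m] = a[m], a[i]
            swaps += 1
    return swaps

rng = np.random.default_rng(2)
n, trials = 200, 200
for name, sampler in [("discrete uniform", lambda: rng.integers(0, 10, n)),
                      ("continuous Normal", lambda: rng.normal(size=n))]:
    avg = np.mean([selection_sort_interchanges(sampler()) for _ in range(trials)])
    print(f"{name}: average interchanges over {trials} runs ≈ {avg:.1f}")
```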
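For Q3, where the input distribution has no expectation, empirical O proceeds regardless, because program run time, not a theoretical expectation, is the response variable. A minimal sketch, again illustrative: time a library sort on Cauchy-distributed inputs and fit log(time) against log(n).

```python
# Hedged sketch for Q3: empirical growth estimate on Cauchy inputs,
# for which the theoretical mean does not exist.
import time
import numpy as np

rng = np.random.default_rng(0)
ns = np.array([2_000, 4_000, 8_000, 16_000, 32_000])
times = []
for n in ns:
    x = rng.standard_cauchy(n)        # heavy-tailed input: E[X] undefined
    t0 = time.perf_counter()
    np.sort(x, kind="quicksort")      # response variable: program run time
    times.append(time.perf_counter() - t0)

# The slope of log(time) vs. log(n) is an empirical growth exponent.
slope, _ = np.polyfit(np.log(ns), np.log(times), 1)
print(f"empirical growth exponent ≈ {slope:.2f}")
```

The same log-log fit, read against the worst-case exponent, also serves as the certificate of conservativeness mentioned in the second bullet.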
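For the bubble-sort bullet, an illustrative experiment (not from the cited papers): count comparisons as a function of the fraction of the array that arrives pre-sorted, and see when a linear model stops being an underfit.

```python
# Hedged sketch: bubble sort with early exit; how many comparisons occur
# when only a fraction of the array arrives already sorted?
import numpy as np

def bubble_sort_comparisons(a):
    a = list(a)
    ops = 0
    for i in range(len(a) - 1):
        swapped = False
        for j in range(len(a) - 1 - i):
            ops += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:   # a full pass with no interchange: stop early
            break
    return ops

rng = np.random.default_rng(1)
n = 1_000
for frac in (1.0, 0.99, 0.95, 0.90):
    a = np.arange(n)
    k = int(n * frac)
    a[k:] = rng.permutation(a[k:])    # shuffle the unsorted tail
    print(f"{frac:.0%} pre-sorted: {bubble_sort_comparisons(a)} comparisons")
```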
Remark: A recent application of the science of weights can be found in the RSA and AES-128 algorithms in cryptography [Pranav, Dutta and Chakraborty (2021)].
References:
Chakraborty, S. and Choudhury, P. P., Can Statistics Provide a Realistic Measure for an Algorithm’s Complexity?, Applied Mathematics Letters, 12(7), 1999, pp. 113–118 (Elsevier).
Chakraborty, S., Pranav, P., Khatoon, N. and Dutta, S., A Guide to Design and Analysis of Algorithms, Nova Science Publishers Inc., New York, 2023 (chap. 7, sec. 7.2).
Chakraborty, S. and Sourabh, S. K., On Why an Algorithmic Time Complexity Measure can be System Invariant rather than System Independent, Applied Mathematics and Computation, 190(1), 2007a, pp. 195–204 (Elsevier).
Chakraborty, S. and Sourabh, S. K., How Robust Are Average Complexity Measures? A Statistical Case Study, Applied Mathematics and Computation, 189(2), 2007b, pp. 1787–1797 (Elsevier).
Pranav, P., Dutta, S. and Chakraborty, S., Empirical and statistical comparison of intermediate steps of AES-128 and RSA in terms of time consumption, Soft Computing, 25(21), 2021, pp. 13127–13145 (Springer).
Schoor, A., Fast algorithm for sparse matrix multiplication, Information Processing Letters, 15(2), 1982, pp. 87–89 (Elsevier).