Empirical O (Oemp) and the Knuth-Statistician debate
Consider a triply nested loop with an empty body, each loop executing n times. The complexity is theoretically O(n³). However, when n is small, the observed running time can be represented by a polynomial of second degree, because the innermost loop executes very fast (as if it were independent of n). In classical n×n matrix multiplication (A·B = C, all matrices n×n), this innermost loop is not empty but contains the compound operation c(i,k) = c(i,k) + a(i,j)·b(j,k).
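As an illustration, here is a minimal sketch of this kind of computer experiment (in Python, assuming numpy is available; it is not the original code of the papers cited below): time the empty triple loop for several values of n and compare quadratic and cubic least-squares fits.

```python
# Minimal sketch (not the authors' original experiment): time an empty
# triple loop for several n and compare quadratic vs. cubic fits.
import time
import numpy as np

def empty_triple_loop(n):
    for _ in range(n):
        for _ in range(n):
            for _ in range(n):
                pass              # empty body: theoretically O(n^3) iterations

ns = np.array([50, 100, 150, 200, 250])
times = []
for n in ns:
    t0 = time.perf_counter()
    empty_triple_loop(n)
    times.append(time.perf_counter() - t0)
times = np.array(times)

# Least-squares polynomial fits of degree 2 and 3 to the observed run time.
quad = np.polyfit(ns, times, 2)
cubic = np.polyfit(ns, times, 3)
print("quadratic residual:", ((np.polyval(quad, ns) - times) ** 2).sum())
print("cubic residual:   ", ((np.polyval(cubic, ns) - times) ** 2).sum())
```

For small n the two residuals are typically close, which is the empirical point being made above.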
Chakraborty and Choudhury (1999) showed that it is possible to work with a second-degree polynomial by sacrificing a tolerable amount of predictive power, and hence that for small-sized matrices they had “achieved” an empirical O(n²) complexity. The quadratic fit would obviously not work for higher n, as pointed out by Prof. Donald Knuth (Stanford University) in a private communication to the corresponding author (Choudhury). Yet the paper by Chakraborty and Choudhury (1999) is important because it shows that multiplication, which is one of the operations in the compound operation mentioned above and which computer scientists regard as the dominant operation in classical matrix multiplication, is perhaps not as dominant on implementation as theoretically presumed, because modern computers are fast. This means that if classical matrix multiplication is modified and another operation is introduced which is more dominant than multiplication, and further if the complexity of this new, more dominant operation is O(n²), it will be interesting to see where the bound goes! Such an operation, more dominant than multiplication, is comparison, and a matrix multiplication algorithm with exactly n² comparisons is Amir Schoor’s algorithm [Schoor (1982)].
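For intuition, here is a simplified sketch of the comparison-driven structure just described (a rendering of the idea, not the original presentation in Schoor (1982)): every entry of the pre-factor matrix A is compared with zero, giving exactly n² comparisons, and a row update involving multiplications is performed only for nonzero entries.

```python
# Illustrative sketch of the structure of Schoor's algorithm (Schoor, 1982):
# each of the n^2 entries of the pre-factor A is compared with zero, and a
# row update (n multiplications) is done only for nonzero entries.
import numpy as np

def schoor_multiply(A, B):
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if A[i, j] != 0:                  # one of exactly n^2 comparisons
                C[i, :] += A[i, j] * B[j, :]  # skipped entirely for zeroes
    return C
```

A sanity check such as np.allclose(schoor_multiply(A, B), A @ B) confirms it computes the ordinary product; the sparser A is, the fewer multiplications are executed, which is why the algorithm was developed for sparse matrices.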
Chakraborty, being a statistician, accepted Knuth’s challenge and, in an aggressively written paper [Chakraborty and Sourabh (2007a)], successfully defended the claim that an empirical O(n²) complexity is achievable even for higher n in n×n matrix multiplication, using Amir Schoor’s algorithm and keeping a sufficient number of zeroes in the pre-factor matrix to reduce the dominance of multiplication. The n² comparisons did the rest by pulling the complexity of the algorithm towards their own complexity, i.e., O(n²). The cubic fit collapsed onto the quadratic fit and became an overfit. Overfits are not allowed in statistics: one cannot include a single extra term in the model unless it has something to contribute. Chakraborty claimed that he had finally achieved an empirical O(n²) complexity. His programmer and co-author (also his PhD scholar) Suman Kumar Sourabh was instructed to smash the cubic fit over a range as wide as possible using dynamic memory, thus throwing Knuth’s criticism, that the quadratic fit won’t work for higher n, out of gear. In the process, two new concepts were thrown in: a statistical complexity bound and its empirical estimate, which Chakraborty called “empirical O”. Chakraborty was working directly on time and mixing computing operations of different types. Assuming the time consumed by a computing operation to be its weight (philosophy 1 in Chakraborty and Sourabh (2007a)), he claimed that he had in fact estimated a non-trivial weight-based statistical bound (which is different from a count-based mathematical bound, which is operation specific and does not allow mixing of different operations) over a finite range, by supplying numerical values to the weights, which he and his co-author obtained by running computer experiments with program run time as the response variable of interest. As mentioned earlier, Chakraborty called this estimate empirical O and introduced the notation Oemp.
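The statistical content of “the cubic fit collapsed onto the quadratic fit” can be checked with a standard partial F-test (one way to do it, not necessarily the test used in the cited paper): fit both models to the timing data and ask whether the cubic term contributes anything significant. A minimal sketch, assuming Python with numpy and scipy; the function name is illustrative.

```python
# Hedged sketch of the overfit check: a partial F-test asking whether the
# cubic term adds anything beyond the quadratic fit to timing data (ns, times).
import numpy as np
from scipy import stats

def cubic_term_contributes(ns, times, alpha=0.05):
    rss2 = ((np.polyval(np.polyfit(ns, times, 2), ns) - times) ** 2).sum()
    rss3 = ((np.polyval(np.polyfit(ns, times, 3), ns) - times) ** 2).sum()
    df3 = len(ns) - 4                    # residual d.o.f. of the cubic model
    F = (rss2 - rss3) / (rss3 / df3)     # one extra parameter in the cubic
    p_value = stats.f.sf(F, 1, df3)
    return p_value < alpha               # False: cubic collapses onto quadratic
```

If the test fails to reject, the cubic term “has nothing to contribute” in the paper’s language, and the quadratic fit stands.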
Applications of Empirical O:
- The statistical bound was initially created to make average complexity a more meaningful science. While average complexity is important, since many algorithms with bad worst-case complexity perform better on average (e.g., Quicksort), the way average complexity is assessed makes it a brittle science. Average complexity is determined as the mathematical expectation of the dominant operation. If the following questions are raised, this science falls flat:
Q1: In a complex code, how do you know which operation is dominant? You need to know that before applying the mathematical expectation!
Observe that Amir Schoor applied mathematical expectation to the wrong operation (multiplication), taking it to be dominant. The dominant operation in Schoor’s algorithm is comparison, provided there is a sufficient number of zeroes to reduce the dominance of multiplications, even if multiplications are greater in number; herein lies the concept of a statistical bound! [Chakraborty and Sourabh (2007a)] Don’t forget the algorithm was developed for sparse matrices.
Q2: Expectation is a probabilistic concept. How do you know that the probability distribution over which the expectation is taken is realistic over the problem domain?
Observe that one of Donald Knuth’s proofs, on the average number of interchanges in selection sort, was rejected in [Chakraborty and Sourabh (2007b)] for all continuous inputs, where ties are absent; this includes the Normal distribution, which is realistic. (UPSHOT: if the Normal distribution were unrealistic, the entire literature on parametric statistical inference would vanish!)
Q3: What will be the average complexity of an algorithm if the input comes from a distribution for which the expectation does not exist (e.g., the Cauchy distribution)?
Each of these questions can be effectively tackled using empirical O. For the first question, empirical O can be used to detect the dominant operation, and once this is identified, mathematical expectation can be applied accordingly. For the second question, one can use empirical O to cross-check whether the average complexity measure, which is generally derived over the uniform distribution, holds for both discrete and continuous cases (where ties are present and absent, respectively), and also whether the measure holds for non-uniform inputs, both discrete and continuous (a sketch of such a cross-check appears after this list).
For the third, empirical O is the only remedy, as theoretical analysis is simply clueless (a sketch for this case, too, appears after this list).
- Later on, as indicated in section 7.2 of [Chakraborty et al. (2023), chap. 7], it was found that the statistical bound has several other applications. In worst-case complexity for sequential computing, empirical O gives a certificate on the level of conservativeness of the guarantee-giving mathematical bound (quite often the algorithm performs better than the mathematical bound says, making the bound conservative).
- The statistical bound can also refute a tall mathematical claim statistically in the best case. For example, the best-case complexity of bubble sort is O(n) for an already sorted array, since you have the provision to stop the algorithm abruptly, declaring the array sorted, if a single pass produces no interchange. But have you ever checked what percentage of the array should arrive already sorted for a linear fit to be a good fit and not an underfit? (A sketch of such an experiment appears after this list.)
- In parallel computing, especially in a distributed system, the statistical bound is arguably the ideal bound. This is because the constants associated with mathematical bounds depend on the processors on implementation. In parallel computing there are multiple processors; in a distributed system, even the types of the processors are different. It is only when the bound itself is based on weights that it will take care of a change of processors, including a change of their type, and hence be meaningful. The desired bound is our statistical bound.
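For Q2 above, a minimal cross-checking sketch (illustrative Python, not the experiments of Chakraborty and Sourabh (2007b)): empirically compare the average number of interchanges in selection sort under a discrete uniform input, where ties are present, and a continuous Normal input, where ties are absent.

```python
# Hedged sketch for Q2: does an average-case measure derived over one
# distribution hold over others? Compare average selection-sort
# interchanges for discrete uniform (ties) vs. continuous Normal (no ties).
import numpy as np

def selection_sort_interchanges(a):
    a = list(a)
    swaps = 0
    for i in range(len(a)):
        m = min(range(i, len(a)), key=a.__getitem__)  # index of the minimum
        if m != i:
            a[i], a[m] = a[m], a[i]
            swaps += 1
    return swaps

rng = np.random.default_rng(2)
n, trials = 200, 200
for name, sampler in [("discrete uniform", lambda: rng.integers(0, 10, n)),
                      ("continuous Normal", lambda: rng.normal(size=n))]:
    avg = np.mean([selection_sort_interchanges(sampler()) for _ in range(trials)])
    print(f"{name}: average interchanges over {trials} runs ≈ {avg:.1f}")
```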
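For Q3, where the input distribution has no expectation, empirical O proceeds regardless, because program run time, not a theoretical expectation, is the response variable. A minimal sketch, again illustrative: time a library sort on Cauchy-distributed inputs and fit log(time) against log(n).

```python
# Hedged sketch for Q3: empirical growth estimate on Cauchy inputs,
# for which the theoretical mean does not exist.
import time
import numpy as np

rng = np.random.default_rng(0)
ns = np.array([2_000, 4_000, 8_000, 16_000, 32_000])
times = []
for n in ns:
    x = rng.standard_cauchy(n)        # heavy-tailed input: E[X] undefined
    t0 = time.perf_counter()
    np.sort(x, kind="quicksort")      # response variable: program run time
    times.append(time.perf_counter() - t0)

# The slope of log(time) vs. log(n) is an empirical growth exponent.
slope, _ = np.polyfit(np.log(ns), np.log(times), 1)
print(f"empirical growth exponent ≈ {slope:.2f}")
```

The same log-log fit, read against the worst-case exponent, also serves as the certificate of conservativeness mentioned in the second bullet.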
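For the bubble-sort bullet, an illustrative experiment (not from the cited papers): count comparisons as a function of the fraction of the array that arrives pre-sorted, and see when a linear model stops being an underfit.

```python
# Hedged sketch: bubble sort with early exit; how many comparisons occur
# when only a fraction of the array arrives already sorted?
import numpy as np

def bubble_sort_comparisons(a):
    a = list(a)
    ops = 0
    for i in range(len(a) - 1):
        swapped = False
        for j in range(len(a) - 1 - i):
            ops += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:   # a full pass with no interchange: stop early
            break
    return ops

rng = np.random.default_rng(1)
n = 1_000
for frac in (1.0, 0.99, 0.95, 0.90):
    a = np.arange(n)
    k = int(n * frac)
    a[k:] = rng.permutation(a[k:])    # shuffle the unsorted tail
    print(f"{frac:.0%} pre-sorted: {bubble_sort_comparisons(a)} comparisons")
```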
Remark: A recent application of the science of weights can be found in the RSA and AES-128 algorithms in cryptography [Pranav, Dutta and Chakraborty (2021)].
References:
Chakraborty, S. and Choudhury, P. P., Can Statistics Provide a Realistic Measure for an Algorithm’s Complexity?, Applied Mathematics Letters, 12(7), 1999, pp. 113–118 (Elsevier).
Chakraborty, S., Pranav, P., Khatoon, N. and Dutta, S., A Guide to Design and Analysis of Algorithms, Nova Science Publishers Inc., New York, 2023 (chap. 7, sec. 7.2).
Chakraborty, S. and Sourabh, S. K., On Why an Algorithmic Time Complexity Measure can be System Invariant rather than System Independent, Applied Mathematics and Computation, 190(1), 2007a, pp. 195–204 (Elsevier).
Chakraborty, S. and Sourabh, S. K., How Robust Are Average Complexity Measures? A Statistical Case Study, Applied Mathematics and Computation, 189(2), 2007b, pp. 1787–1797 (Elsevier).
Pranav, P., Dutta, S. and Chakraborty, S., Empirical and statistical comparison of intermediate steps of AES-128 and RSA in terms of time consumption, Soft Computing, 25(21), 2021, pp. 13127–13145 (Springer).
Schoor, A., Fast algorithm for sparse matrix multiplication, Information Processing Letters, 15(2), 1982, pp. 87–89 (Elsevier).