Benford’s rule is a widely accepted method of testing the authenticity of numbers that occur in everyday life. In large data sets that span several orders of magnitude, or that come from a wide variety of sources, the leading digits are more likely to be small than large. In the decimal number system, for example, Benford’s rule implies that the probability of a number starting with 1 is approximately 30%, while that of 9 does not even reach 5%.
Naturally, this holds only as long as the numbers have not been manipulated by humans.
We wrote earlier, in this article, about exactly what the rule states, who discovered it, and why it holds in the real world. Below, we report on a particularly topical application of Benford’s rule: the reliability of official COVID case numbers.
What does Benford say?
A recent study (the work of Anran Wei and Andre E. Vellwock) examines whether there is any manipulation in the COVID case numbers reported by individual countries. The researchers used four data sets: daily and cumulative confirmed case numbers, and death counts in the same two breakdowns.
The COVID-19 data comes from the CSSE (Center for Systems Science and Engineering, Johns Hopkins University) and covers the period through September 1, 2020. Data broken down by region were aggregated at the country level, since the objective is to obtain as many observations as possible for testing Benford’s rule.
It should be noted, however, that the case of China is difficult to analyze because, according to official statistics, its situation stabilized very quickly and there are therefore too few observations. For China, the researchers therefore decided it was preferable to examine the case numbers at the regional level, which also increases the number of observations.
According to Benford’s rule, the leading digits in the observations should occur with the following frequencies:
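For reference, the expected frequencies follow from the standard formula for Benford’s rule, P(d) = log10(1 + 1/d) for a leading digit d between 1 and 9. The short sketch below simply evaluates that formula; the function name `benford_expected` is our own and does not come from the study.

```python
import math


def benford_expected():
    """Return {digit: probability} for leading digits 1..9 under Benford's rule."""
    # Standard Benford formula: P(d) = log10(1 + 1/d)
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Print the expected share of each leading digit as a percentage.
for digit, p in benford_expected().items():
    print(f"{digit}: {p:.1%}")
```

This gives roughly 30.1% for the digit 1 and 4.6% for the digit 9, in line with the figures quoted above.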
However, it must be emphasized that a mere deviation from the theoretical distribution does not mean that the numbers have been manipulated. It is therefore worth designing the study so that a critical value or a level of statistical significance is fixed in advance, on the basis of which we decide whether a data set complies with Benford’s rule. In the present case, following the work of William Goodman, the researchers treated the data as potentially manipulated when the normalized squared deviation (the d factor) exceeded 25%.
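To make the threshold idea concrete, here is a minimal sketch of how such a deviation measure might be computed. The article does not give the exact formula, so we assume one plausible reading: the Euclidean distance between the observed and the Benford first-digit frequencies, divided by the largest distance possible (reached when every number starts with 9). The helper names `first_digit_freqs` and `d_factor` and the sample data are illustrative and are not taken from the study.

```python
import math
from collections import Counter

# Expected first-digit probabilities under Benford's rule, digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit_freqs(values):
    """Observed share of each leading digit 1..9 among the non-zero values."""
    digits = [int(str(abs(int(v)))[0]) for v in values if int(v) != 0]
    if not digits:
        raise ValueError("no non-zero observations")
    counts = Counter(digits)
    return [counts.get(d, 0) / len(digits) for d in range(1, 10)]


def d_factor(observed):
    """Assumed definition: Euclidean distance to Benford, scaled to [0, 1]."""
    dist = math.sqrt(sum((o - b) ** 2 for o, b in zip(observed, BENFORD)))
    # Largest possible distance: all leading digits equal 9, the rarest digit.
    worst = [0.0] * 8 + [1.0]
    max_dist = math.sqrt(sum((w - b) ** 2 for w, b in zip(worst, BENFORD)))
    return dist / max_dist


# Illustrative data only: powers of 2 are a classic Benford-like sequence.
sample = [2 ** k for k in range(1, 200)]
d = d_factor(first_digit_freqs(sample))
print(f"d factor: {d:.1%}, flagged: {d > 0.25}")
```

Under this assumed definition, a data set that matches Benford’s rule exactly has a d factor of 0, and the 25% limit is applied as a simple cutoff on the result.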
Suspicious results
The researchers’ main finding is that the COVID-19 case numbers generally comply with Benford’s rule, that is, the frequencies of the first digits are distributed much as the theory predicts. Across all countries, the measured d factor is only 3%, meaning the observed frequencies are very close to the expected values.
The researchers then ranked the countries where the number of cases was large enough for the test to be carried out properly. They conclude that there is no evidence of data manipulation in most of the countries studied, such as the United States, Brazil, India, Peru, or South Africa.
The main results of the researchers and the fit of Benford’s rule in the data (with the mean values of the factor d) can be seen in the following figure:
Two suspicious cases clearly emerged: Russia and Iran. The COVID numbers of these autocratic regimes produced very strange deviations in the study. In the case of Iran, for example, the digit 2 shows a hard-to-explain spike in the daily numbers, resulting in a d factor of 42%. In the cumulative totals, however, the data do seem to comply with Benford’s rule. The outcome in Iran’s case is therefore not entirely clear.
Russia is much more interesting in this regard, as the growth in its total number of cases does not follow Benford’s rule either. Moreover, the digits occur with almost uniform probability, which the researchers call very strange. If the deviation were large at only one unexpected digit, it could be explained by the cumulative counter still lingering in that range. But that is not what we see in Russia’s case.
The authors’ final conclusion is that Benford’s rule, within the specified sensitivity, seems to hold in most of the countries studied, but in two of them, Russia and Iran, the figures may have been manipulated.
To this, however, we must add that while Benford’s rule is a common method for investigating this kind of fraud, it is not a perfect tool. There is no scientifically accepted threshold beyond which manipulation of the numbers can be established with certainty. In addition, other studies have produced false positives, in which naturally occurring data sets did not satisfy Benford’s rule.
Therefore, as with the cited study, it is not possible to be absolutely sure. But the suspicion is there. Unfortunately, Hungary is not included in the study, so we tested the national COVID numbers ourselves. The results are interesting, and we will report on them soon in another article here on the Portfolio Prof page.