# Benford’s Law Applied to COVID-19 Reports

**A look at COVID-19 case reports from around the world to see how well the numbers of daily positive cases fit Benford’s Law. The better the fit, the more likely the data is accurate.**

## What is Benford’s Law?

Numbers that represent real-life events follow a certain regularity. Specifically, the first digit of these numbers follows a strange pattern: the digit 1 appears about 30% of the time, the digit 2 about 18% of the time, and so on, with the frequency declining logarithmically. This pattern is known as Benford’s Law, and it can be used to identify fraud and other irregularities in reported numbers. To learn more, see this Wikipedia page or the 2020 Netflix show “Connected” (episode “Digits”).
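The expected frequencies come from a simple formula: the probability that the first digit is d equals log10(1 + 1/d). A minimal sketch that reproduces the Benford percentages quoted in this article (the function name `benford_expected` is my own, chosen for illustration):

```python
import math

def benford_expected():
    """Expected first-digit frequencies: P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

expected = benford_expected()
for digit, p in expected.items():
    # Print each digit with its expected share, e.g. digit 1 at about 30%.
    print(f"{digit}: {p:.4f} ({p:.2%})")
```

The nine frequencies add up to 1, since every positive number starts with one of the digits 1 to 9.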

## How does it apply to COVID-19 reports?

COVID-19 reports are made of numbers just like any other reports created by people, and that makes it possible to apply Benford’s Law. I obtained the daily positive case numbers for the U.S. from The Covid Tracking Project, and for countries around the world from Johns Hopkins University, and calculated how often each of the digits 1 to 9 appears as the first digit of those numbers. I compared the results to the expected, or Benford, frequency to get the **Benford Error**: the difference between the actual frequency and the frequency expected by Benford’s Law. This error tells how good or bad the data for the location is.

For example, England, UK has reported a total of 262 daily positive case counts for COVID-19 since it started reporting as a standalone location on June 11, 2020. Here are the 10 most recent: 10296, 8964, 8408, 9420, 7292, 8644, 8623, 7393, 6527, 5080. If you look at each of these 262 numbers, you will find that the first digit is 1 in 0.3664 (36.64%) of them, 2 in 0.1641 (16.41%) of them, and so on. But Benford’s Law states that the first digit should be 1 in 0.3010 (30.10%) of the numbers, 2 in 0.1761 (17.61%), etc. So England’s report is off by 0.0654 for the digit 1, by 0.0120 for the digit 2 (the sign of the difference doesn’t matter), and so on. Across all 9 digits, their report is off by 0.1602, and that is their Benford Error.
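Here is a rough sketch of that calculation. It assumes the Benford Error is the sum, over the digits 1 to 9, of the absolute difference between the observed and expected frequencies, which is how I read the description above; the function names are hypothetical and not taken from the repository.

```python
import math
from collections import Counter

def first_digit(n):
    """Return the leading digit of a positive integer."""
    return int(str(n)[0])

def benford_error(numbers):
    """Assumed definition: sum over digits 1-9 of |observed - expected| frequency."""
    usable = [n for n in numbers if n > 0]  # zeros and negatives have no usable first digit
    counts = Counter(first_digit(n) for n in usable)
    total = len(usable)
    return sum(
        abs(counts[d] / total - math.log10(1 + 1 / d))
        for d in range(1, 10)
    )

# The 10 most recent England numbers quoted above (a sample, not all 262).
recent = [10296, 8964, 8408, 9420, 7292, 8644, 8623, 7393, 6527, 5080]
print(round(benford_error(recent), 4))
```

Ten numbers are far too few for a meaningful result, so the error for this small sample comes out much larger than England’s 0.1602 for the full series. Under this definition the error can never exceed 2.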

## What are the results?

Using the Benford Error, I ranked the locations from best (smallest error) to worst (largest error), and created plots for each location that show the error visually.
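The ranking itself is just a sort by error, smallest first. A small sketch using four of the U.S. values from this article (the ranks it prints are positions within this four-item sample only):

```python
# A few Benford Errors taken from the U.S. ranking in this article.
errors = {
    "New Jersey (NJ)": 0.5526,
    "Oregon (OR)": 0.0849,
    "Arizona (AZ)": 0.1037,
    "Guam (GU)": 0.0939,
}

# Sort ascending so the best fit (smallest error) comes first.
ranking = sorted(errors.items(), key=lambda item: item[1])

for rank, (location, e) in enumerate(ranking, start=1):
    print(f"{rank}. {location}, e={e:.4f}")
```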

USA:

1. Oregon (OR), e=0.0849

2. Guam (GU), e=0.0939

3. Arizona (AZ), e=0.1037

4. Montana (MT), e=0.1109

5. Wyoming (WY), e=0.1110

6. District of Columbia (DC), e=0.1161

7. Utah (UT), e=0.1179

8. Kentucky (KY), e=0.1316

9. Rhode Island (RI), e=0.1321

10. Washington (WA), e=0.1399

11. North Dakota (ND), e=0.1411

12. Alaska (AK), e=0.1528

13. Tennessee (TN), e=0.1569

14. Connecticut (CT), e=0.1581

15. Delaware (DE), e=0.1689

16. Alabama (AL), e=0.1739

17. Kansas (KS), e=0.1748

18. California (CA), e=0.1754

19. Louisiana (LA), e=0.1786

20. South Dakota (SD), e=0.1812

21. Wisconsin (WI), e=0.1903

22. North Carolina (NC), e=0.1923

23. Vermont (VT), e=0.1926

24. Nevada (NV), e=0.1926

25. Nebraska (NE), e=0.2041

26. Oklahoma (OK), e=0.2078

27. Arkansas (AR), e=0.2098

28. Georgia (GA), e=0.2125

29. Puerto Rico (PR), e=0.2146

30. Texas (TX), e=0.2241

31. Ohio (OH), e=0.2252

32. Mississippi (MS), e=0.2263

33. South Carolina (SC), e=0.2269

34. New Hampshire (NH), e=0.2298

35. Michigan (MI), e=0.2299

36. Hawaii (HI), e=0.2320

37. West Virginia (WV), e=0.2362

38. Idaho (ID), e=0.2460

39. Massachusetts (MA), e=0.2559

40. Virginia (VA), e=0.2605

41. Florida (FL), e=0.2732

42. Iowa (IA), e=0.2764

43. Illinois (IL), e=0.2999

44. U.S. Virgin Islands (VI), e=0.3213

45. Maryland (MD), e=0.3232

46. New Mexico (NM), e=0.3237

47. Minnesota (MN), e=0.3525

48. Colorado (CO), e=0.3564

49. Missouri (MO), e=0.3598

50. Maine (ME), e=0.3698

51. Pennsylvania (PA), e=0.3757

52. Indiana (IN), e=0.3902

53. New York (NY), e=0.4890

54. New Jersey (NJ), e=0.5526

55. Northern Mariana Islands (MP), e=0.6040

World:

1. Jordan, e=0.0625

2. Ukraine, Sumy Oblast, e=0.0661

3. Netherlands, Aruba, e=0.0678

4. Malawi, e=0.0679

5. Australia, New South Wales, e=0.0775

6. Peru, Pasco, e=0.0864

7. Germany, Thuringen, e=0.0893

8. Spain, C Valenciana, e=0.0916

9. Brazil, Maranhao, e=0.0928

10. Namibia, e=0.0967

...

661. Russia, Ulyanovsk Oblast, e=0.8742

662. Russia, Volgograd Oblast, e=0.8867

663. Russia, Krasnoyarsk Krai, e=0.9153

664. Russia, Krasnodar Krai, e=0.9320

665. Russia, Novosibirsk Oblast, e=0.9341

666. Russia, Karachay Cherkess, e=0.9381

667. Russia, Saratov Oblast, e=0.9396

668. Russia, Orenburg Oblast, e=0.9684

669. Tajikistan, e=0.9896

670. Russia, Mordovia Republic, e=1.0343

For the full results, please see this GitHub repository and the following files:

`usa_rank.csv`

(full link) — File showing how each U.S. state or territory ranks from best to worst based on how their COVID-19 case numbers fit into Benford's Law. The last column has the file name with the Benford plot for the location.

`usa_output/`

(full link) — Folder with Benford plots for U.S. states and territories.

`world_rank.csv`

(full link) — File showing how each country and province ranks from best to worst based on how their COVID-19 case numbers fit into Benford's Law. This file is searchable. The last column has the file name with the Benford plot for the location.

`world_output/`

(full link) — Folder with Benford plots for world countries and their provinces.

To see the original data:

`usa_data/`

(full link) — Folder with the original COVID-19 data for the U.S. from The Covid Tracking Project.

`world_data/`

(full link) — Folder with the original COVID-19 data for the world from Johns Hopkins University.

`extra/world.csv`

(full link) — A version of the world data in a single file, showing the data in a more concise way than the original data.

## How to interpret the results?

Small errors mean the reported cases are likely to be true and accurate, and large errors indicate inaccuracy. Large errors can be a sign of insufficient testing, misreporting, or direct falsification.

For the U.S., the error ranges from 0.08 for Oregon to 0.60 for the Northern Mariana Islands (among the states, the largest is 0.55 for New Jersey). For the world, the error ranges from 0.06 for Jordan to 1.03 for Mordovia, Russia.

Example of small error (good Benford fit):

Example of large error (bad Benford fit):

## What time period is covered? How many numbers were used?

The data covers the period from the beginning of COVID-19 reporting in early 2020 to March 3, 2021. That is about 1 year of data, or roughly 365 numbers per location, for 725 different locations (55 for the U.S. and 670 for the rest of the world). The exact number of numbers (no pun intended) varies by location because they didn’t all start reporting at the same time. It also varies because zeros and negative numbers are unusable and were dropped. The actual number of numbers used for Benford-ness is included in the output, so the reader can take this metric into account along with the error. About 100 locations were excluded from the ranking because they had too few numbers (fewer than 50 usable numbers). These are typically small territories or places like cruise ships.
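The filtering described above can be sketched as follows. The threshold of 50 comes from the text; the function names and the sample series are made up for illustration and are not real case data.

```python
MIN_USABLE = 50  # locations with fewer usable numbers are left out of the ranking

def usable_numbers(daily_cases):
    """Drop zeros and negative values, which have no meaningful first digit."""
    return [n for n in daily_cases if n > 0]

def eligible_locations(data):
    """Keep only locations with at least MIN_USABLE usable numbers."""
    eligible = {}
    for name, series in data.items():
        usable = usable_numbers(series)
        if len(usable) >= MIN_USABLE:
            eligible[name] = usable
    return eligible

# Made-up series purely to show the behavior.
sample = {
    "Location A": [12, 0, 35, -4] * 30,  # 60 usable values after filtering
    "Cruise ship": [3, 0, 1, 2] * 5,     # only 15 usable values
}
for name, usable in eligible_locations(sample).items():
    print(name, len(usable))
```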