# Benford’s Law Applied to COVID-19 Reports


This article looks at COVID-19 case reports from around the world to see how well the daily positive case counts fit Benford’s Law. The better the fit, the more likely the data is accurate.

## What is Benford’s Law?

Many sets of numbers that represent real-life events follow a certain regularity. Specifically, the first digit of these numbers follows a surprising pattern: the digit 1 appears about 30% of the time, the digit 2 about 18% of the time, and so on, with the frequency declining logarithmically. This pattern is known as Benford’s Law, and it can be used to identify fraud and other irregularities in reported numbers. To learn more, see this Wikipedia page or the 2020 Netflix show “Connected” (episode “Digits”).
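The expected frequencies come from the standard Benford formula, P(d) = log10(1 + 1/d) for a first digit d from 1 to 9. As a minimal sketch (the function name `benford_expected` is only illustrative), they can be computed like this:

```python
import math


def benford_expected():
    """Return {digit: expected first-digit frequency} for digits 1 to 9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


expected = benford_expected()
for digit, freq in expected.items():
    # Digit 1 comes out near 0.3010 and digit 2 near 0.1761.
    print(f"{digit}: {freq:.4f}")
```

The nine frequencies add up to 1, because every positive number has exactly one first digit between 1 and 9.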

## How does it apply to COVID-19 reports?

COVID-19 reports are made of numbers just like any other reports created by people, and that makes it possible to apply Benford’s Law. I obtained the daily positive case numbers for the U.S. from The Covid Tracking Project, and for countries around the world from Johns Hopkins University, and calculated how often each digit from 1 to 9 appears as the first digit of those numbers. I compared the results to the expected, or Benford, frequency to get the Benford Error: the difference between the actual frequency and the frequency expected by Benford’s Law. This error indicates how good or bad the data for a location is.

For example, England, UK has reported a total of 262 daily positive case counts for COVID-19 since it started reporting as a standalone location on June 11, 2020. Here are the 10 most recent: 10296, 8964, 8408, 9420, 7292, 8644, 8623, 7393, 6527, 5080. If you look at each of these 262 numbers, you will find that the first digit is 1 in 0.3664 (36.64%) of them, 2 in 0.1641 (16.41%) of them, and so on. Benford’s Law, however, states that the first digit should be 1 in 0.3010 (30.10%) of the numbers, 2 in 0.1761 (17.61%) of them, etc. So England’s report is off by 0.0654 for the digit 1, by 0.0120 for the digit 2 (the sign of the difference doesn’t matter), and so on. Across all 9 digits, the report is off by 0.1602, and that is England’s Benford Error.
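The calculation can be sketched in a few lines of Python. This is not the code from the repository; it assumes the per-digit differences are combined by summing their absolute values, and the names `first_digit` and `benford_error` are made up for illustration. The sample below uses only the 10 recent England values quoted above, so it will not reproduce the 0.1602 obtained from all 262 numbers.

```python
import math
from collections import Counter


def first_digit(n):
    """Leading decimal digit of a positive integer, e.g. 10296 -> 1."""
    return int(str(int(n))[0])


def benford_error(values):
    """Sum over digits 1-9 of |observed frequency - Benford frequency|.

    Assumption: the per-digit differences are combined by summing their
    absolute values; zeros and negative numbers are skipped as unusable.
    """
    usable = [v for v in values if v >= 1]
    if not usable:
        raise ValueError("no usable (positive) numbers")
    observed = Counter(first_digit(v) for v in usable)
    error = 0.0
    for digit in range(1, 10):
        expected = math.log10(1 + 1 / digit)
        actual = observed[digit] / len(usable)
        error += abs(actual - expected)
    return error


# The 10 most recent daily counts for England quoted in the text.
england_recent = [10296, 8964, 8408, 9420, 7292, 8644, 8623, 7393, 6527, 5080]
print(f"{benford_error(england_recent):.4f}")
```

With so few values the error is dominated by sampling noise, which is one reason to look at the number of usable values alongside the error.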

## What are the results?

Using the Benford Error, I ranked the locations from best (smallest error) to worst (largest error), and created plots for each location that show the error visually.
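As a small illustration of the ranking step, the sketch below sorts a few locations by error, smallest first. The three values are taken from the U.S. results in this article; the variable names are arbitrary.

```python
# Benford Errors for three U.S. locations, copied from the results in this article.
errors = {
    "New Jersey (NJ)": 0.5526,
    "Oregon (OR)": 0.0849,
    "Arizona (AZ)": 0.1037,
}

# Rank from best (smallest error) to worst (largest error).
ranked = sorted(errors.items(), key=lambda item: item[1])

for rank, (location, err) in enumerate(ranked, start=1):
    print(f"{rank}. {location}, e={err:.4f}")
```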

USA:

1. Oregon (OR), e=0.0849
2. Guam (GU), e=0.0939
3. Arizona (AZ), e=0.1037
4. Montana (MT), e=0.1109
5. Wyoming (WY), e=0.1110
6. District of Columbia (DC), e=0.1161
7. Utah (UT), e=0.1179
8. Kentucky (KY), e=0.1316
9. Rhode Island (RI), e=0.1321
10. Washington (WA), e=0.1399
11. North Dakota (ND), e=0.1411
12. Alaska (AK), e=0.1528
13. Tennessee (TN), e=0.1569
14. Connecticut (CT), e=0.1581
15. Delaware (DE), e=0.1689
16. Alabama (AL), e=0.1739
17. Kansas (KS), e=0.1748
18. California (CA), e=0.1754
19. Louisiana (LA), e=0.1786
20. South Dakota (SD), e=0.1812
21. Wisconsin (WI), e=0.1903
22. North Carolina (NC), e=0.1923
23. Vermont (VT), e=0.1926
24. Nevada (NV), e=0.1926
25. Nebraska (NE), e=0.2041
26. Oklahoma (OK), e=0.2078
27. Arkansas (AR), e=0.2098
28. Georgia (GA), e=0.2125
29. Puerto Rico (PR), e=0.2146
30. Texas (TX), e=0.2241
31. Ohio (OH), e=0.2252
32. Mississippi (MS), e=0.2263
33. South Carolina (SC), e=0.2269
34. New Hampshire (NH), e=0.2298
35. Michigan (MI), e=0.2299
36. Hawaii (HI), e=0.2320
37. West Virginia (WV), e=0.2362
38. Idaho (ID), e=0.2460
39. Massachusetts (MA), e=0.2559
40. Virginia (VA), e=0.2605
41. Florida (FL), e=0.2732
42. Iowa (IA), e=0.2764
43. Illinois (IL), e=0.2999
44. U.S. Virgin Islands (VI), e=0.3213
45. Maryland (MD), e=0.3232
46. New Mexico (NM), e=0.3237
47. Minnesota (MN), e=0.3525
48. Colorado (CO), e=0.3564
49. Missouri (MO), e=0.3598
50. Maine (ME), e=0.3698
51. Pennsylvania (PA), e=0.3757
52. Indiana (IN), e=0.3902
53. New York (NY), e=0.4890
54. New Jersey (NJ), e=0.5526
55. Northern Mariana Islands (MP), e=0.6040

World:

1. Jordan, e=0.0625
2. Ukraine, Sumy Oblast, e=0.0661
3. Netherlands, Aruba, e=0.0678
4. Malawi, e=0.0679
5. Australia, New South Wales, e=0.0775
6. Peru, Pasco, e=0.0864
7. Germany, Thuringen, e=0.0893
8. Spain, C Valenciana, e=0.0916
9. Brazil, Maranhao, e=0.0928
10. Namibia, e=0.0967

...

661. Russia, Ulyanovsk Oblast, e=0.8742
662. Russia, Volgograd Oblast, e=0.8867
663. Russia, Krasnoyarsk Krai, e=0.9153
664. Russia, Krasnodar Krai, e=0.9320
665. Russia, Novosibirsk Oblast, e=0.9341
666. Russia, Karachay Cherkess, e=0.9381
667. Russia, Saratov Oblast, e=0.9396
668. Russia, Orenburg Oblast, e=0.9684
669. Tajikistan, e=0.9896
670. Russia, Mordovia Republic, e=1.0343

For the full results, please see this GitHub repository and the following files:

`usa_rank.csv` (full link) — File showing how each U.S. state or territory ranks from best to worst based on how their COVID-19 case numbers fit into Benford's Law. The last column has the file name with the Benford plot for the location.

`usa_output/` (full link) — Folder with Benford plots for U.S. states and territories.

`world_rank.csv` (full link) — File showing how each country and province ranks from best to worst based on how their COVID-19 case numbers fit into Benford's Law. This file is searchable. The last column has the file name with the Benford plot for the location.

`world_output/` (full link) — Folder with Benford plots for world countries and their provinces.

To see the original data:

`usa_data/` (full link) — Folder with the original COVID-19 data for the U.S. from The Covid Tracking Project.

`world_data/` (full link) — Folder with the original COVID-19 data for the world from Johns Hopkins University.

`extra/world.csv` (full link) — A version of the world data in a single file, showing the data in a more concise way than the original data.

## How to interpret the results?

Small errors mean the reported cases are likely to be true and accurate, and large errors indicate inaccuracy. Large errors can be a sign of insufficient testing, misreporting, or direct falsification.

For the U.S., the error ranges from 0.08 for Oregon to 0.60 for the Northern Mariana Islands (among the states, New Jersey has the largest error at 0.55). For the world, the error ranges from 0.06 for Jordan to 1.03 for Mordovia, Russia.

Example of small error (good Benford fit):

Example of large error (bad Benford fit):

## What time period is covered? How many numbers were used?

The data covers the period from the beginning of COVID-19 reporting in early 2020 to March 3, 2021. That is about 1 year of data, or roughly 365 numbers per location, for 725 different locations (55 for the U.S. and 670 for the rest of the world). The exact number of numbers (no pun intended) varies by location because they didn’t all start reporting at the same time. It also varies because zeros and negative numbers are unusable and were dropped. The actual number of numbers used to measure Benford-ness is included in the output, so the reader can take this metric into account along with the error. About 100 locations were excluded from the ranking because they had too few numbers (fewer than 50 usable numbers). These are typically small territories or places like cruise ships.
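The filtering rules described above can be sketched as follows. This is only an illustration under the stated rules (drop zeros and negatives, require at least 50 usable numbers); the names `usable_counts`, `is_rankable`, and `MIN_USABLE` are hypothetical and not taken from the repository.

```python
MIN_USABLE = 50  # locations with fewer usable numbers are left out of the ranking


def usable_counts(daily_counts):
    """Keep only positive daily counts; zeros and negatives are unusable."""
    return [n for n in daily_counts if n > 0]


def is_rankable(daily_counts, min_usable=MIN_USABLE):
    """True if a location has enough usable numbers to be ranked."""
    return len(usable_counts(daily_counts)) >= min_usable


# A short made-up series: the zero and the negative correction are dropped.
sample = [5, 0, -3, 12]
print(usable_counts(sample))
print(is_rankable(sample))
```

Zeros and negatives have no first digit between 1 and 9, so they cannot contribute to the first-digit frequencies.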