Concepedia

TLDR

The article introduces the USPTO Patent Examination Research Dataset (PatEx), a collection of nearly 9.2 million US patent application records, and examines its selection biases and representativeness. The authors generate statistical evidence across application characteristics (type, age, ownership, origin, family status, and technology class) to identify areas prone to selectivity issues. They find that data are sparse before 1981 and that pre-2001 filings suffer serious selection bias due to nonpublication in the United States. After the November 2000 policy change, coverage and representativeness improve substantially, making PatEx generally representative of US patent applications filed after November 2000 across observable characteristics.

Abstract

This article describes the “USPTO Patent Examination Research Dataset” (PatEx) and explores possible selection issues and the representativeness of the nearly 9.2 million US patent application records it contains. We find that data are sparse for years before 1981, and that serious selection issues affect records on applications filed prior to 2001 due to nonpublication in the United States. Following implementation of a policy change in November 2000, both coverage and representativeness of the PatEx data improve substantially. We uncover specific areas that are prone to selectivity issues, by generating statistical evidence across application characteristics such as application type, age, ownership type, domestic or foreign origin, patent family status, and technology class among others. Although our exploration suggests to researchers several categories of specific concern, our findings overall show that the PatEx data are generally representative of the population of patent applications filed in the United States after November 2000 across observable characteristics.
