Notable Advances in Statistics: 1991-1999
The initial version of the statistical programming language R was released in 1995. Open-source R is an implementation of the programming language S, which was also the basis of the commercially available S-PLUS. R became dominant in the academic world of statistical computing and computational statistics and, to an ever-increasing extent, in statistical teaching, publishing, and real-world applications. Thousands of add-on packages for R eventually appeared, with enormous redundancy and often with poor documentation. Many statisticians now learn R as their first programming language rather than a general-purpose language such as Python, Java, C, or Perl.
By the end of the century, meta-analysis was a prime area for innovation in biostatistical practice. The Cochrane Collaboration (often shortened to ‘Cochrane’) was founded in 1993 to provide up-to-date systematic reviews of all relevant randomized controlled trials of health care. By 2015, data and reviews were being provided to Cochrane by 31,000 contributors from more than 120 countries.
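The search for patterns in big data was advanced by the 1995 Benjamini and Hochberg paper on controlling the false discovery rate, the expected proportion of false positives among the hypotheses declared significant when many hypotheses are tested at once.³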
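The Benjamini–Hochberg procedure itself is short enough to sketch. The Python below is a minimal illustration, not code from the 1995 paper: the function name `benjamini_hochberg`, the default level `q = 0.05`, and the sample p-values are all assumptions made for the example. It sorts the p-values, finds the largest rank k whose p-value is at most (k / m)·q, and rejects the k hypotheses with the smallest p-values.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return True for each hypothesis rejected by the BH step-up rule at FDR level q."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up search: find the largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values, and no others.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Illustrative p-values only (hypothetical, not data from the paper).
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
    print(benjamini_hochberg(pvals, q=0.05))
```

For the illustrative p-values only the first two are rejected. The rule is "step-up": with p-values 0.03 and 0.04 at q = 0.05, both are rejected even though 0.03 exceeds its own rank-1 threshold of 0.025, because the larger p-value passes at rank 2.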
The Journal of Statistical Software was founded in 1996. Also in 1996, SPSS bought BMDP, then largely ignored it. In 2009 IBM bought SPSS and repositioned it as business analytics software. Stata, which was first released in 1985, continued to be used by many social scientists who liked its command-line interface and huge archive of contributed code.
Statisticians were devising a variety of computer-intensive techniques for statistical analysis and modelling, a whole new field of endeavor precipitated by Efron’s bootstrap method. Computer-intensive calculations were especially useful in nonparametric and Bayesian statistics. The power of personal computers made Monte Carlo evaluations a practical and convincing way for statisticians to check mathematical derivations and to compare alternative methods. Kernel methods for pattern analysis became popular in the 1990s, chiefly through the support vector machine (SVM), once the SVM was found to be competitive with other algorithms used in machine learning applications.
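The core idea of the nonparametric bootstrap fits in a few lines: resample the observed data with replacement, recompute the statistic on each resample, and use the spread of those replicates as a stand-in for its sampling distribution. The Python sketch below is an illustration only, not Efron’s code; the function names, the default of 2,000 resamples, the fixed random seed, and the sample data are assumptions chosen for the example. It computes a bootstrap standard error and a simple percentile interval for any statistic.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Statistic = Callable[[Sequence[float]], float]


def bootstrap_replicates(data: Sequence[float], statistic: Statistic,
                         n_boot: int = 2000, seed: int = 0) -> List[float]:
    """Recompute `statistic` on n_boot resamples drawn with replacement from `data`."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(data)
    # Each resample has the same size as the original sample.
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]


def bootstrap_se(data: Sequence[float], statistic: Statistic,
                 n_boot: int = 2000, seed: int = 0) -> float:
    """Bootstrap estimate of the standard error: the SD of the replicates."""
    return statistics.stdev(bootstrap_replicates(data, statistic, n_boot, seed))


def percentile_interval(data: Sequence[float], statistic: Statistic, level: float = 0.95,
                        n_boot: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Simple percentile bootstrap confidence interval for `statistic`."""
    reps = sorted(bootstrap_replicates(data, statistic, n_boot, seed))
    alpha = (1.0 - level) / 2.0
    lower = reps[int(alpha * n_boot)]
    upper = reps[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Illustrative data only.
    sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
    print("median:", statistics.median(sample))
    print("bootstrap SE of median:", bootstrap_se(sample, statistics.median))
    print("95% percentile interval:", percentile_interval(sample, statistics.median))
```

The median is used here because its standard error has no simple closed form, which is exactly the situation where resampling earns its computational cost. Each call does n_boot full recomputations of the statistic, the kind of workload that personal computers made routine in the 1990s.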
The wide acceptance of the internet, social media, digital cameras, and online shopping allowed interested organizations to collect vast amounts of data. “Data analysis” and “data analyst” became popular terms for the practice of examining big data to discover informative patterns, and for the people who did it. The term “big data” was first used in print in 1997. Data sets in which many quantities were measured on each of a few sample units were described by statisticians as “big p, small n” (many variables p, few observations n). Big data issues arose in biostatistics when biologists began to collect “omics” data, including high-dimensional genomics, proteomics, and metabolomics observations. Another type of big data problem was the modeling and analysis of digital images, perhaps from microscopes, medical diagnostic instruments, or satellites, and statisticians necessarily began to devise novel image analysis techniques. By the end of the 20th century, geographic information system (GIS) software packages had been consolidated and standardized, and GIS data sets were viewable over the internet.
During the 1990s, the American Statistical Association authorized eight new sections: the Statistical Consulting Section (CNSL), the Statistics in Marketing Section (MKTG), the Teaching Statistics in Health Sciences Section (TSHS), the Bayesian Statistical Science Section (SBSS), the Statistics in Epidemiology Section (EPI), the Statistics in Sports Section (SIS), the Health Policy Statistics Section (HPSS), and the Risk Analysis Section (RISK).
Last revised: 2021-04-19