Data Science History

Data science draws on several subjects, such as data mining, statistics, data analysis, and data visualization. Its basic process is collecting vast data sets and managing them for specific purposes, and its methods must keep evolving as technology advances.

Data science methods were in use long before the field had a name. The term “data science” was introduced in 1974 by Peter Naur, who proposed it as an alternative name for computer science. It later became the specific theme of a conference of the International Federation of Classification Societies in 1996.

Modern data science took a long road to reach its present form. Keep reading to learn how it developed over the centuries.

Extensive Demographic Data Collection in 1663 by John Graunt

John Graunt was a renowned English demographer who wanted a warning system to help control the spread of bubonic plague. His extensive collection of demographic data is regarded as the first data analysis carried out in England, earning him the title “Father of Demographics.”

Theory of Conditional Probability in 1763 by Thomas Bayes

Bayes’ theorem, the basis of the theory of conditional probability, was published in 1763 and became a strong foundation of data science. Conditional probability is the probability that an event occurs given that a related event has already happened. The theorem updates a prior probability in light of new evidence, combining what was previously known with what has just been observed.
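
The update described above can be sketched in a few lines of Python. This is only an illustration: the function name, its parameters, and the numbers are hypothetical and do not come from Bayes' own work. It computes the probability of a hypothesis after seeing a piece of evidence, given the prior probability and how likely that evidence is when the hypothesis is true or false.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) from P(H), P(E | H), and P(E | not H)."""
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    # Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).
    return likelihood * prior / evidence


# Hypothetical numbers: a rare condition (1%) and an imperfect test.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the updated probability stays low because the prior was small, which is the kind of correction the theorem is meant to provide.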

First Computer Programming in 1843 by Ada Lovelace

Ada Lovelace introduced the idea of computer programming in the 19th century. As an associate of Charles Babbage, she learned a great deal about programming and mechanical computation. In her notes on his Analytical Engine, published in 1843, she proposed an algorithm to compute Bernoulli numbers.

Lovelace was the key person behind the foundation of computer programming even before modern computers were invented.

Data Visualization in 1855 by Florence Nightingale

Florence Nightingale, a nurse, medical reformer, and Victorian icon, used statistics and data visualization during the Crimean War to record casualties in the camps and hospitals. Her charts showed that poor sanitation was the major cause of death among the soldiers. She also used data visualization to communicate her findings to officials and decision-makers.

Business Intelligence (BI) in 1865 by Richard Miller Devens

Richard Miller Devens was the first to use “business intelligence” as a term for data analysis, in his book Cyclopaedia of Commercial and Business Anecdotes. BI also covers acting on that analysis to solve business problems. The approach helped businesses and investors gain an edge over their competitors.

Data Processing in 1884 by Herman Hollerith

Herman Hollerith invented the tabulating machine in 1884. It recorded and counted data using punch cards, and it was used to process the US Census in 1890. Hollerith later founded the Tabulating Machine Company, which merged into the Computing-Tabulating-Recording Company in 1911; that company was renamed International Business Machines, or IBM, in 1924.

Computable Numbers in 1936 by Alan Turing

Alan Turing wrote “On Computable Numbers,” in which he introduced the universal machine, a theoretical computing device that can perform complex computations. This breakthrough opened the path to modern-day computers. According to his paper, a single such device can carry out many different arithmetical operations and computations.

Social Security Contract was given to IBM in 1937

In 1937, under the administration of Franklin D. Roosevelt and two years after the Social Security Act became law in 1935, IBM received a contract to develop a device for reading punch cards. It was called the IBM Type 77.

The project involved two distinct sets of punch cards, which the machine compared, merged, and filed as a single set. It could handle up to 480 cards per minute.

Colossus, the First Data Processing Machine in 1943 by Tommy Flowers

Tommy Flowers designed Colossus, an electronic computer, in 1943. It is considered the first electronic data-processing machine in the UK and was built to help break encrypted German messages during WWII. It could perform Boolean operations and similar computations. Moreover, it could read up to 5,000 characters per second, cutting the time needed to work on a message from days to a few hours.

Impact of Electronic Computing on Data Analysis in 1962 as Projected by John W. Tukey

John W. Tukey, a chemist-turned-statistician, made major contributions to data analysis. He researched graphical methods that remain significant in the field. Aside from projecting the impact of electronic computing on data analysis, he also contributed the Stem-and-Leaf Diagram, the Box-and-Whisker Plot, and methods for Paired Comparisons.
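
To make the stem-and-leaf display mentioned above concrete, here is a minimal sketch in Python. It assumes non-negative integers and uses the tens digit as the stem and the ones digit as the leaf; the function names and the sample values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by stem (tens) and list their leaves (ones)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    # Return stems in ascending order; leaves are already sorted.
    return dict(sorted(stems.items()))


def render(plot):
    """Format the display as one 'stem | leaves' row per line."""
    return "\n".join(
        f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in plot.items()
    )


# Hypothetical sample data.
plot = stem_and_leaf([27, 12, 34, 15, 27, 21])
print(render(plot))
```

The display keeps every original value visible while still showing the shape of the distribution, which is why it suits quick exploration of small data sets.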

In 1962, he wrote the paper “The Future of Data Analysis,” which is widely regarded as an early call to treat data analysis as a discipline in its own right. Tukey is also credited with coining the term “bit,” short for binary digit, in the 1940s.

Contemporary Data Processing in 1974 by Peter Naur

It was Peter Naur who gave the term “data science” its definition. According to Naur, data science is the science of dealing with data once they have been established. His book, “Concise Survey of Computer Methods,” discussed contemporary data processing in different applications.

Establishment of the International Association for Statistical Computing in 1977

The IASC, or International Association for Statistical Computing, was established in 1977. Its mission is to link traditional statistical methods with modern computing technology and the knowledge of domain experts. The Association also aims to promote practical statistical computing throughout the world.

Exploratory Data Analysis by John W. Tukey

John W. Tukey wrote the book Exploratory Data Analysis in 1977. He emphasized the need to examine and explore data before fitting a specific probability model to it. He also cautioned that results can come out biased and incomplete when exploration and hypothesis testing are both carried out on the same data.

Data Mining in 1989 by Gregory Piatetsky-Shapiro

Gregory Piatetsky-Shapiro coined the term KDD, or “Knowledge Discovery in Databases,” in 1989. He also organized a workshop on the topic. By 1990, the term “data mining” had begun to appear in the database community.

Data Science in 1996

The International Federation of Classification Societies, or IFCS, held its fifth conference in Kobe, Japan, in 1996. This was the first time the term “data science” appeared in a conference title. The meeting discussed data science topics including data gathering, clustering, classification, and analysis.

Statistics Renamed Data Science as Insisted by Jeff Wu in 1997

Jeff Wu gave his inaugural lecture, entitled “Statistics = Data Science?”, in which he argued that statistics should be renamed data science. He characterized statistical work as data collection, modeling and analysis, and decision-making. In his view, the new name would give the field a more distinct identity.

Big Data in 1997

Michael Cox and David Ellsworth of NASA used the term “Big Data” for the first time in a research paper in 1997. It refers to data sets so large that computing systems and software tools find them difficult to manage.

John R. Mashey, an American computer scientist and entrepreneur, also used the term Big Data in a paper in 1998.

Data Science Gained Prominence from 2001-2005

William S. Cleveland is credited with establishing data science as a distinct discipline, which helped bring it to prominence. From 2001 to 2005, statisticians, mathematicians, and data scientists became more familiar with and attached to the term “Data Science.” The National Science Board also lent its support to data scientists by publishing “Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century” in 2005.

Hadoop 0.1.0 Released in 2006

Hadoop 0.1.0 was released in 2006. It is an open-source framework for distributed storage and processing that grew out of Apache Nutch; it is not a relational database. Yahoo deployed it, using the MapReduce programming model, to handle huge volumes of data. It works by dividing files into blocks and distributing them across a cluster of machines so that the data can be processed in parallel, which makes processing faster and more efficient. The launch of Hadoop also marked the beginning of the rise of Big Data.
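
The MapReduce model mentioned above can be illustrated with a small word-count sketch in plain Python. This is not Hadoop's API: it runs in a single process, and the function names and sample text are hypothetical. It only shows the three conceptual steps, which are mapping each chunk to key-value pairs, grouping the pairs by key, and reducing each group to a result.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different machine.
    emitted = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(key, values) for key, values in shuffle(emitted).items())


# Hypothetical input, already split into two chunks.
print(word_count(["big data big", "data science"]))
# {'big': 2, 'data': 2, 'science': 1}
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is the property that makes the model suitable for very large data sets.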

Research Center for Dataology and Data Science Established in 2007

The Research Center for Dataology and Data Science was established in 2007 at Fudan University in China. Two of its researchers, Yun Xiong and Yangyong Zhu, also published a paper in 2009 entitled “Introduction to Dataology and Data Science.”

Fudan University later hosted “The First International Workshop on Dataology and Data Science,” where more than 30 participating scholars exchanged ideas about data science.

ASA Section Changed its Name to Section on Statistical Learning and Data Science in 2014

The American Statistical Association’s Section on Statistical Learning and Data Mining became the Section on Statistical Learning and Data Science in 2014. The name change reflects how popular data science has become. It may look like a small step, but it is a significant one for the ASA in its goal of linking statistics with data science.

Conclusion

The long journey of data science through the years makes it evident that the field has evolved massively, from basic data stockpiles to advanced data processing and storage. The work of data scientists has become more organized, coordinated, and well-managed. As technology advances, data science will continue to improve as well.
