What is data science?

22/09/2020 - Digital

We’re hearing more and more about data science. This is a very recent discipline, but one which has grown quickly over recent years. Jérôme Feroldi, data scientist at Smartengo, explains exactly what this expression covers. 

Hello Jérôme. What’s the purpose of data science?

Data science aims to explore and analyze high volumes of (digital) data coming from various sources, using mathematical and statistical algorithms (such as machine learning), in order to optimize business processes, assist decision-making, or add value (informative, economic, etc.).
40 zettabytes of data available in the world in 2020
It is much in demand due to the crosscutting nature of the skills it requires. Data science sits at the crossroads of:
  • IT, and particularly programming: you need to be able to create (code) the algorithms used.
  • Mathematics and statistics (linear algebra, differential calculus and probability), which underpin the algorithms and data manipulation.
  • A good knowledge of products, jobs and activities.
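To illustrate the programming-meets-statistics crossover described above, here is a minimal sketch: fitting a straight line to data by ordinary least squares, coded by hand in Python. The data points are invented purely for the example.

```python
# Illustrative sketch only: ordinary least squares for y = a*x + b,
# the kind of statistics-plus-programming task data scientists do daily.

def fit_line(xs, ys):
    """Return slope a and intercept b minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b

# Toy data lying exactly on y = 2x, so the fit recovers a=2, b=0.
a, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
```

In practice a data scientist would use a library such as NumPy or scikit-learn rather than hand-coding this, but the underlying linear algebra is the same.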

What are the concrete applications of data science?

The main applications of data science can be sorted into four groups: optimization, automation, creation and prediction.

In concrete terms, many of them are present in our daily lives. There’s data science in your search engine, in your YouTube or Deezer recommendations, on your e-commerce websites, in self-driving cars and in your voice-operated assistant. It’s also used to filter spam messages and moderate sensitive content.
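One of the everyday applications mentioned above, spam filtering, can be sketched in a few lines. This is a deliberately naive word-counting classifier on invented training messages, not a production filter (real systems use probabilistic models trained on large corpora):

```python
from collections import Counter

# Deliberately naive spam filter: score a message by how many of its
# words appear more often in known spam than in known legitimate mail.
# Training messages are invented for the example.
spam = ["win free money now", "free prize click now"]
ham = ["meeting at noon", "see the project report"]

spam_words = Counter(w for msg in spam for w in msg.split())
ham_words = Counter(w for msg in ham for w in msg.split())

def looks_like_spam(message):
    score = sum(1 if spam_words[w] > ham_words[w] else -1
                for w in message.split())
    return score > 0

print(looks_like_spam("win a free prize now"))  # True on this toy data
```

A real filter follows the same principle, learning word statistics from examples, but with far more data and a proper probabilistic model.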

In companies, data science is involved in all kinds of jobs and activities: finance, marketing, products, logistics and supply chain, etc. This is naturally also the case at Vallourec, where more than a hundred initiatives based on data science have been identified: at VAM®, in finance, in production (mills) and naturally for Smartengo.

Some key dates

  • 1959: the expression “machine learning” first appears, coined by Arthur Samuel at IBM for a self-learning program (a virtual checkers player).

  • 1991: first public version of Python 

  • 1997: appearance of the term “Big Data” in a NASA article 

  • 2000: first public version of R

  • 2001: Doug Laney (Gartner) defines Big Data based on the “3 Vs”: Volume, Variety and Velocity

  • 2006: creation of Hadoop (a distributed processing system)

  • 2016: AlphaGo, a deep learning algorithm, beats the world champion Lee Sedol at the game of Go


What are the links with Artificial Intelligence and Big Data?

Artificial Intelligence refers to the techniques used to imitate the mechanisms of the human brain: image recognition, predictive models for various phenomena (the weather, purchasing behavior, etc.), filtering abusive comments… 

Big Data refers more generally to the enormous volumes of data processed, together with the associated computing power (conventional IT tools are unable to process such quantities properly). It is commonly described along three vectors, the “3 Vs”: “Volume”, relating to the growth in exchanges and the explosion of data (and therefore of servers and personnel); “Variety”, the diversity of data types; and “Velocity”, the speed at which data is collected and processed, often in real time. Hence the saying: “Data is the new oil!”

What do you believe the future holds for this discipline?

It’s reasonable to assume that in the near future more and more environments (services via applications, connected items, etc.) will generate ever greater volumes of data. This will call for more powerful and complex algorithms, as well as cloud environments capable of storing and processing them.
At the same time, it is very likely that two challenges will become increasingly important: the protection of privacy and the limitation/reduction of the energy footprint of these resource-hungry techniques.