The Pyramid of Data Science

Data science is exploding… or so it seems. I came across an article by Oscar Olmedo that describes some of the stages that practitioners in the field follow, and I think it makes for an interesting discussion.

In general, data science is concerned with extracting knowledge from a given data set, and to do that it employs tools such as mathematical modelling and machine learning. Olmedo lists the following steps in the data science process:

  1. Data selection and gathering
  2. Data cleaning/integration, and storage
  3. Feature extraction
  4. Knowledge extraction
  5. Visualisation
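
To make the five steps a bit more concrete, here is a minimal sketch of how they might chain together in Python. Everything in it is an assumption for illustration: the toy temperature records, the function names and the text "chart" are mine, not Olmedo's, and the storage part of step 2 is left out to keep it short.

```python
from statistics import mean

# Hypothetical toy records; real data would come from files, APIs or databases.
RAW_RECORDS = [
    {"city": "London", "temp_c": "14.5"},
    {"city": "london ", "temp_c": "15.5"},
    {"city": "Madrid", "temp_c": "n/a"},
    {"city": "Madrid", "temp_c": "22.0"},
    {"city": "Madrid", "temp_c": "24.0"},
]


def gather():
    """Step 1: data selection and gathering (here, just copy the toy records)."""
    return list(RAW_RECORDS)


def clean(records):
    """Step 2: cleaning/integration: normalise city names, drop unparseable values."""
    cleaned = []
    for rec in records:
        try:
            temp = float(rec["temp_c"])
        except ValueError:
            continue  # skip records whose temperature cannot be parsed
        cleaned.append({"city": rec["city"].strip().title(), "temp_c": temp})
    return cleaned


def extract_features(records):
    """Step 3: feature extraction: group temperatures by city."""
    by_city = {}
    for rec in records:
        by_city.setdefault(rec["city"], []).append(rec["temp_c"])
    return by_city


def extract_knowledge(features):
    """Step 4: knowledge extraction: here, simply the mean temperature per city."""
    return {city: mean(temps) for city, temps in features.items()}


def visualise(summary):
    """Step 5: visualisation: a crude text bar chart, one line per city."""
    return [
        f"{city:<8}{'#' * round(avg)} {avg:.1f}"
        for city, avg in sorted(summary.items())
    ]


summary = extract_knowledge(extract_features(clean(gather())))
for line in visualise(summary):
    print(line)
```

Even in this toy version, most of the fiddly logic sits in the cleaning step (inconsistent capitalisation, stray whitespace, missing values), while the "knowledge extraction" is a one-liner.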

I agree with this view; in particular, I am a firm believer that steps 1 and 2 are probably the most crucial and time-consuming. You can read Olmedo’s post here.