What Is A Data Scientist? From a Product Manager Point of View

As part of my new nightly task of Understanding Tech as a PM, I accidentally dove into the topic of data scientists. At work today, I spent a good portion of my day on analytics, reporting trends from some custom analytics built for our products. In my head I was thinking, “Wow, it sure would be nice to have someone feeding me this data to help me make more informed decisions and set priorities.” I then continued on with my daily PM tasks. Tonight I reflected on that thought and went down a rabbit hole on data… specifically the role of a data scientist.

Let’s start from the beginning

In essence, a Data Scientist uses data to create as much impact as possible for a company or product. They produce insights, data products and product recommendations. Now where did this whole thing start?

  • In 1996, before data science was a thing, the term “Data Mining” was popularized. It covered the overall process of discovering useful knowledge in data and extracting patterns from it.
  • In 2001, William Cleveland took this to another level by combining Computer Science with Data Mining. This made statistics a lot more technical and let it take advantage of computing power.
  • Web 2.0 emerged around this time. Websites were no longer just digital templates; they became a medium for a shared experience among millions of users. Examples include Myspace (2003), Facebook (2004) and YouTube (2005). Users could now interact with websites by sending likes, uploading content, commenting and sharing. That meant A LOT of data, so much that it became too much to handle with traditional technologies.
  • In 2010, the term “Big Data” became popular. This opened a world of possibilities for finding insights in data. Even simple questions like “How many active users do we have?” required sophisticated data infrastructure just to handle the data. Parallel computing technologies like MapReduce, Hadoop and Spark were created to cope with it. The rise of big data meant the rise of data science, to support businesses’ need to draw insights from their massive unstructured datasets.
  • In 2010, the new abundance of data made it possible to train machines with a data-driven approach rather than a knowledge-driven approach. This opened the possibility to change the world and affect our everyday lives. Machine Learning and AI dominated the media, overshadowing the other aspects of data science like analytics, metrics, exploratory analysis and business intelligence. The general public came to think of data scientists as researchers focused on machine learning and AI, while the industry was hiring data scientists as analysts. The gap is that most of these data scientists can work on complicated problems, but big tech companies like Google have so much low-hanging fruit that finding impact in their analysis doesn’t require advanced machine learning or statistical knowledge.
  • Being a good Data Scientist is not about being a data cruncher. Good data scientists are problem solvers and strategists who deal with extremely hard problems to guide a company in the right direction.
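
To make the “How many active users do we have?” question from the timeline a bit more concrete, here is a minimal sketch of the map, shuffle and reduce idea in plain Python. It only imitates the pattern on a single machine; it is not how Hadoop or Spark are actually invoked. The event log, the dates and every function name below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical event log: one (user_id, date) pair per user action.
EVENTS = [
    ("alice", "2010-05-01"),
    ("bob", "2010-05-01"),
    ("alice", "2010-05-01"),
    ("carol", "2010-05-02"),
    ("alice", "2010-05-02"),
]


def map_phase(events):
    """Emit a (date, user_id) pair for every event."""
    for user_id, date in events:
        yield date, user_id


def shuffle_phase(pairs):
    """Group user ids by date, as a framework would between map and reduce."""
    grouped = defaultdict(set)
    for date, user_id in pairs:
        grouped[date].add(user_id)
    return grouped


def reduce_phase(grouped):
    """Count the distinct users seen on each date."""
    return {date: len(users) for date, users in grouped.items()}


def daily_active_users(events):
    """Chain the three phases into one daily-active-users count."""
    return reduce_phase(shuffle_phase(map_phase(events)))


if __name__ == "__main__":
    print(daily_active_users(EVENTS))
```

On a real big data system, the map and reduce steps would run in parallel across many machines, which is the part that needs the sophisticated infrastructure.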

The Data Science Hierarchy of Needs

AI and Deep Learning (the top of the hierarchy) receive the most exposure in the media, but they don’t yield the most results for the least effort. The middle of the hierarchy, which covers experimentation, analytics and A/B testing, matters more to industry, which is why industry focuses its data scientists there. In essence, the middle of the hierarchy is where data scientists tell the company what to do with its product.
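
As a rough illustration of what an A/B test in the middle of the hierarchy can look like, here is a small sketch of a two-proportion z-test using only the Python standard library. This is one common textbook way to compare two conversion rates, not necessarily what any particular company uses, and the function name and the numbers in the usage example are invented.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variant A and variant B.

    conv_a, conv_b: number of users who converted in each variant.
    n_a, n_b: number of users shown each variant.
    Returns (z, p_value), where p_value is two-sided.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the assumption of no real difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: 1000 users per variant.
    z, p = two_proportion_z_test(100, 1000, 130, 1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between the two variants is unlikely to be noise alone, which is the kind of evidence a data scientist can bring to a product decision.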

What Do Data Scientists Do?

What data scientists do depends on the company and the size of the company.

Startup

  • One data scientist pretty much covers the whole hierarchy of needs, but may not be doing AI or Deep Learning if it’s not a priority. This can include setting up the entire infrastructure, writing some software code, setting up the analytics and metrics, and doing A/B testing.

Medium Sized Company

  • Software Engineers cover the collection part of the hierarchy.
  • Data Engineers cover the data cleaning, prep, infrastructure, data pipelines etc.
  • Data Scientists cover the top half of the hierarchy, depending on whether the company’s focus is on AI. Data scientists at this level typically have master’s degrees or PhDs because they’re expected to be able to do the more complicated things.

Large Sized Company

  • Because a large company has a lot more money and resources, it can allocate that money to much more specific needs. Employees can focus on the things they’re best at rather than spreading their time across several data science needs.
  • Software Engineers cover the collection part of the hierarchy.
  • Data Engineers cover the data cleaning, prep, infrastructure, data pipelines etc.
  • Data Scientists in “Analytics” roles cover the top half of the hierarchy, except for AI and Deep Learning. Their expertise is in quantitative analysis, data mining and the presentation of data.
  • Research Scientists, Core Data Science or Machine Learning Engineers cover AI and Deep Learning specifically.

I know I learned a ton tonight and hopefully this can inform you as well.

Thanks for reading!

Alex Passero