It is quite common to read or hear these terms, both in Spanish and in English. The aim of this post is to learn to tell them apart, bearing in mind that some experts tend to stretch the scope of the definitions, so the terms overlap and share common ground.
Could we be data scientists without being aware of it?
Currently, several fields within data science overlap, such as machine learning, artificial intelligence, deep learning, IoT (Internet of Things),… Data science is a rather vast concept: it covers several disciplines and also borrows techniques and tools from other related sciences.
Leading experts in the field say that there are two types of data scientists:
1. Analyst. They usually code solutions, although coding is not their main expertise; their strength is modeling, statistical inference… Companies often call them statisticians, decision-support engineers, quantitative analysts, or even data scientists.
2. Builder. They have some statistical training, but their real expertise is software development. They are mainly interested in exploiting data in production.
Data science is multidisciplinary.
Machine Learning vs Deep Learning
Machine learning (also called automatic learning) can be described as a set of algorithms that are trained on a data set to make predictions or take actions in order to optimize some system.
One example is supervised classification algorithms, which are used to classify potential candidates as good or bad prospects based on historical information. The techniques available for a given task (for example, supervised classification) are varied: Bayesian classifiers, SVM, artificial neural networks (ANN), self-organizing maps (SOM), association rules, decision trees (for example, the ID3 algorithm), logistic regression, or ensemble methods (e.g. bootstrap aggregation, also known as bagging, or boosting). A simple algorithm of this type is k-nearest neighbors (k-NN).
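To make the idea of supervised classification concrete, here is a minimal k-nearest neighbors sketch in plain Python. The `knn_predict` function, the two features (years of experience and a test score) and the `history` records are all hypothetical, invented only for illustration; a real project would more likely rely on an established library.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    labeled examples. `train` is a list of (features, label) pairs."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical historical records: (years_experience, test_score) -> label
history = [
    ((1.0, 50.0), "bad"),
    ((2.0, 55.0), "bad"),
    ((1.5, 45.0), "bad"),
    ((6.0, 85.0), "good"),
    ((7.0, 90.0), "good"),
    ((5.5, 80.0), "good"),
]

# A new candidate is labeled like the historical examples closest to it.
prediction = knn_predict(history, (6.5, 88.0), k=3)
print(prediction)
```

The key point is that the labels in `history` are what make the method supervised: the algorithm never invents categories, it only reuses the ones already assigned to past examples.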
Another example is unsupervised classification algorithms. Here we do not have a set of previously classified examples; instead, using only the properties of the examples, the algorithm tries to group them (clustering) according to their similarity. An example of this type of algorithm is k-means.
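As a rough illustration of unsupervised grouping, the following is a minimal k-means sketch in plain Python. The function name `k_means`, the sample points and the choice of initial centroids are assumptions made for this example; real implementations usually pick the starting centroids more carefully and stop when the assignments no longer change.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Alternate between assigning each point to its nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: no labels are used, only distances.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: an empty cluster keeps its previous centroid.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, initial_centroids=[points[0], points[-1]])
print(centroids)
print(clusters)
```

Note that the output is only "cluster 0" and "cluster 1"; deciding what each group means is left to a person.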
All of this is a subset of data science. When these algorithms are automated, the result is often called artificial intelligence (AI) and, more specifically, deep learning. If the data are collected by sensors and transmitted over the Internet, then it is machine learning, data science, or deep learning applied to IoT (Internet of Things).
Some definitions describe deep learning as neural networks (a machine learning technique) with more, deeper layers.
Artificial intelligence is a subfield of computer science that emerged in the 1950s and set out to solve tasks that are easy for human beings but difficult for computers. In particular, a strong AI would be a system that can do anything a human being can do. This is quite generic and includes all kinds of tasks, such as planning, moving around the world, recognizing objects and sounds, talking, translating, carrying out social or commercial transactions, creative work (making art or poetry),… NLP (Natural Language Processing) is simply the part of AI that deals with language (usually written).
Machine learning, in mathematical terms, is about a function: from an input, you want to produce the correct output, so the whole problem “reduces” to building a model of this mathematical function in some automatic way. To draw the distinction with AI: if I write a very intelligent program that behaves much like a human being, it may be AI, but unless its parameters are learned automatically from data, it is not machine learning. Classic AI provides search strategies (uninformed, informed, or local, such as simulated annealing), constraint satisfaction…
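The distinction can be sketched in a few lines of plain Python. In the hypothetical example below, `hand_written_rule` has its parameters fixed by the programmer, while `fit_line` estimates the same two parameters from example data using ordinary least squares for a straight line. The data and both function names are invented for illustration.

```python
def hand_written_rule(x):
    """Parameters 2.0 and 1.0 were chosen by the programmer:
    possibly clever, but not machine learning."""
    return 2.0 * x + 1.0


def fit_line(xs, ys):
    """Learn slope w and intercept b of y = w * x + b from data
    with the closed-form least-squares solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


# Hypothetical input/output examples of the function we want to model.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = fit_line(xs, ys)


def learned_model(x):
    """Same functional form as the hand-written rule, but with
    parameters obtained from the data."""
    return w * x + b


print(w, b, learned_model(4.0))
```

Both functions may end up computing the same thing; what makes the second one machine learning is where its parameters came from.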
Deep learning is a type of machine learning that is very popular today. It is a particular kind of mathematical model that can be thought of as a composition of simple blocks (function composition) of a certain type, where some of these blocks can be adjusted to better predict the final result.
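The "composition of simple blocks" view can be sketched as follows. This is a toy illustration in plain Python, not how deep learning frameworks are actually implemented: the helper names (`linear`, `relu`, `compose`, `build_model`, `adjust`), the data and the learning rate are all assumptions, and the gradients are approximated with finite differences instead of backpropagation.

```python
from functools import reduce


def linear(weight, bias):
    """An adjustable block: its behavior depends on two parameters."""
    return lambda x: weight * x + bias


def relu(x):
    """A fixed block with no parameters."""
    return max(0.0, x)


def compose(*blocks):
    """Chain blocks so the output of each one feeds the next."""
    return lambda x: reduce(lambda value, block: block(value), blocks, x)


def build_model(params):
    w1, b1, w2, b2 = params
    return compose(linear(w1, b1), relu, linear(w2, b2))


def loss(params, data):
    """Mean squared error of the composed model on (x, y) pairs."""
    model = build_model(params)
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def adjust(params, data, rate=0.01, eps=1e-6):
    """One gradient-descent step with finite-difference gradients."""
    base = loss(params, data)
    grads = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += eps
        grads.append((loss(shifted, data) - base) / eps)
    return [p - rate * g for p, g in zip(params, grads)]


# Hypothetical training data following y = 2x.
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
params = [1.0, 0.0, 1.0, 0.0]

loss_before = loss(params, data)
for _ in range(200):
    params = adjust(params, data)
loss_after = loss(params, data)
print(loss_before, loss_after)
```

Stacking more `linear` and `relu` blocks inside `build_model` is, loosely speaking, what makes a network "deeper".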
Data Science vs Machine Learning
Machine learning and statistics are both part of data science. The word learning in machine learning means that the algorithms depend on some data, used as a training set, to fine-tune the parameters of the model or algorithm. This includes many techniques such as regression, naive Bayes, or supervised clustering. But not all techniques fit into this category. For example, unsupervised clustering (a statistical and data science technique) aims to detect clusters and cluster structures without any prior knowledge or training set to help the classification algorithm. Human intervention is needed to categorize and label these clusters. Some techniques are hybrid, such as semi-supervised classification. Pattern detection and density estimation techniques also fit in this category.
Data science encompasses more than machine learning. Data, in data science, may or may not come from a machine or mechanical process (survey data can be compiled manually, and clinical trials involve a specific type of small data), and it may have nothing to do with learning. But the main difference is that data science covers the entire spectrum of data processing, not just the algorithmic or statistical aspects. In particular, data science covers:
- Data integration
- Distributed architecture
- Automation of machine learning
- Data visualization
- Dashboards and Business Intelligence (BI)
- Data Engineering
- Deployment in production
- Automated data-based decisions
Of course, in many organizations, data scientists focus on only part of this process.