Introduction
After learning the Exploratory Data Analysis (EDA) techniques in the Data Analytics eBook, you are ready to hone your predictive analytics skills. This eBook covers popular algorithms used in predictive analytics today, exploring their pros, cons, and applications to help you select the right algorithm for a given use case.
If you are not proactive, your only option is to be reactive.
Data Analyst vs Scientist vs Engineer
Data Analyst
- Entry-level position in the data science domain.
- Does not invent new algorithms, but has a strong understanding of existing ones and how to apply them.
- Basic understanding of programming, statistics, machine learning, Big Data principles, and visualization.
- Presenting findings is a key skill in this position.
- Takes guidance from Data Scientists to acquire, process, and summarize data.
- Everyday job: Scraping data, querying databases for stakeholder requests, triaging data issues, and presenting findings with visualizations. A data analyst primarily analyzes past data.
Data Scientist
All the skills listed for a Data Analyst, in addition to:
- Needs advanced skills to deal with high-volume, high-velocity data.
- Can independently conduct research without direction and tackle open-ended questions.
- Often holds an advanced degree in Computer Science, Statistics, or Mathematics and is capable of inventing new algorithms.
- Expectations: Adds value by uncovering new and hidden patterns in data.
Typical skills required:
- Languages: Python, R, Java, SQL, and NoSQL query languages
- Libraries and frameworks: NumPy, Pandas, Seaborn, Bokeh, ggplot, Scikit-learn, D3.js, MapReduce
- Big Data storage and processing systems: Hadoop, HDFS, Spark, Pig, Hive, Kafka, etc.
- Dashboarding applications: Plotly-Dash, Tableau, etc.
Key traits:
- Inquisitiveness: the ability to ask questions, hunt down solutions, and keep an open mind, so that other interesting and business-valuable findings surface while searching for a solution.
- A data scientist strategizes for the future. For example, a data scientist could improve user engagement with a self-paced learning system by identifying when and how users lose interest.
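As a rough illustration of the engagement example above, the sketch below estimates where learners stop in a self-paced course. The course length, the user IDs, and the "last lesson completed" data are all invented for illustration; a real analysis would draw on actual usage logs and would likely use a richer model than a simple drop-off ratio.

```python
from collections import Counter

# Hypothetical data: the last lesson each user completed in a 5-lesson course.
last_lesson_completed = {
    "u1": 5, "u2": 2, "u3": 2, "u4": 3, "u5": 2, "u6": 5, "u7": 1, "u8": 2,
}
TOTAL_LESSONS = 5

# How many users stopped at each lesson.
stopped_at = Counter(last_lesson_completed.values())

# Users who reached lesson n are those who completed lesson n or a later one.
reached = {
    lesson: sum(1 for last in last_lesson_completed.values() if last >= lesson)
    for lesson in range(1, TOTAL_LESSONS + 1)
}

# Drop-off after lesson n: share of users who reached n but went no further.
# The final lesson is excluded because stopping there means finishing.
drop_off = {
    lesson: stopped_at[lesson] / reached[lesson]
    for lesson in range(1, TOTAL_LESSONS)
    if reached[lesson]
}

# The lesson after which the largest share of remaining users is lost.
worst_lesson = max(drop_off, key=drop_off.get)
print(f"Largest drop-off follows lesson {worst_lesson}")
```

Knowing when users leave is only half of the question in the example; finding out how or why they lose interest would need further investigation of that lesson's content and of user behavior around it.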
Data Engineer
Main Tasks:
- Builds data pipelines to clean, transform, and aggregate unorganized and messy data into databases and storage systems.
- Designs storage systems for Big Data.
- Ensures scalability and disaster recovery systems are implemented correctly.
- Lays a solid system foundation for data analysts and data scientists to build their models.
- Responsible for ensuring data flows smoothly between systems, from the original source through to the final destination.
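The clean, transform, and aggregate steps listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a production pipeline: the record layout, the function names, and the user_totals table are assumptions made for the example, and an in-memory SQLite database stands in for a real storage system.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw records, as they might arrive from a messy source.
RAW_EVENTS = [
    {"user": " alice ", "amount": "12.50"},
    {"user": "BOB", "amount": "7"},
    {"user": "alice", "amount": "n/a"},  # unparseable amount: dropped
    {"user": "", "amount": "3.00"},      # missing user: dropped
    {"user": "Bob", "amount": "5.25"},
]


def clean(records):
    """Yield only records with a non-empty user and a numeric amount."""
    for record in records:
        user = record.get("user", "").strip()
        try:
            amount = float(record.get("amount", ""))
        except ValueError:
            continue
        if user:
            yield {"user": user, "amount": amount}


def transform(records):
    """Normalize user names so 'BOB' and 'Bob' count as the same user."""
    for record in records:
        yield {"user": record["user"].lower(), "amount": record["amount"]}


def aggregate(records):
    """Sum amounts per user."""
    totals = defaultdict(float)
    for record in records:
        totals[record["user"]] += record["amount"]
    return dict(totals)


def load(totals, connection):
    """Write the aggregated totals into a SQL table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS user_totals "
        "(user_name TEXT PRIMARY KEY, total REAL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO user_totals (user_name, total) VALUES (?, ?)",
        sorted(totals.items()),
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
load(aggregate(transform(clean(RAW_EVENTS))), connection)
rows = connection.execute(
    "SELECT user_name, total FROM user_totals ORDER BY user_name"
).fetchall()
print(rows)
```

Each stage is a separate function so that it can be tested or replaced on its own. At Big Data scale the same stages would typically run on a distributed system such as Spark rather than in a single Python process.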
Typical skills required:
- Deep knowledge of Hadoop, Spark, Hive, Pig, MapReduce
- Deep knowledge of SQL databases
- Deep knowledge of NoSQL Databases
- Understanding of data warehousing needs and requirements

Business Analyst
A person who understands both the science related to data and the business domain, allowing them to provide holistic solutions.
Typical Business Skills:
- Analytic Problem-Solving: Approaching high-level challenges with a clear view of what is important; choosing the right approach and methods to make the best use of time and human resources.
- Effective Communication: Detailing your techniques and discoveries to technical and non-technical audiences in a language they can understand.
- Intellectual Curiosity: Exploring new territories and finding creative and unusual ways to solve problems.
- Industry Knowledge: Understanding the way your chosen industry functions and how data is collected, analyzed, and utilized.
- Data-Driven Decision-Making: Helping the business base its decisions on data analysis rather than purely on intuition.

Typical Data-Driven Decision Process

Examples:
- Which advertisements to send to which customer?
- Which recommendations to send to which customer?
- Instant fraud detection
Analytics vs Business Analytics
Analytics
- Systematic analysis and interpretation of data — typically using mathematical, statistical, and computational tools — to improve our understanding of a real-world domain.
Business Analytics
- Evidence-based problem recognition and solving that happen within the context of business situations (Holsapple, Lee-Post, and Pakath, 2014).
- Modern label that applies to the data analysis part of Business Intelligence.

Evolution of Data Sources
As the field of analytics has matured, so too has the variety and volume of data we work with.
- Structured: Initially, we dealt primarily with highly structured transactional data stored in Relational Databases (SQL).
- Semi-Structured: With the web, we started handling document-based data like XML, JSON, and CSVs.
- Unstructured: The explosion of Social Media introduced varied formats like text posts, images, and videos.
- Streaming: Today, we deal with real-time, high-velocity data from IoT sensors, devices, and system logs.
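To make the four categories above concrete, the sketch below handles a tiny sample of each kind using only the Python standard library. All table names, records, and readings are invented for illustration, and the generator is a hypothetical stand-in for a live sensor feed.

```python
import json
import sqlite3
from collections import Counter

# Structured: fixed columns, queried with SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (2, 5.0)])
total_amount = db.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Semi-structured: self-describing, but fields may be missing or nested.
documents = [
    '{"user": "alice", "tags": ["new", "mobile"]}',
    '{"user": "bob"}',
]
tag_counts = Counter(
    tag for doc in documents for tag in json.loads(doc).get("tags", [])
)

# Unstructured: free text with no schema; structure has to be extracted.
post = "Loving the new phone! The camera on this phone is great."
word_counts = Counter(word.strip(".!,").lower() for word in post.split())


# Streaming: readings arrive one at a time, so state is updated incrementally.
def sensor_readings():
    """Hypothetical stand-in for a live sensor feed."""
    yield from [21.0, 21.5, 23.0]


count, running_mean = 0, 0.0
for reading in sensor_readings():
    count += 1
    running_mean += (reading - running_mean) / count

print(total_amount, dict(tag_counts), word_counts["phone"], round(running_mean, 2))
```

The amount of work needed before analysis can begin grows from one category to the next: the SQL table can be queried directly, the JSON needs defaults for missing fields, the text needs tokenizing, and the stream has to be summarized without ever holding all the data at once.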
