Data Science


Data science is a field concerned with extracting meaningful information from large amounts of complex data. Also called data-driven science, it combines methods from statistics and computation to interpret data for decision making.

Data Science Course Content

Introduction to Data Science
  • a. What is data science?
  • How is data science different from BI and reporting?
  • b. Who are data scientists?
  • What skill sets are required?
  • c. What do they do?
  • What kinds of projects do they work on?
Business statistics
  • a.Data types
  • Continuous variables
  • Ordinal Variables
  • Categorical variables
  • Time Series
  • Miscellaneous
  • b. Descriptive statistics
  • c. Sampling
  • Need for sampling
  • Different types of Sampling
  • Simple random sampling
  • Systematic sampling
  • Stratified Sampling
  • d. Data distributions
  • Normal Distribution – Characteristics of a normal distribution
  • Binomial Distribution
  • e. Inferential statistics
  • f. Hypothesis testing
  • Type I error
  • Type II error
  • Null and alternative hypotheses
  • Rejection or acceptance criteria
Introduction to R
  • A Primer to R programming
  • What is R? Similarities to OOP and SQL
  • Types of objects in R – lists, matrices, arrays, data frames, etc.
  • Creating new variables or updating existing variables
  • If statements and loops – for, while, etc.
  • String manipulations
  • Subsetting data from matrices and data frames
  • Casting and melting data between long and wide formats
  • Merging datasets
Exploratory data analysis and visualization
  • Getting data into R – reading from files
  • Cleaning and preparing the data – converting data types (Character to numeric etc.)
  • Handling missing values – imputation or replacing with placeholder values
  • Visualization in R using ggplot2 (plots and charts) – histograms, bar charts, box plots, scatter plots
  • Adding more dimensions to the plots
  • Visualization using Tableau (introduction)
  • Correlation – positive, negative, and no correlation
  • What is a spurious correlation?
  • Correlation vs. causation
Introduction to Python
  • a. Different types of predictive analytics – prediction, forecasting, optimization, segmentation, etc.
  • b. Supervised learning
  • Prediction (Linear)
    1. Simple Linear Regression
    2. Assumptions
    3. Model development and interpretation
    4. Least squares – minimizing the sum of squared errors
    5. Model validation – tests to validate assumptions
    6. Multiple linear regression
    7. Disadvantages of linear models
  • Logistic Regression
  • Need for logistic regression
  • Logit link function
  • Maximum likelihood estimation
  • Model development and interpretation
  • Confusion Matrix – error measurement
  • ROC curve
  • Measuring sensitivity and specificity
  • Advantages and disadvantages of logistic regression models

Decision trees

  • C5.0
  • Classification and regression trees (CART)
  • Process of tree building
  • Entropy and Gini Index
  • Problem of overfitting
  • Pruning a tree back
  • Trees for prediction (regression) – example
  • Trees for classification models – example
  • Advantages of tree-based models
KNN – K nearest neighbors
  • Advantages and disadvantages of KNN
  • Resampling and ensemble methods
  • Bagging
  • Random Forests
  • Boosting – Gradient boosting machines
  • Advanced methods
  • Support Vector machines
  • Neural networks
  • Introduction to deep learning
  • Introduction to online learning
  • Unsupervised learning
  • Cluster analysis
  • Hierarchical clustering
  • K-Means clustering
  • Distance measures
  • Applications of cluster analysis – Customer Segmentation
  • Time series analysis – forecasting
  • Simple moving averages
  • Exponential smoothing
  • Time series decomposition
  • Collaborative filtering
  • User-based filtering
  • Item-based filtering
Model validation and deployment
  • Error measurement
  • RMSE – root mean squared error
  • Misclassification rate
  • Area under the curve (AUC)
Practical use cases and best practices
  • a. Business problem to an analytical problem
  • Problem definition and analytical method selection
  • b. Guidelines in model development
Introduction to big data and other tools (Python and R Server)
  • a. Big data and analytics
  • Leveraging big data platforms for data science
  • b. Introduction to evolving tools, e.g. Spark
  • Machine learning with Spark
Introduction to Azure cloud and big data computing in the cloud
  • Creating R Server clusters
  • Running big data ML algorithms on the Azure cloud
Introduction to Deep Learning
  • What is DL, and how does it score better than traditional ML?
  • Convolutional and perceptron models
  • Comparison of DL and ML performance on the MNIST dataset
Analytical Visualization with Tableau
  • Why it is important for data analysts
  • Tableau workbook walkthrough
  • Instructions for creating your own workbooks
  • Demo of a few more workbooks
Offerings from Kelly
  • Mock interview questions and case study walkthroughs using the Azure Cortana gallery
  • Guidance to prepare resumes
  • Information on companies and industry trends on data science

Course Features

  • Language
  • Lectures
  • Certification
  • Project
    5 Minor + 1 Major
  • Duration
    64 hrs + 36 hrs
  • Max-Students

© Copyright - 2019 | Cyberaegis | All Rights Reserved.