#### Data Science

Description

Data science is a field concerned with extracting meaningful information from large amounts of complex data. Also called data-driven science, it combines methods from statistics and computation to interpret data for the purpose of decision making.

#### Data Science Course Content

##### Introduction to Data Science
• a. What is data science?
• How is data science different from BI and reporting?
• b. Who are data scientists?
• What skill sets are required?
• c. What do they do?
• What kind of projects do they work on?
• a. Data types
• Continuous variables
• Ordinal Variables
• Categorical variables
• Time Series
• Miscellaneous
• b. Descriptive statistics
• c. Sampling
• Need for Sampling?
• Different types of Sampling
• Simple random sampling
• Systematic sampling
• Stratified Sampling
• d. Data distributions
• Normal Distribution – Characteristics of a normal distribution
• Binomial Distribution
• e. Inferential statistics
• f. Hypothesis testing
• Type I error
• Type II error
• Null and alternate hypothesis
• Rejection or acceptance criterion
##### Introduction to R
• A Primer to R programming
• What is R? Similarities to OOP and SQL
• Types of objects in R – lists, matrices, arrays, data.frames etc.
• Creating new variables or updating existing variables
• IF statements and conditional loops – for, while, etc.
• String manipulations
• Subsetting data from matrices and data.frames
• Melting and casting data between long and wide formats
• Merging datasets
##### Exploratory data analysis and visualization
• Getting data into R – reading from files
• Cleaning and preparing the data – converting data types (Character to numeric etc.)
• Handling missing values – imputation or replacing with placeholder values
• Visualization in R using ggplot2 (plots and charts) – histograms, bar charts, box plots, scatter plots
• Adding more dimensions to the plots
• Visualization using Tableau (introduction)
• Correlation – positive, negative and no correlation
• What is a spurious correlation?
• Correlation vs. causation
##### Introduction to Python
• a. Different types of predictive analytics – prediction, forecasting, optimization, segmentation, etc.
• b. Supervised learning
• Prediction (Linear)
1. Simple Linear Regression
2. Assumptions
3. Model development and interpretation
4. Least squares (minimizing the sum of squared errors)
5. Model validation – tests to validate assumptions
6. Multiple linear regression
##### Classification
• Logistic Regression
• Need for logistic regression
• Maximum likelihood estimation
• Model development and interpretation
• Confusion Matrix – error measurement
• ROC curve
• Measuring sensitivity and specificity

##### Decision trees

• C5.0
• Classification and Regression Trees (CART)
• Process of tree building
• Entropy and Gini Index
• Problem of overfitting
• Pruning a tree back
• Trees for Prediction (Linear) – example
• Trees for classification models – example
• Advantages of tree-based models
##### KNN – K nearest neighbors
• Resampling and ensemble methods
• Bagging
• Random Forests
• Boosting – Gradient boosting machines
• Support Vector machines
• Neural networks
• Introduction to deep learning
• Introduction to online learning
• Unsupervised learning
• Cluster analysis
• Hierarchical clustering
• K-Means clustering
• Distance measures
• Applications of cluster analysis – Customer Segmentation
• Time series analysis - Forecasting
• Simple moving averages
• Exponential smoothing
• Time series decomposition
• ARIMA
• Collaborative filtering
• User-based filtering
• Item-based filtering
##### Model validation and deployment
• Error measurement
• RMSE – Root Mean squared error
• Misclassification rate
• Area under the curve (AUC)
##### Practical use cases and best practices
• a. Business problem to an analytical problem
• Problem definition and analytical method selection
• b. Guidelines in model development
##### Introduction to big data and other tools (Python and R Server)
• a. Big data and analytics?
• Leverage Big data platforms for Data Science
• b. Introduction to evolving tools, e.g. Spark
• Machine learning with Spark
##### Introduction to Azure cloud and Big-Data computing over cloud
• Creation of R-Server clusters
• Computation of Big-Data ML algorithms over the Azure cloud
##### Introduction to Deep Learning
• What is deep learning (DL), and how does it outperform traditional machine learning (ML) methods?
• Convolutional and perceptron models
• Comparison of DL and ML performance on the MNIST dataset
##### Analytical Visualization with Tableau
• Why is it important for data analysts?
• Tableau workbook walkthrough
• Instructions for creating your own workbooks
• Demo of a few more workbooks
##### Offerings from Kelly
• Mock interview questions and case study walkthroughs over the Azure Cortana gallery
• Guidance to prepare resumes
• Information on companies and industry trends on data science

#### Course Features

• Language: English
• Lectures: 1
• Certification: Yes
• Project: 5 Minor + 1 Major
• Duration: 64 hrs + 36 hrs
• Max-Students: 20