
What you'll learn in this course

By the end of the Data Science with Python training course, participants will be able to:
•    Understand the differences between Python's basic data types
•    Know when to use different Python collections
•    Implement Python functions
•    Understand control flow constructs in Python
•    Handle errors via exception handling constructs
•    Quantitatively define an answerable, actionable question
•    Import both structured and unstructured data into Python
•    Parse unstructured data into structured formats
•    Understand the differences between NumPy arrays and pandas DataFrames
•    Understand where Python fits in the Python/Hadoop/Spark ecosystem
•    Simulate data through random number generation
•    Understand mechanisms for missing data and analytic implications
•    Explore and clean data
•    Create compelling graphics to reveal analytic results
•    Reshape and merge data to prepare for advanced analytics
•    Test for group differences using inferential statistics
•    Implement linear regression from a frequentist perspective
•    Understand non-linear terms, confounding, and interaction in linear regression
•    Extend to logistic regression to model binary outcomes
•    Understand the difference between machine learning and frequentist approaches to statistics
•    Implement classification and regression models using machine learning
•    Score new datasets, evaluate model fit, and quantify variable importance


Prerequisites

All attendees should have prior programming experience and an understanding of basic statistics.
 

Who Can Apply?

  • IT and Software Professionals
  • Aspiring Data Scientists and Analysts
  • Students 

Course Curriculum

• History and current use

o Installing the Software

o Python Distributions

• String Literals and numeric objects

• Collections (lists, tuples, dicts)

• Datetime classes in Python

• Memory Management in Python

• Control Flow

• Functions

• Exception Handling

• Defining the quantitative construct used to make inferences about the question

• Identifying the data needed to support the constructs

• Identifying limitations to the data and analytic approach

• Constructing sensitivity analyses

• Structured Data

o Structured Text Files

o Excel workbooks

o SQL databases

• Working with Unstructured Text Data

o Reading Unstructured Text

o Introduction to Natural Language Processing with Python

• Introduction to the ndarray

• NumPy operations

• Broadcasting

• Missing data in NumPy (masked array)

• NumPy Structured arrays

• Random number generation

• Filtering

• Creating and deleting variables

• Discretization of Continuous Data

• Scaling and standardizing data

• Identifying Duplicates

• Dummy Coding

• Combining Datasets

• Transposing Data

• Long to wide and back

• Univariate Statistical Summaries and Detecting Outliers

• Multivariate Statistical Summaries and Outlier Detection

• Group-wise calculations using Pandas

• Pivot Tables

• Histogram

• Box-and-whiskers plot

• Scatter plots

• Forest Plots

• Group-by plotting

• Introduction to the differences between Python, Hadoop, and Spark

• Importing data from Spark and Hadoop to Python

• Parallel execution leveraging Spark or Hadoop

• Exploring and understanding patterns in missing data

• Missing at Random

• Missing Not at Random

• Missing Completely at Random

• Data imputation methods

• Comparing Groups

o P-Values, summary statistics, sufficient statistics, inferential targets

o T-Tests (equal and unequal variances)

o ANOVA

o Chi-Square Tests

• Correlation

• Linear Regression

o Multivariate linear regression

o Capturing Non-linear Relationships

o Comparing Model Fits

o Scoring new data

o Poisson Regression Extension

• Logistic regression

o Logistic Regression Example

o Classification Metrics

• Machine Learning Theory

• Data pre-processing

o Missing Data

o Dummy Coding

o Standardization

o Training/Test data

• Supervised Versus Unsupervised Learning

• Unsupervised Learning: Clustering

o Clustering Algorithms

o Evaluating Cluster Performance

• Dimensionality Reduction

o A priori

o Principal Components Analysis

o Penalized Regression

• Linear Regression

• Penalized Linear Regression

• Stochastic Gradient Descent

• Scoring New Data Sets

• Cross Validation

• Bias-Variance Tradeoff

• Feature Importance

• Logistic Regression

• LASSO

• Random Forest

• Ensemble Methods

• Feature Importance

• Scoring New Data Sets

• Cross Validation