Course Outline
Introduction
- The Data Science Process
 - Roles and responsibilities of a Data Scientist
 
Preparing the Development Environment
- Libraries, frameworks, languages and tools
 - Local development
 - Collaborative web-based development
 
Data Collection
- Different Types of Data
  - Structured
    - Local databases
    - Database connectors
    - Common formats: xlsx, XML, JSON, CSV, ...
  - Unstructured
    - Clicks, sensors, smartphones
    - APIs
    - Internet of Things (IoT)
    - Documents, pictures, videos, sounds
- Case study: Collecting large amounts of unstructured data continuously
 
Data Storage
- Relational databases
 - Non-relational databases
 - Hadoop: Distributed File System (HDFS)
 - Spark: Resilient Distributed Dataset (RDD)
 - Cloud storage
 
Data Preparation
- Ingestion, selection, cleansing, and transformation
 - Ensuring data quality - correctness, meaningfulness, and security
 - Exception reports
 
Languages used for Preparation, Processing and Analysis
- R language
  - Introduction to R
  - Data manipulation, calculation and graphical display
- Python
  - Introduction to Python
  - Manipulating, processing, cleaning, and crunching data
 
 
Data Analytics
- Exploratory analysis
  - Basic statistics
  - Draft visualizations
  - Understand data
- Causality
- Features and transformations
- Machine Learning
  - Supervised vs unsupervised
  - When to use what model
- Natural Language Processing (NLP)
 
Data Visualization
- Best Practices
- Selecting the right chart for the right data
- Color palettes
- Taking it to the next level
  - Dashboards
  - Interactive Visualizations
- Storytelling with data
 
Summary and Conclusion
Requirements
- A general understanding of database concepts
 - A basic understanding of statistics
 
Testimonials (4)
I liked Pablo's style and the fact that he covered a lot of subjects, from report design and customization with HTML to implementing simple ML algorithms. Good balance of theoretical information and exercises. Pablo really covered all the topics I was interested in and gave comprehensive answers to my questions.
Cristian Tudose - SC Automobile Dacia SA
Course - Advanced Data Analysis with TIBCO Spotfire
Actual application of Spotfire and all basic functions.
Michael Capili - STMicroelectronics, Inc.
Course - Introduction to Spotfire
Real world knowledge from someone in the industry
Matthew Cerbas - Shield Consulting Solutions, Inc.
Course - Grafana
I genuinely enjoyed the many labs and exercises.