Practical Data Science with Cortana Intelligence: Azure Machine Learning, SQL Data Mining and R
24th October 2016 - 27th October 2016
Who is it for?
This course is aimed at analysts, analytical power users, predictive developers, BI power users and developers, budding data scientists, and consultants.
What you will learn
You will learn machine learning, data mining, some statistics, data preparation, and how to interpret the results. You will see how to formulate business questions in terms of data science hypotheses and experiments, and how to prepare inputs to answer those questions. We will cover common issues and mistakes, such as overtraining, and how to resolve them, as well as how to cope with rare events, such as fraud. At the end of this course you will be able to plan and run data science projects.
As a practicing data miner, Rafal will also share his decade of hands-on experience while teaching you about Azure Machine Learning (Azure ML), which is the foundation of Cortana Intelligence Suite, and its highly visual, on-premises companion, the SQL Server Analysis Services Data Mining engine, supplemented with the free, open-source R software and Cortana’s Revolution Analytics R software. We will use some Excel; however, most of our time will be spent in ML Studio, with some in R, RStudio, SSDT, SSMS, and the Azure Portal.
Please note that this agenda is subject to last-minute alterations to best suit the needs and the flow of this live classroom course. Learning points marked with an asterisk (*) are optional, and will be covered subject to interest and time remaining.
Day 1: Overview of Practical Data Science for Business
- Introduction to data science and its components (machine learning/data mining, statistics, big data and data wrangling)
- Team, process, and tools
- Inputs and outputs
- Stating business questions in data science terms
- Scientific method and experiments
- Data formats: cases/observations, signatures
- High-level overview of algorithm classes (classifiers, clustering, regression, recommenders)
- Moving data around and its storage
- Getting started with and using Azure ML, SSAS DM, and R (structures, models, data flow)
Day 2: Segmentation and Classification
- Introduction to segmentation
- Clustering algorithms (k-means, EM, and others)
- Interpreting clusters (cluster characteristics, discrimination, tornado charts)
- Introduction to classifiers (two-class and multi-class)
- Key algorithms (decision trees/forests, neural networks, naïve Bayes, boosting, and others)
- Class imbalance problem (fraud analytics and rare event prediction) *
Day 3: Model Validation and Statistics
- Descriptive statistics with R
- Interpreting classifier quality
- Testing model accuracy, reliability, and usefulness
- False positives vs. false negatives: classification (confusion) matrix
- Charting precision-recall (sensitivity-specificity) with ROC curves, lift charts, and precision-recall charts
- Optimising binary classifier thresholds to meet a known business goal for prediction quality
- Refining models to improve accuracy and reliability *
- Avoiding over-training (over-fitting) in critical situations *
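As a rough illustration of the Day 3 topics on the confusion matrix and threshold optimisation, here is a minimal Python sketch. The function names, the labels, the scores, and the cost figures are all made up for this example and are not course material; the idea is only that a business goal can be expressed as a cost per false positive and per false negative, and a threshold chosen to minimise the total.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (TP, FP, TN, FN) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall(tp, fp, fn):
    """Precision and recall (sensitivity), guarding against division by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def best_threshold(y_true, scores, thresholds, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest total misclassification cost."""
    def total_cost(t):
        _, fp, _, fn = confusion_counts(y_true, scores, t)
        return cost_fp * fp + cost_fn * fn
    return min(thresholds, key=total_cost)


# Hypothetical data: 1 = positive case (e.g. fraud), 0 = negative case.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.35, 0.2, 0.1]
candidates = [0.3, 0.5, 0.7]

tp, fp, tn, fn = confusion_counts(y_true, scores, 0.5)
precision, recall = precision_recall(tp, fp, fn)

# Missing a positive is assumed to cost ten times more than a false alarm.
cautious = best_threshold(y_true, scores, candidates, cost_fp=1, cost_fn=10)
# The reverse assumption: false alarms are the expensive mistake.
strict = best_threshold(y_true, scores, candidates, cost_fp=10, cost_fn=1)
```

With these invented numbers, the costly-false-negative scenario favours the lowest threshold and the costly-false-positive scenario favours a higher one, which is the trade-off a ROC or precision-recall chart makes visible.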
Day 4: Regressions, Recommenders, Other Algorithms, Production & Model Maintenance
- Deploying models to production (Azure ML web services, DMX queries, PMML)
- On-going maintenance and model updates *
- Introduction to recommender concepts
- Key recommender algorithms (association rules, collaborative filtering, matchbox recommenders, associative decision trees, Market Basket Analysis)
- Understanding itemsets and rules
- Rule importance vs. rule probability
- Introduction to simple regressions
- Key regression algorithms (linear regression, regression decision trees)
- Measuring linear regression quality (R-squared, predictor p-values, additional testing using R) *
- Briefly: remaining algorithms of interest: sequence clustering, SVM, perceptrons, Bayes point machine *
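To give a flavour of the Day 4 topic of rule importance vs. rule probability, the following Python sketch computes support, confidence (the rule's probability), and lift for a rule over a handful of invented shopping baskets. Lift is used here only as a commonly used stand-in for "importance"; the engines covered in the course may define importance differently, and the function name and data are assumptions made for this example.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for b in baskets if a <= b)           # baskets containing the antecedent
    count_c = sum(1 for b in baskets if c <= b)           # baskets containing the consequent
    count_both = sum(1 for b in baskets if (a | c) <= b)  # baskets containing both
    support = count_both / n
    confidence = count_both / count_a if count_a else 0.0
    # Lift compares the rule's confidence with the consequent's baseline frequency.
    lift = confidence / (count_c / n) if count_a and count_c else 0.0
    return support, confidence, lift


# Hypothetical transactions, each a set of purchased items.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

butter_rule = rule_metrics(baskets, ["bread"], ["butter"])
milk_rule = rule_metrics(baskets, ["bread"], ["milk"])
```

In this toy data both rules have the same confidence, but only bread -> butter has a lift above 1: milk is bought so often anyway that knowing about the bread adds nothing. That is why a rule's probability alone can be misleading.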
A more detailed description of what will be covered each day can be viewed here.
In this intensive four-day workshop you’ll learn the fundamentals of data science and machine learning. The format is 60% lectures interspersed with 10% demos, plus approximately 30% time allocated for you to practice the demos while Rafal helps you resolve any issues and answers group questions. You should have your own account with access to Azure Machine Learning configured (both the free and paid-for versions are acceptable). You do not need to practice: if you prefer, you can use the available time for a discussion of your own data and projects. You are free to take our data samples and PPT slides, but no formal notes or workbooks are provided. A follow-up book-reading list will be shared.
Cost: £1599 (to August 31 2016), £1799 (from September 1 2016) +VAT
Rafal Lukawiecki, Strategic Consultant at Project Botticelli Ltd (projectbotticelli.com), focuses on making advanced analytics easy, insightful, and useful, helping clients achieve better organizational performance. Passing those skills to consultants, developers, and board members is important to him. He specializes in business intelligence, looking for valuable patterns and correlations using data mining, and he is also known for his work in cryptography, enterprise architecture, and solution delivery. Rafal has been a popular, well-travelled speaker at major IT conferences since 1998. He even had the honour of sharing keynote platforms with Bill Gates, Neil Armstrong, and Steve Ballmer. A natural educator, he explains complex concepts in simple terms in an engaging, enjoyable, energetic style. Outside IT, Rafal spends a quarter of every year finding abstractions in natural landscapes, expressing them through traditional, black-and-white, large-format film photography in his hand-made, silver-gelatin prints; see rafal.net.
Aldersgate House, 135-137 Aldersgate Street, EC1A 4JA
Cancellations must be submitted in writing, either via email or by post. Registrants whose cancellations are received at least 2 weeks before the beginning of the course are entitled to a full refund minus an £80 processing fee plus any credit card charges incurred. No refunds will be given to registrants who cancel less than 2 weeks before the beginning of the course or who fail to attend.
In the extraordinary case where the course is cancelled, a full refund will be given to all paid registrants. Crossjoin Consulting will not refund any other amount paid by registrants to other companies, including travel expenses and hotel reservations.
Chris Webb's Introduction to MDX course was well-run, informative and good value. It gave me a good understanding of the key principles and ideas and how to apply them to real world situations. It was a great environment to learn in, with Chris giving excellent practical exercises and having the depth of knowledge to answer any group questions.