This is a presentation I gave at one of our Building Secure Workloads seminars earlier this year.
Overview of Advanced Analytics & AI
- Accelerating adoption of AI by developers (consuming models)
- Rise of hybrid training and scoring scenarios
- Push scoring/inference to the event (edge, cloud, on-premise)
- Some developers moving into deep learning as non-traditional path to DS / AI dev
- Growing hardware arms race across all form factors (CPU / GPU / FPGA / ASIC / device)
- Data preparation
- Model deployment & management
- Model lineage & auditing
Data Science & AI is not…
- Big Data
- Business Intelligence
- Creating beautiful visualisations just because we can!
Look at the following picture. Looks pretty awesome, right?
At first glance it’s pretty insightful, showing us how everyone in the world is connected by Facebook. But what does it actually tell us?
The western modernised world with access to the latest tech and resources has more Facebook users than other parts of the world.
Once we start to look below the gloss we see what’s actually going on.
A Structured Approach to Data Science
- KDD (Knowledge Discovery in Databases) in 1990s
- CRISP-DM (Cross Industry Standard Process for Data Mining)
- Defines 6 stages:
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment
The Microsoft Team Data Science Process
- A data science life-cycle definition
- A standardised project structure
- Infrastructure and resources for data science projects
- Tools and utilities for project execution
Ensure data integrity: data sources in the cloud and on premises are secured using the 14 principles of cloud security.
Identity management, encryption at rest and in transit.
Row-level security for databases, etc.
Model security – management, version control etc.
Deployment – operational management and security
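To make the model security and version control point a little more concrete, here is a minimal sketch of recording a model's lineage by fingerprinting its serialized artifact. It is an illustration only, not part of TDSP or any Azure service: the function names, the record fields and the `datasets/train-v1` reference are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(model_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of a serialized model artifact."""
    return hashlib.sha256(model_bytes).hexdigest()


def lineage_record(model_bytes: bytes, version: str, training_data_ref: str) -> dict:
    """Build a hypothetical lineage entry tying a model version to its
    artifact hash and to the dataset it was trained on."""
    return {
        "version": version,
        "sha256": fingerprint(model_bytes),
        "training_data": training_data_ref,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(model_bytes: bytes, record: dict) -> bool:
    """Check that an artifact still matches the hash stored at registration."""
    return fingerprint(model_bytes) == record["sha256"]


# Usage: register an artifact, then detect a tampered copy before deployment.
record = lineage_record(b"serialized-model", "1.0.0", "datasets/train-v1")
print(verify(b"serialized-model", record))
print(verify(b"serialized-model-modified", record))
```

Storing the hash alongside the version and the training data reference gives an auditor something to check a deployed model against; where those records live and who may write them is a separate decision.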
Infrastructure and resources for data science projects
- Cloud file systems for storing datasets
- Big data clusters (Hadoop, Spark/Databricks)
- Machine learning service
Azure Active Directory
Role-Based Access Control (RBAC)
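The idea behind role-based access can be sketched in a few lines. This is a toy model and not the Azure RBAC API: the role names echo Azure's built-in Reader, Contributor and Owner roles, but the permission sets, the scope paths and the `is_allowed` helper are simplified assumptions made for illustration.

```python
# Hypothetical, simplified mapping of roles to the actions they permit.
ROLE_PERMISSIONS = {
    "reader": {"read"},
    "contributor": {"read", "write"},
    "owner": {"read", "write", "assign_roles"},
}


def is_allowed(assignments, user, scope, action):
    """Return True if any of the user's role assignments permits the action.

    assignments is a list of (user, role, scope) tuples. An assignment made
    at a parent scope also applies to every scope beneath it.
    """
    for assigned_user, role, assigned_scope in assignments:
        if assigned_user != user:
            continue
        parent_prefix = assigned_scope.rstrip("/") + "/"
        in_scope = scope == assigned_scope or scope.startswith(parent_prefix)
        if in_scope and action in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False


# Usage: ana can only read one project, ben can write to all projects.
assignments = [
    ("ana", "reader", "/projects/churn"),
    ("ben", "contributor", "/projects"),
]
print(is_allowed(assignments, "ana", "/projects/churn/data", "read"))
print(is_allowed(assignments, "ana", "/projects/churn/data", "write"))
print(is_allowed(assignments, "ben", "/projects/churn", "write"))
```

Granting roles at the narrowest scope that works keeps a data science team productive without giving everyone write access to production data.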
The AI Development Life-cycle
Machine Learning & AI Portfolio
When to use what?
Modern Data Architecture – Azure PaaS Data Services
Advantages of an all PaaS platform – this is a much more joined up approach.
All data in cloud means you can unlock the potential insights of your most valuable asset – your data!
- Pseudonymisation of data
- Audit logs
- VNet security
- Locked down production environments
- Infrastructure as code, continuous integration and continuous deployment (CI/CD), e.g. Jenkins
- Managing change securely and repeatably
- Data Science ≠ Big Data
- Data Science should be driven by the needs of the business
- Structured approach to Data Science – TDSP
- Use Microsoft Azure data services to empower employees to work together securely on their data science projects in a managed and repeatable way.
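Pseudonymisation of data, listed among the security measures above, can be sketched as replacing direct identifiers with keyed hashes before the data reaches the data science team. This is one common technique (HMAC-SHA256), not necessarily the one used in the presentation; the function names, field names and key below are hypothetical, and in practice the key would come from a secrets store rather than source code.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 digest.

    The same value and key always give the same pseudonym, so joins across
    datasets still work, but the mapping cannot be rebuilt without the key.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_records(records, fields, secret_key):
    """Return copies of the records with the named fields pseudonymised."""
    return [
        {
            key: pseudonymise(str(val), secret_key) if key in fields else val
            for key, val in record.items()
        }
        for record in records
    ]


# Usage: the key here is a placeholder for illustration only.
key = b"example-key-do-not-use"
customers = [{"email": "alice@example.com", "spend": 120}]
safe = pseudonymise_records(customers, {"email"}, key)
print(safe[0]["spend"])
print(safe[0]["email"] != "alice@example.com")
```

A keyed hash is used rather than a plain one because identifiers such as email addresses are easy to guess and re-hash; whoever holds the key controls re-identification.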