This is a presentation I gave at one of our Building Secure Workloads seminars earlier this year.

Overview of Advanced Analytics & AI

Key trends

  • Accelerating adoption of AI by developers (consuming models)
  • Rise of hybrid training and scoring scenarios
  • Push scoring/inference to the event (edge, cloud, on-premise)
  • Some developers moving into deep learning as a non-traditional path into data science / AI development
  • A growing hardware arms race across all form factors (CPU / GPU / FPGA / ASIC / device)

Challenges

  • Data preparation
  • Model deployment & management
  • Model lineage & auditing
  • Explainability
  • Security

Data Science & AI is not…

  • Big Data
  • Business Intelligence
  • Creating beautiful visualisations just because we can!

Look at the following picture. Looks pretty awesome, right?

At first glance it’s pretty insightful, showing us how everyone in the world is connected by Facebook. But what does it actually tell us?

The modernised western world, with access to the latest tech and resources, has more Facebook users than other parts of the world.

Once we start to look below the gloss we see what’s actually going on.

A Structured Approach to Data Science

  • KDD (Knowledge Discovery in Databases) in the 1990s
  • CRISP-DM (Cross Industry Standard Process for Data Mining)
  • Defines 6 stages:
  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modelling
  5. Evaluation
  6. Deployment

The Microsoft Team Data Science Process

  • A data science life-cycle definition
  • A standardised project structure
  • Infrastructure and resources for data science projects
  • Tools and utilities for project execution

https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/

Ensure data integrity: data sources in the cloud and on premises should be secured using the 14 principles of cloud security.

Identity management, encryption at rest and in transit.

Row-level security for databases, etc.

Model security – management, version control etc.

Deployment – operational management and security
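One way to make the model management and version control point concrete is to store a fingerprint of each model alongside its version, so a deployed model can later be checked against what was approved. The following is a minimal sketch using only the Python standard library. The function names and the record fields are hypothetical, chosen for illustration, and are not part of TDSP or any Azure service.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a blob (model file, dataset, ...)."""
    return hashlib.sha256(data).hexdigest()


def make_lineage_record(model_bytes: bytes, training_data_bytes: bytes,
                        version: str, created_by: str) -> dict:
    """Build an audit record tying a model version to its training data.

    The field names here are illustrative assumptions, not a standard schema.
    """
    return {
        "version": version,
        "created_by": created_by,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "model_sha256": fingerprint(model_bytes),
        "training_data_sha256": fingerprint(training_data_bytes),
    }


def verify_model(model_bytes: bytes, record: dict) -> bool:
    """Check that a model blob matches the fingerprint stored in its record."""
    return fingerprint(model_bytes) == record["model_sha256"]


if __name__ == "__main__":
    record = make_lineage_record(b"model-weights", b"training-rows",
                                 version="1.0.0", created_by="alice")
    print(verify_model(b"model-weights", record))   # unchanged model
    print(verify_model(b"tampered-weights", record))  # modified model
```

In practice the record would be written to a versioned store with restricted write access, so that the audit trail itself cannot be silently edited.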

Infrastructure and resources for data science projects

  • Cloud file systems for storing datasets
  • Databases
  • Big data clusters (Hadoop, Spark/Databricks)
  • Machine learning service

Version control

Azure Active Directory

Role-Based Access Control (RBAC)

Conditional access
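The idea behind RBAC is that permissions attach to roles rather than to individual users. A minimal sketch of that idea follows. The role names, actions and the `is_allowed` helper are hypothetical and only illustrate the concept; this is not how Azure Active Directory or Azure RBAC is called.

```python
# Hypothetical role-to-permission mapping for a data science project.
ROLE_PERMISSIONS = {
    "data_scientist": {"read_dataset", "train_model"},
    "ml_engineer": {"read_dataset", "train_model", "deploy_model"},
    "auditor": {"read_audit_log"},
}


def is_allowed(roles, action):
    """Return True if any of the user's roles grants the requested action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


if __name__ == "__main__":
    print(is_allowed(["data_scientist"], "train_model"))
    print(is_allowed(["data_scientist"], "deploy_model"))
```

Unknown roles grant nothing, so access is denied by default.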

The AI Development Life-cycle

Machine Learning & AI Portfolio

When to use what?

Modern Data Architecture – Azure PaaS Data Services

Advantages of an all PaaS platform – this is a much more joined up approach.

All data in cloud means you can unlock the potential insights of your most valuable asset – your data!

Customer Example

  • Pseudonymisation of data
  • RBAC
  • Audit logs
  • VNet security
  • Locked down production environments
  • Infrastructure as code, continuous integration and continuous deployment (e.g. Jenkins)
  • Managing change securely and repeatably
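As an illustration of the pseudonymisation item above, here is a minimal sketch that replaces identifying fields with a keyed hash (HMAC-SHA256), so the same input always maps to the same pseudonym but cannot be reversed without the key. This is one common technique and an assumption on my part, not necessarily what the customer used. The function names, field names and sample key are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a value using HMAC-SHA256."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def pseudonymise_records(records, fields, secret_key):
    """Return copies of the records with the named fields pseudonymised.

    The input records are left unmodified.
    """
    result = []
    for record in records:
        cleaned = dict(record)  # shallow copy so the original is untouched
        for field in fields:
            if cleaned.get(field) is not None:
                cleaned[field] = pseudonymise(str(cleaned[field]), secret_key)
        result.append(cleaned)
    return result


if __name__ == "__main__":
    # Demo key only: a real key would come from a secrets store.
    key = b"demo-key-not-for-production"
    rows = [{"email": "alice@example.com", "age": 30}]
    print(pseudonymise_records(rows, ["email"], key))
```

Because the mapping is stable, pseudonymised tables can still be joined on the hashed field, while the key is kept separate from the data.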

Key Takeaways

  • Data Science ≠ Big Data
  • Data Science should be driven by the needs of the business
  • Structured approach to Data Science – TDSP
  • Use Microsoft Azure data services to empower employees to work together securely on their data science projects in a managed and repeatable way.