Adept Events

Designing, Operating and Managing an Enterprise Data Lake

Governing your Information across Hadoop, Cloud Storage, Data Warehouses, MDM & NoSQL Data Stores

Date: October 31 - November 1, 2018
Time: 09:30 - 17:00
Price: €1450,- p.p.
Location: Amrath Hotel Lapershoek, Hilversum
Contact: seminars@adeptevents.nl, +31 (0)172 742680
Next Edition: May 20-21, 2019

Most organisations today are dealing with multiple silos of information. These include cloud and on-premises transaction processing systems, multiple data warehouses, data marts, reference data management (RDM) systems, master data management (MDM) systems, enterprise content management (ECM) systems and, more recently, Big Data platforms such as Hadoop and NoSQL databases. In addition, the number of data sources is increasing dramatically, especially from outside the enterprise.

Given this situation, it is not surprising that many companies have ended up managing information in silos, with different tools being used to prepare and manage data across these systems with varying degrees of governance. Nor is it only IT that is now integrating data: business users are also getting involved with new self-service data preparation tools.

The question is, is this the only way to manage data? Is there another level that we can reach to allow us to more easily manage and govern data across an increasingly complex data landscape?

The Enterprise Data Lake seminar

This two-day seminar looks at the challenges faced by companies trying to deal with an exploding number of data sources, data collected in multiple data stores (cloud and on-premises) and multiple analytical systems. It also looks at the requirements for defining, governing, managing and sharing trusted, high-quality information in a distributed and hybrid computing environment.

The seminar explores a new approach in which IT data architects, business users and IT developers collaborate in building and managing a logical data lake to get control of your data. This includes data ingestion, automated data discovery, data profiling and tagging, and publishing data in an information catalog. It also involves refining raw data to produce enterprise data services that can be published in a catalog and made available for consumption across your company.

We also introduce multiple data lake configurations, including a centralised data lake and a ‘logical’ distributed data lake, as well as the execution of jobs and governance across multiple data stores. Throughout, the seminar emphasises the need for a common collaborative process and a common approach to governing and managing data of all types.

Learning objectives

Attendees will learn:

  • How to define a strategy for producing trusted data as-a-service in a distributed environment of multiple data stores and data sources
  • How to organise data in a centralised or distributed data environment to overcome complexity and chaos
  • How to design, build, manage and operate a logical or centralised data lake within their organisation
  • The critical importance of an information catalog in understanding what data is available as a service
  • How data standardisation and business glossaries can help make sure data is understood
  • An operating model for effective distributed information governance
  • What technologies and implementation methodologies they need to get their data under control
  • How to apply methodologies to get master and reference data, big data, data warehouse data and unstructured data under control, irrespective of whether it is on-premises or in the cloud

Target Audience

This seminar is intended for business data analysts doing self-service data integration, data architects, chief data officers, master data management professionals, content management professionals, database administrators, big data professionals, data integration developers, and compliance managers who are responsible for data management. This includes metadata management, data integration, data quality, master data management and enterprise content management. The seminar is not only for ‘Fortune 500 scale companies’ but for any organisation that has to deal with Big Data, small data, multiple data stores and multiple data sources. It assumes that you understand basic data management principles and have a high-level understanding of the concepts of data migration, data replication, metadata, data warehousing, data modelling, data cleansing, etc.

Discount for DAMA members

Members of the DAMA NL Dutch chapter as well as the Belux or International chapter are eligible for a ten percent discount.

At the top of this page you can download the PDF brochure of this workshop.

Mike Ferguson

Managing Director
Intelligent Business Strategies Ltd.
Mike Ferguson is an analyst and consultant and specialises in business intelligence / analytics, data management, big data and enterprise business integration. Mike has consulted for dozens of companies and teaches popular classes in Big Data, New Technologies for Data Warehousing and BI.


This event takes place at:

Hotel Lapershoek
Utrechtseweg 16
1213 TS  Hilversum
The Netherlands
Telephone +31 (0) 35-6231341

For a full itinerary, please see the website of the Amrath Hotel.

The Hotel Lapershoek can also be reached by public transport. Be sure to take the train to ‘Station Hilversum Sportpark’, from which it is only a three-minute walk.
Please consult www.9292.nl (door-to-door journey planner, also available in English) or call 0900-9292 (travel advice by phone, € 0.70 p/m).

For attendees interested in an overnight stay, we have made a special price agreement with the hotel. Please let us know if you wish to make use of this.

The course starts at 09:30 and ends at 17:00. Registration commences at 08:30.

MODULE 1: STRATEGY & PLANNING

This session introduces the data lake together with the need for a data strategy and looks at the reasons why companies need it. It looks at what should be in your data strategy, the operating model needed to implement it, the types of data you have to manage and the scope of implementation. It also looks at the policies and processes needed to bring your data under control.

  • The ever increasing distributed data landscape
  • The siloed approach to managing and governing data
  • IT data integration, self-service data wrangling or both? – data governance or data chaos?
  • Key requirements for data management
    • Structured data – master, reference and transaction data
    • Semi-structured data – JSON, BSON, XML
    • Unstructured data – text, video
    • Re-usable services to manage data
  • Dealing with new data sources – cloud data, sensor data, social media data, smart products (the internet of things)
  • Understanding the scope of your data lake
    • OLTP system sources
    • Data Warehouses
    • Big Data systems e.g. Hadoop
    • MDM and RDM systems
    • Data virtualisation
    • Streaming data
    • Enterprise Content Management
  • Building a business case for data management
  • Defining an enterprise data strategy
  • A new inclusive approach to governing and managing data
  • Introducing the data lake and data refinery
  • Data lake configurations – what are the options?
  • Centralised, distributed or logical data lakes
  • Information Supply Chain use cases – establishing a multi-purpose data lake
  • The rising importance of an Information catalog
  • Key technology components in a data lake 
  • Hadoop as a data staging area and why it is not enough
  • Implementation run-time options – the need to execute in multiple environments
  • Integrating a data lake into your enterprise analytical architecture

MODULE 2: INFORMATION PRODUCTION METHODOLOGIES

Having understood strategy, this session looks at why information producers need to make use of multiple methodologies in a data lake information supply chain to produce trusted structured and multi-structured data that information consumers can use to drive business value.

  • Information production and information consumption
  • A best practice step-by-step methodology for structured data governance
  • Why the methodology has to change for semi-structured and unstructured data
  • Methodologies for structured vs multi-structured data

MODULE 3: DATA STANDARDISATION, THE BUSINESS GLOSSARY AND THE INFORMATION CATALOG

This session looks at the need for data standardisation of structured data and of new insights from processing unstructured data. The key to making this happen is to create common data names and definitions for your data to establish a shared business vocabulary (SBV). The SBV should be defined and stored in a business glossary and is important for information consumers to understand published data in a data lake. The session also looks at the emergence of more powerful information catalog software and how business glossaries have become part of what a catalog offers.

  • Semantic data standardisation using a shared business vocabulary within an information catalog
  • The role of a common vocabulary in MDM, RDM, SOA, DW and data virtualisation
  • Why is a common vocabulary relevant in a data lake and a Logical Data Warehouse?
  • How does an SBV apply to data in a Hadoop data lake?
  • Approaches to creating a common vocabulary
  • Business glossary products storing common business data names, e.g. Alteryx Connect Glossary, ASG, Collibra, Global IDs, Informatica, IBM Information Governance Catalog, Microsoft Azure Data Catalog Business Glossary, SAP Information Steward Metapedia, SAS Business Data Network, TIBCO Information Server
  • Planning for a business glossary
  • Organising data definitions in a business glossary
  • Key roles and responsibilities – getting the operating model right to create and manage an SBV
  • Formalising governance of business data names, e.g. the dispute resolution process
  • Business involvement in SBV creation
  • Beyond structured data – from business glossary to information catalog
  • What is an Information Catalog?
  • Why are information catalogs becoming critical to data management?
  • Information catalog technologies, e.g. Alation, Alteryx Connect, AWS Glue, Apache Atlas, Collibra Catalog, IBM Information Governance Catalog & Watson Knowledge Catalog, Informatica EIC & Live Data Map, Microsoft Azure Data Catalog, Podium Data, Waterline Data, Zaloni Mica
  • Information catalog capabilities

MODULE 4: ORGANISING AND OPERATING THE DATA LAKE

This session looks at how to organise data so that it remains manageable in a complex data landscape. It looks at zoning, versioning, the need for collaboration between business and IT and the use of an information catalog in managing the data.

  • Organising data in a centralised or distributed data lake
  • Creating zones to manage data
  • New requirements for managing data in centralised and distributed data lakes
  • Creating collaborative data lake projects
  • Hadoop as a staging area for enterprise data cleansing and integration
  • Core processes in data lake operations
  • The data ingestion process
  • Tools and techniques for data ingestion
  • Implementing systematic disparate data and data relationship discovery using Information catalog software
  • Using domains and machine learning to automate and speed up data discovery and tagging
  • Alation, IBM Watson Knowledge Catalog, Informatica CLAIRE, Silwood, Waterline Data Smart Data Catalog
  • Automated profiling, tagging and cataloguing of data
  • Automated data mapping
  • The data classification and policy definition processes
  • Manual and automated data classification to enable governance
  • Using tag based policies to govern data

MODULE 5: THE DATA REFINERY PROCESS

This session looks at the process of refining data to produce trusted information.

  • What is a data refinery?
  • Key requirements for refining data
  • The need for multiple execution engines to run in multiple environments
  • Options for refining data – ETL versus self-service data preparation
  • Key approaches to scalable ETL data integration using Apache Spark
  • Self-service data preparation tools for Spark and Hadoop, e.g. Alteryx Designer, Informatica Intelligent Data Lake, IBM Data Refinery, Paxata, Tableau (Project Maestro), Tamr, Talend, Trifacta
  • Automated data profiling using analytics in data preparation tools
  • Executing data refinery jobs in a distributed data lake using Apache Beam to run anywhere
  • Approaches to integrating IT ETL and self-service data preparation
  • Apache Atlas Open Metadata & Governance
  • Joined up analytical processing from ETL to analytical workflows
  • Publishing data and data integration jobs to the information catalog
  • Mapping produced data of value into your DW and business vocabulary
  • Data provisioning – provisioning consistent information into data warehouses, MDM systems, NoSQL DBMSs and transaction systems
  • Provisioning consistent refined data using data virtualisation, a logical data warehouse and on-demand information services
  • Governing the provisioning process using rules-based metadata
  • Consistent data management across cloud and on-premises systems

MODULE 6: REFINING BIG DATA & DATA FOR DATA WAREHOUSES

This session looks at how the data refining processes can be applied to managing, governing and provisioning data in a Big Data analytical ecosystem and in traditional data warehouses. How do you deal with very large data volumes and different varieties of data? How do you load and process data in Hadoop? How should low-latency data be handled? Topics that will be covered include:

  • A walkthrough of end-to-end data lake operation to create a Single Customer View
  • Types of big data & small data needed for single customer view and the challenge of bringing it together
  • Connecting to Big Data sources, e.g. web logs, clickstream, sensor data, unstructured and semi-structured content
  • Ingesting and analysing clickstream data
  • The challenge of capturing external customer data from social networks
  • Dealing with unstructured data quality in a Big Data environment
  • Using graph analysis to identify new relationships
  • The need to combine big data, master data and data in your data warehouse
  • Matching big data with customer master data at scale
  • Governing data in a Data Science environment

MODULE 7: INFORMATION AUDIT & PROTECTION – THE FORGOTTEN SIDE OF DATA GOVERNANCE

Over recent years we have seen many major brands suffer embarrassing publicity due to data security breaches that have damaged their brand and reduced customer confidence. With data now highly distributed and so many technologies in place that offer audit and security, many organisations end up with a piecemeal approach to information audit and protection. Policies are everywhere, with no single view of the policies associated with securing data across the enterprise. The number of administrators involved is often difficult to determine, and regulatory compliance now demands that data is protected and that organisations can prove this to their auditors.

So how are organisations dealing with this problem? Are the same data privacy policies enforced everywhere? How is data access security co-ordinated across portals, processes, applications and data? Is anyone auditing privileged user activity?

This session defines the problem, looks at the requirements for Enterprise Data Audit and Protection and then looks at what technologies are available to help you integrate this into your data strategy.

  • What is Data Audit and Security and what is involved in managing it?
  • Status check – Where are we in data audit, access security and protection today?
  • What are the requirements for enterprise data audit, access security and protection?
  • What needs to be considered when dealing with the data audit and security challenge?
  • Automatic data discovery and the information catalog – a huge help in identifying sensitive data
  • What about privileged users?
  • Using a data management platform and information catalog to govern data across multiple data stores
  • Securing and protecting data using tag based policies in an information catalog
  • What technologies are available to protect data and govern it? – Apache Knox, Cloudera Sentry, Dataguise, Hortonworks Ranger, IBM (Watson Data Platform, Knowledge Catalog, Optim & Guardium), Imperva, Informatica Secure@Source, Micro Focus, Privitar
  • Can these technologies help in GDPR?
  • How do they integrate with Data Governance programs?
  • How to get started in securing, auditing and protecting your data.

 

Taking part in this two-day workshop costs 1305 Euro per person when registering at least 30 days beforehand and 1450 Euro per person after the Early Bird period expires (excl. 21% Dutch VAT). The fee also covers documentation, lunch and tea/coffee.

Members of the DAMA NL, Belux or UK Chapter are eligible for a 10 percent discount on the registration fee.

In completing your registration form you declare that you agree with our Terms and Conditions.

Extra discounts
Discounts are available for group bookings of two or more delegates representing the same organization made at the same time. Ten percent off for the second and third delegate and fifteen percent off for all delegates when registering four or more delegates (all delegates must be listed on the same invoice).
This cannot be used in conjunction with other discounts.

Payment
Full payment is due prior to the event. An invoice will be sent to you containing our full bank details including BIC and IBAN. Your payment should always include the invoice number as well as the name of your company and the delegate name.

Payment by credit card is available for attendees from countries outside the IBAN region. This is not an automated process via our website but requires a manual transaction by phone or Skype. For credit card payment, please contact our office by e-mail or through our contact form and include your phone number, so that we can contact you to obtain your credit card information. Never include your credit card details in our registration form, contact form or in e-mail messages.

Testimonials

Ralf Putter Business Consultant, KPN ICT Consulting

“Very useful training! Much expertise and well organized.”

Peter Stretton Data Architect, Bugaboo International

“Excellent overview of a domain which is not well understood.”


In-house Info

Practically all of our seminars and workshops can be offered as an in-house course exclusively for your company. We can tailor the course with extra focus on specific topics that apply to your organization. In-house courses are also available in online format or in face-to-face format with a live video stream.


RELATED EVENTS

27-10-2025
Hands-on workshop - register now!

Agile Data Warehouse Design & Dimensional Modeling

Collaborative BI Requirements Analysis & Dimensional Modeling Training

A dimensional data modelling course presented by leading data warehousing expert and author Lawrence Corr, covering the latest agile techniques for systematically gathering Business Intelligence (BI) requirements and designing effective DW/BI systems. Based on 7W, star schema and BEAM approach.

Lawrence Corr

 October 27-29, 2025

Utrecht


Adept Events
KvK Den Haag: 56059825
E: seminars@adeptevents.nl
T: +31 (0)172 742680
M: +31 (0)6 113 118 60
W: www.adeptevents.nl

© Adept Events is a registered trademark of Array Media B.V.
© 2025 Array Media b.v. - All rights reserved