Pieter Adriaans Maarten van Someren


Leren en Beslissen (Learning and Deciding). Pieter Adriaans, Maarten van Someren

Goal: Learn more about learning systems and Data Mining by applying them to a practical problem. Three problems: behaviour of a sailing ship; indoor climate of a building; behaviour of a computer system.

Organisation: 4 weeks; teams of 2 to 3 students. Goal: solve the problem and report on it. Supervision: every Wednesday a plenary meeting on progress; every week one meeting of each group with its supervisor. Friday 4 Feb.: presentations of results. Monday 7 Feb.: hand in the report.

Today: 11-12: Data Mining methodology (MvS); short introduction to the problems; allocation of teams / problems. 12.30 – 14.00, split up by problem: details of the problem; arrangements.

Data Mining: Given: data and, more or less, a problem. Problem: a model of the data. Goal: prediction; explanation of specific observations; recognising exceptions; simply interesting; ...

Problems: Data is dirty: incomplete; typing errors, measurement errors, odd things. Problem is unclear: it exists in the client's head, but not in terms of the data. Too little / too much data.

Methodology: Steps with products; cyclic / waterfall. Working systematically is often necessary for: collaboration (who does what); controlling duration/scope/result/cost; reuse of intermediate results. Most widely used: CRISP.

Steps: Business understanding: formulate the client's problem in terms of the data. Product: the problem in terms of the data. Data understanding: basic information about the data: distributions, missing data, relevance. Data preparation: preparing for the actual ML. Result: data with fewer errors ...

Steps: Evaluation: accuracy etc.? Solution of the business problem? Product: how good is the solution? Deployment: report to the client; applications of the solution in a system.

Preprocessing takes a lot of time!!

CRISP-DM: A Standard Process Model for Data Mining http://www.crisp-dm.org/

What is CRISP-DM? Cross-Industry Standard Process for Data Mining. Aim: to develop an industry-, tool- and application-neutral process for conducting Knowledge Discovery. Define tasks, outputs from these tasks, terminology and mining problem type characterization. Founding Consortium Members: DaimlerChrysler, SPSS and NCR. CRISP-DM Special Interest Group: ~200 members; Management Consultants; Data Warehousing and Data Mining Practitioners.

Four Levels of Abstraction: Phases. Example: Data Preparation. Generic Tasks: a stable, general and complete set of tasks. Example: Data Cleaning. Specialized Task: how is the generic task carried out. Example: Missing Value Handling. Process Instance. Example: the mean value for numeric attributes and the most frequent for categorical attributes was used.
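The process-instance example on this slide (mean for numeric attributes, most frequent value for categorical attributes) can be sketched as follows. This is a minimal illustration, not code from the course: the function name `impute`, the column names and the sample rows are hypothetical, and it assumes every column has at least one observed value.

```python
from collections import Counter
from statistics import mean

def impute(rows, numeric_cols, categorical_cols):
    """Return copies of rows with None replaced per column:
    the column mean for numeric columns, the most frequent value for categorical ones."""
    filled = [dict(r) for r in rows]  # do not modify the caller's data
    for col in numeric_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fill = mean(observed)  # assumes at least one observed value
        for r in filled:
            if r[col] is None:
                r[col] = fill
    for col in categorical_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fill = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled

# Hypothetical sample data, loosely inspired by the sailing-ship problem.
rows = [
    {"wind_speed": 10.0, "sail": "jib"},
    {"wind_speed": None, "sail": "jib"},
    {"wind_speed": 14.0, "sail": None},
    {"wind_speed": 12.0, "sail": "spinnaker"},
]
cleaned = impute(rows, ["wind_speed"], ["sail"])
```

Returning copies keeps the raw data available, which helps when the choice of imputation has to be revisited later.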

Data Mining Context: In data mining the context is defined by four dimensions. Application domain: Medical Prognosis. Data Mining Problem Type: Regression. Technical Aspect: Censored Observations. Tools and Techniques: Cox’s Regression, CIL’s GENNA. The context of the data mining task at hand is the starting point for mapping the generic tasks to specific tasks required in this instance.

Phases of CRISP-DM: not linear, repeatedly backtracking.

Business Understanding Phase: Understand the business objectives: What is the status quo? Understand business processes. Associated costs/pain. Define the success criteria. Develop a glossary of terms: speak the language. Cost/Benefit Analysis. Current Systems Assessment. Identify the key actors. Minimum: the Sponsor and the Key User. What forms should the output take? Integration of output with existing technology landscape. Understand market norms and standards.

Business Understanding Phase: Task Decomposition: break down the objective into sub-tasks; map sub-tasks to data mining problem definitions. Identify Constraints: resources; law, e.g. Data Protection. Build a project plan: list assumptions and risk (technical/financial/business/organisational) factors.

Data Understanding Phase: Collect Data: What are the data sources? Internal and external sources (e.g. Acxiom, Experian). Document reasons for inclusion/exclusions. Depend on a domain expert. Accessibility issues: legal and technical. Are there issues regarding data distribution across different databases/legacy systems? Where are the disconnects?

Data Understanding Phase II: Data Description: document data quality issues; requirements for data preparation; compute basic statistics. Data Exploration: simple univariate data plots/distributions; investigate attribute interactions. Data Quality Issues: missing values (understand their source: Missing vs Null values); strange distributions.
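As a rough sketch of "compute basic statistics" and of checking missing values, the snippet below summarises one numeric attribute. The distinction it draws between a field that is absent from a record and a field explicitly recorded as `None` is only one possible reading of "Missing vs Null", and the names `describe` and `indoor_temp` are hypothetical.

```python
from statistics import mean, stdev

def describe(rows, col):
    """Summarise one numeric column of a list of dict records."""
    absent = sum(1 for r in rows if col not in r)  # field never recorded
    null = sum(1 for r in rows if col in r and r[col] is None)  # recorded as empty
    values = [r[col] for r in rows if r.get(col) is not None]
    return {
        "n": len(values),
        "absent": absent,
        "null": null,
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation, needs >= 2 values
        "min": min(values),
        "max": max(values),
    }

# Hypothetical records, loosely inspired by the indoor-climate problem.
rows = [
    {"indoor_temp": 20.0},
    {"indoor_temp": 22.0},
    {"indoor_temp": None},
    {},
    {"indoor_temp": 24.0},
]
summary = describe(rows, "indoor_temp")
```

Counting the two kinds of gaps separately makes it easier to ask the domain expert where each kind comes from.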

Data Preparation Phase: Integrate Data: joining multiple data tables; summarisation/aggregation of data. Select Data: attribute subset selection; rationale for inclusion/exclusion; data sampling; Training/Validation and Test sets.
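One simple way to obtain Training/Validation and Test sets is a seeded random split, sketched below. The 60/20/20 proportions, the function name `split` and the use of a fixed seed are assumptions made for this illustration; the slides do not prescribe them.

```python
import random

def split(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle a copy of rows and cut it into training, validation and test sets.
    Whatever is left after the training and validation parts becomes the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )

records = list(range(100))  # stand-in for 100 data records
train_set, validation_set, test_set = split(records)
```

A repeatable split means that results from different modelling rounds can be compared on the same held-out data.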

Data Preparation Phase II: Data Transformation: using functions such as log; Factor/Principal Components analysis; Normalization/Discretisation/Binarisation. Clean Data: handling missing values/outliers. Data Construction: derived attributes.
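The transformations named on this slide can be illustrated with a few small functions. These are minimal sketches under stated assumptions: normalisation is taken to mean min-max scaling to [0, 1], discretisation to mean equal-width binning, and binarisation to mean one indicator column per category. Other definitions are equally common, and all function names are hypothetical.

```python
import math

def log_transform(values):
    """Natural logarithm of each value; all values must be positive."""
    return [math.log(v) for v in values]

def min_max(values):
    """Rescale values linearly to [0, 1]; assumes max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def discretise(values, bins):
    """Equal-width binning: return a bin index 0..bins-1 per value; assumes max > min."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # The maximum would fall just outside the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]

def binarise(values):
    """One 0/1 indicator per distinct category, in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

scaled = min_max([2.0, 4.0, 6.0])
binned = discretise([0.0, 1.0, 5.0, 10.0], bins=2)
indicators = binarise(["a", "b", "a"])
```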

The Modelling Phase: Selection of the appropriate modelling technique: data pre-processing implications; attribute independence; data types/normalisation/distributions; dependent on data mining problem type; output requirements. Develop a testing regime: sampling; verify samples have similar characteristics and are representative of the population.

The Modelling Phase: Build Model: choose initial parameter settings; study model behaviour; sensitivity analysis. Assess the model: beware of over-fitting; investigate the error distribution; identify segments of the state space where the model is less effective; iteratively adjust parameter settings; document reasons for these changes.
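A small sketch of how adjusting a parameter and watching for over-fitting can go together: a one-dimensional k-nearest-neighbour classifier is evaluated on its own training data and on held-out validation data for several values of k. The classifier, the data and all names are assumptions chosen for illustration, not a method taken from the slides.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of (x, label) pairs in data that the model predicts correctly."""
    hits = sum(1 for x, y in data if knn_predict(train, x, k) == y)
    return hits / len(data)

def sweep(train, validation, ks):
    """Return (k, training accuracy, validation accuracy) for each k."""
    return [(k, accuracy(train, train, k), accuracy(train, validation, k)) for k in ks]

# Hypothetical data: (attribute value, class label), with some label noise.
train = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "a"), (5.0, "b"), (6.0, "b")]
validation = [(1.5, "a"), (2.5, "a"), (3.5, "a"), (4.5, "b"), (5.5, "b")]
results = sweep(train, validation, ks=[1, 3, 5])
```

With k = 1 every training point is its own nearest neighbour, so training accuracy is perfect by construction. A large gap between training and validation accuracy is the usual warning sign of over-fitting.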

The Evaluation Phase: Validate Model: human evaluation of results by domain experts; evaluate usefulness of results from business perspective; define control groups; calculate lift curves; expected Return on Investment. Review Process. Determine next steps: potential for deployment; deployment architecture; metrics for success of deployment.
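"Calculate lift curves" can be sketched as follows: records are ranked by model score, and for each top fraction the response rate is divided by the overall response rate. The function name `lift_curve`, the number of steps and the sample scores are hypothetical. The sketch assumes binary 0/1 labels, at least one positive label, and at least as many records as steps.

```python
def lift_curve(scores, labels, steps=10):
    """Return (fraction of records targeted, lift) pairs.
    Lift = response rate in the top-scored fraction / overall response rate."""
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    base_rate = sum(ranked) / len(ranked)
    curve = []
    for i in range(1, steps + 1):
        k = round(len(ranked) * i / steps)  # number of records in the top fraction
        rate = sum(ranked[:k]) / k
        curve.append((i / steps, rate / base_rate))
    return curve

# Hypothetical model scores and true outcomes (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
curve = lift_curve(scores, labels, steps=5)
```

A lift above 1 for a small top fraction means the model concentrates positive cases at the top of the ranking. At 100% of the records the lift is 1 by definition.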

The Deployment Phase: Knowledge Deployment is specific to objectives. Knowledge Presentation. Deployment within Scoring Engines and Integration with the current IT infrastructure: automated pre-processing of live data feeds; XML interfaces to 3rd party tools. Generation of a report. Online/Offline. Monitoring and evaluation of effectiveness. Process deployment/production. Produce final project report. Document everything along the way.