Www.isocat.org RELcat and friends 10 May /20111CLARIN-NL ISOcat workshop.

Slides:



Advertisements
Verwante presentaties
Advancing Dutch Through Intensive Instruction: Planning for Success with Backward Design and Social Media Language Matters Series Texas Language Center.
Advertisements

Defining a standard JSON-based exchange format for learning metadata Manon Haartsen.
Update on EduStandard: public-private platform in Dutch education Henk Nijstad, Kennisnet / november 2013.
Requirements -People are able to make their own memorial page, called a memori -The website will be build first in Dutch for extension.nl, then copied.
Een alternatief voorstel Naar aanleiding van bestudering van de IAASB voorstellen denkt de NBA na over een alternatief. Dit alternatief zal 26 september.
Deltion College Engels C1 Gesprekken voeren [Edu/002]/ subvaardigheid lezen thema: Order, order…. can-do : kan een bijeenkomst voorzitten © Anne Beeker.
Smart Style on the Semantic Web Lynda Hardman CWI, Multimedia and Human-Computer Interaction TU/e, Multimedia and Internet Technology.
Internet vriendschap Internet friendship
Teams on the frontline Geert Stroobant De Heide - Balans
Voorziening levensonderhoud Religieuze Instituten Paul Op Heij ‘s-Hertogenbosch, 25 september 2013 The future depends on what you do today.
Vaardig? Een spectrum aan vaardigheden! Van informatie- naar media- naar exploratievaardig? Of e-Research & e-learning literate? Collaboration literate??
Accessible Instructional Materials. § Discussion: Timely access to appropriate and accessible instructional materials is an inherent component.
Nieuwe wegen in ontwerpen met CAD
High quality internet for higher Education and Research 1 TF-LCPM: Exchanging new ideas New ideas within SURFnet Sharing with other NRENs
zaterdag 19 juli 2014 Saturday, 19 July 2014 I see what you don’t see I come from another galaxy My earthal life was not the intention I was meant.
Beyond Big Grid – Amsterdam 26 september 2012 Enquette 77 ingevulde enquettes, waarvan 60 met gebruikservaring = Mainly Computer Science.
PROJECTCOMPETENCE MANAGEMENT SCREENCompetenciesEdit1 DESCRIPTIONCompetencies in the “Competentie beheer” is a link to the editwizard for competencies.
SQL injections en meer... PERU. web application vulnerabilities Cross Site Scripting (21.5%) SQL Injection (14%) PHP includes (9.5%) Buffer overflows.
Woensdag 23 juli 2014 volgende vorige algemeen ziekenhuis Sint-Jozef Malle Dementia pathway: a condition specific approach Patrick De Wit, MD Thierry Laporta,
“Drawing your Mobility Map” (cf. A. Gohard-Radenkovic) Meertalige competencies & interculturele mediation Utrecht 2010 M-C. Kok Escalle.
In samenwerking met het Europees Sociaal Fonds en het Hefboomkrediet The role of APEL in career coaching and competence management Competence navigation.
ontwik idee - keling dag 3 goals today Develop “criteria” to help you evaluate & select your ideas Some tools from Tassouls book to help you do this.
Enterprise Application Integration Walter Moerkerken Ilona Wilmont Integratie Software Systemen 8 mei 2006.
De digitale coach Het verbeteren van een plan van aanpak Steven Nijhuis, coördinator projecten FNT Deze presentatie staat op:
De digitale coach Het verbeteren van een plan van aanpak Steven Nijhuis, coördinator projecten FNT Deze presentatie staat op:
zondag 3 augustus 2014 Click Klik Sunday, 03 August 2014.
Bedrijfsspecifieke extensies Standaard Rekeningschema
Vrije Universiteit amsterdamPostacademische Cursus Informatie Technologie Universal Modeling Language … why you need models? Models are necessary to communicate,
1 Linguistic Research And The CLARIN Infrastructure Jan Odijk Digital Humanities Lecture, Utrecht 23 Oct 2012.
Organizing Organization is the deployment of resources to achieve strategic goals. It is reflected in Division of labor into specific departments & jobs.
Motivation One secret for success in organizations is motivated and enthusiastic employees The challenge is to keep employee motivation consistent with.
Deltion College Engels B1 Gesprek voeren [Edu/001]
Deltion College Engels C1 Schrijven [Edu/002] thema: CV and letter of application can-do : kan complexe zakelijke teksten schrijven © Anne Beeker Alle.
Deltion College Engels B1 Gesprekken voeren [Edu/005] thema: applying for a job can-do : kan een eenvoudig sollicitatiegesprek voeren © Anne Beeker Alle.
Deltion College Engels B1 Gesprekken voeren [Edu/007] theme: Can I have my money back… can-do : kan minder routinematige situaties aan © Anne Beeker Alle.
Deltion College Engels C1 Gesprekken voeren [Edu/004]/ thema: There are lies, damned lies and statistics... can-do : kan complexe informatie en adviezen.
Deltion College Engels B2 Schrijven [Edu/004] thema: (No) skeleton in the cupboard can-do: kan een samenhangend verhaal schrijven © Anne Beeker Alle rechten.
Deltion College Engels C1 Luisteren [Edu/001] thema: It’s on tv can-do : kan zonder al te veel inspanning tv-programma’s begrijpen.
Deltion College Engels B2 Gesprekken voeren [Edu/006]/subvaardigheid schrijven notulen en kort voorstel thema: ‘What shall we do about non- active group.
Deltion College Engels B1 En Spreken/Presentaties [Edu/007] Thema: Soap(s) can-do : kan met enig detail verslag doen van ervaringen, in dit geval, rapporteren.
Deltion College Engels En Projectopdracht [Edu/001] thema: research without borders can-do/gesprekken voeren : 1. kan eenvoudige feitelijke informatie.
Deltion College Engels C1 Spreken/Presentaties [Edu/006] thema ‘I hope to convince you of… ‘ can-do : kan een standpunt uiteenzetten voor een publiek van.
Deltion College Engels B1 Schrijven [Edu/004]/ subvaardigheid lezen thema: reporting a theft can-do : kan formulieren waarin meer informatie gevraagd wordt,
Writing exercise This one goes into your language portfolio!!! You have until the end of the week to hand it in… (So you have a little longer than it says.
Telecommunicatie en Informatieverwerking UNIVERSITEIT GENT Didactisch materiaal bij de cursus Academiejaar
Telecommunicatie en Informatieverwerking UNIVERSITEIT GENT Didactisch materiaal bij de cursus Academiejaar
Woorden als or, and, but, when, because, so en since gebruiken we om twee zinsdelen te koppelen. Voorbeeld in het Nederlands: De dvd was erg duur maar.
DARE SUMMER SCHOOL Metadata Peter van Huisstede / Ursula Oberst 28 juni 2005.
All Right! 1 thv Unit 4 grammar 2.1 and 2.2.
Kenmerken van een persoonlijke brief
Rational Unified Process RUP Jef Bergsma. Iterations –Inception –Elaboration –Construction –Transition De kernbegrippen (Phases)
Combining pattern-based and machine learning methods to detect definitions for eLearning purposes Eline Westerhout & Paola Monachesi.
EML en IMS Learning Design
Sustainable employability in Tourism The human factor October 24, 2014 Where Europe Meets the Americas.
Het geheim van Linked Data Marcel ReuversGeonovum CB-NL 20 november 2014.
Deltion College Engels B1 Lezen [no. 001] can-do : 2 products compared.
Deltion College Engels B1 Gesprekken voeren [Edu/006] thema: Look, it says ‘No smoking’… can-do : kan minder routinematige zaken regelen © Anne Beeker.
Deltion College Engels B1 En Spreken/Presentaties [Edu/006] Thema: “The radio station“ can-do : kan een publiek toespreken, kan verzonnen gebeurtenissen.
Deltion College Engels C1 Schrijven [Edu/007] thema: Mind twister or how to write an essay… can-do : kan heldere, goed gestructureerde uiteenzetting schrijven.
Nothing Is As It Seems Lesson 7 What’s the Story?.
Deltion College Engels B2 Spreken [Edu/001] thema: What’s in the news? can-do : kan verslag doen van een gebeurtenis en daarbij meningen met argumenten.
Deltion College Engels B1 Spreken [Edu/001] thema: song texts can-do : kan een onderwerp dat mij interesseert op een redelijk vlotte manier beschrijven.
Deltion College Engels B1 Lezen [Edu/002] thema: But I ‘ve read it in… can-do : kan hoofdthema en belangrijkste argumenten begrijpen van eenvoudige teksten.
Mavo 4.  Goal(s)  Letter Puzzle  Write a letter  Check the letters  Do assignments 4A, 5A, 6A & 7 in Student Book page 50  Evaluation.
The Research Process: the first steps to start your reseach project. Graduation Preparation
Key Process Indicator Sonja de Bruin
SDI from a technological perspective: Architecture
IBM Software A vehicle manufacturer deploys business rules in one hour instead of a week IBM Operational Decision Manager software helps speed new business.
Chapter 1: Introduction
The student will be able to:
Transcript van de presentatie:

RELcat and friends 10 May /20111CLARIN-NL ISOcat workshop

10 May /2011CLARIN-NL ISOcat workshop2 Linguistic resources Data category registries Relation registries MPI DCR ISO DCR Typological Database System RRMPI RR MPI archive TDS databaseresource Vision

10 May /2011CLARIN-NL ISOcat workshop3 How can you use a Data Category Registry? • You can: – Find Data Categories relevant for your resources and embed references to them so the semantics of (parts of) your resources are made explicit • This can be supported by tools you use, e.g., ELAN, LEXUS and the CMDI Component Editor directly interact with ISOcat – Interact with Data Category owners to improve (the coverage of) their Data Categories – Create (together with others) new Data Categories and/or selections needed for your resources and share those – Submit (your) Data Categories for standardization – Free of charge – Grass roots approach 

10 May /2011CLARIN-NL ISOcat workshop4 Referencing Data Categories • Each Data Category should be uniquely identifiable – Ambiguity: different domains use the same term but mean different ‘things’ – Semantic rot: even in the same domain the meaning of a term changes over time – Persistence: for archived resources Data Category references should still be resolvable and point to the specification as it was at/close to time of creation • Persistent IDentifiers – ISO 24619:2011 Language resource management -- Persistent identification and access in language technology applications  ISOcat uses ‘cool URIs’ • (/grammaticalGender/) 

10 May /2011CLARIN-NL ISOcat workshop5 Where do you put these references? • Preferably in a schema • Or in the resource itself (redundant) • Or in the metadata of the resource (less specific)

What is a schema? • “comes from the Greek word "σχήμα" (skhēma), which means shape, or more generally, plan.” (wikipedia) • A collection of building blocks and rules on how to combine them into a valid resource – XML document: • DTD, XML Schema, Relax NG, … • easy; see – Text document: • A grammar – Extended Backus–Naur Form (EBNF) –... • how to embed Data Category PIDs? – … 10 May /2011CLARIN-NL ISOcat workshop6

XML resource 10 May /2011CLARIN-NL ISOcat workshop7 nihongo … … …

XML Relax NG schema 10 May /2011CLARIN-NL ISOcat workshop8 ipa …

CGN/DCOI grammar tag = pos '(' feat* ')‘ pos = 'N' | ' ADJ' | 'WW' | 'TW' | 'VNW' | 'LID' | 'VZ' | 'VG' | 'BW' | 'TSW' feat = 'NTYPE' | 'GETAL' | 'GRAAD | 'GENUS | 'NAAMVAL' | 'POSITIE' | 'BUIGING | 'GETAL-N' | 'WVORM | 'PVTIJD | 'PVAGR' | 'NUMTYPE' | 'VWTYPE' | 'PDTYPE' | 'PERSOON' | 'STATUS' | 'NPAGR' | 'LWTYPE' | 'VZTYPE’ | 'CONJTYPE' | 'SPECTYPE' NTYPE = 'soortnaam' | 'eigennaam' GETAL = 'enkelvoud' | 'meervoud' | 'getal' GRAAD = 'basis' | 'comparatief' | 'superlatief' | 'diminutief' GENUS = 'genus' | 'zijdig' | 'masculien' | 'feminien' | 'onzijdig' NAAMVAL = 'standaard' | 'nominatief' | 'oblique' | 'bijzonder' | 'genitief' | 'datief' POSITIE = 'prenominaal' | 'nominaal' | 'postnominaal 'vrij' BUIGING = 'zonder' | 'met-e' | 'met-s' GETAL-N = 'zonder-n' | 'meervoud-n' WVORM = 'persoonsvorm' | 'buigbaar' | 'innitief' | 'onvdw' | 'voltdw‘ PVTIJD = 'tegenwoordig' | 'verleden' | 'conjunctief' PVAGR = 'enkelvoud' | 'meervoud' | 'met-t' NUMTUPE = 'hoofdtelwoord' | 'rangtelwoord' VWTYPE = 'pr' | 'persoonlijk' | 'reexief' | 'reciprook' | 'bezittelijk' | 'vb' | 'vragend' | 'betrekkelijk' | 'exclamatief' | 'aanwijzend' | 'onbepaald' PDTYPE = 'pronomen' | 'adv-pronimen' | 'determiner' | 'gradeerbaar' PERSOON = 'persoon' | '1' | '2' | '2v' | '2b' | '3' | '3p' | '3' | '3v' | '3o' STATUS = 'vol' | 'gereduceerd' | 'nadruk' NPAGR = 'agr' | 'evon' | 'rest' | 'evz' | 'mv' | 'agr3' | 'evmo' | 'rest3' | 'evf' | 'mv' LWTYPE = 'bepaald' | 'onbepaald' VZTYPE = 'initieel' | 'versmolten' | 'naal' CONJTYPE = 'nevenschikkend' | 'onderschikkend' SPECTYPE = 'afgebroken' | 'onverstaanbaar' | 'vreemd' | 'deeleigen' | 'meta' | 'commentaar' | 'achtergrond' | 'afkorting' | 'symbool' | 'dialect' 10 May /2011CLARIN-NL ISOcat workshop9

CGN/DCOI grammar with DC references tag = pos '(' feat* ')' ‘WW’ ‘TW’ ‘VG’ ‘TSW’ pos = 'N' | ' ADJ' | 'WW' | 'TW' | 'VNW' | 'LID' | 'VZ' | 'VG' | 'BW' | 'TSW' feat = 'NTYPE' | 'GETAL' | 'GRAAD | 'GENUS | 'NAAMVAL' | 'POSITIE' | 'BUIGING | 'GETAL-N' | 'WVORM | 'PVTIJD | 'PVAGR' | 'NUMTYPE' | 'VWTYPE' | 'PDTYPE' | 'PERSOON' | 'STATUS' | 'NPAGR' | 'LWTYPE' | 'VZTYPE’ | 'CONJTYPE' | 'SPECTYPE' NTYPE = 'soortnaam' | 'eigennaam' GETAL = 'enkelvoud' | 'meervoud' | 'getal' GRAAD = 'basis' | 'comparatief' | 'superlatief' | 'diminutief' GENUS = 'genus' | 'zijdig' | 'masculien' | 'feminien' | 'onzijdig' NAAMVAL = 'standaard' | 'nominatief' | 'oblique' | 'bijzonder' | 'genitief' | 'datief' POSITIE = 'prenominaal' | 'nominaal' | 'postnominaal 'vrij' BUIGING = 'zonder' | 'met-e' | 'met-s' GETAL-N = 'zonder-n' | 'meervoud-n' WVORM = 'persoonsvorm' | 'buigbaar' | 'innitief' | 'onvdw' | 'voltdw‘ PVTIJD ‘verleden’ ‘conjunctie’ PVTIJD = 'tegenwoordig' | 'verleden' | 'conjunctief' PVAGR = 'enkelvoud' | 'meervoud' | 'met-t' NUMTUPE = 'hoofdtelwoord' | 'rangtelwoord' VWTYPE = 'pr' | 'persoonlijk' | 'reexief' | 'reciprook' | 'bezittelijk' | 'vb' | 'vragend' | 'betrekkelijk' | 'exclamatief' | 'aanwijzend' | 'onbepaald' PDTYPE = 'pronomen' | 'adv-pronimen' | 'determiner' | 'gradeerbaar' PERSOON = 'persoon' | '1' | '2' | '2v' | '2b' | '3' | '3p' | '3' | '3v' | '3o' STATUS = 'vol' | 'gereduceerd' | 'nadruk' NPAGR = 'agr' | 'evon' | 'rest' | 'evz' | 'mv' | 'agr3' | 'evmo' | 'rest3' | 'evf' | 'mv' LWTYPE = 'bepaald' | 'onbepaald' VZTYPE = 'initieel' | 'versmolten' | 'naal' CONJTYPE = 'nevenschikkend' | 'onderschikkend' SPECTYPE = 'afgebroken' | 'onverstaanbaar' | 'vreemd' | 'deeleigen' | 'meta' | 'commentaar' | 'achtergrond' | 'afkorting' | 'symbool' | 'dialect' 10 May /2011CLARIN-NL ISOcat workshop10 Or use: • an XML representation for the schema • an XML schema for an XML representation of the content

10 May /2011CLARIN-NL ISOcat workshop11 Linguistic resources Data category registries Relation registries MPI DCR ISO DCR Typological Database System RRMPI RR MPI archive TDS databaseresource Vision 

Multiple DCRs? • Actually we don’t need multiple DCRs to have overlapping subsets – Overlaps are created due to • Data categories are typed, and might not have the type you need • External sets are imported just as they are – NKJP, GOLD, STTS, MDF, … – Only some take the effort to also provide mappings • There might be very fine differences between your data category and an existing one, and the owner doesn’t want to adapt • Still we would like to know that these data categories are the same or almost the same 10 May /2011CLARIN-NL ISOcat workshop12

10 May /2011CLARIN-NL ISOcat workshop13 Which data category types? • TDGs give DC types based on some reference model: – Metadata: CMDI – Morphosyntax: LMF – Terminology: TBX – POS field (closed DC) of the lexical entry “walk” gets the value ‘verb’ (simple DC) – PoS = ‘verb’ • If the DC type doesn’t fit your needs: – Verb (open DC) feature of a feature structure gets the value “walk” – Verb = ‘walk’ – Unfortunately the DCR data model hasn’t facilities yet to share a semantic core between various types – Create your own and add a sameAs relationship to RELcat

10 May /2011CLARIN-NL ISOcat workshop14 Relation Registry - RELcat • The Relation Registry is basically a triple store: – Subject: resource 1 – Predicate: relationship – Object: resource 2 • Example: # /first plural exclusive vernacular/ is-a /vernacular/ isocat:DC-1234 rel:subClassOf isocat:DC-4. # /first person/ part-of /first plural exclusive vernacular/ isocat:DC-1 rel:partOf isocat:DC # /plural/ part-of /first plural exclusive vernacular/ isocat:DC-2 rel:partOf isocat:DC # /exclusive/ part-of /first plural exclusive vernacular/ isocat:DC-3 rel:partOf isocat:DC-1234.

10 May /2011CLARIN-NL ISOcat workshop15 Relation type taxonomy • rel:related – rel:sameAs (symmetric) – rel:almostSameAs (symmetric) – rel:narrower (inverse of rel:broader) • rel:superClassOf (inverse of rel:subClassOf) – rel:broader (inverse of rel:narrower) • rel:subClassOf (inverse of rel:superClassOf) • rel:partOf – rel:directPartOf – rel:indirectPartOf Extensible taxonomy inspired by OWL and SKOS, which other relation types might be needed or other standard sets should be included? ! To cover n-ary relationships more advanced RDF constructs are probably needed.

n-ary relationships 10 May /2011CLARIN-NL ISOcat workshop16 blank V-1F-1 rel:hasValuerel:hasFeature T-1 rel:hasPart blank V-nF-n rel:hasValuerel:hasFeature rel:hasPart …

RELcat status • A first alpha version is running on the ISOcat test server – Only a web services interface, i.e., no user interface yet – Import of sets relations done by the system administrator – Initial sets are imported to support the CMDI semantic search • • – =dc:language =dc:language • For now your relations can be send to the system administrator in a simple spreadsheet with three columns – Subject Data Category – Relationship (see previous slide or suggest a new one) – Object Data Category 10 May /2011CLARIN-NL ISOcat workshop17

A new kitten: SCHEMAcat • Resource schema’s should be stored somewhere persistently – Get a PID, i.e., a handle • These schema’s can/should be annotated with data categories – SCHEMAcat  ISOcat • These data categories will have (typed) relationships among eachother – SCHEMAcat  RELcat 10 May /2011CLARIN-NL ISOcat workshop18

Schema types • SCHEMAcat should support any schema format – XML: • XML Schema • Relax NG • DTD • … – Text (how?): • EBNF • … – … 10 May /2011CLARIN-NL ISOcat workshop19

XML Relax NG schema 10 May /2011CLARIN-NL ISOcat workshop20 ipa … 

CGN/DCOI grammar with DC references tag = pos '(' feat* ')' ‘WW’ ‘TW’ ‘VG’ ‘TSW’ pos = 'N' | ' ADJ' | 'WW' | 'TW' | 'VNW' | 'LID' | 'VZ' | 'VG' | 'BW' | 'TSW' feat = 'NTYPE' | 'GETAL' | 'GRAAD | 'GENUS | 'NAAMVAL' | 'POSITIE' | 'BUIGING | 'GETAL-N' | 'WVORM | 'PVTIJD | 'PVAGR' | 'NUMTYPE' | 'VWTYPE' | 'PDTYPE' | 'PERSOON' | 'STATUS' | 'NPAGR' | 'LWTYPE' | 'VZTYPE’ | 'CONJTYPE' | 'SPECTYPE' NTYPE = 'soortnaam' | 'eigennaam' GETAL = 'enkelvoud' | 'meervoud' | 'getal' GRAAD = 'basis' | 'comparatief' | 'superlatief' | 'diminutief' GENUS = 'genus' | 'zijdig' | 'masculien' | 'feminien' | 'onzijdig' NAAMVAL = 'standaard' | 'nominatief' | 'oblique' | 'bijzonder' | 'genitief' | 'datief' POSITIE = 'prenominaal' | 'nominaal' | 'postnominaal 'vrij' BUIGING = 'zonder' | 'met-e' | 'met-s' GETAL-N = 'zonder-n' | 'meervoud-n' WVORM = 'persoonsvorm' | 'buigbaar' | 'innitief' | 'onvdw' | 'voltdw‘ PVTIJD ‘verleden’ ‘conjunctie’ PVTIJD = 'tegenwoordig' | 'verleden' | 'conjunctief' PVAGR = 'enkelvoud' | 'meervoud' | 'met-t' NUMTUPE = 'hoofdtelwoord' | 'rangtelwoord' VWTYPE = 'pr' | 'persoonlijk' | 'reexief' | 'reciprook' | 'bezittelijk' | 'vb' | 'vragend' | 'betrekkelijk' | 'exclamatief' | 'aanwijzend' | 'onbepaald' PDTYPE = 'pronomen' | 'adv-pronimen' | 'determiner' | 'gradeerbaar' PERSOON = 'persoon' | '1' | '2' | '2v' | '2b' | '3' | '3p' | '3' | '3v' | '3o' STATUS = 'vol' | 'gereduceerd' | 'nadruk' NPAGR = 'agr' | 'evon' | 'rest' | 'evz' | 'mv' | 'agr3' | 'evmo' | 'rest3' | 'evf' | 'mv' LWTYPE = 'bepaald' | 'onbepaald' VZTYPE = 'initieel' | 'versmolten' | 'naal' CONJTYPE = 'nevenschikkend' | 'onderschikkend' SPECTYPE = 'afgebroken' | 'onverstaanbaar' | 'vreemd' | 'deeleigen' | 'meta' | 'commentaar' | 'achtergrond' | 'afkorting' | 'symbool' | 'dialect' 10 May /2011CLARIN-NL ISOcat workshop21 

SCHEMAcat status • In preparation … 10 May /2011CLARIN-NL ISOcat workshop22

10 May /2011CLARIN-NL ISOcat workshop23 Data Category Registry - ISOcat Linguistic knowledge baseLinguistic resource (schema) Data categories Containers Concepts Concept Registry Relation Relation Registry - RELcat A whole litter! Schema Registry - SCHEMAcat

ISOcat status • Standardization is slowly taking off – The Metadata TDG is starting next week – Terminology is planning the same – Their workflow and progress will be reported on the TC37 plenary and will hopefully spark some action of the other TDGs • ISOcat keeps improving in general, although development effort should be moving more towards RELcat and SCHEMAcat … 10 May /2011CLARIN-NL ISOcat workshop24