GSTPGO1 Research in Education
Session 6 – science ("bèta") group
Agenda: interim evaluation – motivation essays – CIMO – assessment of teaching
Interim evaluation

Good / Continue:
- Insight into research results and methods, and into their usability for your own teaching
- Clear structure of the assignments
- Discussion and exchange

Not good / Stop:
- Too much reading
- Terrible articles
- Effort-to-benefit ratio

Suggestions:
- The format is getting a bit boring
- Research on the teacher (classroom management, personal qualities)
- How do you set up a study yourself (finding literature, choosing methods)?
Motivation
CIMO
Does everything follow logically from C-I-M-O? For example:
- Are the determining context factors named?
- Is the outcome sufficiently proximal?
- Mechanism: is it plausible on the basis of the research you have read?
- Are C, I, M and O sufficiently operationalised to evaluate them (which observable indicators)?
Assessment of teaching
Identify Purpose of Assessment
- Diagnosis: to clarify the type and extent of learners' learning difficulties in light of well-established criteria, for intervention.
- Screening: to identify learners who differ significantly from their peers, for further assessment.
- Qualification: to decide whether learners are sufficiently qualified for a job, course or role in life (that is, whether they are equipped to succeed in it) and whether to enroll them or to appoint them to it.
- Licensing: to provide legal evidence (the license) of minimum competence to practice a specialist activity, to warrant stakeholder trust in the practitioner.
- Programme evaluation: to evaluate the success of educational programmes or initiatives, nationally or locally.
- Comparability: to guide decisions on comparability of examination standards for later assessments on the basis of cohort performance in earlier ones.

Therefore, the first step in evaluating your assessment practices is to identify the purpose(s) of each assessment practice.
FCI – diagnostic instrument (mechanics)

Example item: "Along which of the paths below will the hockey puck move after receiving the 'kick'?"

"Measuring" misconceptions:
- the misconceptions must exist (theory);
- items must "respond" to a specific misconception;
- several items per concept are needed;
- the relevant outcome is a subscore per concept.

Answer options of a second example item (about forces between students "a" and "b"):
- In this situation, neither student exerts a force on the other.
- Student "a" exerts a force on "b", but "b" doesn't exert any force on "a".
- Each student exerts a force on the other, but "b" exerts the larger force.
- Each student exerts a force on the other, but "a" exerts the larger force.
- Each student exerts the same amount of force on the other.
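The point that a diagnostic instrument's relevant outcome is a subscore per concept, not a total score, can be sketched in a few lines. The item-to-misconception mapping and the responses below are made up for illustration; they are not the actual FCI taxonomy.

```python
from collections import defaultdict

# Hypothetical mapping of item numbers to the misconception they probe
# (illustrative only -- not the real FCI item taxonomy).
item_concept = {1: "impetus", 2: "impetus", 3: "active force",
                4: "active force", 5: "action/reaction"}

# 1 = answered correctly, 0 = chose a misconception distractor
responses = {1: 1, 2: 0, 3: 1, 4: 1, 5: 0}

def concept_subscores(item_concept, responses):
    """Return the fraction correct per concept instead of one total score."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, concept in item_concept.items():
        total[concept] += 1
        correct[concept] += responses[item]
    return {c: correct[c] / total[c] for c in total}

print(concept_subscores(item_concept, responses))
# {'impetus': 0.5, 'active force': 1.0, 'action/reaction': 0.0}
```

A profile like this points to which misconception to target in teaching, which is exactly why several items per concept are needed: one item per concept would make each subscore all-or-nothing.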
CE, SE – qualification, but also…

Good schools show small differences between the two exam types year after year, and more often relatively high grades for both. Schools that add little value show much larger differences between the two exam types. The largest gap occurs at vwo schools: on average, a vwo department grades the school exam (SE) 0.57 points higher than the central written exam (CE); at havo the gap is only 0.27. Subjects such as German, Dutch and social studies stand out, with school-exam grades on average 0.7 points above the central exam; the gaps are smallest (less than 0.2 points) for biology, chemistry and mathematics. For almost every subject there are vwo schools with a gap of more than two points. According to Dronkers, a large and persistent gap is a good indicator of low quality in a subject, or even in a whole school. Remarkably, the public debate pays little attention to the CORRELATION between SE and CE grades. Source:
Correlation SE–CE (for one school)
- Clear correlation for mathematics
- No correlation for Dutch
Rekentoets (national arithmetic test): qualification and (programme?) evaluation
"Through the meddling of the Freudenthal Institute, arithmetic has been replaced by little riddles, often badly formulated: a disaster for mathematics, where everything possible is done precisely to rule out ambiguity. In other fields, mechanisms are replaced by stories, facts by opinions." (Vincent Icke, NRC, March 2015)
PISA: evaluating national educational systems
Threats to validity & societal acceptance
- Framework (relevancy, acceptance, curriculum coverage)
- Items (implementation of the framework)
- Coding of items (scoring, categorisation)
- Cultural bias / translation issues
- Participants (sampling, effort put in)
- Analysis (multilevel, item response, ….)
(Van Halen)
Is PISA fundamentally flawed?

But what if there are "serious problems" with the PISA data? What if the statistical techniques used to compile it are "utterly wrong" and based on a "profound conceptual error"? Suppose the whole idea of being able to accurately rank such diverse education systems is "meaningless", "madness"? What if you learned that PISA's comparisons are not based on a common test, but on different students answering different questions? And what if switching these questions around leads to huge variations in the all-important PISA rankings, with the UK finishing anywhere between 14th and 30th and Denmark between fifth and 37th? What if these rankings, on which so many reputations and billions of pounds depend, and which have so much impact on students and teachers around the world, are in fact "useless"? (Published in TES magazine on 26 July 2013.)
Item response curves: threshold vs. discrimination
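Threshold and discrimination are the two parameters of a two-parameter logistic (2PL) item response curve: the threshold (difficulty) b fixes where the curve crosses 50%, and the discrimination a fixes how steeply it rises there. A minimal sketch (parameter values are illustrative):

```python
import math

def irc(theta, b, a=1.0):
    """2PL item response curve: probability of a correct answer for a
    student of ability theta, given item threshold (difficulty) b and
    discrimination a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability is 0.5 by construction
print(irc(0.0, b=0.0))           # 0.5

# A higher discrimination makes the curve steeper around the threshold,
# so the item separates students near that ability level more sharply.
print(irc(0.5, b=0.0, a=0.5))    # gentle slope: a bit above 0.5
print(irc(0.5, b=0.0, a=3.0))    # steep slope: well above 0.5
```

A high-threshold item is hard for everyone; a low-discrimination item barely distinguishes weak from strong students, which is why both parameters matter when judging what an item contributes to a scale.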
Incomplete block design

The items were presented to students in thirteen standard test booklets, each composed of four clusters, hence two hours of test time. Clusters labelled PM1, PM2, PM3, PM4, PM5, PM6A and PM7A denote the seven paper-based standard mathematics clusters, PR1 to PR3 the paper-based reading clusters, and PS1 to PS3 the paper-based science clusters. PM1, PM2 and PM3 were the same three mathematics clusters as those administered in 2009; the remaining clusters comprised new material. Two of the three reading clusters were reused intact; the remaining reading cluster was based on an earlier cluster, but with one unit substituted. The substitution was made after the 2010 oil spill in the Gulf of Mexico rendered a unit about the idyllic nature of the Gulf unusable. The three science clusters were intact clusters used in PISA 2009.

The cluster rotation design for the standard booklets in the Main Survey corresponds to designs used in previous PISA surveys and is shown in Figure 2.1. This is a balanced incomplete block design: each cluster (and therefore each test item) appears in four of the four-cluster test booklets, once in each of the four possible positions within a booklet, and each pair of clusters appears in one (and only one) booklet. An additional feature of the PISA 2012 test design is that one booklet (booklet 12) is a complete link, being identical to a booklet administered in an earlier PISA survey. Each sampled student was randomly assigned to one of the thirteen booklets administered in each country, which meant each student undertook two hours of testing.
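The combinatorial structure described here (13 booklets of 4 clusters, each cluster in exactly 4 booklets, each pair of clusters together in exactly one booklet) is a 2-(13, 4, 1) balanced incomplete block design. One classical way to generate a design with these parameters, sketched below, uses the perfect difference set {0, 1, 3, 9} mod 13; this reproduces the structure, not PISA's actual cluster-to-booklet assignment or its position balancing.

```python
from collections import Counter
from itertools import combinations

# Perfect difference set modulo 13; cyclically shifting it
# generates a 2-(13, 4, 1) design: 13 booklets of 4 clusters.
base = (0, 1, 3, 9)
booklets = [tuple(sorted((x + i) % 13 for x in base)) for i in range(13)]

# Each of the 13 clusters appears in exactly 4 booklets...
appearances = Counter(c for b in booklets for c in b)
assert all(n == 4 for n in appearances.values())

# ...and each of the 78 cluster pairs shares exactly one booklet.
pairs = Counter(p for b in booklets for p in combinations(b, 2))
assert len(pairs) == 78 and all(n == 1 for n in pairs.values())

print(booklets[0])  # (0, 1, 3, 9)
```

The pair condition is what makes the design "balanced": every cluster is calibrated against every other cluster equally often, even though no student sees more than four of the thirteen clusters.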
Non-significance explicitly reported in PISA
But do the items measure what they set out to measure?
The claim put forward here is that the survey test situation in point of fact has important problem- setting requirement characteristics that modify the problem determination and possible solution demarcation decisively; and that the claim of the PISA studies to assess ‘knowledge and skills for life’ is invalidated by the failure to take this fact into account. Dohn, N. B. (2007). Knowledge and skills for PISA—Assessing the assessment. Journal of Philosophy of Education, 41(1), 1-16.
Jessica on chocolate