Mario F. Triola, Essentials of Statistics, 3rd Edition
Today's program 1st hour –Welcome and introductions –Organization and structure of the course 2nd hour –Why statistics? –Preview of the material in chapters 1, 2 and 3
1. Welcome and introductions Lecturer and assistant Mix of 2nd-year students and bridging-programme students Attendance list Housekeeping announcement: decree from the Executive Board (CvB)
2. Organization and structure (1) Tutorial groups - attendance list Course website –Introduction –Literature –Assessment and deadlines –Links –Practice exam –Schedule
Course website:
Book Literature: Mario Triola: Essentials of Statistics, 3rd edition, Addison-Wesley Higher Education, 2007
2. Organization and structure (2) Course website (continued) –Rules! –Exercise schedule –Exam material Assignments week 2: chapters 1, 2 and 3 Book: copy of the material for chapters 1, 2 and 3
Organization No lectures: –question hour based on submitted questions –a great deal of practice material Mandatory tutorials: –Working through exercises is essential and therefore mandatory. –Always hand in your worked solutions to the assigned ‘exercises’ in duplicate before the tutorial. –Tutorial groups & supervision: group 1: Wednesday 1, group 2: Wednesday 2, group 3: Friday –Computer lab?
3. Why statistics? Reading and writing articles in the field of IK –Example: MIS Quarterly article Reading and writing in everyday life –Example: table from a neighbourhood action committee Basic prerequisite: logical thinking and reasoning –Example: the Monty Hall problem –Example: doping use
Table (1) MIS Quarterly article
Table (2) MIS Quarterly article
Table: neighbourhood committee
Intuition is hard Quiz: grand prize You may choose from 3 doors You choose a door … What is your chance of winning the grand prize? 1/3
But … Suppose the quizmaster, AFTER YOUR CHOICE, opens one of the two remaining doors and shows that there is nothing behind it. You may now still switch doors. Do you? Yes!! Because this increases your chance!!!
Analysis Suppose the grand prize is behind door 1: 1. You chose door 1 (car). The quizmaster opens another door with nothing behind it. Switching loses… 2. You chose door 2 (empty). The quizmaster opens door 3, which has nothing behind it. Switching wins the grand prize! 3. You chose door 3 (empty). The quizmaster opens door 2, which has nothing behind it. Switching wins the grand prize! Switching wins in 2 of the 3 equally likely cases, so the chance of winning after switching is 2/3.
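The three-case analysis above can also be checked by simulation. The following is a minimal Python sketch; the function names `play` and `win_rate`, the number of trials, and the random seed are illustrative assumptions, not part of the course material.

```python
import random

def play(switch, rng):
    """Play one round of the three-door quiz; return True if the grand prize is won."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)    # door hiding the grand prize
    choice = rng.choice(doors)   # contestant's first choice
    # The quizmaster opens an empty door that is not the contestant's choice.
    opened = rng.choice([d for d in doors if d != choice and d != prize])
    if switch:
        # Switch to the one door that is neither chosen nor opened.
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning by repeating the game many times."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

With enough trials, the estimate for staying should settle near 1/3 and the estimate for switching near 2/3.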
Break
Triola, chapter 1 Important definitions for use in statistics
Section 1.1 Important definitions Data Statistics Population Census Sample
Definition Statistics: a collection of methods for - planning studies and experiments, - obtaining data, - and then organizing, summarizing, presenting, analyzing, interpreting, - and drawing conclusions based on the data
Chapter Key Concepts Sample data must be collected in an appropriate way, such as through a process of random selection. If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.
Section 1.2 Types of data Definitions: –Population parameter versus sample statistic –Quantitative versus qualitative data –Discrete versus continuous data –Levels of measurement: nominal, ordinal, interval, ratio
Levels of Measurement 1. Nominal - categories only 2. Ordinal - categories with some order 3. Interval - differences but no natural starting point 4. Ratio - differences and a natural starting point
Section 1.3 Critical thinking Abuse, inexpert use, and incorrect use of statistics
Misuse # 1- Bad Samples Voluntary response sample (or self-selected sample) - one in which the respondents themselves decide whether to be included. In this case, valid conclusions can be made only about the specific group of people who agree to participate.
Misuse # 2- Small Samples Conclusions should not be based on samples that are far too small. Example: Basing a school suspension rate on a sample of only three students
Misuse # 3- Graphs To correctly interpret a graph, you must analyze the numerical information given in the graph, so as not to be misled by the graph’s shape.
Misuse # 4- Pictographs Part (b) is designed to exaggerate the difference by increasing each dimension in proportion to the actual amounts of oil consumption.
Misuse # 5- Percentages Misleading or unclear percentages are sometimes used. For example, if you take 100% of a quantity, you take it all. 110% of an effort does not make sense.
Other Misuses of Statistics –Loaded Questions –Order of Questions –Refusals –Correlation & Causality –Self Interest Study –Precise Numbers –Partial Pictures –Deliberate Distortions
Section 1.4 Design of experiments Types of studies –Observational –Experimental –Retrospective –Prospective (longitudinal, cohort)
Definition Confounding occurs in an experiment when the experimenter is not able to distinguish between the effects of different factors.
Controlling Effects of Variables –Blinding: subject does not know whether he or she is receiving a treatment or a placebo –Blocks: groups of subjects with similar characteristics –Completely Randomized Experimental Design: subjects are put into blocks through a process of random selection –Rigorously Controlled Design: subjects are very carefully chosen
Samples
Definitions Random Sample: members of the population are selected in such a way that each individual member has an equal chance of being selected. Simple Random Sample (of size n): subjects selected in such a way that every possible sample of the same size n has the same chance of being chosen.
Methods of Sampling –Random –Systematic –Convenience –Stratified –Cluster
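Four of the sampling methods listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the population of member IDs, and the grouping into strata and clusters are all hypothetical. Convenience sampling is omitted because it simply takes whatever data are easiest to obtain.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible sample of size n has the same chance of being chosen.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Pick a random starting point among the first k members,
    # then select every k-th member after it.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping a stratum name to the list of its members.
    # Draw a simple random sample from each stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # clusters: list of lists. Randomly select whole clusters
    # and keep all members of the selected clusters.
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

if __name__ == "__main__":
    rng = random.Random(1)
    population = list(range(1, 101))  # hypothetical member IDs 1..100
    strata = {"low": population[:50], "high": population[50:]}
    clusters = [population[i:i + 10] for i in range(0, 100, 10)]
    print(simple_random_sample(population, 5, rng))
    print(systematic_sample(population, 20, rng))
    print(stratified_sample(strata, 3, rng))
    print(cluster_sample(clusters, 2, rng))
```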
Saunders, chapter 6
Triola, chapter 2 Statistics for summarizing and displaying data
Section 2.1 Overview Important Characteristics of Data (CVDOT) 1. Center: A representative or average value that indicates where the middle of the data set is located. 2. Variation: A measure of the amount that the values vary among themselves. 3. Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed). 4. Outliers: Sample values that lie very far away from the vast majority of other sample values. 5. Time: Changing characteristics of the data over time.
Section 2.2 Frequency distributions Plain (straight) tally of values in a table Grouping values into categories (classes)
Frequency Distribution: Ages of Best Actresses (original data and frequency distribution)
Related definitions Lower class limits Upper class limits Class boundaries Class midpoints Class width Relative frequencies Cumulative frequencies (cumulative percentages)
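To make these definitions concrete, the sketch below builds a frequency table in Python. It assumes whole-number data (so class boundaries lie 0.5 beyond the class limits); the function name `frequency_table` and the sample ages are hypothetical and are not the data from the book.

```python
def frequency_table(values, first_lower_limit, class_width, num_classes):
    """Build a frequency table for whole-number data."""
    rows = []
    n = len(values)
    cumulative = 0
    for i in range(num_classes):
        lower = first_lower_limit + i * class_width  # lower class limit
        upper = lower + class_width - 1              # upper class limit
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # class boundaries
            "midpoint": (lower + upper) / 2,           # class midpoint
            "frequency": freq,
            "relative": freq / n,                      # relative frequency
            "cumulative": cumulative,                  # cumulative frequency
        })
    return rows

if __name__ == "__main__":
    ages = [22, 25, 31, 34, 38, 41, 45, 29, 33, 52]  # hypothetical data
    for row in frequency_table(ages, first_lower_limit=20, class_width=10, num_classes=4):
        print(row)
```

Note that the class width is the difference between two consecutive lower class limits (here 10), not the difference between the upper and lower limit of one class.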
Frequency Tables
Section 2.3 Histograms Graphical representation of distributions
Histogram A bar graph in which the horizontal scale represents the classes of data values and the vertical scale represents the frequencies
Relative Frequency Histogram Has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies
Critical Thinking: Interpreting Histograms One key characteristic of a normal distribution is that it has a “bell” shape. The histogram below illustrates this.
Section 2.4 Statistical graphics Other forms of visual display –Polygon –Ogive –Dot plot –Stemplot –Pareto chart –Pie chart –Scatter plot –Time series
Ogive A line graph that depicts cumulative frequencies (see figure 2-6, page 58)
Dot Plot Consists of a graph in which each data value is plotted as a point (or dot) along a scale of values
Other Graphs
Triola, chapter 3 Statistics for describing, exploring and comparing data
Section 3.1 Overview Descriptive Statistics –summarize or describe the important characteristics of a known set of data Inferential Statistics –use sample data to make inferences (or generalizations) about a population
Section 3.2 Measures of center Mean –Of a sample and of a population (mu) Median (x-tilde) Mode Midrange Weighted mean
Notation $\bar{x}$ is pronounced ‘x-bar’ and denotes the mean of a set of sample values: $\bar{x} = \frac{\sum x}{n}$ µ is pronounced ‘mu’ and denotes the mean of all values in a population: $\mu = \frac{\sum x}{N}$
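The measures of center from Section 3.2 can be computed with Python's standard `statistics` module, as sketched below. The helper names `midrange` and `weighted_mean` and the sample values are illustrative assumptions.

```python
import statistics

def midrange(values):
    # Midrange: the value halfway between the maximum and the minimum.
    return (max(values) + min(values)) / 2

def weighted_mean(values, weights):
    # Weighted mean: sum of (weight * value) divided by the sum of the weights.
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

if __name__ == "__main__":
    data = [2, 3, 3, 5, 7]  # hypothetical sample values
    print("mean:    ", statistics.mean(data))
    print("median:  ", statistics.median(data))
    print("mode(s): ", statistics.multimode(data))
    print("midrange:", midrange(data))
    # Hypothetical grades 80 and 90 with weights 1 and 3.
    print("weighted:", weighted_mean([80, 90], [1, 3]))
```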
Round-off Rule for Measures of Center Carry one more decimal place than is present in the original set of values.
Mean from a Frequency Distribution Use the class midpoints as the values of the variable x.
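When only a frequency distribution is available, the mean is usually approximated as $\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$, where x is the class midpoint and f the class frequency. A minimal Python sketch follows; the function name and the class data are hypothetical.

```python
def mean_from_frequency_distribution(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    total_f = sum(f for _, _, f in classes)
    # Each class is represented by its midpoint (lower + upper) / 2.
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

if __name__ == "__main__":
    classes = [(20, 29, 3), (30, 39, 4), (40, 49, 2), (50, 59, 1)]  # hypothetical
    print(mean_from_frequency_distribution(classes))
```

The result is an approximation, because every value in a class is treated as if it were equal to the class midpoint.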
Best Measure of Center
Skewness
Section 3.3 Measures of variation Range Standard deviation –sample and population (sigma) Variance Coefficient of variation (CV)
Key Concept Because this section introduces the concept of variation, which is something so important in statistics, this is one of the most important sections in the entire book. Place a high priority on how to interpret values of standard deviation.
Definition The standard deviation of a set of sample values is a measure of variation of values about the mean.
Sample Standard Deviation Formula $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Rationale for using n-1 versus n The end of Section 3-3 has a detailed explanation of why n – 1 rather than n is used. The student should study it carefully.
Standard Deviation - Important Properties The standard deviation is a measure of variation of all values from the mean. The value of the standard deviation s is usually positive. The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others). The units of the standard deviation s are the same as the units of the original data values.
Population Standard Deviation $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ This formula is similar to the previous formula, but instead, the population mean and population size are used.
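The difference between the sample formula (divide by n - 1) and the population formula (divide by N) is easiest to see side by side. The sketch below implements both directly from the definitions; the function names and the three data values are illustrative assumptions.

```python
import math

def sample_std(values):
    # s = sqrt( sum (x - x_bar)^2 / (n - 1) )
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

def population_std(values):
    # sigma = sqrt( sum (x - mu)^2 / N )
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)

if __name__ == "__main__":
    data = [1, 3, 14]  # hypothetical values; mean is 6
    print("sample s:        ", sample_std(data))
    print("population sigma:", population_std(data))
```

For the same data, the sample standard deviation is always at least as large as the population standard deviation, because it divides by the smaller number n - 1.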
Variance - Notation Variance is the standard deviation squared. Sample variance: $s^2$ Population variance: $\sigma^2$
Estimation of Standard Deviation: Range Rule of Thumb For estimating a value of the standard deviation s, use $s \approx \frac{\text{range}}{4}$, where range = (maximum value) – (minimum value)
Estimation of Standard Deviation: Range Rule of Thumb For interpreting a known value of the standard deviation s, find rough estimates of the minimum and maximum “usual” sample values by using: Minimum “usual” value = (mean) – 2 × (standard deviation) Maximum “usual” value = (mean) + 2 × (standard deviation)
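Both uses of the range rule of thumb amount to one line of arithmetic each. A small Python sketch, with hypothetical function names and example numbers:

```python
def estimate_std_from_range(values):
    # Rough estimate: s is approximately range / 4.
    return (max(values) - min(values)) / 4

def usual_value_limits(mean, std):
    # Rough limits for "usual" values: mean - 2*s and mean + 2*s.
    return (mean - 2 * std, mean + 2 * std)

if __name__ == "__main__":
    print(estimate_std_from_range([1, 3, 14]))   # range is 13
    print(usual_value_limits(mean=100, std=15))  # hypothetical mean and s
```

The estimate is crude: it uses only the two most extreme values, so it should be read as a quick plausibility check rather than a replacement for the formula.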
The Empirical Rule
Definition The coefficient of variation (or CV) for a set of sample or population data, expressed as a percent, describes the standard deviation relative to the mean. Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$ Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
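Because the CV has no units, it can be used to compare the variation of data sets measured on different scales. A minimal sketch of the sample version, using Python's `statistics` module; the function name and the data are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    # Sample CV in percent: (s / x_bar) * 100.
    return statistics.stdev(values) / statistics.mean(values) * 100

if __name__ == "__main__":
    heights_cm = [170, 175, 180, 165, 185]  # hypothetical heights
    weights_kg = [60, 75, 90, 55, 100]      # hypothetical weights
    print("CV heights:", round(coefficient_of_variation(heights_cm), 1), "%")
    print("CV weights:", round(coefficient_of_variation(weights_kg), 1), "%")
```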
Section 3.4 Measures of relative standing z scores Quartiles Percentiles
Key Concept This section introduces measures that can be used to compare values from different data sets, or to compare values within the same data set. The most important of these is the concept of the z score.
Definition z Score (or standardized value): the number of standard deviations that a given value x is above or below the mean
Measures of Position: z score Sample: $z = \frac{x - \bar{x}}{s}$ Population: $z = \frac{x - \mu}{\sigma}$ Round z to 2 decimal places
Interpreting z Scores Whenever a value is less than the mean, its corresponding z score is negative. Ordinary values: z score between –2 and 2. Unusual values: z score < –2 or z score > 2.
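The z score calculation and the "unusual value" criterion can be sketched as follows. The function names and the example mean and standard deviation are assumptions chosen for illustration.

```python
def z_score(x, mean, std):
    # Number of standard deviations that x lies above (+) or below (-) the mean,
    # rounded to 2 decimal places.
    return round((x - mean) / std, 2)

def is_unusual(z):
    # Unusual values have a z score below -2 or above 2.
    return z < -2 or z > 2

if __name__ == "__main__":
    mean, std = 100, 15  # hypothetical mean and standard deviation
    for x in (60, 95, 130, 140):
        z = z_score(x, mean, std)
        print(x, z, "unusual" if is_unusual(z) else "ordinary")
```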
Quartiles Q1, Q2, Q3 divide ranked scores into four equal parts (25% each): (minimum) Q1 Q2 (median) Q3 (maximum)
Percentiles Just as there are three quartiles separating data into four parts, there are 99 percentiles denoted P1, P2, ..., P99, which partition the data into 100 groups.
Section 3.5 Exploratory data analysis (EDA) Outliers Boxplot
Important Principles An outlier can have a dramatic effect on the mean. An outlier can have a dramatic effect on the standard deviation. An outlier can have a dramatic effect on the scale of the histogram so that the true nature of the distribution is totally obscured.
Definitions For a set of data, the 5-number summary consists of the minimum value; the first quartile, Q1; the median (or second quartile, Q2); the third quartile, Q3; and the maximum value. A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.
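A 5-number summary can be sketched with Python's `statistics.quantiles`. Note the assumption: several conventions exist for locating quartiles, and the default method of `statistics.quantiles` may give slightly different Q1 and Q3 values than the procedure described in the book. The function name and the data are hypothetical.

```python
import statistics

def five_number_summary(values):
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))

if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical values
    print(five_number_summary(data))
```

These five numbers are exactly what a boxplot displays: the line runs from the minimum to the maximum, and the box marks Q1, the median, and Q3.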
Boxplots
Boxplots - cont
End of preview of chapters 1, 2 and 3