Basic concepts of statistics include understanding data types, measures of central tendency, measures of variability, and the principles of statistical inference.
A frequency distribution for nominal-level variables is a summary of data showing the number of occurrences of each distinct category.
Percentages and proportions can enhance clarity by providing a relative measure of frequency, making it easier to compare different categories or groups.
A frequency distribution for ordinal-level variables is a summary of data showing the number of occurrences of each category, where the categories have a meaningful order.
A frequency distribution for interval-ratio-level variables is a summary of data showing the number of occurrences within specified intervals or ranges of values.
Ratios, rates, and percentage change are used to compare different quantities, measure the occurrence of events over time, and quantify the relative change between two values, respectively.
An educated guess based on observation.
Nominal, ordinal, or interval-ratio.
99.5-105.5
Discrete or continuous.
Independent variable
Education level.
The hypothesis becomes a well-established theory.
Theory
Variables measured in a unit that can be subdivided infinitely (fractional numbers).
% = (f / N) × 100
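As a minimal sketch of the percentage formula above, together with the related proportion (f / N), the snippet below reuses the counts that appear elsewhere in this list (Males: 53, Females: 60, N = 113). Pairing those counts with this formula is an assumption made for illustration, and the function names are hypothetical.

```python
def proportion(f, n):
    """Proportion of cases in a category: f / N."""
    return f / n


def percentage(f, n):
    """Percentage of cases in a category: (f / N) * 100."""
    return f / n * 100


# Counts taken from elsewhere in this list; pairing them here is an assumption.
counts = {"Males": 53, "Females": 60}
n = sum(counts.values())  # N, the number of cases in all categories

for category, f in counts.items():
    print(f"{category}: proportion = {proportion(f, n):.4f}, "
          f"percentage = {percentage(f, n):.2f}%")
```

Because every case falls in exactly one category, the proportions sum to 1 and the percentages sum to 100.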
Traits that can change values from case to case, such as age, gender, race, and social class.
It means that each unit increase represents the same amount, such as each child adding one unit.
The number of observations.
Counting the number of cases in each category and comparing the sizes of the categories.
f1 is the number of cases in the first category.
The birth rate is the number of births in a year divided by the population size, multiplied by 1,000.
Percentage change = ((f2 - f1) / f1) × 100
By reporting the number of times each score of a variable occurred.
Pie charts, bar graphs, histograms, and line charts.
They are very useful ways to display the overall shape of a distribution.
U.S. rates of marriage and divorce from 1950 to 2015 (rates per 100,000 population).
17.5-19.5
16
By looking at characteristics such as cute eyes, long hair, more muscle, and sexy body shape.
By moving from theory to hypothesis, then to observation, and finally to empirical generalization.
Gather data from every possible case; in this example, everyone in Hong Kong.
The mathematical quality of the scores of a variable.
Scores that are actual numbers and have a true zero point and equal intervals between scores.
Mutually exclusive, exhaustive, and homogeneous.
Protestant, Catholic, Jewish, None, Other.
Ratio = f1 / f2
Divide f1 by f2.
Subtract f1 from f2 (f2 - f1).
Tables that report the number of cases in each category of a variable.
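A frequency table of this kind can be sketched in a few lines. The responses below are hypothetical data invented for illustration; only the category labels (Protestant, Catholic, Jewish, None, Other) come from this list.

```python
from collections import Counter

# Hypothetical survey responses, invented for this example.
responses = [
    "Protestant", "Catholic", "None", "Protestant",
    "Jewish", "Other", "Catholic", "Protestant",
]

freq = Counter(responses)  # frequency (f) of each category
n = len(responses)         # N, the total number of cases

# Report categories in a fixed order with frequency and percentage.
for category in ["Protestant", "Catholic", "Jewish", "None", "Other"]:
    f = freq[category]
    print(f"{category:<12}{f:>3}{f / n * 100:>8.1f}%")
```

Each case is counted once, so the frequency column sums to N.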
Continuous interval-ratio level variables.
Presenting 'pictures' of research results.
Percentage of total population.
558
4.0
5 students.
Subtract 0.5 from the lower limit and add 0.5 to the upper limit.
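The adjustment above can be sketched as a small helper. It assumes stated limits measured in whole units, so the gap between adjacent intervals is 1 and half of it is 0.5; the function name and the `unit` parameter are hypothetical. The example intervals are ones that appear elsewhere in this list.

```python
def real_limits(lower, upper, unit=1):
    """Convert stated class limits to real limits.

    Assumes scores are recorded in steps of `unit`, so each limit is
    extended by half a unit (0.5 for whole numbers).
    """
    half = unit / 2
    return lower - half, upper + half


print(real_limits(100, 105))  # (99.5, 105.5)
print(real_limits(18, 19))    # (17.5, 19.5)
print(real_limits(3, 5))      # (2.5, 5.5)
```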
18.78%
Males: 53, Females: 60, Total (N): 113
12,440,215
Hypothesis
A variable where scores can be ranked from high to low or from more to less.
By comparing a specific category (part) to all categories (whole).
In terms of the relationships between variables.
48.9%
Percentages and proportions can be calculated for variables at all levels of measurement.
All of the operations permitted for nominal level plus judgments of 'greater than' and 'less than'.
19/23 = 0.83
Divide by f1.
Continuous interval-ratio level variables, but can also be used for discrete interval-ratio level variables.
All possible cases must be included in the categories.
Into segments which are proportional in size to the percentage of cases in each category.
210
The divorce rate increased until around 1980 and then gradually declined.
Suicide rates for males and females by age group in 2017.
10.0
5.0%
253,881,929
1,777,173
Observation
Survey items that measure opinions and attitudes, such as strongly disagree, disagree, neutral, agree, strongly agree.
Decide whether the empirical observation supports the hypothesis.
Revise the theory according to the empirical observation.
Scores are numbers.
It means that zero represents the absence of the variable being measured, such as 0 = no children.
1 = White, 2 = Black, 3 = Hispanic, 4 = Asian, 5 = Other.
1,777,173
Ratios compare the relative sizes of categories.
The town had 7.39 births for every 1000 residents.
Frequency polygons.
Real limits instead of stated limits.
25.55%
48.90%
18-19
2.5-5.5
90.0%
2 years of age.
120.5
25.61%
Muslim and Buddhist
Is it easier for good-looking people to find a partner than for those who are ordinary in appearance?
To gather data from a subset of the population, such as 100 men.
The whole (all categories).
Scores have some numerical quality and can be ranked.
Each category should be similar in nature.
1 = Republican, 2 = Democrat, 3 = Other.
Social class, attitude and opinion scales.
All of the operations permitted for nominal and ordinal levels plus all other mathematical operations (addition, subtraction, multiplication, division, square roots, etc.).
(17 / 2300) * 1000 = 7.39
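The calculation above follows the general pattern of a rate: events divided by population, multiplied by a power of 10. A minimal sketch, with a hypothetical function name and a `base` parameter added for illustration:

```python
def rate(events, population, base=1000):
    """Number of events per `base` members of the population."""
    return events / population * base


births_per_1000 = rate(17, 2300)
print(round(births_per_1000, 2))  # 7.39
```

Changing `base` to 100,000 would give a rate on the scale often used for rarer events.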
10.7
The number of categories and the width of those categories.
Each case is counted in one and only one category.
25.40%
23.30%
They refer to how many cases fall below a given score or class interval.
75 and older.
Rates of suicide per 100,000 in each age group.
5 students.
33,500,000
4,315,993
Independent or dependent.
Empirical Generalization
Number of children or number of cars.
The part (specific category).
Proportion = f / N
The number of cases in all categories.
All possible options should be included.
Protestant, Episcopalian, Catholic, Jewish, None, Other.
Classification into categories.
Age, number of children, income.
Determine the values of f1 (the number of cases in category 1) and f2 (the number of cases in category 2).
(Number of births / Population size) * 1000
f2 represents the second score, frequency, or value.
They look like bar charts.
The age distribution of the United States in 2017.
462
59.27%
1.0
11
114.5
20 students.
12,700,000
Summarize empirical evidence.
Frequency, or the number of cases in any category.
Scores that are different from each other but cannot be treated as numbers.
Protestant, Catholic, Jewish, None, Other.
59,154,489
f2 is the number of cases in the second category.
1.21 females
7.3
46.58%
Yes, they can be used for discrete interval-ratio level variables.
0-5 years to 85 years and older.
The marriage rate showed a general decline from 1950 to 2015, with some fluctuations.
3-5
102.5
20
214,700,000
59,154,489
Variables measured in units that cannot be subdivided (whole numbers).
Scores are labels only; they are not numbers.
The entity from which data are gathered, such as people, groups, states, and nations.
No, they are just labels and cannot be treated as numbers.
Report actual frequencies.
Protestant, Non-Protestant.
253,881,927
Ratios compare parts to parts.
Determine the values for f1 (the value at time 1) and f2 (the value at time 2).
They are more complex: the large number of scores requires collapsing or grouping scores into categories, which means deciding the number of categories and the width of those categories.
Categories must be exhaustive and mutually exclusive.
The categories (or scores) of the variable border each other, meaning the sides of the bars touch.
The number or percentage of cases in each category.
The point exactly halfway between the upper and lower limits of a class interval.
Divorce rates peaked around 1980 and then began to decline.
100-105
-3.58%
75 and older.
95.0%
34.14%
Men with more muscle tend to take less time to secure a romantic date.
Dependent variable
Comparing the number of cases in a specific category to the number of cases in all categories.
Age (in years).
Each category should be distinct with no overlap.
1.7%
Classification into categories plus ranking of categories with respect to each other.
The relative increase or decrease in a variable over time.
Because of the large number of scores.
46.58%
Discrete variables.
1370
4.90%
As marriage rates declined, divorce rates initially increased but then also started to decline.
45 to 64.
61,600,000
10.0%
124,148,263
8,631,985
The independent variable causes changes in the dependent variable.
Age or income.
Number of children.
Graphs can be used to present data visually, making it easier to identify patterns, trends, and relationships within the data.
Gender, race, religion, marital status.
3.4%
0.83 males
Multiply by 100.
(10.7 - 7.3) / 7.3 * 100
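The expression above applies the percentage change formula, ((f2 - f1) / f1) × 100, with f1 = 7.3 and f2 = 10.7. A minimal sketch that evaluates it (the function name is hypothetical):

```python
def percentage_change(f1, f2):
    """Relative change from the time-1 value f1 to the time-2 value f2, in percent."""
    return (f2 - f1) / f1 * 100


print(round(percentage_change(7.3, 10.7), 2))  # 46.58
```

A negative result indicates a decrease from time 1 to time 2.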
Class intervals refer to the categories used in the frequency distribution.
U.S. Bureau of the Census, 2018. American Community Survey, 2013-2017, Five Year Estimates.
21.20%
Add the upper limit to the lower limit and divide by 2.
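The midpoint rule above can be checked with a short helper. The intervals 100-105 and 3-5 appear elsewhere in this list; matching them to the midpoints 102.5 and 4.0 is an assumption, and the function name is hypothetical.

```python
def midpoint(lower, upper):
    """Point exactly halfway between the lower and upper limits of a class interval."""
    return (lower + upper) / 2


print(midpoint(100, 105))  # 102.5
print(midpoint(3, 5))      # 4.0
```

Using the real limits instead (99.5 and 105.5) gives the same midpoint, because both limits move outward by the same amount.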
1 year of age.
-49,900,000
108.5
20
8,300,000
2,031,055
Income (in dollars).
1 = Female, 2 = Male.
53,822,969
0.8%
All of the procedures used for nominal and ordinal levels plus description of scores in terms of equal units.
Multiply by a power of 10.
23/19 = 1.21
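The ratio formula, f1 / f2, gives a different answer depending on which category is the numerator. The sketch below evaluates both directions for the counts 23 and 19; which count belongs to which group is not stated here, so the variable names are deliberately neutral.

```python
def ratio(f1, f2):
    """Number of cases in category 1 for every one case in category 2."""
    return f1 / f2


larger, smaller = 23, 19  # counts from the calculation above

print(round(ratio(larger, smaller), 2))  # 1.21
print(round(ratio(smaller, larger), 2))  # 0.83
```

The two results are reciprocals of each other (before rounding), which is why a ratio should always state which category is being compared to which.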
f1 represents the first score, frequency, or value.
By graphing a dot at each category’s midpoint and then connecting the dots.
Discrete variables with only a few categories.
The percentage of cases in each category.
1.70%
Marriage rates peaked around the early 1970s before starting to decline.
7.0
20 students.
10 to 14.
Nigeria (109.60%)
12.50%
55.0%
55.0%
53,822,969