Stats

Minimum = 13.
Maximum = 88.
Mean = 32.6684
Median = 31.
stdev = 11.377
Skewness = 0.994064
Kurtosis = 3.91139

The statistics discussed in class (mean, median, standard deviation, skewness, and kurtosis) describe the central tendency, spread, and shape of a data distribution.

Mean: The mean is a measure of central tendency. Here, the mean age is the average age of the individuals in our sample.
Median: When the data is sorted in ascending order, the median is the middle value. The median is a measure of central tendency that is less sensitive to outliers or extreme values than the mean.
Standard Deviation (Stdev): The standard deviation is a measure of the spread or dispersion of the data. A higher standard deviation indicates that ages are more variable, whereas a smaller standard deviation indicates that ages are closer to the mean. It indicates how far the data points differ from the mean age.
Skewness: Skewness measures the asymmetry of the data distribution. A positive skew indicates that the data is skewed to the right, with the right side of the distribution having a longer tail. In the context of an age column, a positive skew suggests that most individuals are relatively young, with a smaller number of older individuals stretching the right tail.
Kurtosis: This measures the “tailedness” of the data distribution. High kurtosis indicates heavy tails, or more extreme values in the data, whereas low kurtosis indicates lighter tails. A normal distribution has a kurtosis of 3, so a larger value (positive excess kurtosis) implies more outliers or extreme values than a normal distribution, whereas a smaller value (negative excess kurtosis) implies fewer.

These statistics can help you understand the age distribution in your dataset.
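As a rough illustration of how these statistics can be computed, here is a minimal Python sketch using only the standard library. The `describe` helper and the `ages` list are hypothetical and are not the dataset summarized above. Skewness and kurtosis are computed from population central moments (dividing by n), which is an assumption about the definition; some tools apply sample corrections and will give slightly different values.

```python
import statistics

def describe(values):
    """Return summary statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments about the mean (population form, dividing by n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "skewness": m3 / m2 ** 1.5,         # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2,           # non-excess: 3 for a normal distribution
    }

# Hypothetical ages for illustration only.
ages = [18, 22, 25, 27, 29, 31, 31, 34, 38, 45, 60]
stats = describe(ages)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

In this made-up sample the single large value (60) pulls the mean above the median and produces a positive skewness, the same pattern described for the age column.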

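The mean (32.67) is greater than the median (31), which is consistent with a distribution of ages that is skewed to the right.
The skewness is approximately 1 (0.994), which confirms a positive skew.
The kurtosis (3.91) is slightly greater than 3, the value for a normal distribution, so the tails are somewhat heavier than normal.
Overall, the age distribution roughly resembles a normal distribution, but with a right skew and somewhat heavier tails.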

T-Test

A t-test is a statistical hypothesis test that is used to assess whether or not there is a significant difference in the means of two groups or conditions. It’s a common approach for comparing the means of two samples to determine whether the observed differences are due to a true effect or just random variation.

Two common types of t-test are:

Independent Two-Sample T-Test: This test is used when comparing the means of two independent groups or samples.
Paired T-Test: This test is used when comparing the means of two related groups, such as before and after measurements on the same subjects or matched pairs of observations.
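To make the difference between the two tests concrete, here is a minimal sketch of both t statistics using only the Python standard library. The function names and the `before`/`after` measurements are hypothetical. The independent version assumes equal variances in the two groups (the pooled form), and the sketch stops at the t statistic: turning it into a p-value requires the t distribution, which is usually taken from a statistics library.

```python
import math
import statistics

def independent_t(a, b):
    """Two-sample t statistic for independent groups, assuming equal variances."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error

def paired_t(before, after):
    """Paired t statistic, computed on the per-subject differences."""
    diffs = [y - x for x, y in zip(before, after)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / standard_error

# Hypothetical before/after measurements on the same five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 16]

t_independent = independent_t(after, before)  # treats the lists as unrelated groups
t_paired = paired_t(before, after)            # uses the subject-by-subject pairing
print(f"independent t = {t_independent:.3f}, paired t = {t_paired:.3f}")
```

On these made-up numbers the paired statistic is much larger than the independent one, because pairing removes the subject-to-subject variation. This is why the choice of test should follow the study design.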

The decision to reject the null hypothesis in both t-tests is made by comparing the p-value with the significance level (usually 0.05). If the p-value is below the significance level, the null hypothesis is rejected, indicating a statistically significant difference. If the p-value is greater than the significance level, you fail to reject the null hypothesis because there is insufficient evidence of a difference.
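The decision rule can be sketched in a few lines of Python. The `decide` function is a hypothetical helper, and the default of 0.05 is only the conventional significance level. Treating a p-value exactly equal to the level as "not rejected" is an assumption; conventions differ at the boundary.

```python
def decide(p_value, alpha=0.05):
    """Apply the usual t-test decision rule to a p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # A small p-value means the observed difference would be unlikely
    # if the null hypothesis (no difference in means) were true.
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.01))  # p-value below the significance level
print(decide(0.30))  # p-value above the significance level
```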

Cohen’s d

Cohen’s d is a popular measure of effect size that has been elaborated upon by various scholars, including Sawilowsky.
Sawilowsky’s expansion contains extra criteria and interpretations for Cohen’s d values, offering more context for interpreting effect sizes.

Sawilowsky Method

- Calculate the mean of each group.
- Calculate the pooled standard deviation.
- Calculate Cohen’s d.
  - A Cohen’s d of approximately 0.2 indicates a small effect.
  - A Cohen’s d of approximately 0.5 indicates a medium effect.
  - A Cohen’s d of approximately 0.8 or higher indicates a large effect.
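The rules of thumb above can be turned into a small lookup, sketched here in Python. The function name `effect_size_label` is hypothetical, and using hard cutoffs at 0.2, 0.5, and 0.8 is an assumption made for illustration; the thresholds in the list are approximate guides, not sharp boundaries.

```python
def effect_size_label(d):
    """Map a Cohen's d value to a rule-of-thumb size label."""
    size = abs(d)  # the labels describe magnitude, so the sign is ignored
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

for value in (0.1, 0.3, -0.6, 1.1):
    print(value, effect_size_label(value))
```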

How to calculate?

Calculate the means ($M_1$ and $M_2$) of the two groups.
Calculate the standard deviations ($SD_1$ and $SD_2$) of the two groups.
Calculate the pooled standard deviation, where $n_1$ and $n_2$ are the group sizes: $SD_{pooled} = \sqrt{\frac{(n_1 - 1) \cdot SD_1^2 + (n_2 - 1) \cdot SD_2^2}{n_1 + n_2 - 2}}$
Calculate Cohen’s d: $d = \frac{M_1 - M_2}{SD_{pooled}}$
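The steps above can be sketched in Python with the standard library. Cohen's d is the difference between the two group means divided by the pooled standard deviation. The function name `cohens_d` and the two sample groups are hypothetical and chosen only for illustration.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
    # Sample standard deviations (n - 1 in the denominator).
    sd1, sd2 = statistics.stdev(group1), statistics.stdev(group2)
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical scores for two groups.
group1 = [12, 14, 10, 13, 16]
group2 = [10, 12, 9, 11, 13]
d = cohens_d(group1, group2)
print(f"Cohen's d = {d:.3f}")
```

Swapping the order of the two groups flips the sign of d but not its magnitude, so it helps to state which group is subtracted from which when reporting the value.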

What do we get from the above?

Interpretation of Effect Size: The magnitude of an effect should be interpreted in light of the specific setting. An effect size that is considered minor in one domain may be significant in another, so the thresholds should be treated as rough guides rather than fixed rules.
Direction of Effect: The size labels describe only the magnitude of the effect, not its direction. Sawilowsky suggests including direction information, such as “positive Cohen’s d” or “negative Cohen’s d,” to indicate whether the effect corresponds to an increase or decrease in the outcome.

In conclusion, Sawilowsky’s extension of Cohen’s d provides context-specific guidance that helps us interpret effect sizes. It recognizes that the relevance of an effect varies depending on the issue being studied.

Week-6 (10/18) Working on numerical data.