While exploring the data, we focused on two columns: Age and Race.
Age is quantitative data and Race is qualitative data. Checking the data types shows that age is float64 and race is object.
Race has 6 different categories: Asian, Black, Hispanic, Native American, Other, and White.
For the “age” variable, descriptive statistics usually summarize the central tendency and the spread of the ages in the sample. Common descriptive statistics for a numerical variable such as “age” are:
- Mean: This is the average age, calculated by summing up all the ages and dividing by the total number of observations.
- Median: The middle value of the ages when they are arranged in ascending order. It’s less affected by extreme values (outliers) than the mean.
- Mode: The age that appears most frequently in the dataset.
- Range: The difference between the maximum and minimum ages in the dataset, providing an idea of the spread of ages.
- Standard Deviation: A measure of the dispersion or variability of ages. It tells you how spread out the ages are from the mean.
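The statistics listed above can be computed directly. The sketch below uses Python's standard `statistics` module on a small hypothetical list of ages, not the real dataset; with pandas, the same values should come from `Series` methods such as `mean()`, `median()`, `mode()` and `std()` applied to the age column.

```python
import statistics

# Hypothetical sample of ages; in practice these come from the cleaned "age" column.
ages = [23.0, 31.0, 31.0, 45.0, 52.0, 19.0, 38.0]

def describe_age(values):
    """Return the descriptive statistics listed above for a list of ages."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),      # most frequent age (first one if tied, Python 3.8+)
        "range": max(values) - min(values),   # maximum minus minimum
        "stdev": statistics.stdev(values),    # sample standard deviation (divides by n - 1)
    }

summary = describe_age(ages)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

`statistics.stdev` divides by n - 1, which matches the default of pandas' `std()`; `statistics.pstdev` would give the population version instead.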
Race (Categorical Variable – object):
The main goal of the descriptive statistics for the “race” variable is to summarize how the observations are distributed across the race categories. Because “race” is a categorical variable with distinct categories (Asian, Black, Hispanic, Native American, Other, and White), common descriptive statistics include the following:
- Frequency Table: This table will show the count of each race category in the dataset, giving you an idea of how many individuals belong to each racial group.
- Percentage Distribution: You can calculate the percentage of each race category by dividing the count of each category by the total number of observations. This helps you understand the relative proportions of each racial group in the dataset.
- Mode: In this context, the mode represents the most common race category in the dataset.
- Bar Chart: A bar chart of the frequency or percentage distribution of the race categories gives a more intuitive view of the data.
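For the race column, a frequency table, percentage distribution, and mode can be sketched with `collections.Counter`. The labels below are hypothetical, and the bar chart is drawn in plain text only to keep the example dependency-free; in pandas, `value_counts()` (with `normalize=True` for proportions) and `plot.bar()` are the usual equivalents.

```python
from collections import Counter

# Hypothetical race labels; in practice these come from the "race" column.
races = ["White", "Black", "White", "Hispanic", "Asian", "White",
         "Black", "Native American", "Other", "Hispanic"]

def race_summary(labels):
    """Frequency table, percentage distribution, and mode of a categorical column."""
    counts = Counter(labels)
    total = len(labels)
    percentages = {race: 100 * n / total for race, n in counts.items()}
    mode = counts.most_common(1)[0][0]  # the most frequent category
    return counts, percentages, mode

counts, percentages, mode = race_summary(races)
for race, n in counts.most_common():
    # Plain-text bar chart: one '#' per observation.
    print(f"{race:<16}{n:>3}  {percentages[race]:5.1f}%  {'#' * n}")
print("Mode:", mode)
```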
Once the data was cleaned, the age column was entirely numeric, so we plotted a Q-Q plot of age.
A Q-Q plot compares the data with an expected theoretical distribution, typically the normal distribution. Deviations from a straight line suggest departures from that distribution, whereas points lying along a straight line indicate close agreement. In statistics and data analysis, the Q-Q plot is used to assess model fit, identify outliers, and check whether data are normally distributed.
Our Q-Q plot of age looks like this:
The quantile-quantile (Q-Q) plot shows how closely the quantiles (percentiles) of a dataset match the values predicted by a theoretical distribution, most commonly the normal distribution. If the points fall close to a straight reference line, the dataset closely resembles the theoretical distribution.
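To make the construction concrete, the sketch below computes the pairs that a normal Q-Q plot would draw: each sorted observation against the normal quantile at the plotting position (i - 0.5)/n, using a normal distribution fitted with the sample mean and standard deviation. The ages are hypothetical and no plotting library is used; `scipy.stats.probplot` computes similar pairs (with a slightly different choice of plotting positions) and can draw them with matplotlib.

```python
import statistics

def normal_qq_points(values):
    """Pair each sorted observation with the matching quantile of a normal
    distribution fitted by the sample mean and standard deviation."""
    data = sorted(values)
    n = len(data)
    fitted = statistics.NormalDist(statistics.mean(data), statistics.stdev(data))
    # Plotting positions (i - 0.5) / n keep every probability strictly inside (0, 1).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))

# Hypothetical ages, not the real dataset.
ages = [19.0, 23.0, 31.0, 31.0, 38.0, 45.0, 52.0, 67.0]
points = normal_qq_points(ages)
for theo, obs in points:
    print(f"theoretical {theo:7.2f}   observed {obs:7.2f}")
```

Plotting the theoretical values on the x-axis against the observed values on the y-axis gives the Q-Q plot; for normally distributed data, the points lie near the line y = x.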
Once we had read the data, we turned to race: we split the data into the 6 race groups and, for each group, calculated the mean, median, standard deviation, variance, skewness, and kurtosis of age.
Together, these statistics give us a clear picture of, and insights into, the Washington police shooting data.
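The per-group summary described above can be sketched as follows. The (race, age) records are hypothetical and cover only three groups for brevity, and the helper names are illustrative. Skewness and kurtosis here are the simple moment estimators, with kurtosis reported as excess kurtosis (a normal distribution scores 0); pandas' `skew()` and `kurt()` apply small-sample corrections, so their values will differ somewhat.

```python
import statistics
from collections import defaultdict

# Hypothetical (race, age) records; the real data has six race groups.
records = [
    ("White", 45.0), ("White", 38.0), ("White", 61.0), ("White", 29.0),
    ("Black", 24.0), ("Black", 31.0), ("Black", 27.0), ("Black", 35.0),
    ("Hispanic", 33.0), ("Hispanic", 22.0), ("Hispanic", 40.0),
]

def moment_skew_kurt(values):
    """Moment-based skewness g1 and excess kurtosis g2 (no small-sample correction)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

def age_stats_by_race(rows):
    """Group ages by race and summarize each group."""
    groups = defaultdict(list)
    for race, age in rows:
        groups[race].append(age)
    table = {}
    for race, ages in groups.items():
        skew, kurt = moment_skew_kurt(ages)
        table[race] = {
            "mean": statistics.mean(ages),
            "median": statistics.median(ages),
            "stdev": statistics.stdev(ages),        # sample (n - 1) standard deviation
            "variance": statistics.variance(ages),  # sample (n - 1) variance
            "skewness": skew,
            "kurtosis": kurt,
        }
    return table

for race, stats in age_stats_by_race(records).items():
    print(race, {name: round(value, 2) for name, value in stats.items()})
```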
We also used a t-test and an ANOVA test.
An analysis of variance, or ANOVA, is a statistical test that compares the means of three or more groups to see whether any of them differ significantly. It evaluates whether the group means vary more than would be expected by chance alone. An ANOVA yields an overall test of significance (an F-statistic and its p-value), which indicates whether at least one group mean differs considerably from the others. If the result is significant, post-hoc tests can determine which particular groups differ. In experimental research, ANOVA is frequently used to examine how various factors or treatments affect a dependent variable, by testing the null hypothesis that all group means are equal.
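The core of a one-way ANOVA is the F statistic: the between-group mean square divided by the within-group mean square. The sketch below computes it by hand, together with Student's pooled two-sample t statistic, on hypothetical age groups; the function names are illustrative. It computes the statistics only, not p-values, which require the F and t distributions (for example via `scipy.stats.f_oneway` and `scipy.stats.ttest_ind`).

```python
import statistics

def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance (assumes equal variances)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Hypothetical ages for three race groups, not the real dataset.
white = [45.0, 38.0, 61.0, 29.0, 50.0]
black = [24.0, 31.0, 27.0, 35.0, 29.0]
hispanic = [33.0, 22.0, 40.0, 28.0]

f, df1, df2 = one_way_anova_f(white, black, hispanic)
print(f"ANOVA: F({df1}, {df2}) = {f:.3f}")
t = pooled_t(white, black)
print(f"t-test (White vs Black): t = {t:.3f}")
```

With exactly two groups, the ANOVA F statistic equals the square of the pooled t statistic, which is one way to see ANOVA as the generalization of the t-test to three or more groups.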