Week-9 (11/10) Working on Decision Tree.

A decision tree is a popular machine-learning algorithm for classification and regression tasks. In the context of classification, a decision tree is a tree-like model where each internal node represents a decision based on the value of a particular feature, each branch represents the outcome of the decision, and each leaf node represents the final class label.

Decision Tree

The scikit-learn library (“sklearn”) provides this model through the DecisionTreeClassifier class.
While experimenting, I worked with one of sklearn’s built-in dummy datasets, the iris dataset, which can be loaded with load_iris.
Important parameters of DecisionTreeClassifier (a short code sketch using them follows the list):

  1. criterion: The function to measure the quality of a split (e.g., “gini” for Gini impurity or “entropy” for information gain).
  2. max_depth: The maximum depth of the tree.
  3. min_samples_split: The minimum number of samples required to split an internal node.
  4. min_samples_leaf: The minimum number of samples required at a leaf node.
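To make this concrete, here is a minimal sketch of fitting a DecisionTreeClassifier on the iris dataset; the specific parameter values are illustrative assumptions, not the settings used in the project.

```python
# A minimal, illustrative fit of DecisionTreeClassifier on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

clf = DecisionTreeClassifier(
    criterion="gini",      # split-quality measure ("gini" or "entropy")
    max_depth=3,           # limit depth to reduce overfitting (value is illustrative)
    min_samples_split=2,   # minimum samples needed to split an internal node
    min_samples_leaf=1,    # minimum samples required at a leaf node
    random_state=42,
)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```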

How Decision Trees Work

  1. Decision Nodes:
    • Each internal node tests a specific attribute or feature.
    • The decision is based on the value of the feature.
  2. Branches:
    • Each branch represents the outcome of the decision.
    • Branches are labeled with the possible values of the decision attribute.
  3. Leaf Nodes:
    • Leaf nodes represent the final decision or class label.
    • Each leaf node is associated with a class label.
  4. Splitting:
    • The tree is built by recursively splitting the dataset based on the selected features.
    • The goal is to create pure leaf nodes with samples belonging to a single class.
  5. Stopping Criteria:
      • The tree-building process stops when a certain criterion is met (e.g., maximum depth reached or minimum samples in a leaf).

Decision trees are simple to use and can handle both numerical and categorical data. They are, however, prone to overfitting, especially when the tree depth is not well managed. Pruning and limiting the maximum depth can help reduce overfitting.

I used the iris dataset mentioned earlier, and the resulting tree is visualized as shown below.
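Here is a sketch of how such a visualization can be produced with sklearn’s plot_tree, continuing from the classifier fitted in the earlier sketch (the figure size is an arbitrary choice):

```python
# Plot the fitted tree with feature and class names from the iris dataset.
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

fig, ax = plt.subplots(figsize=(12, 8))
plot_tree(
    clf,
    feature_names=iris.feature_names,   # show feature names at decision nodes
    class_names=iris.target_names,      # label leaves with class names
    filled=True,                        # color nodes by majority class
    ax=ax,
)
plt.show()
```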

The image generated by visualizing the decision tree provides a graphical representation of the structure and decision-making process of the trained model. Let’s break down what the image is telling us:

  1. Root Node:
    • The root node is the top node of the tree. It shows the feature and threshold used for the first decision; the tree branches out from that point according to the different values of that feature.
  2. Internal Nodes:
    • Internal nodes are the nodes in the middle of the tree. Each internal node represents a decision based on a given feature and threshold; a typical test looks like “Is feature X less than or equal to threshold T?”
  3. Leaf Nodes:
    • The leaf nodes are the terminal nodes at the bottom of the tree. A predicted class is represented by each leaf node. The anticipated class for instances that reach the leaf node is the majority class in that node.
  4. Edges (Branches):
    • The results of the decisions are represented by the edges linking the nodes. For example, if a node’s decision is true, you follow the left branch; if it is false, you follow the right branch.
  5. Feature Names and Thresholds:
    • At each decision node, the names of the characteristics and the threshold values used for splitting are presented. This information assists us in comprehending the circumstances in which the decision tree makes decisions.
  6. Class Names:
    • The leaf nodes are labeled with the names of the classes if you specify class_names while drawing the tree. This makes interpreting the final anticipated classes simple.

We can understand how the model partitions the feature space and produces predictions based on the input features by looking at the decision tree visualization. It’s a useful tool for deciphering the model’s inner workings and determining the most significant characteristics in the decision-making process.

Week-9 (11/08) Decision Tree.

Decision Tree

Decision tree classification is a common machine learning approach that may be used to categorize data. It is based on the idea of a decision tree, which is a hierarchical framework for making decisions. Decision trees may be conceived of in the context of statistics as a means to represent and analyze the connections between distinct variables in a dataset.


Steps that we are using in our project (a code sketch of the split, fit, and evaluation steps follows the list):

  1. Data Preparation: Begin with a dataset with a collection of characteristics (independent variables) and a target variable (the variable to predict or classify). In statistics, features are equivalent to predictor variables, while the response is the target variable.
  2. Splitting the Data: Separate the dataset into training and testing. The decision tree is built using the training set, and its performance is evaluated using the testing set.
  3. Building the Decision Tree: In Python, we can use libraries like scikit-learn to create a decision tree classifier. The algorithm recursively splits the data into subsets based on the feature that provides the best split according to a certain criterion.
  4. Node Splitting: At each node of the tree, the method finds the feature that best divides the data into distinct groups. The objective is to reduce impurity while increasing information gain. In statistical terms, this is analogous to picking the variable that carries the most information for classification.
  5. Leaf Nodes: The decision tree splits the data until a stopping requirement, such as a maximum depth or a minimum number of samples per leaf node, is reached. The ultimate categorization choice is represented by the terminal nodes, also known as leaf nodes. Each leaf node holds the majority of the data points that arrive at it.
  6. Model Evaluation: After creating the decision tree, you may analyze its performance using the testing dataset. In statistics, common assessment criteria include accuracy, precision, recall, the F1-score, and the confusion matrix. These metrics assist in determining how successfully the model classifies the data.
  7. Visualization: To offer an interpretable picture of the classification rules, decision trees can be visualized. Packages such as scikit-learn (together with matplotlib) can be used to plot and display the decision tree in Python.
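Here is a compact sketch of steps 2, 3, and 6 in code; the iris dataset stands in for the project data, so the variable names and values are illustrative only.

```python
# Split the data, fit a decision tree, and evaluate it on the held-out test set.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # placeholder for the project dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))   # precision, recall, F1 per class
```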

Pros and Cons of Decision Tree

Pros

    1. Interpretability: Decision trees describe the decision-making process in a highly interpretable and visual manner. They are simple to grasp, making them an excellent choice for explaining the concept to non-technical stakeholders.
    2. No Assumptions about Data: Decision trees make no strong assumptions about the distribution of the underlying data. They can handle numerical and categorical data, as well as missing values using proper algorithms.
    3. Feature Selection: Decision trees do feature selection on their own by selecting the most informative characteristics for splitting nodes. This might assist you in determining the most significant variables in your dataset.

Cons

  1. Overfitting: Overfitting is common in decision trees, especially when the tree is deep and complicated. This means that the model may perform well on training data but badly on fresh, previously unknown data.
  2. Instability: Small changes in data can result in drastically different tree architectures. Because of this instability, decision trees may be less dependable than alternative models.
  3. Bias Toward Dominant Classes: In classification tasks with imbalanced classes, decision trees may be biased towards the majority class, leading to poor classification of the minority class.

Week-9 (11/06) Total Probability with Bayes’ rule.

The idea of “total probability” is a fundamental principle in statistics and probability theory used to determine the likelihood of an occurrence by considering all conceivable ways or scenarios in which the event may occur. It is frequently employed when you have knowledge of the likelihood of an event occurring under various settings or scenarios. The law of total probability, a specific application of this notion, is closely connected to total probability.

According to the law of total probability, if the sample space is divided into mutually exclusive and exhaustive events (i.e., events that cover all possible outcomes and do not overlap), the probability of any event can be expressed as a weighted sum of the conditional probabilities of that event given the various scenarios.

Mathematically, for an event A and a partition of the sample space into events B1, B2, …, Bn,
the law of total probability is expressed as:

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + … + P(A|Bn)P(Bn) = Σi P(A|Bi)P(Bi)

Where:
P(A) is the probability of event A.
P(A|Bi) is the conditional probability of event A given that event Bi has occurred.
P(Bi) is the probability of event Bi (part of the partition).

The basic concept is to analyze all alternative scenarios (represented by the many events in the partition), compute the conditional probabilities for the event of interest within each scenario, and then weight these conditional probabilities by the likelihood of each scenario occurring. The overall probability of the occurrence P(A) is obtained by adding these weighted conditional probabilities.

Bayes rule:
Total probability is connected to Bayes’ Rule and is frequently used in combination with it to update probabilities in the presence of uncertainty or new information. The denominator in Bayes’ Rule is computed using total probability, which is critical for updating the conditional probability of a hypothesis depending on new information.

The rule of total probability is employed in this situation to account for all alternative scenarios or hypotheses P(Hi), allowing you to compute the total probability of the evidence P(E) by summing over all these possibilities. This is significant because P(E) is the normalization factor in Bayes’ Rule, guaranteeing that the posterior probabilities are appropriately scaled.
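A small numeric sketch of how the two rules fit together; the probabilities below are made-up illustrative values, not figures from the project.

```python
# Law of total probability and Bayes' rule with three hypothetical scenarios.
priors = [0.3, 0.5, 0.2]        # P(H1), P(H2), P(H3): mutually exclusive, exhaustive
likelihoods = [0.9, 0.4, 0.1]   # P(E|H1), P(E|H2), P(E|H3)

# Total probability: P(E) = sum_i P(E|Hi) * P(Hi)
p_evidence = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' rule: P(Hi|E) = P(E|Hi) * P(Hi) / P(E)
posteriors = [l * p / p_evidence for l, p in zip(likelihoods, priors)]

print("P(E) =", p_evidence)       # 0.9*0.3 + 0.4*0.5 + 0.1*0.2 = 0.49
print("Posteriors:", posteriors)  # sum to 1 because P(E) normalizes them
```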

Week-8 (11/03) Bayes’ theorem.

The Bayes’ theorem is a statistical and probability theory that may be used to analyze data in Python. It allows you to adjust probability or draw conclusions about events based on fresh facts or data. Bayes’ theorem is frequently employed in statistical contexts for Bayesian inference, which is a versatile and powerful framework for statistical modeling and parameter estimation.

Bayes’ theorem can be expressed as:

P(A|B) = P(B|A) · P(A) / P(B)

Where
– (P(A|B)) is the conditional probability of event A given that event B has occurred.
– (P(B|A)) is the conditional probability of event B given that event A has occurred.
– (P(A)) is the prior probability of event A, which is your initial belief in the probability of A before considering any new evidence.
– (P(B)) is the marginal probability of event B (the evidence); it acts as a normalizing constant and can be computed using the law of total probability.

In the context of Python data analysis, Bayes’ theorem can be utilized for a variety of tasks, including:
Parameter Estimation: Bayes’ theorem can be used to update your beliefs about the parameters of a statistical model based on observed data. This is especially handy when you have prior knowledge about the parameters.
Hypothesis Testing: Bayesian hypothesis testing uses the posterior probability to analyze the evidence for or against a hypothesis.
Predictive Modeling: Bayesian approaches, such as Bayesian regression or Bayesian networks, can be utilized to develop predictive models that yield more robust predictions and uncertainty estimates.
The code generates two graphs to visualize the results of the Bayesian parameter estimation using the Metropolis-Hastings sampler (a sketch of such a sampler follows the description below).
Mean Estimation:

  • The first graph depicts the estimated values of the normal distribution’s mean during the Markov Chain Monte Carlo (MCMC) sampling procedure.
  • The sample number or iteration is represented on the x-axis, demonstrating how the parameter estimations change over time.
  • The calculated mean values are represented on the y-axis.
  • In this graph, you can see how the sampler investigates various mean values to locate the region with the highest posterior probability, which corresponds to the most likely mean value given the data.

Standard Deviation Estimation:

  • The second graph depicts the estimated values of the normal distribution’s standard deviation over the course of the MCMC sampling procedure.
  • The x-axis shows the sample number or iteration, while the y-axis represents the estimated standard deviation values, as in the first graph.
  • This graph depicts how the sampler investigates various standard deviation values in order to locate the region with the highest posterior probability, showing the most likely value of the standard deviation given the data.

These graphs depict the parameter estimation process as well as the MCMC sampler’s convergence. The sampled values should stabilize as the number of samples grows, producing more accurate estimates of the parameters.
This was done with dummy data for learning purposes; I will implement it on my project data next.
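Since the exact project code is not shown here, the following is a simplified reconstruction of a Metropolis-Hastings sampler for the mean and standard deviation of a normal distribution on dummy data; the priors (flat), proposal step sizes, and sample counts are all assumptions for illustration.

```python
# Simplified Metropolis-Hastings sampler for the mean and std of a normal distribution.
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=3.0, size=200)   # dummy data with known parameters

def log_posterior(mu, sigma):
    if sigma <= 0:
        return -np.inf                 # invalid standard deviation: zero posterior
    # Flat priors assumed, so the posterior is proportional to the likelihood.
    return np.sum(stats.norm.logpdf(data, loc=mu, scale=sigma))

n_samples = 5000
mu_chain = np.empty(n_samples)
sigma_chain = np.empty(n_samples)
mu, sigma = data.mean(), data.std()    # start near the sample estimates

for i in range(n_samples):
    mu_prop = mu + rng.normal(scale=0.3)       # symmetric random-walk proposals
    sigma_prop = sigma + rng.normal(scale=0.3)
    log_ratio = log_posterior(mu_prop, sigma_prop) - log_posterior(mu, sigma)
    if np.log(rng.uniform()) < log_ratio:      # Metropolis acceptance step
        mu, sigma = mu_prop, sigma_prop
    mu_chain[i] = mu
    sigma_chain[i] = sigma

# Trace plots corresponding to the two graphs described above.
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6), sharex=True)
ax1.plot(mu_chain)
ax1.set_ylabel("mean")
ax2.plot(sigma_chain)
ax2.set_ylabel("standard deviation")
ax2.set_xlabel("iteration")
plt.show()
```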

Week-8 (11/01) confidence interval for Age and Race.

95% Confidence Interval

Based on the Washington population data, the 95% confidence interval gives a range of values for the population mean and variance. It quantifies the uncertainty about the true population mean and variance values.

95% Confidence Interval for the Mean

Formula: Confidence Interval = Sample Mean ± Margin of Error
The sample size, sample standard deviation, and selected confidence level (in this example, 95%), all influence the margin of error.
If the population standard deviation is known (often denoted as σ), we can use the Z-distribution and the formula:

Margin of Error = Z × (σ / √n)

Where:
Z is the critical value corresponding to a 95% confidence level (usually 1.96 for 95% confidence).
σ is the population standard deviation.
n is the sample size.

95% Confidence Interval for the Variance

To construct a 95% confidence interval for the population variance (σ^2), we use the Chi-squared distribution.
The formula for the confidence interval is:

( (n − 1)s^2 / χ^2_upper , (n − 1)s^2 / χ^2_lower )

Where
n is the sample size.
s^2 is the sample variance.
χ^2_upper and χ^2_lower are the critical values of the Chi-squared distribution corresponding to the upper and lower tails (the 97.5th and 2.5th percentiles). A code sketch for both intervals follows.
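A sketch of computing both intervals for one group’s ages with scipy; the `ages` array is a small illustrative placeholder, while the project uses the Washington data.

```python
# 95% confidence intervals for the mean (Z-interval) and the variance (Chi-squared).
import numpy as np
from scipy import stats

ages = np.array([23, 31, 35, 28, 40, 45, 33, 29, 38, 50], dtype=float)  # illustrative
n = len(ages)
mean, s2 = ages.mean(), ages.var(ddof=1)    # sample mean and sample variance

# Mean: mean ± Z * (sigma / sqrt(n)); a t-interval would be safer for small samples
z = stats.norm.ppf(0.975)                   # about 1.96 for 95% confidence
margin = z * np.sqrt(s2 / n)
ci_mean = (mean - margin, mean + margin)

# Variance: ((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower)
chi2_upper = stats.chi2.ppf(0.975, df=n - 1)
chi2_lower = stats.chi2.ppf(0.025, df=n - 1)
ci_var = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)

print("95% CI for mean:", ci_mean)
print("95% CI for variance:", ci_var)
```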

Moving on to the project:

While plotting the graphs for all 7 race categories with respect to age and calculating the 95% confidence intervals, this is how the results looked with respect to mean age:

95% CI for mean age of Asian race: (34.102561714042324, 38.17163183434478)
95% CI for mean age of Black race: (32.2672685325096, 33.36063844423459)
95% CI for mean age of Hispanic race: (33.025105503719516, 34.26648662508191)
95% CI for mean age of Native American race: (31.171378988317556, 35.06775144646505)
95% CI for mean age of Other race: (28.60618586384799, 38.69381413615201)
95% CI for mean age of Unknown race: (39.27258057718045, 41.59356115510302)
95% CI for mean age of White race: (39.67225966058736, 40.5853739271989)

with respect to age variance:
95% CI for age variance of Asian race: (-40.85984853099225, 308.09024454462985)
95% CI for age variance of Black race: (-9.30460757975456, 270.24488620549687)
95% CI for age variance of Hispanic race: (-9.912485515846711, 234.15910188380138)
95% CI for age variance of Native American race: (-33.487313303846264, 215.28258325129013)
95% CI for age variance of Other race: (-150.1015458360546, 415.0015458360547)
95% CI for age variance of Unknown race: (-26.69241274796164, 471.92870773790173)
95% CI for age variance of White race: (-8.759132313413176, 349.9572418892791)

And here is the Q-Q plot with respect to the age distribution.

These are just a few representative figures from the full set; I also gathered the summary statistics of age with respect to race:
Race: Unknown
Median: 38.00
Mean: 40.43
Standard Deviation: 14.92
Variance: 222.62
Skewness: 0.67
Kurtosis: 2.9363

Race: Native American
Median: 32.00
Mean: 33.12
Standard Deviation: 9.53
Variance: 90.90
Skewness: 0.58
Kurtosis: 2.8620

Week-8 (10/30) Age and Race.

While exploring the data, there are two columns of interest, Age and Race.
Age is quantitative data and Race is qualitative data. Checking the data types, age is float64 and race is an object.
Race has 6 different categories: Asian, Black, Hispanic, Native American, Other, and White.
For the “age” variable, descriptive statistics will usually include measures summarizing the distribution and central tendency of the ages in the sample. Common descriptive statistics for a numerical variable such as “age” are:

  1. Mean: This is the average age, calculated by summing up all the ages and dividing by the total number of observations.
  2. Median: The middle value of the ages when they are arranged in ascending order. It’s less affected by extreme values (outliers) than the mean.
  3. Mode: The age that appears most frequently in the dataset.
  4. Range: The difference between the maximum and minimum ages in the dataset, providing an idea of the spread of ages.
  5. Standard Deviation: A measure of the dispersion or variability of ages. It tells you how spread out the ages are from the mean.

Race (Categorical Variable – object):
Summarizing the distribution of the various race categories will be the main goal of the descriptive statistics for the “race” variable. Since it is a categorical variable with distinct categories (Asian, Black, Hispanic, Native American, Other, and White), common descriptive statistics for “race” include the following (a code sketch for both variables follows the list).

  1. Frequency Table: This table will show the count of each race category in the dataset, giving you an idea of how many individuals belong to each racial group.
  2. Percentage Distribution: You can calculate the percentage of each race category by dividing the count of each category by the total number of observations. This helps you understand the relative proportions of each racial group in the dataset.
  3. Mode: In this context, the mode represents the most common race category in the dataset.
  4. Bar Chart: A visual representation of the frequency or percentage distribution of different race categories using bar charts can provide a more intuitive view of the data.
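Here is a sketch of how these summaries can be computed with pandas; the file name and the “age”/“race” column names are assumptions about how the project data is stored.

```python
# Descriptive statistics for a numerical column (age) and a categorical column (race).
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("washington_shootings.csv")   # placeholder file name

# Numerical summary of age: count, mean, std, min/max, quartiles
print(df["age"].describe())
print("Mode:", df["age"].mode().iloc[0])
print("Range:", df["age"].max() - df["age"].min())

# Categorical summary of race: frequency table and percentage distribution
counts = df["race"].value_counts()
print(counts)
print(counts / counts.sum() * 100)             # percentage of each category

counts.plot(kind="bar")                        # bar chart of the frequencies
plt.show()
```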

Once the data is cleaned, the age column is completely numeric, so I plotted the Q-Q plot.
A Q-Q plot compares the data to an expected theoretical distribution, typically the normal distribution. A straight line indicates close agreement with the theoretical distribution, while deviations from the line suggest departures from it. In data analysis and statistics, it is a tool for assessing model fit, identifying outliers, and checking data normality.

Our Q-Q plot looks like this

The Quantile-Quantile plot, or Q-Q plot for short, shows us how closely the quantiles (percentiles) in a dataset match the predicted values from a theoretical distribution, most commonly the normal distribution. A straight line connecting all of the dots in the Q-Q plot indicates that the dataset closely resembles the theoretical distribution.
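A sketch of producing the Q-Q plot against a normal distribution with scipy’s probplot, reusing the `df` and “age” column assumed in the previous sketch:

```python
# Q-Q plot of the cleaned age column against the normal distribution.
import matplotlib.pyplot as plt
from scipy import stats

stats.probplot(df["age"].dropna(), dist="norm", plot=plt)
plt.title("Q-Q plot of age against a normal distribution")
plt.show()
```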

Once we read the data, we moved on to Race: we split the data into the 6 race categories and calculated the mean, median, standard deviation, variance, skewness, and kurtosis for each.
Using all of these statistics, we get a clearer picture of, and insights into, the Washington police shooting data.

We also used the T-test and the ANOVA test.
An analysis of variance, or ANOVA, is a statistical test that compares the means of three or more groups to see if there are any noteworthy differences between them. It evaluates whether there is more variation between group means than would be expected by chance. An ANOVA provides an overall significance level, which indicates whether at least one group differs considerably from the others. If it is significant, post-hoc tests can determine which particular groups differ. In experimental research, ANOVA is frequently used to examine how various factors or treatments affect a dependent variable and to test the null hypothesis.
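A sketch of a one-way ANOVA of age across the race groups with scipy’s f_oneway, again assuming the `df` and column names from the earlier sketch:

```python
# One-way ANOVA: does mean age differ across race categories?
from scipy import stats

groups = [g["age"].dropna() for _, g in df.groupby("race")]
f_stat, p_value = stats.f_oneway(*groups)
print("F =", f_stat, "p =", p_value)   # a small p suggests at least one group mean differs
```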

Week-7 (10/28) K-4.

K-4 Algorithm

Data clustering is a vital tool in unsupervised machine learning, and one way to perform it is the K-4 algorithm, sometimes referred to as K-4 clustering. Clustering groups comparable data items according to their shared attributes. Specifically, K-4 is a variant of the widely recognized K-Means method.
“K” in the K-4 method stands for the number of clusters to be formed. K-4 explicitly seeks to separate the data into four different clusters, in contrast to K-Means, which often focuses on identifying a fixed number of clusters (K). When you have prior knowledge or a particular application that needs precisely four clusters, this can be helpful.

The K-4 algorithm works as follows: first, data points are initially assigned to clusters. Then, to reduce the within-cluster variance, these assignments are iteratively refined. The data are then divided into four clusters as a consequence of this refinement process, which is continued until convergence.

Similar to K-Means, K-4 is a flexible method that has uses in data analysis, image segmentation, and consumer segmentation, among other domains. You must select a suitable value for K (four in this case) depending on the particular issue and dataset that you are dealing with. When used appropriately, the K–4 algorithm can be a potent tool for pattern recognition and data management.
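Since K-4 here amounts to K-Means with K fixed at 4, a minimal sketch with scikit-learn might look like this (the synthetic data is illustrative only):

```python
# K-Means with exactly four clusters on toy data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=4, random_state=42)   # toy data with 4 groups
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)   # the four learned cluster centers
print(kmeans.labels_[:10])       # cluster assignments of the first few points
```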

Week-7 (10/23) K-Means.

Lemniscate

In quantitative analysis, a lemniscate is a mathematical curve that resembles a figure-eight (∞).

It denotes a distinct link between two variables that exhibits a balanced, symmetric correlation. This form denotes a complex, entangled relationship between the variables, which frequently necessitates the use of specialized approaches for proper modeling and interpretation. It’s an important idea to understand when dealing with non-linear connections and improving the accuracy of statistical models in a variety of analytical domains.

Dbscan

DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a popular unsupervised clustering algorithm in data analysis.

It classifies data points according to their density, defining clusters as locations with many close points and labeling solitary points as noise. It does not require a prior specification of the number of clusters, making it suitable for a wide range of data types and sizes. DBSCAN distinguishes between core points (in dense areas), border points (on cluster edges), and noise points. It handles irregularly shaped data effectively and is excellent for locating clusters in spatial or density-related datasets, such as identifying hotspots in geographic data.
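A brief sketch of DBSCAN on noisy, non-spherical toy data (two interleaving half-moons), where density-based clustering shines; the eps and min_samples values are illustrative choices.

```python
# DBSCAN on two half-moon shaped clusters; -1 labels mark noise points.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))   # cluster labels found; -1 indicates noise
```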

Clustering

Clustering is a technique that groups data points based on their similarity. It aims to discover patterns, structures, or correlations in data. It helps to get insights, and trends, and simplify complex datasets and decision-making by combining related data points into clusters.

    1. K-Means: Divides data into ‘k’ clusters.
    2. Hierarchical: Organizes data into a tree-like structure.
    3. DBSCAN: Identifies clusters based on data point density.
    4. Mean Shift: Finds density peaks in the data. 

K-means

K-means is a popular clustering technique in machine learning; when K (the number of clusters) is set to 2, it partitions the data into two distinct groups.

The technique begins by selecting two starting cluster centers at random and then allocates each data point to the nearest center. It computes the (typically Euclidean) distance between each point and the cluster centers, then assigns each point to the cluster with the closest center. This process is repeated until the assignment of points to clusters changes very little.

The approach attempts to minimize within-cluster variation by bringing data points inside the same cluster as close together as possible. It optimizes by recalculating cluster centers as the mean of each cluster’s data points. Because K-means can converge to a local minimum, resulting in different results on various runs, it’s typical to run it numerous times and choose the best result.

The K=2 case is handy for binary clustering (0 and 1), such as categorizing email as spam or not spam. It’s also used in market segmentation, detecting customer preferences, and any other situation where data can be divided into two distinct groups, making it a key tool.

K-means with K=4 is the same algorithm configured to separate a dataset into four independent clusters. It works by assigning data points to the nearest cluster center based on similarity and then recalculating the centers as the mean of the data points in each cluster. This method is effective for categorizing data into four separate groups, which can help with customer segmentation, image compression, or any other situation where dividing the data into four relevant categories is critical for analysis and decision-making.
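For completeness, a short sketch of the K=2 case, again on illustrative synthetic data:

```python
# K-Means with two clusters, then assigning points to the nearest learned center.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=2, random_state=1)
km2 = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
print(km2.predict(X[:5]))   # each point is assigned label 0 or 1
```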

Week-6 (10/20) Working on Project.

Today I worked on my project outline and mapped out the structure of the whole project.

Label Encoding: In label encoding, each category is assigned a unique numerical value. However, because it assumes an ordinal relationship between the categories, this may not be appropriate for all algorithms. For this reason, Scikit-learn provides the LabelEncoder.

One-Hot Encoding: For each category, this approach generates binary columns. Each category is converted into a binary vector, with 1s in the respective category column and 0s everywhere else. This is more appropriate when there is no intrinsic order in the categories. One-hot encoding is made simple by libraries like pandas.

Dummy Variables: When using one-hot encoding, you may encounter multicollinearity problems, in which one column may be predicted from the others. You can use n-1 columns in this situation and drop one category as a reference.
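A quick sketch of these three encodings on a toy column; the data is purely illustrative.

```python
# Label encoding, one-hot encoding, and dummy variables on a small example column.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df_toy = pd.DataFrame({"race": ["Asian", "Black", "White", "Black", "Hispanic"]})

# Label encoding: each category becomes an integer (implies an ordering)
df_toy["race_label"] = LabelEncoder().fit_transform(df_toy["race"])

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df_toy["race"], prefix="race")

# Dummy variables: drop one category as the reference to avoid multicollinearity
dummies = pd.get_dummies(df_toy["race"], prefix="race", drop_first=True)

print(df_toy, one_hot, dummies, sep="\n\n")
```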

Frequency Counts: Features can be created based on the frequency of each category. This is beneficial if the number of categories is important information for your analysis.

Target Encoding (Mean Encoding): In some circumstances, the mean of the target variable for that category might be used to substitute categories. This can help with regression difficulties. However, concerns such as target leakage and overfitting must be addressed.

Missing Data: Determine how to handle missing categories. You can construct a separate “missing” category or use methods like mode, median, or a specific value to impute missing values.

Visualization: To show the distribution of categorical data, use plots such as bar charts, histograms, and pie charts. This can help you better grasp the facts.
Statistical Tests: To test for independence between two categorical variables, use statistical tests such as Chi-Square or Fisher’s exact test.

Feature Selection: In machine learning, feature selection strategies may be required to determine the most essential categorical variables for your model.

Tree-Based Models: Decision trees, random forests, and gradient-boosting models can often handle categorical data without requiring one-hot encoding, depending on the implementation; they partition nodes based on categorical features.

Cross-Validation: Use suitable cross-validation approaches when working with categorical data in machine-learning models to avoid data leaking and overfitting.

Handling High Cardinality: High cardinality refers to categorical variables with a large number of distinct categories. Techniques such as target encoding, grouping similar categories, and employing embeddings may be useful in such cases.

Week-6 (10/18) Working on numerical data.

Stats

Minimum = 13.
Maximum = 88.
Mean = 32.6684
Median = 31.
stdev = 11.377
Skewness = 0.994064
Kurtosis = 3.91139
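These statistics can be computed along the following lines; the array here is a small illustrative sample, not the project’s age column.

```python
# Summary statistics: min, max, mean, median, stdev, skewness, and kurtosis.
import numpy as np
from scipy import stats

ages = np.array([19, 22, 25, 27, 29, 31, 31, 33, 36, 40, 47, 55, 68], dtype=float)

print("Min/Max:", ages.min(), ages.max())
print("Mean:", ages.mean())
print("Median:", np.median(ages))
print("Stdev:", ages.std(ddof=1))
print("Skewness:", stats.skew(ages))
print("Kurtosis:", stats.kurtosis(ages, fisher=False))   # fisher=False: ~3 for a normal
```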

The statistics mentioned in class, mean, median, standard deviation, skewness, and kurtosis, provide valuable insights into the distribution of data like spread, central tendency, and shape of the data distribution.

  • Mean: The mean describes the central tendency of the data; the mean age reveals the average age of the individuals in our sample.
  • Median: When the data is sorted in ascending order, the median is the middle value. The median is a measure of central tendency that is less sensitive to outliers or extreme values than the mean.
  • Standard Deviation (Stdev): The standard deviation is a measure of the spread or dispersion of the data. A higher standard deviation indicates that ages are more variable, whereas a smaller standard deviation indicates that ages are closer to the mean. It indicates how far the data points differ from the mean age.
  • Skewness: Skewness measures the asymmetry of the data distribution. A positive skew indicates that the data is skewed to the right, with the right side of the distribution having a longer tail. In the context of an age column, a positive skew suggests that most individuals are relatively young, with a smaller number of older individuals forming the long right tail.
  • Kurtosis: This measures the “tailedness” of the data distribution. High kurtosis shows heavy tails, or more extreme values in the data, whereas low kurtosis indicates lighter tails. Positive kurtosis implies that the distribution has more outliers or extreme values, whereas negative kurtosis suggests that the distribution contains fewer outliers.

These statistics can help you understand the age distribution in your dataset.

  • The mean (32.67) and median (31) differ, so the distribution of ages is skewed to the right.
  • The skewness is approximately 1, confirming the right skew.
  • The kurtosis (3.91) is slightly greater than the value of 3 expected for a normal distribution.
  • Apart from the right skew and slightly heavier tails, the age distribution broadly resembles a normal distribution.

T-Test

A t-test is a statistical hypothesis test that is used to assess whether or not there is a significant difference in the means of two groups or conditions. It’s a common approach for comparing the means of two samples to determine whether the observed differences are due to a true effect or just random variation.

There are two types of t-test:

  • Independent Two-Sample T-Test: This test is used when comparing the means of two independent groups or samples.
  • Paired T-Test: This test is used when comparing the means of two related groups, such as before and after measurements on the same subjects or matched pairs of observations.

The decision to reject the null hypothesis in both t-tests is determined by the p-value and the significance threshold (usually 0.05). If the p-value is below the significance level, the null hypothesis is rejected, indicating a significant difference. If the p-value is greater than the significance level, you keep the null hypothesis because there is insufficient evidence for a meaningful difference.
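A sketch of an independent two-sample t-test with scipy; the two groups below are synthetic stand-ins for, say, the ages of two race groups in the project.

```python
# Welch's independent two-sample t-test on two synthetic groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=33, scale=10, size=120)   # illustrative group 1
group_b = rng.normal(loc=40, scale=12, size=150)   # illustrative group 2

t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print("t =", t_stat, "p =", p_value)
if p_value < 0.05:
    print("Reject the null hypothesis: the group means differ significantly.")
else:
    print("Fail to reject the null hypothesis.")
```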

Cohen’s d

Cohen’s d is a popular measure of effect size that has been elaborated upon by various scholars, including Sawilowsky.
Sawilowsky’s extension contains additional criteria and interpretations for Cohen’s d values, offering more context for interpreting effect sizes.

Sawilowsky Method

    • Calculate the mean first.
    • Calculate the pooled Standard Deviation.
    • Calculate the Cohen’s d.
      • A small Cohen’s d value (approx. 0.2) is a small size.
      • A medium Cohen’s d value (approx. 0.5) is a moderate size.
      • A large Cohen’s d value (approx. 0.8 or higher) is a large size.

How to calculate? (A code sketch follows these steps.)

  • Calculate the means of the two groups (M1 and M2).
  • Calculate the standard deviations of the two groups (SD1 and SD2).
  • Calculate the pooled standard deviation: pooled SD = sqrt( ((n1 − 1)·SD1^2 + (n2 − 1)·SD2^2) / (n1 + n2 − 2) )
  • Calculate Cohen’s d: d = (M1 − M2) / pooled SD
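A sketch of the calculation, reusing the two synthetic groups from the t-test sketch above:

```python
# Cohen's d with the pooled standard deviation.
import numpy as np

n1, n2 = len(group_a), len(group_b)
m1, m2 = group_a.mean(), group_b.mean()
s1, s2 = group_a.std(ddof=1), group_b.std(ddof=1)

pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
cohens_d = (m1 - m2) / pooled_sd
print("Cohen's d =", cohens_d)   # |d| around 0.2 small, 0.5 medium, 0.8+ large
```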

What do we get from the above?

  • Interpretation of Effect Size: Sawilowsky stressed that the magnitude of an effect should be interpreted in context. An effect size that is considered small in one domain may be significant in another, so we need to consider the setting.
  • Direction of Effect: As a measure of effect size, Cohen’s d does not by itself convey the direction of the effect. Sawilowsky suggests including direction information, such as “positive Cohen’s d” or “negative Cohen’s d,” to indicate whether the effect corresponds to an increase or decrease in the outcome.

In conclusion, Sawilowsky’s extension of Cohen’s d provides context-specific guidance that helps us interpret effect sizes. It recognizes that the relevance of an effect varies depending on the problem.