This time we will create graphs to visualise the data. The data variables are the same ones used previously. You can find the Python program here and try running it on your machine as well.
There are two kinds of variables: the explanatory (independent) variable, plotted on the x-axis, and the response (dependent, or target) variable, plotted on the y-axis.
How do we decide which type of graph to construct?
It is decided using this flowchart.
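The flowchart itself is not reproduced here, so the following is only a minimal sketch of the kind of decision it describes. The function name `choose_graph` and the rules for mixed variable types are assumptions based on common practice, not taken from the original flowchart. The cases used in this post (histogram for a quantitative variable, bar chart for a categorical variable, scatter plot for two quantitative variables) match the sections below.

```python
def choose_graph(response_type, explanatory_type=None):
    """Suggest a graph type from variable types (hypothetical helper).

    Each type is 'categorical' or 'quantitative'. Leave explanatory_type
    as None for a univariate graph.
    """
    valid = {"categorical", "quantitative"}
    if response_type not in valid or explanatory_type not in valid | {None}:
        raise ValueError("types must be 'categorical' or 'quantitative'")

    # Univariate case: only one variable is being described.
    if explanatory_type is None:
        return "bar chart" if response_type == "categorical" else "histogram"

    # Bivariate cases.
    if explanatory_type == "quantitative" and response_type == "quantitative":
        return "scatter plot"
    if explanatory_type == "quantitative":
        # Assumption: bin the quantitative explanatory variable first.
        return "bar chart (after binning the explanatory variable)"
    # Categorical explanatory variable with either kind of response.
    return "bar chart"


print(choose_graph("quantitative"))                  # univariate
print(choose_graph("quantitative", "quantitative"))  # bivariate
```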
Univariate histogram for quantitative variables.
This graph is unimodal, with its highest peak in the median category of 15 to 18 years of age, i.e. 75% of the data. It appears to be skewed to the right, as the frequencies are higher in the lower categories than in the higher ones. The majority of people started smoking at the very young age of 10 to 20 years.
This graph is unimodal, with its highest peak in the median category of 18 to 20 years of age, i.e. 75% of the data. It appears to be skewed to the right, as the frequencies are higher in the lower categories than in the higher ones. Most people made smoking a habit between 15 and 25 years of age.
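The descriptions above (one peak, right skew) can be checked numerically as well as by eye. The sketch below uses made-up ages, not the actual data set, and the helper `bin_label` is hypothetical. It bins the ages, prints a text histogram, and compares the mean with the median; a mean above the median is a common rule of thumb for right skew, not a formal test.

```python
from collections import Counter
from statistics import mean, median

# Made-up ages for illustration only; not the values from the data set.
ages = [12, 13, 14, 15, 15, 16, 16, 16, 17, 17,
        17, 18, 18, 19, 20, 22, 25, 30, 38]


def bin_label(age, width=3, start=12):
    """Return the label of the fixed-width bin that contains age."""
    low = start + ((age - start) // width) * width
    return f"{low}-{low + width - 1}"


# Count how many ages fall into each bin.
counts = Counter(bin_label(a) for a in ages)

# Print a simple text histogram, ordered by the lower edge of each bin.
for label in sorted(counts, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>6} | {'#' * counts[label]}")

# Rule of thumb: the long right tail pulls the mean above the median.
is_right_skewed = mean(ages) > median(ages)
print("mean:", round(mean(ages), 2), "median:", median(ages))
print("looks right-skewed:", is_right_skewed)
```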
Univariate graphs for categorical variables.
This graph is unimodal, with its highest peak in category 1.0, which indicates that the person has been an active smoker over the past 12 months.
Scatter plots where both variables are quantitative.
The graph above plots the age at which the EPISODE OF NICOTINE DEPENDENCE BEGAN against the age at which the person started SMOKING REGULARLY. The scatter plot shows a weak positive relationship between the two variables.
The graph above plots the age at which the EPISODE OF NICOTINE DEPENDENCE BEGAN against the age at which the person started SMOKING. The scatter plot shows a weak positive relationship between the two variables.
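One way to put a number on the strength of a trend seen in a scatter plot is the Pearson correlation coefficient, which ranges from -1 to 1. This is a sketch on made-up value pairs, not the original data, and the function `pearson_r` is written here for illustration rather than taken from the original program.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up ages for illustration only.
age_started_smoking = [12, 14, 15, 16, 18, 20, 22, 25]
age_dependence_began = [20, 18, 30, 22, 35, 25, 40, 28]

r = pearson_r(age_started_smoking, age_dependence_began)
print("Pearson r:", round(r, 3))
```

A value close to 0 indicates a weak linear relationship and a value close to 1 a strong positive one.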
The graph shows that a person has a higher chance of experiencing an episode of nicotine dependence if they are an active smoker.
Thus, the chosen variables are related to one another.
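A bar chart of this kind usually shows, for each category of the explanatory variable, the proportion of people with the response. Below is a minimal sketch of that calculation on made-up records; the 1/0 coding and the variable names are assumptions for illustration and do not come from the data set.

```python
from collections import defaultdict

# Made-up records: (active_smoker, nicotine_dependence), coded 1 = yes, 0 = no.
records = [
    (1, 1), (1, 1), (1, 0), (1, 1), (1, 0), (1, 1),
    (0, 0), (0, 0), (0, 1), (0, 0), (0, 0), (0, 0),
]

# Collect the response values for each category of the explanatory variable.
by_group = defaultdict(list)
for active_smoker, dependence in records:
    by_group[active_smoker].append(dependence)

# The mean of a 0/1 variable is the proportion of 1s in that group.
rates = {group: sum(values) / len(values) for group, values in by_group.items()}

for group in sorted(rates):
    print(f"active_smoker={group}: dependence rate {rates[group]:.2f}")
```

These per-group proportions are the bar heights such a chart would display.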