
What Is a Q-Q Plot?

Transcript

- [Voiceover] We are going to study what a Q-Q plot is, how to construct a Q-Q plot, and how to interpret a Q-Q plot, and at the end we will make a summary.
P-plot stands for "probability" plot, taking the first letter P from the word probability. Q-Q plot stands for "quantile versus quantile" plot, taking the first letter Q from the word quantile. A Q-Q plot is one kind of P-plot. P-P plot stands for "probability versus probability" or "percent versus percent" plot, taking the first letter P from the word probability or percent. A P-P plot is also a kind of P-plot. The Q-Q plot is more commonly used than the P-P plot.
A Q-Q plot is a probability plot for assessing how closely two data sets agree by plotting their quantiles against each other. Consider two probability distributions with CDFs F and G and associated quantile functions F-inverse and G-inverse, where the quantile function is the inverse function of the CDF. The Q-Q plot draws the q-th quantile of F against the q-th quantile of G for a range of values of q. In other words, a Q-Q plot is a cross-plot of matching quantile values from two cumulative distributions at the same cumulative probability. Thus the input is a cumulative probability, and the output is a pair of numbers giving the quantile of F and the quantile of G that fall at that cumulative probability. For a discrete distribution, linear interpolation can be used to derive the quantile at a given cumulative probability.
The main step in constructing a Q-Q plot is to calculate or estimate the quantiles to be plotted. Quantiles are points taken at regular intervals from the cumulative distribution function, CDF, of a random variable. The motivation for quantiles is to divide the ordered data into essentially equal-sized subsets. In this example, when the CDF is equal to 0.2, the data value is equal to 2 and the normal value is equal to -0.85, so 2 and -0.85 are cross-plotted in the Q-Q plot. Following this procedure, we can add more scatter points to the Q-Q plot for CDF equal to 0.4, 0.6, 0.8, and 1.
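
The construction procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method used to make the plot in the video: the function name `qq_pairs` and the sample values are hypothetical, and the plotting position `(i - 0.5) / n` is an assumed convention chosen so that no point sits at a cumulative probability of exactly 1, where the normal quantile is not finite. For reference, the standard library gives a standard normal quantile of about -0.84 at a CDF of 0.2, close to the -0.85 read off in the example.

```python
from statistics import NormalDist


def qq_pairs(data):
    """Return (normal_quantile, data_quantile) pairs for a normal Q-Q plot.

    Assumption: the i-th smallest of n values is assigned the cumulative
    probability (i - 0.5) / n. Other plotting positions are also in use.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist(0.0, 1.0)
    pairs = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # cumulative probability for this point
        pairs.append((std_normal.inv_cdf(p), value))
    return pairs


# Hypothetical data values, invented for illustration only.
sample = [9.0, 2.0, 5.0, 7.0, 4.0]
for normal_q, data_q in qq_pairs(sample):
    print(f"normal quantile {normal_q:+.2f}  data quantile {data_q}")
```

Each printed pair is one scatter point of the Q-Q plot, with the normal quantile on one axis and the data quantile on the other.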
If one or both of the axes in a Q-Q plot is based on a theoretical distribution with a continuous cumulative distribution function, CDF, all quantiles are uniquely defined and can be obtained by inverting the CDF. If one or both of the axes is based on a discrete distribution, interpolated quantiles may be plotted.
Differences between two distributions in center, or mean, in spread, or variance, and in shape can be identified in a Q-Q plot. They are more obvious there than when looking at two histograms side by side. When all points on a Q-Q plot fall on the 45 degree line, the two distributions are the same. In this example, the Log Porosity distribution is the same as the Core Porosity distribution.
A systematic departure above or below the 45 degree line indicates that the centers, or means, of the distributions are different. In this example, Core Porosity values are higher than Log Porosity values, and in the Q-Q plot the scatter points are shifted above the 45 degree line. In this example, Core Porosity values are lower than Log Porosity values, and in the Q-Q plot the scatter points are shifted below the 45 degree line.
A slope different from that of the 45 degree line indicates that the spreads, or variances, of the two distributions are different. In this example, Core Porosity variance is higher than Log Porosity variance. The higher the variance, the wider the spread. In the Q-Q plot, the slope is greater than 1. In this example, Core Porosity variance is lower than Log Porosity variance. The lower the variance, the narrower the spread. In the Q-Q plot, the slope is less than 1.
In this example, Core Porosity and Log Porosity have different distribution shapes. In the Q-Q plot, the scatter points form a curved line.
A Q-Q plot can also compare the quantiles of a data distribution with the quantiles of a standardized theoretical distribution from a specified family of distributions. The construction of such a Q-Q plot does not require that the location or scale parameters of F be specified.
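
The interpretation rules for comparing two data sets (a shift off the 45 degree line for different means, a slope different from 1 for different variances) can be checked numerically with a small sketch. Everything here is an illustrative assumption: the porosity numbers are invented, the helper names `qq_two_sample` and `fit_line` are hypothetical, the two samples are assumed to have equal size so that sorted values are matching quantiles, and Core Porosity is placed on the vertical axis to match the examples.

```python
def qq_two_sample(x_values, y_values):
    """Pair matching quantiles of two equal-sized samples as (x, y) points."""
    if len(x_values) != len(y_values):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(x_values), sorted(y_values)))


def fit_line(pairs):
    """Least-squares slope and intercept of y against x for the Q-Q points."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical porosity values in percent, invented for illustration only.
log_porosity = [10.0, 12.0, 14.0, 16.0, 18.0]
core_shifted = [v + 2.0 for v in log_porosity]       # same spread, higher mean
core_wider = [2.0 * v - 14.0 for v in log_porosity]  # same mean, larger variance

shift_slope, shift_intercept = fit_line(qq_two_sample(log_porosity, core_shifted))
wide_slope, wide_intercept = fit_line(qq_two_sample(log_porosity, core_wider))
print("shifted case: slope", shift_slope, "intercept", shift_intercept)
print("wider case:   slope", wide_slope, "intercept", wide_intercept)
```

In the shifted case the fitted line has slope 1 and a positive intercept, so the points run parallel to and above the 45 degree line. In the wider case the slope is 2, greater than 1, which reflects the larger variance on the vertical axis.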
The theoretical quantiles are computed from a standard distribution within the specified family. A linear point pattern indicates that the specified family reasonably describes the data distribution, and the location and scale parameters can be estimated visually as the intercept and slope of the linear pattern. The linearity of the point pattern on a Q-Q plot is unaffected by changes in location or scale. On a Q-Q plot, the reference line representing a particular theoretical distribution depends on the location and scale parameters of that distribution. In summary, the Q-Q plot is more commonly used than the P-P plot.
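
The statement that location and scale can be read off as the intercept and slope of the linear pattern can be illustrated with a short sketch. The assumptions are loud ones: the family is taken to be normal, the function name `normal_qq_fit` is hypothetical, the plotting position `(i - 0.5) / n` is an assumed convention, and the test data are idealized values placed exactly on the quantiles of a normal distribution with a hypothetical location of 20 and scale of 3. Real data would scatter around the line instead of lying exactly on it.

```python
from statistics import NormalDist


def normal_qq_fit(data):
    """Fit a line to a normal Q-Q plot and return (location, scale) estimates.

    The intercept estimates the location and the slope estimates the scale,
    because the theoretical quantiles come from the standard normal.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_z = sum(z) / n
    mean_x = sum(ordered) / n
    szz = sum((a - mean_z) ** 2 for a in z)
    szx = sum((a - mean_z) * (b - mean_x) for a, b in zip(z, ordered))
    slope = szx / szz
    intercept = mean_x - slope * mean_z
    return intercept, slope


# Idealized, hypothetical data: exact quantiles of a normal with
# location 20 and scale 3, at the same plotting positions.
n_points = 50
ideal = [NormalDist(20.0, 3.0).inv_cdf((i - 0.5) / n_points)
         for i in range(1, n_points + 1)]

location, scale = normal_qq_fit(ideal)
print(f"estimated location {location:.3f}, estimated scale {scale:.3f}")

# Changing location and scale moves the line but keeps the pattern linear.
rescaled = [2.0 * v + 5.0 for v in ideal]
location2, scale2 = normal_qq_fit(rescaled)
print(f"after rescaling: location {location2:.3f}, scale {scale2:.3f}")
```

The second fit shows that rescaling the data changes only the intercept and slope, which is why the standard member of the family is enough to construct the plot.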