Summary
In this lecture, you will learn what a P-P plot is, how to construct a P-P plot, and how to interpret a P-P plot.
Transcripts
- [Voiceover] We are going to study what a P-P plot is, how to construct a P-P plot, and how to interpret a P-P plot. At the end, we will make a summary. P plot stands for probability plot, taking the first letter P from the word probability. P-P plot stands for probability versus probability, or percent versus percent, plot, taking the first letter P from the word probability or percent. The P-P plot is one kind of P plot. Q-Q plot stands for quantile versus quantile plot, taking the first letter Q from the word quantile. The Q-Q plot is also a kind of P plot, and it is more commonly used than the P-P plot. A P-P plot is a probability plot for assessing how closely two data sets agree. It plots the two cumulative distribution functions against each other: given two probability distributions with cdfs F and G, it plots the point (F(z), G(z)) as z ranges from negative infinity to positive infinity. In other words, a P-P plot is a cross-plot of the matching cumulative probabilities from the two cumulative distributions at a given quantile value z. The input is z. The output is a pair of numbers giving what percentage of F and what percentage of G fall at or below z. For a discrete distribution, linear interpolation can be used to derive the percentage at a given z. P-P plots are widely used to evaluate the skewness of a distribution. Because a cdf has a range of zero to one, the domain of this parametric graph is negative infinity to positive infinity, and the range is the unit square, [0, 1] times [0, 1]. Constructing a P-P plot means calculating or estimating the cumulative distributions to be plotted at the given data values. In this example, when log porosity and core porosity both equal 0.3, the cumulative frequency of log porosity is 0.3 and the cumulative frequency of core porosity is 0.3, so we cross-plot 0.3 versus 0.3 on the P-P plot. Following this procedure for other values, we obtain the scatter points of the P-P plot.
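The two-sample construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the lecture: the helper names `ecdf` and `pp_points` are hypothetical, the porosity values are made up, and the empirical cdfs are plain step functions with no linear interpolation.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a step function giving the fraction of sample values <= z."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(z):
        # bisect_right counts how many ordered values are <= z
        return bisect_right(ordered, z) / n

    return cdf


def pp_points(sample_f, sample_g):
    """Evaluate both empirical cdfs at every observed value z."""
    F, G = ecdf(sample_f), ecdf(sample_g)
    grid = sorted(set(sample_f) | set(sample_g))
    return [(F(z), G(z)) for z in grid]


# Made-up porosity values, for illustration only (not the lecture's data).
log_porosity = [0.10, 0.15, 0.20, 0.25, 0.30]
core_porosity = [0.12, 0.18, 0.22, 0.28, 0.35]

points = pp_points(log_porosity, core_porosity)
for p_log, p_core in points:
    print(f"{p_log:.2f}  {p_core:.2f}")
```

Each printed pair is one scatter point of the P-P plot. Every point lies in the unit square, and the sequence ends at (1, 1), where both samples are fully accumulated.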
When a P-P plot compares the empirical cumulative distribution function, or ecdf, of a variable with a specified theoretical cumulative distribution function F, the ecdf, denoted Fn(x), is defined as the proportion of nonmissing observations less than or equal to x, so that Fn(x(i)) equals i divided by n. To construct a P-P plot, the n nonmissing values are first sorted in increasing order: x(1) less than or equal to x(2), and so on, up to x(n). Then the i-th ordered value x(i) is represented on the plot by the point whose x-coordinate is F(x(i)) and whose y-coordinate is i divided by n. Like Q-Q plots, P-P plots can be used to determine how well a theoretical distribution models a data distribution. P-P plots are not invariant to changes in location and scale. The comparison line is the 45-degree line from (0, 0) to (1, 1). The distributions are equal if, and only if, the plot falls on this line. Any deviation indicates a difference between the distributions. Ecdf is the empirical cumulative distribution function. Tcdf is the theoretical cumulative distribution function. If the tcdf reasonably models the ecdf in all respects, that is, distribution shape, location, and scale, the scatter point pattern of the P-P plot is linear through the origin and has unit slope. If the theoretical distribution has a lower mean than the empirical distribution, the scatter point pattern on the P-P plot departs below the 45-degree line. The differences are bigger in higher-density regions. If the theoretical distribution has a higher mean than the empirical distribution, the scatter point pattern on the P-P plot departs above the 45-degree line. Again, the differences are bigger in higher-density regions. If the theoretical distribution has a lower standard deviation than the empirical distribution, the scatter point pattern on the P-P plot departs from the 45-degree line.
If the theoretical distribution has a higher standard deviation than the empirical distribution, the scatter point pattern on the P-P plot departs from the 45-degree line in the opposite way to a theoretical distribution with a lower standard deviation. A P-P plot compares the empirical cumulative distribution function of a data set with a specified theoretical cumulative distribution function F. The construction of a P-P plot requires the location and scale parameters of F to evaluate the cdf at the ordered data values. On a P-P plot, changes in location or scale don't necessarily preserve linearity. On a P-P plot, the reference line for any distribution is always the diagonal line y equal to x. An advantage of P-P plots is that they are discriminating in regions of high probability density, since in these regions the empirical and the theoretical cumulative distributions change more rapidly than in regions of lower probability density.
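The comparison of an ecdf with a theoretical cdf, and the effect of a mismatched mean, can be sketched as follows. This is only an illustration under stated assumptions: the theoretical distribution is taken to be normal, the data values are made up, and the names `normal_cdf` and `pp_points_theoretical` are hypothetical. Note that the location and scale (`mu`, `sigma`) must be supplied before the cdf can be evaluated.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal cdf, written with the error function from the standard library."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def pp_points_theoretical(data, cdf):
    """Return the points (F(x_(i)), i/n) for the sorted data values."""
    ordered = sorted(data)
    n = len(ordered)
    return [(cdf(x), i / n) for i, x in enumerate(ordered, start=1)]


# Made-up sample, roughly centered on zero (illustration only).
data = [-1.2, -0.7, -0.3, 0.0, 0.2, 0.5, 0.9, 1.4]

# Theoretical mean below, and above, the centre of the data.
low_mean = pp_points_theoretical(data, lambda x: normal_cdf(x, mu=-1.0))
high_mean = pp_points_theoretical(data, lambda x: normal_cdf(x, mu=1.0))

# Count points under / over the 45-degree reference line y = x.
below_low = sum(1 for x, y in low_mean if y < x)
above_high = sum(1 for x, y in high_mean if y > x)
print(below_low, "of", len(data), "points below the line (lower theoretical mean)")
print(above_high, "of", len(data), "points above the line (higher theoretical mean)")
```

With the i divided by n convention, the last point always has y-coordinate 1, so it cannot fall below the diagonal even when the theoretical mean is too low; the remaining points show the departure described in the transcript.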