  Basic Concepts in Geostatistics

Transcript

- In this lecture we are going to study some statistics which are essential to geostatistics. The data are generated via experimental measurement. Statistics is a mathematical tool for the quantitative analysis of data: it summarizes observations, and as such it serves as the means by which we extract useful information from data. The lecture covers: population and sample; variables; the mean and the weighted mean; the standard deviation; the histogram and cumulative histogram; the probability distribution and the cumulative probability distribution, with examples; the normal distribution; the uniform distribution; and the central limit theorem. At the end, we make a summary.

Typically, the population is very large, making a complete enumeration of all the values in the population impossible. The sample is a subset of manageable size: samples are collected, and statistics are calculated from the samples. Consider, for example, the mean height of men. This is a hypothetical population because it includes all men that have lived, are alive, and will live in the future. It is impossible to measure the entire population because not all members are observable. Instead, we take a subset of this population, called a sample, and use this sample to draw inferences about the population under study, given some conditions. Thus we could measure the mean height of men in a sample of the population, which we call a statistic, and use it to draw inferences about the parameter of interest in the population. It is an inference because there will be some uncertainty and inaccuracy involved in drawing conclusions about a population based upon a sample.

A variable is a measure which can assume any of a prescribed set of values. A continuous variable can assume a continuum of values between any two given values; porosity is an example. A discrete, or categorical, variable can assume any of a specified finite or countable list of values; facies is an example. A random variable is a variable whose value is drawn from the possible outcomes of a random experiment; porosity or facies, for example, can be treated as random variables. As in basic math, variables represent something, and we can denote them with an x or a y, or any other letter for that matter, but in statistics it is conventional to use x to denote a random variable. The random variable takes on different values depending on the situation, and each value of the random variable has a probability associated with it.

The mean and the expected value, represented by the Greek letter mu, are used synonymously to refer to one measure of the central tendency, either of a probability distribution or of the random variable characterized by that distribution. The arithmetic mean, or simply the mean, of a sample $x_1, x_2, \ldots, x_n$ is the sum of the sampled values divided by the number of items in the sample:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The weighted mean is similar to an ordinary mean, except that instead of each of the data points contributing equally to the final average, some data points contribute more than others:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

Data elements with a high weight therefore contribute more to the weighted mean than elements with a low weight. The weights cannot be negative; some may be zero, but not all of them, since division by zero is not allowed. When the weights are normalized so that they sum to 1, the formula simplifies to $\bar{x}_w = \sum_{i=1}^{n} w_i x_i$.

The standard deviation, represented by the Greek letter sigma, measures the amount of variation or dispersion from the average. A low standard deviation indicates that the data points tend to be very close to the mean, also called the expected value; a high standard deviation indicates that the data points are spread out over a large range of values. The variance of data x is

$$\sigma_x^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_x)^2$$

where $\mu_x$ is the mean of the data. The unbiased variance replaces the $1/n$ factor with $1/(n-1)$:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

and the standard deviation of x is the square root of the variance, $\sigma_x = \sqrt{\sigma_x^2}$.
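As a concrete illustration of the location measures above, here is a minimal sketch in Python, assuming NumPy is available; the porosity values and weights are made-up numbers used only for illustration.

```python
# Minimal sketch of the mean and weighted-mean formulas above.
import numpy as np

porosity = np.array([0.12, 0.18, 0.15, 0.22, 0.09])  # hypothetical samples
weights  = np.array([1.0, 2.0, 1.0, 3.0, 1.0])       # hypothetical weights

mean  = porosity.sum() / len(porosity)                # arithmetic mean
wmean = (weights * porosity).sum() / weights.sum()    # weighted mean

# np.average implements the same two formulas directly:
assert np.isclose(mean,  np.average(porosity))
assert np.isclose(wmean, np.average(porosity, weights=weights))

print(f"mean = {mean:.4f}, weighted mean = {wmean:.4f}")
```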
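And a companion sketch for the spread measures just defined: NumPy's ddof argument switches between the 1/n (biased) and 1/(n-1) (unbiased) denominators.

```python
# Biased variance, unbiased variance, and standard deviation.
import numpy as np

x = np.array([0.12, 0.18, 0.15, 0.22, 0.09])  # same hypothetical samples
mu = x.mean()

var_biased   = ((x - mu) ** 2).sum() / len(x)        # 1/n version
var_unbiased = ((x - mu) ** 2).sum() / (len(x) - 1)  # 1/(n-1) version
std          = var_biased ** 0.5                     # standard deviation

assert np.isclose(var_biased,   np.var(x))           # ddof=0 is the default
assert np.isclose(var_unbiased, np.var(x, ddof=1))
assert np.isclose(std,          np.std(x))
```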
The histogram is a bar chart of the frequency of the sample values measured. A bin is a sample data range, and the frequency in each bin is equal to the number of samples falling into that bin divided by the total number of samples. The frequency of samples in a histogram bin can therefore be interpreted as the probability of a sample value falling in that bin. The cumulative histogram is a bar chart of the cumulative frequency of the sample values: the cumulative frequency in each bin is the total of the frequencies in all of the bins up to and including that bin.

A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence. A cumulative probability distribution is a table or an equation that links each outcome of a statistical experiment with its cumulative probability of occurrence. An example will make clear the relationship between random variables and probability distributions. Suppose you flip a coin two times. This simple statistical experiment has four possible outcomes: head-head, head-tail, tail-head, and tail-tail. Now let the variable X represent the number of heads that result from the experiment. X can take on the values 0, 1, or 2, and it is a random variable because its value is determined by the outcome of a statistical experiment. The probabilities associated with each number of heads are shown in the table below, which associates each outcome with its probability and is an example of a probability distribution of the random variable X.

  Number of heads x    P(X = x)    P(X <= x)
  0                    0.25        0.25
  1                    0.50        0.75
  2                    0.25        1.00

In the table, the cumulative probability refers to the probability that the random variable X is less than or equal to x; in general, a cumulative probability refers to the probability that the value of a random variable falls within a specified range. Returning to the coin-flip experiment: if we flip a coin two times, we might ask, what is the probability that the flips result in one or fewer heads? The answer is a cumulative probability: the probability that the experiment results in zero heads plus the probability that it results in one head, 0.25 + 0.50 = 0.75.

Two properties hold for any probability distribution, where X is the random variable and P is the probability of X. First, $0 \le P(X = x) \le 1$: the probability is always between 0 and 1, never negative and never greater than one. Second, $\sum_x P(X = x) = 1$: the sum of the probabilities is equal to 1.

The normal distribution is defined by the probability density function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation. The standard normal distribution is a special case of the normal distribution: it is the distribution that occurs when a normal random variable has a mean of zero and a standard deviation of one,

$$f(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2}$$

The graph of the normal distribution depends on two factors, the mean and the standard deviation. The mean of the distribution determines the location of the center of the graph, and the standard deviation determines the height and width of the graph: when the standard deviation is large, the curve is short and wide; when the standard deviation is small, the curve is tall and narrow. All normal distributions look like a symmetric, bell-shaped curve. The cumulative distribution function, cdf, of the normal distribution is the integral of the density, $F(x) = \int_{-\infty}^{x} f(t)\,dt$; for the standard normal distribution it is $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$. If the random variable is real-valued, the cumulative distribution function gives the probability that the random variable is no larger than a given value, and in the real-valued case the cdf is the integral of the pdf, provided that this function exists.

A uniform distribution, sometimes also known as a rectangular distribution, is a distribution that has constant probability density. On the interval $[a, b]$ the probability density function is $f(x) = \frac{1}{b-a}$ for $a \le x \le b$ and 0 otherwise, and the cumulative distribution function is $F(x) = \frac{x-a}{b-a}$ for $a \le x \le b$, with $F(x) = 0$ below a and $F(x) = 1$ above b.
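Both of these analytic distributions can be evaluated numerically. A minimal sketch, assuming SciPy is available; scipy.stats.norm and scipy.stats.uniform expose pdf and cdf methods, and the parameter values here are arbitrary.

```python
# Evaluating the normal and uniform pdf/cdf at a few points.
from scipy.stats import norm, uniform

mu, sigma = 10.0, 2.0   # hypothetical mean and standard deviation
a, b = 0.0, 4.0         # hypothetical interval for the uniform

print(norm(mu, sigma).pdf(10.0))  # peak of the bell: 1/(sigma*sqrt(2*pi))
print(norm(mu, sigma).cdf(10.0))  # 0.5 -- half the mass lies below the mean
print(norm(0, 1).cdf(0.0))        # standard normal: mean 0, std 1

print(uniform(loc=a, scale=b - a).pdf(2.0))  # constant 1/(b-a) = 0.25 inside [a, b]
print(uniform(loc=a, scale=b - a).cdf(3.0))  # (x-a)/(b-a) = 0.75
```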
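The discrete coin-flip distribution from earlier in this section can likewise be checked empirically. A small sketch, assuming NumPy: simulate many two-flip trials, then tabulate the frequency of each value of X (a histogram over the bins X = 0, 1, 2) and the running totals (the cumulative histogram), and compare them with the theoretical probabilities 0.25, 0.50, 0.25.

```python
# Empirical check of the two-coin-flip probability distribution.
import numpy as np

rng = np.random.default_rng(seed=0)
n_trials = 100_000
heads = rng.integers(0, 2, size=(n_trials, 2)).sum(axis=1)  # X per trial

for x in (0, 1, 2):
    freq = np.mean(heads == x)   # empirical P(X = x)
    cum  = np.mean(heads <= x)   # empirical P(X <= x)
    print(f"X = {x}:  P(X=x) ~ {freq:.3f}   P(X<=x) ~ {cum:.3f}")

# Expected output (up to sampling noise):
# X = 0:  P(X=x) ~ 0.250   P(X<=x) ~ 0.250
# X = 1:  P(X=x) ~ 0.500   P(X<=x) ~ 0.750
# X = 2:  P(X=x) ~ 0.250   P(X<=x) ~ 1.000
```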
The central limit theorem says that as the number of variables in a sum increases, the distribution of the sum of the random variables approaches the normal distribution, regardless of the shape of the distribution of the individual random variables. A small simulation of this appears after the summary below.

To summarize what we have studied in this lecture: a population is the set of entities under study, and a sample is a subset of the population. Variables are divided into continuous variables, discrete variables, and random variables. The mean and weighted mean describe the central tendency of the data, and the standard deviation describes the spread of the data. The histogram and cumulative histogram, and the probability distribution and cumulative probability distribution, describe the distribution of the data. Two examples, the normal distribution and the uniform distribution, were shown, and the central limit theorem was briefly addressed. Here is the end of the lecture.
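As promised above, a closing sketch of the central limit theorem, assuming NumPy: sums of uniform random variables, whose individual distribution is flat rather than bell-shaped, are standardized and compared against what the standard normal distribution predicts.

```python
# Central limit theorem demo: sums of uniforms look normal.
import numpy as np

rng = np.random.default_rng(seed=0)
n_vars, n_trials = 30, 50_000

u = rng.uniform(0.0, 1.0, size=(n_trials, n_vars))
s = u.sum(axis=1)                                  # sum of 30 uniforms

# Standardize: each uniform has mean 1/2 and variance 1/12.
z = (s - n_vars * 0.5) / np.sqrt(n_vars / 12.0)

# If the CLT holds, z should look standard normal: mean ~ 0, std ~ 1,
# and about 68% of the mass within one standard deviation of the mean.
print(f"mean = {z.mean():+.3f}, std = {z.std():.3f}")
print(f"P(-1 < z < 1) ~ {np.mean(np.abs(z) < 1):.3f}  (normal: 0.683)")
```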