  2. Types of Analytics

Transcript

- To better understand how a corporation may work with a data scientist, let's discuss the workflow. The first step is to define the basic question to be answered by the data science team. This requires close interaction between the client, their scientists or engineers, and the data scientists to define the scope and expectations according to the current data quality and availability. Once the goals and scope are clear, the data scientists will do some investigation and come up with a series of models that they believe can provide the highest level of inference, prediction, and decision capability. Once those have been defined, the data scientists will quickly prototype and validate each of them. The validation involves benchmarking the prototypes against multiple metrics to ensure that they fall within an acceptable performance threshold. After the best prototypes have been rolled out, the data scientist will prepare a demonstration and present it to the client. If the prototype does not solve the business issue, then the business can decide whether to iterate further without changing the scope, data, or model, or to de-prioritize the project. However, if a prototype is successful, then software engineers will be able to proceed with productizing the prototype over multiple iterations.

The data science execution workflow demonstrates several key components. First, the most important thing is the question to be answered. Second, the second most important thing is the data. Third, the amount and quality of the data available limits or enables the question. Having data may not guarantee the success of the project if the initial questions were not posed clearly. Last but not least, watch out for answers without sufficient data to support them. That is, the quality of the model may depend heavily on the availability of data.

The work of a data scientist can be broken down into three levels of analytics. The first level focuses on exploring past and present data with the aid of statistical and visualization tools. These methods are widely adopted in the industry. This is referred to as "descriptive analytics." The next level focuses on developing models to predict unseen outcomes from historical data. There is a plethora of models available, and this is where most research is currently happening. This is referred to as "predictive analytics." The last level focuses on seeking an optimal business decision by using predictive models. This is considered the analytics with the highest value for the business, and it is referred to as "prescriptive analytics." In the next slides we will talk in more detail about these types of analytics with some examples.

Descriptive analytics provides insights from past data and seeks to answer what and why something happened. Descriptive analytics aims at quantitatively describing the main features of a collection of data. It uncovers hidden relationships and infers causal drivers. This process usually involves statistical analysis, correlations, data cleansing, visualization, dimensionality reduction, cluster analysis, and variable selection. For example, at the left, descriptive analytics has been used to identify potential outliers when correlating permeability and porosity. There are multiple techniques for outlier treatment, but all seek to avoid the generation of misleading interpretations (see the first sketch below). At the right, the K-means algorithm has been used to group lithofacies from multiple depositional environments through petrographic analysis. Each of the lithofacies trends, which may be represented by thousands of statistical attributes, is mapped into a 2D plot with the aid of dimensionality reduction techniques (see the second sketch below).
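A minimal sketch of this kind of outlier screening, assuming synthetic core data and a robust three-sigma cutoff on the residuals of a log-linear porosity-permeability fit (the data, names, and threshold are all illustrative assumptions, not the course's actual workflow):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical core measurements: porosity as a fraction,
# permeability handled in log space (a common convention).
porosity = rng.uniform(0.05, 0.30, 200)
log_perm = 2.0 + 10.0 * porosity + rng.normal(0.0, 0.3, 200)
log_perm[:5] += 3.0  # inject a few spurious measurements

# Fit the usual log-linear porosity-permeability trend...
slope, intercept = np.polyfit(porosity, log_perm, 1)
residuals = log_perm - (slope * porosity + intercept)

# ...and flag points whose residual exceeds three robust standard
# deviations (median absolute deviation scaled to a normal sigma).
mad = np.median(np.abs(residuals - np.median(residuals)))
robust_std = 1.4826 * mad
outliers = np.abs(residuals) > 3 * robust_std
print(f"Flagged {outliers.sum()} potential outliers out of {len(porosity)}")
```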
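And a companion sketch of the lithofacies grouping: synthetic samples with many attributes are standardized, clustered with K-means, and projected to 2D with PCA as one common dimensionality reduction choice. The five-group structure, sample count, and attribute count are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical stand-in for lithofacies samples: each row is one
# sample described by many statistical attributes.
n_samples, n_attrs = 500, 1000
X = np.vstack([
    rng.normal(loc=c, scale=1.0, size=(n_samples // 5, n_attrs))
    for c in np.linspace(-2, 2, 5)  # five synthetic facies groups
])

X_scaled = StandardScaler().fit_transform(X)

# Reduce the thousands of attributes to two components for plotting...
X_2d = PCA(n_components=2).fit_transform(X_scaled)
print("2D embedding shape:", X_2d.shape)

# ...and let K-means group the samples into candidate lithofacies.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_scaled)
print("cluster sizes:", np.bincount(labels))
```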
In the case of predictive analytics, the process focuses on analyzing current and historical facts to make predictions about future events using labeled output data, for example, rock classification according to the depositional environment, or production forecasting. The two main types of prediction are classification and regression: in classification, the output variable takes class labels; in regression, the output takes continuous values. At the left, existing interpretations have been used to create a model for classifying shale, tight gas, and gas condensate formations; the well log signatures are mapped to one of these three formation classes. At the right, a regression model based on a recurrent neural network is used to predict the drilling dysfunctions associated with the torque profile seven minutes ahead in the future, with a slowly growing deterioration that you can see. (A minimal classification and regression sketch follows this section.)

Prescriptive analytics is the most challenging type of analytics, but it brings the highest value to the business since it directly affects the decision-making process. It is basically concerned with answering questions such as what, when, and why a process will happen in the future. That means prescriptive analytics relies on descriptive and predictive analytics to determine the best course of action for a given set of objectives. In our business, prescriptive analytics involves several challenges, including uncertainty and risk management, multi-objective and constrained optimization, effective selection of variables and models, and derivation of optimal learning strategies from dynamic objectives. At the left, we can observe how the use of a data-driven predictive model combined with a simulation model can deliver a higher level of production than the base case when accounting for geological uncertainty, namely low, high, and hyperactive channel realizations. At the right, we can use prescriptive analytics and physics to determine the optimal well placement by selecting the number of wells, well separation, number of stages, or cluster sizes in an unconventional play development. (An optimization sketch is included below.)

Besides the type of analytics, it is important to understand whether learning from data is performed in an unsupervised or supervised fashion. Unsupervised learning generally occurs when we look at the data without any connection to output information, for example, looking at static data such as geological attributes with no reference to well production. On the other hand, supervised learning seeks to connect data with a response or output; in this case, for example, we can mention how to relate completion or drilling parameters with NPV values. Nevertheless, there is a family of methods that lies between these two types of learning, namely semi-supervised learning. These methods are receiving a lot of attention for many practical scenarios. For example, a few well logs can be labeled as reliable or unreliable, but we cannot possibly label all of them, so we need to predict how to classify the rest. (A semi-supervised sketch also follows below.)
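A minimal sketch of the two prediction types, using random forests on synthetic log-derived features rather than the interpretations and recurrent network from the slides; the feature meanings, targets, and split are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_absolute_error

rng = np.random.default_rng(2)

# Hypothetical log-derived features (e.g., gamma ray, resistivity, density).
X = rng.normal(size=(600, 3))

# Classification target: three discrete formation classes, stand-ins
# for the shale / tight gas / gas condensate labels in the lesson.
y_class = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)

# Regression target: a continuous value, e.g., a torque reading ahead in time.
y_reg = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0.0, 0.2, 600)

Xtr, Xte, ytr_c, yte_c, ytr_r, yte_r = train_test_split(
    X, y_class, y_reg, test_size=0.25, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr_c)
reg = RandomForestRegressor(random_state=0).fit(Xtr, ytr_r)

print("classification accuracy:", accuracy_score(yte_c, clf.predict(Xte)))
print("regression MAE:", mean_absolute_error(yte_r, reg.predict(Xte)))
```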
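A hedged sketch of the prescriptive step: a simple analytic function stands in for a trained predictive model, and scipy searches over hypothetical design variables (well spacing and stage count) for the best trade-off between predicted production and cost. The functional forms, bounds, and units are purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in for a predictive model mapping design variables -- well
# spacing (m) and number of frac stages -- to expected production.
def predicted_production(x):
    spacing, stages = x
    return 100 * np.exp(-((spacing - 250) / 120) ** 2) + 3.0 * stages

# Hypothetical cost terms penalizing many stages and tight spacing.
def cost(x):
    spacing, stages = x
    return 0.5 * stages ** 1.2 + 2000.0 / spacing

# Prescriptive step: choose the design maximizing production minus cost.
objective = lambda x: -(predicted_production(x) - cost(x))
result = minimize(objective, x0=[200.0, 20.0],
                  bounds=[(100.0, 500.0), (5.0, 60.0)])

print("optimal spacing (m):", round(result.x[0], 1))
print("optimal stage count:", round(result.x[1], 1))
```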
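And a small semi-supervised sketch using scikit-learn's label propagation as one representative method: only a handful of synthetic "logs" carry reliable/unreliable labels, and the model propagates labels to the rest. The features and labeled fraction are assumptions.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(3)

# Hypothetical well-log quality features; the hidden ground truth is
# reliable (1) or unreliable (0), but only 15 samples are labeled.
X = rng.normal(size=(300, 4))
true = (X[:, 0] + X[:, 1] > 0).astype(int)

y = np.full(300, -1)  # -1 marks unlabeled samples
labeled_idx = rng.choice(300, size=15, replace=False)
y[labeled_idx] = true[labeled_idx]

# Propagate the few known labels through the feature space.
model = LabelPropagation().fit(X, y)
pred = model.predict(X)
print("agreement with hidden truth:", (pred == true).mean())
```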
Obviously, the list of opportunities that can be enabled via data science is endless. From top to bottom and left to right we can mention the following. Descriptive analytics via correlation can be performed at the LAS file level to connect trends from neighboring well logs. Descriptive and predictive analytics via principal components or random forests could be used to identify the main production drivers (a short feature-ranking sketch follows below). Predictive analytics can make use of geological, completion, and drilling data to infer the best drilling locations. Predictive analytics on real-time data can be used to proactively track field operations or equipment anomalies. Prescriptive analytics can be used to optimize and control future data measurements in a feedback process. Prescriptive analytics can also provide effective means to balance risk and return on investment in many field portfolio strategies.
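A short sketch of the driver-identification idea via random forest feature importances; the candidate driver names and the synthetic production response are purely illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)

# Hypothetical candidate drivers: geological, completion, drilling inputs.
names = ["porosity", "net_pay", "stage_count", "proppant_mass", "rop"]
X = rng.normal(size=(400, len(names)))

# Synthetic production response dominated by two of the inputs.
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(0.0, 0.5, 400)

model = RandomForestRegressor(random_state=0).fit(X, y)

# Rank the candidate drivers by importance, highest first.
ranking = sorted(zip(names, model.feature_importances_),
                 key=lambda p: p[1], reverse=True)
for name, imp in ranking:
    print(f"{name:14s} {imp:.3f}")
```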