The Uncountable interface offers several ways to predict the results of any given experiment. This article walks users through the ways they can set up, train, and apply models to their experiments.
Step 1. Select the Kind of Experimental Design
Uncountable’s platform offers three types of workflows for obtaining recommended experiments: 1. Learn About Inputs, 2. Automatically Suggest Experiments, and 3. Optimize with Chosen Inputs. Which workflow is best depends on the user’s current goals and data.
The following diagram illustrates what each of the three different workflows does:
Answering the following questions can help you decide which workflow to choose:
- Do you have prior data of experiments with measurements?
- Have a substantial portion of ingredients or parameters been used in this prior data?
- If the answer to both these questions is “yes,” proceed to 1.2. If “no,” proceed to 1.1.
1.1) Learn About Inputs
If you do not have prior data to refer to, this is the best option to begin gathering informative data and building a good model. Learn About Inputs helps spread out screening experiments throughout an experimental space to maximize information-gathering without referring to past data.
To learn about the inputs, you first have to define which inputs you wish to learn about. You can define these inputs by selecting a Constraint that you have already designed.
You have two options to choose from: the Uncountable DOE and a conventional factorial DOE. Which type of DOE to select depends on your current experimental aims and limitations.
The conventional factorial DOE (1.1.2) favors a more structured and exhaustive approach for cases with a large number of initial experiments that are relatively fast and cheap to conduct. Conversely, the Uncountable DOE (1.1.1) is best suited to a less comprehensive, tailored approach that prioritizes efficiency above all else.
1.1.1) Uncountable DOE
The Uncountable DOE uses a proprietary design and selection algorithm to choose the optimal experiments in an experimental space, given a fixed number of experiments to be run. The main aim of the Uncountable DOE is to build a strong foundation for future optimization, by learning the most about an experimental space with as few initial experiments as possible. The disadvantage to using the Uncountable DOE is that it has a less conventional structure than a factorial DOE. Please use this option if you want to use experimental resources as efficiently as possible. Proceed to Step 2.
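The selection algorithm itself is proprietary, so it cannot be reproduced here. As a rough, hypothetical analogue, the sketch below uses a generic space-filling design (a Latin hypercube via scipy) to spread a fixed budget of eight screening experiments across three made-up ingredient ranges. It illustrates the general idea of covering an input space efficiently with few runs, not Uncountable’s actual method.

```python
# Illustrative only: a Latin hypercube is a generic space-filling design,
# not Uncountable's proprietary selection algorithm.
import numpy as np
from scipy.stats import qmc

# Hypothetical input space: three ingredients with (min, max) ranges.
lower = np.array([0.0, 5.0, 0.0])
upper = np.array([10.0, 20.0, 2.0])

sampler = qmc.LatinHypercube(d=3, seed=42)
unit_points = sampler.random(n=8)               # 8 screening experiments in [0, 1]^3
designs = qmc.scale(unit_points, lower, upper)  # rescale to the ingredient ranges
print(designs)
```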
1.1.2) Full Factorial Design
A factorial design fully crosses and maps all the input parameters. This is an oft-used method in the design of experiments. The disadvantage of this methodology is that, depending on the number of ingredients or parameters you are optimizing over, a very large number of experiments needs to be run to complete a full factorial design. Please use this option if you prefer a structured initial set of experiments and are able to run all the required experiments cheaply and quickly. Proceed to Step 2.
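To get a feel for how quickly a full factorial design grows, the short sketch below (with hypothetical ingredients and levels) enumerates every combination of levels; three inputs with 3, 2, and 3 levels already require 18 experiments.

```python
# Minimal sketch: enumerating a full factorial design for hypothetical inputs.
from itertools import product

levels = {
    "ingredient_a": [0, 5, 10],       # three levels
    "ingredient_b": [1.0, 2.0],       # two levels
    "cure_temp_c":  [120, 140, 160],  # three levels
}

runs = list(product(*levels.values()))
print(f"{len(runs)} experiments required")  # 3 * 2 * 3 = 18
for run in runs[:3]:
    print(dict(zip(levels.keys(), run)))
```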
1.2) “Optimize” or “Automatically Suggest”
If you have constructed a specific Constraint of input ingredients and process parameters, select the option to Optimize with Chosen Inputs and proceed to Step 2 to select your Constraint.
If you do not wish to construct a specific Constraint, select the option to Automatically Suggest Experiments and skip straight to Step 3. This option is especially useful if you want a quick sense of what a recommended experiment will look like without devoting time to constructing a comprehensive Constraint.
Step 2. Choosing the Right Constraints
For more guidance on constructing a Constraint on the Uncountable Platform, please refer to the Defining Constraints section of this article.
If you are using the “Learn About Inputs” workflow from Step 1 and do not have prior data, then regardless of whether you chose the Uncountable DOE or the full factorial design, you need a fully filled-in Constraint to tell the selection algorithm what your input parameter space looks like.
If you selected “Optimize with Chosen Inputs” in Step 1, you need to have constructed a Constraint that is fully filled in. Select that Constraint from the dropdown menu, and check that the choice and ranges of ingredients and process parameters are consistent with your intentions. You should also include important ingredients that you do not intend to use but that were often used in past data, and make sure that their use frequency is set to “Never Use”.
If you have not chosen a Constraint beforehand, you need to use the “Automatically Suggest Experiments” option in Step 1.
Step 3. Choosing the Right Spec
For more guidance on constructing a Spec on the Uncountable Platform, please refer to the Defining Specs section of this document.
Select the Spec you constructed previously, and check that all the measurements of interest for this project are listed, under the correct conditions, and with the correct priorities and goal thresholds.
Keep in mind that goal thresholds should be set reasonably: thresholds far outside the range of the pre-existing data’s measurements can distort the priorities of the Uncountable optimization objective function.
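Uncountable’s actual objective function is not described here, but the toy calculation below shows why an unrealistic threshold can cause problems: if each output’s gap to its goal is normalized by the spread of the existing data (an assumed, simplified scoring), one far-off goal swamps the other terms.

```python
# Toy illustration (not Uncountable's actual objective): score each output by how
# far the existing results are from its goal threshold, normalized by the spread
# seen in the data.
import numpy as np

data = {
    "tensile_mpa":  np.array([20.0, 22.0, 25.0, 24.0]),
    "elongation_%": np.array([300.0, 320.0, 310.0, 305.0]),
}
goals = {"tensile_mpa": 26.0, "elongation_%": 1000.0}  # second goal is far outside the data

for name, values in data.items():
    gap = (goals[name] - values.mean()) / values.std()
    print(f"{name}: normalized gap to goal = {gap:.1f}")
# The unrealistic elongation goal yields a huge normalized gap, so any combined
# objective built from terms like these would be dominated by that single output.
```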
Step 4. Choosing the Training Set
By default, the training set to be used for your model will be every experiment in your current project. In general, using this default training set is the best course of action.
Nevertheless, it is sometimes advisable to exclude or include experimental data in order to get a better model with more accurate and sensible predictions and recommendations. The following checklist covers experiments you may choose to include or exclude to form a custom training set:
Exclude experiments when –
- They are far outliers in terms of the types or amounts of ingredients used.
- They have incomplete input data, particularly for important process parameters.
- In your assessment, the measurements taken from the experiment are unreliable.
The Uncountable platform also has automatic outlier-exclusion capabilities that can help to exclude outlier experiments, even if they are accidentally included.
Include experiments when –
- They are part of your project data, and don’t satisfy the three exclusion criteria above.
- They are part of the data of a different project but are closely related, in terms of both inputs and outputs, to the current project’s data, and don’t satisfy the three exclusion criteria above.
In general, it is recommended to include as many experiments as can fit into a single batch within the training set. For example, if a user can include anywhere between 7 and 10 experiments within a single batch, it is recommended to use 10.
The platform’s recommendation for the number of experiments is, at best, a rough estimate based on the number of factors the system takes into account. Usually, between 6 and 8 experiments will suffice, although if a given experiment has few varying inputs, it can be advisable to run fewer.
You can return to fine-tuning the training set after the initial run of your optimization job.
Step 5. Initial Run – Interpreting Your Initial Model
Having chosen the Constraints, Spec, and Training Set, where applicable, you can name the current job run and type in how many experiments you wish to generate. Once this is finished, click “Suggest Formulation(s)” and the job will be run.
It is sometimes necessary to run a job a second time after assessing how good the initial model is. Models can be built relatively quickly, so do not hesitate to re-run a job after making changes if you think you can improve the model being built or the recommendations being given.
5.1) Assess Recommendations
One of the main ways you can assess the success of an initial run is to check the suggested experiments. As the subject matter expert, assess how sensible the recommendations are, in particular:
- Are certain ingredients overused, or used at too high levels?
- You can lower the maximum allowed for these ingredients in the Constraint field and easily recalculate the recommendation by clicking on the purple “Regenerate” button.
- Are certain compulsory ingredients not used?
- Set the ingredient to be “Always used” in the Constraint.
- Are certain rarely used ingredients used too frequently?
- Go to the Dashboard and check on past experiments that used this ingredient. Were they promising experiments, or are they outliers that ought to be excluded from the Training set?
- Do ingredient totals add up? Are particular categories of ingredients used at too high or low levels?
- You can adjust these limits in the Constraint.
- Do the recommendations look very similar to each other?
- This can have several causes. The two most common are an overly restrictive Constraint or a Spec that puts too much emphasis on a single goal. Check whether you ought to loosen your Constraint or whether one of the measurement goals has been made too ambitious. It can also be the case that the past data is very comprehensive and the model has high confidence that the best experiments to test are in this region.
5.2) Assess the Model’s Predictive Accuracy
A more in-depth assessment of your model can be made using the “Analyze Model” tool. The Training Accuracy table tells you how well-constructed the model used for optimization is.
For each output, the following is displayed in the table:
- RMSE – The model’s predictive error; the lower the RMSE, the more accurate the model’s predictions. The RMSE is calculated using a leave-one-out cross-validation procedure (a sketch of all three measures appears after this list).
- r² Score – The Coefficient of Determination; the closer it is to 1, the better the explanatory power of your model. It is one way of assessing how much of the variance and range of your data set’s outputs can be explained by the model.
- Explained Error – The Explained Error compares the magnitude of the RMSE with the magnitude of the standard deviation of your data. Since the RMSE is an absolute measure whose magnitude can differ from output to output, normalizing by the standard deviation helps you compare different outputs. The higher the Explained Error (the closer to 100%), the more accurate the model’s predictions.
- Scatter Plot of Predicted vs. Actual – One of the most intuitive and informative measures of predictive quality. The model’s predictions should be close to the actual values of the experiments in the training data set, i.e. the points should lie close to the diagonal grey line y = x on the graph. In the scatter plot view, look out for the following:
- Outliers: one or two points whose predictions are very different from the actual values, more so than all other data points. Are the measurements from these experiments reliable? If they are not, you can remove them from the training data set altogether.
- “Vertical lines” like those in the following figure:
This graph plots predicted values (x-axis) against actual values (y-axis). The “lines” indicate that the model has made the same prediction for several data points that, in reality, have different actual measurements. The three most common causes are:
- Blank process parameter data, which the model interprets as “0”. Remove data points that have incomplete process parameters, or remove those process parameters from the model features.
- Missing important input features. Make sure that all important ingredients are part of the Constraint used to create the model.
- Hidden subgroups in the data. Check that the experiments you are combining do not come from different groups or experimental conditions that should not be mixed together.
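For reference, the sketch below (using a generic scikit-learn model as a stand-in, not Uncountable’s own) shows how the three accuracy measures above can be computed from leave-one-out cross-validation. The Explained Error formula shown is an assumed definition based on the description above.

```python
# Minimal sketch of the three accuracy measures, using leave-one-out
# cross-validation with a stand-in model (not Uncountable's own).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 3))                        # hypothetical inputs (20 experiments, 3 ingredients)
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 1, 20)    # hypothetical output

model = Ridge(alpha=1.0)
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())   # each point predicted from all the others

rmse = mean_squared_error(y, y_pred) ** 0.5
r2 = r2_score(y, y_pred)
explained_error = 1.0 - rmse / np.std(y)   # assumed definition: RMSE relative to the data's spread
print(f"RMSE={rmse:.2f}  r2={r2:.2f}  explained error={explained_error:.0%}")
# Plotting y_pred against y gives the predicted-vs-actual scatter; points near
# the y = x line indicate accurate predictions.
```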
5.3) Check the Effect Sizes Table
For each output, you can see how large an effect each input has on that output within the model. It can be helpful to quickly check which inputs have the largest positive and negative effect sizes.
Does it make sense that these ingredients and process parameters have such large effect sizes? If not, check the scatter plot of the particular output and input whose effect size looks suspect, and see whether anomalous experiments are driving the effect size.
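Uncountable computes effect sizes from its own model; as a rough, hypothetical analogue, standardized coefficients from a simple linear model convey the same idea of ranking inputs by the size and sign of their influence on an output.

```python
# Rough analogue only: standardized linear-model coefficients as "effect sizes"
# (Uncountable's model computes these internally; this just conveys the idea).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(30, 3))                        # hypothetical inputs
y = 3.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0, 1, 30)    # hypothetical output

X_std = StandardScaler().fit_transform(X)
coefs = LinearRegression().fit(X_std, y).coef_
for name, c in zip(["ingredient_a", "ingredient_b", "cure_temp_c"], coefs):
    print(f"{name}: effect size {c:+.2f}")
# A large positive or negative value that contradicts domain knowledge is a cue to
# inspect the scatter plot of that input versus the output for anomalous experiments.
```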
Step 6. Optimization
After making any adjustments to the Constraints, Spec, and training set identified in Step 5, you can re-run the job with the adjusted settings. Check whether the resulting measures of model accuracy have improved, and confirm once more that the recommended experiments are sensible next steps for your project.
Once you are reasonably satisfied, you can select from among the recommendations and import them into the main project, and then perform the actual experiments in the lab.