View Correlations

Before exploring your imported data on the platform in depth, looking at potential correlations between different entities can help you reach findings faster. To discover potential correlations, navigate to the “View Correlations” page from the “Visualize” tab.

The page first offers several options, each representing a different case to analyze:

– Inputs/Outputs: See correlations between the selected inputs and outputs

– Calculations/Outputs: If calculations such as price or overall cost are set up within the project, you can select them and analyze them against the respective outputs

– Outputs/Outputs: Observe correlations between different outputs

– Component Outputs/Outputs: If your recipe contains an intermediate recipe with its own measurement, you can check whether that measurement relates to an output of the full formula, for instance a measured polymer property vs. a formulated product property.

To see correlations between two dimensions, select an input from the dropdown menu, or type its first letters to narrow the search. To check all relevant outputs of the active spec for correlations, click the corresponding button under “Select Outputs”. To select all possible outputs, click into the input field and select each output by clicking on it.

The resulting plots are called “Sankey Diagrams”. The platform automatically looks for the stronger trends in your data and presents the relationships with higher R (correlation coefficient) values and larger sample sizes.
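To make the ranking idea more concrete, here is a minimal Python sketch of how input/output pairs could be scored and ordered. It assumes the data is a list of formulations, each a dictionary from dimension name to value (with `None` for a missing measurement), and it assumes plain Pearson correlation. The dimension names, the example values, and the helper names `pearson_r` and `rank_pairs` are hypothetical and not part of the platform.

```python
import math
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical data model: one dict per formulation, None = not measured.
Row = Dict[str, Optional[float]]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either sample has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def rank_pairs(
    rows: List[Row], inputs: List[str], outputs: List[str]
) -> List[Tuple[str, str, float, int]]:
    """Score every input/output pair and sort by |r|, then by sample size."""
    results = []
    for inp, out in product(inputs, outputs):
        # A pair's sample size is the number of formulations with both values.
        pairs = [
            (row[inp], row[out])
            for row in rows
            if row.get(inp) is not None and row.get(out) is not None
        ]
        if len(pairs) < 3:
            continue
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        r = pearson_r(xs, ys)
        if not math.isnan(r):
            results.append((inp, out, r, len(pairs)))
    # Strongest relationships first; larger samples win ties.
    results.sort(key=lambda t: (abs(t[2]), t[3]), reverse=True)
    return results


rows: List[Row] = [
    {"water": 50.0, "thickener": 1.0, "viscosity": 100.0, "cost": 2.0},
    {"water": 48.0, "thickener": 2.0, "viscosity": 200.0, "cost": 3.0},
    {"water": 41.0, "thickener": 3.0, "viscosity": 300.0, "cost": 4.5},
    {"water": 37.0, "thickener": 4.0, "viscosity": 400.0, "cost": None},
    {"water": 30.0, "thickener": 5.0, "viscosity": 500.0, "cost": 6.0},
]

for inp, out, r, n in rank_pairs(rows, ["water", "thickener"], ["viscosity", "cost"]):
    print(f"{inp} -> {out}: r = {r:+.2f}, n = {n}")
```

In this example the thickener/viscosity pair ranks first with r = +1.00 on all five formulations, while pairs involving `cost` rely on only four formulations because one cost value is missing.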


Sankey diagrams have no X or Y axis; they simply connect nodes to denote relationships, so whether a line rises or falls carries no meaning. What matters is only the connection between a node on the left (an input, ingredient, or calculation) and an output on the right.


Blue lines indicate a positive trend: the higher the value of the node on the left, the higher the value of the node on the right tends to be.


Red lines indicate a negative trend: the higher the value of the node on the left, the lower the value of the node it is connected to on the right tends to be.

The thicker the line, the stronger the relationship.
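The visual encoding described above (color for the direction of a trend, thickness for its strength) can be sketched as a mapping from correlation results to Sankey links. This is a hypothetical illustration in Python: the `sankey_links` function, the `max_width` scaling, and the plain color names are assumptions, not the platform's actual rendering code.

```python
from typing import Dict, List, Tuple, Union

Link = Dict[str, Union[str, float]]


def sankey_links(
    correlations: List[Tuple[str, str, float]], max_width: float = 10.0
) -> List[Link]:
    """Turn (left node, right node, r) triples into drawable Sankey links."""
    links: List[Link] = []
    for source, target, r in correlations:
        links.append(
            {
                "source": source,  # input, ingredient, or calculation (left)
                "target": target,  # output (right)
                # Blue = positive trend, red = negative trend.
                "color": "blue" if r > 0 else "red",
                # Thicker line = stronger relationship (larger |r|).
                "width": abs(r) * max_width,
            }
        )
    return links


links = sankey_links([("thickener", "viscosity", 0.9), ("water", "viscosity", -0.6)])
for link in links:
    print(link)
```

Scaling the width by |r| rather than by r keeps strong negative trends just as visible as strong positive ones; only the color tells them apart.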

When an output is selected, the inputs correlated with it are shown on screen together with the strength of each correlation. Please note that only correlations greater than 0.5 with a p-value less than 0.005 are shown. The figure also takes the “Minimum Samples” parameter into account. Clicking one of the lines takes you directly to the “Explore Data” tab to continue your analysis. A second tab, “Correlation Matrix”, gives a quantitative view of the results.
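The display rule above can be sketched as a simple filter. The Python version below rests on several assumptions: the 0.5 threshold is applied to the magnitude |r| (so strong negative trends also pass), the p-value is approximated with the Fisher z-transformation rather than an exact t-test, and `min_samples=10` is a hypothetical default. The platform's exact computation is not documented here.

```python
import math
from statistics import NormalDist


def approx_p_value(r: float, n: int) -> float:
    """Two-sided p-value for 'no correlation', via the Fisher z-transformation.

    This is a normal approximation (assumption); it requires n > 3.
    """
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def is_shown(
    r: float,
    n: int,
    r_threshold: float = 0.5,
    p_max: float = 0.005,
    min_samples: int = 10,  # hypothetical "Minimum Samples" setting
) -> bool:
    """Return True if a correlation would pass the display rule."""
    if n < max(min_samples, 4):  # too few samples (and Fisher z needs n > 3)
        return False
    return abs(r) > r_threshold and approx_p_value(r, n) < p_max


for r, n in [(0.8, 30), (0.6, 10), (0.4, 200), (-0.7, 40), (0.95, 5)]:
    print(f"r = {r:+.2f}, n = {n:3d}: shown = {is_shown(r, n)}")
```

The second case shows why the p-value matters: r = 0.6 clears the threshold, but with only 10 samples the approximate p-value is around 0.07, far above 0.005. The third case is the opposite: r = 0.4 is highly significant with 200 samples but too weak to be shown.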

Further down the page, a list shows all correlations for the selected dimensions. Clicking the “View” button shows the corresponding plot (the same plot you reach by clicking a line).

Several options can be set to increase the chance of finding representative results and to avoid misleading correlations from the start.

Minimum Samples: setting a larger minimum number of samples excludes correlations that are based on only a few formulations, even if they show a high degree of correlation. With fewer samples, extreme correlations are more likely, because each formulation carries more weight in the calculation.

Correlation Threshold: setting a higher correlation threshold excludes weaker, less relevant correlations and gives a clearer overview.
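The effect behind the Minimum Samples setting can be demonstrated with a small simulation. The Python sketch below draws pairs of completely unrelated random variables and counts how often their sample correlation still exceeds 0.5 in magnitude. The function name, the number of trials, and the seed are arbitrary choices for illustration.

```python
import math
import random
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def share_of_spurious(
    n_samples: int, trials: int = 2000, threshold: float = 0.5, seed: int = 0
) -> float:
    """Fraction of trials in which two independent variables show |r| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # x and y are independent, so any correlation found is pure chance.
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials


for n in (3, 5, 10, 30):
    print(f"{n:2d} samples: {share_of_spurious(n):.1%} of random pairs have |r| > 0.5")
```

In theory, with three samples about two thirds of unrelated pairs exceed |r| = 0.5, while with 30 samples the share is well under one percent, which is why a higher minimum sample count filters out chance correlations.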

Another way to get a general overview is the “Show Outputs Without Conditions” option, accessible via the cogwheel on the right-hand side. When it is selected, all output dimensions are treated without any conditions, which limits the number of correlations shown.

Lastly, the type of correlation (linear, quadratic, logarithmic) and whether outliers are included can be changed under “Advanced Options”, which is also available from the cogwheel.
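One common way to compare correlation types is to fit each model and correlate its predictions with the measured values. The Python sketch below does this for linear, logarithmic, and quadratic relationships. It is an assumption about how such options could work, not the platform's documented method, and the helper names are hypothetical. For the quadratic case it reports the multiple correlation R of a least-squares parabola, which is never negative; outlier handling is not modelled.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_fit(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares [c0, c1, c2] for y = c0 + c1*x + c2*x^2, via Cramer's rule."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]  # normal equations
    d = _det3(a)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][col] = t[i]
        coeffs.append(_det3(m) / d)
    return coeffs


def correlation_by_model(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Correlation of ys with xs under three model types."""
    c0, c1, c2 = quadratic_fit(xs, ys)
    fitted = [c0 + c1 * x + c2 * x * x for x in xs]
    return {
        "linear": pearson_r(xs, ys),
        "logarithmic": pearson_r([math.log(x) for x in xs], ys),  # needs x > 0
        "quadratic": pearson_r(fitted, ys),
    }


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
u_shape = [(x - 3.5) ** 2 for x in xs]
print(correlation_by_model(xs, u_shape))
```

For the U-shaped example, the linear correlation is essentially zero even though y depends perfectly on x, while the quadratic R is 1. This is why switching the correlation type can reveal relationships that a linear check misses.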

Updated on November 11, 2021

