Running Predictive Analytics On Your Data


When creating a Data Flow, you can easily run predictive analytics on your data sets using Machine Learning functions, without prior knowledge of advanced statistics.

Train and run multiple iterations of predictive models in parallel, evaluate and compare the models interactively, and select the model you want to save. You can then re-run your saved model against new data sets.

Note: To use the Machine Learning feature, binaries must be installed. For more information, see Installing and Configuring TIBCO WebFOCUS DSML Services on the TIBCO WebFOCUS KnowledgeBase.

Procedure: How to Access Predictive Models

After you create a Data Flow, you can select from different model algorithms to run against your data set.

  1. From the WebFOCUS start page, click Application Directories and then click an application. Alternatively, from the WebFOCUS Reporting Server browser interface, click an application. Then, right-click a data set and select Flow, as shown in the following image.

    The Data Flow opens, as shown in the following image.

  2. From the side panel, click Models.

    The Train Models panel opens, as shown in the following image.

    The following models display within the Train Models panel: Binary Classification, Regression, and Anomaly Detection.

    Now you can select a model to train and run against your data.

Procedure: How to Train Binary Classification Models

These models predict one of two possible values for a target, using four different algorithms: Random Forest, K-Nearest-Neighbors, Logistic Regression, and Extreme Gradient Boosting.

Note: When running the Binary Classification model algorithms, small data sets may not generate a model. Larger data sets are recommended for best results.
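
The train-in-parallel, compare, and pick-the-best workflow this procedure describes can be sketched in plain Python. This is an illustrative analogue only, not the WebFOCUS DSML implementation: three toy classifiers stand in for the real algorithms, and concurrent.futures evaluates them side by side so the most accurate one can be selected.

```python
# Illustrative sketch only: toy stand-ins for the real algorithms,
# evaluated in parallel so the most accurate model can be selected,
# similar in spirit to the Train Models workflow.
from concurrent.futures import ThreadPoolExecutor

# Toy labeled data: (feature, label) pairs following a threshold pattern.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
test = [(0.15, 0), (0.25, 0), (0.75, 1), (0.85, 1)]

def majority_class(train, x):
    # Always predicts the most common training label.
    labels = [y for _, y in train]
    return max(labels, key=labels.count)

def threshold_rule(train, x):
    # Predicts 1 when x exceeds the mean of the training features.
    feats = [f for f, _ in train]
    return int(x > sum(feats) / len(feats))

def nearest_neighbor(train, x):
    # Predicts the label of the closest training example.
    return min(train, key=lambda fy: abs(fy[0] - x))[1]

def evaluate(name_model):
    name, model = name_model
    correct = sum(model(train, x) == y for x, y in test)
    return name, correct / len(test)

models = [("Majority", majority_class),
          ("Threshold", threshold_rule),
          ("1-NN", nearest_neighbor)]

# Evaluate all models in parallel and keep the most accurate one.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(evaluate, models))

best = max(results, key=results.get)
```

As in the product, each candidate is scored on the same held-out data, so the accuracies are directly comparable.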

  1. Double-click or drag and drop the Binary Classification model to the canvas.

    The Configure dialog box displays, as shown in the following image.

    You can click the Target dropdown menu to select a different target. All numeric field measures are selected as Predictors by default. You can add or remove Predictors by selecting or clearing the check boxes.

  2. Click Apply.

    Your selected model type appears on the data flow canvas, as shown in the following image.

  3. Click the Train and Predict icon to train your model.

    The Compare Model Evaluation Results dialog opens, as shown in the following image.

    The model algorithms run in parallel, allowing you to easily compare results and determine which model is best. You can filter which model comparisons you want to see by selecting or deselecting the model check boxes.

  4. Close the Compare Model Evaluation Results dialog box to return to the canvas.

    Note: To re-open the Compare Model Evaluation Results dialog box, click the Compare icon on the canvas toolbar.

    Your model data displays in the following tabs. You can select different model algorithm options from the model drop-down menu. The best model is selected by default.

    Result. A preview of the first 50 rows of your new data set. Target and predicted columns are highlighted yellow.

    Evaluation. A report that demonstrates the accuracy of the selected model.

    Confusion Matrix. A table that shows the counts of true positives, true negatives, false positives, and false negatives for the selected model.

    ROC. A curve that shows how the decision threshold affects the trade-off between the true positive rate and the false positive rate.

    Precision-Recall. A curve that plots precision against recall at different decision thresholds. Because the curve is independent of true negatives, it is especially informative for imbalanced data sets.

    Feature Importances. The most important features in your data set.

    Note: Feature Importances is available for the Random Forest model only.

    Training Log. A report that includes the performance metrics and hyperparameter values.
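
To make the Confusion Matrix and Evaluation tabs concrete, the following sketch computes the four confusion matrix counts and the metrics derived from them for a small set of actual and predicted labels. It illustrates the standard definitions only; it is not WebFOCUS internals.

```python
# Illustrative sketch (not WebFOCUS internals): the confusion matrix
# counts and the metrics derived from them, computed by hand.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # true positives
tn = sum(a == 0 and p == 0 for a, p in pairs)  # true negatives
fp = sum(a == 0 and p == 1 for a, p in pairs)  # false positives
fn = sum(a == 1 and p == 0 for a, p in pairs)  # false negatives

accuracy = (tp + tn) / len(actual)   # share of all predictions that are correct
precision = tp / (tp + fp)           # share of predicted positives that are correct
recall = tp / (tp + fn)              # share of actual positives that are found
```

Precision and recall are the quantities traced out by the Precision-Recall curve as the decision threshold varies.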

Procedure: How to Train Regression Models

These models predict numeric values based on four different regression algorithms: Random Forest, K-Nearest-Neighbors, Polynomial Regression, and Extreme Gradient Boosting.

  1. Double-click or drag and drop the Regression model to the canvas.

    The Configure dialog box displays, as shown in the following image.

    You can click the Target dropdown menu to select a different target. All numeric field measures are selected as Predictors by default. You can add or remove Predictors by selecting or clearing the check boxes.

  2. Click Apply.

    Your selected model type appears on the data flow canvas, as shown in the following image.

  3. Click the Train and Predict icon to train your model.

    The Compare Model Evaluation Results dialog opens, as shown in the following image.

    The model algorithms run in parallel, allowing you to easily compare results and determine which model is best. The best model has the lowest Root Mean Square Error value, and a scatter plot with dots closest to the red line. You can filter which model comparisons you want to see by selecting or deselecting the model check boxes.

  4. Close the Compare Model Evaluation Results dialog box to return to the canvas.

    Note: To re-open the Compare Model Evaluation Results dialog box, click the Compare icon on the canvas toolbar.

    Your model data displays in the following tabs. You can select different model algorithm options from the model drop-down menu. The best model is selected by default.

    Result. A preview of the first 50 rows of your new data set. Target and predicted columns are highlighted yellow.

    Evaluation. A report that demonstrates the accuracy of the selected model.

    Feature Importances. The most important features in your data set.

    Note: Feature Importances is available for the Random Forest model only.

    Logs. A report that includes the performance metrics and hyperparameter values.
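
The ranking rule described in step 3 (the lowest Root Mean Square Error wins) can be illustrated with a short sketch. The rmse function implements the standard definition; the prediction values below are hypothetical.

```python
import math

# Illustrative sketch: Root Mean Square Error, the metric used to rank
# regression models in the comparison (lower is better). The prediction
# values below are hypothetical.
def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [10.0, 20.0, 30.0, 40.0]
model_a = [11.0, 19.0, 31.0, 39.0]  # predictions close to the actual values
model_b = [15.0, 25.0, 25.0, 45.0]  # predictions farther from the actual values

scores = {"A": rmse(actual, model_a), "B": rmse(actual, model_b)}
best = min(scores, key=scores.get)  # the model with the lowest RMSE wins
```

A model whose predictions hug the actual values (dots closest to the red line in the scatter plot) produces small squared errors and therefore a small RMSE.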

Procedure: How to Train Anomaly Detection Models

These models detect anomalies using one algorithm: Isolation Forest.

  1. Double-click or drag and drop the Anomaly Detection model to the canvas.

    The Configure dialog box displays, as shown in the following image.

    All numeric field measures are selected as Predictors by default. You can add or remove Predictors by selecting or clearing the check boxes.

  2. Click Apply.

    Your selected model type appears on the data flow canvas, as shown in the following image.

  3. Click the Train and Predict icon to train your model.

    Your model data displays in the following tabs, using the Isolation Forest model algorithm.

    Result. A preview of the first 50 rows of your new data set. The predicted columns are highlighted yellow.

    Anomaly Scores. A report that shows the anomaly score the selected model assigns to each record.

    Logs. A report that includes the performance metrics and hyperparameter values.
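
Isolation Forest itself is too involved for a short example, but the idea of an anomaly score can be illustrated with a simpler stand-in: a z-score detector that flags records far from the mean. This is a conceptual sketch only, not the algorithm WebFOCUS runs.

```python
import statistics

# Conceptual sketch only: a z-score anomaly detector, NOT the Isolation
# Forest algorithm WebFOCUS uses. It shows how each record receives an
# anomaly score and how a threshold flags outliers.
values = [10.0, 10.5, 9.8, 10.2, 9.9, 10.1, 25.0]  # 25.0 is the outlier

mean = statistics.fmean(values)
std = statistics.pstdev(values)

# Score each record by its distance from the mean in standard deviations.
scores = [abs(v - mean) / std for v in values]
anomalies = [v for v, s in zip(values, scores) if s > 2.0]
```

Isolation Forest assigns scores differently (by how quickly random splits isolate a record), but the output is the same shape: a per-record score that ranks how anomalous each row is.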

Procedure: How to Edit Predictive Models

After your model is generated, you can edit your model target, predictors, and hyperparameters. Hyperparameters have default values that are unique to each model.

To edit your model target and predictors, right-click the canvas model node, point to Edit Settings, and then click Target and Predictors.

To edit your model hyperparameters, right-click the canvas model node, point to Edit Settings, point to Hyperparameters, and then click a model algorithm type.

You can also click the Model Editor icon to change targets, predictors, and hyperparameters.
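
Conceptually, each model algorithm carries its own set of hyperparameter defaults, and an edit overrides only the values you change. The sketch below models that behavior with a plain dictionary; the parameter names and values are hypothetical, not the actual DSML defaults.

```python
# Conceptual sketch: each algorithm carries its own hyperparameter
# defaults, and an edit overrides only the values you change. The names
# and values here are hypothetical, not the actual DSML defaults.
defaults = {
    "Random Forest": {"n_estimators": 100, "max_depth": None},
    "K-Nearest-Neighbors": {"n_neighbors": 5, "weights": "uniform"},
}

def edit_hyperparameters(algorithm, **overrides):
    # Start from the algorithm's defaults, then apply the user's edits.
    params = dict(defaults[algorithm])
    params.update(overrides)
    return params

tuned = edit_hyperparameters("Random Forest", max_depth=10)
```

Any hyperparameter you do not edit keeps its default, which is why each algorithm's Hyperparameters dialog shows pre-filled values.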

Procedure: How to Save Predictive Models

When training a model, you can save it from the Compare Model Evaluation Results dialog box. After running a model, you can save it from the tabbed panel beneath the canvas. You can then re-run your saved model against new data sets.

  1. Click the Save icon to save your model.

    The Save dialog opens, as shown in the following image.

    You can change the model algorithm, name, or location, and add a description.

  2. Click Save.

    Your model is saved to your selected folder location, as shown in the following image.

    Trained models are saved with their evaluation results, logs, and associated files, so that you can run the model at a later time.
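
The save-and-re-run cycle can be sketched in Python using pickle: persist a trained model's learned parameters, reload them later, and score a new data set. The threshold "model" here is hypothetical and stands in for a saved WebFOCUS model.

```python
import os
import pickle
import tempfile

# Illustrative sketch: persist a trained model's learned parameters and
# reload them later to score new data. The threshold "model" here is
# hypothetical, not a WebFOCUS model artifact.
model = {"name": "Threshold", "cutoff": 0.5}

def predict(model, x):
    return int(x > model["cutoff"])

# Save the model to disk ...
path = os.path.join(tempfile.gettempdir(), "saved_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# ... then reload it and re-run it against a new data set.
with open(path, "rb") as f:
    reloaded = pickle.load(f)

predictions = [predict(reloaded, x) for x in [0.2, 0.9]]
```

Because the saved artifact contains everything learned during training, scoring new data later needs no retraining, which is the point of saving a model from the Compare Model Evaluation Results dialog box.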