--- # required metadata title: "rx_predict: Scores using a Microsoft ML Machine Learning model" description: "Reports per-instance scoring results in a data frame or revoscalepy data source using a trained Microsoft ML Machine Learning model with arevoscalepydata source." keywords: "models, prediction" author: WilliamDAssafMSFT ms.author: wiassaf ms.date: 07/15/2019 ms.topic: "reference" ms.prod: "sql" ms.technology: "machine-learning-services" ms.service: "" ms.assetid: "" # optional metadata ROBOTS: "" audience: "" ms.devlang: "Python" ms.reviewer: "" ms.suite: "" ms.tgt_pltfrm: "" ms.custom: "" monikerRange: ">=sql-server-2017||>=sql-server-linux-ver15" --- # *microsoftml.rx_predict*: Scores using a Microsoft machine learning model ## Usage ``` microsoftml.rx_predict(model, data: typing.Union[revoscalepy.datasource.RxDataSource.RxDataSource, pandas.core.frame.DataFrame], output_data: typing.Union[revoscalepy.datasource.RxDataSource.RxDataSource, str] = None, write_model_vars: bool = False, extra_vars_to_write: list = None, suffix: str = None, overwrite: bool = False, data_threads: int = None, blocks_per_read: int = None, report_progress: int = None, verbose: int = 1, compute_context: revoscalepy.computecontext.RxComputeContext.RxComputeContext = None, **kargs) ``` ## Description Reports per-instance scoring results in a data frame or revoscalepy data source using a trained Microsoft ML Machine Learning model with arevoscalepydata source. ## Details The following items are reported in the output by default: scoring on three variables for the binary classifiers: PredictedLabel, Score, and Probability; the Score for oneClassSvm and regression classifiers; PredictedLabel for Multi-class classifiers, plus a variable for each category prepended by the Score. ## Arguments ### model A model information object returned from a microsoftml model. For example, an object returned from `rx_fast_trees` or `rx_logistic_regression`. ### data A [revoscalepy](/machine-learning-server/python-reference/revoscalepy/index) data source object, a data frame, or the path to a `.xdf` file. ### output_data Output text or xdf file name or an `RxDataSource` with write capabilities in which to store transformed data. If *None*, a data frame is returned. The default value is *None*. ### write_model_vars If `True`, variables in the model are written to the output data set in addition to the scoring variables. If variables from the input data set are transformed in the model, the transformed variables are also included. The default value is `False`. ### extra_vars_to_write `None` or character vector of additional variables names from the input data to include in the `output_data`. If `write_model_vars` is `True`, model variables are included as well. The default value is `None`. ### suffix A character string specifying suffix to append to the created scoring variable(s) or `None` in there is no suffix. The default value is `None`. ### overwrite If `True`, an existing `output_data` is overwritten; if `False` an existing `output_data` is not overwritten. The default value is `False`. ### data_threads An integer specifying the desired degree of parallelism in the data pipeline. If *None*, the number of threads used is determined internally. The default value is *None*. ### blocks_per_read Specifies the number of blocks to read for each chunk of data read from the data source. ### report_progress An integer value that specifies the level of reporting on the row processing progress: * `0`: no progress is reported. * `1`: the number of processed rows is printed and updated. * `2`: rows processed and timings are reported. * `3`: rows processed and all timings are reported. The default value is `1`. ### verbose An integer value that specifies the amount of output wanted. If `0`, no verbose output is printed during calculations. Integer values from `1` to `4` provide increasing amounts of information. The default value is `1`. ### compute_context Sets the context in which computations are executed, specified with a valid revoscalepy.RxComputeContext. Currently local and [revoscalepy.RxInSqlServer](/machine-learning-server/python-reference/revoscalepy/RxInSqlServer) compute contexts are supported. ### kargs Additional arguments sent to compute engine. ## Returns A data frame or an [revoscalepy.RxDataSource](/machine-learning-server/python-reference/revoscalepy/RxDataSource) object representing the created output data. By default, output from scoring binary classifiers include three variables: `PredictedLabel`, `Score`, and `Probability`; `rx_oneclass_svm` and regression include one variable: `Score`; and multi-class classifiers include `PredictedLabel` plus a variable for each category prepended by `Score`. If a `suffix` is provided, it is added to the end of these output variable names. ## See also [`rx_featurize`](rx-featurize.md), [revoscalepy.rx_data_step](/machine-learning-server/python-reference/revoscalepy/rx-data-step), [revoscalepy.rx_import](/machine-learning-server/python-reference/revoscalepy/rx-import). ## Binary classification example ``` ''' Binary Classification. ''' import numpy import pandas from microsoftml import rx_fast_linear, rx_predict from revoscalepy.etl.RxDataStep import rx_data_step from microsoftml.datasets.datasets import get_dataset infert = get_dataset("infert") import sklearn if sklearn.__version__ < "0.18": from sklearn.cross_validation import train_test_split else: from sklearn.model_selection import train_test_split infertdf = infert.as_df() infertdf["isCase"] = infertdf.case == 1 data_train, data_test, y_train, y_test = train_test_split(infertdf, infertdf.isCase) forest_model = rx_fast_linear( formula=" isCase ~ age + parity + education + spontaneous + induced ", data=data_train) # RuntimeError: The type (RxTextData) for file is not supported. score_ds = rx_predict(forest_model, data=data_test, extra_vars_to_write=["isCase", "Score"]) # Print the first five rows print(rx_data_step(score_ds, number_rows_read=5)) ``` Output: ``` Automatically adding a MinMax normalization transform, use 'norm=Warn' or 'norm=No' to turn this behavior off. Beginning processing data. Rows Read: 186, Read Time: 0, Transform Time: 0 Beginning processing data. Beginning processing data. Rows Read: 186, Read Time: 0.001, Transform Time: 0 Beginning processing data. Beginning processing data. Rows Read: 186, Read Time: 0.001, Transform Time: 0 Beginning processing data. Using 2 threads to train. Automatically choosing a check frequency of 2. Auto-tuning parameters: maxIterations = 8064. Auto-tuning parameters: L2 = 2.666837E-05. Auto-tuning parameters: L1Threshold (L1/L2) = 0. Using best model from iteration 590. Not training a calibrator because it is not needed. Elapsed time: 00:00:00.6058289 Elapsed time: 00:00:00.0084728 Beginning processing data. Rows Read: 62, Read Time: 0, Transform Time: 0 Beginning processing data. Elapsed time: 00:00:00.0302359 Finished writing 62 rows. Writing completed. Rows Read: 5, Total Rows Processed: 5, Total Chunk Time: 0.001 seconds isCase PredictedLabel Score Probability 0 False True 0.576775 0.640325 1 False False -2.929549 0.050712 2 True False -2.370090 0.085482 3 False False -1.700105 0.154452 4 False False -0.110981 0.472283 ``` ## Regression example ``` ''' Regression. ''' import numpy import pandas from microsoftml import rx_fast_trees, rx_predict from revoscalepy.etl.RxDataStep import rx_data_step from microsoftml.datasets.datasets import get_dataset airquality = get_dataset("airquality") import sklearn if sklearn.__version__ < "0.18": from sklearn.cross_validation import train_test_split else: from sklearn.model_selection import train_test_split airquality = airquality.as_df() ###################################################################### # Estimate a regression fast forest # Use the built-in data set 'airquality' to create test and train data df = airquality[airquality.Ozone.notnull()] df["Ozone"] = df.Ozone.astype(float) data_train, data_test, y_train, y_test = train_test_split(df, df.Ozone) airFormula = " Ozone ~ Solar_R + Wind + Temp " # Regression Fast Forest for train data ff_reg = rx_fast_trees(airFormula, method="regression", data=data_train) # Put score and model variables in data frame score_df = rx_predict(ff_reg, data=data_test, write_model_vars=True) print(score_df.head()) # Plot actual versus predicted values with smoothed line # Supported in the next version. # rx_line_plot(" Score ~ Ozone ", type=["p", "smooth"], data=score_df) ``` Output: ``` 'unbalanced_sets' ignored for method 'regression' Not adding a normalizer. Making per-feature arrays Changing data from row-wise to column-wise Beginning processing data. Rows Read: 87, Read Time: 0.001, Transform Time: 0 Beginning processing data. Warning: Skipped 4 instances with missing features during training Processed 83 instances Binning and forming Feature objects Reserved memory for tree learner: 22620 bytes Starting to train ... Not training a calibrator because it is not needed. Elapsed time: 00:00:00.0390764 Elapsed time: 00:00:00.0080750 Beginning processing data. Rows Read: 29, Read Time: 0.001, Transform Time: 0 Beginning processing data. Elapsed time: 00:00:00.0221875 Finished writing 29 rows. Writing completed. Solar_R Wind Temp Score 0 290.0 9.2 66.0 33.195541 1 259.0 15.5 77.0 20.906796 2 276.0 5.1 88.0 76.594643 3 139.0 10.3 81.0 31.668842 4 236.0 14.9 81.0 43.590839 ```