skutil.metrics module¶
skutil.metrics houses the pairwise kernel matrix functionality, built with Cython, which behaves similarly to scikit-learn's pairwise metrics.
class skutil.metrics.GainsStatisticalReport(n_groups=10, n_folds=None, n_iter=None, score_by='lift', iid=True, error_score=nan, error_behavior='warn')[source]¶
Bases: object

A class that computes actuarial statistics for scoring predictions given prescribed weighting and loss data. Primarily intended for use with skutil.h2o.H2OGainsRandomizedSearchCV. A usage sketch appears after the method documentation below.

Parameters:

- n_groups : int, optional (default=10)
  The number of groups to use for lift and gini computations.
- score_by : str, optional (default='lift')
  The metric to return for the score method.
- n_folds : int, optional (default=None)
  The number of folds that are being fit.
- error_score : float, optional (default=np.nan)
  The score to return for a pd.qcut error.
- error_behavior : str, optional (default='warn')
  One of {'warn', 'raise', 'ignore'}. How to handle non-unique bin edges in pd.qcut.

Methods

- as_data_frame()
  Get the summary report of the fold fits in the form of a pd.DataFrame.
- fit_fold(pred, expo, loss[, prem, store])
  Used to fit a single fold of predicted values, exposure and loss data.
- score(_, pred, **kwargs)
  Scores the new predictions on the truth set, and stores the results in the internal stats array.
- score_no_store(_, pred, **kwargs)
  Scores the new predictions on the truth set, and does not store the results in the internal stats array.
as_data_frame()[source]¶
Get the summary report of the fold fits in the form of a pd.DataFrame.

Returns:

- df : pd.DataFrame
  A dataframe of summary statistics for each fold.
fit_fold(pred, expo, loss, prem=None, store=True)[source]¶
Used to fit a single fold of predicted values, exposure and loss data.

Parameters:

- pred : 1d H2OFrame, pd.DataFrame, np.ndarray
  The array of predictions.
- expo : 1d H2OFrame, pd.DataFrame, np.ndarray
  The array of exposure values.
- loss : 1d H2OFrame, pd.DataFrame, np.ndarray
  The array of loss values.
- prem : 1d H2OFrame, pd.DataFrame, np.ndarray, optional (default=None)
  The array of premium values. If None, it is set equal to the expo parameter.
- store : bool, optional (default=True)
  Whether or not to store the results of the scoring procedure. This is set to False when calling score, which is intended for test data.

Returns:

- self
score(_, pred, **kwargs)[source]¶
Scores the new predictions on the truth set, and stores the results in the internal stats array.

Parameters:

- _ : H2OFrame, np.ndarray
  The truth set.
- pred : H2OFrame, np.ndarray
  The predictions.

Returns:

- scr : float
  The score (lift/gini) for the new predictions.
score_no_store(_, pred, **kwargs)[source]¶
Scores the new predictions on the truth set, and does not store the results in the internal stats array.

Parameters:

- _ : H2OFrame, np.ndarray
  The truth set.
- pred : H2OFrame, np.ndarray
  The predictions.

Returns:

- scr : float
  The score (lift/gini) for the new predictions.
 
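A minimal usage sketch for the class above, assuming plain 1d numpy arrays are acceptable for pred, expo and loss (the parameter docs list np.ndarray among the accepted types); the random data is purely illustrative.

```python
import numpy as np
from skutil.metrics import GainsStatisticalReport

# Purely illustrative prediction, exposure and loss vectors for one fold
rng = np.random.RandomState(42)
pred = rng.rand(100)          # predicted scores
expo = rng.rand(100) + 0.5    # exposure weights
loss = rng.rand(100) * 10.0   # observed losses

report = GainsStatisticalReport(n_groups=10, score_by='lift')

# Fit a single fold; prem=None means the premium defaults to the exposure
report.fit_fold(pred, expo, loss)

# Summarize the stored fold statistics as a pandas DataFrame
print(report.as_data_frame())
```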
skutil.metrics.check_X_y(X, y, accept_sparse=None, dtype='numeric', order=None, copy=False, force_all_finite=True, ensure_2d=True, allow_nd=False, multi_output=False, ensure_min_samples=1, ensure_min_features=1, y_numeric=False, warn_on_dtype=False, estimator=None)[source]¶
Input validation for standard estimators.

Checks X and y for consistent length, enforces X 2d and y 1d. Standard input checks are only applied to y, such as checking that y does not have np.nan or np.inf targets. For multi-label y, set multi_output=True to allow 2d and sparse y. If the dtype of X is object, attempt converting to float, raising on failure.

Parameters:

- X : nd-array, list or sparse matrix
  Input data.
- y : nd-array, list or sparse matrix
  Labels.
- accept_sparse : string, list of string or None (default=None)
  String[s] representing allowed sparse matrix formats, such as 'csc', 'csr', etc. None means that sparse matrix input will raise an error. If the input is sparse but not in the allowed format, it will be converted to the first listed format.
- dtype : string, type, list of types or None (default='numeric')
  Data type of result. If None, the dtype of the input is preserved. If 'numeric', dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.
- order : 'F', 'C' or None (default=None)
  Whether an array will be forced to be fortran or c-style.
- copy : boolean (default=False)
  Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.
- force_all_finite : boolean (default=True)
  Whether to raise an error on np.inf and np.nan in X. This parameter does not influence whether y can have np.inf or np.nan values.
- ensure_2d : boolean (default=True)
  Whether to make X at least 2d.
- allow_nd : boolean (default=False)
  Whether to allow X.ndim > 2.
- multi_output : boolean (default=False)
  Whether to allow 2-d y (array or sparse matrix). If False, y will be validated as a vector. y cannot have np.nan or np.inf values if multi_output=True.
- ensure_min_samples : int (default=1)
  Make sure that X has a minimum number of samples in its first axis (rows for a 2D array).
- ensure_min_features : int (default=1)
  Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when X has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.
- y_numeric : boolean (default=False)
  Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms.
- warn_on_dtype : boolean (default=False)
  Raise DataConversionWarning if the dtype of the input data structure does not match the requested dtype, causing a memory copy.
- estimator : str or estimator instance (default=None)
  If passed, include the name of the estimator in warning messages.

Returns:

- X_converted : object
  The converted and validated X.
- y_converted : object
  The converted and validated y.
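A short sketch of the validation behavior described above; the toy inputs are hypothetical.

```python
from skutil.metrics import check_X_y

# List inputs are converted to numpy arrays; X is forced to 2d, y to 1d,
# and both are checked for consistent length
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [0, 1, 0]

X_checked, y_checked = check_X_y(X, y, dtype='numeric', y_numeric=True)
print(X_checked.shape, y_checked.shape)  # (3, 2) (3,)
```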
skutil.metrics.check_array(array, accept_sparse=None, dtype='numeric', order=None, copy=False, force_all_finite=True, ensure_2d=True, allow_nd=False, ensure_min_samples=1, ensure_min_features=1, warn_on_dtype=False, estimator=None)[source]¶
Input validation on an array, list, sparse matrix or similar.

By default, the input is converted to an at least 2D numpy array. If the dtype of the array is object, attempt converting to float, raising on failure.

Parameters:

- array : object
  Input object to check / convert.
- accept_sparse : string, list of string or None (default=None)
  String[s] representing allowed sparse matrix formats, such as 'csc', 'csr', etc. None means that sparse matrix input will raise an error. If the input is sparse but not in the allowed format, it will be converted to the first listed format.
- dtype : string, type, list of types or None (default='numeric')
  Data type of result. If None, the dtype of the input is preserved. If 'numeric', dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.
- order : 'F', 'C' or None (default=None)
  Whether an array will be forced to be fortran or c-style. When order is None (default), then if copy=False, nothing is ensured about the memory layout of the output array; otherwise (copy=True) the memory layout of the returned array is kept as close as possible to the original array.
- copy : boolean (default=False)
  Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.
- force_all_finite : boolean (default=True)
  Whether to raise an error on np.inf and np.nan in X.
- ensure_2d : boolean (default=True)
  Whether to make X at least 2d.
- allow_nd : boolean (default=False)
  Whether to allow X.ndim > 2.
- ensure_min_samples : int (default=1)
  Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.
- ensure_min_features : int (default=1)
  Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.
- warn_on_dtype : boolean (default=False)
  Raise DataConversionWarning if the dtype of the input data structure does not match the requested dtype, causing a memory copy.
- estimator : str or estimator instance (default=None)
  If passed, include the name of the estimator in warning messages.

Returns:

- X_converted : object
  The converted and validated X.
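A sketch of the two conversion paths described above (dense and sparse); the inputs are hypothetical.

```python
import numpy as np
from scipy import sparse
from skutil.metrics import check_array

# A nested list is converted to a 2d float ndarray
arr = check_array([[1.0, 2.0, 3.0]])
print(arr.shape)  # (1, 3)

# Sparse input must be explicitly allowed; a CSC matrix is converted to the
# first listed format ('csr') on the way through
sp = check_array(sparse.csc_matrix(np.eye(3)), accept_sparse=['csr'])
print(sp.format)  # 'csr'
```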
skutil.metrics.exponential_kernel(X, Y=None, sigma=1.0)[source]¶
The exponential_kernel is closely related to the gaussian_kernel, with only the square of the norm left out. It is also an rbf_kernel. Note that the adjustable parameter, sigma, plays a major role in the performance of the kernel and should be carefully tuned. If overestimated, the exponential will behave almost linearly and the higher-dimensional projection will start to lose its non-linear power. On the other hand, if underestimated, the function will lack regularization and the decision boundary will be highly sensitive to noise in training data.

The kernel is given by: \(k(x, y) = \exp( -||x-y|| / 2\sigma^2 )\)

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- sigma : float, optional (default=1.0)
  The exponential tuning parameter.

Returns:

- c : float
  The result of the kernel computation.

References

Souza, Cesar R., Kernel Functions for Machine Learning Applications. http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
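A minimal call sketch for the kernel functions in this module. The toy array is hypothetical, and the result is taken to be the pairwise kernel computation described in the module overview.

```python
import numpy as np
from skutil.metrics import exponential_kernel

X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [2.0, 2.0]])

# Y=None, so the kernel is computed between the rows of X and themselves;
# sigma controls how quickly similarity decays with distance
K = exponential_kernel(X, sigma=0.5)
print(K)
```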
skutil.metrics.gaussian_kernel(X, Y=None, sigma=1.0)[source]¶
The gaussian_kernel is closely related to the exponential_kernel. It is also an rbf_kernel. Note that the adjustable parameter, sigma, plays a major role in the performance of the kernel and should be carefully tuned. If overestimated, the exponential will behave almost linearly and the higher-dimensional projection will start to lose its non-linear power. On the other hand, if underestimated, the function will lack regularization and the decision boundary will be highly sensitive to noise in training data.

The kernel is given by: \(k(x, y) = \exp( -||x-y||^2 / 2\sigma^2 )\)

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- sigma : float, optional (default=1.0)
  The exponential tuning parameter.

Returns:

- c : float
  The result of the kernel computation.

References

Souza, Cesar R., Kernel Functions for Machine Learning Applications. http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
skutil.metrics.inverse_multiquadric_kernel(X, Y=None, constant=1.0)[source]¶
The inverse_multiquadric_kernel, as with the gaussian_kernel, results in a kernel matrix with full rank (Micchelli, 1986) and thus forms an infinite-dimensional feature space.

The kernel is given by: \(k(x, y) = 1 / \sqrt{ -||x-y||^2 + c^2 }\)

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- constant : float, optional (default=1.0)
  The linear tuning parameter.

Returns:

- c : float
  The result of the kernel computation.

References

Souza, Cesar R., Kernel Functions for Machine Learning Applications. http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
skutil.metrics.laplace_kernel(X, Y=None, sigma=1.0)[source]¶
The laplace_kernel is completely equivalent to the exponential_kernel, except for being less sensitive to changes in the sigma parameter. Being equivalent, it is also an rbf_kernel.

The kernel is given by: \(k(x, y) = \exp( -||x-y|| / \sigma )\)

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- sigma : float, optional (default=1.0)
  The exponential tuning parameter.

Returns:

- c : float
  The result of the kernel computation.

References

Souza, Cesar R., Kernel Functions for Machine Learning Applications. http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
skutil.metrics.linear_kernel(X, Y=None, constant=0.0)[source]¶
The linear_kernel is the simplest kernel function. It is given by the inner product <x, y> plus an optional constant parameter. Kernel algorithms using a linear kernel are often equivalent to their non-kernel counterparts, i.e., KPCA with a linear_kernel is the same as standard PCA.

The kernel is given by: \(k(x, y) = x^T y + c\)

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- constant : float, optional (default=0.0)
  The linear tuning parameter.

Returns:

- c : float
  The result of the kernel computation.

References

Souza, Cesar R., Kernel Functions for Machine Learning Applications. http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
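Because the linear kernel is just an inner product plus a constant, it can be sanity-checked against a plain NumPy dot product. This is a sketch that assumes the function returns the full pairwise computation when Y is None, as the module overview describes.

```python
import numpy as np
from skutil.metrics import linear_kernel

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

K = linear_kernel(X, constant=1.0)

# k(x, y) = x^T y + c, evaluated for every pair of rows in X
expected = X.dot(X.T) + 1.0
print(np.allclose(K, expected))
```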
skutil.metrics.multiquadric_kernel(X, Y=None, constant=0.0)[source]¶
The multiquadric_kernel can be used in the same situations as the rational quadratic kernel. As is the case with the sigmoid kernel, it is an example of a non-positive definite kernel.

The kernel is given by: \(k(x, y) = \sqrt{ -||x-y||^2 + c^2 }\)

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- constant : float, optional (default=0.0)
  The linear tuning parameter.

Returns:

- c : float
  The result of the kernel computation.

References

Souza, Cesar R., Kernel Functions for Machine Learning Applications. http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
skutil.metrics.polynomial_kernel(X, Y=None, alpha=1.0, degree=1.0, constant=1.0)[source]¶
The polynomial_kernel is a non-stationary kernel. Polynomial kernels are well suited for problems where all the training data is normalized. Adjustable parameters are the slope (alpha), the constant term (constant), and the polynomial degree (degree).

The kernel is given by: \(k(x, y) = ( \alpha x^T y + c)^d\)

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- alpha : float, optional (default=1.0)
  The slope tuning parameter.
- degree : float, optional (default=1.0)
  The polynomial degree tuning parameter.
- constant : float, optional (default=1.0)
  The linear tuning parameter.

Returns:

- c : float
  The result of the kernel computation.

References

Souza, Cesar R., Kernel Functions for Machine Learning Applications. http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
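A sketch relating the call to the formula above, under the same pairwise-matrix assumption as the linear kernel example.

```python
import numpy as np
from skutil.metrics import polynomial_kernel

X = np.array([[1.0, 0.0],
              [0.5, 0.5]])

alpha, degree, constant = 2.0, 3.0, 1.0
K = polynomial_kernel(X, alpha=alpha, degree=degree, constant=constant)

# k(x, y) = (alpha * x^T y + c)^d, evaluated over all pairs of rows
expected = (alpha * X.dot(X.T) + constant) ** degree
print(np.allclose(K, expected))
```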
skutil.metrics.power_kernel(X, Y=None, degree=1.0)[source]¶
The power_kernel is also known as the (unrectified) triangular kernel. It is an example of a scale-invariant kernel (Sahbi and Fleuret, 2004) and is also only conditionally positive definite.

The kernel is given by: \(k(x, y) = -||x-y||^d\)

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- degree : float, optional (default=1.0)
  The polynomial degree tuning parameter.

Returns:

- c : float
  The result of the kernel computation.

References

Souza, Cesar R., Kernel Functions for Machine Learning Applications. http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
skutil.metrics.rbf_kernel(X, Y=None, sigma=1.0)[source]¶
The rbf_kernel is closely related to the exponential_kernel and gaussian_kernel. Note that the adjustable parameter, sigma, plays a major role in the performance of the kernel and should be carefully tuned. If overestimated, the exponential will behave almost linearly and the higher-dimensional projection will start to lose its non-linear power. On the other hand, if underestimated, the function will lack regularization and the decision boundary will be highly sensitive to noise in training data.

The kernel is given by: \(k(x, y) = \exp(- \gamma ||x-y||^2)\)

where: \(\gamma = 1 / \sigma^2\)

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- sigma : float, optional (default=1.0)
  The exponential tuning parameter.

Returns:

- c : float
  The result of the kernel computation.

References

Souza, Cesar R., Kernel Functions for Machine Learning Applications. http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
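Given gamma = 1/sigma^2, this parameterization should line up with scikit-learn's sklearn.metrics.pairwise.rbf_kernel, which computes exp(-gamma * ||x - y||^2). A comparison sketch, again assuming the pairwise-matrix return described in the module overview:

```python
import numpy as np
from skutil.metrics import rbf_kernel
from sklearn.metrics.pairwise import rbf_kernel as sk_rbf_kernel

X = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 0.0]])

sigma = 0.8
K_skutil = rbf_kernel(X, sigma=sigma)
K_sklearn = sk_rbf_kernel(X, gamma=1.0 / sigma ** 2)

# Both should evaluate exp(-||x - y||^2 / sigma^2) for every pair of rows
print(np.allclose(K_skutil, K_sklearn))
```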
skutil.metrics.spline_kernel(X, Y=None)[source]¶
The spline_kernel is given as a piecewise cubic polynomial, as derived in the works by Gunn (1998).

The kernel is given by: \(k(x, y) = 1 + xy + xy \min(x,y) - \frac{x+y}{2} \min(x,y)^2 + \frac{1}{3} \min(x,y)^3\)

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.

Returns:

- res : float
  The result of the kernel computation.
skutil.metrics.tanh_kernel(X, Y=None, constant=0.0, alpha=1.0)[source]¶
The tanh_kernel (Hyperbolic Tangent Kernel) is also known as the Sigmoid Kernel and as the Multilayer Perceptron (MLP) kernel. The Sigmoid Kernel comes from the neural networks field, where the bipolar sigmoid function is often used as an activation function for artificial neurons.

The kernel is given by: \(k(x, y) = \tanh (\alpha x^T y + c)\)

It is interesting to note that an SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. This kernel was quite popular for support vector machines due to its origin in neural network theory. Also, despite being only conditionally positive definite, it has been found to perform well in practice.

There are two adjustable parameters in the sigmoid kernel, the slope alpha and the intercept constant. A common value for alpha is 1/N, where N is the data dimension. A more detailed study on sigmoid kernels can be found in the works by Hsuan-Tien Lin and Chih-Jen Lin.

Parameters:

- X : array_like (float), shape=(n_samples, n_features)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- Y : array_like (float), shape=(n_samples, n_features), optional (default=None)
  The array or pandas DataFrame on which to compute the kernel. If Y is None, the kernel will be computed with X.
- constant : float, optional (default=0.0)
  The linear tuning parameter.
- alpha : float, optional (default=1.0)
  The slope tuning parameter.

Returns:

- c : float
  The result of the kernel computation.

References

Souza, Cesar R., Kernel Functions for Machine Learning Applications. http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
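A sketch of the 1/N heuristic mentioned above, with N taken as the number of features; the data is hypothetical.

```python
import numpy as np
from skutil.metrics import tanh_kernel

X = np.array([[0.2, 0.4, 0.1],
              [0.5, 0.1, 0.3]])

# Common heuristic: alpha = 1/N, where N is the data dimension
alpha = 1.0 / X.shape[1]
K = tanh_kernel(X, constant=0.0, alpha=alpha)
print(K)
```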