NXG Logic Explorer: Knowledge Discovery and Statistical Analysis Package
NXG Logic Explorer is a Windows machine learning package for data analytics, predictive analytics, unsupervised class discovery, supervised class prediction, and simulation.
Benefits
Explorer leverages several technologies to substantially improve your productivity by reducing the time required to perform many procedures. Identify novel patterns in exploratory datasets, quickly analyze data for hypothesis testing, run simulations, and mine text to extract meaningful concepts from data. Benefits of using Explorer include:
- Automatic de-stringing of messy Excel input files
- Perform parallel feature analysis (PFA) by simultaneously generating summary statistics, Shapiro-Wilk tests, histograms, and count frequencies for multiple continuous and categorical variables.
- Simultaneously run ANOVA, Welch ANOVA, chi-squared, and Bartlett's test on multiple variables.
- Automatically generate multi-variable linear, logistic, and Cox PH regression models based on a default p-value criterion for filtering from univariate models (a sketch of this filtering idea appears after this list).
- Class discovery: Simultaneously run numerous unsupervised clustering and linear/non-linear dimensional reduction methods on many input variables, and generate score plots for each method.
- Perform feature selection using all-possible pairs (APP) of classes or one-against-all (OAA) classes, using cross-validation to partition the data into training and test sets.
- Class prediction: Simultaneously run numerous supervised classification analyses on selected features using CV, and generate ROC curves based on bootstrap bias correction.
- Fit a large library of probability distributions (enumerated under Features below) against correlated input variables, and simulate new correlated datasets from the best-fitting distributions.
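To make the Auto-MBS benefit above concrete, here is a minimal sketch of the univariate-filtering idea using pandas and statsmodels; the helper name, threshold, and synthetic data are illustrative assumptions, not Explorer's internals.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def auto_mbs_filter(df, outcome, candidates, p_enter=0.25):
    """Fit one univariate linear model per candidate predictor, keep
    those with p < p_enter, then fit a single multi-variable model on
    the survivors (hypothetical helper, not Explorer's internal code)."""
    keep = []
    for var in candidates:
        fit = sm.OLS(df[outcome], sm.add_constant(df[[var]])).fit()
        if fit.pvalues[var] < p_enter:
            keep.append(var)
    final = sm.OLS(df[outcome], sm.add_constant(df[keep])).fit()
    return final, keep

# Synthetic demonstration data: only 'a' and 'b' drive the outcome.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))
df["y"] = 2.0 * df["a"] - df["b"] + rng.normal(size=200)
model, kept = auto_mbs_filter(df, "y", ["a", "b", "c", "d"])
print(kept)
print(model.pvalues.round(4))
```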
Machine Learning and AI Methods
The machine learning approaches available in Explorer include:
- Crisp K-means cluster (class discovery module)
- Fuzzy K-means cluster (class discovery module)
- Unsupervised neural gas (class discovery module)
- Gaussian mixture models (class discovery module)
- Unsupervised random forests (class discovery module)
- Kernel-based PCA (class discovery module)
- Kernel Gaussian radial basis function PCA (class discovery module)
- Kernel Tanimoto distance-based PCA (class discovery module)
- Diffusion maps (class discovery module)
- Localized linear embeddings (class discovery module)
- Laplacian eigenmaps (class discovery module)
- Locally preserved projections (class discovery module)
- Stochastic neighbor embedding (class discovery module)
- Sammon mapping (class discovery module)
- Decision tree classification (class prediction module)
- Supervised random forests (class prediction module)
- K-nearest neighbor (class prediction module)
- Learning vector quantization (class prediction module)
- Support vector machines (class prediction module)
- Kernel regression (class prediction module)
- Supervised neural gas (class prediction module)
- Mixture of experts (class prediction module)
Explorer also incorporates several artificial intelligence methods, including:
- Kohonen networks, or self-organizing maps (class discovery module)
- Unsupervised artificial neural networks (class discovery module)
- Supervised artificial neural networks (SANN)
- Swarm intelligence (class prediction module)
Activation functions for SANN include (minimal reference definitions are sketched after the output-side list below):
- Identity
- Logistic
- Softmax
- tanh
- Hermite
- Laguerre
- Exponential
- RBFN
SANN back-propagation learning includes:
SANN connection weight updates include:
The SANN objective function can be:
Output-side functions include:
- Identity
- Logistic
- Softmax
- tanh
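For reference, a minimal NumPy sketch of several of the listed functions; Explorer's exact implementations, including the Hermite, Laguerre, and RBFN variants, are not shown here and may differ.

```python
import numpy as np

# Generic reference definitions of common activation functions;
# these are standard formulas, not Explorer's internal code.
def identity(x):
    return np.asarray(x)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x)))

def softmax(x):
    z = np.exp(x - np.max(x))   # shift by the max for numerical stability
    return z / z.sum()

def tanh(x):
    return np.tanh(x)

def exponential(x):
    return np.exp(x)
```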
Features
- Filter: Use logical conditions (if statements) to select specific observations of your dataset for analyses.
- Summarize: Perform summary statistics on variables including mean, standard deviation, minimum, maximum, range, quartiles (25th, 50th, 75th percentiles), skewness, kurtosis, and Shapiro-Wilk normality test.
- Transformation: z-score, log, quantile, rank, percentile, van der Waerden, nominal to binary, binarization (values >0 coded as 1), super-resolution root MUSIC, fuzzification, fast wavelet transform (FWT).
- Association: Covariance, Pearson, Spearman, Euclidean, Manhattan, Chebyshev, Canberra, Tanimoto (binary, continuous), and association rules.
- Independence: Parametric and non-parametric equality-of-means tests for 2 and k groups: t-test, Welch t-test, Mann-Whitney, ANOVA, Welch ANOVA, Kruskal-Wallis; plus a test for two proportions.
- Paired tests: Paired t-test, Wilcoxon signed rank.
- Multiway chi-squared contingency tables: 2x2, rxc, McNemar's test.
- Regression: Multiple linear, multivariate, binary logistic, polytomous logistic, Poisson (additive, multiplicative, power). Rapid selection of all-possible pairs of first-order interaction terms for all regression methods.
- Linear regression diagnostics: residual, standardized residual, Studentized, deletion, Cook's D, leverage, DFFITS, DFBETAS.
- Multivariate tests: Wilks' lambda, Pillai trace, Lawley-Hotelling, Roy's greatest root.
- Logistic regression diagnostics: standardized Pearson, deviance, leverage, Hosmer-Lemeshow GOF table.
- Poisson regression models: additive, multiplicative, power (geometric), automatic determination of optimal power model.
- Poisson regression diagnostics: Pearson, deviance, adjusted, leverage, DFBETAS.
- Longitudinal regression: Generalized estimating equations (GEE) and Kernel-based longitudinal regression using a Gaussian link, with selection of exchangeable, autoregressive, independent, and unstructured correlation patterns.
- Survival: Kaplan-Meier grouped analysis and Cox proportional hazards regression.
- Cox PH regression diagnostics: Cox-Snell residuals, Nelson-Aalen cumulative hazards, plots for PH assumption (stratum-specific baseline hazards), tests of PH assumption, Martingale residuals, deviance residuals, Schoenfeld and scaled Schoenfeld residuals.
- Automatic regression model building strategy (Auto-MBS): Univariate regression models with p-values below a threshold criterion (e.g., P=0.25) are automatically employed in multiple variable models. The threshold value can be permanently set for all future runs. Auto-MBS only applies to linear, binary logistic, and Cox PH regression. For linear models, auto-MBS can be employed for models with and without a constant term.
- Automatic regression runs without outliers: Multiple linear and multiple binary logistic models are automatically rerun without observations that are overly influential (outliers).
- Automatic regression-based feature selection: Once the multiple variable regression model is run, a threshold p-value (e.g., P=0.05) can be set to store a list of significant predictors in memory for use in other models (e.g., clustering, class prediction, etc.).
- Text mining: Sentiment analysis and concept vector document clustering based on stemming/stopping and N-grams, with hierarchical cluster analysis of results. Download PubMed abstracts for biomedical text mining studies.
- Pattern/class discovery and non-linear manifold learning: crisp K-means cluster (CKM), fuzzy-K-means cluster (FKM), self-organizing maps (SOM), unsupervised neural gas (UNG), Gaussian mixture models (GMM), unsupervised random forests (URF), covariance/correlation-based principal components analysis (CPCA), kernel distance-based PCA (KDPCA), kernel Gaussian radial basis function PCA (KGPCA), kernel Tanimoto distance-based PCA (KTPCA), diffusion maps (DM), localized linear embeddings (LLE), Laplacian eigenmaps (LEM), locally preserved projections (LPP), unsupervised artificial neural networks (UANN), stochastic neighbor embedding (t-SNE), Sammon mapping (Sammon), non-negative matrix factorization (NMF), classic multidimensional scaling (CMDS), non-metric multidimensional scaling (NMMDS), and hierarchical cluster analysis (HCA).
- Component subtraction (denoising and decorrelation): Remove noise from the data matrix using the Marčenko-Pastur law for the limiting distribution of eigenvalues, or remove correlation based on the first principal component or multiple components (see the eigenvalue-clipping sketch after this list).
- Feature selection: Cross-validation-based all possible pairwise (APP) groupwise selection, one against all (OAA) groupwise selection, all at once (AAO) groupwise selection. Hypothesis tests include Student's t-test, Mann-Whitney test, Gini index, information gain (entropy), as well as the greedy-plus-take-away sequential forward-reverse method using Mahalanobis distance and F-to-enter and F-to-remove to minimize the "nesting" problem.
- Class prediction: linear regression (LREG), decision tree classification (DTC) -- similar to CART (Classification and Regression Trees), supervised random forests (SRF), K-nearest neighbor (KNN), naive Bayes classifier (NBC), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), Fisher discriminant analysis (FDA), learning vector quantization (LVQ), polytomous logistic regression (PLOG), gradient ascent support vector machines (SVMGA), least squares support vector machines (SVMLS), supervised artificial neural networks (SANN), kernel regression (KREG), particle swarm optimization (PSO), supervised neural gas (SNG), and mixture of experts (MOE).
- Cross validation: Bootstrap, k-fold, leave one out (LOOCV).
- Transformations inside of CV training folds: To reduce information leakage, Explorer performs feature normalization or mean-zero standardization over all objects inside the training folds, then applies the learned parameters to transform objects in the test fold (sketched after this list).
- Class prediction performance: Sensitivity/specificity, kappa vs. error (classifier diversity), receiver operating characteristic (ROC) curves, ROC area under the curve comparisons for all pairwise 2-class comparisons, as well as average AUC.
- Simulation of correlated data: Fit probability distributions to correlated data and simulate correlated quantiles from the following distributions: Beta, BetaPERT, Binomial, Cauchy, Chi-squared, Dagum (4-parameter), Discrete (categorical), Erlang, Exponential, F-ratio, Gamma (Erlang), Generalized extreme value, Gumbel, Geometric, Laplace, Logistic, Log-normal, Negative binomial, Normal, Pareto, Poisson, Power, Rayleigh, Stable (Levy), Student's t, Triangle, Uniform, and Weibull (see the copula sketch after this list).
- Monte Carlo uncertainty analysis: Specify an objective function and correlation matrix for multiple features having different distributions, and assess the sensitivity of the outcome variable to each feature.
- Monte Carlo cost analysis: Store run parameters for distributions, correlations, etc. in Excel file for easy editing and input.
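As referenced in the component-subtraction bullet, the Marčenko-Pastur approach can be sketched with the standard eigenvalue-clipping recipe below; this is a common textbook procedure, not necessarily Explorer's exact algorithm. For a correlation matrix of N features estimated from T observations, eigenvalues below the upper noise edge (1 + sqrt(N/T))^2 are treated as noise.

```python
import numpy as np

def mp_denoise(X):
    """Clip correlation-matrix eigenvalues that fall below the
    Marchenko-Pastur upper edge (the noise band) to their average,
    then rebuild and renormalize the matrix. A standard recipe,
    not Explorer's internal code."""
    T, N = X.shape                              # T observations, N features
    corr = np.corrcoef(X, rowvar=False)
    lam, V = np.linalg.eigh(corr)               # ascending eigenvalues
    lam_plus = (1.0 + np.sqrt(N / T)) ** 2      # MP upper edge, sigma^2 = 1
    noise = lam < lam_plus
    if noise.any():
        lam = lam.copy()
        lam[noise] = lam[noise].mean()          # flatten the noise band
    cleaned = (V * lam) @ V.T                   # V diag(lam) V^T
    d = np.sqrt(np.diag(cleaned))
    return cleaned / np.outer(d, d)             # restore unit diagonal
```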
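The fold-internal transformation described under "Transformations inside of CV training folds" amounts to fitting the scaler on the training fold only and reusing its parameters on the test fold. A minimal sketch, written with scikit-learn purely for illustration (scikit-learn is not part of Explorer):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(1).normal(size=(100, 5))
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=1).split(X):
    scaler = StandardScaler().fit(X[train_idx])  # learn mean/SD on the training fold only
    X_train = scaler.transform(X[train_idx])
    X_test = scaler.transform(X[test_idx])       # reuse training parameters: no leakage
```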
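The correlated-data simulator can be understood through a Gaussian-copula sketch: draw correlated standard normals via a Cholesky factor, map them to uniforms, and push each column through a target marginal's inverse CDF. The marginals and correlation value below are illustrative assumptions, not Explorer's defaults.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])                  # illustrative target correlation
L = np.linalg.cholesky(corr)
z = rng.standard_normal((10_000, 2)) @ L.T     # correlated standard normals
u = stats.norm.cdf(z)                          # map to correlated uniforms
x0 = stats.gamma.ppf(u[:, 0], a=2.0)           # inverse CDF of a Gamma marginal
x1 = stats.lognorm.ppf(u[:, 1], s=0.5)         # inverse CDF of a log-normal marginal
print(np.corrcoef(x0, x1)[0, 1])               # correlation survives the mapping (approximately)
```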
System Requirements
Windows 7, 8, or 10 with 16 GB RAM