Program Crashes
-Explorer crashes and quits running. If Explorer crashes, please note what you were doing at the time of the crash, as this information helps in reproducing and debugging the potential bug.
File/Data Input/Output
File Opening:
-My Excel or .csv file won't import. If an Excel or .csv file won't open, try saving it as a tab-delimited file and then importing it as Excel. There are many ways the formatting of an Excel file can alter parameters during import. Also, make sure there are no features whose values are all missing or all zeroes; importing and using such features can be disastrous at run-time.
Data Input:
-The number of features is smaller than the number of columns in the input file. This means that there are columns with blanks for feature names. Replace the blank entries with a valid feature name.
-All the text values for a text feature are removed (empty) after import. First, see the Blog page on input of text-based features and how to remedy issues that may arise due to character encoding and special characters. Make sure the checkbox "Remove spaces and special symbols on input" is unchecked in the Missing Data Code group of the General Default Settings; having this checked allows NXG Logic Explorer to remove text from the input file so that only numbers remain. This issue can also occur if there are many trailing empty records in the input file. To prevent this, open the input file and delete any trailing empty records at the bottom of the file.
-AZA is not recognized as a column name. This error occurs if an Excel file (.xlsx) has too many columns, since it exceeds the allowable width of an Excel sheet. To input a wide Excel sheet, save it as a .csv (comma-delimited) file and then import it as a .csv file.
-My integer features appear as continuous. This indicates that there were decimal points "." identified in your feature, or that the maximum number of unique values of the feature exceeded the default value of 10. To see the current setting, go to Preferences→Default settings and look for the value under Data Transformations.
-All continuous features are listed as text features. Make sure there is no text like "N/A" entered as a missing code in the general defaults and settings option window.
Filtering and Grouped Analysis
-The sample size for a class during an independence run (T-test, MW, ANOVA) is NaN: NaN implies "not a number." In Explorer, filtering and grouping do not work together. Filtering records should only be followed by non-grouped analyses such as summary statistics, correlation, or regression on the selected records, whereas grouped analysis can be used for 2- and k-group hypothesis testing, summary statistics, regression, and Kaplan-Meier grouped analysis.
In summary:
Filtering (gender=males): can run summary statistics, correlation, regression, but not independence tests (T-tests, MW, ANOVA, KW) or KM grouped survival analysis.
Grouped analysis (gender): can run summary statistics, independence (T-tests, MW, ANOVA, KW), regression, correlation, and KM survival analysis.
For example, if you want to run ANOVA with only male data, then perform a grouped analysis using gender as the grouping variable, and then run ANOVA using e.g. treatment arm as the categorical factor, and age, BMI, protein expression as continuous features.
Merging files
-Rule of thumb when merging: Open the file with more repeated measurements per object first, and then merge with files having only one record per object second (see the sketch below).
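As an illustration of this rule outside of Explorer, here is a minimal pandas sketch; the file names and the key column "patient_id" are hypothetical, not the actual Explorer import steps.

```python
# Illustrative sketch of the merge rule of thumb using pandas (not Explorer itself).
# File names and the key column "patient_id" are hypothetical.
import pandas as pd

repeated = pd.read_csv("labs_repeated_measures.csv")   # many records per object -- open first
single   = pd.read_csv("demographics_one_record.csv")  # one record per object -- merge second

# Many-to-one merge: every repeated record picks up its object's single-record fields.
merged = repeated.merge(single, on="patient_id", how="left", validate="many_to_one")
merged.to_csv("merged_for_import.csv", index=False)
```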
-There is an error that the primary key features are different. This is likely due to a subtle difference such as upper case vs. lower case or a misspelling. Review the case of every character in both keys; in our experience, a difference in letter case, rather than spelling, is usually the cause of this error.
Feature Transformations
Super-resolution root MUSIC recommendation. Root MUSIC is intended for supervised class prediction, not unsupervised class discovery. If you used super-resolution root MUSIC to collapse all of your input features down to a number of dimensions equal to the number of classes in the data, the resulting 2D and 3D plots from class discovery techniques, especially SOM (self-organized maps), can be of poor quality. It is best to reserve root MUSIC results for class prediction, since the structural information within many features has been collapsed down to a dimension equal to the number of classes. So for class prediction, either use feature selection or use root MUSIC to identify new, informative features.
Independence
Table output
-The mean and/or standard deviation for a group was reported as NaN. NaN means "not a number," implying that, for a given group or class, there were not enough data to generate an average or standard deviation. Recall that the equation for the sample standard deviation has n-1 in the denominator. If n=0 for a group, then n-1 will be 0-1=-1, and dividing by -1 gives a negative result -- and the square root cannot be taken of a negative value (with n=0 the mean itself is also undefined, since it divides by n). In the case of n=1 for a group, the difference will be 1-1=0, and dividing by 0 is mathematically undefined. So either way, you cannot have n=0 or n=1 when calculating the sample standard deviation (see the sketch below).
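A quick numerical check of this behavior (a NumPy sketch, not Explorer's own code):

```python
# Sketch: the sample standard deviation (ddof=1, i.e. n-1 in the denominator)
# is undefined for groups with n=0 or n=1, which surfaces as NaN.
import numpy as np

for g in ([5.1, 4.8, 6.0], [5.1], []):
    arr = np.asarray(g, dtype=float)
    # n=3 gives a finite SD; n=1 and n=0 give NaN (NumPy also emits a runtime warning).
    print(len(arr), np.std(arr, ddof=1))
```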
Dependency
Multiple Linear Regression
-The standardized residuals don't agree with results from other software packages. NXG Logic Explorer calculates the standardized residuals using the equation e_i,rstd = e_i / sqrt(RMSE). Some packages also adjust for leverage in the formulation, i.e., e_i,rstd = e_i / (sqrt(RMSE) * sqrt(1-h_i)), where h_i is the leverage (hat diagonal) for the ith observation.
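As an illustration only, the sketch below contrasts the two formulations on a small simulated regression. It is not Explorer's internal code, and it reads sqrt(RMSE) above as the root residual mean square of the fit.

```python
# Sketch comparing the two standardized-residual formulations (not Explorer's code).
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # design with intercept
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                                              # raw residuals
s = np.sqrt(e @ e / (n - X.shape[1]))                         # root residual mean square
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)                 # leverages (hat diagonal)

r_scaled   = e / s                     # residual scaled by the root mean square
r_leverage = e / (s * np.sqrt(1 - h))  # leverage-adjusted (internally studentized)
print(np.column_stack([r_scaled, r_leverage])[:5])
```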
Unsupervised Class Discovery
Self-Organizing Maps (SOM)
-There are several objects missing on the scatter plot of objects' SOM map coordinates. Recall that the object locations on the plot take on integer values only, such as 1, 2, 3, …. Therefore, confirm that an integer map position was assigned to each object by looking in the .csv file that lists the map coordinates for each object. Most likely you don't see a separate symbol for each object because two or more objects have the same map location -- that is, they are sitting on top of one another in the map (see the sketch below for one way to check).
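If it helps, here is a small pandas sketch for spotting shared map locations in the exported coordinates file; the file name and the coordinate column names "x" and "y" are hypothetical and may differ from the actual export.

```python
# Sketch: find objects that share the same SOM map location in the exported
# coordinates file (file name and column names are hypothetical).
import pandas as pd

coords = pd.read_csv("som_map_coordinates.csv")
overlaps = coords[coords.duplicated(subset=["x", "y"], keep=False)]
print(overlaps.sort_values(["x", "y"]))   # objects sitting on top of one another
```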
-The feature maps are very low resolution. This can happen if the number of input features used is too small, e.g., <10. If you used super-resolution root MUSIC to collapse all of your input data features to a number of dimensions equal to the number of classes in the data, then this can result in low resolution output maps. It is best to use the original input features for SOM, and the larger the number of features the better.
-The 2D score plot is not generated at the end of running all methods. The 2D score plot for all methods can only be generated when the global number of nodes (clusters) is set to 2.
-3D score plots are not generated for UANN and SOM. This is because the UANN method uses a fixed value of two hidden nodes, and SOM is two-dimensional.
-Program throws an exception:
1. Make sure there is nothing entered as a missing code in the general defaults and settings window.
Hierarchical Cluster Analysis (HCA)
-Whenever I run HCA, the program crashes. This is probably because mean-zero standardization is selected for feature pre-conditioning, and there are one or more rows (columns) whose standard deviation is zero. Switch to normalization for feature transformations, and this should stop occurring (see the sketch below for one way to spot such features).
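As a quick diagnostic outside of Explorer, the following sketch flags constant (zero standard deviation) features before standardizing; the input file name is hypothetical.

```python
# Sketch: flag zero-variance (constant) features before mean-zero standardization,
# which would otherwise divide by a standard deviation of zero.
import pandas as pd

data = pd.read_csv("input_features.csv")          # hypothetical input file
numeric = data.select_dtypes("number")
zero_sd = numeric.columns[numeric.std(ddof=1) == 0]
print("Constant features to drop before standardizing:", list(zero_sd))
```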
Text Mining
-When inputting text in an Excel file for text mining, all of the words are grouped together without spaces. This occurs when the "Remove spaces and special symbols on input" checkbox of the Missing Data Code groupbox is checked. To uncheck it, select Preferences→General Default Settings and uncheck "Remove spaces and special symbols on input" in the Missing Data Code groupbox.
Supervised Class Prediction
Fisher's Discriminant Analysis (FDA)
-Why is FDA not allowed for Bootstrap Bias or AUC runs? Because the smallest class sample size can become quite small (e.g., 6), and in, say, a 4-class problem, although all classes can be represented, one or more of the classes could be represented by nc=1 training objects after random object sampling. Since the denominator of the class-specific covariance matrix is equal to nc-1, this will result in division by zero (undefined); see the sketch below.
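A minimal numerical illustration of the nc-1 problem (using NumPy, not Explorer's FDA code):

```python
# Sketch: a class represented by a single training object (nc=1) makes the
# class-specific covariance (denominator nc-1) undefined.
import numpy as np

class_a = np.array([[1.0, 2.0], [1.5, 2.2], [0.8, 1.9]])   # nc=3 -> finite covariance
class_b = np.array([[4.0, 5.0]])                           # nc=1 -> 1-1=0 denominator

print(np.cov(class_a, rowvar=False))   # well-defined 2x2 matrix
print(np.cov(class_b, rowvar=False))   # NaN entries (NumPy warns: degrees of freedom <= 0)
```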
Decision tree classification (DTC)
-I can't generate a plot of node trees when using decision tree classification. In order to generate the node tree plot during a class prediction run using DTC, you must select the checkbox for "Use all samples" in the Performance group of options on the second popup window that appears after feature selection.
-My treenode plot looks very jagged and uneven, and there are node splits which seem out of place. This is most likely due to the range and scale of the input features. If you transform the input features using van der Waerden scores, this issue will likely not occur (see the sketch below).
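For reference, here is a minimal sketch of the van der Waerden (normal scores) transformation using SciPy; it illustrates the transformation itself, not Explorer's implementation.

```python
# Sketch: van der Waerden (normal) scores for a feature, computed with SciPy.
# The score for rank r out of n observations is Phi^{-1}(r / (n + 1)).
import numpy as np
from scipy.stats import norm, rankdata

x = np.array([3.2, 10.5, 7.1, 7.1, 150.0, 0.4])    # raw feature on an awkward scale
ranks = rankdata(x)                                  # average ranks for ties
vdw = norm.ppf(ranks / (len(x) + 1))                 # bounded, normal-like scores
print(vdw)
```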
-The first parent node is out of place, or there is a missing or misplaced daughter node. This can occur due to peculiarities of one or more feature values, such as having two classes for which the feature values and cutpoints are similar. To alleviate these issues, you can switch from the Gini index to information gain, or vice versa. In addition, it has been observed that using one-against-all (other) classes instead of all-possible-pairs of class comparisons will usually prevent this from occurring. Recall that DTC classifiers are "monothetic" and therefore only work with one feature at a time -- so if there is a feature that is not informative, DTC can break down. Many other classifiers work with all features, i.e., they are "polythetic," and won't run into the same issues that a DTC can have.
Monte Carlo Analysis
-Betapert distribution: My betapert distribution histogram is very jagged and noisy, and the distribution does not appear to be in agreement with what I want. This is because the differences between the parameter values (a, b, c) each need to be greater than 0.5. In other words, a=0.9, b=1.4, c=3 will not work, since b-a=0.5 is not greater than 0.5 (see the sketch below).
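For intuition, here is a common betaPERT parameterization (min=a, mode=b, max=c) sampled via a scaled Beta draw; the helper function is hypothetical and is not Explorer's internal sampler.

```python
# Sketch of a common betaPERT (min=a, mode=b, max=c) parameterization sampled
# via a scaled Beta draw; not Explorer's internal sampler.
import numpy as np

def pert_sample(a, b, c, size, rng=np.random.default_rng(0)):
    alpha = 1 + 4 * (b - a) / (c - a)
    beta = 1 + 4 * (c - b) / (c - a)
    return a + (c - a) * rng.beta(alpha, beta, size)

# Check the rule above before sampling: pairwise differences should exceed 0.5.
a, b, c = 0.9, 1.4, 3.0
print(b - a > 0.5, c - b > 0.5, c - a > 0.5)     # b-a = 0.5 fails the rule
draws = pert_sample(0.9, 1.6, 3.0, 10_000)        # parameters that satisfy the rule
```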
-There is a large number of zeroes in the histogram or in the output data (either univariate or mixture of densities). This is likely because you have misspelled the name of a distribution in the input file.
-Sensitivity analysis: all of the sensitivities are equal to 1. This means the y-variable (outcome) is a linear combination of all of the x-variables - that is, only addition ("+") operators were used in your equation. There needs to be a multiplication ("*") or division ("/") operation in the objective function in order for the sensitivities to differ from 1 (see the sketch below).
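One way to see this is that a purely additive objective has a partial derivative of 1 with respect to every input; the SymPy sketch below is a simplified illustration of that point, not Explorer's sensitivity algorithm.

```python
# Sketch (SymPy): partial derivatives of a purely additive objective are all 1,
# while an objective with "*" or "/" has input-dependent derivatives.
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3")
additive = x1 + x2 + x3            # only "+" operators
mixed    = x1 * x2 + x3 / x1       # includes "*" and "/"

print([sp.diff(additive, v) for v in (x1, x2, x3)])   # [1, 1, 1]
print([sp.diff(mixed, v) for v in (x1, x2, x3)])      # vary across inputs
```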
Licensing
-Why isn't a perpetual license offered? Perpetual licenses increase the upfront licensing costs for customers, which is not a sustainable business model in light of today's ever-decreasing IT budgets. Perpetual licenses also carry additional costs for upgrades. As corporate margins and expenditures for R&D are reduced, IT departments are increasing their awareness of over-paying for unused functionality of software. Perpetual licensing also increases volatility in IT budgets, due to the unplanned episodes of larger capital outlays for perpetual licenses and their associated upgrades.
-Why are only term licenses via subscription available? With term licensing via subscription, IT departments can better plan for lower-cost expenditures on their productivity software, with less volatility. Additionally, there are no large upfront costs, and users are guaranteed access to the most recent version without paying incremental upgrade costs.