Analytics is Width. Feature Selection is Depth.


Most organizations don't treat Data Science, AI, or Machine Learning as a single discipline; they group it with the entire Analytics function. That function covers everything from spreadsheets to relational data, and from documents stored in multiple locations to the structured business data of standard operations. While you might view your team as independent, the business doesn't. It just wants answers from data.

The first need in an organization is to define the data it creates, acquires, and stores. With that data defined, the organization can begin to analyze it to make decisions. From there, many organizations move on to categorization and predictive work with the data: Data Science.

The broad, historical analysis of data is the purview of Business Intelligence (BI). BI seeks to gather as much data as possible, from as many sources as it can, and to build pre-aggregated groups of information that users can examine for patterns by slicing a given area along various dimensions (vectors). The key to success here is a large width of data: data about stock, costs, sales, locations, or whatever else the organization interfaces with. It can even help to include seemingly non-business data such as weather, economic, and geopolitical information.
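To make the idea of width concrete, here is a minimal Python sketch, assuming two hypothetical sources: sales rows and a weather lookup keyed by date and store. The function names `widen` and `pre_aggregate` and all of the data are illustrative, not part of any BI product. Joining the weather source adds a column (more width), and pre-aggregating along different dimensions gives the user different vectors to evaluate.

```python
from collections import defaultdict

# Hypothetical source 1: transactional sales rows
sales = [
    {"date": "2024-03-01", "store": "Seattle", "units": 120},
    {"date": "2024-03-01", "store": "Phoenix", "units": 80},
    {"date": "2024-03-02", "store": "Seattle", "units": 95},
    {"date": "2024-03-02", "store": "Phoenix", "units": 110},
]

# Hypothetical source 2: "non-business" weather data keyed by (date, store)
weather = {
    ("2024-03-01", "Seattle"): "rain",
    ("2024-03-01", "Phoenix"): "sun",
    ("2024-03-02", "Seattle"): "sun",
    ("2024-03-02", "Phoenix"): "sun",
}


def widen(rows, lookup):
    """Join the weather source onto each sales row, adding a new column."""
    return [
        {**row, "weather": lookup.get((row["date"], row["store"]), "unknown")}
        for row in rows
    ]


def pre_aggregate(rows, dimensions, measure):
    """Total `measure` for every combination of the given dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


wide = widen(sales, weather)
by_weather = pre_aggregate(wide, ["weather"], "units")
by_store_weather = pre_aggregate(wide, ["store", "weather"], "units")
print(by_weather)  # {('rain',): 120, ('sun',): 285}
```

A production BI system would typically precompute aggregates like these in a warehouse or cube rather than in application code. The point of the sketch is only that each added source adds dimensions along which the same facts can be sliced.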

In Machine Learning (ML), you might start with a "wide" set of data, but you need to move quickly to only the variables (features) that will categorize or predict the target you want. There are several ways to do this, and the two main factors in choosing one are the time it takes to compute and the quality of the results. As you can imagine, these often trade off against each other: better results usually cost more computation. This exercise in ML is called "Feature Selection" (also variable selection, attribute selection, or variable subset selection), and it uses three main methods:

  1. Filter: Filter methods use standard statistical measures to score how strongly each feature, taken on its own, relates to the target (if there are dark clouds, it will rain). These methods, including Pearson's correlation, Linear Discriminant Analysis (LDA), Analysis of Variance (ANOVA), and Chi-Square, are computationally inexpensive but can be misleading and lead to correlation-equals-causation errors: not all dark clouds produce rain. Still, they make a good "first pass" or check on your work.
  2. Wrappers: Wrapper methods are a step-wise search that adds or removes features, retrains the model on each candidate set, keeps the features that improve the model's results, and drops the ones that don't. Because they retrain the model many times, they are usually far more expensive to compute than filters. Examples include Forward Selection, Backward Elimination, and Recursive Feature Elimination.
  3. Embedded: Embedded methods combine the advantages of Filters and Wrappers. They perform selection as part of model training itself, and include LASSO, Bolasso, Elastic Net regularization, FeaLect, and Recursive Feature Elimination (which, depending on the source, is classed as either a wrapper or an embedded method).
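The filter method above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a small synthetic dataset (the feature names and values are hypothetical) and using Pearson's correlation as the score. `filter_select` and its `threshold` parameter are illustrative names. Each feature is scored against the target on its own, without training any model, which is why filters are cheap. The sketch also shows a limitation: `x1_copy` duplicates `x1` and adds no new information, yet it is kept because the filter never looks at features together.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def filter_select(features, target, threshold):
    """Score each feature independently; keep those with |r| >= threshold."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    # Rank by absolute correlation, strongest first (ties keep input order)
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return [name for name in ranked if abs(scores[name]) >= threshold], scores


# Synthetic data: the target is built as 3*x1 + x2; x3 is unrelated
features = {
    "x1": [1, 1, 1, 1, -1, -1, -1, -1],
    "x1_copy": [1, 1, 1, 1, -1, -1, -1, -1],
    "x2": [1, 1, -1, -1, 1, 1, -1, -1],
    "x3": [1, -1, 1, -1, 1, -1, 1, -1],
}
target = [4, 4, 2, 2, -2, -2, -4, -4]

selected, scores = filter_select(features, target, threshold=0.3)
print(selected)  # ['x1', 'x1_copy', 'x2']
```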
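The wrapper approach described above can be sketched the same way. This is a minimal Forward Selection sketch in plain Python, assuming ordinary least squares as the model and R² as the score. `forward_selection`, `r_squared`, `solve`, and the `min_gain` stopping rule are hypothetical names and choices, and the data is synthetic (target = 3*x1 + x2, with x3 unrelated). Each round retrains the model once for every remaining feature and keeps the one that helps most, stopping when no candidate improves R² by at least `min_gain`.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """Fit least squares with an intercept on the given columns; return R^2."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_selection(features, y, min_gain=1e-6):
    """Greedily add the feature that most improves R^2 until gains stall."""
    selected, best_score = [], 0.0
    remaining = list(features)
    while remaining:
        # One model fit per remaining candidate: this is where the cost goes
        trials = {
            name: r_squared([features[f] for f in selected + [name]], y)
            for name in remaining
        }
        best = max(trials, key=trials.get)
        if trials[best] - best_score < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        best_score = trials[best]
    return selected, best_score


features = {
    "x1": [1, 1, 1, 1, -1, -1, -1, -1],
    "x2": [1, 1, -1, -1, 1, 1, -1, -1],
    "x3": [1, -1, 1, -1, 1, -1, 1, -1],
}
target = [4, 4, 2, 2, -2, -2, -4, -4]

selected, score = forward_selection(features, target)
print(selected, score)  # ['x1', 'x2'] 1.0
```

With p features, Forward Selection fits at most p + (p - 1) + ... + 1 = p(p + 1)/2 models, compared with 2^p - 1 for an exhaustive search of every subset. This is the compute-versus-quality trade-off in miniature: the greedy search is much cheaper but can miss a better combination of features.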
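An embedded method performs selection during training itself. Below is a minimal LASSO sketch using coordinate descent in plain Python, assuming a fixed number of sweeps instead of a convergence test. `lasso`, `soft_threshold`, `lam`, and `n_sweeps` are hypothetical names. In practice features are usually standardized first so that a single `lam` treats them equally; the synthetic columns here already have mean 0 and variance 1. The L1 penalty `lam` shrinks every coefficient toward zero and sets weak ones exactly to zero, so the features that keep a nonzero coefficient are the selected ones.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso(features, y, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 on centered data."""
    names = list(features)
    n = len(y)
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]
    cols = {}
    for name in names:
        mean_x = sum(features[name]) / n
        cols[name] = [v - mean_x for v in features[name]]
    coef = {name: 0.0 for name in names}
    for _ in range(n_sweeps):
        for j in names:
            # Partial residual: the target minus the fit from every other feature
            partial = [
                yc[i] - sum(coef[k] * cols[k][i] for k in names if k != j)
                for i in range(n)
            ]
            rho = sum(cols[j][i] * partial[i] for i in range(n)) / n
            z = sum(v * v for v in cols[j]) / n
            coef[j] = soft_threshold(rho, lam) / z
    return coef


# Synthetic data: the target is built as 3*x1 + x2; x3 is unrelated
features = {
    "x1": [1, 1, 1, 1, -1, -1, -1, -1],
    "x2": [1, 1, -1, -1, 1, 1, -1, -1],
    "x3": [1, -1, 1, -1, 1, -1, 1, -1],
}
target = [4, 4, 2, 2, -2, -2, -4, -4]

print(lasso(features, target, lam=0.5))  # {'x1': 2.5, 'x2': 0.5, 'x3': 0.0}
print(lasso(features, target, lam=1.5))  # {'x1': 1.5, 'x2': 0.0, 'x3': 0.0}
```

Raising `lam` from 0.5 to 1.5 drops x2 as well, so the penalty strength controls how many features survive. Note that the penalty also shrinks the surviving coefficient below its true value of 3.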

Whichever method you use, once you have identified the desired features you need a large depth of data rather than width. The more pertinent examples you can provide (for supervised models in particular, with accurate, specific label values), the more accurate the model is likely to be.

So you need a combination of data: width for general analysis, then depth for predictive and classification models. As you work with other analytics teams, keep this distribution of data in mind.
