Now that we have reviewed the basic categories of data-driven decision making and discussed a few differences in how data science requires data to be processed, we are ready to jump into a small subset of data mining techniques that are foundational to the data science process.
The following are brief descriptions of these data mining techniques:
- Regression or Estimation: Regression is generally used to predict the numeric value of a variable (such as the readmission probability for a patient). This technique is most useful when you need a single, reliable estimate for a variable.
- Similarity matching: Often used to match an individual or group with another individual or group, given a finite set of measurable attributes (dimensions). Organizations often use this to identify customer groups or peer groups.
- Classification: This technique is useful when you are attempting to segment or categorize a population of candidates or things. Marketers commonly use it to decide how to position offerings and which segments to target.
- Clustering: Groups individuals by their similarity, typically to identify “natural” groups in the data. This is the fundamental difference from similarity matching, which is performed for a specific purpose.
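To make the four techniques above a little more concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption: the patient data is made up, the function names (`fit_line`, `most_similar`, `classify`, `kmeans`) are hypothetical, and the chosen methods (least-squares line, Euclidean nearest neighbor, k-means) are just one simple option per technique rather than the only or best way to do each.

```python
import math

# Hypothetical toy data: (age, prior_visits) per patient. All values are made up.
patients = {
    "p1": (30.0, 1.0),
    "p2": (32.0, 2.0),
    "p3": (70.0, 8.0),
    "p4": (68.0, 9.0),
}

# Hypothetical labeled examples for classification: (attributes, label).
labeled = [((30.0, 1.0), "low"), ((70.0, 8.0), "high")]


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def most_similar(target, candidates):
    """Similarity matching: key of the candidate closest to the target."""
    return min(candidates, key=lambda k: math.dist(target, candidates[k]))


def classify(point, examples):
    """Classification: label of the nearest labeled example (1-nearest neighbor)."""
    nearest = min(examples, key=lambda item: math.dist(point, item[0]))
    return nearest[1]


def kmeans(points, centroids, iterations=10):
    """Clustering: basic k-means starting from the given initial centroids."""
    groups = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            groups[idx].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return groups


# Example calls on the toy data above.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
peer = most_similar((31.0, 1.0), patients)
risk = classify((69.0, 8.0), labeled)
clusters = kmeans(list(patients.values()), [(30.0, 1.0), (70.0, 8.0)])
```

Note how the sketch mirrors the distinctions in the list: `most_similar` starts from a specific target you care about, `classify` relies on labels that already exist, and `kmeans` receives no labels at all and simply looks for groups. In practice you would likely scale the attributes first, since unscaled Euclidean distance lets the attribute with the largest range dominate.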
There are additional, less commonly used techniques, such as co-occurrence grouping, profiling, and link prediction, but more on those in the next post.
Stay tuned to find out how you can implement these techniques to quickly make data-driven decisions…