Moving on to the next topic, which is mostly about data processing. It is important to understand that data processing and data science are two separate yet related disciplines. Data processing is critical to the maturation of data science.
We previously identified two separate classes of data-based decisions.
- “Discover” or understand data: This group requires somewhat traditional approaches to data processing. Generally speaking, data have to be sourced from a wide variety of applications and/or systems. These data tend to arrive in a wide array of formats (though they tend to be mostly structured), and that variety makes them difficult to process. In the past, data warehouses were typically used for data discovery. Now, with Big Data, a wider variety of toolsets is available for data processing.
- Decisions that repeat: This type of decision requires a slightly different approach to data processing. Generally, reporting/monitoring and alerting tools are required and should be used for repeating decisions based on well-understood data. However, data warehouses, data lakes or other architectural approaches can be used as well. These types of decisions are also based on data in motion (as opposed to data at rest).
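To make the contrast between the two classes a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample exports, the field names, the threshold and the function names (`load_at_rest`, `discover`, `alerts`) are all hypothetical and do not come from any particular tool. The first half normalizes data at rest from two differently formatted sources and then asks an ad hoc question of it; the second half applies a fixed, well-understood rule to each event as it arrives.

```python
from __future__ import annotations

import csv
import io
import json
from statistics import mean
from typing import Iterable, Iterator

# Hypothetical sample data: the same kind of record exported by two
# different systems in two different formats.
CSV_EXPORT = "order_id,amount\n1,120.0\n2,80.0\n"
JSON_EXPORT = '[{"id": 3, "total": 100.0}]'


def load_at_rest() -> list[dict]:
    """Normalize both exports into one common record shape."""
    rows = [
        {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
        for r in csv.DictReader(io.StringIO(CSV_EXPORT))
    ]
    rows += [
        {"order_id": int(r["id"]), "amount": float(r["total"])}
        for r in json.loads(JSON_EXPORT)
    ]
    return rows


def discover(rows: list[dict]) -> float:
    """Discovery: an ad hoc question asked of the whole data set at rest."""
    return mean(r["amount"] for r in rows)


def alerts(stream: Iterable[dict], threshold: float) -> Iterator[str]:
    """Repeating decision: a fixed rule evaluated once per incoming event."""
    for event in stream:
        if event["amount"] > threshold:
            yield f"ALERT: order {event['order_id']} amount {event['amount']}"


if __name__ == "__main__":
    rows = load_at_rest()
    print(discover(rows))
    # The in-memory list stands in for a live event feed here.
    for message in alerts(rows, threshold=110.0):
        print(message)
```

In a real setup the discovery half would more likely sit on top of a warehouse or data lake, and the alerting half on top of a monitoring or streaming tool; the sketch only shows why the two call for different processing styles.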
With this basic difference between data processing and data science in mind, it will be interesting to look at data science approaches and at what can be done to fulfill the promise of purely data-based decision making.
I will summarize the data science segments (and a few solutions) in the next post. Stay tuned.