According to our sources, big data management and analytics has become more strategic. This shift is being driven by digital transformation initiatives, efforts to understand data for competitive advantage, and even moves to monetise data assets.
We’ve all heard that ‘Data is the new oil’, but raw data does not have the natural value of oil. It is only when data is prepared and cleaned that the value becomes apparent.
Clive Humby, UK mathematician and architect of Tesco’s Clubcard, put it perfectly when he said:
“Like oil, data is valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analysed for it to have value.”
With that in mind, it is worth understanding some of the key data trends for 2022.
Here we discuss just FIVE…
1. Analysing Data Across Multiple Clouds
94% of organisations are now using the cloud, and 84% are using a multi-cloud data strategy.³ However, it is becoming apparent that analytics can be a challenge when data is housed on multiple platforms.
In 2022, we expect to see increased use of new software tools that provide a view of data across multiple on-premises and cloud systems, and that allow that data to be accessed directly to perform analytical tasks.
Gaining a virtual view of dispersed data, and being able to access it with everyday business intelligence tools, is increasingly seen as a viable alternative to the traditional data warehouse, where data is collected from multiple sources and managed in a central location. Either way, companies have to find a way of linking these disparate sources together, and this can be done with relevant “keys”.
A keys-based matching process uses one or more defined keys to find potential duplicates when data is collected from multiple sources. In an ordinary database, a simple key, often just a number, is added to each record for indexing purposes; clearly, this number should never be duplicated. Alternatively, a key could be a standard identifier, such as the Royal Mail’s Unique Delivery Point Reference Number (UDPRN), which will pull data together as long as the address has been matched correctly.
However, there are myriad other ways to build one or more keys to help merge data and identify duplicates, such as combining the postcode and premise. Any company or person building a keys-based matching system needs to identify the elements they require in the keys, and then decide how to process those elements (in full, or in reduced or standardised forms). A key is a simplified version of the information that can unlock the full record, and it should always be built in a consistent way.
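To make the postcode + premise idea concrete, here is a minimal Python sketch of a keys-based duplicate check. It is an illustration under stated assumptions, not Hopewiser’s method: the field names (`postcode`, `premise`), the normalisation rules, the `|` separator and the sample records (including the made-up postcodes) are all hypothetical.

```python
import re
from collections import defaultdict


def normalise_postcode(postcode: str) -> str:
    # Remove all whitespace and upper-case, so "aa1 1aa" and "AA11AA" match
    return re.sub(r"\s+", "", postcode).upper()


def normalise_premise(premise: str) -> str:
    # Replace punctuation with spaces, upper-case, then collapse whitespace:
    # "Flat 2, 10" becomes "FLAT 2 10"
    cleaned = re.sub(r"[^\w\s]", " ", premise).upper()
    return " ".join(cleaned.split())


def build_key(record: dict) -> str:
    # Hypothetical key: standardised postcode + standardised premise
    return f"{normalise_postcode(record['postcode'])}|{normalise_premise(record['premise'])}"


def find_potential_duplicates(records: list[dict]) -> dict[str, list[dict]]:
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[build_key(record)].append(record)
    # Only keys shared by more than one record point to potential duplicates
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Illustrative records from two hypothetical source systems
records = [
    {"id": 1, "source": "CRM", "postcode": "AA1 1AA", "premise": "10"},
    {"id": 2, "source": "Billing", "postcode": "aa11aa", "premise": "10"},
    {"id": 3, "source": "CRM", "postcode": "AA1 1AA", "premise": "12"},
]
dupes = find_potential_duplicates(records)
```

Here records 1 and 2 share the key `AA11AA|10` despite being formatted differently in each source, which is exactly why the key elements are standardised before comparison. A production system would usually match against a verified address first, as the UDPRN example above suggests.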
2. The Shift To Predictive And Prescriptive Analytics Accelerates
Data analytics has traditionally been used to understand what happened. But there is growing use of data analytics and machine learning technology to predict what will happen.
This change is happening due to the growing availability of easy-to-use machine learning tools for analysts and data scientists, the ability to manage and deploy machine learning features at scale with next-generation feature stores,¹ and a new generation of distributed frameworks for training and deploying machine learning models.
The gap between analytics and machine learning is closing, but any prediction of business need is only as good as the quality of the data behind it.
Accuracy is especially important when organisations use AI to process personal data and profile individuals. If AI systems use or generate inaccurate personal data, this may lead to the incorrect or unjust treatment of a data subject (ICO).²
Beyond personal data, AI may be used to understand key locations, for example to decide where to send organisational resources or where to open warehouses or retail outlets. Without accurate location data, those predictions could be wrong.
3. Data Fabric Vs. Data Mesh
“Data fabric” and “data mesh” are emerging architectures for integrating, accessing and managing data across multiple systems and environments of different types.
There are differences between them, however, and both terms are likely to be debated at length in 2022.
Gartner defines data fabric as:
“A design concept that serves as an integrated layer (fabric) of data and connecting processes. A data fabric utilises continuous analytics over existing, discoverable and inferenced metadata assets to support the design, deployment and utilisation of integrated and reusable data across all environments, including hybrid and multi-cloud platforms.
Data fabric leverages both human and machine capabilities to access data in place or support its consolidation where appropriate. It continuously identifies and connects data from disparate applications to discover unique, business-relevant relationships between the available data points.”
According to a definition from Stardog, data fabrics weave together data from internal silos and external sources to create data networks that power business applications, AI and analytics.
The “data mesh” concept, developed by Zhamak Dehghani, a director at IT consultancy Thoughtworks, is focused on the logical and physical connections that enable companies to reliably transfer data between assets.
Data mesh is a new approach based on a modern, distributed architecture for analytical data management. It enables end users to easily access and query data where it lives, without first transporting it to a data lake or data warehouse. The decentralised strategy of data mesh distributes data ownership to domain-specific teams that manage, own and serve the data as a product.
Both approaches aim to deliver better and faster data-driven outcomes. However, those outcomes rely on accurate data, so data accuracy must always be the starting point before any new architecture is introduced.
4. Data Observability Goes Mainstream
Data Observability provides a way to monitor data for its quality, behaviour, privacy and ROI, says Sanjeev Mohan, a consultant and advisor at data and analytics firm Eckerson Group.
Data Observability appears to be another growing trend, but our research suggests that Data Quality is the stage before observability: observability cannot get started unless an organisation is already committed to Data Quality.
For Data Observability to work well, it helps to include a status code with each record. Analysing status codes reveals a great deal about how data flows between services and where accuracy or delivery rates can be improved. In addressing, for example, the status code shows whether a record has been matched to a reference dataset such as the Royal Mail Postcode Address File (PAF), or whether there is an issue. Records with “issue” status codes can be flagged to a troubleshooting team and, if necessary, cleaned.
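As a rough sketch of this kind of monitoring, the Python below tallies status codes, works out a match rate and routes everything that is not a confirmed match to a review queue. It assumes each record carries a `status` field; the code values (`MATCHED_PAF`, `NO_MATCH`, `AMBIGUOUS`) are hypothetical, since real address-verification tools define their own.

```python
from collections import Counter

# Hypothetical status code meaning "matched to the reference dataset (e.g. PAF)"
MATCHED_CODES = {"MATCHED_PAF"}


def summarise_status_codes(records: list[dict]) -> tuple[Counter, float]:
    # Count how often each status code occurs across the batch
    counts = Counter(r["status"] for r in records)
    total = len(records)
    matched = sum(counts[code] for code in MATCHED_CODES)
    match_rate = matched / total if total else 0.0
    return counts, match_rate


def flag_issues(records: list[dict]) -> list[dict]:
    # Anything that is not a confirmed match, including unrecognised codes,
    # is routed to the troubleshooting team for review and cleaning
    return [r for r in records if r["status"] not in MATCHED_CODES]


# Illustrative batch of verified addresses
records = [
    {"id": 1, "status": "MATCHED_PAF"},
    {"id": 2, "status": "MATCHED_PAF"},
    {"id": 3, "status": "NO_MATCH"},
    {"id": 4, "status": "AMBIGUOUS"},
]
counts, match_rate = summarise_status_codes(records)
issues = flag_issues(records)
```

Tracking `match_rate` over successive batches is one simple way to notice when an upstream service starts producing poorer-quality data.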
5. Data Marketplace Use Will Explode
Business analytics initiatives have traditionally focused on analysing internally generated data such as sales, market surveys and business performance. But increasingly, businesses are obtaining data from external sources and using it to supplement and enrich their own data:
IDC says that 75 percent of enterprises in 2021 used external data sources to strengthen cross-functional and decision-making capabilities.
Every business has its own data needs when engaging with clients and stakeholders. Enriching your data by adding supplementary or missing information can support your business in its engagement and reach, whilst increasing the variety of data analysis options.
Hopewiser has always been data agnostic, using data from a variety of sources depending on the client’s needs and on which source brings the best return on investment.
Enriched, cleansed data provides many benefits. It enables better-informed engagement, as well as targeted marketing campaigns, profiling, insights and relationship building. Accurate analytics enables a business to make informed, cost-effective decisions and to reach clients in ways it was unable to previously.
Hopewiser holds a variety of datasets that can be used at the point of contact, so that as an address is verified, further information can be captured immediately, such as standard reference numbers, grid references, SIC codes and building type. This supports business decisions and internal checks to find duplicates and link data together.
For example, geographic data enrichment can involve adding postal data, or latitude and longitude, to an existing dataset of customer addresses, thereby enabling location analytics and intelligence. This kind of insight is useful for a number of reasons, such as: targeted marketing campaigns that fit the right context and region; business planning, research and demand analysis, for example site planning for a new store; and business forecasting, which enables a business to plan for future supply.
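A minimal sketch of geographic enrichment, assuming a lookup table keyed on standardised postcode. The `POSTCODE_COORDS` table, its postcodes and coordinates are invented for illustration only; in practice these values would come from a licensed reference dataset, ideally after the address itself has been verified.

```python
import re

# Hypothetical lookup: standardised postcode -> (latitude, longitude).
# Illustrative values only, not real postcode coordinates.
POSTCODE_COORDS = {
    "AA11AA": (53.0, -2.0),
    "BB22BB": (51.5, -0.1),
}


def normalise_postcode(postcode: str) -> str:
    # Strip whitespace and upper-case so lookups are format-insensitive
    return re.sub(r"\s+", "", postcode).upper()


def enrich_with_coordinates(customers: list[dict]) -> list[dict]:
    enriched = []
    for customer in customers:
        coords = POSTCODE_COORDS.get(normalise_postcode(customer["postcode"]))
        lat, lon = coords if coords else (None, None)
        # Keep the original fields and add location columns; None marks
        # records that could not be enriched and may need cleansing first
        enriched.append({**customer, "latitude": lat, "longitude": lon})
    return enriched


customers = [
    {"name": "Customer A", "postcode": "aa1 1aa"},
    {"name": "Customer B", "postcode": "ZZ9 9ZZ"},
]
result = enrich_with_coordinates(customers)
```

Records left with `None` coordinates are a useful by-product: they highlight addresses that probably need cleansing before any location analysis is trusted.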
Adding SIC codes to organisation data opens up a range of possibilities, from classifying your clients and prospects and identifying key markets, to insights into which services to deliver and even what rubbish-collection requirements an organisation has, plus much more.
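As an illustration of classifying organisations, five-digit UK SIC 2007 codes can be rolled up to their top-level sections (A to U) using the first two digits, known as the division. The sketch below assumes each organisation record has a hypothetical `sic` field holding the code as a string; the sample organisations are made up.

```python
from collections import Counter
from typing import Optional

# UK SIC 2007 sections and the range of two-digit divisions each covers
SIC_SECTIONS = [
    ("A", 1, 3), ("B", 5, 9), ("C", 10, 33), ("D", 35, 35), ("E", 36, 39),
    ("F", 41, 43), ("G", 45, 47), ("H", 49, 53), ("I", 55, 56), ("J", 58, 63),
    ("K", 64, 66), ("L", 68, 68), ("M", 69, 75), ("N", 77, 82), ("O", 84, 84),
    ("P", 85, 85), ("Q", 86, 88), ("R", 90, 93), ("S", 94, 96), ("T", 97, 98),
    ("U", 99, 99),
]


def sic_section(sic_code: str) -> Optional[str]:
    # Return None for malformed codes rather than raising an error
    if len(sic_code) < 2 or not sic_code[:2].isdigit():
        return None
    division = int(sic_code[:2])
    for section, low, high in SIC_SECTIONS:
        if low <= division <= high:
            return section
    # Some two-digit values (e.g. 04) are not used as divisions
    return None


# Hypothetical client records enriched with SIC codes
organisations = [
    {"name": "Org 1", "sic": "62012"},
    {"name": "Org 2", "sic": "62020"},
    {"name": "Org 3", "sic": "38110"},
    {"name": "Org 4", "sic": "47110"},
]
by_section = Counter(sic_section(org["sic"]) for org in organisations)
```

Counting clients per section like this is a quick first step towards identifying key markets; finer-grained analysis can use the full five-digit codes.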
Why Use Hopewiser’s Data Quality Services
Hopewiser offers a range of professional data services to ensure that your records are accurate and can be used to build your business.
The knowledge we have acquired and the sheer amount of data we have processed in that time set us apart from everyone else in the market. So if we do things a little differently from others, it might be because we have information they don’t.
- 1 https://towardsdatascience.com/what-are-feature-stores-and-why-are-they-critical-for-scaling-data-science-3f9156f7ab4
- 2 https://ico.org.uk/about-the-ico/news-and-events/ai-blog-accuracy-of-ai-system-outputs-and-performance-measures/
- 3 Integrate.io, “Multi-Cloud Data Analytics: What, Why, and How”
Updated 17th August 2022.