Reconciling Disparate Data Sources with aiMatch and eXpert-in-the-Loop
Last update: July 17, 2024
Introduction
The goal of applying AI/ML to enterprise data is to extract meaningful patterns and insights, enabling impactful decision making. To achieve this end, it is essential to understand how all of the data fits together, so meaningful relationships can be surfaced. However, enterprise data is often siloed across multiple data sources, primarily due to different aspects of the business being managed within different processes and tools, resulting in many partial sources of truth. To gain a holistic understanding of the data, it is necessary to stitch the information together across these disparate sources.
In an ideal scenario, where data sources are designed carefully by a centralized system designer, a natural way to stitch them together would be through traditional database or data warehouse operations such as joins relying on shared identifiers or keys.
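For instance, when two systems do share a clean key, a single join is all that is needed. Below is a minimal sketch in Python with pandas, where the tables, column names, and the shared txn_id key are all invented for illustration:

```python
import pandas as pd

# Two toy tables that happen to share a clean key ("txn_id"),
# which is rarely the case with real enterprise systems.
gl = pd.DataFrame({
    "txn_id": [101, 102, 103],
    "gl_account": ["Sales", "Rent", "Utilities"],
    "credit": [500.0, 0.0, 0.0],
})
bank = pd.DataFrame({
    "txn_id": [101, 103],
    "value_date": ["2024-07-01", "2024-07-02"],
    "amount": [500.0, -300.0],
})

# With a shared identifier, reconciliation reduces to a single join.
merged = gl.merge(bank, on="txn_id", how="left")
print(merged)
```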
However, the reality is that such data sources are rarely designed with easy integration in mind. To make matters worse, the data often contains missing values as well as errors and anomalies, making automation of data unification a challenging task.
aiMatch, a generative AI solution from Ikigai, is designed to address precisely this challenge. With a computationally efficient approach built on large graphical models (LGMs), Ikigai solutions such as aiMatch are able to solve large and critical problems for the enterprise.
For the problem of data reconciliation, aiMatch is able to bring together previously disparate datasets by matching data across tables with AI and human oversight, known as expert in the loop (XitL). Below we walk through an example of a typical business scenario that is supported by aiMatch.
Data input: Two mismatched tables
Consider the following business scenario: a large business with a complex network of suppliers and vendors relies on its general ledger (GL) system to maintain its overall financial records, while using a separate cash management system to manage banking transactions and cash flows across multiple banking partners and payment processors.
When the company makes a sale, pays a vendor, or incurs an expense, the transaction is recorded in the GL system using the appropriate GL account names, debit amounts, and credit amounts. These transactions are summarized and batched before being entered into the GL system, with each batch assigned a unique Batch ID. The company’s bank transactions, such as customer payments, vendor payments, and bank charges, are recorded in the cash management system, where each transaction is assigned a unique statement item (SI) number, Reference ID, and other relevant details like value date, posting date, and transaction reference.
While each system serves its purpose, reconciling the data across them is no easy task: the output of these systems is two mismatched tables that the business must bring into agreement. In the next several paragraphs, we show how aiMatch can be used to reconcile data between two such disparate sources.
Step 1: Schema mapping
To reconcile these datasets, we must identify which rows in Table 1 match which rows in Table 2.
To find matches, or more general similarities, between any pair of rows, we must first understand which columns in Table 1 have affinity with which columns in Table 2. To accomplish this, aiMatch generates a schema mapping out-of-the-box. For the example tables shown above, the following is the output generated by aiMatch.
When a user runs aiMatch, the system suggests the top matches out-of-the-box, as shown in the example above. It shows that certain columns in Table 1 (under Left Column) are similar to columns in Table 2 (under Right Column), each with a similarity score (under Weights). For example, if two columns match perfectly, their similarity score will be 1; if they do not match at all, their similarity score will be 0.
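Ikigai does not publish aiMatch's internals, but the idea of a column-affinity score can be illustrated with a simple heuristic that blends header-name similarity with value overlap. All names below are hypothetical, and this is a toy stand-in, not aiMatch's actual method:

```python
from difflib import SequenceMatcher

def column_similarity(name_a, values_a, name_b, values_b):
    """Toy affinity score in [0, 1]: the average of header-name
    similarity and the Jaccard overlap of the column values."""
    name_score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    set_a, set_b = set(values_a), set(values_b)
    value_score = len(set_a & set_b) / max(len(set_a | set_b), 1)
    return 0.5 * name_score + 0.5 * value_score

# Near-identical columns score close to 1; unrelated columns score near 0.
print(column_similarity("Batch ID", ["B-1", "B-2"], "BatchID", ["B-1", "B-2"]))
print(column_similarity("Batch ID", ["B-1", "B-2"], "Value Date", ["2024-07-01"]))
```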
Step 2: Expert feedback to improve schema mapping
On the first attempt, the out-of-the-box column mapping is likely to contain some errors. These can be corrected quickly with the help of an expert in the loop (XitL), who can remove suggested matches and add new ones to improve aiMatch's output.
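The following sketch shows how such expert corrections might be applied on top of a suggested mapping; the dictionary structure and function are assumptions for illustration, not aiMatch's actual interface:

```python
# Suggested mapping: (left column, right column) -> similarity weight.
suggested = {
    ("Batch ID", "Reference ID"): 0.92,
    ("Debit Amount", "Value Date"): 0.41,  # a spurious suggestion
    ("Credit Amount", "Amount"): 0.78,
}

def apply_expert_feedback(mapping, removals=(), additions=()):
    """Drop pairs the expert rejects and pin pairs the expert confirms."""
    reviewed = {pair: w for pair, w in mapping.items() if pair not in removals}
    for pair in additions:
        reviewed[pair] = 1.0  # expert-confirmed matches get full weight
    return reviewed

mapping = apply_expert_feedback(
    suggested,
    removals=[("Debit Amount", "Value Date")],
    additions=[("GL Account", "Transaction Reference")],
)
```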
Step 3: Row matching
aiMatch uses the schema mapping to determine matches between rows across Tables 1 and 2. This results in some rows being matched while others remain unmatched. In the image below, we see that after the first round of matching, more than 10% of the rows remain unmatched across the tables; the distribution of pairwise row similarities is also shown.
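Conceptually, once a schema mapping is in place, each candidate row pair can be scored as a weighted agreement across the mapped columns, and pairs that clear a threshold are matched while the rest surface as exceptions. A simplified sketch, with illustrative weights, field names, and threshold:

```python
def row_similarity(row_a, row_b, mapping):
    """Weighted fraction of mapped column pairs on which two rows agree."""
    total = sum(mapping.values())
    agree = sum(w for (left, right), w in mapping.items()
                if row_a.get(left) == row_b.get(right))
    return agree / total if total else 0.0

mapping = {("Batch ID", "Reference ID"): 0.92, ("Credit Amount", "Amount"): 0.78}
row_a = {"Batch ID": "B-17", "Credit Amount": 500.0}
row_b = {"Reference ID": "B-17", "Amount": 500.0}

# Rows whose similarity clears a tunable threshold are matched; the
# remainder become the "unmatched" exceptions a reviewer inspects.
THRESHOLD = 0.8
print(row_similarity(row_a, row_b, mapping) >= THRESHOLD)  # True
```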
Once a user clicks into the data to inspect the exceptions across the tables, it quickly becomes clear that both tables contain rows that obviously should have been matched.
Step 4: Expert feedback to improve row matching
While aiMatch uses ML to identify potential matches between the fields across different systems, the real power of the tool lies in the way it solicits expert feedback. Rather than requiring users to go through the entire collection of matched and unmatched sets, aiMatch solicits input from reviewers in the form of a thumbs-up or thumbs-down on a small number of carefully chosen matches and non-matches.
As feedback is provided, aiMatch instantly incorporates and learns from the feedback, improving its ability to find additional matches.
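One simple way to picture this incremental learning is a re-weighting rule that nudges column weights up when they agree on an expert-approved pair and down when they agree on a rejected one. This toy update is an assumption for illustration, not aiMatch's actual algorithm:

```python
def update_weights(mapping, left_row, right_row, label, lr=0.2):
    """Nudge column weights using one review: label is +1 for a
    thumbs-up (confirmed match) and -1 for a thumbs-down.

    Column pairs that agreed on the reviewed rows gain weight after a
    thumbs-up and lose weight after a thumbs-down, clipped to [0, 1].
    """
    for (left, right), w in mapping.items():
        if left_row.get(left) == right_row.get(right):
            mapping[(left, right)] = min(1.0, max(0.0, w + lr * label))
    return mapping

mapping = {("Batch ID", "Reference ID"): 0.6}
update_weights(mapping, {"Batch ID": "B-17"}, {"Reference ID": "B-17"}, label=+1)
print(mapping)  # weight nudged from 0.6 toward 1.0
```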
Using the new information from the expert review, the system goes back to find all possible matches. In the image below, we can see that all data has been matched apart from 6 exceptions, which remain unmatched only because there was nothing left in Table 1 to match them to.
With the data now cleanly reconciled, the business is prepared to close out its books or perform analysis on the integrated dataset.
aiMatch: Use case example
Retail Promotions Effectiveness: Matching and Analysis Across Data Sources
A retailer is looking to better understand how different promotions have driven demand across a wide set of products sold through online and in-person stores, and whether there are trends across geographies and consumer categories. To accomplish this, the retailer will need to look across a wide variety of datasets, including:
- Promotion details such as start/end dates, discount amounts, and promotion types
- Promotion targeting information such as customer segments and loyalty tiers, geographic regions, or channels
- Promotion performance metrics such as sales revenue generated, units sold, redemption rates, and profit margins
Bringing the data together, and ensuring it is fit for analysis, is no easy task, as much of the data lives in different systems and is governed by different schemas and naming conventions. Basic promotion details (name, dates, and eligible products) are stored in the ERP, while a recently implemented CRM, with its own schema for storing promotional data, houses targeted customer segments and creative assets. To further complicate matters, the retailer has recently acquired a smaller company that uses a legacy POS system, whose schema for product information differs from that of the retailer's custom PIM system. Integrating the acquired company's data into the existing infrastructure adds another layer of complexity to an already diverse data landscape.
Across all these systems, the data schemas lack uniformity, and the degree of data completeness varies depending on the type of information each system stores. Despite these challenges, harmonizing these disparate data sources is crucial for gaining a comprehensive view of promotions and making data-driven decisions.
This is the perfect application of aiMatch. Rather than spending valuable resources to manually reconcile data across sources, the retailer can use aiMatch to largely automate the task. With the review of an expert in the loop, the retailer will benefit from automation as well as accuracy, making quick work of identifying all possible matches, and preparing the data for further analysis.
Conclusion
Connecting and reconciling disparate data sources is a common problem for businesses across industries, often accounting for more than 80% of a data analyst's time. Ikigai automates this process with aiMatch, built on its patented Large Graphical Models, to harmonize data across tables for greater efficiency and accuracy. aiMatch integrates human intuition and expertise through its eXpert-in-the-Loop feature to quickly address anomalies and exceptions, continuously improving model confidence for higher data quality and better decision-making.
Additional resources
Glossary
aiCast
aiCast is a forecasting AI model based on patented Large Graphical Models (LGM). It is designed to predict future trends and outcomes based on both historical tabular and time series data and real-time data. aiCast generates 20% more accurate forecasts than traditional models and methods, even with sparse data.
aiMatch
aiMatch is a data reconciliation AI model based on patented Large Graphical Models (LGM). It automates the process of connecting and harmonizing disparate datasets, ensuring consistency and accuracy across multiple sources. By utilizing advanced pattern recognition and probabilistic techniques, aiMatch enables identification and resolution of inconsistencies and can synthesize new values to address missing or incorrect data.
aiPlan
aiPlan is a scenario planning AI model based on Large Graphical Models (LGM) which can generate and evaluate up to 10¹⁹ scenarios based on complex datasets. By simulating various potential outcomes and their likelihoods, aiPlan enhances scenario planning by providing insights into risks, opportunities, and strategic responses for organizations to navigate uncertainties.
eXpert-in-the-Loop
"eXpert-in-the-loop" (Xitl) refers to a hybrid approach in artificial intelligence where human expertise is integrated into the machine learning process. This methodology involves combining the capabilities of machine learning algorithms with human domain knowledge or judgment to improve the accuracy, efficiency, and interpretability of AI systems.
Large Graphical Model (LGM)
A Large Graphical Model is a generative AI model that produces a graph to represent the conditional dependencies between a set of random variables. It is designed to work with enterprise-specific or proprietary data sources, such as tabular and time series data used in data reconciliation, forecasting, and scenario planning.
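To make this concrete, the toy Python sketch below represents a handful of invented variables and their conditional dependencies as a graph; real LGMs are far larger and are learned from data:

```python
# A toy directed graph of conditional dependencies: each variable
# lists its parents, i.e. the variables it directly depends on.
parents = {
    "promotion": [],
    "region": [],
    "demand": ["promotion", "region"],  # demand depends on both
    "revenue": ["demand"],              # revenue depends on demand
}

# The joint distribution factorizes along the graph's edges:
# P(promotion, region, demand, revenue) =
#   P(promotion) * P(region) * P(demand | promotion, region) * P(revenue | demand)
for var, deps in parents.items():
    print(f"P({var} | {', '.join(deps)})" if deps else f"P({var})")
```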
To learn more about the Ikigai platform, visit here.