Our client has a large database of client addresses that has been gathered over many years. Over time, many addresses have been entered incorrectly or inconsistently, leading to data integrity issues. This results in inefficiencies and inaccuracies in operations that rely on precise address information. To address these issues, we have developed a robust process to systematically verify and repair these addresses, ensuring the data is accurate and up-to-date.
The process begins with a CDS database that contains millions of addresses. To manage and verify these addresses efficiently, a bulk address verification tool is used. This tool queries addresses in batches of 500, and users can specify a date range for the queries. Once the addresses are retrieved, they are passed to an address verification tool for further processing.
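A minimal sketch of the batch retrieval step is shown below. The table name, column names, and SQLite connection are assumptions for illustration; the real CDS schema and database engine will differ.

```python
# Sketch: pull addresses in batches of 500 for a given date range.
# Table/column names ("addresses", "modified_date", etc.) are hypothetical.
import sqlite3
from datetime import date

BATCH_SIZE = 500  # the tool queries addresses 500 at a time

def fetch_address_batches(conn, start_date: date, end_date: date):
    """Yield lists of up to 500 address rows entered within the date range."""
    cursor = conn.cursor()
    cursor.execute(
        "SELECT id, line1, city, postcode FROM addresses "
        "WHERE modified_date BETWEEN ? AND ? ORDER BY id",
        (start_date.isoformat(), end_date.isoformat()),
    )
    while True:
        batch = cursor.fetchmany(BATCH_SIZE)
        if not batch:
            break
        yield batch

# Example usage (hypothetical database file):
# conn = sqlite3.connect("cds.db")
# for batch in fetch_address_batches(conn, date(2023, 1, 1), date(2023, 6, 30)):
#     submit_to_verifier(batch)
```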
The address verification tool is a third-party solution connected via an API. It exposes two servers: a transactional server for normal, one-at-a-time address lookups and a batch server used specifically for bulk validation. The batch server can process 500 addresses in a few seconds, returning a file of verified addresses.
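The sketch below shows what submitting a batch to the validation server might look like. The endpoint URL, payload shape, and response fields are assumptions, not the third-party API's real contract.

```python
# Sketch: submit a batch of up to 500 addresses to the batch validation server.
import requests

BATCH_VALIDATE_URL = "https://batch.example-verifier.com/v1/validate"  # hypothetical endpoint

def validate_batch(addresses: list[dict], api_key: str) -> list[dict]:
    """Send a batch of addresses for validation and return the verified results."""
    response = requests.post(
        BATCH_VALIDATE_URL,
        json={"addresses": addresses},           # assumed payload shape
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["results"]            # assumed response shape
```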
The verification process involves two main steps. First, the “verify” step checks existing addresses and attempts to fix minor issues. If an address cannot be fixed in this step, it is then passed to the “repair” endpoint, which adjusts specific fields to correct the address. This two-step process ensures that most addresses are accurately verified and repaired.
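The two-step flow can be summarised with the sketch below: try the "verify" step first and fall back to "repair" only when verification cannot fix the address. The endpoint URLs and the status field are assumptions about the third-party API.

```python
# Sketch: verify first, then repair addresses that cannot be fixed.
import requests

VERIFY_URL = "https://batch.example-verifier.com/v1/verify"  # hypothetical
REPAIR_URL = "https://batch.example-verifier.com/v1/repair"  # hypothetical

def process_address(address: dict, api_key: str) -> dict:
    """Return a corrected address, escalating from verify to repair if needed."""
    headers = {"Authorization": f"Bearer {api_key}"}
    # Step 1: check the existing address and fix minor issues.
    verified = requests.post(VERIFY_URL, json=address, headers=headers, timeout=30).json()
    if verified.get("status") == "verified":     # assumed status field
        return verified
    # Step 2: the repair endpoint adjusts specific fields to correct the address.
    return requests.post(REPAIR_URL, json=address, headers=headers, timeout=30).json()
```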
The tool generates files containing the verified addresses, which are then reviewed and approved. Once approved, these addresses are updated in the database. For cross-checking purposes, a file is generated that includes both the original and processed addresses, highlighting any corrections made during verification.
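One way the cross-check file could be produced is sketched below: original and processed addresses written side by side, with a flag marking rows the verifier changed. The column names and file layout are illustrative assumptions.

```python
# Sketch: write a cross-check CSV pairing original and processed addresses.
import csv

def write_crosscheck_file(pairs, path="crosscheck.csv"):
    """pairs is an iterable of (original, processed) address dicts."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "original", "processed", "corrected"])
        for original, processed in pairs:
            writer.writerow([
                original["id"],
                original["full_address"],
                processed["full_address"],
                original["full_address"] != processed["full_address"],  # highlight corrections
            ])
```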
This process is not a regular occurrence but a one-off exercise aimed at fixing previously entered addresses. After the initial cleanup, the tool can be run periodically to correct any errors introduced since the last run, primarily from manual address entry in the portal. These runs are typically scheduled over weekends to minimise the impact on system performance and to let the underlying sync process complete efficiently.
The tool itself is command-line based, incorporating several commands for syncing records and bulk merges. It generates logs for any errors or issues encountered during processing. A user guide is available, detailing the commands and providing sample usage scenarios.
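A rough sketch of that command-line surface is shown below. The command names ("sync", "bulk-merge"), options, and log file are illustrative assumptions; the actual commands are documented in the tool's user guide.

```python
# Sketch: argparse-based CLI with commands for syncing records and bulk merges.
import argparse
import logging

logging.basicConfig(filename="address_tool.log", level=logging.INFO)  # hypothetical log file

def main():
    parser = argparse.ArgumentParser(prog="address-tool")
    sub = parser.add_subparsers(dest="command", required=True)

    sync = sub.add_parser("sync", help="sync verified addresses back to the database")
    sync.add_argument("--from-date")
    sync.add_argument("--to-date")

    merge = sub.add_parser("bulk-merge", help="apply an approved file of corrections")
    merge.add_argument("file")

    args = parser.parse_args()
    logging.info("Running command: %s", args.command)
    # ... dispatch to the relevant handler and log any errors encountered ...

if __name__ == "__main__":
    main()
```

An invocation might then look like `address-tool sync --from-date 2023-01-01 --to-date 2023-06-30`, with errors written to the log file for review.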
Overall, this high-level tool is designed to streamline the verification and repair of addresses in the CDS database, ensuring data accuracy while managing system load effectively.