Client Profile.
The California-based B2B data aggregator provides comprehensive, current business data on companies ranging from early-stage startups to the Fortune 1000. Its customers use the data to track competitors, uncover startup trends, obtain company funding data, and find new prospects and opportunities.
Business Need.
The B2B data management company wanted to strengthen its 50-million-record database, make it more comprehensive, and eliminate data decay. While it was important to keep adding new company profiles and scale up the database, it was equally important to keep the existing records continuously validated and updated.
This entailed high-volume omni-channel sourcing to capture business records from listed, unlisted, traditional and nontraditional sources, as well as an extensive and well-orchestrated validation and data enrichment process.
The company was looking for a partner with extensive data management experience and expertise to support its database building operations.
Challenges.
- Company data changes rapidly because of new CXOs, location changes, M&As and similar events, so updates and validity checks had to keep the same pace, which called for robust processes and automation.
- Capturing data on privately held small and medium-sized companies that publish no audited information, and on unregistered startups.
- Bringing together public and private data from multiple sources while adhering to privacy norms.
- Managing huge volumes of omni-channel data acquisition and validation on an ongoing basis to keep the data current.
Solution.
The B2B database was updated and strengthened through the ongoing addition of new company profiles and the enrichment, validation and cleansing of existing records. A robust data management workflow powered by manual research, ML algorithms and custom rules drove the omni-channel data sourcing and multi-layered validation process. Enriching existing records with hundreds of additional data points was an integral part of the solution.
Approach.
- A two-pronged approach was designed to meet the continuous database update need:
  - Acquire and validate new multi-sourced company profiles to populate and grow the existing database.
  - Continuously update existing data by validating it and enriching it with additional information.
- Cleanse and de-duplicate the database at regular intervals.
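The periodic cleansing and de-duplication step can be illustrated with a minimal sketch. The case study does not describe the matching logic actually used, so everything here is an assumption for illustration: the field names (`name`, `domain`, `city`), the list of legal suffixes, and the choice to treat two records as duplicates when their normalized name and domain match.

```python
import re

# Hypothetical legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def dedupe(records):
    """Keep one record per (normalized name, domain) key.

    When a duplicate is found, it only fills fields that the kept
    record is missing; existing values are never overwritten.
    """
    merged = {}
    for rec in records:
        key = (normalize_name(rec["name"]), (rec.get("domain") or "").lower())
        if key not in merged:
            merged[key] = dict(rec)
            continue
        for field, value in rec.items():
            if value and not merged[key].get(field):
                merged[key][field] = value
    return list(merged.values())


# Illustrative data only; not taken from the client's database.
records = [
    {"name": "Acme, Inc.", "domain": "acme.com", "city": ""},
    {"name": "ACME Inc", "domain": "ACME.com", "city": "Fresno"},
    {"name": "Globex LLC", "domain": "globex.io", "city": "Austin"},
]
unique = dedupe(records)
```

In this sketch the two Acme entries collapse into one record, and the empty `city` on the first is filled from the second. A production workflow would likely add fuzzy matching and manual review for borderline cases.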
Implementation:
- Scheduled bots and crawlers were deployed to harvest raw data on new businesses from public and private sources, both structured and unstructured, across time zones and geographies.
- A business-rules-based validation process was used to authenticate founder names, acquisition amounts, IPO launches, revenue, headcount changes and similar data points against reliable online sources such as business directories, news sites, blogs, forums and social media sites.
- To enrich records with contact-level and firmographic data, programmable bots were scheduled to collect missing data points and refresh outdated ones.
- ML-driven robotic process automation (RPA) was deployed to ensure that new company records were added to the database regularly and without interruption, after verification and validation against trusted data sources.
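As a rough illustration of the rules-based validation described above, the sketch below checks a record against a small rule set and then cross-checks it against a trusted reference. The rules, the field names (`founder_name`, `revenue_usd`, `employee_count`) and the `reference` dictionary are hypothetical; the case study does not list the actual business rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    field: str
    check: Callable[[object], bool]
    message: str


# Hypothetical business rules; real rules would come from the data team.
RULES = [
    Rule("founder_name",
         lambda v: isinstance(v, str) and len(v.strip()) > 1,
         "founder name missing"),
    Rule("revenue_usd",
         lambda v: isinstance(v, (int, float)) and v >= 0,
         "revenue must be non-negative"),
    Rule("employee_count",
         lambda v: isinstance(v, int) and v > 0,
         "employee count must be positive"),
]


def validate(record, reference=None):
    """Return a list of problems; an empty list means the record passed."""
    errors = [r.message for r in RULES if not r.check(record.get(r.field))]
    # Cross-check overlapping fields against a trusted source, if one is given.
    if reference:
        for field, trusted in reference.items():
            if field in record and record[field] != trusted:
                errors.append(f"{field} disagrees with reference source")
    return errors


record = {"founder_name": "Ada Lovelace", "revenue_usd": 1000, "employee_count": 5}
problems = validate(record, reference={"employee_count": 7})
```

Here `problems` flags the headcount mismatch while the rule checks pass. Keeping rules as data rather than hard-coded branches makes it easier to add or retire checks as sources change.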
Quality Check and Audit:
- Multi-layered validity checks ensured the authenticity of the harvested firmographic, technographic and behavioral data points.
- Data quality audits were conducted in integrated environments to identify missing records, which were then appended so that the dataset was complete and authentic.
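The audit step of finding and appending missing records can be sketched as a comparison between the database and a trusted reference list. This is a minimal illustration under stated assumptions: the function name `find_missing`, the use of `domain` as the matching key, and the sample data are all hypothetical and not taken from the project.

```python
def find_missing(database, reference, key="domain"):
    """Return reference records whose key does not appear in the database."""
    known = {rec[key].lower() for rec in database if rec.get(key)}
    return [
        rec for rec in reference
        if rec.get(key) and rec[key].lower() not in known
    ]


# Illustrative data only.
database = [
    {"name": "Acme", "domain": "acme.com"},
    {"name": "Globex", "domain": "globex.io"},
]
reference = [
    {"name": "Acme", "domain": "ACME.com"},
    {"name": "Initech", "domain": "initech.com"},
]

missing = find_missing(database, reference)
database.extend(missing)  # append the gaps found by the audit
```

After the audit, `database` contains the Initech record as well. In practice, appended records would pass through the same validation checks as any other new profile.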
Technology Used:
Custom tools, macros, bots and scripts