Client Profile.
A leading real estate portal where 200 million home buyers, sellers, renters, agents, and property managers browse home and apartment listings, shop for mortgages, and find information about 110 million homes across the U.S.
Business Need.
The client hosted a huge volume of aggregated parcel data on its portal and faced the highly resource-intensive task of maintaining this volume of records while keeping them accurate and up to date, in line with user expectations.
In order to maintain a robust and high-performing database, the company was looking to expand the depth and breadth of data capture, as well as to increase parcel data accuracy through extensive validation and verification. This would include verifying parcel data against the latest information available on the United States Postal Service (USPS) website and various county websites. As a next step, the property parcel data was to be mapped and standardized in a format compatible with the client's system.
HitechDigital was approached to capture and verify property and owner addresses, standardize owner names, append owner and situs ZIP codes, and review primary situs addresses, APNs (Assessor's Parcel Numbers), legal descriptions, and more.
Challenges.
- Huge volumes of 650,000+ records to be updated and validated every month.
- Managing 225 full-time resources with knowledge of real estate terminology.
- Capturing, standardizing, validating, and integrating structured and unstructured data from disparate real estate sources such as USPS and county websites, public records, and property and mortgage documents.
- Replacing erroneous and missing property information with validated and verified property addresses, street numbers, building names, boundary mapping, GIS mapping, location tracking, etc.
Solution.
Deployed a seamless workflow using macros, scheduled bots, and rule-based scripts to scrape and validate property data from USPS and county websites, quality-check it, and route it to the client's system through API credentials.
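At a high level, such a workflow can be pictured as a chain of stages. The function names and the sample record below are purely illustrative, not the client's actual implementation:

```python
def scrape(county_sites):
    # Placeholder for the scheduled bots that pulled parcel records
    # from USPS and county websites.
    return [{"apn": "123-45-678", "owner": "JOHN DOE", "zip": ""}]

def validate(records):
    # Rule-based gate: every record must carry an APN.
    return [r for r in records if r.get("apn")]

def quality_check(records):
    # Split records into ready-to-deliver and needs-manual-review,
    # here using a missing situs ZIP as the example review trigger.
    ready = [r for r in records if r["zip"]]
    review = [r for r in records if not r["zip"]]
    return ready, review

def run_pipeline(county_sites):
    ready, review = quality_check(validate(scrape(county_sites)))
    return ready, review
```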
Approach.
After studying the as-is process, our team of data professionals designed and documented a workflow, in consultation with the client, to capture, validate, and verify property data using a mix of manual and automated steps.
Implementation:
The entire workflow was broken into several modules according to the expertise required for the various activities, and resources were assigned to modules according to their skills.
Classification and Tagging
- FTP folders were used to receive land images (source images), the list of county websites, property documents such as deeds and mortgages, and the property database that needed validation, updating, and enrichment.
- Each document was classified and tagged by county.
- Rule-based macros ensured that documents were routed accurately to their folders.
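A minimal sketch of such rule-based routing, assuming (hypothetically) that incoming files follow a `<countycode>_<doctype>_<id>.<ext>` naming convention:

```python
import re

# Hypothetical naming convention for incoming files,
# e.g. "06037_deed_0042.tif" for a deed image from county code 06037.
FILENAME_RULE = re.compile(
    r"^(?P<county>\d{5})_(?P<doctype>deed|mortgage|parcel)_\d+\.\w+$"
)

def classify(filename):
    """Return (county, doctype) for a well-formed filename,
    or None so the file is routed to manual review."""
    m = FILENAME_RULE.match(filename)
    return (m["county"], m["doctype"]) if m else None
```

A routing macro would then move each file into a `<county>/<doctype>` folder, sending `None` results to a review queue.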
Data Collection
- Real estate data was scraped from multiple real estate sites and also collected from property document images.
- Scheduled web crawlers sped up the online data capture process.
- Rule-based macros ensured that only legitimate document images moved into the data capture process.
- Parcel data was validated and mapped against GIS data, location tracking details, and data from other real estate multiple listing sites.
- New property information was added with the help of programmable bots.
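The legitimacy gate for document images can be sketched as a simple rule check; the allowed extensions and size threshold here are assumptions, not the client's actual rules:

```python
ALLOWED_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg"}
MIN_BYTES = 10_000  # assumed threshold: smaller files are likely truncated scans

def is_legitimate_image(filename, size_bytes):
    """Rule-based gate so only plausible document images enter data capture."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    return ext in ALLOWED_EXTENSIONS and size_bytes >= MIN_BYTES
```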
Data Entry
- Data entry fields were classified into critical and non-critical data fields:
- Gray Entry – Simple fields such as buyer information and property data were entered.
- Yellow Entry – Complex data fields requiring comprehension skills and domain knowledge.
- Double Data Entry – Two separate teams entered the same data simultaneously, and the files were later merged and verified using macros.
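The double-entry merge can be sketched as a field-by-field comparison of the two teams' files, with mismatches flagged for adjudication (the record structure is illustrative):

```python
def merge_double_entry(team_a, team_b):
    """Compare two independently keyed record sets (dicts keyed by record id).
    Records that match on every field are accepted; conflicting fields
    are collected for a reviewer to resolve."""
    accepted, conflicts = {}, {}
    for rec_id in team_a.keys() & team_b.keys():
        a, b = team_a[rec_id], team_b[rec_id]
        diff = {field for field in a if a.get(field) != b.get(field)}
        if diff:
            conflicts[rec_id] = diff
        else:
            accepted[rec_id] = a
    return accepted, conflicts
```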
Data Mapping
- Rule-based scripts verified and mapped extracted data to fulfill parcel data requirements and improve overall data quality on the real estate website.
- Missing information was appended to ensure accurate parcel mapping and to update the property database used by end customers.
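One way to picture the mapping step: raw extracted fields are normalized onto a target parcel schema, with missing situs ZIPs appended from a validated lookup. The field names and lookup below are hypothetical:

```python
def map_to_parcel_schema(raw, zip_lookup):
    """Map a raw extracted record onto a (hypothetical) parcel schema,
    standardizing the owner name and appending a situs ZIP when missing."""
    record = {
        "apn": raw.get("APN", "").strip(),
        "owner_name": raw.get("Owner", "").strip().upper(),
        "situs_address": raw.get("Address", "").strip(),
        "situs_zip": raw.get("Zip", "").strip(),
    }
    if not record["situs_zip"]:
        # Append the ZIP from a validated address-to-ZIP lookup.
        record["situs_zip"] = zip_lookup.get(record["situs_address"], "")
    return record
```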
Quality Check and Audit:
- QC was performed logically through automation, with errors raising alerts.
- The pgAdmin tool was used to identify junk characters in the entered data.
- A multi-layered QC process ensured that county-wise property parcel information was converted to the appropriate format.
- Audits ensured that data was bucketed into the buy, sell, or rent category before being uploaded to the client's system.
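A junk-character check in the same spirit can be sketched as flagging anything outside the printable ASCII range; the exact rule used in production is an assumption here:

```python
import string

# Assumed definition of "junk": anything outside printable ASCII,
# also excluding the rarely legitimate vertical-tab and form-feed characters.
EXPECTED = set(string.printable) - {"\x0b", "\x0c"}

def find_junk(value):
    """Return the set of unexpected characters in an entered field,
    e.g. control bytes introduced by OCR or encoding errors."""
    return {ch for ch in value if ch not in EXPECTED}
```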
Delivery and Dispatch:
- Property parcel data was appended to the client's portal through API credentials.
- Granular reports and dashboards were shared with the client.
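The delivery step can be sketched as assembling an authenticated batch request; the endpoint URL and header names below are illustrative, and no live call is made:

```python
import json

def build_upload_request(records, api_key,
                         endpoint="https://portal.example.com/api/parcels"):
    """Assemble the authenticated POST request for a batch of validated
    parcel records. Endpoint and header names are assumptions."""
    return {
        "url": endpoint,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"records": records, "count": len(records)}),
    }
```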
Technology Used
MS Access, pgAdmin, macros, bots, and scripts