July 23, 2014

Prepping Your CRE Data (Nerd Guide)

You’ve got the relationships set, know what data you have and what you want.  Now comes the hard part.  Cleaning up your data and quantitatively creating relationships using ID numbers (those Primary Keys you saw in last week’s post).

There are 2 basic ways to create these ID numbers for the data you already have.  Which one you choose depends on how you want future ID numbers generated.

Most databases have an option to automatically create an ID number as the Primary Key for each row.  If you use this method, you’ll need to import the “top” tables in your hierarchy to get these ID numbers to place in your other tables.  Or can create your own IDs then write/use a function that will automatically generate the next ID number.  So when importing your table, do not select the option to let the database create the Primary Key for you and specify your own Primary ID column instead.   (Resources for auto-increment functions here -  http://www.w3schools.com/sql/sql_autoincrement.asp – and just in case, for Oracle here –   http://blog.lishman.com/2009/02/auto-generated-primary-keys-in-oracle.html)

First, in your original sheet, add columns for PropertyID, ListingID, ComparableID, CompanyID and Contact ID.  Also add a record ID and fill this in from 1 until the end.  Don’t rely on Excel’s numbering and if you’re using a formula to generate the record ID, remember to copy, then PASTE SPECIAL –>Values as the record ID must not change.

One important note.  Always keep a copy of your original data spreadsheets and save an extra copy when you reach each milestone of your data organizing/cleansing project.

Data Cleanup Tips

Clean up first, then do the sorting/organization.

– Apply your data standards to your fields and fill in missing data and/or verify formats (zip code, city, county, state, etc…).  Say you want only 2 letter state codes.  Sort the state column then use the FIND/REPLACE function or copy the value you want then paste it in the appropriate cells.  Sort and standardize EVERY data column.  To you, ABC Company is the same as ABC Co. So is 123 Main St and 123 Main Street.  But to a database, those are two different companies and two different property addresses.

–Speaking of the property address, it’s typically stored in separate columns – street number and street name.  Some systems will break down street name into directions, street type, and on and on.  If you think it will help keep your address data entry standardized, go ahead -  but from what I’ve seen, it doesn’t.

For street numbers,  I would recommend adding a second column – street number, and number oddity (to accommodate 1/2, B, etc…).  If you don’t have this second “odd” column, your street number will have to be a text field.  Not a good option since you can’t properly sort numbers stored as text.  For instances where you need the address all together, you can combine the address fields in a query.

If your addresses are in one cell now, use Excel’s TEXT TO COLUMNS.  Select the column, your delimiter (usually a space, maybe a comma first…) then enter.  123 Main Street will be broken into 3 columns.  To get the street name back together, use Excel’s CONCATENATE formula.  Since it’s not likely that all your street names will be in just 2 columns, you’ll need to concatenate all the columns for each row.  To remove the extra spaces, use the TRIM function.

Again…REMEMBER when using formulas that you’ll need to copy the column with the formula results first, then PASTE SPECIAL (Values) into a new column.

–Remove line breaks, special characters (okay in comment fields), punctuation/alpha characters in numeric fields (feet, inches).  Also use Excel’s CLEAN function to remove stuff you can’t see…but the database can…

Isolate Listings/Comparables

Flag each of your rows as either a listing or a comp if don’t have a column for this already.  While you’re at it, add a column for listing/comp Type (sale or lease) and other listing/comp related status info (current/active, pending, inactive/off market for listings;  verified/rumor/secured for comps).

Sort by the listing flag and give each listing row/record and ID.  Do the same for each comparable.  Each row will now have either a listing or comparable ID depending on the type – but not both.

You could conceivably stop here and separate the listings from the comparable data, adding a few more rows/deleting others to create your tables.  But if you’re creating a Property table, the next step is to identify the UNIQUE property records you have.

Finding Unique Property Records

Use Excel’s  ADVANCED FILTER feature and sort for unique values on  your address column – or any other column (like APN/Tax ID) that holds a unique property identifier.

If you weren’t consistent with address/Tax ID formats, this could get messy.  You’ll need to “eyeball” your results to check for duplicates.  If you find any, correct and re-filter on address.  Once you’re sure you have all the unique property records, fill in the PropertyID column.

Copy these rows to a new sheet. This will be the base of your property table.

Keeping the ADVANCED FILTER in place on the original sheet, assign a property ID for each filtered property record.

Clear the filter on the original sheet, then sort by your address field.  You should see something like this – rows with the same address will be grouped together but only one will have a property ID filled in:
data set
You get to fill in all the empty cells!  Make it easier using a formula like the one below.  Fill in the first property ID in the appropriate column for all instances of that property.  Create a new column (C/PC2) and copy that code (see orange text below) in the first few cells.

Place the formula highlighted below in the PC2 column after the last entry (IF(A4=0,C3, C3+1)).  Copy it and then paste it all the way down the column.  Next, select the PC2 column, copy it, then PASTE SPECIAL –> Values in place.  Rename the column to PropCode/delete the original one.  Now all your property codes are filled in.

data set 2If you’re importing your data into your database selecting the option for the database to auto-generate a primary key DO NOT create a property code column.   Use the ADVANCED FILTER then select all the rows and copy to a new sheet.   Import this into your database selecting the auto Primary Key feature.  You’ll need to get those numbers and match them to your original spreadsheet.

Alternate Addresses
If you have alternate addresses for some of your records or want to add some now, create a new sheet with the matching property ID and fill in the alternate address (street number, street name) for the Property Multiple Addresses table.

Contacts/Companies
First, decide whether you want one or 2 tables for companies and contacts.  Some systems don’t have a separate Companies table, instead adding the company name in the Contact table.  Nothing wrong with this, especially if you’re not using this system for advanced CRM.  If you do want a separate company table, you’ll need to decide if a company record is required for each contact, i.e, each contact record MUST also have a company record.

If you have lots of company/contact columns in your original sheet, this will be incredibly messy and tedious.  If you’ve only got a few columns with companies/contacts lumped together, you’re going to have to sort the companies from the contacts.

The process is basically the same as above – isolate each company and/or contact using the ADVANCED FILTER but you’ll need to copy each isolated column to a new sheet since the same company or contact can be in multiple columns.  Once you get all those together/sorted, you’ll add IDs on your new sheets and basically have the base for your Company and Contact table.

Now, the bad news – you have to go back and fill those IDs into the proper columns – most likely NOT the original columns if you lumped everything together in your original sheet (and property table sheet).

If you don’t have a lot of properties/listings/comps, you might want to manually link companies/contacts to their respective properties, listings and comps. Or just get the owners in…or…outsource!

At the very least, you should try to deal with broker company/contact records so you can create the Multiple Agencies/Agent tables.  Each row will have either a ListingID or a CompID with the Company (Multiple Agencies) or Contact ID (Multiple Agents).

Create Property, Listings, Comparables Table Data

You’ve already got the base for your Property table on the separate sheet you saved previously when filtering for unique properties.  You’ll just need to fill in owner company/contact info, remove any non-property related columns (listing, comp related data).

For the Listings and Comparables table, sort by the ListingID/ComparableID respectively and place each subset in a new sheet.

Remove any columns you don’t want from each sheet.table and add any columns for data that you want in each table.  If you have that data, add it now.  If not, the columns are there for new database entries.

In the end, your tables/sheets should look something like this:

Now you’re ready to import each sheet as a table into your database.