The term “big data” has been thrown around a lot lately. Commercial real estate (CRE) is no exception, with startups claiming to harness its power.
What is big data? It refers to large data sets drawn from both structured and unstructured sources. Filling out a form is a good example of structured data: each field in the form goes into a column of a database table.
Unstructured data is typically text, like tweets, where meaning has to be derived by analyzing the text. That’s where algorithms come in, trying to make sense of a bunch of words by deriving their subject matter, intent, sentiment and so on.
I don’t know of any aggregator or data provider that uses unstructured data, but most have large structured listing data sets. That’s enough for them to call it big data, especially when they mix it with other structured data such as demographics. And for the most part, it’s a step forward to have many of the sources CRE has already been using in one place. But how good is it?
I looked for publicly available reports (something a user might see) where I could compare a “big data” generated report with one where the data was collected and crunched the old-fashioned way. For the latter, I looked at two sources to see how they varied from each other as well as from the “big data” reports. Here’s what I found for average asking lease rates in Rochester, NY.
Savills-Studley (using CoStar and RCA data):
Class A: $19.88/SF (all submarkets assumed, no lease type identified)
Class B/C: $15.97/SF (all submarkets assumed, no lease type identified)
Overall, all classes: $17.61/SF (all submarkets assumed, no lease type identified)
Locally sourced data:
Class A: $21.94/SF (all submarkets, gross)
Class B: $15.50/SF (all submarkets, gross)
Overall, all classes: $19.75/SF (all submarkets, gross)
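To put a number on how far apart the two sets of figures above are, here is a small Python sketch that computes the dollar gap and the percentage gap for each category. It pairs Savills-Studley’s class B/C figure with the local class B figure as a rough match only; the categories aren’t identical, and the Savills-Studley figures don’t state a lease type while the local ones are gross, so treat the gaps as indicative rather than exact.

```python
# Average asking lease rates for Rochester NY in $/SF, copied from the figures above.
# Caveat: Savills-Studley lumps class B and C together and gives no lease type,
# while the local figures are gross, so these pairings are only approximate.
savills = {"class A": 19.88, "class B/C": 15.97, "overall": 17.61}
local = {"class A": 21.94, "class B": 15.50, "overall": 19.75}

# Categories that roughly correspond to each other across the two sources.
pairs = [("class A", "class A"), ("class B/C", "class B"), ("overall", "overall")]


def compare(a: float, b: float) -> tuple[float, float]:
    """Return the dollar gap (b - a) and that gap as a percent of a."""
    gap = round(b - a, 2)
    pct = round((b - a) / a * 100, 1)
    return gap, pct


results = {}
for s_key, l_key in pairs:
    gap, pct = compare(savills[s_key], local[l_key])
    results[l_key] = (gap, pct)
    print(f"{s_key} vs {l_key}: ${gap:+.2f}/SF ({pct:+.1f}%)")
```

Run as is, it shows the local class A and overall figures sitting roughly 10% to 12% above the Savills-Studley numbers, while the class B comparison goes slightly the other way.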
You might be thinking that because Rochester isn’t that big a market, many aggregators don’t have a lot of data to work with. But I found the same pattern in large markets.
So which one is right? I can tell you that someone looking for office space in Rochester expecting to pay $13 a square foot is in for a shock.
Just because data comes from a self-proclaimed big data source doesn’t mean it’s good. That’s because most of what big data produces hasn’t been verified or checked for reliability. While it’s possible to generalize outcomes from limited data (political pollsters do it all the time), the validity of those outcomes can only be proved over time. In other words, you have to compare actual outcomes with projected ones to see whether your algorithm, or the data being collected, has any meaning or value.
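Comparing projected outcomes with actual ones can be as simple as measuring the average percentage miss. The Python sketch below uses mean absolute percentage error (one common yardstick, chosen here for illustration; nothing suggests any particular vendor uses it) and checks a tool’s projections against a naive baseline. The days-on-market numbers are hypothetical, made up purely to show the mechanics.

```python
from statistics import mean


def mape(projected: list[float], actual: list[float]) -> float:
    """Mean absolute percentage error: how far projections landed from reality, in percent."""
    if not actual or len(projected) != len(actual):
        raise ValueError("need non-empty projected and actual lists of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero to compute a percentage error")
    return mean(abs(p - a) / abs(a) for p, a in zip(projected, actual)) * 100


# Hypothetical days-on-market figures for three listings, invented for illustration only.
actual_days = [100, 150, 60]
model_days = [90, 120, 60]       # what a "big data" tool might have projected (hypothetical)
baseline_days = [110, 110, 110]  # naive guess: the same number for every listing

model_error = mape(model_days, actual_days)
baseline_error = mape(baseline_days, actual_days)
print(f"model error: {model_error:.1f}%, baseline error: {baseline_error:.1f}%")
```

The baseline comparison is the point: if a tool’s projections don’t consistently beat a naive guess across many real outcomes over time, the “big data” behind it isn’t adding much value.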
Take those listing activity reports that compare how your listings fared against supposedly “like” listings in the market. The implication is that the space will be sold or leased faster, but ask the developer to prove it – with hard data, not anecdotes. How do tenant or transaction viability scores match up against traditional qualifying methods? They may save time, but are the outcomes also better?
I’ve asked developers about reliability, but all I got was the “we use best practices” spiel. So you get to be their guinea pigs; I only hope they quantify the outcomes. Validating “big data” methods with anecdotal “success stories” is ironic, and it doesn’t say much for them or for what they’re doing.
Everyone needs to remember that big data in all industries – and especially in commercial real estate – is in its infancy. What businesses want from it is reduced risk: the assurance that the data being crunched will lead to better, more profitable decisions. It will happen. But for now, don’t bet on it.