Following the Giants
Before I studied programming formally, I was enrolled in an econometrics program at UC Berkeley. I knew just enough statistics and just enough programming that I thought I could get by. I was thrilled that we were using a new statistical package from a well-known university. I’ll call that package “StatXY.”
Using StatXY, I struggled for weeks to get all the pieces of my research to line up: Good data? Check. Clean code? Check. Well thought-out analysis? Check. Good results? #fail. My instructor’s takeaway? Perhaps you don’t belong here.
By accident, I overheard some real programmers in the computing center discussing StatXY. “Can you believe the bugs in StatXY?” “How did university XY release this to us without telling us it was less than alpha?”
Since what I was doing was essentially simple, I replaced StatXY with some code from UCB and, like magic, I now belonged.
For almost two decades, some companies have earned billion-dollar valuations by combining data, big data, cloud data, statistics, machine learning, and artificial intelligence. Knowing that few companies have the resources to combine all those pieces, organizations are looking for "major breakthrough as a service." Indeed, that is a good direction, but there are a few pitfalls.
One is that the architecture required for examining, for example, the behavior of millions of Netflix users may be only partially applicable to small- and medium-sized firms. (Some things, like support for A/B testing, are often applicable to even small projects.) The assumption buried in many products, both on and off the cloud, is that support for terabytes of data is the right starting point even on day one; that assumption can lead to a focus on solutions rather than on the problem to be solved.
There is a deeper, hidden problem. In general, the big breakthroughs in big data, AI, etc. over the past five years have involved techniques often called "deep learning." This is great stuff, but it is difficult, and often impossible, to understand how systems based on deep learning make their decisions. If no one at Google or Baidu understands exactly why a certain word in English is translated into French in a certain way, I'll probably be happy as long as my French-speaking friends can understand my intention. On the other hand, if my building's AI decides the water heaters should increase their target temperature from 150 to 190 degrees F, I will want visibility into the decision.
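The water-heater scenario suggests one practical response: require that any automated setpoint change arrive with a rationale and pass a safety envelope before it is applied. The sketch below is purely illustrative; the function name, range, and rationale format are my assumptions, not anything from a real building-management product.

```python
# Illustrative guardrail (assumptions mine): refuse actuator commands
# that arrive without a human-readable rationale or that fall outside
# a pre-agreed safe envelope.

SAFE_RANGE_F = (110, 150)  # assumed safe setpoint bounds for a water heater

def apply_setpoint(current_f, proposed_f, rationale):
    """Return (accepted, message); reject opaque or out-of-range commands."""
    if not rationale:
        return False, "rejected: no rationale supplied for the change"
    lo, hi = SAFE_RANGE_F
    if not (lo <= proposed_f <= hi):
        return False, f"rejected: {proposed_f}F outside safe range {lo}-{hi}F"
    return True, f"accepted: {current_f}F -> {proposed_f}F ({rationale})"

# The 150-to-190 change from the text is stopped at the envelope check:
print(apply_setpoint(150, 190, "model predicted demand spike"))
print(apply_setpoint(140, 145, "off-peak preheat schedule"))
```

The point is not the specific bounds; it is that the decision and its justification are visible at the moment of action, which an opaque deep-learning controller does not give you by default.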
Thinking About Pipelines
A few weeks ago, a close friend redesigned the plumbing in her house so that grey water could be extracted and sent to the garden. Simple idea, right? But with the changes, some sections of pipe were under increased pressure and they burst. That was not necessarily a big deal. What was a big deal was that many of the pipes were buried under a meter of concrete. The pipeline could not be inspected so there was no reasonable way to pinpoint or fix the leak. The entire house became non-functional for ten days.
The takeaway is that pipelines should be designed for inspection, testing, and repair. Looking at the real world again, compared to trains, trucks, and boats, pipelines are unusually reliable. But they need to have those basic properties.
From IoT Devices to Actionable Data
There has been intense interest in botnets lately, largely as a result of the Mirai DDoS attack a year or so ago that brought down major sections of the Internet. What do these attacks have to do with big data?
When sensors, controllers, actuators, etc. are the beginning of a data pipeline, responsible designers, contractors, building product manufacturers (BPM), and owners will require higher standards of security for IoT devices. Microsoft's Sopris and IoT/AI Labs are examples of the types of support the A/E/C industry should expect from major hardware and software players. The goals should be two-fold: prevent hacking and provide verification for data quality. Once the low-level devices are secure and monitorable, it is not enough to blindly hand over the data pipeline to a service that promises to turn data into information, actionable information, new insights, etc. Rather, the service needs to expose enough of its internals so that the entire pipeline can be inspected.
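The two goals above, preventing tampering and verifying data quality, can be sketched in miniature. This is an illustrative Python example, not any vendor's API; the device name, key handling, and temperature range are assumptions for the sake of the sketch.

```python
# Hypothetical sketch of tamper evidence plus a data-quality gate:
# each device signs its readings with a shared key, and the pipeline
# checks the signature and a basic plausibility range.
import hmac, hashlib, json

DEVICE_KEY = b"per-device-secret"  # in practice, provisioned per device
TEMP_RANGE_F = (-40, 250)          # assumed plausible bounds for this sensor

def sign_reading(reading: dict) -> str:
    payload = json.dumps(reading, sort_keys=True).encode()
    return hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()

def verify_reading(reading: dict, signature: str) -> bool:
    expected = sign_reading(reading)
    if not hmac.compare_digest(expected, signature):
        return False                      # tampered payload or wrong key
    lo, hi = TEMP_RANGE_F
    return lo <= reading["temp_f"] <= hi  # basic data-quality gate

reading = {"device": "hvac-07", "temp_f": 72.5}
sig = sign_reading(reading)
print(verify_reading(reading, sig))                       # True
print(verify_reading({**reading, "temp_f": 999.0}, sig))  # False
```

A real deployment would use per-device keys in secure hardware (the direction Sopris points in), but even this toy version shows why the pipeline, not just the device, has to be open to inspection: the verification step lives downstream of the sensor.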
Taking these considerations into account, the rapidly expanding set of services providing rich support for massive data, analytics, learning, etc. offers a realistic path for moving A/E/C firms into the age of big data and AI.
However, since those services are currently centered on deep learning and other forms of AI that are hard to audit, organizations will benefit from pressuring providers to incorporate newer forms of AI which are better suited to small data sets, examination of decisions, etc. At this point, those fall mainly into the area of probabilistic programming.
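To make the probabilistic-programming point concrete, here is a minimal sketch in plain Python (my example, not the article's): Bayesian inference by grid approximation over a small data set, where the prior, the likelihood, and the full posterior can all be printed and audited, unlike the internals of a deep network.

```python
# Minimal probabilistic-programming-style sketch (all numbers are
# illustrative assumptions): infer a device fault rate from a handful
# of observations, keeping every step of the decision inspectable.

def posterior_fault_rate(failures, trials, grid_size=101):
    """Beta-binomial posterior over a fault rate, on a discrete grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Uniform prior; binomial likelihood for the observed failures.
    unnorm = [(p ** failures) * ((1 - p) ** (trials - failures)) for p in grid]
    total = sum(unnorm)
    return grid, [w / total for w in unnorm]

def prob_rate_exceeds(grid, post, threshold):
    """Auditable decision input: P(fault rate > threshold)."""
    return sum(w for p, w in zip(grid, post) if p > threshold)

# Small data: 3 faults observed in 20 sensor self-checks.
grid, post = posterior_fault_rate(3, 20)
p_bad = prob_rate_exceeds(grid, post, 0.25)
print(f"P(fault rate > 25%) = {p_bad:.3f}")
```

Real probabilistic-programming systems (Stan, PyMC, and the like) automate this machinery for far richer models, but the property that matters here survives: twenty data points are enough to act on, and the reasoning behind the action can be examined.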
So when Google translates a sentence from Chinese to English, it can utilize data and processing power spread across thousands of computers worldwide. Services that provide platforms like Hadoop are replicating a slice of that architecture.
But when a car needs to make an avoidance adjustment for a box that falls from a truck, there is not enough time, bandwidth, processing power, or data to work the way a translation service or AlphaGo does.
Likewise, a building that needs to act in real time because of surprising data from internal sensors or external signals must find ways of acting that don’t depend on processing terabytes of data.
Beginning in the '80s, IT departments had access to relational databases, which eliminated the need for a lot of procedural programming. When a database is well designed, the relational model is almost like magic. The words well designed are key.
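The kind of procedural code the relational model eliminated is easy to show with SQLite (the table and column names here are invented for illustration): one declarative query replaces the explicit loops, lookups, and accumulators a programmer would otherwise write by hand.

```python
# Illustrative only: a declarative query standing in for procedural code.
# Schema and data are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensors (id INTEGER PRIMARY KEY, room TEXT);
    CREATE TABLE readings (sensor_id INTEGER, temp_f REAL,
                           FOREIGN KEY (sensor_id) REFERENCES sensors(id));
    INSERT INTO sensors VALUES (1, 'lobby'), (2, 'roof');
    INSERT INTO readings VALUES (1, 71.0), (1, 73.0), (2, 95.0);
""")

# One declarative statement: no explicit loops; the join and grouping
# are handled by the engine.
rows = conn.execute("""
    SELECT s.room, AVG(r.temp_f)
    FROM sensors s JOIN readings r ON r.sensor_id = s.id
    GROUP BY s.room ORDER BY s.room
""").fetchall()
print(rows)  # [('lobby', 72.0), ('roof', 95.0)]
```

The magic only holds because the schema was designed first: keys, relationships, and units were decided before the first query ran. That is the "well designed" caveat in practice.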
As we attempt to turn data, especially data originating in IoT, into systems that can both act and learn, we will find that diving into the details of design still matters.
Organizations should not assume they can buy or rent a service that hides all the details of their systems.
Blaine Wishart is a senior principal of the Strategic Technologies practice of DI Strategic Advisors.