Intro to BIG Data in AECO 2017

AECO does not use (much) Big Data YET. Who will adapt? We all will if we want to be in business in 5, 10, 50 years!

If we want to transform the status quo, push development, and lead our firms into the world of the present as well as the future, we must accept that technological progress will not stop evolving unless humanity does. Embracing data use and analytics to generate more intelligence and evidence-based decision making, that is, business intelligence, is therefore of paramount and ongoing importance.
This idea of evolutionary and revolutionary growth will be the intended framing for this article on what Big Data is, some uses, and why it is so important. 
Capturing Big Data is but a first step… we need to develop new methods for all aspects of inputting and using design-through-operations data, changing our understanding, perspectives, and creativity, and developing unforeseen breakthroughs (asking big questions). Big questions like: How can Big Data transform AEC? Is leveraging Big Data necessary for my practice? Can my designs become qualitatively better by using Big Data to inform design? Are Artificial Intelligence and Machine Learning systems important to AEC uses? And the like…
What’s on Deck:

  1. Defining Big Data 
  2. Capturing Big Data 
  3. Analysing Big Data
  4. Developing Insights into Actions
  5. References & Links

Within those foci are a few imperatives we’ll review toward gaining control of Big Data to truly benefit, if not revolutionize, the built environment process.

01 | Defining Big Data

Big Data obviously refers to large sets of data, but large as in “all” Internet traffic or globally focused data sets. For instance: all the data from all systems (human systems included) in a project type, say, a hospital. Not just user preference data, but historical data of use patterns, design explorations, etc., aggregated nationally or globally. Having such data, then automating more and more of our processes, will give AEC new abilities, growth, and relevance now and into the future.
An “official” definition, as IBM states: 

“Big Data is being generated at all times. 
Every digital process and social media exchange produces it. 
Systems, sensors and mobile devices transmit it. 
Much of this data is coming to us in an unstructured form, making it difficult to put into structured tables with rows and columns. 


To extract insights from this complex data, Big Data projects often rely on cutting edge analytics involving data science and machine learning. 
Computers running sophisticated algorithms can help enhance the veracity of information by sifting through the noise created by Big Data's massive volume, variety, and velocity.” 

The definition continues to break down and add to these concepts with Volume, Variety, Velocity, and Veracity.
“Volume – Scale of the data 
The ability to process large amounts of data and what you do with that data. 
Variety – Different forms of data 
Making sense out of unstructured data by trying to capture all of the data that pertains to our decision-making process. 
Velocity – Analysis of streaming data 
The rate at which data arrives at the enterprise and the time that it takes the enterprise to process and understand that data. 
Veracity – Uncertainty of the data 
The quality or trustworthiness of the data. The quality or trustworthiness of the data. Tools that help handle big data’s veracity discard ‘noise’ and transform the data into trustworthy insights.”

Use Cases

Again, as the masters of Big Data over at IBM postulate, the two use cases that can bring the most immediate benefit are perhaps Data Science Sandboxes and Data Lake Analytics, with Streaming Data/IoT Platform uses following closely thereafter.

“Data Science Sandbox 
Data Science is an interdisciplinary field that combines machine learning, statistics, advanced analysis, and programming. It is a new form of art that draws out hidden insights and puts data to work in the cognitive era.”
For AECO this is an arena that, with collaboration, can become the basis of exponential growth in great design. Our human iterative processes are antiquated and a major limiting factor in finding “better” solutions than we currently achieve. The cognitive realities of the expanding data-centric world we are in can be truly enhanced if we break out of our current limitations, and advanced “data-play” is the path. I like “play” conversationally, so we don’t scare the anti-science crowd ;)

“Data Lake Analytics 
A data lake is a shared data environment that comprises multiple repositories and capitalizes on Big Data technologies. It provides data to an organization for a variety of analytics processes.” While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. 

For instance:
Can 10 years of hospital internal traffic pattern data inform better design?
Can thousands of iterations, as opposed to a handful, help create better projects?
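As a toy illustration of the kind of descriptive query a data lake could answer, here is a sketch over fabricated "traffic" records; the log format, departments, and volumes are all invented for illustration:

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical log of corridor transits: (hour_of_day, department) pairs,
# standing in for years of aggregated hospital internal traffic data.
log = [(random.randint(0, 23), random.choice(["ER", "Radiology", "Pharmacy"]))
       for _ in range(10_000)]

def peak_hours(log, department, top_n=3):
    """Return the busiest hours for a department: a simple descriptive query."""
    hours = Counter(hour for hour, dept in log if dept == department)
    return [hour for hour, _ in hours.most_common(top_n)]

busiest = peak_hours(log, "ER")
print(busiest)  # the three hours with the most ER transits
```

With real data behind it, a query this simple already starts to answer the first question above: peak circulation hours are exactly the kind of evidence that can inform corridor widths and department adjacencies.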

“Streaming Data / IoT Platform 
Stream computing enables organizations to process data streams which are always on and never ceasing. Stream computing helps organizations spot opportunities and risks across all data.”

A Few Next Steps

Industry (architects, engineers, contractors, and owners) must develop tool sets to aggregate, analyse, and leverage Big Data sets, including real-time capture and analysis, then…

Transpose “design knowledge,” rules and requirements into machine learning algorithms to enable more iterations, thousands of times faster and better than human abilities alone.
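A minimal sketch of that idea, with invented design rules and a toy scoring objective standing in for real "design knowledge" (every rule, limit, and parameter here is hypothetical):

```python
import random

random.seed(0)

# Hypothetical "design knowledge" encoded as machine-checkable rules.
# Each candidate is a (width, depth, floors) tuple in metres/storeys.
def satisfies_rules(w, d, floors):
    footprint = w * d
    return (footprint <= 2000           # invented site-coverage limit
            and 10 <= w <= 60           # invented structural grid limits
            and floors * 3.5 <= 30)     # invented height limit

def score(w, d, floors):
    # Toy objective: maximise floor area, lightly penalise slab depth.
    return w * d * floors - 5 * d

# Generate thousands of variations, far more than a human team would sketch,
# then filter by the rules and keep the best-scoring candidate.
candidates = [(random.uniform(10, 60), random.uniform(10, 60),
               random.randint(1, 8)) for _ in range(10_000)]
feasible = [c for c in candidates if satisfies_rules(*c)]
best = max(feasible, key=lambda c: score(*c))
print(len(feasible), best)
```

The point is not the toy objective but the pattern: once rules and goals are machine-readable, iterating ten thousand times costs seconds instead of weeks.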

CAD is Dead, Long Live HAD: Human Aided Design is our future! 

The last “step” from above is potentially huge. It signals a few things: namely the necessity of creating protocols for making, capturing and applying new design algorithms as well as removing fears. Fear of this sort of concept: “If Artificial Intelligence (AI) and Machine Learning (ML) are doing the thinking for us, we will have no jobs.”   
While it may very well be true that most industries will see disruptions and adjustments with every fundamental, game-changing technology upshift, I would like to forward a different perspective on approaching (if not developing) into the unknowns: Relax!
Consider, for a moment, art—before the camera and after.
Before the adoption of the camera, artists were mainly fixed to one or two possible outlets of expression: landscapes or portraits. Personal expression was more a hidden function, or at best a minor outcome of the subject matter.

Did artists freak out at the prospect of “losing their jobs” to the camera? Anecdotally, we can say yes, at least to some degree this was feared (by some). 

Cut to a few years later and I would contend (as many scholars do) that the camera’s invention was not only the impetus for but the liberation of art! The camera allowed, if not forced, art to explode into an arguably more dramatic, creative, and more professionally, personally, and culturally fulfilling endeavour.
So as some of the current methods used in AECO are replaced by AI and machine learning systems, this should not be feared, rather viewed as a liberation. A liberation allowing us to focus more on the “high value” parts of our jobs… and on our personal lives :)
I suspect this potential creativity explosion in AECO will render the current state, and the limited creativity we oft find, unrecognizable… I can almost guarantee that if we (all of AECO) could assess the value of 1,000 design variations on a theme, those finally chosen to develop would be far superior in almost every case to projects that compare a mere handful of options. Especially when (not if) we have algorithms in place that consider budget, municipal codes, human flow patterns, and every other requirement known. All of this, our future, relies on our ability to change and adapt. The drivers of change will find more success, if history is any example. Not a guarantee, but a consideration at the least.
A Brief Note on Change

“Often, cultural issues are the hardest to overcome when deploying new technologies. Successful organizations seek executive sponsorship, develop a proof of concept to illustrate the value of the technology, and make a point of getting everyone on board. They communicate by highlighting accomplishments and pushing the innovation message, and continue to evangelize.”

Quote from: 

02 | Capturing Big Data

Q: Data Warehouses or Data Lakes?

A: Yes!

Data lakes store all data, structured or not. 
Data warehouses require modelled data, also known as structured data. 


Processing differences between Data Warehouses and Data Lakes fall into two concepts: Schema-on-Write (the Data Warehouse methodology) and Schema-on-Read (the Data Lake approach). The latter can be thought of as structuring the data upon use, rather than prior.
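The distinction can be sketched in miniature; the record fields below are invented for illustration:

```python
import json

# Raw exports as they might arrive: note the inconsistent types.
raw_records = [
    '{"room": "OR-1", "area_m2": 48}',
    '{"room": "OR-2", "area_m2": "52", "note": "unverified"}',
]

# Schema-on-write (warehouse style): validate and coerce BEFORE storing.
def write_to_warehouse(raw):
    row = json.loads(raw)
    return {"room": str(row["room"]), "area_m2": float(row["area_m2"])}

warehouse = [write_to_warehouse(r) for r in raw_records]

# Schema-on-read (lake style): store the raw text as-is,
# and apply structure only at query time.
lake = list(raw_records)

def query_lake(lake):
    for raw in lake:
        row = json.loads(raw)
        yield str(row["room"]), float(row["area_m2"])

print(warehouse)
print(list(query_lake(lake)))
```

The trade-off shows even at this scale: the warehouse path does the cleaning once, up front, while the lake path keeps every field (including the stray "note") and defers interpretation until someone asks a question.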


Data Warehouse costs can prove to be great; in comparison, other forms of storage, such as Hadoop, are much more cost effective. Hadoop is open source with a great community, and its design is focused on low-cost commodity hardware.
The Apache Hadoop Distributed File System (HDFS) is combined with a MapReduce programming model (a sort of organizing and queueing scheme) that splits large data sets into smaller bits for processing. “Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then transfers packaged code into nodes to process the data in parallel. This approach takes advantage of data locality, where nodes manipulate the data they have access to. This allows the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.” 
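The split-map-shuffle-reduce pattern described above can be sketched in a few lines of plain Python; a toy illustration only, since real Hadoop jobs distribute the blocks across a cluster:

```python
from collections import defaultdict
from itertools import chain

# Toy MapReduce word count: split a "file" into blocks, map each block
# to (key, 1) pairs, then reduce by key. Hadoop does the same, but with
# one mapper per block running on whichever node holds that block.
document = "big data big insight data lake data"
words = document.split()
blocks = [words[i:i + 3] for i in range(0, len(words), 3)]

def map_phase(block):
    return [(word, 1) for word in block]

def reduce_phase(pairs):
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

mapped = chain.from_iterable(map_phase(b) for b in blocks)
counts = reduce_phase(mapped)
print(counts)  # {'big': 2, 'data': 3, 'insight': 1, 'lake': 1}
```

Word counting is the canonical example, but the same shape fits AEC questions: map room records to (department, area) pairs, reduce to totals per department.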

Apache Spark, which began as a component of the Hadoop ecosystem, is becoming a prime Big Data choice. ML is viewed as a growing market force and is thus important to begin developing now, so that as these systems become smarter and more comprehensively implemented we can be prepared, involved, and in business. 

A most exciting prospect is capturing human Design-Intelligence and turning it into virtualized systems, so “what-if” scenarios can be run to produce exponentially more analytical options than we are currently capable of. We CAN make design, procurement, construction, and operations more effective… more responsive… all the stuff we sign on for when entering careers in AECO… but it WILL take bravery to define and develop such systems and processes.


Data Warehouses can be restructured easily enough, depending on the business services already tied to the data. Reconfiguring such highly structured data, however, can require a great deal of effort. 

Data Lakes, being unstructured, allow much more agile, “on-the-fly” retooling of queries and/or apps.


Data Warehouses, being more mature and developed than Data Lakes, can currently be seen as more secure, but the changes occurring every day are rapidly building next-generation security focused on Big Data. One technology taking security to new heights is the BlockChain.

BlockChains are a larger conversation, mostly for another time, although there are links in section 05 below. Suffice it to say the concept holds promise for many industries (currently, mainly finance), and if AEC programmers utilize this approach, we may (I see this as “will”) find a future where building models are not only decentralized but at once secure, efficient, and on-demand data rich: tying planning and design to procurement, online data stores, operations, etc. We will not only embed information and data directly into our models, but will also be able to pull data on demand from remote sources such as manufacturers and others. 
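The core of a blockchain's integrity guarantee, each block committing to the hash of the previous one, can be sketched in a few lines. This is a minimal hash chain only, not a full distributed ledger, and the model metadata is invented:

```python
import hashlib
import json

# Each block stores the previous block's hash, so tampering with any
# stored record invalidates that block's own hash and every later link.
def block_hash(prev_hash, payload):
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def make_block(prev_hash, payload):
    return {"prev": prev_hash, "payload": payload,
            "hash": block_hash(prev_hash, payload)}

def verify_chain(chain):
    # Every stored hash must match a recomputation, and every block
    # must point at the hash of its predecessor.
    return all(nxt["prev"] == blk["hash"] == block_hash(blk["prev"], blk["payload"])
               for blk, nxt in zip(chain, chain[1:]))

genesis = make_block("0" * 64, {"model": "hospital_v1", "rev": 1})
chain = [genesis, make_block(genesis["hash"], {"model": "hospital_v1", "rev": 2})]
ok_before = verify_chain(chain)   # links intact

chain[0]["payload"]["rev"] = 99   # tamper with stored history...
ok_after = verify_chain(chain)    # ...and verification now fails
print(ok_before, ok_after)
```

Swap the invented payload for model revision records and you have the flavour of what a tamper-evident, decentralized building-model history could look like.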


Though Business Intelligence systems and general analytics have been described as open to everyone, the statistics point to only about 25 percent of potential users actually using them (AEC being a case in point). If we are looking at future opportunities to generate business through producing creative, efficient, data-rich, and connected projects (soon the mainstream), then I suggest we become users and developers of Big Data and its Analytics.

03 | Analysing Big Data

Data science names four types of data analytics: Descriptive, Diagnostic, Predictive, and Prescriptive. Now, with large sets of data there can come a sense of “what can we do with so much data”… a kind of data overload, to put it simply. This is also referred to as the “Insights to Action Gap.” We will look a bit more at that gap in the next section, so let’s drill down into analytics first.
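The four types can be illustrated in miniature on a fabricated series of monthly energy readings; all numbers, thresholds, and the "action" rule here are invented for illustration:

```python
# Hypothetical monthly energy use (kWh) for a building.
usage = [100, 104, 99, 110, 115, 121]

# Descriptive: what happened? Summarise the history.
mean_use = sum(usage) / len(usage)

# Diagnostic: why? Flag the months that sit well above the mean.
anomalies = [i for i, u in enumerate(usage) if u > mean_use * 1.05]

# Predictive: what next? A naive extrapolation from the last two months.
forecast = usage[-1] + (usage[-1] - usage[-2])

# Prescriptive: what should we do? A trivial rule on the forecast.
action = "schedule HVAC audit" if forecast > 120 else "no action"

print(round(mean_use, 1), anomalies, forecast, action)
```

Real systems replace each step with far richer models, but the progression is the same: describe, explain, forecast, then recommend.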

It can be conceptualized that, with the nearly ubiquitous nature of the IoT and other data stores and potential opportunities, it will perhaps be peculiar to AECO that ALL types of data will need to be used and analysed in order to generate truly great designs. If we agree it is useful to have staff who are “really” expert at planning and designing a given building type, then it naturally follows that having AECO Big Data, AI, and ML systems and experts in place will create better design and more business opportunities. Plus, Big Data, AI, and ML will (yes, I say will) help us reduce the amount of waste that AEC is still so adept at creating.

Business Intelligence (BI) is a sub-field that AEC is beginning to adopt, and the faster the industry takes on these emerging technologies, the sooner we can drive the gains the future will demand of businesses that wish to remain competitive.

If we (AEC) can collaborate transparently (large firms AND small) to develop the necessary algorithms and “standardized” processes needed, the seemingly insurmountable Big Data integration can be rendered surmountable. If it’s only the hugest firms and the software developers, we might just re-visit those fears ;)

One tool now in use with useful results is Power BI, a Microsoft offering that aggregates data from many sources and presents it visually, allowing quick insight and response. Model health dashboards are but a beginning; the as-yet-unknown uses hold even greater promise. Data capture and leveraging is something AEC firms large and small will want to develop openly if we are to reach the highest results. 
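As a sketch of the aggregation step, hypothetical per-model "health" records could be flattened into one CSV that a BI tool such as Power BI can ingest; the field names and figures are invented:

```python
import csv
import io

# Hypothetical health exports, as a BIM manager might pull from several
# project files before loading them into a dashboard.
exports = [
    {"model": "hospital_core", "warnings": 412, "file_mb": 780},
    {"model": "hospital_shell", "warnings": 97, "file_mb": 310},
    {"model": "parking", "warnings": 12, "file_mb": 95},
]

def to_dashboard_csv(rows):
    """Flatten the records into one CSV table for a BI tool to ingest."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["model", "warnings", "file_mb"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = to_dashboard_csv(exports)
print(csv_text.splitlines()[0])  # header row: model,warnings,file_mb
```

The value is in the habit, not the code: once every model emits the same tidy table on a schedule, a dashboard of warnings over time comes nearly for free.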

04 | Developing Insights into Actions

Okay, so once we get the data, whether Big or Small, we must ascertain what actions to take and when. This is where the human factor plays a large part, where the rubber hits the road, as it were. The data and analysis will give us the potential “whats,” “whys,” and “whens,” but the final decisions, choices, and actions are ours to implement.
“Today, and certainly here, we look at the business, intelligence, decision and value/opportunity perspective. From volume to value (what data do we need to create which benefit) and from chaos to mining and meaning, putting the emphasis on data analytics, insights and action.”  

Machine Learning in the Mix 

Some of the projected top use cases for Machine Learning in Data Science are below, and although AEC is not specified directly, let me ask one question: which of these foci does not utilize Architects, Engineers, and Construction? Correct… none. 

Machine Learning “process includes paths for descriptive, predictive, and prescriptive analysis, as well as simulation. Importantly, the machine learning process is explicitly noted as recursive, which is perhaps especially true of modeling large quantities of data.” 

As a primer to Big Data, using small data is one possible route. It will get one’s feet wet as to what may be useful, how to prioritize usefulness, etc. Aggregating, managing, and using our model data like this can seem like a lot of data (and it is), but it’s just not big enough to be called Big Data. As we gain more input, more models, etc., we can tie it all into the other parts of Big Data: sensors, usage patterns, and so on. Go get a glimpse of a Flux.io dashboard... that is a great start in visualizing one’s building data. Check them out :)

Though AEC has been a bit late in adopting technologies industry-wide, and Big Data adoption lags even further, there is still time to keep from being the technology laggards holding our industry back. Construction firms as well as the larger A/E firms are already testing and developing systems. Smaller firms, and those without a path to the connected, Big Data future, may want to revisit those fears we spoke of earlier unless they at least become aware of, and then adept at using, Big Data.
There is a time, not very far off, when all firms will be using Big Data, AI, and ML to refine their practices’ offerings. Otherwise we run the risk of closing our doors or being swallowed up by those who can adapt to the ever-changing technology front. One or a few key hires, like an actual Data Architect (no relation ;) or a Data Scientist, can do wonders, either hired outright or in a consultative manner (to start, at least).

“As with other industries, an obstacle in construction is that much of the data which has been collected until now is effectively siloed – held in isolation by the business department or division which collected it, where it is useful for their own analytics but can’t contribute to the big picture, birds’-eye view needed for real Big Data analytics.”  

To leave off: the use of Big Data will be a mainstay of AEC, as it is with more and more of our society. By combining human creativity with all possible systems (live sensors, models, futures, construction, and operations data sets), we can not only connect and report the status of components and systems, but will also be able to use predictive analytics to enhance design ideas better than any human alone can… or does currently.

05 | References 

(in addition to the links throughout this article)
