Big data continues to grow bigger every second. The internet-of-things (IOT), the rapidly expanding network of devices and sensors that ranges from smart devices worn by people to industrial sensors, is a major contributor to that growth.
OpenGov spoke to Mr. John Kreisa, Vice President for international marketing at Hortonworks, about capturing and transmitting data securely and analysing it to generate actionable insights. He shared several big data use cases from the energy sector and government.
Founded in 2011, Hortonworks is a leading provider of big data platforms for capturing, storing, processing and analysing data. Mr. Kreisa said, “Our core DNA is all around big data technologies and bringing big data technologies to the market.” Hortonworks has two platforms. One is based on an open source project called Apache Hadoop, which is used for storing and analysing data at massive scale.
The second platform, Hortonworks DataFlow (HDF), is for capturing data in motion, such as data from remote locations, sensors and devices, and bringing that data safely and securely back to the core platform for processing and analysis. Hortonworks develops these technologies in open source and provides services, support and training around them.
Challenges in transmission of data from IOT devices
Picking up on the second platform mentioned by Mr. Kreisa, we asked about the challenges faced in the transmission of data from IOT devices.
He replied that the first challenge is the ability to handle large volumes of data and to process that data while it is in motion, so that timely, actionable insights can be extracted as it comes back.
Mr. Kreisa explained with an example, “If you are talking about energy, it might be some kind of an indicator of a potential problem in an energy transmission network. You have got a sensor out on a tower somewhere and it detects an anomaly. That might come over analysing thousands of signals and looking for a pattern, or it might be a spike. Either way, you need to be able to react to that in an actionable way.”
The second challenge is securing the data: bringing it back in such a way that nobody else can access it, with a full understanding of its provenance and governance.
“So that when it comes back to you, you can say that it was not accessed, it was not changed and if it had to be analysed, you know who did it and what were the circumstances around that,” said Mr. Kreisa.
The third challenge is the requirement of backward communication from the devices at the edge. With limited connectivity to the Internet, or interrupted network coverage, the ability to communicate back might be a challenge.
So, the remote device that’s capturing data needs to have the ability to understand the connectivity. It needs to have the ability to collect the insights and transmit the data back when the connectivity is available. If bandwidth is low, it should only transmit the most important information back.
The software on the device has to be intelligent enough to do all this. In addition, these devices often have to run at very low power. So, the software itself has to run on a very small footprint.
The open source technology that Hortonworks uses for this is based on an Apache project called Apache NiFi, which handles the transmission. Mr. Kreisa said, “Apache NiFi runs on a small footprint, it has the smarts. But we have created an even smaller footprint called MiNiFi. It can run on devices as small as a handheld.”
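The edge behaviour described above (hold readings while offline, then send the most important ones first when bandwidth is limited) can be illustrated with a small sketch. This is not the Apache NiFi or MiNiFi API; `EdgeBuffer`, `Reading`, the numeric `priority` scale and the `budget` parameter are all hypothetical names chosen for illustration.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Reading:
    # Lower number means more important; sequence keeps arrival order
    # among readings that share a priority.
    priority: int
    sequence: int
    payload: dict = field(compare=False)


class EdgeBuffer:
    """Hypothetical store-and-forward buffer for an intermittently connected device."""

    def __init__(self):
        self._queue = []
        self._counter = count()

    def collect(self, payload, priority):
        """Store a reading locally, whether or not the network is available."""
        heapq.heappush(self._queue, Reading(priority, next(self._counter), payload))

    def flush(self, connected, budget):
        """Return up to `budget` readings to transmit, most important first."""
        if not connected:
            return []  # keep everything until connectivity returns
        sent = []
        while self._queue and len(sent) < budget:
            sent.append(heapq.heappop(self._queue).payload)
        return sent

    def pending(self):
        return len(self._queue)


buffer = EdgeBuffer()
buffer.collect({"type": "temperature", "value": 21.5}, priority=5)
buffer.collect({"type": "voltage_spike", "value": 412.0}, priority=0)

print(buffer.flush(connected=False, budget=10))  # offline: nothing is sent
print(buffer.flush(connected=True, budget=1))    # low bandwidth: only the spike
print(buffer.pending())                          # the temperature reading is still queued
```

A priority queue is only one possible design; a real device would also need to persist the buffer and cap its size to stay within a small memory footprint.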
Smart meters and big data in the energy sector
We moved on to discuss a very important use case in the form of smart meters. Smart meters are a critical component of energy saving strategies in many countries around the world today.
A typical regular meter would take one reading a month, saying you have consumed this many kilowatt hours (kWh) of energy this month. Smart meters could be reading data every few minutes.
“And you are not just measuring the power consumption. You are also looking at things like the quality of the power, what were the high and low pieces of consumption within that 10-minute period. You are transmitting not just one, but 6 or 7 pieces of data at much higher frequency,” Mr. Kreisa said, trying to convey the volume and velocity of the data involved.
Governments and private industry need to scale up massively their ability to store and analyse that historical data. In return, they can use it to get better insights into power consumption and do a better job of designing power generation plants. Governments, which regulate power generation plants and sometimes run them, can do a better job of projecting and predicting power consumption.
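A rough calculation shows the scale of the difference. The sketch below assumes a 30-day month, one reading every 10 minutes and 7 values per reading; the interval and field count come from the figures quoted above, while the function name and the 30-day month are assumptions made for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def data_points_per_month(interval_minutes, fields_per_reading, days=30):
    """Count the values one meter produces in a month at a given reading interval."""
    readings_per_day = MINUTES_PER_DAY // interval_minutes
    return readings_per_day * days * fields_per_reading


conventional = 1  # a regular meter: one kWh figure per month
smart = data_points_per_month(interval_minutes=10, fields_per_reading=7)

print(smart)                  # 30240 values per meter per month
print(smart // conventional)  # roughly 30,000 times more data per meter
```

Multiplied across millions of households, this is why storage and analysis capacity has to scale up so sharply.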
Mr. Kreisa talked about an interesting London-based company called Open Energi. They have what they call a virtual power station. They work both with the UK government and the National Power Grid. They are connected to major industrial and commercial power consumers. They put smart devices on those consumption mechanisms and they monitor power consumption. As consumption starts to get close to power generation capacity they can go and remotely turn those motors and pumps off for a little bit of time, which has no impact on the business. Suppose you have a refrigeration unit. They can turn that motor off for 5 minutes without affecting the temperature of the refrigerator at all.
Mr. Kreisa added, “As consumption comes up, there’s no need to create more power. They can actually sort of slow the consumption curve on the major industrial consumers of energy. The energy providers don’t have to fire up a coal plant to try to meet a spike in demand.”
The dynamic demand technology automatically and invisibly adjusts power consumption to help manage fluctuations in electricity supply and demand. And it is driven by big data.
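The idea can be sketched in a few lines of code. This is not Open Energi's actual system: the `Load` class, the `deferrable` flag, the 5% headroom and the choice to pause the largest loads first are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    draw_kw: float
    deferrable: bool  # True if it can be paused briefly without affecting the business


def loads_to_pause(loads, capacity_kw, headroom=0.05):
    """Pick deferrable loads to pause until demand is back under the threshold."""
    threshold = capacity_kw * (1 - headroom)
    demand = sum(load.draw_kw for load in loads)
    paused = []
    # Consider the largest loads first so that fewer sites are touched.
    for load in sorted(loads, key=lambda item: item.draw_kw, reverse=True):
        if demand <= threshold:
            break
        if load.deferrable:
            paused.append(load.name)
            demand -= load.draw_kw
    return paused, demand


loads = [
    Load("refrigeration", 40.0, deferrable=True),
    Load("pump", 30.0, deferrable=True),
    Load("production_line", 120.0, deferrable=False),
    Load("lighting", 10.0, deferrable=False),
]

paused, remaining = loads_to_pause(loads, capacity_kw=200.0)
print(paused, remaining)  # ['refrigeration'] 160.0
```

In this example total demand equals capacity, so the refrigeration motor alone is paused; the production line is never touched because it is not marked as deferrable.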
Mr. Kreisa shared another use case, a customer-oriented application of big data in the energy sector. Centrica, the largest supplier of gas to domestic customers in the UK, uses platforms from Hortonworks in two different ways. One is to drive operational efficiencies across the company.
Then they have also created a customer application, so that when their technicians go to a customer, they have a complete view of all of the interactions with the customers, including payment and complaints. It helps the technician to do a better job when they go on a service call.
In addition, they have created a consumer dashboard where the consumer can see at a fine level of detail how much power they are consuming. What they found was that when consumers were given more visibility into their power consumption, they naturally reacted by reducing it. “They see the spikes and lows, they know that they might be using power at a more expensive time. It might be better if I run the washing machine late at night,” Mr. Kreisa said.
And all this is enabled because they can capture it in a big data platform. Centrica is able to define correlations between data sets across the different business units that were previously isolated and identify new data patterns to provide a better customer experience.
Use cases abound across the public sector, from defence/intelligence to transportation/healthcare
Mr. Kreisa said that nearly every branch of government has the opportunity to exploit big data. He talked about working with Metro Transit of St. Louis, which operates the public transportation system for the St. Louis metropolitan region in the USA. There the challenge was to provide safer, more reliable and more accessible transport while reducing costs. The two objectives were somewhat at odds with each other.
A lack of ongoing, reliable machine data from its buses forced a trade-off between the two. Mr. Kreisa elaborated, “Based on say every 10,000 miles, you would replace the brakes, regardless of how much they had been used. If one part broke in one bus, they might replace it in all the buses. Because their one initiative was safer, more efficient network.” But that would lead to rising costs.
With the platform, they were able to capture all the data from the buses and carry out predictive and proactive maintenance. They replaced parts only on those buses whose data fit a pattern indicating that those particular parts might fail. Thus, the government was able to reduce operational costs and make the network more efficient.
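The contrast between the two maintenance policies can be sketched as follows. This is not Metro Transit's actual model: the brake pad thickness sensor, the thresholds and the straight-line wear projection are hypothetical, chosen only to show how machine data changes which buses get serviced.

```python
from dataclasses import dataclass

NEW_PAD_MM = 12.0  # assumed thickness of a new brake pad
MIN_PAD_MM = 3.0   # assumed minimum safe thickness


@dataclass
class BusReading:
    bus_id: str
    miles_since_brake_service: int
    brake_pad_mm: float  # hypothetical sensor value: remaining pad thickness


def fixed_interval(readings, interval_miles=10_000):
    """Old policy: replace brakes at a set mileage, regardless of wear."""
    return [r.bus_id for r in readings if r.miles_since_brake_service >= interval_miles]


def miles_until_worn(reading):
    """Project remaining miles, assuming wear continues at the observed average rate."""
    worn = NEW_PAD_MM - reading.brake_pad_mm
    if worn <= 0 or reading.miles_since_brake_service <= 0:
        return float("inf")
    wear_per_mile = worn / reading.miles_since_brake_service
    return (reading.brake_pad_mm - MIN_PAD_MM) / wear_per_mile


def predictive(readings, horizon_miles=1_000):
    """Data-driven policy: service only buses projected to wear out soon."""
    return [r.bus_id for r in readings if miles_until_worn(r) <= horizon_miles]


fleet = [
    BusReading("A", 10_000, 9.0),  # high mileage, lightly worn
    BusReading("B", 6_000, 3.5),   # lower mileage, heavily worn
    BusReading("C", 4_000, 10.0),
]

print(fixed_interval(fleet))  # ['A']
print(predictive(fleet))      # ['B']
```

The fixed schedule services bus A, whose brakes still have plenty of life, and misses bus B, which is close to the limit. The data-driven policy does the opposite, which is how it can serve both the cost and the safety objective.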
Mr. Kreisa shared another use case from Barcelona, Spain. The government has a bike sharing program, where it was facing challenges in placing the bikes and incurring heavy costs in moving them. The government used Hadoop technology to capture bike usage patterns and monitor the entire bike share network, in order to improve the network's efficiency and increase ridership.
Mr. Kreisa said that the initial use cases for government, and for industry in general, are around renovating the existing architecture and driving efficiencies. Usage could include keeping more data online for longer as an active archive or combining existing data from multiple sources.
Then they use the cost savings to go into some of the more innovative things, such as predictive applications. They could also provide data back to citizens (open data) so that they might use it and build on it.
Ultimately, it is about the government using the data smartly to improve the lives of citizens.