How can you efficiently process real-time streaming data?

One of our customers was looking for a robust solution to process an incoming stream of data from various IoT-enabled devices (for example, in a windmill). We needed a highly scalable, real-time streaming data processor, plus a way to transform this data in real time and store the transformed data in a NoSQL database for further use and analysis. We designed the solution using AWS Kinesis Firehose (streaming data ingestion), AWS Kinesis Analytics (stream processor) and AWS Lambda (sending transformed data to downstream systems such as a database or an analytics engine like QuickSight).

Traditionally, we run native SQL queries against static data in a relational DB system, where rows get added, deleted and modified over time. The whole paradigm shifts when the data is an incoming stream that changes constantly. AWS Kinesis Analytics allows us to run simple SQL queries on such streaming data. An important fact to consider here is that data keeps arriving at a rapid pace, so AWS lets us run our SQL queries on time-segregated chunks of the stream called processing “windows”.
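To make the idea of a processing window concrete, here is a minimal Python sketch of a tumbling (fixed, non-overlapping) window. It is not Kinesis Analytics code, only an illustration of what a windowed aggregate computes; the function name, the 60-second window size and the sample readings are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_average(readings, window_seconds=60):
    """Average (timestamp, value) readings per fixed, non-overlapping window.

    Returns a dict keyed by the window start time in seconds.
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Every reading falls into exactly one window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical wind speed readings: (seconds since start, metres per second).
readings = [(0, 10.0), (30, 14.0), (61, 8.0), (119, 12.0), (120, 20.0)]
averages = tumbling_window_average(readings)
```

Each query result then describes one window rather than the whole, never-ending stream.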

We decided to build our solution, as shown in the diagram below, using Kinesis Analytics (transformation of data in transit) and Lambda in post-processing to send transformed data to downstream services such as DynamoDB (for further analysis of data at rest), Amazon S3 (raw data backup) and QuickSight (analytics and dashboards). Kinesis Analytics also supports pre-processing via Lambda functions, in case we want to prepare the incoming data or enrich it with metadata that cannot be added when the IoT devices generate the original data.
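A pre-processing function of this kind might look like the sketch below. It assumes the record shape Kinesis Data Analytics uses for Lambda pre-processing (base64-encoded `data` plus a `recordId` in, and the same `recordId` with a `result` of `Ok`, `Dropped` or `ProcessingFailed` out). The `DEVICE_METADATA` table, the `device_id` field and the metadata values are hypothetical.

```python
import base64
import json

# Hypothetical lookup table; a real system might read this from a database.
DEVICE_METADATA = {
    "turbine-001": {"site": "north-ridge", "model": "WT-2000"},
}


def lambda_handler(event, context):
    """Enrich each incoming record with static device metadata."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Add metadata the device itself cannot supply.
            payload.update(DEVICE_METADATA.get(payload.get("device_id"), {}))
            data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError):
            # Undecodable or non-object payloads are returned unchanged and flagged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Returning every `recordId` exactly once matters here, since the service matches results back to the records it sent.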


All the IoT-enabled devices (such as Raspberry Pis acting as gateways, with various sensors connected to them within the windmill) generate time series data and, with the help of AWS IoT Core and Greengrass, send it to the cloud via a Kinesis Firehose stream. Kinesis Firehose is a highly scalable, fully managed service that delivers real-time streaming data to destinations such as S3, Redshift and Amazon Elasticsearch. Here we store the data on S3 as the destination for historical raw data backup.
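For illustration only, the sketch below shows what writing one reading to a Firehose delivery stream looks like with the boto3 `put_record` call. In the solution described here the data arrives through AWS IoT Core and Greengrass, so this direct call is a simplification, and the stream name and reading fields are hypothetical. The client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
import json


def send_reading(firehose_client, stream_name, reading):
    """Send one sensor reading to a Firehose delivery stream as a JSON line."""
    # A trailing newline keeps records separable once Firehose batches them into S3 objects.
    data = (json.dumps(reading) + "\n").encode("utf-8")
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": data},
    )


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("firehose")
#   send_reading(client, "windmill-raw-data", {"device_id": "turbine-001", "wind_speed": 12.5})
```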

Kinesis Data Analytics takes this streamed data and executes SQL queries to transform it into new output and to detect anomalies. Some examples are below:

  • Given the wind speed, calculate the approximate energy generated by the wind turbine.
  • Detect anomalies and act on the alert if there is a large gap between the calculated and actual energy generated by the wind turbine.
  • Alert the admin if the wind speed exceeds the threshold beyond which the wind turbine rotors would be damaged if they continued to operate.
  • Calculate the temperature of various components, such as the nacelle and the gearbox, and act if it goes beyond the threshold.
  • Monitor the relative speed of the high-speed and low-speed shafts for wear and tear or other anomalies.
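The first three checks in the list above can be sketched in Python as follows. This is not the SQL that ran in Kinesis Data Analytics; it only shows the arithmetic involved, using the standard wind power formula P = 0.5 × air density × swept area × wind speed³ × power coefficient. The air density, power coefficient, 25% tolerance and 25 m/s cut-out speed are assumed values, not figures from the project.

```python
import math

AIR_DENSITY = 1.225  # kg/m^3, assumed sea-level value


def expected_power_watts(wind_speed, rotor_diameter, power_coefficient=0.4):
    """Approximate turbine output from wind speed (m/s) and rotor diameter (m)."""
    swept_area = math.pi * (rotor_diameter / 2) ** 2
    return 0.5 * AIR_DENSITY * swept_area * wind_speed ** 3 * power_coefficient


def is_power_anomaly(expected, actual, tolerance=0.25):
    """Flag readings whose actual power deviates from the expected value by more than the tolerance."""
    if expected == 0:
        return actual != 0
    return abs(expected - actual) / expected > tolerance


def is_overspeed(wind_speed, cut_out_speed=25.0):
    """Flag wind speeds above the (assumed) cut-out speed of the rotor."""
    return wind_speed > cut_out_speed
```

Because power grows with the cube of wind speed, a small error in the measured speed produces a much larger error in the expected power, which is one reason to use a generous tolerance.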

The transformed data is then sent to Lambda. At this point the original data either has additional columns or has been filtered to drop noise, so the output contains only meaningful data that we can act on (alerts, if any) or use for analytics (time series data depicting the health of the machine). Lambda then stores the data in DynamoDB, and it can be sent to QuickSight to generate easily understandable dashboards and graphs.
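A post-processing Lambda along these lines could be sketched as below. It assumes the record shape Kinesis Data Analytics uses when Lambda is the output destination (base64-encoded `data` and a `recordId` in, and a `result` of `Ok` or `DeliveryFailed` per `recordId` out). The table name `TurbineReadings` is hypothetical. Floats are parsed as `Decimal` because the boto3 DynamoDB resource does not accept Python floats.

```python
import base64
import json
from decimal import Decimal


def process_output_records(event, table):
    """Store each transformed record in DynamoDB and report per-record status."""
    results = []
    for record in event["records"]:
        try:
            item = json.loads(base64.b64decode(record["data"]), parse_float=Decimal)
            table.put_item(Item=item)
            results.append({"recordId": record["recordId"], "result": "Ok"})
        except Exception:
            # Report the failure so the service can retry this record.
            results.append({"recordId": record["recordId"], "result": "DeliveryFailed"})
    return {"records": results}


def lambda_handler(event, context):
    import boto3  # imported here so the helper above can be tested without AWS

    table = boto3.resource("dynamodb").Table("TurbineReadings")  # hypothetical table name
    return process_output_records(event, table)
```

Keeping the table as a parameter of `process_output_records` makes it possible to test the logic locally with a stand-in object.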

  • DynamoDB has Streams enabled, which triggers a Lambda if a rule or criterion matches. Alerts are generated at this point, when data is stored in DynamoDB, and the admin can be notified via SNS email in any situation requiring human intervention.
  • Such alerts can also trigger a corrective action on the IoT gateway via Greengrass, such as turning off the rotor if the wind speed exceeds operational capacity.
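The alerting step described in the list above might be sketched like this. The function reads the DynamoDB Streams event format (`Records`, `eventName`, and a typed `NewImage` such as `{"N": "30"}`), and publishes through the boto3 SNS `publish` call. The attribute names `wind_speed` and `device_id`, the 25 m/s limit and the `ALERT_TOPIC_ARN` environment variable are all assumptions for the example.

```python
import os


def find_alerts(event, max_wind_speed=25.0):
    """Return alert messages for newly inserted readings above the wind speed limit."""
    alerts = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue
        image = record.get("dynamodb", {}).get("NewImage", {})
        if "wind_speed" not in image:
            continue
        # Stream images carry typed values; numbers arrive as strings under "N".
        wind_speed = float(image["wind_speed"]["N"])
        if wind_speed > max_wind_speed:
            device = image.get("device_id", {}).get("S", "unknown")
            alerts.append(
                f"Wind speed {wind_speed} m/s on {device} exceeds {max_wind_speed} m/s"
            )
    return alerts


def lambda_handler(event, context):
    import boto3  # imported here so find_alerts can be tested without AWS

    sns = boto3.client("sns")
    topic_arn = os.environ["ALERT_TOPIC_ARN"]  # hypothetical environment variable
    for message in find_alerts(event):
        sns.publish(TopicArn=topic_arn, Subject="Turbine alert", Message=message)
```

A corrective action, such as a command sent to the gateway, would hang off the same list of alerts.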


With the help of AWS services, we were able to build a resilient system for predictive and prescriptive maintenance of the windmill. Kinesis Streams, Firehose and Kinesis Data Analytics are very powerful when combined with other services offered by AWS, and can transform real-time domains like air traffic control (ATC), IoT monitoring and maintenance, industrial IoT solutions, smart homes and smart societies, website monitoring, and clickstream generation and processing for analytics.

These services give us the superpower to act on anomalies and critical situations instantly, avoiding casualties or further damage to the system while reducing maintenance cost and keeping downtime close to zero.
