
AWS Kinesis Data Analytics Stream Processing


Analytics plays a huge role in gathering insights from large data sets drawn from sources such as application logs, user input and, in IoT use cases, machinery edge data. Stored data sets can be processed on demand or at runtime to generate user-friendly reports or dynamic dashboards. It becomes challenging, however, when a large amount of streaming data must be processed and transformed in real time with millisecond or lower latency. If this latency is not managed, important events in the incoming stream are missed, degrading the final analytical significance.

At PROLIM, we needed a real-time stream processor for our Windmill IoT use case: one that collects incoming data with the lowest possible latency, processes it, and stores it in MySQL and DynamoDB for further use. We opted for the configuration below, using AWS Kinesis Data Analytics as the stream processor and Lambda as the destination processor.

  • The windmill farm has many IoT sensors that capture the health of each windmill and send the data as a payload to AWS IoT Core via the MQTT protocol.
  • IoT Core pushes this data to Kinesis Data Firehose, a fully managed, auto-scaling service that can also batch, compress, transform, and encrypt the data before loading it to S3, a data lake, or other AWS services.
  • Our real-time IoT data is reliably collected by Firehose and pushed to Kinesis Data Analytics, where it is processed and transformed.

Each windmill sends wind speed, actual power generated, and the temperatures of various internal components such as the gearbox and rotor, plus the ambient temperature, as time-series data every 15 seconds.
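As a rough illustration, one such 15-second telemetry payload might look like the sketch below. The field names and values are illustrative assumptions; the article does not give the actual schema.

```python
import json
import time

def build_payload(windmill_id: str) -> str:
    """Build a sample telemetry payload like the one each windmill
    publishes to AWS IoT Core every 15 seconds.  Field names and
    units are assumptions, not the actual PROLIM schema."""
    payload = {
        "windmill_id": windmill_id,
        "timestamp": int(time.time()),
        "wind_speed": 12.4,      # m/s
        "actual_power": 1850.0,  # kW
        "gearbox_temp": 68.2,    # degrees C
        "rotor_temp": 54.7,      # degrees C
        "ambient_temp": 21.3,    # degrees C
    }
    return json.dumps(payload)

print(build_payload("WM-001"))
```

In practice this JSON string would be published on an MQTT topic that an AWS IoT Core rule routes to Kinesis Data Firehose.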

We have two in-application streams, and SQL is used to transform and process the data as follows:

In-Application Stream 1

  • We use the wind speed of each data point to calculate the expected energy with the formula below.
  • We compare the actual energy with the expected energy to check whether there is a large difference between them; a large difference indicates inefficient storage of the generated power, or leakage.

Expected Energy = 0.5 * P * A * (wind speed ^ 3) * cP

P = air density
A = swept area of the turbine
cP = coefficient of power
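The check described above could be sketched in Python as follows. The default air density, swept area, power coefficient, and the 20% tolerance are illustrative assumptions, not PROLIM's actual values.

```python
def expected_energy(wind_speed: float,
                    air_density: float = 1.225,   # kg/m^3, sea-level assumption
                    swept_area: float = 5027.0,   # m^2, assumed ~80 m rotor
                    cp: float = 0.40) -> float:   # assumed power coefficient
    """Expected energy per the article's formula:
    0.5 * P * A * (wind speed ^ 3) * cP."""
    return 0.5 * air_density * swept_area * wind_speed ** 3 * cp

def is_inefficient(actual: float, expected: float,
                   tolerance: float = 0.2) -> bool:
    """Flag a large gap between actual and expected energy, which may
    indicate inefficient storage or leakage (tolerance is an assumption)."""
    return actual < expected * (1.0 - tolerance)
```

Because the formula is cubic in wind speed, small wind-speed changes move the expected value sharply, which is why the comparison uses a tolerance band rather than exact equality.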

In-Application Stream 2

  • We compare each value of wind speed, blade speed, ambient temperature, rotor temperature, and gearbox temperature against a reference threshold and generate alerts if the values are outside the expected range.
  • Alerts have two severity levels: Warning (the windmill can sustain the condition for a short time) and Critical (immediate action is required).
  • Output stream data from Kinesis Analytics is sent to two different Lambda functions based on their functionality.
    • One Lambda enriches each output record (decimal formatting, data sanitization, etc.) and stores it in DynamoDB, where the windmill ID and timestamp act as the partition key and sort key respectively. Since this is time-series data, using the timestamp as the sort key lets us fetch records easily with the least latency.
    • Another Lambda enriches the data and stores alerts (either Critical or Warning) in an AWS RDS MySQL database along with the windmill ID and timestamp.
    • If there are repeated alerts in the critical zone, an e-mail is triggered to the windmill admin with the windmill details and data range so that action can be taken immediately.
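The alert classification and the DynamoDB record shape described above could be sketched as follows. The threshold numbers, field names, and table layout are assumptions for illustration, not the actual PROLIM configuration.

```python
import json

# (warning, critical) reference thresholds per metric -- illustrative
# assumptions, not the real reference values used in production.
THRESHOLDS = {
    "wind_speed":   (20.0, 25.0),   # m/s
    "gearbox_temp": (75.0, 90.0),   # degrees C
    "rotor_temp":   (60.0, 80.0),   # degrees C
    "ambient_temp": (35.0, 45.0),   # degrees C
}

def classify(metric: str, value: float):
    """Return None, 'warning', or 'critical' for a single reading."""
    warn, crit = THRESHOLDS[metric]
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    return None

def to_dynamo_item(record: dict) -> dict:
    """Shape an enriched output record for DynamoDB, with windmill_id
    as the partition key and timestamp as the sort key, as in the
    architecture above."""
    return {
        "windmill_id": {"S": record["windmill_id"]},
        "timestamp":   {"N": str(record["timestamp"])},
        "payload":     {"S": json.dumps(record)},
    }
```

In the real pipeline, the destination Lambda would receive batched records from Kinesis Analytics and call DynamoDB's `PutItem` with an item shaped like the one returned here; alert-level records would instead be written to the RDS MySQL table.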
