In Part 1 of this post, we considered the features that differentiate Kafka from other messaging products. In Part 2, let’s take a closer look at these features along with Kafka’s role in the IoT (internet of things) world.
Useful Features for IoT
Kafka is designed to handle fast data ingestion at scale. It has the flexibility to open data up to diverse tools and use cases, including real-time streaming workloads. Thus, Kafka can be integrated with stream-processing frameworks such as Storm, Samza, or Spark Streaming to provide in-flight transformations and processing. When integrated with the Hadoop ecosystem, it can serve big-data use cases. Its unique design makes it a strong choice for a wide range of architectural challenges, including the ingestion of new data streaming in from web, mobile, and social applications.
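As a rough illustration of the consume-transform-produce pattern behind these integrations, here is a minimal sketch that uses in-memory queues to stand in for Kafka topics (no actual Kafka client is involved; the record fields are hypothetical):

```python
from queue import Queue

def transform(record):
    # An example in-flight transformation: convert a raw Fahrenheit
    # reading into Celsius before it reaches downstream consumers.
    return {"device": record["device"],
            "temp_c": (record["temp_f"] - 32) * 5 / 9}

# In a real deployment these queues would be Kafka topics, and the loop
# below would be a stream processor such as Storm, Samza, or Spark Streaming.
raw_topic, enriched_topic = Queue(), Queue()

raw_topic.put({"device": "sensor-1", "temp_f": 212.0})

while not raw_topic.empty():
    enriched_topic.put(transform(raw_topic.get()))

print(enriched_topic.get())  # {'device': 'sensor-1', 'temp_c': 100.0}
```

The key idea is that the transformation runs continuously on data in flight, rather than in a batch job after the data has landed.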
Benefits vs. Limitations
Though Kafka offers several compelling features, it has a few limitations that need to be considered before using it in production systems. Some of them are listed below.
- Kafka does not currently support reducing the number of partitions for a topic.
- Kafka runs well only on Linux and Solaris systems. Windows is not currently supported.
- There is no support for bidirectional messages, which would have been an advantage in the IoT world: for example, sending commands to the device in response to a ping message from it. However, this can be overcome by treating commands as messages and allowing the device to consume them on a topic (if the device has that capability).
- Kafka maintains order within a partition of a topic, not between different partitions of a topic.
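Per-partition ordering is often enough in practice: if each device's messages are sent with a consistent key, they all land on the same partition and arrive in order. A simplified sketch of the hash-based partitioning idea follows (the real Kafka default partitioner hashes key bytes with murmur2; MD5 is used here only as a deterministic stand-in):

```python
import hashlib

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Deterministic hash so the same device key always maps to the
    # same partition, making per-partition order into per-device order.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Every message keyed "device-42" goes to one partition, so that
# device's readings are consumed in the order they were produced.
assert partition_for("device-42") == partition_for("device-42")
```

Ordering across different devices (i.e., across partitions) is still not guaranteed, which is usually acceptable for sensor data.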
Kafka in the IoT World
In the IoT world, there are massive amounts of data coming from “things” (devices and sensors) that send data in real time. There are three major steps in processing voluminous sensor data at scale: data ingestion, data storage, and data analytics. A big concern is how these massive amounts of data coming from all of the IoT systems can be stored and analyzed in a timely manner, without losing any information.
Basically, at each layer, there is a need to have some kind of broker that maintains a balance between the data produced and data consumed. For example, if there are millions of devices sending data at regular intervals (which could be even sub-second intervals) to an IoT platform, the platform needs to buffer the data before it can process it. If there are multiple consumers for the device data, how do we connect the data to the right channel for the right kind of analysis?
In fact, there could be multiple consumers needing the same data for a variety of purposes. In some sense, this data can be treated as similar to the enormous event data from social networking sites and e-commerce sites where the data from multiple places is received at high velocity and volume to be processed and analyzed.
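The idea that several consumers can independently read the same data can be sketched with a toy append-only log (a loose, in-memory analogy for a Kafka topic with multiple consumer groups; class and group names are invented for illustration):

```python
class MiniLog:
    """Toy append-only log: one shared message store, many independent
    readers, loosely mirroring how a Kafka topic lets several consumer
    groups re-read the same data at their own pace."""
    def __init__(self):
        self.messages = []
        self.offsets = {}  # consumer group -> next offset to read

    def produce(self, msg):
        self.messages.append(msg)

    def consume(self, group, max_records=10):
        start = self.offsets.get(group, 0)
        batch = self.messages[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

log = MiniLog()
for i in range(5):
    log.produce({"reading": i})

# Two independent consumers see the same data for different purposes.
print(len(log.consume("realtime-alerts")))   # 5
print(len(log.consume("batch-analytics")))   # 5
```

Because each group tracks its own position, a slow analytics consumer does not hold back a fast alerting consumer, which is exactly the buffering role described above.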
Kafka’s special features discussed above make it well suited to IoT applications, which benefit from the performance and delivery guarantees it provides. Along these lines, Kafka is also useful for data from devices that can be aggregated in the form of files, or data being sent at very high frequency. In this case, the producers could be device agents sending data at high frequency, or device gateways/aggregators collecting data from multiple devices in the form of files. The consumers need not consume the messages at the same rate at which the data is published. Also, real-time pipelines can be created by pairing Kafka with stream-processing engines to process the data as it arrives from the device or gateway.
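On the producer side, batching and throughput behavior is largely controlled through configuration. The fragment below uses standard Kafka producer configuration keys; the broker address and values are illustrative examples, not tuning recommendations:

```properties
bootstrap.servers=broker1:9092
# Accumulate up to 64 KB per partition before sending a request...
batch.size=65536
# ...or wait up to 50 ms for more records to arrive, whichever comes first.
linger.ms=50
# Compress whole batches for better throughput over the network.
compression.type=snappy
```

Larger batches and a small linger time trade a little per-message latency for much higher aggregate throughput, which suits high-frequency device data.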
An important point to mention here is that, as noted earlier, a whole list of tools outside the main distribution integrate with Kafka. These allow IoT applications to be built easily, and data from a variety of devices to be visualized from multiple angles. In combination with data from other sources like web, mobile, and social media, this can support descriptive, predictive, and prescriptive analytics for connected and unconnected devices, something many organizations are aiming for in the current IoT landscape.
When Do We Go for Kafka as a Messaging Solution?
To answer this question, we must assess our requirements and the problem that needs solving:
When to go for Kafka:
- If the producers can accumulate data in memory and send out larger batches in a single request
- If you are not so concerned about the latency of individual sensor readings, but want data from multiple devices to reach the destination within a defined amount of time
- If your consumer is not able to consume the data as rapidly as the producer is sending data
When Kafka is not recommended:
- If the messages from devices are small in size, arrive at high frequency, and cannot be aggregated at the source
- If your consumer is not savvy enough to manage the responsibility of tracking which message to consume next, and which message it last consumed before going down
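To make the second point concrete, here is a toy consumer that persists its own read position to a file (a simplified stand-in for the offset management a Kafka consumer must handle; the class and file layout are invented for illustration):

```python
import json, os, tempfile

class OffsetTrackingConsumer:
    """Toy consumer that persists its own read position, illustrating the
    responsibility Kafka places on consumers: knowing which message to
    consume next, even after a restart."""
    def __init__(self, log, offset_file):
        self.log = log                # a list standing in for a partition
        self.offset_file = offset_file

    def _load_offset(self):
        try:
            with open(self.offset_file) as f:
                return json.load(f)["offset"]
        except FileNotFoundError:
            return 0                  # no saved position: start from the beginning

    def _save_offset(self, offset):
        with open(self.offset_file, "w") as f:
            json.dump({"offset": offset}, f)

    def poll(self, max_records=2):
        start = self._load_offset()
        batch = self.log[start:start + max_records]
        self._save_offset(start + len(batch))
        return batch

log = ["m0", "m1", "m2", "m3"]
path = os.path.join(tempfile.mkdtemp(), "offsets.json")

consumer = OffsetTrackingConsumer(log, path)
print(consumer.poll())  # ['m0', 'm1']

# A "restarted" consumer resumes from the persisted offset.
restarted = OffsetTrackingConsumer(log, path)
print(restarted.poll())  # ['m2', 'm3']
```

If your consumers cannot shoulder this kind of bookkeeping, a traditional broker that tracks delivery state for each client is likely a better fit.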
Kafka is able to sustain high throughput because its brokers are deliberately lightweight: many responsibilities are pushed onto the producers and consumers. So if your producers and consumers do not fit the criteria Kafka imposes, it is better to go for another well-known message server.
In conclusion, Kafka is not a general-purpose solution for all messaging requirements. It has been built for solving specific problems, and can be used if your solution also has similar requirements as discussed above.
We at the IoT group in Wipro have used Kafka for a couple of use cases including data ingestion from devices and stream handling, and have found it advantageous over the conventional messaging systems.
To learn more about Kafka, you can visit these sites: