The Evolution of MQTT: From SCADA to the Internet of Things (IoT)

MQTT began life in 1999 as a proprietary protocol for SCADA systems monitoring oil and gas pipelines. The name originally stood for MQ Telemetry Transport, after IBM's MQSeries product line, and is often expanded as Message Queuing Telemetry Transport. Its inventors, Andy Stanford-Clark at IBM and Arlen Nipper at Arcom, needed something that could survive satellite links with high latency and intermittent connectivity. The result was a lightweight publish-subscribe protocol with a tiny fixed header (2 bytes minimum) and three quality-of-service levels. Fast forward to today, and MQTT powers everything from smart light bulbs to industrial sensor arrays. But that journey from SCADA to IoT wasn't automatic: it required changes to the spec (MQTT 3.1.1 became an OASIS standard in 2014, and MQTT 5.0, standardized in 2019, added features like session expiry and user properties) and a shift in how developers think about message delivery.

This guide is for engineers and architects who are evaluating MQTT for a new project or troubleshooting an existing deployment. We'll cover the core mechanism, patterns that work, anti-patterns that fail, and the long-term costs that often catch teams off guard. By the end, you'll have a clear framework for deciding when MQTT is the right tool and when you're better off with something else.

Where MQTT Shows Up in Real Work

MQTT's strongest use case is constrained devices sending small, frequent messages over unreliable networks. Think temperature sensors in a greenhouse that report every 30 seconds over a cellular modem with variable signal strength. Or a fleet of delivery trucks sending GPS coordinates and engine diagnostics over satellite links that drop out in tunnels. In both cases, MQTT's persistent session feature lets the broker queue messages for offline clients, and its QoS levels let the sender choose between fire-and-forget (QoS 0), at-least-once delivery (QoS 1), and exactly-once delivery (QoS 2).

But MQTT isn't just for low-power devices. It's also used in backend systems for event-driven microservices, where multiple services need to react to the same event without polling. For example, an e-commerce platform might publish an 'order.placed' message to a topic that both the inventory service and the shipping service subscribe to. This decouples the services and reduces latency compared to HTTP polling. However, MQTT's pub-sub model introduces broker dependency—if the broker goes down, all communication stops. That's a trade-off many teams don't consider until it's too late.

Common Deployment Patterns

In practice, we see three main deployment patterns: single-broker for small teams (up to a few thousand clients), bridge-connected brokers for multi-site setups (e.g., factories in different regions), and clustered brokers for high availability (using shared state or load balancers). Each pattern has its own cost and complexity. Single-broker is simplest but is a single point of failure. Bridged brokers add resilience but introduce latency and message duplication risks. Clusters offer the best uptime but require careful configuration of topic trees and shared subscriptions.

Another common scenario is using MQTT over WebSockets to connect browser-based dashboards to backend data streams. This works well for real-time monitoring but adds the overhead of the WebSocket handshake and frame encoding. For most dashboards, the trade-off is acceptable because the browser gains access to pub-sub semantics without custom protocols.

We've also seen MQTT used in edge computing setups where a local broker collects sensor data and forwards summaries to a cloud broker. This reduces bandwidth costs and allows local processing even when the cloud connection is down. The edge broker can buffer messages and forward them when the link is restored, typically by treating its bridge connection to the cloud broker as a persistent session, so that QoS 1 and QoS 2 messages stay queued until they are acknowledged.
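As a rough illustration of the "forward summaries" idea, the sketch below collapses a window of raw readings into one compact payload before it goes upstream. The function name `summarize_window` and the JSON field names are assumptions made for this example; nothing here is defined by MQTT or by any particular broker.

```python
import json
from statistics import mean

def summarize_window(readings):
    """Collapse a window of numeric readings into one summary dict."""
    if not readings:
        return {"count": 0}
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": round(mean(readings), 2),
    }

# One summary message replaces many raw messages on the uplink.
window = [21.0, 21.5, 22.0, 23.5]
payload = json.dumps(summarize_window(window)).encode("utf-8")
```

The edge process would publish `payload` to the cloud broker on whatever schedule the bandwidth budget allows, while raw readings stay on the local broker.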

Foundations That Readers Often Confuse

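One of the most common misunderstandings about MQTT is that it guarantees delivery. It doesn't, at least not in the way most developers expect. The QoS level applies to each hop (publisher to broker, then broker to subscriber) rather than end to end, and the broker delivers to each subscriber at the lower of the published QoS and the QoS granted for that subscription. QoS 1 ensures the message is delivered at least once, but duplicates are possible. QoS 2 ensures exactly-once delivery, but it requires a four-packet handshake (PUBLISH, PUBREC, PUBREL, PUBCOMP) that adds latency and overhead. QoS 0 is fire-and-forget with no guarantee. The choice of QoS affects not just reliability but also broker memory usage and network throughput. Many teams default to QoS 1 for everything, only to discover that their broker runs out of memory because it has to hold every unacknowledged message, including those queued for offline clients with persistent sessions.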
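To make the QoS 2 overhead concrete, here is a toy model of the receiving side of the exchange. It is a sketch under simplifying assumptions (in-memory state, no persistence, no packet parsing), and the class name `Qos2Receiver` is invented for this example. It shows why a retransmitted PUBLISH does not produce a second delivery, and which state the receiver has to hold in the meantime.

```python
class Qos2Receiver:
    """Toy model of the receiving side of the QoS 2 exchange."""

    def __init__(self):
        self.pending_ids = set()   # packet IDs received but not yet released
        self.delivered = []        # payloads handed to the application

    def on_publish(self, packet_id, payload):
        # Deliver only the first time this packet ID arrives. A retransmitted
        # PUBLISH with the same ID is acknowledged again but not redelivered.
        if packet_id not in self.pending_ids:
            self.pending_ids.add(packet_id)
            self.delivered.append(payload)
        return ("PUBREC", packet_id)

    def on_pubrel(self, packet_id):
        # The sender has seen PUBREC, so the ID can be forgotten and reused.
        self.pending_ids.discard(packet_id)
        return ("PUBCOMP", packet_id)

receiver = Qos2Receiver()
receiver.on_publish(7, b"valve=open")   # first PUBLISH
receiver.on_publish(7, b"valve=open")   # retransmission after a lost PUBREC
receiver.on_pubrel(7)                   # PUBREL, answered with PUBCOMP
```

The `pending_ids` set is the per-message state that grows when acknowledgments are slow, which is one reason QoS 2 costs more than QoS 1 on a busy broker.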

Another confusion point is the topic tree and wildcards. MQTT topics are hierarchical (e.g., 'factory/floor1/temperature') and support two wildcards: '+' matches one level, '#' matches all remaining levels. This is powerful but easy to misuse. For example, subscribing to 'factory/#' on a broker with thousands of devices will flood the subscriber with every message, including those it doesn't need. A better pattern is to subscribe to specific subtopics and use a separate topic for control messages (e.g., 'factory/floor1/control') to avoid mixing data and commands.
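The wildcard rules are easier to reason about with a small matcher. The function below is a minimal sketch of filter matching, not code taken from any broker or client library; the name `topic_matches` is made up for this example, and it ignores the special handling of topics that begin with '$'.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # '#' is only valid as the last level; it matches the parent
            # level and everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)

# 'factory/#' catches everything under factory; '+' pins the depth.
broad = topic_matches("factory/#", "factory/floor1/temperature")
narrow = topic_matches("factory/+/temperature", "factory/floor1/line2/temperature")
```

Running candidate filters through a matcher like this against a sample of real topic names is a cheap way to see how much traffic a subscription would pull in before deploying it.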

Retained Messages and Will Messages

Retained messages are another feature that trips people up. A retained message is the last message published on a topic with the retain flag set, which the broker stores and sends to any new subscriber. This is useful for stateful sensors (e.g., a door sensor that reports 'open' or 'closed'), but it can cause confusion if the retained message is stale or if multiple publishers write to the same topic. We've seen cases where a sensor goes offline and the retained message still shows 'normal' even though the device is dead. The fix is to use will messages (a message published automatically by the broker when a client disconnects unexpectedly) to set an 'offline' or 'error' state on a retained status topic.

Will messages themselves are often misunderstood. They are part of the client's last will and testament (LWT) configuration, which includes a topic, payload, QoS, and retain flag. When the broker detects an unexpected disconnection (e.g., network timeout), it publishes the will message. But if the client disconnects cleanly (sends a DISCONNECT packet), the will message is not published. This distinction is critical for building reliable systems—teams sometimes rely on will messages as a heartbeat, only to find that they don't fire when the client shuts down gracefully.
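The interaction between retained status and will messages can be modeled in a few lines. This is a toy in-memory stand-in for a broker, written only to show the clean versus unclean disconnect behavior described above; `ToyBroker` and the topic name are hypothetical, and a real broker does far more.

```python
class ToyBroker:
    """In-memory stand-in for a broker: retained store plus will handling."""

    def __init__(self):
        self.retained = {}   # topic -> last retained payload
        self.wills = {}      # client_id -> (topic, payload)

    def connect(self, client_id, will_topic, will_payload):
        self.wills[client_id] = (will_topic, will_payload)

    def publish(self, topic, payload, retain=False):
        if retain:
            self.retained[topic] = payload

    def disconnect(self, client_id, clean):
        topic, payload = self.wills.pop(client_id)
        if not clean:
            # Unexpected loss: the broker publishes the will for the client.
            self.publish(topic, payload, retain=True)

broker = ToyBroker()
status = "sites/a/devices/door1/status"   # hypothetical status topic
broker.connect("door1", will_topic=status, will_payload="offline")
broker.publish(status, "online", retain=True)
broker.disconnect("door1", clean=False)   # e.g., a keepalive timeout
```

After the unclean disconnect, the retained status reads 'offline'. With `clean=True` it would still read 'online', which is why a client that shuts down gracefully should publish its own 'offline' status before sending DISCONNECT.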

Patterns That Usually Work

After years of field experience, several patterns have proven reliable across industries. The first is to use a topic namespace that mirrors your physical or logical hierarchy. For example, 'sites/{site_id}/devices/{device_id}/sensors/{sensor_type}'. This makes it easy to grant permissions (via ACLs) and to subscribe to subsets of data. The second pattern is to separate data topics from command topics. Data topics carry sensor readings and are often published at high frequency; command topics carry control instructions and are published infrequently. This separation prevents command messages from being delayed by a flood of sensor data.
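One way to keep the namespace consistent is to build topic strings in a single place instead of formatting them ad hoc in each client. The helpers below are a sketch: the data topic follows the layout given above, while the 'commands' branch and the function names are assumptions made for this example.

```python
FORBIDDEN = set("/+#")   # level separator and wildcard characters

def _segment(value):
    """Validate one topic level so IDs cannot break the hierarchy."""
    if not value or FORBIDDEN & set(value):
        raise ValueError(f"invalid topic segment: {value!r}")
    return value

def data_topic(site_id, device_id, sensor_type):
    """High-frequency sensor readings."""
    return (f"sites/{_segment(site_id)}/devices/{_segment(device_id)}"
            f"/sensors/{_segment(sensor_type)}")

def command_topic(site_id, device_id):
    """Low-frequency control instructions, kept in a separate branch."""
    return f"sites/{_segment(site_id)}/devices/{_segment(device_id)}/commands"

reading_topic = data_topic("berlin", "pump7", "temperature")
control_topic = command_topic("berlin", "pump7")
```

Because data and commands live in different branches, an ACL rule or a subscription can target one without touching the other.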

The third pattern is to use a dedicated broker per environment (dev, staging, prod) and to never share a broker between unrelated projects. Shared brokers lead to topic pollution, where one team's wildcard subscription accidentally consumes another team's messages. It also makes debugging harder because message flows are interleaved. If you must share a broker, use a unique prefix for each project's topics (e.g., 'projectX/…') and enforce it with ACLs.

Connection Management

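For client connections, we recommend persistent sessions: clean session = false in MQTT 3.1.1, or Clean Start = 0 with a non-zero Session Expiry Interval in MQTT 5.0. This tells the broker to keep the client's subscriptions and to queue the QoS 1 and QoS 2 messages it misses while disconnected; QoS 0 messages are generally not queued, and the client must reconnect with the same client ID to resume its session. Combined with a short keepalive interval (e.g., 30 seconds), which lets both sides detect a dead connection quickly, this allows clients to survive brief network outages without losing messages. However, persistent sessions consume broker memory, so you need to monitor the number of sessions and set a maximum. MQTT 5.0 introduced session expiry, which lets the broker clean up sessions after a configurable period.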

Another reliable pattern is to use TLS for all connections, even on internal networks. MQTT brokers often handle sensitive data (e.g., medical device readings, industrial control commands), and plaintext MQTT is vulnerable to eavesdropping and injection. TLS adds handshake overhead (one round trip for a full TLS 1.3 handshake, two for TLS 1.2), but for most IoT devices, this is acceptable. Standard MQTT runs over TCP, so DTLS does not apply to it directly. If your devices are extremely constrained (e.g., Cortex-M0 with limited flash), you might encrypt payloads at the application layer or move to a UDP-based variant such as MQTT-SN secured with DTLS, but that's rare.

Anti-Patterns and Why Teams Revert

One of the most common anti-patterns is using MQTT for request-response interactions. MQTT is inherently asynchronous and pub-sub, so mimicking HTTP-style request-response requires correlation IDs and response topics. MQTT 5.0 standardizes these as the Response Topic and Correlation Data properties, but the exchange is still asynchronous and the requester still has to manage timeouts. This works but adds complexity and latency. Teams that try this often end up with a tangled web of topics and timeouts, and eventually revert to HTTP or gRPC for synchronous calls. A better approach is to use MQTT for event notification and a separate protocol (e.g., HTTP or CoAP) for synchronous queries.
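The bookkeeping that request-response over MQTT forces on the requester looks roughly like the sketch below. It only tracks correlation IDs and deadlines; publishing the request and receiving the response are left out. The class name `PendingRequests` and the 5-second default are assumptions made for this example.

```python
import time
import uuid

class PendingRequests:
    """Tracks outstanding requests by correlation ID, each with a deadline."""

    def __init__(self, timeout_s=5.0):
        self.timeout_s = timeout_s
        self._pending = {}   # correlation_id -> deadline (monotonic seconds)

    def new_request(self, now=None):
        """Register a request; attach the returned ID to the outgoing message."""
        now = time.monotonic() if now is None else now
        correlation_id = uuid.uuid4().hex
        self._pending[correlation_id] = now + self.timeout_s
        return correlation_id

    def resolve(self, correlation_id, now=None):
        """Return True if a response matches a request that is still live."""
        now = time.monotonic() if now is None else now
        deadline = self._pending.pop(correlation_id, None)
        return deadline is not None and now <= deadline

    def expire(self, now=None):
        """Drop and return the IDs of requests whose deadline has passed."""
        now = time.monotonic() if now is None else now
        expired = [cid for cid, d in self._pending.items() if d < now]
        for cid in expired:
            del self._pending[cid]
        return expired
```

Every requester needs something like this, plus a periodic call to `expire`, which is the complexity that an HTTP or gRPC client library would otherwise hide.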

Another anti-pattern is overloading topics with too much data. MQTT payloads are binary, and the packet format allows up to 268,435,455 bytes (about 256 MB) per packet in both MQTT 3.1.1 and 5.0, though brokers usually enforce much lower limits. Sending large payloads (e.g., full images or logs) over MQTT defeats its lightweight purpose and can cause broker backpressure. Teams that do this often see increased latency and dropped messages, and eventually move large payloads to a separate file transfer protocol (e.g., HTTP upload or S3).
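The size ceiling comes from the Remaining Length field in the fixed header, which is a variable-length integer of one to four bytes with seven value bits per byte. The encoder below is a short sketch of that scheme (the function name is ours), and it also shows why small messages need only a single length byte.

```python
MAX_REMAINING_LENGTH = 268_435_455   # 128**4 - 1, the four-byte maximum

def encode_remaining_length(n):
    """Encode n as an MQTT variable byte integer (1 to 4 bytes)."""
    if not 0 <= n <= MAX_REMAINING_LENGTH:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        n, digit = divmod(n, 128)
        if n > 0:
            digit |= 0x80   # continuation bit: more length bytes follow
        out.append(digit)
        if n == 0:
            return bytes(out)

small = encode_remaining_length(100)                   # fits in one byte
largest = encode_remaining_length(MAX_REMAINING_LENGTH)
```

A packet whose variable header and payload total 127 bytes or fewer therefore carries just two bytes of fixed header: one for the packet type and flags, one for the length.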

Ignoring Broker Capacity

Many teams underestimate broker capacity requirements. A single Mosquitto or EMQX instance can handle tens of thousands of clients, but only if the topic tree is flat and message rates are moderate. With deep topic hierarchies and high-frequency publishing (e.g., 100 messages per second per client), the broker's CPU and memory usage can spike. We've seen teams deploy a single broker for a smart building project with 10,000 sensors publishing every 5 seconds, only to have the broker crash after a few hours. The fix was to shard the sensors across multiple brokers using a consistent hash on device ID.
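For scale, 10,000 sensors publishing every 5 seconds works out to about 2,000 inbound messages per second before any fan-out to subscribers. The sharding fix can be sketched as a hash ring keyed on device ID. This is one common way to do consistent hashing, not necessarily the scheme used in that project; the class name, the broker names, and the choice of 100 virtual nodes per broker are all assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring that maps device IDs to broker names."""

    def __init__(self, brokers, vnodes=100):
        # Each broker gets several points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{broker}#{i}"), broker)
            for broker in brokers
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def broker_for(self, device_id):
        # Walk clockwise to the first ring point after the device's hash.
        idx = bisect.bisect(self._keys, self._hash(device_id)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["broker-a", "broker-b", "broker-c"])
assigned = ring.broker_for("sensor-0042")
```

The benefit over a plain `hash(device_id) % n` is that adding or removing a broker only remaps the devices that belonged to the affected ring segments, so most clients keep their existing broker and session.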

Another anti-pattern is using QoS 2 for all messages because 'it's the safest'. QoS 2 messages require the broker to store message IDs and track the four-way handshake, which consumes memory and CPU. For most sensor data, QoS 1 is sufficient because duplicates can be handled at the application layer (e.g., using sequence numbers). QoS 2 should be reserved for critical messages like financial transactions or emergency alerts where duplicates are unacceptable.
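Handling QoS 1 duplicates at the application layer can be as small as the sketch below. It assumes each device stamps its messages with a sequence number that only increases, and that messages from one device arrive in order; a device that resets its counter after a reboot would need extra handling. The class name is invented for this example.

```python
class SequenceDeduplicator:
    """Drops QoS 1 redeliveries using a per-device sequence number."""

    def __init__(self):
        self._last_seq = {}   # device_id -> highest sequence number accepted

    def accept(self, device_id, seq):
        """Return True for a new message, False for a duplicate or stale one."""
        last = self._last_seq.get(device_id)
        if last is not None and seq <= last:
            return False
        self._last_seq[device_id] = seq
        return True

dedup = SequenceDeduplicator()
first = dedup.accept("pump7", 1)
repeat = dedup.accept("pump7", 1)   # broker redelivered the same message
```

The subscriber keeps one integer per device, which is far cheaper than asking the broker to track the full QoS 2 handshake for every sensor reading.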

Maintenance, Drift, and Long-Term Costs

MQTT systems drift over time in ways that increase operational cost. Topic namespaces that start clean often become cluttered as new devices are added without following the naming convention. For example, a team might add a 'temperature' topic under 'sensors' instead of 'sites/{site_id}/devices/{device_id}/sensors/temperature'. This makes it hard to write ACLs and to debug message flows. Regular topic audits (every quarter) help, but many teams skip them until something breaks.
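A topic audit does not need special tooling. Given a list of topic names observed on the broker (for example, collected by a temporary wildcard subscriber), a regular expression for the convention flags the strays. The pattern below encodes the naming scheme used in this guide; the function name and the sample topics are illustrative.

```python
import re

# sites/{site_id}/devices/{device_id}/sensors/{sensor_type}
CONVENTION = re.compile(r"^sites/[^/+#]+/devices/[^/+#]+/sensors/[^/+#]+$")

def audit_topics(topics):
    """Return the topics that do not follow the naming convention."""
    return [t for t in topics if not CONVENTION.match(t)]

observed = [
    "sites/berlin/devices/pump7/sensors/temperature",
    "sensors/temperature",                       # missing site and device
    "sites/berlin/devices/pump7/temperature",    # missing 'sensors' level
]
violations = audit_topics(observed)
```

If command or status topics are part of the convention, they need their own patterns; otherwise the audit will report them as violations.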

Broker configuration also drifts. Default settings like max_connections, max_packet_size, and persistence location are often fine for small deployments but become bottlenecks as the system grows. We've seen brokers run out of disk space because the persistent store (for QoS 1/2 messages) wasn't capped. MQTT 5.0's message expiry and session expiry help, but they need to be configured proactively. Another long-term cost is certificate management for TLS. If you use self-signed certificates, you need a process to rotate them before they expire. Many teams forget, and then clients can't connect.

Monitoring and Alerting

Most MQTT brokers expose metrics via a management interface (e.g., Mosquitto's $SYS topics, EMQX's HTTP API). Common metrics to monitor include: number of connected clients, message rate, dropped messages, session store size, and heap usage. Without monitoring, you won't know that your broker is about to run out of memory until clients start disconnecting. We recommend setting up alerts for when the session store exceeds 80% of available RAM, and when the message rate exceeds 70% of the broker's tested capacity.
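The two alert rules can be written down as a small check that runs against whatever metrics your broker exposes. The function and parameter names below are hypothetical; how the numbers are collected (from $SYS topics, an HTTP API, or an exporter) depends on the broker and is not shown.

```python
SESSION_STORE_LIMIT = 0.80   # fraction of available RAM
MESSAGE_RATE_LIMIT = 0.70    # fraction of the broker's tested capacity

def check_alerts(session_store_bytes, available_ram_bytes,
                 message_rate, tested_capacity):
    """Return the names of the alert conditions that are currently firing."""
    alerts = []
    if session_store_bytes > SESSION_STORE_LIMIT * available_ram_bytes:
        alerts.append("session_store_high")
    if message_rate > MESSAGE_RATE_LIMIT * tested_capacity:
        alerts.append("message_rate_high")
    return alerts

# Example: 3.5 GB of session data on a 4 GB host, 1,200 msg/s of a tested 5,000.
firing = check_alerts(3.5e9, 4e9, 1200, 5000)
```

The tested capacity has to come from your own load test; a vendor benchmark measured on different hardware and a different topic tree is not a safe substitute.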

Another maintenance cost is updating broker software. MQTT brokers receive security patches and feature updates, but upgrading can break client compatibility (e.g., MQTT 3.1.1 vs 5.0). You need a testing pipeline that verifies all client libraries against the new broker version before deploying. This is often overlooked in small teams, leading to production outages after a 'simple' upgrade.

When Not to Use MQTT

MQTT is not a universal messaging solution. There are several scenarios where it's a poor fit. First, if you need real-time control with sub-millisecond latency, MQTT's broker-based architecture adds unavoidable delay. For applications like robot arm control or high-frequency trading, a direct TCP or UDP socket with a custom protocol is better. Second, if your messages are large (e.g., >1 MB), MQTT's binary payload model becomes inefficient because the broker has to buffer and forward the entire message. Use HTTP, WebSocket, or a file transfer protocol instead.

Third, if you have a small number of clients (e.g., <10) that communicate synchronously, MQTT adds unnecessary complexity. A simple REST API or WebSocket connection is easier to implement and debug. Fourth, if your network is highly reliable and low-latency (e.g., a data center), MQTT's QoS features add overhead without benefit. HTTP/2 or gRPC with streaming may be more efficient.

Alternatives to Consider

For IoT scenarios, CoAP (Constrained Application Protocol) is a common alternative. CoAP runs over UDP, has a RESTful design, and supports multicast. It's lighter than MQTT for very constrained devices (e.g., Class 0 devices with <10 KB RAM), but it has no broker-based pub-sub model and no three-level QoS; its closest equivalents are the Observe extension and the choice between confirmable and non-confirmable messages. For event streaming in backend systems, Apache Kafka or RabbitMQ is often better suited because these systems offer, to varying degrees, partitioning, replay, and long-term storage. MQTT brokers are generally not designed for message persistence beyond the session store.

Another alternative is AMQP (Advanced Message Queuing Protocol), which offers more robust queuing features (e.g., message routing, transactions) but is heavier and less suited for constrained devices. The choice between MQTT and AMQP often comes down to device constraints: if your clients are sensors with limited battery and CPU, MQTT wins; if your clients are backend services that need complex routing, AMQP wins.

Open Questions and FAQ

Is MQTT secure enough for production?

MQTT with TLS and authentication (username/password or client certificates) is secure for most use cases. However, MQTT does not define end-to-end encryption—the broker can read all messages. If you need end-to-end encryption (e.g., for medical data), you must encrypt the payload before publishing. Also, MQTT's ACLs are coarse-grained (per topic pattern), so fine-grained access control (e.g., per user per device) requires custom broker plugins or a proxy.

Can MQTT replace Kafka for event streaming?

No. MQTT is designed for device-to-broker communication with small messages and transient sessions. Kafka is designed for high-throughput event streaming with long-term storage and replay. Some architectures use MQTT at the edge and bridge to Kafka for backend processing, which combines the strengths of both.

What's the difference between MQTT 3.1.1 and 5.0?

MQTT 5.0 added several features: session expiry, message expiry, user properties (custom metadata), reason codes in acknowledgments, and shared subscriptions (for load balancing across subscribers). It also improved error handling and flow control. Most new projects should use MQTT 5.0 if their broker and client libraries support it.

How do I choose a broker?

Popular open-source brokers include Mosquitto (lightweight, C-based), EMQX (scalable, Erlang-based), and VerneMQ (clustered, Erlang-based). Commercial options include HiveMQ and AWS IoT Core. The choice depends on your scale, feature needs (e.g., clustering, bridging, plugin support), and budget. For small deployments (<1,000 clients), Mosquitto is often sufficient. For large-scale IoT (100,000+ clients), EMQX or HiveMQ are better.

What's the best way to handle backpressure?

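MQTT has limited backpressure mechanisms. In MQTT 5.0, each side can advertise a Receive Maximum that caps the number of unacknowledged QoS 1 and QoS 2 messages in flight, so a slow receiver throttles the sender simply by delaying its acknowledgments. MQTT 3.1.1 has no equivalent, and QoS 0 traffic is not covered in either version. A more robust approach is to use a message queue at the subscriber side (e.g., a buffer that writes to disk) and to monitor queue depth. If the subscriber can't keep up, you may need to scale horizontally (more subscribers, for example via shared subscriptions) or reduce the publish rate.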
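A subscriber-side buffer with a visible depth might look like the following sketch. It is kept in memory for brevity, whereas the text above suggests a disk-backed buffer for anything that must survive a restart; the class name and the default depth are assumptions.

```python
import queue

class BoundedInbox:
    """Bounded buffer between the MQTT receive callback and slow processing."""

    def __init__(self, max_depth=1000):
        self._queue = queue.Queue(maxsize=max_depth)
        self.dropped = 0   # how many messages were rejected because it was full

    def offer(self, message):
        """Called from the receive callback; never blocks the network loop."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def take(self):
        """Called by the worker; raises queue.Empty if nothing is waiting."""
        return self._queue.get_nowait()

    def depth(self):
        return self._queue.qsize()

inbox = BoundedInbox(max_depth=2)
results = [inbox.offer(m) for m in ("a", "b", "c")]
```

Exporting `depth()` and `dropped` as metrics turns a silent slowdown into something an alert can catch, and a rising depth is the signal to add subscribers or cut the publish rate.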

Summary and Next Experiments

MQTT evolved from a SCADA protocol to an IoT standard because its design—small headers, pub-sub, QoS, and persistent sessions—solves real problems in constrained, unreliable networks. But it's not a silver bullet. The key takeaways are:

  • Use MQTT for small, frequent messages from many devices over unreliable networks.
  • Choose QoS levels carefully: QoS 0 for telemetry where loss is tolerable, QoS 1 for most data, QoS 2 only for critical messages.
  • Design your topic namespace early and enforce it with ACLs.
  • Monitor broker metrics (connections, message rate, session store) and set alerts.
  • Consider alternatives (CoAP, HTTP, Kafka) when your use case doesn't fit MQTT's strengths.

For your next experiment, try setting up a Mosquitto broker with MQTT 5.0 and a few ESP32 sensors publishing temperature data. Use QoS 1 with a 30-second keepalive and a retained message for the latest reading. Then add a subscriber that connects with a persistent session, subscribes at QoS 1, and logs all messages to a file. Once that works, disconnect the subscriber for a minute and observe how the broker queues messages and delivers them when it reconnects. Finally, unplug a sensor and watch what the retained reading shows, first without a will message configured and then with one. This hands-on test will solidify the concepts in this guide and reveal any gaps in your understanding.
