SignalR Scaleout Using Service Fabric Actor Events

Overview

ASP.NET Core SignalR provides an abstraction over websocket connections (falling back to other transports where websockets aren't available), making it very easy to get up and running with "real-time" pub/sub functionality. When SignalR is hosted in a Service Fabric application, the hosting stateless service will typically run as multiple instances, and a client's websocket connection could be held by any one of them.

This means that when any service in our application publishes an event intended for clients, we need a mechanism to get that event to every SignalR service instance holding a subscribed client, so that all subscribed clients receive it.

Backplanes

The easiest way to scale SignalR across multiple servers is to use a backplane, such as Redis. A backplane broadcasts every message to every connected SignalR server, which makes it unsuitable for some situations and a potential bottleneck in high-traffic environments.

In scenarios that generate high-frequency or user-specific events, it would be a waste of resources to use a backplane that broadcasts every event to every instance of our SignalR service.

In these scenarios, events should only be sent to the connected clients that have subscribed to that particular topic. To allow scalability & availability while avoiding the overhead of a backplane, we need to broker events only between publishers and their subscribers.
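To put rough numbers on that waste, here is a toy Python model (not SignalR code) that counts how many SignalR instances handle each event under a broadcast backplane versus targeted delivery. The function names, instance names and topic ids are hypothetical, and it assumes every event belongs to exactly one user-specific topic.

```python
from typing import Dict, List, Set


def backplane_deliveries(events: List[str], instances: List[str]) -> int:
    """A backplane forwards every published event to every SignalR instance."""
    return len(events) * len(instances)


def targeted_deliveries(events: List[str], subscribers: Dict[str, Set[str]]) -> int:
    """Targeted delivery only reaches instances holding a subscriber for the event's topic.

    `events` lists the topic id of each published event; `subscribers` maps a
    topic id to the set of instances that have at least one subscribed client.
    """
    return sum(len(subscribers.get(topic, set())) for topic in events)


# Hypothetical deployment: four SignalR instances, three users each connected to one instance.
instances = ["signalr-0", "signalr-1", "signalr-2", "signalr-3"]
subscribers = {
    "user-1": {"signalr-0"},
    "user-2": {"signalr-2"},
    "user-3": {"signalr-3"},
}
events = ["user-1", "user-2", "user-3"] * 100  # 300 user-specific events

print(backplane_deliveries(events, instances))   # 1200
print(targeted_deliveries(events, subscribers))  # 300
```

The backplane cost scales with the number of instances, while the targeted cost scales only with the number of instances that actually hold a subscriber for each topic.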

Reliable Actors And Topics

Service Fabric Reliable Actors and Actor Events provide a ready-made pub/sub event system that can be used between Actors & services. Importantly, Actor Events can deliver events to specific stateless service instances, which is very useful when SignalR is hosted in a stateless ASP.NET Core service.

The Actor model aligns nicely with pub/sub scenarios, as each Actor instance can be used to represent a specific topic.  For example, we might be building a chat system, where each user is represented by an Actor instance.

Services can subscribe to a specific topic by subscribing to events on the corresponding Actor instance. This lets us publish targeted events to specific service instances, rather than relying on a backplane.
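The "one Actor per topic" idea can be modelled in a few lines. This is a hypothetical, in-memory Python stand-in rather than the Service Fabric API: in a real application the actor interface would implement `IActorEventPublisher<TEvents>` and raise events via `GetEvent<TEvents>()`, while subscribers would call `SubscribeAsync` on an actor proxy. `TopicActor`, `ActorRegistry` and the handler signature are illustrative names only.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

# Hypothetical handler signature: (topic id, event payload) -> None.
Handler = Callable[[str, dict], None]


class TopicActor:
    """In-memory stand-in for a Topic Actor: one actor id represents one topic."""

    def __init__(self, topic_id: str) -> None:
        self.topic_id = topic_id
        self._handlers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def unsubscribe(self, handler: Handler) -> None:
        self._handlers.remove(handler)

    def publish(self, payload: dict) -> int:
        """Notify current subscribers only and return how many were notified."""
        targets = list(self._handlers)  # snapshot, in case a handler unsubscribes
        for handler in targets:
            handler(self.topic_id, payload)
        return len(targets)


class ActorRegistry:
    """Resolves an actor id to its instance, creating it on first use."""

    def __init__(self) -> None:
        self._actors: Dict[str, TopicActor] = {}

    def get(self, topic_id: str) -> TopicActor:
        if topic_id not in self._actors:
            self._actors[topic_id] = TopicActor(topic_id)
        return self._actors[topic_id]


# Events received by each (hypothetical) SignalR service instance.
received: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)


def handler_for(instance: str) -> Handler:
    def handle(topic_id: str, payload: dict) -> None:
        received[instance].append((topic_id, payload["text"]))
    return handle


registry = ActorRegistry()
registry.get("user-1").subscribe(handler_for("signalr-0"))
registry.get("user-2").subscribe(handler_for("signalr-1"))

registry.get("user-1").publish({"text": "hello"})
print(dict(received))  # {'signalr-0': [('user-1', 'hello')]}
```

Only the instance subscribed to `user-1` sees the event. Publishing to a topic with no subscribers still reaches the actor but notifies nobody, which mirrors how publishers call the Topic Actor whether or not anyone is listening.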

This also keeps our network calls within our Service Fabric cluster, removing the need for an external backplane resource and its associated cost.

Actor Services & Actor Events provide an internal pub/sub event system that can be used to target specific stateless service instances.

Implementation

Topic Client

To use Actor Events as a pub/sub system for SignalR, we need to coordinate subscriptions and events between clients and Actors. We can wrap this functionality in a TopicClient class that can:

  • Subscribe to Topic Actors and keep the resulting actor proxies
  • Map subscribed client connection ids to the appropriate proxies
  • Receive Topic Actor events & forward them to the relevant clients using IHubContext
  • Unsubscribe from, and remove, Actor proxies that are no longer used

When a Topic Actor publishes an event, every TopicClient instance subscribed to that Actor receives it. The TopicClient then uses its stored connection ids and the IHubContext to forward the message to the appropriate clients.

The TopicClient never creates more than one subscription to the same Actor, so multiple clients subscribed to the same topic on the same service instance are served by a single shared actor proxy.
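The TopicClient's bookkeeping can be sketched as a map from topic id to the connection ids that share one actor subscription. This is a simplified, synchronous Python model of the responsibilities listed above, not the library's C# implementation; `subscribe_to_actor`, `unsubscribe_from_actor` and `send_to_connections` are hypothetical callables standing in for creating an actor proxy and calling `SubscribeAsync`/`UnsubscribeAsync`, and for sending through `IHubContext`.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

EventCallback = Callable[[object], None]


class TopicClient:
    """Simplified model of one SignalR instance's TopicClient.

    Hypothetical collaborators standing in for Service Fabric / SignalR calls:
      subscribe_to_actor(topic_id, callback)   create a proxy and subscribe to its events
      unsubscribe_from_actor(topic_id)         unsubscribe and drop the proxy
      send_to_connections(connection_ids, msg) forward a message via the hub context
    """

    def __init__(
        self,
        subscribe_to_actor: Callable[[str, EventCallback], None],
        unsubscribe_from_actor: Callable[[str], None],
        send_to_connections: Callable[[Iterable[str], object], None],
    ) -> None:
        self._subscribe_to_actor = subscribe_to_actor
        self._unsubscribe_from_actor = unsubscribe_from_actor
        self._send = send_to_connections
        # topic id -> connection ids on this instance subscribed to that topic
        self._connections: Dict[str, Set[str]] = {}

    def subscribe(self, connection_id: str, topic_id: str) -> None:
        if topic_id not in self._connections:
            # First local subscriber: create the single shared actor subscription.
            self._connections[topic_id] = set()
            self._subscribe_to_actor(topic_id, lambda msg: self._on_event(topic_id, msg))
        self._connections[topic_id].add(connection_id)

    def unsubscribe(self, connection_id: str, topic_id: str) -> None:
        ids = self._connections.get(topic_id)
        if ids is None:
            return
        ids.discard(connection_id)
        if not ids:
            # Last local subscriber gone: release the actor subscription.
            del self._connections[topic_id]
            self._unsubscribe_from_actor(topic_id)

    def disconnect(self, connection_id: str) -> None:
        """Remove a dropped hub connection from every topic it was subscribed to."""
        for topic_id in list(self._connections):
            self.unsubscribe(connection_id, topic_id)

    def _on_event(self, topic_id: str, message: object) -> None:
        ids = self._connections.get(topic_id)
        if ids:
            self._send(sorted(ids), message)


# Fakes that record what the TopicClient asks for.
actor_subscriptions: Dict[str, EventCallback] = {}
sent: List[Tuple[List[str], object]] = []

client = TopicClient(
    subscribe_to_actor=lambda topic, cb: actor_subscriptions.__setitem__(topic, cb),
    unsubscribe_from_actor=lambda topic: actor_subscriptions.pop(topic),
    send_to_connections=lambda ids, msg: sent.append((list(ids), msg)),
)

client.subscribe("conn-a", "room-1")
client.subscribe("conn-b", "room-1")  # reuses the existing actor subscription
print(len(actor_subscriptions))       # 1

actor_subscriptions["room-1"]("hi")   # simulate the Topic Actor raising an event
print(sent)                           # [(['conn-a', 'conn-b'], 'hi')]

client.disconnect("conn-a")
client.disconnect("conn-b")
print(actor_subscriptions)            # {}
```

Keying the map by topic is what gives the shared-proxy behaviour: the actor subscription is created with the first local subscriber and released with the last.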

Topic Hub

Because this solution is designed for pub/sub scenarios, we'll use a TopicHub base class to provide the common Subscribe, Unsubscribe and OnMessage functionality for a particular type of subscription.

All that's then required is to inherit from this base class and specify the types used for each subscription.
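A minimal sketch of that base-class pattern follows, assuming a hub that is generic over its event type. It is a Python model, not the library's actual C# `TopicHub` signature: the `broker` object (anything with `subscribe`/`unsubscribe`, such as a TopicClient-like object), the `send` callable, `ChatMessage` and `ChatHub` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Tuple, Type, TypeVar

TEvent = TypeVar("TEvent")


class TopicHub(Generic[TEvent]):
    """Hypothetical base hub holding the shared Subscribe/Unsubscribe/OnMessage logic."""

    event_type: Type  # each derived hub sets the event type it carries

    def __init__(self, broker, send: Callable[[str, TEvent], None]) -> None:
        # `broker`: any object with subscribe(conn, topic) / unsubscribe(conn, topic).
        # `send`: stands in for pushing one event to one client connection.
        self._broker = broker
        self._send = send

    def subscribe(self, connection_id: str, topic_id: str) -> None:
        self._broker.subscribe(connection_id, topic_id)

    def unsubscribe(self, connection_id: str, topic_id: str) -> None:
        self._broker.unsubscribe(connection_id, topic_id)

    def on_message(self, connection_id: str, event: TEvent) -> None:
        # Reject events of the wrong type before they reach a client.
        if not isinstance(event, self.event_type):
            raise TypeError(f"{type(self).__name__} only carries {self.event_type.__name__}")
        self._send(connection_id, event)


@dataclass
class ChatMessage:
    sender: str
    text: str


class ChatHub(TopicHub[ChatMessage]):
    # The derived hub only specifies its types; all behaviour is inherited.
    event_type = ChatMessage


class RecordingBroker:
    """Fake broker that records the calls the hub makes."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str]] = []

    def subscribe(self, conn: str, topic: str) -> None:
        self.calls.append(("sub", conn, topic))

    def unsubscribe(self, conn: str, topic: str) -> None:
        self.calls.append(("unsub", conn, topic))


broker = RecordingBroker()
delivered: List[Tuple[str, ChatMessage]] = []
hub = ChatHub(broker, lambda conn, msg: delivered.append((conn, msg)))

hub.subscribe("conn-a", "room-1")
hub.on_message("conn-a", ChatMessage("alice", "hi"))
hub.unsubscribe("conn-a", "room-1")
print(broker.calls)  # [('sub', 'conn-a', 'room-1'), ('unsub', 'conn-a', 'room-1')]
```

Adding another subscription type would then be a two-line class, which is the main benefit of pushing the plumbing into the base class.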

Full source, documentation and a working example can be found in this GitHub repository. There's also a NuGet package.

Trade Offs

It's worth noting that the documentation describes Actor Events as only "best effort". I presume this is due to the transient nature of Actors, which can move between nodes during upgrades and fail-overs, but I couldn't find any more detail on that distinction. As this solution was designed to support some snazzy real-time UI features (nothing business-critical), the risk is acceptable.

Any Actor proxies held by a TopicClient are lost if our SignalR service stops. However, if this happens, the clients' hub connections are also dropped, so they can reconnect to another service instance and re-subscribe, at which point that instance subscribes to the appropriate Actors.
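This recovery only works if the client remembers its topics and subscribes again after reconnecting, since the new instance knows nothing about the old connection. The Python model below sketches that; `ResubscribingClient` and the `connect` callable (standing in for opening a hub connection to whichever instance the load balancer picks) are hypothetical. In a browser, the equivalent hook would be something like the SignalR JavaScript client's `onreconnected` callback.

```python
from typing import Callable, List, Set

Subscribe = Callable[[str], None]


class ResubscribingClient:
    """Hypothetical client-side helper that replays its subscriptions after a reconnect."""

    def __init__(self, connect: Callable[[], Subscribe]) -> None:
        # `connect` opens a hub connection and returns that connection's subscribe function.
        self._connect = connect
        self._topics: Set[str] = set()
        self._subscribe = connect()

    def subscribe(self, topic: str) -> None:
        self._topics.add(topic)
        self._subscribe(topic)

    def on_reconnected(self) -> None:
        # Server-side subscriptions died with the old instance, so replay them all.
        self._subscribe = self._connect()
        for topic in sorted(self._topics):
            self._subscribe(topic)


log: List[str] = []
instances = iter(["signalr-0", "signalr-1"])


def connect() -> Subscribe:
    instance = next(instances)  # each connection may land on a different instance
    return lambda topic: log.append(f"{instance}:{topic}")


client = ResubscribingClient(connect)
client.subscribe("room-1")
client.subscribe("room-2")
client.on_reconnected()  # signalr-0 went away; the new connection lands on signalr-1
print(log)  # ['signalr-0:room-1', 'signalr-0:room-2', 'signalr-1:room-1', 'signalr-1:room-2']
```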

Services that publish events are not aware of connected clients, and will publish to the Topic Actor whether or not it has any subscribers. This service remoting overhead is acceptable, because from that point on the Actor event only propagates to its Actor Event subscribers.

Summary 🏁

SignalR allows us to create websocket connections between client and server. Within a Service Fabric application, a service hosting SignalR can have multiple instances, but each client is connected to only one of them.

Backplanes offer a quick and easy solution to replicate events across all service instances, however they can become a bottleneck when an application generates high-frequency, user-specific events.

Using Actor Events in combination with a TopicClient broker, events can be delivered directly to only the SignalR service instances that need them.

This solution is scalable and flexible enough to support many different event types that an application may generate.  It also leverages the existing Service Fabric cluster, rather than an external service with potential extra cost.

Full source, documentation and a demo Service Fabric app can be found here.
