Monday, October 16, 2017

A Simple Explanation of the KRACK WPA2 Security Vulnerability

What Has Happened




Security researchers have discovered a weakness in the Wi-Fi Protected Access 2 (WPA2) protocol that is used in all modern Wi-Fi networks.  A malicious attacker in range of a potential unpatched victim can exploit this weakness to read information that was previously assumed to be safely encrypted.  The vulnerability is within the Wi-Fi IEEE 802.11 standard itself, and is therefore not unique to any particular access point or client device vendor.  It is generally assumed that any Wi-Fi enabled device is potentially vulnerable to this particular issue.

A Summary of How WPA2 Security Works

WPA2-AES security consists of both authentication and encryption.   The authentication step is used to determine whether a particular client is allowed to access the wireless network, and comes in two flavors, Personal and Enterprise.   In WPA2-AES Personal, a pre-shared key or passphrase is used to provide the essential identifying credential.  In WPA2-AES Enterprise, the Extensible Authentication Protocol (EAP) is used to validate the client credentials against an external RADIUS or Active Directory server.  In either the WPA2-AES Personal or WPA2-AES Enterprise scenario, once the client’s credentials are validated, a unique set of encryption keys is established between that particular access point and that particular client device, so as to encrypt the traffic between them.  These keys are established via a four-way handshake, where nonces (i.e. one-time random values) are exchanged between the access point and the client device so that each can derive the same temporal (i.e. temporary) encryption keys used for that connection.

A Summary of the Vulnerability

The security researchers discovered that they can manipulate and replay the third message in the four-way handshake to perform a key reinstallation attack (KRACK).  Strictly speaking, each temporal key should only be installed once, so that no packet number is ever re-used with the same key.  However, in a key reinstallation attack, the attacker pretends to be a valid access point and tricks the client device into reinstalling a temporal key that is already in use, which resets the transmit and receive packet numbers.  For WPA2-AES, this re-use of packet numbers under the same key allows the attacker to decode upstream traffic from the client device to the access point, without needing to learn the key itself.  For the older (and less secure) WPA-TKIP, the attacker can go even further, and potentially forge and inject new packets into the data stream.

To take advantage of this vulnerability, a malicious actor must conduct a man-in-the-middle attack (i.e. pretend to be an AP on your network and act as a relay between the client device and the legitimate wireless network).
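To see why resetting the packet numbers is so damaging, here is a minimal Python sketch.  It is a toy illustration only, not the actual WPA2 cipher: the keystream function is a SHA-256 based stand-in for AES counter mode, and the key, packet contents, and function names are all hypothetical.  It only shows the general principle that two frames encrypted with the same key and the same packet number can be combined to reveal their contents, without the attacker ever knowing the key.

```python
import hashlib


def toy_keystream(key: bytes, packet_number: int, length: int) -> bytes:
    """Derive a keystream from (key, packet number).

    Toy stand-in for a counter-mode cipher; NOT the real WPA2-AES algorithm.
    """
    stream = b""
    block_index = 0
    while len(stream) < length:
        material = key + packet_number.to_bytes(6, "big") + block_index.to_bytes(4, "big")
        stream += hashlib.sha256(material).digest()
        block_index += 1
    return stream[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter of the two."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, packet_number: int, plaintext: bytes) -> bytes:
    """Encrypt by XORing the plaintext with the keystream for this packet number."""
    return xor_bytes(plaintext, toy_keystream(key, packet_number, len(plaintext)))


# Hypothetical values for illustration only.
session_key = b"hypothetical-temporal-key"      # never seen by the attacker
known_plaintext = b"GET /index.html HTTP/1.1"   # content the attacker can guess
secret_plaintext = b"user=alice&pin=1234"       # content the attacker wants

# Frame sent with packet number 1 before the key is reinstalled.
frame_before = encrypt(session_key, 1, known_plaintext)
# After reinstallation the packet number restarts, so 1 is used a second time.
frame_after = encrypt(session_key, 1, secret_plaintext)

# The attacker only needs the two captured frames and the guessed plaintext.
recovered = xor_bytes(xor_bytes(frame_before, frame_after), known_plaintext)
print(recovered)
```

When every frame uses a fresh packet number, the keystreams differ and this trick fails, which is why a patched client that refuses to reinstall an in-use key is protected.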


How this Vulnerability Impacts Access Point Products and Networks

As the issue occurs on client devices, the first step for any network operator is to check with the client device manufacturers for security patches and updates, and to apply these updates as soon as they are available.

This particular vulnerability has no direct impact on any APs operating in “access point” mode.  However, access points that are being used as client devices (i.e. APs operating in “client bridge” mode) or any access points that are being used for point-to-multipoint communications (i.e. APs operating in “WDS bridge” or “WDS station” mode) are potentially impacted by this vulnerability in the IEEE 802.11 protocol.  Furthermore, some advanced applications and features, such as mesh networking and fast roaming (i.e. 802.11r), may also be potentially vulnerable to this issue.

Access point vendors are currently investigating the impact of this vulnerability across their product portfolios, and will be issuing firmware releases in the coming days and weeks to address this issue.  In the interim, continue to use WPA2-AES Personal or WPA2-AES Enterprise for network security.  Do not use WEP and do not use WPA-TKIP, as the vulnerabilities of those deprecated security protocols are significantly more serious and easier for a malicious attacker to exploit.

For More Information

The website https://www.krackattacks.com/ provides a detailed summary of the issue along with links to the research paper and tools detailing the vulnerability.

Friday, September 8, 2017

Oversubscription Ratios and the Types of Bandwidth Throttling

When we build networks, we need to allocate the available bandwidth amongst the client device population in, hopefully, a reasonably fair and equitable manner such that all users are happy (or at least not complaining).    We use bandwidth throttling for this purpose.  

Without bandwidth throttling, one or two abusive users could use applications like BitTorrent and consume the overwhelming majority of the available Internet bandwidth, leaving very little bandwidth for all of the remaining users on your network.  

Bandwidth Throttling 

Generally, there are two types of bandwidth throttling available on network equipment:

  • Per-User Bandwidth Throttling:   This limits the maximum amount of Internet bandwidth that each client device can consume.
  • Per-Subnet/VLAN Bandwidth Throttling:  This limits the aggregate maximum amount of Internet bandwidth that all client devices on the subnet / VLAN can consume at one time.
By means of a demonstrative example, let’s assume we have a subnet / VLAN with 5 client devices connected.  If the bandwidth throttling is 10 Mbps / 10 Mbps per user, then each user could potentially consume 10 Mbps / 10 Mbps simultaneously, making the total potential consumption the sum, or 50 Mbps / 50 Mbps.   Alternatively, if the bandwidth throttling is 10 Mbps / 10 Mbps per subnet / VLAN, then all users on that subnet / VLAN have to share a 10 Mbps / 10 Mbps bandwidth allocation, meaning each user would get 2 Mbps / 2 Mbps on average, and this average would decrease as more users connect to that VLAN / subnet.

In general, per-user bandwidth throttling is what you want in most practical circumstances.   Obviously, if there are too many users and/or the allocated bandwidth per user is set too high, you eventually run out of Internet bandwidth.

So how do you decide what limits are appropriate?   It ultimately depends on the type of network you are operating (i.e. its requirements) and the total amount of Internet bandwidth you have available (i.e. its constraints).  However, this can be treated quantitatively by using an oversubscription ratio.

Oversubscription Ratio

Oversubscription is a concept that dates back to very early telephony.  Statistically, not all connected users will actually consume their maximum available bandwidth at any particular instant of time.  For example, if we have 5 users and each of them has a per-user bandwidth cap of 10 Mbps / 10 Mbps, it is statistically unlikely that any of them, let alone all of them, will actually be consuming 10 Mbps / 10 Mbps simultaneously.   Most network applications are bursty in nature, meaning that your actual consumption is constantly fluctuating and rarely hitting the maximum allocation.  (Video streaming is, naturally, an important exception to this, as that consumes bandwidth at a fairly constant rate for an extended period of time.  That said, even today only a fraction of devices that are connected to your network are likely to be streaming video at a given instant.)   

Thus, as a service provider, I do not need to supply the additive sum in terms of bandwidth (i.e. # users * promised bandwidth per user), but rather some fraction thereof.  That fraction defines the oversubscription ratio.   

Unfortunately, setting the oversubscription ratio is an empirical exercise, and over time, the oversubscription ratio tends to decrease as more devices, each consuming more bandwidth, connect to your networks.  In the pre-smartphone days, a 30:1 or even 40:1 ratio was common for most wired / wireless networks.  

The common oversubscription ratio I use for regular network usage (e.g. hotel, apartment building, etc.) is 20:1.  This would mean that if I promised 200 users each a 10 Mbps / 10 Mbps data rate (and throttled them each to that rate), I could get away with only providing a 100 Mbps Internet bandwidth connection.  The math is as follows:  200 users * 10 Mbps/user * 1/20 = 100 Mbps.  At any instant in time, the average consumption would be 10 Mbps/user / 20 = 500 kbps per user. In reality, some are obviously consuming more, while others will be consuming less (even 0).  
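The arithmetic above can be sketched in a few lines of Python.  This is only an illustration of the calculation described in this post; the function names are my own and are not taken from any particular product or tool.

```python
def required_circuit_mbps(users: int, per_user_cap_mbps: float,
                          oversubscription_ratio: float) -> float:
    """Internet bandwidth needed for a given per-user cap and oversubscription ratio."""
    return users * per_user_cap_mbps / oversubscription_ratio


def average_consumption_kbps(per_user_cap_mbps: float,
                             oversubscription_ratio: float) -> float:
    """Average per-user consumption implied by the oversubscription ratio."""
    return per_user_cap_mbps * 1000 / oversubscription_ratio


# The example from the text: 200 users, 10 Mbps per user, 20:1 oversubscription.
circuit = required_circuit_mbps(users=200, per_user_cap_mbps=10, oversubscription_ratio=20)
average = average_consumption_kbps(per_user_cap_mbps=10, oversubscription_ratio=20)
print(f"Required circuit: {circuit} Mbps, average per user: {average} kbps")
```

Re-running the same function with a 10:1 ratio shows how quickly the required circuit grows for heavier-usage environments.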

For student housing, which has fairly heavy network utilization, I typically use a 10:1 oversubscription ratio.   For larger high density environments (e.g. conference centers, event spaces, etc.), you will have a few devices that are doing video streaming, but most attendees will be connected without heavily utilizing their devices.  I therefore typically use a 15:1 oversubscription ratio.


Determining Appropriate Bandwidth Throttling Values

In reality, one is generally constrained by the total amount of bandwidth available, as that is the most expensive part of your network.  Thus, the real calculation is to determine the appropriate bandwidth throttling per user that should be used.  To determine this, one needs to know the peak number of expected users and the bandwidth available.   

As an example, let's assume an event space where we are expecting 500 users and have a 300 Mbps / 300 Mbps Internet circuit available.   Using the 15:1 oversubscription ratio, for 500 users this comes out to a sustainable per-user bandwidth throttle of 9 Mbps (i.e. 300 Mbps / 500 users * 15:1 oversubscription = 9 Mbps / user).   
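The same calculation, run in the other direction, can be sketched in Python as follows.  The function name is hypothetical; the ratios are the ones mentioned in this post for student housing (10:1), event spaces (15:1), and regular usage (20:1).

```python
def per_user_throttle_mbps(circuit_mbps: float, peak_users: int,
                           oversubscription_ratio: float) -> float:
    """Per-user bandwidth cap that a circuit can sustain at a given oversubscription ratio."""
    if peak_users <= 0:
        raise ValueError("peak_users must be positive")
    return circuit_mbps * oversubscription_ratio / peak_users


# The example from the text: 300 Mbps circuit, 500 peak users.
for ratio in (10, 15, 20):
    cap = per_user_throttle_mbps(circuit_mbps=300, peak_users=500,
                                 oversubscription_ratio=ratio)
    print(f"{ratio}:1 oversubscription -> {cap} Mbps per user")
```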

 Of course, in reality complex networks are an Animal Farm (i.e. while all client devices are equal, some client devices are “more equal” than others).  Thus, different classes of users will require different levels of service.



In most commercial environments, it is vitally important that operations staff have sufficient bandwidth available, even though their devices usually represent a small fraction of the total number of clients. This is one very good use of having multiple VLANs / subnets, as you can put your different classes of users on to different VLANs, and then allocate bandwidth both per VLAN and per user accordingly.  Where operations activity is critical, we need to provide this small but more important operations segment of the client device population a higher per-user bandwidth allocation, and give the (proletariat) visitors a lower per-user bandwidth allocation.   

It is also useful to have two layers of bandwidth throttling.  The first layer is bandwidth throttling per VLAN / subnet.  For example, limit the guest network to 80% of the total bandwidth, ensuring that the staff / operations network(s) will always have access to at least 20% of the Internet bandwidth, no matter how crowded the guest network becomes.  The second layer is bandwidth throttling per user, to ensure that no abusive user on any VLAN / subnet can take up all of the bandwidth allocated to that VLAN / subnet.
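A minimal Python sketch of this two-layer approach is shown below.  The 80% guest share comes from the example above; the circuit size, user counts, and the 5:1 staff ratio are hypothetical values chosen purely for illustration, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class UserClass:
    name: str
    vlan_bandwidth_mbps: float      # layer 1: bandwidth set aside for this VLAN / subnet
    peak_users: int
    oversubscription_ratio: float

    def per_user_cap_mbps(self) -> float:
        """Layer 2: per-user cap, never larger than the VLAN allocation itself."""
        cap = self.vlan_bandwidth_mbps * self.oversubscription_ratio / self.peak_users
        return min(cap, self.vlan_bandwidth_mbps)


def plan_two_layers(circuit_mbps: float, guest_percent: float,
                    guest_users: int, staff_users: int,
                    guest_ratio: float, staff_ratio: float) -> list[UserClass]:
    """Cap the guest VLAN at a share of the circuit; the remainder is always left for staff."""
    guest_cap = circuit_mbps * guest_percent / 100
    staff_guaranteed = circuit_mbps - guest_cap
    return [
        UserClass("guest", guest_cap, guest_users, guest_ratio),
        UserClass("staff", staff_guaranteed, staff_users, staff_ratio),
    ]


# Hypothetical site: 300 Mbps circuit, guests limited to 80% of it.
plan = plan_two_layers(circuit_mbps=300, guest_percent=80,
                       guest_users=480, staff_users=20,
                       guest_ratio=15, staff_ratio=5)
for user_class in plan:
    print(user_class.name, user_class.vlan_bandwidth_mbps, user_class.per_user_cap_mbps())
```

With these assumed numbers, the smaller staff population ends up with a higher per-user cap than the guests, which is the intended outcome.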

Tuesday, August 15, 2017

The Emergence of Tri-Band APs

In a former blog post, I discussed the limitations of MU-MIMO and hinted at the pending emergence of a competing technology for high-density deployments, called "tri-band".   In this post, I'll be again comparing the technologies and encouraging the use of tri-band APs for high-density deployments.

What is Tri-Band?

Strictly speaking, this AP technology should be called "tri-radio", as two of the radios in the access point are on the 5 GHz band.  Essentially, a tri-band access point is just two co-located 5 GHz 802.11ac wave 1 APs in one box.   The tri-band AP has one 2x2:2 stream 2.4 GHz radio (IEEE 802.11b/g/n) and two 2x2:2 stream 5 GHz radios (IEEE 802.11a/n/ac), with one 5 GHz radio locked on the low portion of the band (channels 36-64) and the other 5 GHz radio locked on the high end of the band (channels 100-165).   

Tri-band is technically not part of the IEEE 802.11 standard.  Broadcom developed the original chipset for this in mid 2015, and Qualcomm Atheros recently introduced their own version.  Tri-band APs are intended for high density environments (e.g. lecture halls, conference centers, auditoriums, concert halls, etc.) and thus compete directly with IEEE 802.11ac wave 2 with MU-MIMO.  The tri-band approach, however, has several advantages over MU-MIMO.   Nonetheless, there are still very few vendors who have introduced tri-band access points to the market, despite their obvious advantages.   

Why is Tri-Band Better than MU-MIMO?

MU-MIMO requires the use of beam forming, which is a technique used to create particular zones of constructive and destructive interference at particular locations.  By maximizing the signal for each client device at that client device's location (and minimizing the signal for the other clients at each client’s location), a MU-MIMO AP can talk downstream to multiple client devices simultaneously.   

Multi-User Multiple-Input Multiple-Output (MU-MIMO)

The MU-MIMO technique requires the AP to know the position of the client devices (relative to itself).  The AP gathers that information by periodically transmitting “sounding frames”, essentially tones off of each AP antenna.  Compatible client devices will respond by sending a matrix indicating how well the client device heard the tone from each antenna.  Based on that matrix, the AP can calculate the relative position of the client device.    

MU-MIMO has the following limitations, which do not exist with the tri-band approach:

  1. Increased overhead:  The sounding frames and their responses consume airtime.  While this is less than the presumptive gains of talking to multiple client devices simultaneously, it is still a loss.  Most MU-MIMO access points only get a 1.7x - 2.2x increase in speed when talking downstream to three compatible client devices. 
  2. Client device compatibility:  The client devices need to be compatible with MU-MIMO in order to understand the sounding frames and to send the appropriate response.  As of August 2017, there are still surprisingly few MU-MIMO compatible client devices on the market.  There are some USB dongles available for PCs.  The flagship mobile client device for MU-MIMO had been the Samsung Galaxy Note 7, which failed in the market for unrelated incendiary reasons.  The Apple iPhone 7, while originally rumored to support it before its launch, quietly did not support MU-MIMO.  Given Apple's notorious secrecy, we still don't know whether or not the upcoming Apple iPhone 8 will support MU-MIMO.
  3. Client separation:  MU-MIMO requires that the client devices it talks to simultaneously must be physically separated from each other.  If the client devices are in too-close proximity, the beam forming won’t be able to successfully maximize the signal at one client and minimize the signal of the other (neighboring) clients.  
  4. Downstream only:  MU-MIMO only works for downstream traffic, from the AP to the client device(s).  Upstream traffic from each client device to the AP must still happen one at a time, otherwise the AP will hear multiple client devices at once and won’t be able to distinguish between them.

In comparison, all 5 GHz Wi-Fi clients can communicate with the tri-band AP as they would with any other conventional access point, so it is backwards compatible with all current and future Wi-Fi client devices.  There is also no additional overhead on the channel, as sounding frames are not required, and the positions of the two 5 GHz client devices, both relative to the AP and to each other, do not matter.  Additionally, since the 5 GHz clients and the channels are independent, the traffic to each client can occur simultaneously in both directions.  The AP itself uses an internal mechanism called “client steering” to encourage 5 GHz clients to connect to one or the other 5 GHz radio, so as to balance the load across the two 5 GHz radios. 

The Takeaway Message

A current four-stream MU-MIMO access point can talk simultaneously to 2-3 compatible client devices on the 5 GHz band downstream, sometimes.   A two-stream tri-band access point, by comparison, can talk simultaneously to any two client devices on the 5 GHz band both downstream and upstream, all the time.

Monday, July 31, 2017

Appropriate RSSI for WDS Bridge Links

A customer recently approached me with a question on what the minimum RSSI should be for a WDS bridge link between two locations in order to maximize data throughput.   (If you don't know what WDS bridging is, see my blog entry on Wireless Backhaul Best Practices.)

The short answer:  -40 dBm to -50 dBm for optimal performance.   This RSSI is readily achievable over distances of up to approximately 2500 ft when using high-gain directional antennas on each end.


This RSSI target is to ensure that the signal to noise ratio (SNR) can be safely above 37 dB.  This means that the highest MCS rate (MCS9) can be achieved and maintained with an 80 MHz channel, and the throughput of the link maximized.  (See Andrew von Nagy's Wi-Fi SNR to MCS Chart for further reference.)

For 802.11ac 2x2:2 APs, measured data throughputs of about 350-400 Mbps can be achieved with 80 MHz channels, when the RSSI of each link is consistently in the -40 dBm to -50 dBm range.  

If the RSSI is too strong (i.e. > -35 dBm), the electronic amplifiers in the AP start to get saturated and data throughput will actually decrease.  This generally only occurs in very short distance shots (< 50 feet), and in those instances we recommend turning down transmit power to minimum and, if still necessary, purposely misaligning the antennas.  

If the RSSI is too weak (i.e. < -70 dBm) the link speed will be very slow (low SNR leading to low MCS rates and slow speeds).  In the presence of any interference, the wireless backhaul link itself can often become unstable.

A Simplified Explanation of the Physics

A signal leaving a transmitter experiences free space path loss (FSPL), which is a geometric spread of the RF energy as it travels away from a transmitter; the further away the receiver is from the transmitter, the more the energy from the transmitter has spread out, so the less energy is seen at the receiver. The FSPL goes as the square of the distance between the transmitter and receiver. Increasing the gain of the antennas at either (or both) ends of the wireless link helps to focus the transmission energy and/or the receive sensitivity in a particular direction, allowing the distance between the transmitter and receiver to increase.  

Hence, for point-to-point wireless backhaul applications, you always want to use high gain directional antennas on both ends of the link (such as the integrated 19 dBi antenna on an EnGenius EnStationAC) to maximize the practical link distance and speed.

A Simplified Explanation of the Math

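If you want to estimate the RSSI of a link, you can use the following formula:

$$ RSSI = P_{Tx} + G_{Tx} + G_{Rx} - 20 \log_{10}\left( \frac{4 \pi d f}{c} \right) $$

or rearranged to compute link distance:

$$ d = \frac{c}{4 \pi f} \cdot 10^{\left( P_{Tx} + G_{Tx} + G_{Rx} - RSSI \right) / 20} $$

where: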
  • RSSI = Received signal strength indicator [dBm]
  • d = distance of wireless link [m]
  • f =  operating frequency of wireless link [Hz]
  • c = speed of light (i.e. approximately 300 million m/s) [m/s]
  • PTx = transmit power of transmit radio [dBm]
  • GTx = gain of transmit antenna, less cable losses [dBi]
  • GRx = gain of receive antenna, less cable losses [dBi]
Based on these formulas, a point-to-point link utilizing EnGenius EnStationAC access points could achieve a -50 dBm RSSI on each end for a WDS bridge wireless link length of approximately 2500 feet.
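As a rough check of this estimate, the link budget can be sketched in Python using the standard free-space (Friis) relationship between the quantities defined above.  The 19 dBi antenna gain is the EnStationAC figure quoted earlier in this post; the 17 dBm transmit power and the 5.5 GHz operating frequency are assumptions made purely for illustration, and the function names are my own.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
METERS_PER_FOOT = 0.3048


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free space path loss in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S)


def estimate_rssi_dbm(distance_m: float, freq_hz: float, tx_power_dbm: float,
                      tx_gain_dbi: float, rx_gain_dbi: float) -> float:
    """RSSI = transmit power + antenna gains - free space path loss."""
    return tx_power_dbm + tx_gain_dbi + rx_gain_dbi - fspl_db(distance_m, freq_hz)


def link_distance_m(target_rssi_dbm: float, freq_hz: float, tx_power_dbm: float,
                    tx_gain_dbi: float, rx_gain_dbi: float) -> float:
    """The same relationship, rearranged to give the distance for a target RSSI."""
    budget_db = tx_power_dbm + tx_gain_dbi + rx_gain_dbi - target_rssi_dbm
    return SPEED_OF_LIGHT_M_S / (4.0 * math.pi * freq_hz) * 10.0 ** (budget_db / 20.0)


# Assumed values: 17 dBm transmit power and 5.5 GHz are illustrative only.
distance_m = link_distance_m(target_rssi_dbm=-50.0, freq_hz=5.5e9,
                             tx_power_dbm=17.0, tx_gain_dbi=19.0, rx_gain_dbi=19.0)
distance_ft = distance_m / METERS_PER_FOOT
print(f"Distance for -50 dBm: {distance_m:.0f} m ({distance_ft:.0f} ft)")
```

Under these assumptions the result lands in the neighborhood of the 2500 foot figure.  Real links will see additional losses (cabling, misalignment, obstructions, interference), so treat the output as an optimistic upper bound.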

Monday, April 10, 2017

Wireless Backhaul Best Practices

This blog post provides guidelines on best practices for configuring and deploying wireless backhaul on Wi-Fi networks, and goes through the differences between and appropriate scenarios for client bridges, repeaters, WDS Bridge links, and mesh networks.

The Options for Wi-Fi Backhaul

In a conventional wireless network, each access point (AP) requires a wired Ethernet connection to provide backhaul to the wired network infrastructure and ultimately the Internet.  In some environments, however, it is either impossible or prohibitively expensive to run an Ethernet cable to each AP.  In such cases, Wi-Fi itself can be used to provide wireless backhaul from the AP (or other network appliance, such as a remote IP camera) to the wired network.  Each Wi-Fi backhaul link is referred to as a hop, and it is possible to have a chain of multiple hops between the remote wireless AP to the root wireless AP that has a wired connection to the network.

There are multiple options for providing Wi-Fi backhaul to the remote APs.  Naturally, each option has both benefits and limitations.  Most critically, each wireless hop introduces latency, which adds in a linear fashion with the number of hops.  Repeaters and mesh also inherently lower throughput and user capacity, often geometrically with the number of hops. 

It is critical to understand your technical requirements and constraints, as well as the benefits and limitations of each wireless backhaul option, when designing a Wi-Fi network and selecting a particular Wi-Fi backhaul approach.

Option 1:  Client Bridge

An access point operating in Client Bridge mode provides Wi-Fi connectivity for a wired client device.  A Client Bridge is intended to connect an individual wired client device to a Wi-Fi network.   This is depicted in Figure 1.


Figure 1:  Example of a network utilizing a client bridge.

When multiple wired client devices are connected through a single Client Bridge, they share the same MAC address on the network, namely the WLAN MAC address of the Client Bridge itself.  The multiple wired client devices can still be configured with different Layer 3 static IP addresses, and each wired device may or may not be able to obtain an independent Layer 3 DHCP address, depending on the DHCP server.

Best Practice:  When using an AP in Client Bridge mode, only connect one wired client device.

For typical applications, Client Bridge mode is only utilized on single-band APs. For dual-band access points, one radio (typically 5 GHz) will be configured to operate in client bridge mode, while the other radio (typically 2.4 GHz) will be used for providing Wi-Fi connectivity on an independent SSID to wireless client devices.  Client Bridge mode is generally only available on standalone APs, meaning that each AP must be configured individually and cannot be managed or monitored from a centralized controller.  Client Bridge mode is available on all EnGenius® single-band Electron™ and EnStation™ access points, as well as dual-band APs in the Electron™ ECB series.

Option 2:  Repeaters

An access point operating in Repeater mode provides both Wi-Fi connectivity to client devices and a wireless backhaul connection to one or more wired APs.  This is depicted in Figure 2.  Repeaters are intended for very small networks (e.g. home environments), where individual repeater APs are used to fill in particular coverage gaps.  Individual client MAC addresses are preserved, though the VLAN (if any) is defined by the main access point’s SSID that is being repeated.

Figure 2:  Example of a network utilizing a wireless repeater.

For dual-band access points, one radio (typically 5 GHz) will be configured to operate in repeater mode, while the other radio (typically 2.4 GHz) will be used exclusively for providing Wi-Fi connectivity to client devices.  Note that both Wi-Fi bands depend upon the repeater radio for backhaul.  Since the repeater radio must spend half its time providing Wi-Fi connectivity to client devices and half its time providing wireless backhaul, the data capacity of a repeater radio for both backhaul and for Wi-Fi client connectivity is reduced by 50%.  When there are multiple hops, the data capacity is reduced by 50% at each hop.  Thus, for two hops, the total data capacity is only 1/4, for three hops it is 1/8, for four hops it is 1/16, and so forth. 

Repeater mode is generally only available on standalone APs, meaning that each AP must be configured individually and cannot be managed or monitored from a centralized controller.  Repeater mode is available on all EnGenius® Electron™ ECB series access points.
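The halving of capacity at each repeater hop can be sketched in Python as follows.  The function names are hypothetical, and the 400 Mbps starting capacity is simply an illustrative number, not a measured value for any particular repeater.

```python
from fractions import Fraction


def repeater_capacity_fraction(hops: int) -> Fraction:
    """Fraction of the original data capacity remaining after a number of repeater hops."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return Fraction(1, 2) ** hops


def effective_capacity_mbps(root_capacity_mbps: float, hops: int) -> float:
    """Capacity left at the far end of the chain, assuming a 50% loss at every hop."""
    return root_capacity_mbps * float(repeater_capacity_fraction(hops))


# Illustrative starting capacity of 400 Mbps at the wired AP.
for hops in range(1, 5):
    print(hops, repeater_capacity_fraction(hops), effective_capacity_mbps(400.0, hops))
```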

Option 3:  Point-to-(multi)point WDS Bridge Links 

A dedicated pair of APs, usually with integrated directional antennas (such as the EnGenius® EnStationAC), is configured to operate in WDS Bridge mode to create a point-to-point link to provide wireless backhaul.  The WDS Bridge link on the remote end is connected to the remote AP via its wired Ethernet interface. From the perspective of the rest of the network, this wireless connection looks like a wired connection; in WDS Bridge mode, the wired Ethernet frame is encapsulated and encrypted in a Wi-Fi packet on one end, transmitted across the wireless link, and then de-encapsulated and decrypted on the other end.  Thus, all wired Layer 2 information (i.e. client MAC addresses, VLANs, etc.) is preserved across the WDS Bridge link. Point-to-multipoint WDS Bridge links are also readily possible, though be aware that the remote links collectively share the total available airtime bandwidth of the link.  This is depicted in Figure 3.


Figure 3:  Examples of point-to-point and point-to-multipoint networks utilizing WDS Bridge links.

The WDS Bridge links are statically established, so that each WDS Bridge AP only accepts connections from pre-defined radios.  WDS Bridge usually requires dedicated hardware at each remote location operating on independent channels, though some APs allow for one radio (typically the 5 GHz) to be in WDS bridge mode and the other radio (typically the 2.4 GHz) to be in AP mode to provide Wi-Fi service to client devices.

Best Practice:  WDS Bridge with dedicated 5 GHz only access points is generally recommended for most networks requiring both wireless backhaul and high bandwidth and/or high user capacity Wi-Fi.   While each hop adds latency, there is no throughput or user capacity degradation, since the point-to-(multi)point backhaul link is solely dedicated to wireless backhaul, with Wi-Fi access for client devices being handled by separate access points.

For large networks consisting of multiple remote nodes, a WDS Bridge backhaul network requires its own design effort to ensure appropriate bandwidth capacity and channel utilization.  WDS Bridge mode is generally only available on standalone APs, meaning that each AP must be configured individually and cannot be managed or monitored from a centralized controller.  WDS Bridge mode is available on all EnGenius® Electron™ and EnStation™ access points.

Point-to-Multipoint WDS Bridge Network Example

Figure 4 shows an example of an outdoor Wi-Fi network at an RV park utilizing point-to-(multi)point links to provide wireless backhaul to APs mounted on light poles.  The colored lines indicate the point-to-(multi)point WDS Bridge links implemented with EnGenius® EnStationAC access points. 


Figure 4:  Example of a wireless network utilizing point-to-(multi)point links for backhaul to outdoor wireless APs.

Red markers indicate the location of outdoor dual-band APs, and yellow markers indicate the location of additional light poles that were available at the property.  To maximize wireless backhaul capacity, all of the WDS Bridge links utilized 80 MHz channels in the UNII-2 and UNII-2e bands (i.e. DFS channels 52-64, 100-112, and 116-128).  The 5 GHz radios on the dual-band APs were set to use 40 MHz channels on the UNII-1 and UNII-3 bands (i.e. channels 36-40, 44-48, 149-153, and 157-161), so as to avoid co-channel interference with the point-to-multipoint backhaul network.
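A channel plan like this one can be sanity-checked with a short Python sketch.  The channel blocks are the ones listed above; the helper names are hypothetical, and the check relies on the fact that 20 MHz channels in the 5 GHz band are numbered in steps of four.

```python
def channels_in_block(first: int, last: int) -> set[int]:
    """Expand a bonded block such as 52-64 into its 20 MHz channels (52, 56, 60, 64)."""
    return set(range(first, last + 1, 4))


def expand(blocks: list[tuple[int, int]]) -> set[int]:
    """Collect every 20 MHz channel used by a list of bonded blocks."""
    channels: set[int] = set()
    for first, last in blocks:
        channels |= channels_in_block(first, last)
    return channels


# 80 MHz backhaul blocks (UNII-2 / UNII-2e) and 40 MHz access blocks (UNII-1 / UNII-3).
BACKHAUL_BLOCKS_80MHZ = [(52, 64), (100, 112), (116, 128)]
ACCESS_BLOCKS_40MHZ = [(36, 40), (44, 48), (149, 153), (157, 161)]

backhaul_channels = expand(BACKHAUL_BLOCKS_80MHZ)
access_channels = expand(ACCESS_BLOCKS_40MHZ)
overlap = backhaul_channels & access_channels

print("Backhaul channels:", sorted(backhaul_channels))
print("Access channels:", sorted(access_channels))
print("Shared channels:", sorted(overlap))
```

An empty set of shared channels confirms that the backhaul links and the client-facing 5 GHz radios never occupy the same spectrum.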

Option 4:  Mesh Networks

In a mesh network, the AP uses its own radio to provide a wireless backhaul to other APs on the network, eventually reaching an AP with a wired Ethernet connection to the wired backhaul infrastructure and the network.  In this sense, a mesh network is a network of repeaters, though mesh is designed to operate automatically and more intelligently on a large scale.  A mesh network creates a set of “dynamic WDS Bridge” links, using routing algorithms to automatically calculate the optimal wireless path through the network back to a wired root node.  This makes mesh networks relatively robust to the failure of an individual AP; in a process referred to as “self-healing”, the routing algorithms will automatically calculate the “next best” path through the network if an AP in the path goes offline.  Since the routing functions are done automatically within the mesh software, mesh networks are actually fairly straightforward to set up and are thus scalable to cover large geographic areas.  All wired Layer 2 information (i.e. client MAC addresses, VLANs, etc.) is preserved across the mesh link.  Examples of mesh networks are shown in Figure 5 (for home / SOHO environments) and Figure 6 (for larger campus-wide environments).


Figure 5:  An example of a home / SOHO mesh network, utilizing EnGenius® EMR3000 mesh routers.


Figure 6:  An example of a large campus mesh network, utilizing EnGenius® EWS1025CAM mesh cameras.

The mesh network control architecture can either be centralized or distributed.  With a centralized control architecture, an AP controller is required to calculate and coordinate the mesh parameters for each AP.  This architecture, however, limits the scalability of the mesh network to the capacity of the AP controller.   In a distributed control architecture, such as the EnGenius® Neutron™ series and EMR3000 product, each AP operationally acts like a router, continuously sharing information about its connection status with its neighbors, and each AP uses this information to compute its own optimal mesh path. In a distributed architecture, an AP controller can be optional, though it is generally extremely useful in providing centralized real-time monitoring of the mesh network, as well as establishing the core initial mesh network parameters, such as mesh ID, encryption, etc.

Unfortunately, mesh networks have significant limitations, most notably in the loss of throughput and user capacity, which scales geometrically as the number of wireless hops increases, as well as the increase in latency, which scales linearly as the number of wireless hops increases.  

Accordingly, mesh networks are not suitable for high bandwidth or latency-sensitive applications.  Because of these performance limitations, it is generally recommended that mesh networks be avoided unless no other viable backhaul options are available. Mesh networks should only be used in environments where providing Ethernet data wiring to access points or cameras is impossible or cost-prohibitive. 

Mesh networks were originally trendy in the mid-2000s, as a way of both providing metropolitan Wi-Fi coverage as well as coverage for large outdoor properties where wiring was prohibitively expensive, such as RV parks, garden-style apartment complexes, marinas, etc.  While many mesh networks were successfully deployed, most of these efforts ultimately failed, especially in metropolitan Wi-Fi.  Early mesh networks relied upon single-radio APs on 2.4 GHz using 802.11g.  When dual-band APs were introduced, only 802.11a was available on the 5 GHz band, which still led to very low throughputs as the number of hops increased.  

With the wide adoption of dual-band access points with 802.11ac, there has been renewed interest in mesh for both Wi-Fi access and surveillance applications.  Accordingly, several startup companies, as well as established vendors like EnGenius®, have introduced mesh Wi-Fi products utilizing 802.11ac.  While the data rates of 802.11ac are approximately 25 times larger than the 802.11a data rates of a decade ago, the number of client devices and their bandwidth demands have also grown exponentially during that time.  The fundamental limitations of mesh networks are therefore still the same, and thus mesh may ultimately again prove to be a passing fad.

Nonetheless, mesh networks are the only viable option in many cases.  The sections below highlight how to best design and deploy mesh networks, so as to maximize their performance and mitigate their inherent limitations.

Mesh Network Terminology and Best Practices

The access points in a mesh network are categorized as either root nodes or remote nodes:
  • Root Node (a.k.a. Gateway Node):  This is an access point with a wired connection to the wired switch infrastructure.  The remote nodes establish wireless backhaul connections to the root node.  Note that the wired connection utilized by a root node can either be (1) a direct Ethernet or fiber-optic connection to the wired switch infrastructure or (2) a wired connection to a separate WDS Bridge wireless point-to-(multi)point link on an independent channel.
  • Remote Node:  This is an access point without a wired Ethernet connection.  Backhaul to the network is established via a wireless connection to a root node or to other remote nodes.  Note that the remote AP still requires electrical power, so an Ethernet connection to a PoE injector is common, though the “network” end of the PoE injector may not be connected at all or may only be connected to a wired client device, such as an IP camera.

The path from a particular remote node back to a particular root node can require connections via multiple intermediate remote nodes, and each wireless link in this chain is referred to as a hop.  The mesh routing algorithm selects the optimal route through the network.  The optimization function used by the mesh APs is generally proprietary to each AP vendor, but typically attempts to balance several, often conflicting, parameters, such as the following:
  1. Minimize the number of hops, so as to minimize the total wireless latency and throughput penalty of the network
  2. Maximize the signal strength of each hop, so as to maximize the achievable Wi-Fi data rates between the mesh radios on each hop.    For maximum data rates in 802.11ac, the received signal strength indicator (RSSI) would ideally be in the -40 dBm to -50 dBm range, though this is usually unachievable in practice since omni-directional antennas are typically used to create the widest field of view to neighboring APs.  The RSSI should be above -65 dBm for decent data rate performance between hops.
  3. Balance the load on each AP, so as to account for the number of associated client devices and the total throughput consumption on each AP.  The throughput load stacks as the number of hops increases, so intermediate remote nodes that are heavily utilized with client traffic will not give as many resources to downstream remote nodes. 

Because of the competing tradeoffs in this optimization process, mesh networks can often result in counter-intuitive and/or sub-optimal topologies.
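
As a purely illustrative sketch of how such a tradeoff can play out, the three criteria above can be folded into a single cost per candidate path.  Actual optimization functions are vendor-proprietary, so the scoring formula, the weights, the -50 dBm target, and all names below are assumptions, not any vendor's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    rssi_dbm: float  # received signal strength on this hop
    load: float      # fraction of the upstream AP's airtime already in use (0.0 to 1.0)


def path_cost(hops, w_hops=1.0, w_rssi=0.1, w_load=2.0, target_rssi_dbm=-50.0):
    """Hypothetical path cost; lower is better. All weights are made-up values."""
    cost = w_hops * len(hops)  # criterion 1: fewer hops
    for hop in hops:
        # criterion 2: penalize every dB the hop falls below the target RSSI
        cost += w_rssi * max(0.0, target_rssi_dbm - hop.rssi_dbm)
        # criterion 3: penalize heavily loaded upstream APs
        cost += w_load * hop.load
    return cost


# Two candidate routes from the same remote node back to a root node.
direct = [Hop(rssi_dbm=-72.0, load=0.2)]                    # one weak hop
relayed = [Hop(rssi_dbm=-52.0, load=0.2), Hop(-55.0, 0.1)]  # two strong hops

best = min([direct, relayed], key=path_cost)
print("direct cost:", round(path_cost(direct), 2))
print("relayed cost:", round(path_cost(relayed), 2))
print("chosen route has", len(best), "hop(s)")
```

With these made-up weights, the two-hop route scores better than the direct one, which is the kind of counter-intuitive topology described above.  Changing the weights changes the winner.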

Best Practice:  The network design should cluster the APs into groups consisting of up to four remote nodes that are only one hop away from a root node.  Thus, at least 20% of your APs, distributed roughly evenly throughout the property, should be root nodes.  Each remote node is therefore nominally only one hop away from a root node.  In the event of a failure of a root node, the nearby remote nodes will then only be 2-3 hops away from another root node.  This approach generally requires creating additional root nodes, which can be done either by running Ethernet or fiber-optic cable to the particular remote locations, or by establishing dedicated point-to-(multi)point WDS Bridge links to create “wireless wires” from the root AP back to the wired network.
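
The arithmetic behind the 20% figure can be sketched as follows.  This is a minimal illustration that only counts nodes; the function name is hypothetical, and it does not account for physical placement or line of sight.

```python
import math


def min_root_nodes(total_aps, max_remotes_per_root=4):
    """Fewest root nodes such that no root serves more than
    max_remotes_per_root one-hop remote nodes."""
    # Each cluster holds one root plus up to max_remotes_per_root remotes.
    return math.ceil(total_aps / (max_remotes_per_root + 1))


for total in (10, 23, 40):
    roots = min_root_nodes(total)
    print(f"{total} APs: {roots} root nodes, {total - roots} remote nodes")
```

With clusters of one root and four remotes, one AP in five (20%) is a root node; AP counts that are not a multiple of five round the number of root nodes up.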

Best Practice:  Each root node should be set on a static independent channel, and each remote node should be set to “auto channel”.  This is done to maximize the airtime capacity of the overall network, so that multiple neighboring root nodes do not create self-interference.  The remote nodes are set to auto-channel so that they can fail over to a different root node in the event of the failure of their primary root node.  When utilizing point-to-(multi)point WDS Bridge links to establish root nodes, these must also be on static independent channels, and thus must be accounted for in the overall channelization plan. 

Both root nodes and remote nodes can generally operate in one of two modes:
  • Mesh AP Mode: In this mode, the wireless radio acts like a repeater, providing both Wi-Fi connectivity to client devices as well as providing a backhaul connection to one or more remote APs.  For single-band mesh access points, this is the only operational mode available.  For dual-band access points, one of the bands (typically 5 GHz) will be configured to operate in this mode.  The other band (typically 2.4 GHz) will be exclusively for providing Wi-Fi connectivity to client devices.  Note that both Wi-Fi bands depend upon the mesh radio for backhaul.  Since the mesh radio must spend half its time providing connectivity to client devices and half its time providing backhaul, the data capacity of the mesh radio for both backhaul and for Wi-Fi client connectivity is reduced by 50%.   When there are multiple hops, the data capacity is reduced by 50% per hop.  Thus, for two hops, the total data capacity is only 1/4, for three hops it is 1/8, for four hops it is 1/16, and so forth.  
  • Mesh Point Mode:  In this mode, available only in dual-band APs, the wireless mesh radio (typically 5 GHz) only provides wireless backhaul, and the other radio (typically 2.4 GHz) only provides Wi-Fi connectivity to client devices. Operationally, the mesh radio operates like a dynamic WDS bridge link, so while each hop still introduces latency which adds linearly, there is no 50% throughput penalty per hop, since the mesh radio is not also servicing client devices on the same radio and can be devoted exclusively to backhaul.  Since Wi-Fi access to client devices is restricted to only one radio (typically 2.4 GHz), the overall client capacity of the AP is that of a single-band AP.  Furthermore, even dual-band 802.11ac client devices will only be able to connect at 802.11n data rates on the 2.4 GHz radio.
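
The per-hop penalties of the two modes can be compared with a short calculation.  This is an idealized sketch: the function names and mode labels are hypothetical, Mesh Point mode is treated as having no per-hop throughput loss at all, and the per-hop latency value is a placeholder rather than a measured figure.

```python
def backhaul_capacity_fraction(hops, mode):
    """Fraction of the mesh radio's single-link capacity left after `hops` wireless hops."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    if mode == "mesh_ap":
        # Radio is shared between client access and backhaul: capacity halves per hop.
        return 0.5 ** hops
    if mode == "mesh_point":
        # Radio is dedicated to backhaul: idealized as no per-hop throughput penalty.
        return 1.0
    raise ValueError(f"unknown mode: {mode}")


def added_latency_ms(hops, per_hop_latency_ms=2.0):
    """Latency adds linearly per hop in both modes; 2 ms is an assumed placeholder."""
    return hops * per_hop_latency_ms


for hops in range(1, 5):
    print(
        hops,
        backhaul_capacity_fraction(hops, "mesh_ap"),
        backhaul_capacity_fraction(hops, "mesh_point"),
        added_latency_ms(hops),
    )
```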

Best Practice:  Mesh APs should generally be configured to operate in Mesh Point mode.  The loss of bandwidth capacity from lacking 5 GHz wireless connectivity for client devices is minor compared to the loss of bandwidth capacity from losing 50% of bandwidth per hop.  This also allows for the transmit power of the mesh radios to be set at their maximum value, so as to provide the maximum signal strength between nodes without being imbalanced with the low transmit power capability of most 5 GHz client devices.

In both operational modes, the overall data capacity of a mesh AP is reduced as compared to the same AP operating in a conventional configuration with a wired Ethernet connection to a wired switch infrastructure.  Accordingly, a mesh Wi-Fi network will never have the same level of throughput and client capacity of a conventional Wi-Fi network.

Mesh Network Example

Figure 7 shows an example mesh network deployed using the Best Practices highlighted above.  This is an RV park with 437 spaces spread across a roughly 2000’ x 1000’ area.  The main distribution frame (MDF) is in the southwest corner of the property, and trees in parts of the property preclude direct line-of-sight to many locations.
 

Figure 7:  Example of a mesh network, utilizing point-to-multipoint links to create additional root nodes.

The red links and bubbles indicate WDS Bridge links from the MDF to each of the root APs.  In some cases, multiple WDS Bridge links in series need to be established.  The point to point links are designated by Master or Slave with a letter and number index.  (For example, the WDS Bridge link going between the MDF and G8-R is designated link D, with [Master D] connected to [Slave D1]).

The other colors and bubbles represent the root and remote APs in Mesh Point mode, and the nominal mesh links between the remote APs and the root APs.  In the figure, each group is designated with a group number and an index to indicate that it is a root node or remote node.  (For example, on the right, the root node is designated [G8-R] and the nominal remote nodes are designated [G8-1] to [G8-4].) 

The point-to-(multi)point WDS Bridge links utilize 80 MHz channels on the UNII-2 and UNII-2e bands (i.e. channels 52-64, 100-112, 116-128).  Each root AP is set to a static 40 MHz channel on the 5 GHz band in the UNII-1 and UNII-3 bands (i.e. channels 36-40, 44-48, 149-153, and 157-161). 

Wednesday, March 22, 2017

When Do You Trust a Wi-Fi Predictive Model? The Battle of Accuracy vs. Precision

Most non-engineers tend to use the terms accuracy and precision interchangeably, but they are actually very distinct concepts.    Precision is based on the computational power of the software and the underlying mathematics.  Most mathematical models (such as Wi-Fi predictive modeling software using ray traces to compute absorption and reflections of walls and objects, as well as free space path loss over distance) will generally provide a high level of precision.  Accuracy is based on the level of complexity (and underlying assumptions) of the mathematical model, as well as the quality of the input parameters.

One of the tricks of performing any type of engineering modeling or simulation in Wi-Fi (or, for that matter, in any engineering discipline) is that you first need to estimate; i.e. you need to know what the answer should be BEFORE you actually start the model, at least to a rough order of magnitude.  The engineering model serves only to add both accuracy and precision to your estimation.  
 
Without having a good estimate, however, the results of the predictive model can be extremely precise, while simultaneously being wholly inaccurate.

The true art and skill of wireless design, therefore, is in understanding the initial estimate. A Wi-Fi predictive model, like any mathematical model, is a very sophisticated idiot!  The model will precisely compute what you tell it to. However, if the inputs to the model are wrong, the model doesn’t know that, and so the model will compute a very precise, and very wrong, answer.  Another common way of saying this is "garbage in, garbage out".
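
A minimal sketch of "garbage in, garbage out", using the standard free space path loss formula plus a flat per-wall loss.  The wall loss values, distance, and function names are illustrative assumptions, antenna gains are ignored, and this is far simpler than what a commercial predictive modeling package computes.

```python
import math


def fspl_db(distance_m, freq_mhz):
    """Free space path loss in dB, for distance in meters and frequency in MHz."""
    return 20 * math.log10(distance_m) + 20 * math.log10(freq_mhz) - 27.55


def predicted_rssi_dbm(tx_power_dbm, distance_m, freq_mhz, wall_losses_db):
    """Transmit power minus path loss minus the loss of each wall crossed."""
    return tx_power_dbm - fspl_db(distance_m, freq_mhz) - sum(wall_losses_db)


# Model input: two walls entered at 3 dB each (assumed value for light drywall).
modeled = predicted_rssi_dbm(14, 15, 2437, [3, 3])
# Reality: the same two walls are much denser (assumed 12 dB each).
actual = predicted_rssi_dbm(14, 15, 2437, [12, 12])

print(f"modeled RSSI: {modeled:.4f} dBm")
print(f"actual RSSI:  {actual:.4f} dBm")
print(f"error:        {modeled - actual:.4f} dB")
```

Both numbers print to four decimal places, so both are equally precise.  Only the one computed from correct wall losses is accurate, and the model has no way to tell the difference.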

It should also be noted that the model itself has underlying assumptions in the way it works that will impact and limit both precision and accuracy.  For Wi-Fi predictive modeling packages (e.g. Ekahau, TamoGraph), the following assumptions and simplifications are typically used:
  • uniform walls with known dB loss values and known reflectivity percentages
  • the models only account for absorption and reflectivity, and generally not diffraction, scattering, or other effects (and when using the “attenuation zones”, the model does not even calculate reflectivity, but only attenuation as a dB/ft loss coefficient).  This is done to keep computation times reasonable
  • we generally ignore the effects of furniture, wall decorations, appliances, mirrors, people, etc
  • the antenna signal propagation patterns modeled are based on either measurements or design predictions of the antenna manufacturer, which will have its own underlying errors and assumptions
  • we are generally only positioning / placing the access points with an accuracy of several feet at best, and these placements may change slightly during installation, based on how cabling is run

Thus, there are always intrinsic errors and simplifying assumptions in any model.  For most purposes, however, this limited level of accuracy is quite sufficient, so long as these simplifying assumptions are “reasonable”. We generally don’t need an exact model that predicts the absolutely correct signal level at every spot that we will get in the environment. What we need is a model that is “close enough”, so as to tell us how many APs are required, along with their locations and critical settings, such as channel and transmit power.  We can easily be off by +/- 5 dB in any particular location, and still it would be sufficient for most purposes.

So How Does One Create an Initial Estimate?

Unfortunately, the art of good estimation only comes with a lot of experience, usually a lot of negative experience where a project has gone horribly, horribly wrong and you need to fix it, often by trial and error.  

That said, there are several guidelines that can be used to point an engineer in the right direction to develop a reasonable initial estimate.   For Wi-Fi design, these are the guidelines I employ:

  1. Understand Your Requirements:  How is the network going to be used?  Approximately how many, and what type, of client devices?  Where are they going to be located?  What areas do (or do not) require signal coverage?  How many years is this network going to be deployed, and how is usage likely to change over that time?   

    These requirements vary substantially by vertical market, and can vary even by individual project.  Most Wi-Fi engineers tend to specialize in one vertical market or a small set of closely related vertical markets, so trends on previous projects will typically be applicable to your current one.  If, like me, your specialty is "SMB" which spans multiple vertical markets, you need to develop a good understanding of how Wi-Fi gets used across many different environments.

  2. Understand Your Constraints:  What are the building materials?  Can APs be deployed in rooms or can they only go in the hallways?  What kind of budget is allowed on this project?  Are there other Wi-Fi networks or non-Wi-Fi sources of interference in the environment?   Are aesthetics a concern (nobody likes seeing external antennas)? 

    Your requirements define what the network has to do.  Your constraints are what the network has to work around.  The distinction is subtle but important.  Requirements are always independent of each other and are generally inviolate. Constraints can be highly coupled and, in many cases, self-contradictory, but are also potential areas of compromise and push-back.
     
  3. Understand Your Solutions:  Wi-Fi engineers generally have one, or at most a few, preferred Wi-Fi vendors that they use in most deployments, and a smaller subset of products from that vendor or vendors that they use.  Every product will have its own capabilities, limitations, and idiosyncrasies. Understanding how these products have behaved on previous projects (especially if challenges had to be overcome) will help in knowing how many will be needed on the next project.  

    Most AP vendors will provide and promote a set of Best Practices, guidelines, case studies, etc. on how to best deploy their products in various scenarios.  It is important that the Wi-Fi engineer be able to separate the marketing hype from the technical capabilities of a product, and know how to best tune the configuration settings (i.e. "nerd knobs") for that product for their environment.

  4. Think Creatively; Don't be Constrained by Your Previous Solutions: This may seem contradictory with the prior point, but this is a key aspect of any successful Wi-Fi design.  My graduate thesis advisor, Professor Nam P. Suh (Mechanical Engineering Department Chair at MIT and later President of KAIST), was a controversial advocate of using a "clean sheet of paper" for every engineering design project, no matter the engineering discipline.   He promoted fully understanding your requirements and constraints up front, and then being unconstrained in terms of picking the right solution for the right problem.   Unfortunately, this approach is harder to achieve in practice than it sounds.

    AP vendors are specialized and generally target particular verticals - an AP or vendor that is very good for one environment is often very poorly suited for another.   This can work both ways, in terms of either lacking key features or having too many features and too much complexity, with a correspondingly high price tag. However, a Wi-Fi engineer (and the organization he or she works for) usually is heavily invested in particular vendors with education, experience, and infrastructure. Switching AP vendors can often be a logistical nightmare that most organizations won't employ unless forced to.  

    That said, there are often creative solutions that don't require a radical departure to a new vendor. The use of external directional antennas can provide some unique solutions that may be appropriate in some environments, especially in areas such as outdoor parks, parking lots, marinas, warehouses, etc.  Additionally, if you understand the "nerd knobs" (pursue your CWNA and beyond if interested), some settings can be tuned for particular environments to optimize performance.  

  5. Establish Good "Rules of Thumb": It can be useful to start with simple (and even "overly simple") assumptions to establish a starting point, so long as it is understood that this is ONLY a starting point and the estimate will need to be refined from there based on actual requirements and constraints.  

    A very good rule of thumb in Wi-Fi is to start by using a fixed transmit power level for 2.4 GHz and 5 GHz (I personally like using 14 dBm for 2.4 GHz and 20 dBm for 5 GHz in most environments), so that all of the APs have roughly the same coverage area on both bands and are not too much stronger than the transmit power of typical client devices such as smartphones and tablets.  This allows the APs to be spaced roughly evenly in the environment, making the development of a static channel plan simpler.

    For example, a common question I get is "how much area in square feet will an access point cover"?  The answer to this is not simple, as it depends upon building materials, building geometry, whether 2.4 GHz or 5 GHz is being discussed, expected client density, etc.   As one colleague of mine puts it, "Wi-Fi is not a can of paint!"   That said, a square footage estimate for a design primarily driven by coverage (vs. high capacity) can be a good starting point.  For an indoor environment (e.g. offices, private homes) with standard drywall, one omni-directional AP per 1500 - 2000 sq ft is not unreasonable.  Similarly, for open outdoor environments, an area of approximately 10,000 - 15,000 sq ft (i.e. a radius of approximately 50' - 75')  would be reasonable.

  6. The Estimate and the Model Must be Consistent: While one wants to make the initial estimate as good as possible, if we could estimate extremely accurately up front, there would be no need for predictive modeling.  You should expect that your estimate will be off by 20% - 25%.  The intent of the predictive model is to refine your estimate in terms of its accuracy and precision.  If the estimate and the predictive model are wildly divergent, then you have a problem somewhere and need to figure it out.

    For example, if you estimate that a project will require 10 APs, and the predictive model tells you that you need 8, or 12, then your estimate was pretty good.   If your model tells you that you only need 5 APs, or that you need 20 APs, then either there is a fundamental flaw in your estimate, or there is a fundamental flaw in your model or its inputs.   
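
The last two guidelines can be sketched in a few lines: a coverage-driven rule-of-thumb estimate, followed by a check of whether the predictive model lands within the expected error band.  The function names and the 18,000 sq ft office are hypothetical, and the 25% tolerance is taken from the upper end of the range quoted above.

```python
import math


def estimate_ap_count(area_sqft, sqft_per_ap):
    """Rule-of-thumb AP count for a coverage-driven (not capacity-driven) design."""
    return math.ceil(area_sqft / sqft_per_ap)


def is_consistent(estimated_aps, modeled_aps, tolerance=0.25):
    """True if the model's AP count is within `tolerance` of the initial estimate."""
    return abs(modeled_aps - estimated_aps) <= tolerance * estimated_aps


# Hypothetical 18,000 sq ft drywall office at one AP per 1500 - 2000 sq ft.
print(estimate_ap_count(18000, 2000), "to", estimate_ap_count(18000, 1500), "APs")

# The 10 AP example from the text, checked against four possible model results.
for modeled in (8, 12, 5, 20):
    verdict = "consistent" if is_consistent(10, modeled) else "investigate"
    print(f"estimate 10, model {modeled}: {verdict}")
```

A failed check does not say which side is wrong; it only says that the estimate, the model, or the model inputs need a second look.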

An Example of Predictive Modeling Gone Awry

This is an actual case that came up a few weeks ago to provide Wi-Fi coverage in a 300’ x 300’ RV park with approximately 30-35 RV trailers, no trees, and excellent line of sight everywhere.  


In this case, our sales agent had already estimated and sold the property six outdoor omni-directional access points.  The customer then asked us where to place the access points.  The customer was planning on installing 20' poles and running fiber from the MDF at the clubhouse to the poles, but needed to know where the poles should be placed.  My initial reaction was that this estimate was reasonable (and perhaps even slightly high) for such a small space.  I didn’t need the model to tell me this; I knew it from several years of experience deploying these types of networks.  

However, when I handed this to a junior engineer to model, the engineer came back with a predictive model solution requiring 12 APs, and even then, the coverage was marginal, especially on 5 GHz.

2.4 GHz predicted signal coverage (original model)

5 GHz predicted signal coverage (original model)

From prior experience deploying RV parks, I took one look at this and knew the answer was wrong. Quantifying WHY it was wrong, however, took some detective work.  In the end, this came down to misunderstandings of requirements and constraints, as well as how things should be modeled:
  1. This model assumed 11 dBm power levels on both bands, which we typically only use for high density deployments when many APs need to be co-located in one space for handling user capacity.  With only 30 RVs, even with 3-4 devices per RV, high user density of devices is clearly not a requirement.   

  2. This model assumes that the APs are going to be mounted between the RVs at a height of about 4', and not on poles approximately 12' - 15' in the air, several feet above the tops of the RVs.

  3. The RVs themselves were modeled as solid blocks of metal using the area attenuation zones.  This mode of modeling assumes just an attenuation loss of dB / ft, and not any reflectivity of signal.  While the RVs themselves have an exterior shell of metal or fiberglass, these are thin and have large windows through which the RF can penetrate.  It is my general assertion, therefore, that the walls of the RVs can be ignored from a modeling perspective.  Thus, we are essentially covering an empty field.  Even if you debate the wisdom of this assertion, the next simplifying assumption is to model the RVs as thin, hollow shells of metal that reflect 90% of the signal, not solid metal blocks that do not reflect at all.

Correcting for these misunderstandings and ignoring the outer shell of the RVs leads to only requiring four access points.

5 GHz predicted signal coverage (open field, 20 dBm transmit power)

One can argue that the outer shells of the RVs, even if thin, will have an appreciable signal attenuation, and that for appropriate talkback of client devices in the RVs to the APs, more APs would be required.  This can be accomplished by lowering the transmit power levels of the AP, say to 15 dBm, in which case, six APs are needed.

5 GHz predicted signal coverage (open field, 15 dBm transmit power)

Personally, I would have used two outdoor APs with sector antennas on the main building to cover the entire RV park, with one indoor AP to provide coverage inside the main building, since it would be in the shadow of the sector antennas.  Such a solution would’ve been both cheaper and easier to install; not only are there fewer APs, but there is no need for any poles or fiber runs.   

5 GHz predicted signal coverage using sector antennas

Given the constraint that the customer already purchased the outdoor APs, and we felt we had already confused the customer enough, we presented the six AP solution with the appropriate configuration settings.

Aside from the 12 AP solution which had flawed requirements and assumptions, none of the other design options presented here are “wrong”.  There are several approaches to doing a design, and therefore several different solutions that will work, i.e. meet the true requirements and expectations of the customer, including adequate coverage, adequate capacity, minimal co-channel interference, etc. Different design solutions are thus “better” or “worse” in comparison to each other, usually based on parameters like cost, ease of installation, ease of maintainability, etc.  

There is the old adage “If all you have is a hammer, everything looks like a nail.”  The design approach you take and the solution you select will generally be constrained by the potential solutions you have to work with (i.e. the types and models of APs, antennas, etc.).   Nonetheless, understanding your requirements and constraints up front, along with the use of rules of thumb and experience, is essential to creating an initial estimate of a design solution, which then can be plugged into a predictive model for refinement.