Tuesday, April 28, 2020

Observations on Why You Should Avoid Auto-Channel and Auto-Power

Many AP vendors tout solutions that are supposed to make Wi-Fi design and deployment easier by incorporating algorithms that automatically and dynamically adjust both the transmit power and the channel on each band. These features are generally referred to as auto-power and auto-channel, respectively. Cisco packages them under the name "radio resource management (RRM)", and other vendors use different names. Whatever the branding, these algorithms are all designed to do the same thing, namely optimize the AP power and AP channel settings on each band.

I will avoid any "vendor bashing" in this post. I have worked and/or currently work with most of the major AP vendors, and many of them devote marching armies of engineers who are legitimately working hard to come up with good solutions. Naturally, some vendors do auto-channel and auto-power better than others. All of that said, having worked for over 12 years in Wi-Fi designing and deploying well over 1000 separate networks with equipment from numerous AP vendors, I have yet to see any one of them actually get it "correct".

There are no IEEE 802.11 standards on how to perform auto-channel and auto-power, so every vendor has its own "secret sauce" for doing this. In this context, "secret sauce" should be defined, and the best definition of Wi-Fi "secret sauce" I've encountered comes from a tweet from Daniel Johnson (@TheRFWrangler) on March 27, 2020 at 8:54 am CDT. It was in the context of a conversation on roaming, but it is also applicable here and to several other Wi-Fi topics. The relevant portion of the tweet, and the definition I use for "secret sauce", is as follows: "[An AP vendor's] secret sauce [is] their special blend of bugs and bastardized standards".

Auto-channel is actually a very hard nonlinear optimization problem that is nearly impossible to solve algorithmically. Each AP only knows about the immediate neighbors it can “hear” via RF, so it does not have the perspective of the whole network. Thus, when one AP makes a change, it ripples to its neighbors, which ripple to their neighbors, and before you know it, changing the channel on one AP triggers channel changes on all APs across the network.
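To make the ripple effect concrete, here is a toy model under loudly stated assumptions: the "pick the first channel none of my neighbors use" rule, the 6-AP chain, and the external interferer on channel 1 near AP 0 are all hypothetical and do not represent any vendor's algorithm. Each AP only sees its immediate neighbors, and any change prompts those neighbors to re-evaluate.

```python
from collections import deque

CHANNELS = [1, 6, 11]

def first_free_channel(ap, plan, neighbors, blocked):
    """Hypothetical local rule: first channel (in list order) unused by any heard neighbor."""
    used = {plan[n] for n in neighbors[ap]}
    for ch in CHANNELS:
        if ch not in used and ch not in blocked.get(ap, set()):
            return ch
    return plan[ap]  # nothing free: stay put

def ripple(plan, neighbors, start, blocked):
    """Re-run the local rule, letting every change trigger the changed AP's neighbors."""
    plan = dict(plan)
    queue = deque([start])
    changed = set()
    steps = 0
    while queue and steps < 1000:  # guard against endless oscillation
        steps += 1
        ap = queue.popleft()
        new = first_free_channel(ap, plan, neighbors, blocked)
        if new != plan[ap]:
            plan[ap] = new
            changed.add(ap)
            queue.extend(neighbors[ap])
    return plan, changed

# Six APs in a row; each only "hears" the AP on either side of it.
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
initial = {0: 1, 1: 6, 2: 11, 3: 1, 4: 6, 5: 11}  # a perfectly good static plan
final, changed = ripple(initial, neighbors, start=0, blocked={0: {1}})
print(final)            # {0: 6, 1: 1, 2: 6, 3: 1, 4: 6, 5: 11}
print(sorted(changed))  # [0, 1, 2]
```

One forced move on AP 0 ends up changing three APs (and AP 0 itself twice) even though the starting plan had no conflicts; in a real 2D network with asymmetric hearing, the cascade has far more paths to travel.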

Many vendors have therefore focused their efforts in recent years on having the controller collect the RF information from all of the APs on the network. Once collected, the controller can then attempt to optimize the channels for each AP across the whole network. This is an approach touted prominently in the marketing literature of several vendors.  Most vendors have only met with “limited success” in practice. Furthermore, such algorithms generally require the computing power of a dedicated hardware controller or cloud controller, so this is generally beyond the ability of "controller-less networks" where one AP on the network acts as a local controller for management and telemetry, though this may vary by vendor.

While this centralized channel optimization approach makes sense conceptually, it is actually a much harder problem than it sounds.  The fundamental problem is that the central controller doesn’t know anything about walls or other structure in the environment, or how the APs are actually placed in the environment relative to one another.  Thus, key information is really missing.  Without such knowledge, the full set of data collected from the APs tends to be misleading if not outright contradictory, making it obviously very hard to optimize in practice.

Transmit power levels are also coupled to channelization. If you change the transmit power on any AP, the channel plan you have is likely no longer optimal. I’ve seen more than one vendor with an auto-power scheme that fails epically for even small networks. In such cases, one AP is driven to its maximum allowed power while all of its neighbors are driven to their minimum allowed power. Unless tight thresholds are set (a feature not every vendor offers), this can create a large imbalance in transmit power levels, leading to roaming issues in central overlapping areas as well as dead spots on the edge of your network. Large networks are more likely to disguise this effect, but a 3-4 AP network demonstrates this failure quite vividly. Other vendors will just operate their APs at maximum transmit power, even with “auto power” enabled.

Like with channels, each AP only knows about its immediate neighbors and can only control itself. Many auto-power algorithms have an implicit assumption of symmetry: an AP assumes that neighboring APs hear it as well as it can hear its neighbors. This is very much not true in practice, even for simple networks. In such cases, if an AP hears its neighbor above a certain RSSI threshold, it lowers its own volume. The neighbor, however, is still just as loud, so the AP lowers its own volume further, until its transmit power reaches its minimum threshold. Whichever AP happens to run its auto-power algorithm “last” is not aware of a problem, since all of its neighbors are at sufficiently low (perhaps even TOO low) power levels, and thus the last AP continues to merrily blast away at maximum power. A better approach would be for an AP that hears its neighbor too loudly to tell that neighbor to "shut up" (i.e. lower its volume), with the neighbor actually acting upon that request and, if necessary, telling the first AP to also "shut up". If the APs are reasonably evenly spaced in a regularly laid out environment (e.g. hotel, MDU, office building, etc.), such an approach would serve to drop all APs to a reasonably uniform and moderate transmit power level. Theoretically, a controller-based algorithm collecting this data from all APs on a network could figure this out. However, again, the data collected from large numbers of APs could be misleading or contradictory, depending on the actual environment and the actual (non-)uniformity of the placement of the APs.
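The difference between "turn myself down" and "ask my neighbor to turn down" can be shown with a small simulation. This is a sketch under hypothetical values (4 APs, 70 dB of path loss between every pair, a 5-20 dBm power range, a -60 dBm "too loud" threshold), not any vendor's actual algorithm.

```python
N_APS = 4
MAX_DBM, MIN_DBM = 20, 5
THRESHOLD_DBM = -60  # hearing a neighbor louder than this counts as "too loud"
# Hypothetical path loss (dB) from talker to listener; uniform here, but any matrix works.
PATH_LOSS_DB = [[70] * N_APS for _ in range(N_APS)]

def heard(tx, listener, talker):
    """RSSI (dBm) at `listener` from `talker`."""
    return tx[talker] - PATH_LOSS_DB[listener][talker]

def unilateral(tx):
    """Each AP, in turn, can only adjust ITSELF when a neighbor sounds too loud."""
    tx = list(tx)
    for ap in range(N_APS):
        # Turning ourselves down does not change what we hear, so this
        # keeps going until the AP hits its floor.
        while tx[ap] > MIN_DBM and any(
            heard(tx, ap, n) > THRESHOLD_DBM for n in range(N_APS) if n != ap
        ):
            tx[ap] -= 1
    return tx

def cooperative(tx):
    """An AP hearing a neighbor too loudly asks THAT neighbor to drop 1 dB per round."""
    tx = list(tx)
    while True:
        too_loud = {
            n for ap in range(N_APS) for n in range(N_APS)
            if n != ap and heard(tx, ap, n) > THRESHOLD_DBM and tx[n] > MIN_DBM
        }
        if not too_loud:
            return tx
        for n in too_loud:
            tx[n] -= 1

start = [MAX_DBM] * N_APS
print(unilateral(start))   # [5, 5, 5, 20]
print(cooperative(start))  # [10, 10, 10, 10]
```

The unilateral version reproduces the failure described above: three APs pinned at minimum and the "last" one blasting at maximum. The cooperative version settles at a uniform, moderate level.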

Ironically, for a Wi-Fi network designer with a full view of the network, setting up a static channel and static transmit power plan is actually straightforward. First, you set all of your APs to static transmit power levels, then you alternate your channels on both 2.4 GHz and 5 GHz, horizontally and vertically. Granted, in some environments it is not quite that simple, and occasionally transmit power levels need to be tweaked to boost (or reduce) coverage areas, necessitating another iteration of the channel plan. Nonetheless, it turns out to be much easier for a properly trained human with full perspective of the network and the environment to do this than for a computer algorithm lacking both intuition and key pieces of the puzzle.
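For a single floor of evenly spaced APs, the horizontal alternation can be expressed in a couple of lines. A minimal sketch, assuming the APs sit on a regular rows-by-columns grid and using 1/6/11 on 2.4 GHz; the grid size is hypothetical.

```python
CHANNELS_24 = [1, 6, 11]

def floor_plan(rows, cols, channels=CHANNELS_24):
    """(row + col) mod N alternates channels both along rows and down columns."""
    return [[channels[(r + c) % len(channels)] for c in range(cols)] for r in range(rows)]

def adjacent_conflicts(plan):
    """Count horizontally or vertically adjacent AP pairs that share a channel."""
    rows, cols = len(plan), len(plan[0])
    conflicts = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and plan[r][c] == plan[r][c + 1]:
                conflicts += 1
            if r + 1 < rows and plan[r][c] == plan[r + 1][c]:
                conflicts += 1
    return conflicts

plan = floor_plan(3, 4)
for row in plan:
    print(row)
print("adjacent conflicts:", adjacent_conflicts(plan))
```

With three channels, neighbors differ by one step in (row + col), so they never share a channel; diagonal neighbors can, which is where a human eye (or a predictive model) still earns its keep.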

Philosophically, there are four "degrees of freedom" (i.e. design knobs) available in a Wi-Fi design: (1) selection of the AP make and model, including its embedded internal antenna or the type of external antenna if the make/model allows, (2) location of the APs in the environment, (3) channel per band, and (4) transmit power per band. Thus, selecting appropriate channels and transmit power is really part of the design process, and therefore part of the control we have as Wi-Fi engineers over the quality of that design. When electing to use auto-channel and auto-power, a designer cedes control of those two design knobs to an algorithm that itself varies by AP vendor, model, and firmware version.

My personal recommendation, based on years of painful experience, is to always use static channels and static transmit power in any predictive model on all projects. In Ekahau, the APs all default to static power settings of 8 dBm and 14 dBm when you first select them, no matter the AP vendor or model, and you have to explicitly change each one if you want to use different values. Channels default to 1 and 36 for everything in Ekahau, but are easily adjusted. I know Ekahau has a channelization feature, though I’ve never used it and, admittedly, I’m too “old school” to ever trust it. That said, I’m “open to being convinced” that such an algorithm could provide a reasonable starting point.

In any case, one does not actually need Ekahau or any other software model to create a good static channel plan. I learned how to set static channels appropriately on both 2.4 GHz and 5 GHz years before I was ever exposed to predictive modeling software like Tamograph or Ekahau. Spacing the APs out evenly, setting a consistent and moderate transmit power across all APs, and alternating channels horizontally and vertically on each band is usually sufficient, especially for properties with “regular layouts”. Predictive modeling software is admittedly really useful for fine-tuning your static channel and transmit power plan so as to minimize or eliminate self-interference, but it is not strictly necessary for many environments.

It is also true that channelization takes a fair amount of time and effort, especially on large projects with hundreds or thousands of APs. For “regular layouts”, there are methods of doing the vertical channel staggering in a fixed pattern: figure out the best pattern on one floor, then come up with a floor-by-floor alternating scheme, letting a spreadsheet compute the channels for the remaining floors. This can save a fair amount of time and effort, but the result is always something you want to review and sanity-check. Additionally, you do not want all of that work handed to a competitor in a bidding environment, so it makes sense to go through the channelization exercise only AFTER winning the deal.
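The "spreadsheet" step can be sketched as a rotation: each floor reuses the hand-tuned base floor, but shifts every AP's position in the channel list by a fixed amount per floor, so no AP shares a channel with the AP directly above it. This assumes APs are stacked vertically in the same grid positions; the base floor below is hypothetical, and the vertical check is the kind of sanity pass mentioned above.

```python
# Non-DFS 5 GHz channels (UNII-1 and UNII-3), in the order the rotation walks them.
CH5 = [36, 40, 44, 48, 149, 153, 157, 161]

# Hypothetical hand-tuned base floor: 2 rows x 3 APs.
BASE = [[36, 149, 44],
        [157, 40, 161]]

def stack_floors(base_floor, channels, n_floors, shift=1):
    """Floor f uses channels[(index of base channel + f * shift) % len(channels)]."""
    index = {ch: i for i, ch in enumerate(channels)}
    return [
        [[channels[(index[ch] + f * shift) % len(channels)] for ch in row] for row in base_floor]
        for f in range(n_floors)
    ]

def vertical_conflicts(building):
    """Count APs that share a channel with the AP directly above them."""
    return sum(
        1
        for lower, upper in zip(building, building[1:])
        for row_lo, row_up in zip(lower, upper)
        for a, b in zip(row_lo, row_up)
        if a == b
    )

building = stack_floors(BASE, CH5, n_floors=4)
for f, floor in enumerate(building):
    print(f"floor {f}: {floor}")
print("vertical conflicts:", vertical_conflicts(building))
```

Because the rotation is a one-to-one remapping of the channel list, the horizontal separation worked out on the base floor carries over to every floor; a shift that is a multiple of the list length would stack identical floors and the check would flag every AP.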

Thursday, December 12, 2019

Why Adding More APs isn't Always Better

Many Wi-Fi experts work on networks with hundreds of APs and thousands of devices, and thus rely heavily upon a vendor's auto-channel and auto-power features (a.k.a. radio resource management or RRM). Such algorithms shouldn't be trusted, though understanding why often requires stepping back to relatively simple examples.

In this case, the example is a small warehouse fulfillment center consisting of 6' tall shelving, as well as some refrigerators, coolers, and walk-in freezers. The facility is approximately 5,000 square feet, and we have deployed four Meraki MR33 APs (802.11ac wave 2, 2x2:2, with internal 3 dBi antennas). The APs are set to static 20 MHz channels (non-DFS) on both 2.4 GHz and 5 GHz (one AP has its 2.4 GHz radio disabled), and auto-power has been restricted to 8 dBm +/- 3 dB on the 2.4 GHz band and 14 dBm +/- 3 dB on the 5 GHz band. The site has two SSIDs: one for facilities, used by Android-based dual-band barcode scanners and other facility PCs and devices, and one for guests, primarily used by the personal cell phones and tablets of the employees who work at the facility. Originally, both SSIDs were set up for dual-band operation with band steering (i.e. dual-band clients are “encouraged” to associate at 5 GHz).

The general manager at one particular site is complaining that the main order tracking PC, as well as a tablet used for employee time tracking, are having frequent disconnects.  Both devices are located in the Dispatch area in the upper left side of the floor plan.

This type of complaint is usually indicative of interference.  Since our own APs are set to static non-overlapping channels (which is in our control), I immediately started looking for external interference (which is out of our control).  

An initial Wi-Fi scan from the APs (using the “Air Marshal” feature of Meraki, but every AP vendor can do this) detected over 120 distinct SSIDs from 3rd party APs in the area. Most of these are only on 2.4 GHz, and while most were at fairly low power levels (< 5 dB), there were enough at significantly higher levels (20 – 30 dB) to indicate that the 2.4 GHz band is quite saturated with external APs. Hence, my initial troubleshooting solution was to set the facilities SSID to 5 GHz only, as all of their devices are dual-band. This also eliminates the need for band steering on the facilities network, which can cause some delays and issues in roaming. (In fairness to Meraki, I have not seen any problems directly related to band steering across approximately 120 such sites, so I’m inclined to believe Meraki band steering is working appropriately. Nonetheless, it seemed prudent to remove a potential problem source.)

Alas, this change did not make any difference at all to the reported issues.   

Looking at the 5 GHz band, there is a fair amount of 5 GHz non-DFS interference from multiple cable routers (i.e. neighboring cable modems or local public hotspots). We have already checked to make sure our own cable router has its Wi-Fi disabled, but we cannot do anything about neighboring businesses. Surprisingly, I am also seeing a lot of strong 5 GHz interference from SSIDs that correspond to Wi-Fi hotspots inside vehicles. According to Google Earth, there is an auto dealership within 200 - 300 feet, although for all I know the APs are simply picking up passing cars, as the facility is located on a fairly busy road in a commercial area. The resolution and history of such external APs on the Meraki Dashboard are limited, making further diagnostics difficult.

My knee-jerk reaction at this point was to switch the APs over to DFS, which seemed to have no activity.  I temporarily did this, but then thought better of it.  I have been avoiding the use of DFS channels as it takes devices a lot longer to roam because the device must do a passive scan on the DFS channels (52-64, 100-140) vs. 10x – 20x faster active scans on UNII-1 (36-48) and UNII-3 (149-161).  Thus, I’d rather not use DFS channels at these facilities unless it is totally unavoidable. To verify if DFS was justified, I switched to a spectrum analysis of the APs, which showed somewhat surprisingly that, despite the large number of APs in the area, channel utilization on both bands was actually quite low (< 10% - 15%).   Thus, something else entirely is going on.
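A rough back-of-the-envelope comparison shows why passive scanning on DFS channels hurts roaming. This sketch rests on assumptions: a passive scan must dwell about one beacon interval (the common 100 TU = 102.4 ms default) per channel to be sure of catching a beacon, while the 10 ms active-scan dwell is a hypothetical round number, since real clients vary widely.

```python
BEACON_INTERVAL_MS = 102.4   # common default beacon interval (100 TU)
PASSIVE_DWELL_MS = BEACON_INTERVAL_MS  # listen long enough to hear one beacon
ACTIVE_DWELL_MS = 10.0       # assumption: probe request + wait for responses

UNII_1 = [36, 40, 44, 48]
UNII_3 = [149, 153, 157, 161]
DFS = list(range(52, 65, 4)) + list(range(100, 141, 4))  # 52-64 and 100-140

def scan_time_ms(channels, dwell_ms):
    """Time to visit every channel once at the given per-channel dwell."""
    return len(channels) * dwell_ms

active = scan_time_ms(UNII_1 + UNII_3, ACTIVE_DWELL_MS)
passive = scan_time_ms(DFS, PASSIVE_DWELL_MS)
print(f"active scan, {len(UNII_1 + UNII_3)} non-DFS channels: {active:.0f} ms")
print(f"passive scan, {len(DFS)} DFS channels: {passive:.0f} ms")
print(f"per-channel ratio: {PASSIVE_DWELL_MS / ACTIVE_DWELL_MS:.1f}x")
```

Under these assumptions, the per-channel cost is roughly 10x, and the DFS set also contains almost twice as many channels to visit.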

Intermittent client disconnects can also be a symptom of one's own network's transmit power being far too high. We are performing predictive models in Ekahau in order to optimize AP placements, though detailed on-site post-deployment active surveys with Ekahau are not being done, due to both cost and time limitations. Until recently, we have not been able to get the wall materials from the customer in advance, so we have generally assumed the interior walls are cinder block. This has proven to be the most typical indoor material at most sites, making it a reasonably conservative assumption. If the walls are drywall, signal penetration will be better than expected; conversely, if the walls are poured concrete, signal penetration could be worse than expected.

If the signals from the other APs on our own network are really strong, client devices will hear multiple APs on our network, all at very good signal levels (>> -67 dBm). In such environments, if the client’s roaming algorithm is not very smart, a client device can wind up roaming between multiple APs on the network just due to minor signal fluctuations. Alas, with most client devices, (a) they are fairly dumb when it comes to roaming, (b) roaming behavior can change from one firmware version to the next, and (c) as network engineers, we ultimately have no control over how a client device roams.
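The effect of "dumb" roaming logic can be illustrated with a hypothetical client that roams whenever the other AP is louder by more than a hysteresis margin. The RSSI samples are made up to mimic two of our own APs that are both excellent and differ only by a few dB of fluctuation; real client roaming logic is vendor-specific and not visible to us.

```python
# Hypothetical RSSI samples (dBm) from two of our own APs, both well above -67 dBm.
rssi_a = [-52, -55, -51, -56, -53, -50, -57, -52, -54, -51]
rssi_b = [-54, -52, -55, -51, -53, -56, -50, -55, -52, -53]

def count_roams(a, b, hysteresis_db):
    """Roam only when the other AP beats the current one by MORE than hysteresis_db."""
    current = "A" if a[0] >= b[0] else "B"
    roams = 0
    for ra, rb in zip(a, b):
        cur, other = (ra, rb) if current == "A" else (rb, ra)
        if other - cur > hysteresis_db:
            current = "B" if current == "A" else "A"
            roams += 1
    return roams

for margin in (0, 8):
    print(f"hysteresis {margin} dB: {count_roams(rssi_a, rssi_b, margin)} roams")
```

With no margin, this client ping-pongs eight times in ten samples; a larger margin stops it. Since we cannot change the client, turning down our own APs so fewer of them are "very good" at any one spot is the lever we actually have.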

Fortunately, the spectrum analysis tool on the Meraki dashboard tells us about all of the other APs being seen, including and especially our own.  From AP01 (closest to the Dispatch room), I’m seeing signal from AP02 and AP04 in the mid -50’s dBm, which is really strong.  From AP02, which is reasonably centered in the facility, I can see all of our other APs on 5 GHz in the low -50’s to high -60’s dBm.

Based on the signal levels of our own APs as well as external APs, I must conclude that the walls at this facility are really thin, at least from an RF perspective.  In the predictive model, changing the internal walls from cinder block to drywall indicates that I could ostensibly cover most of the 5000 sq. ft. facility with an RSSI of -67 dBm or better from a single AP, and we have four of them in this space!  
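A crude link budget shows how much the wall assumption matters. This sketch uses free-space path loss plus a fixed loss per wall. It is NOT how Ekahau models propagation, and the per-wall values (3 dB drywall, 12 dB cinder block at 5 GHz) are illustrative assumptions only, since real walls vary widely. The 14 dBm and 3 dBi figures match the original MR33 settings above; the 15 m distance and two walls are hypothetical.

```python
import math

def fspl_db(distance_m, freq_mhz):
    """Free-space path loss in dB (distance in metres, frequency in MHz)."""
    return 20 * math.log10(distance_m) + 20 * math.log10(freq_mhz) - 27.55

# Assumed per-wall attenuation at 5 GHz (illustrative only).
WALL_LOSS_DB = {"drywall": 3.0, "cinder block": 12.0}

def rssi_dbm(tx_dbm, antenna_dbi, distance_m, freq_mhz, walls):
    """Received level, ignoring client antenna gain and multipath."""
    wall_loss = sum(WALL_LOSS_DB[w] for w in walls)
    return tx_dbm + antenna_dbi - fspl_db(distance_m, freq_mhz) - wall_loss

drywall = rssi_dbm(14, 3, 15, 5200, ["drywall"] * 2)
cinder = rssi_dbm(14, 3, 15, 5200, ["cinder block"] * 2)
print(f"through 2 drywall walls:      {drywall:6.1f} dBm")
print(f"through 2 cinder block walls: {cinder:6.1f} dBm")
```

Under these assumptions, swapping two cinder block walls for drywall moves the same AP from below -67 dBm to comfortably above it, which is consistent with four APs being far more than this space needs.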

The transmit power on the APs was already turned down fairly low, but I’ve now gone even lower on both 2.4 GHz and 5 GHz to see if that resolves the issue. I’ve turned down the auto-power ranges to 5 dBm +/- 3 dB on 2.4 GHz and 8 dBm +/- 3 dB on 5 GHz. Prior settings had been 8 dBm +/- 3 dB for 2.4 GHz and 14 dBm +/- 3 dB for 5 GHz. (Unfortunately, the auto-power algorithm on Meraki has proven to be surprisingly poor, generally driving one AP to its maximum and surrounding APs to their minimum.) If this does not materially improve things, my next option is to turn off auto-power entirely by setting fixed power values, and then to switch off one or two of the APs.

Tuesday, January 2, 2018

Adding Customer Value with Systems Engineers

This blog is written based on my experiences working for a Wi-Fi access point vendor. However, the material presented here consists of general lessons that readily apply to any type of business with technology-based products complex enough that customers may not necessarily know or understand how to deploy them optimally.

Technology vendors are typically focused on making a "product", i.e. a collection of hardware and software to perform specific tasks. They generally are not focused on how the product gets deployed as part of a larger system. To add value to customers, however, it is important for the technology vendor to understand the customer's perspective and speak the customer's language.

The Importance of System Engineers (SEs)

Source: http://images.wisegeek.com/guy-working-on-network-servers.jpg

The function of System Engineers is to work with customers on scoping out new projects (pre-sales) and assisting customers with existing deployments (post-sales). Most enterprise technology equipment vendors call these SEs (System Engineers or Sales Engineers), though some vendors also use the term FAEs (Field Application Engineers).

Such SEs need to understand how technology (in my case, Wi-Fi) gets deployed and how customers use the vendor’s products. While the SEs obviously need an expert understanding of the product portfolio, they don’t need to know how to design or write code for the product itself, but rather how to configure the product and how the products are implemented by customers in the market.

This is fundamentally different from Product Engineers, who need to understand the intricacies of hardware and firmware code and how to weave that into a functioning mass-producible product.

These are very different skill sets, and require different types of engineers.  Many technology vendors struggle because they don’t understand this distinction, and thus are entirely product-focused, lacking a good understanding of how customers actually deploy the products.  Thus, the technology vendors will push product on customers based on a product view of the market and a very limited understanding of what customers actually want and need.  Sometimes the vendors get lucky, and the product is successful.  More often, the product misses the mark in one or more ways, and is a flop in the marketplace.

While product engineers and system engineers need to work closely together, the technical flow needs to be from the customer to the system engineers to the product engineers, so that the product engineers can ultimately create products that will resonate with the customers’ needs. 

Thus, the role of SEs is to build direct and stable personal relationships between themselves and the customers, ideally all of them but at the very least those large "top-tier" customers.  The customers need to know they have a knowledgeable resource to call on when they run into issues.

Simultaneously, the SEs also need to make the customers as self-sufficient as possible, both for pre-sales designs and for post-sales maintenance, so they’re only reaching out to their designated system engineer when they really have a problem. 

Online training has been very popular with technology vendors over the last few years. It is relatively cheap and easy to implement, but only goes so far. On-site training, at least partially customized to the customer’s specific needs, helps both to build that personal relationship and to enhance the customer’s knowledge and self-sufficiency. Whenever an SE can get face time with a customer, it is incredibly valuable.

Each customer is generally only going to need to focus on a small subset of the products that the vendor offers. That said, most customers in a particular market  The vendor must therefore be extremely wary of having too large a selection of products, especially if their differences are subtle to those who are not themselves engineers (i.e. most customers).

New products and features should ONLY be introduced if they are going to enhance value for customers over existing offerings, or enable the vendor to move “upmarket” to either related companion products (e.g. offering PoE network switches with wireless access points) or products and features that enable the vendor to be attractive in adjacent markets (e.g. higher end products for larger enterprise deployments). 

It is important that the Product Engineers share this vision, and are responsive to customer needs for the introduction of bug fixes, new features, and ultimately new products. System Engineers can only be of limited effectiveness without the full support of Product Engineering.

The Importance of Customer Support


Additionally, a strong customer support team is essential in fielding the more routine pre-sales and post-sales questions from smaller customers. Not only do technology companies need to cultivate a consistent reputation of excellence in service, but at least some of these smaller customers will have the potential to become big customers. Projects from smaller customers tend to be (but are not necessarily) simpler. While the members of the customer support team are not necessarily engineers, they need to be trained well enough in the product, and in how customers apply it, that they can answer relatively straightforward questions and understand when they need to escalate to the engineering team. The customer support staff is also the breeding and training ground for future system engineers.

While tools like customer forums and online knowledge databases have their place, they should not be solely relied upon, as some vendors have advocated.  Our customers are human, so when customers have a problem, they want to get a human being, at least on the phone, to help them fix it as quickly as possible.  By the time the customer is calling into Customer Support, they have likely already searched online for a solution and couldn’t find what they needed.  Hence, they’re already aggravated from the original problem and frustrated that they couldn’t find an easy answer, so they are not starting from an “ideal state of mind”.  Alas, most technology vendors have poorly staffed and poorly trained customer support agents, and sometimes even finding the phone number can be an adventure in itself.  Accordingly, exceptional customer support is an area where a vendor can readily distinguish themselves from the competition.

In summary, to be truly successful, a technology vendor needs to understand the needs of their customers and structure their support and product offerings around those customer needs.

Friday, November 24, 2017

Antennas: Why They Matter and When Do You Want to Use APs with Internal vs. External Antennas

Here is a practical guide on when you want to use APs with internal antennas vs. APs with external antennas.   For the purpose of this blog, I'll be using two indoor APs, the EnGenius EAP1300 (internal antenna, ceiling mount) and the EnGenius EAP1300EXT (external antenna, wall or ceiling mount) for demonstrative purposes.  The content, however, applies to any vendor that has APs of comparable specifications with both internal and external antennas, for both indoor and outdoor applications.

What are Antennas

Antennas serve to shape and focus the radio signal in particular directions. Antennas, therefore, act like a lens for RF energy. Every radio system (Wi-Fi, cellular, cordless, walkie-talkie, etc.) requires antennas on both the transmitter and receiver to shape and focus the signal. Antennas are passive devices and work in both directions, i.e. an antenna equally increases the radio's ability to talk (transmit) and to hear (receive).

The signal gain (i.e. strength) of an antenna is measured in "dBi", or decibels relative to an isotropic radiator.  An isotropic radiator is defined as a point-source of RF signal where the energy radiates spherically equally in all directions.  Such an antenna cannot physically be built, but it serves as a useful mathematical reference, as such an antenna has no gain, or a gain of 0 dBi.   

Antenna gain, therefore, is based on an antenna's deviation from a perfect sphere. A typical "rubber duck" dipole omni-directional antenna has a doughnut-shaped pattern with the antenna sticking through the hole of the doughnut. As this shape is "roughly spherical", these antennas typically have fairly low gain (e.g. 2 - 3 dBi). The gain of such an antenna can be increased by lengthening it. When doing this, you are increasing the energy propagated horizontally by stealing it from the energy propagated vertically. One can also make directional antennas, which focus the bulk of the RF energy in one particular direction. Such antennas have very high gains, as they deviate dramatically from a perfect sphere.

Examples of increasing antenna gain.   Top:  Low gain dipole.  Middle:  High gain dipole.  Bottom:  High gain directional.  
A couple of notes on directional antennas: 
  1. Just as antenna designers cannot build a perfect sphere, they cannot build a perfect cone.   As a result, there is some amount of RF energy that is projected and received in the other directions.  These are known as backlobes and sidelobes, as seen in the figure above.   If two neighboring antennas are placed very close together, they can interfere with each other, thus a certain amount of separation distance (at least a few feet) is generally recommended when placing directional antennas next to each other.  
  2. The beamwidth is defined by the points where the antenna's gain drops 3 dB (i.e. to half power) below the peak. Thus, while the gain of the antenna is less beyond this beamwidth, it is also generally not zero, which needs to be accounted for in Wi-Fi design.
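The half-power definition can be made concrete with an idealized pattern. This sketch assumes a cos^n power pattern (a common textbook approximation, not a measured pattern for any real antenna) and scans away from boresight until the gain falls 3 dB below the peak.

```python
import math

def pattern_db(theta_deg, n):
    """Idealized cos^n power pattern relative to the peak, in dB."""
    c = math.cos(math.radians(theta_deg))
    return 10 * math.log10(c ** n) if c > 0 else float("-inf")

def half_power_beamwidth(n, step=0.01):
    """Walk away from boresight until the pattern is 3 dB down; double for the full width."""
    theta = 0.0
    while pattern_db(theta, n) > -3.0:
        theta += step
    return 2 * theta

for n in (2, 10, 40):
    print(f"cos^{n}: beamwidth ~{half_power_beamwidth(n):.1f} degrees")
print(f"cos^2 at 60 degrees off-axis: {pattern_db(60, 2):.1f} dB (outside the beamwidth, not zero)")
```

A larger exponent gives a narrower beam (and higher gain), and the last line illustrates note 2: outside the nominal beamwidth, the antenna is attenuated but still radiating.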

There are various types of antennas that can be constructed, as shown below.

Examples of different antenna types, with their typical potential for beamwidth and gain.
Like lenses, antennas are tuned to work at particular frequencies, as the length of the element is a function of the operational wavelength.  In Wi-Fi, this means that you will often see separate 2.4 GHz and 5 GHz antennas; these may look identical on the outside (i.e. the plastic radome that covers the antenna), but are actually different on the inside.  Some antenna manufacturers are able to make "dual-band" antennas that work at both 2.4 GHz and 5 GHz frequencies, though such dual-band antennas generally require a compromise on the gain of the antenna for each frequency.
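The wavelength relationship is simply λ = c / f, and a basic half-wave dipole element is roughly λ/2 long (practical elements are usually trimmed slightly shorter). A quick calculation at the center frequencies of channel 6 (2.437 GHz) and channel 36 (5.180 GHz) shows why a 2.4 GHz element cannot simply be reused at 5 GHz:

```python
C = 299_792_458  # speed of light, m/s

def wavelength_cm(freq_hz):
    """Free-space wavelength in centimetres."""
    return C / freq_hz * 100

for label, f in (("channel 6, 2.437 GHz", 2.437e9), ("channel 36, 5.180 GHz", 5.18e9)):
    lam = wavelength_cm(f)
    print(f"{label}: wavelength {lam:.2f} cm, half-wave element ~{lam / 2:.2f} cm")
```

The 5 GHz element comes out at less than half the length of the 2.4 GHz element, which is also why dual-band antennas involve a compromise.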

APs with Internal Antennas

EnGenius EAP1300
(indoor 802.11ac wave 2, 2x2:2, internal 5 dBi / 5 dBi omni-directional antennae)

An indoor ceiling mount AP with internal antennas is generally designed to be mounted on a ceiling, with most of the antenna energy being projected outwards (horizontally) and downwards (vertically).  Such a device can naturally be mounted on the wall, but then the area of coverage changes to project most of the energy both up and down (vertically) and outwards primarily in one direction (horizontally).   Since the internal antennas are fixed, the area of coverage is also fixed based on how the AP is physically mounted.

Differences in coverage area when mounting an AP on a ceiling vs. on a wall.
(Figure source:  Ruckus Wireless™ ZoneFlex™ Indoor Access Point Release 9.5 User Guide)

For most indoor Wi-Fi deployments in environments such as schools, apartment buildings, hotels, offices, etc., the built-in internal antenna of a ceiling mount AP, like the EnGenius EAP1300 depicted here, provides sufficient coverage for the desired area. It also satisfies aesthetics constraints, as people generally do not like seeing external antennas, especially in most indoor environments. Additionally, for 802.11n/ac features like MIMO to work properly to achieve faster speeds, the relative alignment of the antennas to each other is critical.  For an AP with internal antennas, this alignment is fixed at the factory and cannot be altered.

Naturally, however, when you select an AP with an internal antenna, you lose the ability to change that antenna in your design, such as providing larger areas of bi-directional coverage in particular directions.   

APs with External Antennas

EnGenius EAP1300EXT

(indoor 802.11ac wave 2, 2x2:2, external 5 dBi / 5 dBi omni-directional antenna)

When an AP has external antennas, the antennas can naturally be adjusted to fit the coverage area.   A vendor will typically supply omni-directional dipole antennas.   

In this example, both the EAP1300 and EAP1300EXT include 5 dBi antennas.   Accordingly, the gain is the same, and the effective coverage area will be "approximately equivalent" for both models when configured with the same transmit power settings.  I say "approximately" because there will be some subtle differences in the antenna pattern between the internal and external antennas.  
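The "same gain, same power, same coverage" reasoning is an effective isotropic radiated power (EIRP) calculation: EIRP (dBm) = transmit power (dBm) + antenna gain (dBi) - cable loss (dB). In this sketch, the 14 dBm transmit power, the 14 dBi sector antenna, and the 1 dB cable loss are hypothetical values chosen for illustration, and regulatory EIRP limits (which vary by region and band) are not modeled.

```python
def eirp_dbm(tx_power_dbm, antenna_gain_dbi, cable_loss_db=0.0):
    """EIRP = conducted transmit power + antenna gain - feed/cable loss."""
    return tx_power_dbm + antenna_gain_dbi - cable_loss_db

def dbm_to_mw(dbm):
    """Convert dBm to milliwatts."""
    return 10 ** (dbm / 10)

internal = eirp_dbm(14, 5)                   # internal 5 dBi antennas
external = eirp_dbm(14, 5)                   # stock external 5 dBi dipoles
sector = eirp_dbm(14, 14, cable_loss_db=1)   # hypothetical 14 dBi sector on a short cable

for name, value in (("internal", internal), ("external dipole", external), ("sector", sector)):
    print(f"{name:>15}: {value:.0f} dBm EIRP ({dbm_to_mw(value):.0f} mW)")
```

Because antennas are passive and reciprocal, the same dBi figure also applies on receive, so the sector improves both directions of the link in its main beam.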

As mentioned above, aesthetics are often a reason to not go with external antennas.   More importantly, however, the MIMO capabilities of 802.11n/ac take advantage of phase offsets between the antennas.  This necessitates that the multiple antennas are at a fixed separation distance from each other so as to be out of phase with each other.   With internal antennas, this phase offset is fixed at the factory and cannot be changed.  For external antennas, the relative alignment of the antennas can be easily altered, either during install or during the AP's normal lifecycle, which will corrupt the MIMO performance and therefore the ultimate performance of the AP.   

The primary advantage of an AP with external antennas, such as the EnGenius EAP1300EXT, is that you can replace the included antennas and use either higher gain omni-directional or directional antennas.  The antenna that you select will be highly dependent on the particular application.  Typically, external antennas are useful for applications where additional range is critical in a specific direction, such as when covering warehouse aisles or an outdoor area with limited AP / antenna mounting options.  In such applications, external sector or patch antennas are usually suitable. For MIMO APs, sector and patch antennas also have their individual antenna elements pre-aligned and fixed by design internally, thus meeting both aesthetic and MIMO constraints.

Takeaway Message

I generally only recommend using an AP with external antenna ports in cases where the dipole antennas packaged with the AP are NOT actually going to be used, but instead are going to be replaced with an external directional antenna, such as a sector or patch. If omni-directional coverage is the requirement, APs with internal antennas are typically the most suitable.

Monday, October 16, 2017

A Simple Explanation of the KRACK WPA2 Security Vulnerability

What Has Happened

Security researchers have discovered a weakness in the Wi-Fi Protected Access 2 (WPA2) protocol that is used in all modern Wi-Fi networks.  A malicious attacker in range of a potential unpatched victim can exploit this weakness to read information that was previously assumed to be safely encrypted.  The vulnerability is within the Wi-Fi IEEE 802.11 standard itself, and is therefore not unique to any particular access point or client device vendor.  It is generally assumed that any Wi-Fi enabled device is potentially vulnerable to this particular issue.

A Summary of How WPA2 Security Works

WPA2-AES security consists of both authentication and encryption.   The authentication step determines whether a particular client is allowed to access the wireless network, and comes in two flavors, Personal and Enterprise.   In WPA2-AES Personal, a pre-shared key or passphrase provides the essential identifying credential.  In WPA2-AES Enterprise, the Extensible Authentication Protocol (EAP) is used to validate the client credentials against an external RADIUS server (which may in turn check them against a directory such as Active Directory).  In either the WPA2-AES Personal or WPA2-AES Enterprise scenario, once the client’s credentials are validated, a unique set of encryption keys is established between that particular access point and that particular client device, so as to encrypt the traffic between them.  This is done via a four-way handshake, in which the access point and the client device exchange random numbers (nonces) so that each side can independently derive the same unique temporal (i.e. temporary) encryption key for that connection, without the key itself ever being sent over the air.
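For readers who want to see the key derivation concretely, here is a simplified Python sketch of the WPA2-Personal (CCMP) case as defined in IEEE 802.11: the pairwise master key (PMK) comes from the passphrase and SSID via PBKDF2-HMAC-SHA1 with 4096 iterations, and each side then feeds the PMK, both MAC addresses and both nonces into an HMAC-SHA1-based pseudo-random function to get a 384-bit pairwise transient key (PTK), whose last 128 bits are the temporal key. The passphrase, SSID, MAC addresses and function names are hypothetical, and the MIC checks, EAPOL framing and group key delivery are omitted entirely, so treat this as an illustration rather than a usable implementation.

```python
import hashlib
import hmac
import secrets

def pmk_from_passphrase(passphrase: str, ssid: str) -> bytes:
    """WPA2-Personal: PMK = PBKDF2-HMAC-SHA1(passphrase, SSID, 4096 rounds, 256 bits)."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)

def prf(key: bytes, label: bytes, data: bytes, n_bytes: int) -> bytes:
    """802.11 PRF: concatenate HMAC-SHA1(key, label || 0x00 || data || i) for i = 0, 1, ..."""
    out = b""
    i = 0
    while len(out) < n_bytes:
        out += hmac.new(key, label + b"\x00" + data + bytes([i]), hashlib.sha1).digest()
        i += 1
    return out[:n_bytes]

def derive_ptk(pmk: bytes, mac_a: bytes, mac_b: bytes, nonce_a: bytes, nonce_b: bytes) -> bytes:
    """Sorting the addresses and nonces means the AP and client build identical input."""
    data = (min(mac_a, mac_b) + max(mac_a, mac_b)
            + min(nonce_a, nonce_b) + max(nonce_a, nonce_b))
    return prf(pmk, b"Pairwise key expansion", data, 48)  # 384-bit PTK for CCMP

# Hypothetical network and devices.
pmk = pmk_from_passphrase("example passphrase", "ExampleSSID")
ap_mac = bytes.fromhex("020000000001")
client_mac = bytes.fromhex("020000000002")
anonce = secrets.token_bytes(32)  # AP's random nonce, sent in handshake message 1
snonce = secrets.token_bytes(32)  # client's random nonce, sent in handshake message 2

# Each side computes the PTK locally; the PTK itself never crosses the air.
ap_ptk = derive_ptk(pmk, ap_mac, client_mac, anonce, snonce)
client_ptk = derive_ptk(pmk, client_mac, ap_mac, snonce, anonce)
kck, kek, tk = ap_ptk[:16], ap_ptk[16:32], ap_ptk[32:48]  # TK encrypts the data frames
```

Because both sides sort their inputs the same way, the AP and the client arrive at the same PTK even though each one calls the function with its own address and nonce first.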

A Summary of the Vulnerability

The security researchers discovered that they can manipulate and replay the third message of the four-way handshake to perform a key reinstallation attack (KRACK).  Strictly speaking, each temporal key negotiated in the four-way handshake should be installed only once, and the packet numbers used with it should never be reused.  However, in a key reinstallation attack, the attacker pretends to be a valid access point and tricks the client device into reinstalling a temporal key that is already in use, which resets the transmit and receive packet numbers.  For WPA2-AES, the resulting reuse of packet numbers (nonces) lets the attacker replay packets and decrypt traffic sent upstream from the client device to the access point, even without learning the encryption key itself.  For the older (and less secure) WPA-TKIP, the attacker can go even further, and potentially forge and inject new packets into the data stream.
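Why is resetting the packet number so damaging? WPA2-AES (CCMP) encrypts data with AES in counter mode, using a per-packet nonce built from the packet number, so the same key plus the same packet number produces the same keystream. The Python sketch below illustrates that consequence. As an assumption to keep it dependency-free, it uses a SHA-256-based toy keystream in place of AES-CTR, and the helper names are hypothetical; it is not CCMP, but the XOR relationship it demonstrates is the same.

```python
import hashlib
import secrets

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy stand-in for AES-CTR: SHA-256(key || nonce || block counter)."""
    out = bytearray()
    block = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + block.to_bytes(4, "big")).digest()
        block += 1
    return bytes(out[:length])

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Stream-cipher encryption: ciphertext = plaintext XOR keystream."""
    return xor(plaintext, toy_keystream(key, nonce, len(plaintext)))

session_key = secrets.token_bytes(16)  # the attacker never learns this
nonce = (1).to_bytes(6, "big")         # packet number, reused after the reinstallation

known = b"GET / HTTP/1.1\r\nHost: example.com"  # plaintext the attacker can guess
secret = b"password=hunter2&user=alice&x=1"     # plaintext the attacker wants

c1 = encrypt(session_key, nonce, known)   # captured before the key reinstallation
c2 = encrypt(session_key, nonce, secret)  # captured after the packet number was reset

# Same key + same nonce => same keystream, so it cancels: c1 ^ c2 == known ^ secret.
recovered = xor(xor(c1, c2), known)
print(recovered)  # b'password=hunter2&user=alice&x=1'
```

Note that the attacker recovers the second message without ever touching `session_key`; all it needs is one packet whose contents it can guess.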

To exploit this vulnerability, a malicious actor must be within radio range and conduct a man-in-the-middle attack (i.e. pretend to be an AP on your network and act as a relay between the client device and the legitimate wireless network).

How this Vulnerability Impacts Access Point Products and Networks

As the issue occurs on client devices, the first step for any network operator is to check with the client device manufacturers for security patches and updates, and to apply these updates as soon as they are available.

This particular vulnerability has no direct impact on APs operating in “access point” mode.  However, access points that are being used as client devices (i.e. APs operating in “client bridge” mode) or for point-to-multipoint communications (i.e. APs operating in “WDS bridge” or “WDS station” mode) are potentially impacted by this vulnerability in the IEEE 802.11 protocol.  Furthermore, some advanced applications and features, such as mesh networking and fast roaming (i.e. 802.11r), may also be vulnerable to this issue.

Access point vendors are actively investigating the impact of this vulnerability across their product portfolios, and will be issuing firmware releases in the coming days and weeks to address it.  In the interim, continue to use WPA2-AES Personal or WPA2-AES Enterprise for network security.  Do not use WEP and do not use WPA-TKIP, as the vulnerabilities of those deprecated security protocols are significantly more serious and easier for a malicious attacker to exploit.

For More Information

The website https://www.krackattacks.com/ provides a detailed summary of the issue along with links to the research paper and tools detailing the vulnerability.

Friday, September 8, 2017

Oversubscription Ratios and the Types of Bandwidth Throttling

When we build networks, we need to allocate the available bandwidth amongst the client device population in, hopefully, a reasonably fair and equitable manner such that all users are happy (or at least not complaining).    We use bandwidth throttling for this purpose.  

Without bandwidth throttling, one or two abusive users could use applications like BitTorrent and consume the overwhelming majority of the available Internet bandwidth, leaving very little bandwidth for all of the remaining users on your network.  

Bandwidth Throttling 

Generally, there are two types of bandwidth throttling available on network equipment:

  • Per-User Bandwidth Throttling:   This limits the maximum amount of Internet bandwidth that each client device can consume
  • Per-Subnet/VLAN Bandwidth Throttling:  This limits the aggregate maximum amount of internet bandwidth that all client devices on the subnet / VLAN can consume at one time.
As an example, let’s assume we have a subnet / VLAN with 5 client devices connected.  If the bandwidth throttling is 10 Mbps / 10 Mbps per user, then each user could potentially consume 10 Mbps / 10 Mbps simultaneously, making the total potential consumption the sum, or 50 Mbps / 50 Mbps.   Alternatively, if the bandwidth throttling is 10 Mbps / 10 Mbps per subnet / VLAN, then all users on that subnet / VLAN have to share a single 10 Mbps / 10 Mbps bandwidth allocation, meaning each user would get 2 Mbps / 2 Mbps on average when all five are active, and this average would decrease as more users connect to that subnet / VLAN.
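The arithmetic in this example can be sketched in a few lines of Python. The function names are hypothetical, and the per-VLAN figure assumes every connected user is active and the VLAN cap is shared equally, which real schedulers only approximate.

```python
def per_user_aggregate(users: int, cap_mbps: float) -> float:
    """Per-user throttling: worst case, every user hits their own cap at once."""
    return users * cap_mbps

def per_vlan_average(users: int, vlan_cap_mbps: float) -> float:
    """Per-VLAN throttling: all active users share a single cap (equal split assumed)."""
    return vlan_cap_mbps / users

for n in (5, 10, 20):
    print(n, per_user_aggregate(n, 10), per_vlan_average(n, 10))
# 5 50 2.0
# 10 100 1.0
# 20 200 0.5
```

The two columns move in opposite directions as users are added: the per-user scheme's worst-case total grows, while the per-VLAN scheme's share per user shrinks.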

In general, per-user bandwidth throttling is what you want in most practical circumstances.   Obviously, if there are too many users and/or the allocated bandwidth per user is set too high, you eventually run out of Internet bandwidth.

So how do you decide what limits are appropriate?   It ultimately depends on the type of network you are operating (i.e. its requirements) and the total amount of Internet bandwidth you have available (i.e. its constraints).  However, this can be treated quantitatively by using an oversubscription ratio.

Oversubscription Ratio

Oversubscription is a concept that dates back to very early telephony.  Statistically, not all connected users will actually consume their maximum available bandwidth at any particular instant of time.  For example, if we have 5 users and each of them has a per-user bandwidth cap of 10 Mbps / 10 Mbps, it is statistically unlikely that any of them, let alone all of them, will actually be consuming 10 Mbps / 10 Mbps simultaneously.   Most network applications are bursty in nature, meaning that your actual consumption is constantly fluctuating and rarely hitting the maximum allocation.  (Video streaming is, naturally, an important exception to this, as that consumes bandwidth at a fairly constant rate for an extended period of time.  That said, even today only a fraction of devices that are connected to your network are likely to be streaming video at a given instant.)   

Thus, as a service provider, I do not need to supply the additive sum in terms of bandwidth (i.e. # users * promised bandwidth per user), but rather some fraction thereof.  That fraction defines the oversubscription ratio.   

Unfortunately, setting the oversubscription ratio is an empirical exercise, and the ratio tends to decrease over time as more devices, each consuming more bandwidth, connect to your networks.  In the pre-smartphone days, a ratio of 30:1 or even 40:1 was common for most wired / wireless networks.  

The oversubscription ratio I commonly use for regular network usage (e.g. hotels, apartment buildings, etc.) is 20:1.  This means that if I promised 200 users each a 10 Mbps / 10 Mbps data rate (and throttled each of them to that rate), I could get away with providing only a 100 Mbps Internet connection.  The math is as follows:  200 users * 10 Mbps/user * 1/20 = 100 Mbps.  At any instant in time, the average consumption would be 10 Mbps/user / 20 = 500 kbps per user. In reality, some users will be consuming more than that, while others will be consuming less (even 0).  

For student housing, which typically sees heavy network utilization, I use a 10:1 oversubscription ratio.   In larger high-density environments (e.g. conference centers, event spaces, etc.), a few devices will be streaming video, but most attendees will be connected without heavily utilizing their devices.  I therefore typically use a 15:1 oversubscription ratio for those.
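The 200-user calculation above generalizes to a one-line formula: required circuit = users × per-user cap ÷ oversubscription ratio. The Python sketch below applies it to the 200-user, 10 Mbps example at each of the three ratios mentioned; the function names and setting labels are hypothetical.

```python
def required_circuit_mbps(users: int, per_user_cap_mbps: float, ratio: float) -> float:
    """Additive sum of promised bandwidth, divided by the oversubscription ratio."""
    return users * per_user_cap_mbps / ratio

def average_use_per_user_mbps(per_user_cap_mbps: float, ratio: float) -> float:
    """Average instantaneous consumption per user implied by the ratio."""
    return per_user_cap_mbps / ratio

print(average_use_per_user_mbps(10, 20))  # 0.5 (i.e. 500 kbps)
for setting, ratio in {"hotel / apartment": 20, "student housing": 10, "conference / event": 15}.items():
    print(f"{setting}: {required_circuit_mbps(200, 10, ratio):.1f} Mbps")
# hotel / apartment: 100.0 Mbps
# student housing: 200.0 Mbps
# conference / event: 133.3 Mbps
```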

Determining Appropriate Bandwidth Throttling Values

In reality, one is generally constrained by the total amount of bandwidth available, as that is the most expensive part of your network.  Thus, the real calculation is to determine the appropriate bandwidth throttling per user that should be used.  To determine this, one needs to know the peak number of expected users and the bandwidth available.   

As an example, let's assume an event space where we are expecting 500 users and have a 300 Mbps / 300 Mbps Internet circuit available.   Using the 15:1 oversubscription ratio, for 500 users this comes out to a sustainable average service level of 9 Mbps (i.e. 300 Mbps / 500 users * 15:1 oversubscription = 9 Mbps / user).   
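Turning the oversubscription formula around gives the per-user throttle for a fixed circuit: cap = circuit × ratio ÷ peak users. A minimal Python sketch with a hypothetical function name, reproducing the event-space numbers above:

```python
def per_user_throttle_mbps(circuit_mbps: float, peak_users: int, ratio: float) -> float:
    """Inverse of the oversubscription calculation: cap = circuit * ratio / users."""
    return circuit_mbps * ratio / peak_users

print(per_user_throttle_mbps(300, 500, 15))  # 9.0
```

Multiplying by the ratio before dividing by the user count keeps the intermediate values as whole numbers, so the result comes out as exactly 9.0 Mbps.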

Of course, in reality complex networks are an Animal Farm (i.e. while all client devices are equal, some client devices are “more equal” than others).  Thus, different classes of users will require different levels of service.

In most commercial environments, it is vitally important that the operations staff have sufficient bandwidth available, even though their devices usually represent a small fraction of the total number of clients. This is one very good use of multiple VLANs / subnets: you can put your different classes of users onto different VLANs, and then allocate bandwidth both per VLAN and per user accordingly.  Where operations activity is critical, we need to give this small but more important operations segment of the client device population a higher per-user bandwidth allocation, and give the (proletariat) visitors a lower per-user bandwidth allocation.   

It is also useful to have two layers of bandwidth throttling.  The first layer is bandwidth throttling per VLAN / subnet.  For example, limit the guest network to 80% of the total bandwidth, ensuring that the staff / operations network(s) will always have access to at least 20% of the Internet bandwidth, no matter how crowded the guest network becomes.  The second layer is bandwidth throttling per user, to ensure that no abusive user on any VLAN / subnet can take up all of the bandwidth allocated to that VLAN / subnet.
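Token buckets are one common way to implement these two layers. The Python sketch below is a hypothetical policer: the class names, the 1-second burst size and the drop-on-exceed behavior are assumptions for illustration, not any vendor's implementation. Each packet must fit within both the VLAN's shared bucket and the sender's own bucket. Real equipment often queues excess traffic (shaping) rather than dropping it.

```python
class TokenBucket:
    """Token bucket that refills at `rate_bps` and holds at most `burst_bits`."""

    def __init__(self, rate_bps: float, burst_bits: float) -> None:
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits  # start with a full bucket
        self.last = 0.0           # time (seconds) of the last refill

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        self.last = now


class TwoLayerThrottle:
    """Hypothetical two-layer policer for one VLAN / subnet.

    Layer 1: one bucket shared by every user on the VLAN (the VLAN cap).
    Layer 2: one bucket per user (the per-user cap).
    A packet passes only if BOTH buckets have enough tokens; otherwise it is dropped.
    """

    def __init__(self, vlan_cap_bps: float, user_cap_bps: float, burst_s: float = 1.0) -> None:
        self.vlan = TokenBucket(vlan_cap_bps, vlan_cap_bps * burst_s)
        self.user_cap_bps = user_cap_bps
        self.burst_s = burst_s
        self.users: dict[str, TokenBucket] = {}

    def allow(self, user: str, size_bits: float, now: float) -> bool:
        if user not in self.users:
            bucket = TokenBucket(self.user_cap_bps, self.user_cap_bps * self.burst_s)
            bucket.last = now
            self.users[user] = bucket
        bucket = self.users[user]
        bucket.refill(now)
        self.vlan.refill(now)
        if bucket.tokens >= size_bits and self.vlan.tokens >= size_bits:
            bucket.tokens -= size_bits      # charge the user's own allowance...
            self.vlan.tokens -= size_bits   # ...and the VLAN's shared allowance
            return True
        return False


MBPS = 1_000_000
# Guest VLAN limited to 80% of a 100 Mbps circuit, 10 Mbps per guest.
guest = TwoLayerThrottle(vlan_cap_bps=80 * MBPS, user_cap_bps=10 * MBPS)
print(guest.allow("alice", 10 * MBPS, now=0.0))  # True: both buckets had room
print(guest.allow("alice", 1, now=0.0))          # False: alice's own bucket is empty
for i in range(7):
    guest.allow(f"guest{i}", 10 * MBPS, now=0.0)  # drains the VLAN bucket
print(guest.allow("bob", 10 * MBPS, now=0.0))    # False: VLAN cap reached
print(guest.allow("bob", 10 * MBPS, now=1.0))    # True: VLAN bucket has refilled
```

A second instance with a 20 Mbps VLAN cap would model the staff / operations network, so guests can never eat into that 20%.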

Tuesday, August 15, 2017

The Emergence of Tri-Band APs

In a previous blog post, I discussed the limitations of MU-MIMO and hinted at the pending emergence of a competing technology for high-density deployments, called "tri-band".   In this post, I'll compare the two technologies again and encourage the use of tri-band APs for high-density deployments.

What is Tri-Band?

Strictly speaking, this AP technology should be called "tri-radio", as two of the radios in the access point are on the 5 GHz band.  Essentially, a tri-band access point is just two co-located 5 GHz 802.11ac wave 1 APs in one box.   The tri-band AP has one 2x2:2 stream 2.4 GHz radio (IEEE 802.11b/g/n) and two 2x2:2 stream 5 GHz radios (IEEE 802.11a/n/ac), with one 5 GHz radio locked on the low portion of the band (channels 36-64) and the other 5 GHz radio locked on the high end of the band (channels 100-165).   

Tri-band is technically not part of the IEEE 802.11 standard.  Broadcom developed the original chipset for this in mid-2015, and Qualcomm Atheros recently introduced its own version.  Tri-band APs are intended for high-density environments (e.g. lecture halls, conference centers, auditoriums, concert halls, etc.) and thus compete directly with IEEE 802.11ac wave 2 with MU-MIMO.  The tri-band approach, however, has several advantages over MU-MIMO.   Nonetheless, very few vendors have introduced tri-band access points to the market so far, despite their obvious advantages.   

Why is Tri-Band Better than MU-MIMO?

MU-MIMO requires the use of beamforming, a technique used to create zones of constructive and destructive interference at particular locations.  By maximizing the signal for each client device at that client device’s location (and minimizing the signal intended for the other clients at that location), a MU-MIMO AP can talk downstream to multiple client devices simultaneously.   

Multi-User Multiple-Input Multiple-Output (MU-MIMO)

The MU-MIMO technique requires the AP to know the RF channel to each client device (loosely speaking, where each client device is relative to itself).  The AP gathers that information by periodically transmitting “sounding frames”, essentially known test signals sent from each AP antenna.  Compatible client devices respond by sending back a matrix indicating how well they heard the signal from each antenna.  Based on that matrix, the AP can calculate how to shape its transmissions toward each client device.    

MU-MIMO has the following limitations, which do not exist with the tri-band approach:

  1. Increased overhead:  The sounding frames and their responses consume airtime.  While this overhead is smaller than the presumptive gains of talking to multiple client devices simultaneously, it is still a loss.  Most MU-MIMO access points only get a 1.7x to 2.2x increase in speed when talking downstream to three compatible client devices. 
  2. Client device compatibility:  The client devices need to be compatible with MU-MIMO in order to understand the sounding frames and to send the appropriate response.  As of August 2017, there are still surprisingly few MU-MIMO compatible client devices on the market.  There are some USB dongles available for PCs.  The flagship mobile client device for MU-MIMO had been the Samsung Galaxy Note 7, which failed in the market for unrelated incendiary reasons.  The Apple iPhone 7, while rumored before its launch to support it, quietly did not support MU-MIMO.  Given Apple's notorious secrecy, we still don't know whether the upcoming Apple iPhone 8 will support MU-MIMO.

  3. Client separation:  MU-MIMO requires that the client devices it talks to simultaneously be physically separated from each other.  If the client devices are too close to each other, the beamforming won’t be able to maximize the signal at one client while minimizing it at the other (neighboring) clients.  

  4. Downstream only:  MU-MIMO only works for downstream traffic, from the AP to the client device(s).  Upstream traffic from each client device to the AP must still happen one at a time, otherwise the AP will hear multiple client devices at once and won’t be able to distinguish between them.
In comparison, all 5 GHz Wi-Fi clients can communicate with the tri-band AP as they would with any other conventional access point, so it is backwards compatible with all current and future Wi-Fi client devices.  There is also no additional overhead on the channel, as sounding frames are not required, and the positions of the two 5 GHz client devices, both relative to the AP and to each other, don't matter.  Additionally, since the two 5 GHz radios operate on independent channels, traffic to and from a client on each radio can occur simultaneously in both directions.  The AP itself uses an internal mechanism called “client steering” to encourage 5 GHz clients to connect to one or the other 5 GHz radio, so as to balance the load across the two 5 GHz radios. 
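The load-balancing idea behind client steering can be sketched in Python under the simplest possible policy: send each newly arriving 5 GHz client to the radio with fewer associated clients. This policy, the radio labels and the `steer` function are hypothetical; real vendor implementations typically also weigh signal strength and airtime utilization, and their actual logic is proprietary.

```python
from collections import Counter

RADIOS = ("5GHz-low (ch 36-64)", "5GHz-high (ch 100-165)")

def steer(load: Counter, radios: tuple[str, ...] = RADIOS) -> str:
    """Assign a new client to the least-loaded 5 GHz radio (ties go to the first radio)."""
    choice = min(radios, key=lambda r: load[r])  # min() keeps the first of equal minima
    load[choice] += 1
    return choice

load: Counter = Counter()
assignments = [steer(load) for _ in range(5)]
print(dict(load))  # {'5GHz-low (ch 36-64)': 3, '5GHz-high (ch 100-165)': 2}
```

With this policy the clients alternate between the two radios, which keeps the client counts within one of each other.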

The Takeaway Message

A current four-stream MU-MIMO access point can talk simultaneously to 2-3 compatible client devices on the 5 GHz band downstream, sometimes.   A two-stream tri-band access point, by comparison, can talk simultaneously to any two client devices on the 5 GHz band both downstream and upstream, all the time.