Experiencing extremely slow SAN traffic? It may be caused by a "Severe latency bottleneck detected" error on an ISL link, a trunk, or even a single port link.

How do you diagnose a severe latency bottleneck error on a SAN link? Here is one example that shows how to troubleshoot a SAN latency bottleneck issue.

Detect and check the error

If you don't have monitoring alerts configured at your site, you will most likely notice the problem through a traffic drop on the port (traffic on a single port, traffic between SAN switches, etc.) and through application slowness. After checking the other components (server, HBA, etc.), use the following two troubleshooting commands for SAN switch diagnostics.

errdump

...
2016/01/20-08:04:32, [AN-1010], 297, FID 128, WARNING, san48b-5-sw2, Severe latency bottleneck detected at slot 0 port 10.
2016/02/04-13:14:04, [AN-1010], 298, FID 128, WARNING, san48b-5-sw2, Severe latency bottleneck detected at slot 0 port 10.
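If the event log is long, a short script can pull out just the AN-1010 events and count them per port. The sketch below is a minimal Python example, assuming you have saved the errdump output as plain text (for example from an SSH session). It only matches the message layout shown above, and the names `AN1010` and `latency_events` are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Matches AN-1010 lines in the layout shown above (assumed format):
# <timestamp>, [AN-1010], <seq>, FID <fid>, <severity>, <switch>, Severe latency ... slot <s> port <p>.
AN1010 = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}), \[AN-1010\], \d+, "
    r"FID (?P<fid>\d+), \w+, (?P<switch>[^,]+), "
    r"Severe latency bottleneck detected at slot (?P<slot>\d+) port (?P<port>\d+)\."
)

def latency_events(lines):
    """Return (timestamp, switch, slot, port) for every AN-1010 line."""
    events = []
    for line in lines:
        m = AN1010.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S")
            events.append((ts, m["switch"], int(m["slot"]), int(m["port"])))
    return events

log = """\
2016/01/20-08:04:32, [AN-1010], 297, FID 128, WARNING, san48b-5-sw2, Severe latency bottleneck detected at slot 0 port 10.
2016/02/04-13:14:04, [AN-1010], 298, FID 128, WARNING, san48b-5-sw2, Severe latency bottleneck detected at slot 0 port 10.
"""

events = latency_events(log.splitlines())
# Count events per (switch, slot, port) to spot the recurring offender.
per_port = Counter((switch, slot, port) for _, switch, slot, port in events)
print(per_port)
```

A port that keeps reappearing in this count, like slot 0 port 10 here, is the one to look at with porterrshow.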

porterrshow

FID128:admin> porterrshow
          frames      enc    crc    crc    too    too    bad    enc   disc   link   loss   loss 
       tx     rx      in    err    g_eof  shrt   long   eof     out   c3    fail    sync   sig  
 ...
  8:    0      0      0      0      0      0      0      0      0      0      0      0      0    
  9:    2.3g   2.7g   0      0      0      0      0      0      0      0      2      0      2  
 10:  249.0m  35.3m  59.2k  59.1k  59.0k   0      0     32     70.0k 816      0      0      0    
 11:  247.3m  27.3m   0      0      0      0      0      0      0      0      0      0      0   
 ...

In the output above, port 10 shows high error counts for enc_in, crc_err, crc_g_eof and enc_out.
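On a switch with many ports it helps to flag the offending rows automatically. The following Python sketch parses rows in the 13-column layout shown above and lists ports with non-zero encoding or CRC counters. It assumes the k/m/g abbreviations are decimal multipliers, and the helper names (`to_count`, `parse_porterrshow`, `ports_with_link_errors`) are hypothetical.

```python
import re

# Column order of the 13-column porterrshow output shown above.
COLUMNS = ["frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof",
           "too_shrt", "too_long", "bad_eof", "enc_out", "disc_c3",
           "link_fail", "loss_sync", "loss_sig"]
# Assumption: k/m/g abbreviations are decimal multipliers.
MULTIPLIER = {"k": 10**3, "m": 10**6, "g": 10**9}
ROW = re.compile(r"^\s*(\d+):\s+(.*)$")

def to_count(token):
    """Convert values such as '59.2k' or '2.3g' to an integer count."""
    suffix = token[-1].lower()
    if suffix in MULTIPLIER:
        return round(float(token[:-1]) * MULTIPLIER[suffix])
    return int(token)

def parse_porterrshow(text):
    """Return {port: {counter: value}}; header lines and extra columns are ignored."""
    ports = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            values = [to_count(v) for v in m.group(2).split()]
            ports[int(m.group(1))] = dict(zip(COLUMNS, values))
    return ports

def ports_with_link_errors(ports, watch=("enc_in", "crc_err", "crc_g_eof", "enc_out")):
    """Ports whose encoding or CRC counters are non-zero."""
    return sorted(p for p, c in ports.items() if any(c.get(k, 0) > 0 for k in watch))

sample = """\
  8:    0      0      0      0      0      0      0      0      0      0      0      0      0
  9:    2.3g   2.7g   0      0      0      0      0      0      0      0      2      0      2
 10:  249.0m  35.3m  59.2k  59.1k  59.0k   0      0     32     70.0k 816      0      0      0
 11:  247.3m  27.3m   0      0      0      0      0      0      0      0      0      0      0
"""
ports = parse_porterrshow(sample)
print(ports_with_link_errors(ports))  # [10]
```

Port 9 is not flagged even though it has link fail and loss sig counts, because the sketch only watches the encoding and CRC counters discussed here.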

Diagnostics

Several different issues can cause severe latency on a SAN switch. First, make sure you don't have a bandwidth problem, especially when using ISLs or trunking.

Note: when this kind of SAN latency issue appears, you usually won't see port or link saturation. On the contrary, traffic drops to a very low level because of the errors.
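To rule out a bandwidth problem, compare the measured throughput (for example from portperfshow or your monitoring tool) with the capacity of the link or trunk. The Python sketch below assumes nominal per-direction data rates for common Fibre Channel speeds and an 80% saturation threshold; both are illustrative assumptions, not Brocade guidance, and the function names are hypothetical.

```python
# Nominal per-direction data rates in MB/s for common FC speeds
# (assumed round figures; real payload throughput is slightly lower).
NOMINAL_MB_PER_S = {4: 400, 8: 800, 16: 1600, 32: 3200}

def utilization_pct(throughput_mb_per_s, member_speeds_gbps):
    """Utilization of one link or a trunk, given its member link speeds in Gbit/s."""
    capacity = sum(NOMINAL_MB_PER_S[speed] for speed in member_speeds_gbps)
    return 100.0 * throughput_mb_per_s / capacity

def looks_saturated(throughput_mb_per_s, member_speeds_gbps, threshold_pct=80.0):
    """Hypothetical rule of thumb: treat load at or above the threshold as saturation."""
    return utilization_pct(throughput_mb_per_s, member_speeds_gbps) >= threshold_pct

# A 4 x 8G trunk carrying 2400 MB/s is 75% busy; a single 8G ISL at 60 MB/s is nearly idle.
print(utilization_pct(2400, [8, 8, 8, 8]))  # 75.0
print(looks_saturated(60, [8]))             # False
```

A nearly idle link that still logs latency bottleneck errors matches the pattern described in the note: low traffic caused by errors rather than by congestion.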

In this case, according to Brocade's explanation of the porterrshow counters, enc_in/enc_out errors most likely indicate an external problem with the cable or SFP.

This is a sign of a hardware problem. Suggested actions would be to replace the cable or SFP, move the cable to another port, or run porttest.
enc_out errors on their own imply a cable/connector problem. enc_out errors and crc_err together imply a GBIC/SFP problem.

The same applies to the crc_err and crc_g_eof counters:

- A mathematical formula (a CRC) generates a check value at the sending port. The receiving port uses the same formula to recompute the value and compare it. Generally speaking, crc_err and enc_out errors together imply a GBIC/SFP problem. Suggested actions would be to replace the cable or SFP, move the cable to another port, or run porttest.
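The send-compute-compare idea behind the CRC counters can be illustrated in a few lines of Python. This is only a sketch: it uses Python's `zlib.crc32` as a stand-in, while the switch computes the Fibre Channel frame CRC in hardware, so the exact on-the-wire computation is not reproduced here. The `send`/`receive` helpers are hypothetical.

```python
import zlib

def send(payload):
    """Sender: compute a CRC-32 over the payload and transmit it alongside."""
    return payload, zlib.crc32(payload)

def receive(payload, received_crc):
    """Receiver: recompute the CRC with the same formula and compare."""
    return zlib.crc32(payload) == received_crc

frame, crc = send(b"example frame payload")
print(receive(frame, crc))  # True: clean transmission

# Simulate a marginal cable or SFP by flipping one bit in transit.
damaged = bytearray(frame)
damaged[3] ^= 0x01
print(receive(bytes(damaged), crc))  # False: this mismatch is what a CRC error counts
```

A CRC-32 always catches a single flipped bit, which is why a dirty connector or failing SFP shows up so clearly in the crc err column.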

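The rules of thumb quoted above can be written down as a small decision helper. The Python sketch below encodes only those heuristics; the function name, the counter keys and the returned strings are hypothetical, and the result is a hint, not a diagnosis.

```python
def suggest_cause(c):
    """Rule-of-thumb mapping from porterrshow counters to a likely culprit."""
    enc_in = c.get("enc_in", 0)
    enc_out = c.get("enc_out", 0)
    crc_err = c.get("crc_err", 0)
    crc_g_eof = c.get("crc_g_eof", 0)
    if enc_out and crc_err:
        return "GBIC/SFP problem likely"          # enc_out + crc_err together
    if enc_out:
        return "cable/connector problem likely"   # enc_out on its own
    if enc_in or crc_err or crc_g_eof:
        return "external problem: check cable or SFP"
    return "no encoding or CRC errors"

# Port 10 counters from the porterrshow output above.
port10 = {"enc_in": 59200, "crc_err": 59100, "crc_g_eof": 59000, "enc_out": 70000}
print(suggest_cause(port10))  # GBIC/SFP problem likely
```

Even when the helper points at the SFP, a physical check of the cable path is still the cheaper first step, as the next section explains.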
Solve the problem

So this case is clearly related to the cable or SFP, but don't rush to replace the cable yet. First, check whether the cable has a sharp bend; if it does, re-cable it and clean the connectors with professional cleaning tools. Then run the following commands on both SAN switches:

portdisable <p>
slotstatsclear 
portenable <p>

Then check the counters again. In this case, port 10 came back with no encoding or CRC errors:

FID128:admin> porterrshow
          frames      enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy    c3timeout    pcs
       tx     rx      in    err    g_eof  shrt   long   eof     out   c3    fail    sync   sig                   tx    rx     err
 10:    2.4g   2.2g   0      0      0      0      0      0      0      1      0      0      0      0      0      0      0      0
 11:    1.8g  80.2m   0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
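Zero counters right after clearing only show that the port is clean at that moment. One way to confirm the fix is to take porterrshow snapshots some time apart and check whether the encoding/CRC counters start climbing again. The Python sketch below assumes the snapshots for a port are already parsed into dictionaries keyed by counter name; the key names, the `new_errors` helper and the "later" snapshot are hypothetical.

```python
WATCH = ("enc_in", "crc_err", "crc_g_eof", "enc_out")

def new_errors(earlier, later, watch=WATCH):
    """Return {counter: increase} for watched counters that grew between snapshots."""
    return {name: later.get(name, 0) - earlier.get(name, 0)
            for name in watch
            if later.get(name, 0) > earlier.get(name, 0)}

# Port 10 right after portenable (values from the output above) ...
right_after = {"enc_in": 0, "crc_err": 0, "crc_g_eof": 0, "enc_out": 0}
# ... and a hypothetical snapshot of the same port taken later.
later = {"enc_in": 0, "crc_err": 0, "crc_g_eof": 0, "enc_out": 0}

growth = new_errors(right_after, later)
print("fix holding" if not growth else f"errors returning: {growth}")
```

If the counters grow again, move on to replacing the cable and then the SFP.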

If this doesn't solve the problem, replace the cable first and then, if the errors persist, the SFP.