Pseudo-random binary sequences (PRBS) and their Verilog implementation using a linear feedback shift register (LFSR), ultimately to connect to an SFI5 interface.

PRBS sequences often come in handy in FPGA work. They're almost perfectly balanced; in other words, the duty cycle, or ratio of ones to zeros, is very close to even. Although the pattern is known and predictable, it makes a good substitute for a true random number. In this discussion we will be talking about 'maximal LFSRs', which produce a maximum-length sequence. A PRBS is sometimes also referred to as a pseudo-random number sequence, or PN. Usually the sequence is referred to by its length: a PN7, for example, has a pattern length of 2^7-1 = 127 bits.

So let's start with an example for a PN7. To build an LFSR we need a shift register and one or a few XOR gates.

module prbs_generate (
   // Outputs
   prbs,
   // Inputs
   clk, reset
   );

   output prbs;
   input  clk, reset;

   parameter         PN = 7,
                  TAP1 = 6,
                  TAP2 = 5;

   reg [PN-1:0] prbs_state;
   wire         prbs;
   integer      d;

   assign prbs = prbs_state[PN-1];

   always @ (posedge clk)
     if (reset)
       prbs_state <= 1; //anything but the all 0s case is fine.
     else
       prbs_state <= {prbs_state,prbs_state[TAP1]^prbs_state[TAP2]};
endmodule // prbs_generate

You can check out the Xilinx App Note for a similar approach using their SRLs. But we have more, so keep reading.

This code generates a PN7 by taking two 'taps', as they're called, from the shift register, exclusive-ORing them together, and feeding the result back into the shift register. Left to run on its own, it will generate a 127-bit pattern and then repeat. In that pattern we have 64 1s and 63 0s. It is always one zero that is missing from the perfect balance. The pattern also contains all of the 7-bit combinations except the all-0s case. This gives a maximum run length of 7 bits for 1s and 6 bits for 0s. This is useful to know if you are working on a link that has a run-length limit due to AC coupling, for example. The all-0s state is a special case that must be avoided: if you start there you will never leave.
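The properties listed above are easy to check in simulation. The following is a minimal, simulation-only sketch, not part of the original design: the module name pn7_properties_tb and all of its variable names are made up for illustration, and it re-implements the same PN7 shift and taps in an initial block rather than instantiating prbs_generate.

```verilog
// Simulation-only sketch (hypothetical testbench, not synthesizable).
// Walks one full period of the PN7 and counts ones and run lengths.
module pn7_properties_tb;
   reg [6:0] state;
   reg       prev;
   integer   i, ones, run, max_ones_run, max_zeros_run;

   initial begin
      state = 7'd1;              // same seed as the reset value above
      prev  = 1'b0;
      ones  = 0;
      run   = 0;
      max_ones_run  = 0;
      max_zeros_run = 0;
      for (i = 0; i < 127; i = i + 1) begin
         // the output bit is the MSB, as in prbs_generate
         if (i != 0 && state[6] == prev)
           run = run + 1;
         else
           run = 1;
         if (state[6]) begin
            ones = ones + 1;
            if (run > max_ones_run) max_ones_run = run;
         end
         else if (run > max_zeros_run)
           max_zeros_run = run;
         prev  = state[6];
         // same shift and taps (TAP1 = 6, TAP2 = 5) as the generator
         state = {state[5:0], state[6] ^ state[5]};
      end
      $display("ones=%0d zeros=%0d max run of 1s=%0d max run of 0s=%0d",
               ones, 127 - ones, max_ones_run, max_zeros_run);
      if (state == 7'd1 && ones == 64 &&
          max_ones_run == 7 && max_zeros_run == 6)
        $display("PASS: the state repeats after 127 bits");
      else
        $display("FAIL");
      $finish;
   end
endmodule
```

Changing the seed to 0 shows the stuck all-0s case: the state never changes and the ones count stays at 0.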

The next step is optional, depending on what you're doing. If you're just checking the eye pattern or exercising an arbiter, the generator alone is fine, but you may want to check the data coming back, so we need a checker. We start by building another PRBS machine, but this time we add the ability to load the state of the PRBS to match the incoming data. When do you load? You have a choice. You can load with an external signal, but I find it easiest to monitor the error rate and reload when it rises above a threshold, since at that point you're probably lost. This way "It just works."

 


module prbs_check (
   // Outputs
   error,
   // Inputs
   clk, reset, prbs
   );

   output error;
   input  clk, reset, prbs;

   parameter         PN = 7,
                  TAP1 = 6,
                  TAP2 = 5;

   reg                        error;
   reg [PN-1:0] prbs_state;
   integer         d;
   reg [PN-1:0] prbs_pipe;
   reg [7:0]         error_pipe;
   reg                 load;
   reg [3:0]         check;

   always @ (posedge clk)
     if (reset)
       begin
          check = 4'h0; //blocking, to match the accumulation below
          error <= 1'h0;
          error_pipe <= 8'h0;
          load <= 1'h0;
          prbs_state <= 1;
          prbs_pipe <= 1;
       end // if (reset)
     else
       begin
          prbs_pipe <= {prbs_pipe,prbs};
          prbs_state <= (load) ? prbs_pipe : {prbs_state,prbs_state[TAP1]^prbs_state[TAP2]};
          //prbs_state is loaded from prbs_pipe, so it trails the pipe by one
          //clock: bit 0 of the state lines up with bit 1 of the pipe
          error <= prbs_state[0] ^ prbs_pipe[1];
          error_pipe <= {error_pipe,error};
          //count the errors seen in the last 8 bits
          check = 0;
          for (d=0;d<8;d=d+1)
            check = check + error_pipe[d];
          load <= (check > 3);
       end // else: !if(reset)
endmodule // prbs_check

 

But what if we want to run faster than the FPGA clock and produce multiple bits per clock cycle, either to feed a mux for higher data rates or simply because we need more than one bit per clock? This time we're using a PN31.

module prbs_wide_generate (
   // Outputs
   prbs,
   // Inputs
   clk, reset
   );

  parameter         WIDTH = 128,
                  PN = 31,//not used but good to know
                  TAP1 = 30,
                  TAP2 = 27;

   output [WIDTH-1:0] prbs;
   input               clk, reset;

   reg [WIDTH-1:0]        prbs;
   reg [WIDTH-1:0]    d;//d is a temp variable
 

   always @ (posedge clk)
     if (reset)
       prbs <= 1; //anything but the all 0s case is fine.
     else
       begin
          d = prbs; //blocking assignment used on purpose here
          repeat (WIDTH) d = {d,d[TAP1]^d[TAP2]};//again blocking is intentional
          prbs <= d;
       end // else: !if(reset)
endmodule // prbs_wide_generate
    

There are a few things to note in this code. Even though the PN31 only needs 31 bits of state, the register is WIDTH bits wide because it doubles as the output word; the feedback taps only ever look at the newest 31 bits. More confusing is the use of blocking assignments. The few lines of code between the begin and end use both blocking and non-blocking assignments. The idea is to have the complete 128 bits calculated in one clock cycle. I like to think of it as spinning the generator 128 times per clock cycle. If we used only non-blocking assignments, we would get just one rotation per clock cycle. This way, the current state is transferred to 'd', spun 128 times, and then transferred back to 'prbs'. Note that you may get a warning on the register 'd' because it is optimized out in synthesis.
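To see that the 128 spins per clock really produce the same bits as a plain 31-bit LFSR shifted one bit at a time, the two can be compared in simulation. This is a rough sketch under stated assumptions: the module wide_vs_serial_tb and its signal names are invented for illustration, and the wide update is copied into an initial block instead of instantiating prbs_wide_generate.

```verilog
// Simulation-only sketch (hypothetical testbench, not synthesizable).
// Compares the "spin WIDTH times" update with a bit-serial 31-bit LFSR.
module wide_vs_serial_tb;
   localparam WIDTH = 128, TAP1 = 30, TAP2 = 27;

   reg [WIDTH-1:0] wide, d, collected;
   reg [30:0]      serial;   // a PN31 needs only 31 bits of state
   reg             fb;
   integer         i, word, mismatches;

   initial begin
      wide       = 1;
      serial     = 1;
      mismatches = 0;
      for (word = 0; word < 8; word = word + 1) begin
         // wide generator: one word per step, using blocking assignments
         d = wide;
         repeat (WIDTH) d = {d[WIDTH-2:0], d[TAP1] ^ d[TAP2]};
         wide = d;
         // bit-serial reference: one shift per step, collecting the new bits
         for (i = 0; i < WIDTH; i = i + 1) begin
            fb        = serial[TAP1] ^ serial[TAP2];
            serial    = {serial[29:0], fb};
            collected = {collected[WIDTH-2:0], fb};
         end
         if (wide !== collected) mismatches = mismatches + 1;
      end
      if (mismatches == 0)
        $display("PASS: wide and bit-serial generators agree");
      else
        $display("FAIL: %0d mismatching words", mismatches);
      $finish;
   end
endmodule
```

The explicit slices (d[WIDTH-2:0]) do the same truncation that the article's {d, ...} concatenation relies on, and avoid width warnings in the sketch.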

We will assume we have the above generator driving our signal that we want to check.

module prbs_wide_check (
   // Outputs
   error,
   // Inputs
   prbs, clk, reset
   );

  parameter       WIDTH = 128,
                  PN = 31,//not used but good to know
                  TAP1 = 30,
                  TAP2 = 27;

   output [8:0] error;
   input [WIDTH-1:0]  prbs;
   input               clk, reset;

   reg [8:0]                error;
   reg [WIDTH-1:0]    prbs_state, check, d;//d is a temp variable
   reg                       load;
   integer               i;

   always @ (posedge clk)
     if (reset)
       begin
          prbs_state <= 1; //anything but the all 0s case is fine.
          check <= 0;
          error <= 0;
          load <= 0;
       end
     else
       begin
          //blocking assignments used on purpose here
          if (load)
            d = prbs; //automatic reload from the incoming data
          else
            d = prbs_state;
          repeat (WIDTH) d = {d,d[TAP1]^d[TAP2]};//again blocking is intentional
          prbs_state <= d;
          check <= prbs ^ prbs_state;
          d = 0;
          for (i=0;i<WIDTH;i=i+1)
            d = d + check[i]; //count the bits in error
          error <= d;
          load <= error > 25; //error rate to reload
       end // else: !if(reset)
endmodule // prbs_wide_check

This checker has an automatic reload built in. If you want to have an external load you can 'or' in an additional external load. Or you can remove the automatic reload and only reload based on an external signal.

The reset block is mostly for simulations. With the auto load, the checker will automatically align to the incoming data. The calculations for the error signals and reload have a few pipe stages that are optional depending on the speed you are running at.

Well, that's great, but what if we want to go even faster, as in a PRBS for a 40G link with an SFI5 interface, and still make timing? It all comes down to one sentence at the heart of the explanation: a sub-sampled PRBS has the same pattern, but with a phase shift. So if a PRBS stream was going by and you only grabbed every other bit and checked the pattern, you would find the same pattern, just at a different point in it. The same holds if you take every 4th, 8th, or 16th bit; the property applies to sub-sampling by a power of two, which is the case we need (other factors, such as 3, generally produce a different sequence). What that means is that we can run several smaller PRBS generators in parallel and, as long as we get the alignment right, they will mux to a higher bit rate PRBS of the same pattern.
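The sub-sampling property can be checked directly for the 1-in-16 case that the SFI5 striping relies on. The sketch below is an illustration only: the module name pn7_decimation_tb and the STEP parameter are assumptions made for this example, and it uses the short PN7 so that a brute-force search over all phase shifts stays small.

```verilog
// Simulation-only sketch (hypothetical testbench, not synthesizable).
// Takes every 16th bit of a PN7 and searches for the phase shift at
// which the sub-sampled stream matches the original pattern.
module pn7_decimation_tb;
   localparam PERIOD = 127;
   localparam STEP   = 16;   // sub-sampling factor, a power of two

   reg [6:0] state;
   reg       seq [0:PERIOD-1];
   integer   i, s, ok, found;

   initial begin
      // record one full period of the PN7
      state = 7'd1;
      for (i = 0; i < PERIOD; i = i + 1) begin
         seq[i] = state[6];
         state  = {state[5:0], state[6] ^ state[5]};
      end
      // try every phase shift s against the sub-sampled stream
      found = -1;
      for (s = 0; s < PERIOD; s = s + 1) begin
         ok = 1;
         for (i = 0; i < PERIOD; i = i + 1)
           if (seq[(i * STEP) % PERIOD] !== seq[(i + s) % PERIOD])
             ok = 0;
         if (ok && found < 0) found = s;
      end
      if (found >= 0)
        $display("every %0dth bit is the same pattern, shifted by %0d bits",
                 STEP, found);
      else
        $display("every %0dth bit gives a different pattern", STEP);
      $finish;
   end
endmodule
```

Setting STEP to 2, 4, or 8 should also find a shift; a factor that is not a power of two, such as 3, is worth trying to see the difference.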

To begin, we build an 8-bit PRBS generator. This time I'm going to put in a few patterns with a pattern select. Although we only need 8 bits at the output, we need up to 31 bits of state, so the internals run wider.

module tx_prbs_8 
  (
   // Outputs
   o_prbs_data,
   // Inputs
   i_clk, i_ld_data_15, i_ld_data_23, i_ld_data_31, i_ld_data_7,
   i_load, prbs_select
   );

   localparam WIDTH = 8;
   input              i_clk; 
   input [31:0]       i_ld_data_15, i_ld_data_23, i_ld_data_31, i_ld_data_7; 
   output [WIDTH-1:0] o_prbs_data; 
   input               i_load;
   input [1:0]               prbs_select;


   reg                       load;
   reg [31:0]               prbs_data_15, prbs_data_23, prbs_data_31, prbs_data_7, d;
   reg [WIDTH-1:0]    o_prbs_data;

   always @ (posedge i_clk)
     begin
        load <= i_load; //buffer for fanout
        if (load)
          begin
             prbs_data_15 <= i_ld_data_15;
             prbs_data_23 <= i_ld_data_23;
             prbs_data_31 <= i_ld_data_31;
             prbs_data_7 <= i_ld_data_7;
          end
        else
          begin
             d = prbs_data_15;
             repeat (WIDTH) d = {d,d[14]^d[13]};
             prbs_data_15 <= d;
             d = prbs_data_23;
             repeat (WIDTH) d = {d,d[22]^d[17]};
             prbs_data_23 <= d;
             d = prbs_data_31;
             repeat (WIDTH) d = {d,d[30]^d[27]};
             prbs_data_31 <= d;
             d = prbs_data_7;
             repeat (WIDTH) d = {d,d[6]^d[5]};
             prbs_data_7 <= d;
          end
        case (prbs_select)
          0: o_prbs_data <= prbs_data_7[WIDTH-1:0];
          1: o_prbs_data <= prbs_data_15[WIDTH-1:0];
          2: o_prbs_data <= prbs_data_23[WIDTH-1:0];
          3: o_prbs_data <= prbs_data_31[WIDTH-1:0];
        endcase // case(prbs_select)
        
     end// @ (posedge clk)
endmodule // tx_prbs_8 

Great, but now we have to instantiate 16 of them and interleave their phases to match the SFI5 interleaving pattern. With 8 bits per channel handed to the SERDES and 16 channels, we produce 128 bits of PRBS per clock, and with 32 bits of load data for each of the 16 generators we need 512 bits of PRBS for initialization. For the initialization, we start with a non-zero number and then run it through a PRBS generator 512 times. Since the result never changes, synthesis will optimize this away. prbs_init_7, prbs_init_15, prbs_init_23, and prbs_init_31 are the 4 initial values we use to preset the 16 PRBS generators so they are synced up/interleaved to give a full-speed (40 Gb/s) pattern. The next step is to stripe the data across the generators. This is done in a for loop. The results are held in arrays of the form prbs_ld_7[f], where f is the generator number and 7 is the pattern type. Although there's a bunch of code, again, it never changes, so it should synthesize to nothing. Finally the 16 PRBS generators are instantiated with their 4 initial values for the 4 patterns.

module tx_prbs (/*autoarg*/
   // Outputs
   o_prbs_data_f, o_prbs_data_e, o_prbs_data_d, o_prbs_data_c,
   o_prbs_data_b, o_prbs_data_a, o_prbs_data_9, o_prbs_data_8,
   o_prbs_data_7, o_prbs_data_6, o_prbs_data_5, o_prbs_data_4,
   o_prbs_data_3, o_prbs_data_2, o_prbs_data_1, o_prbs_data_0,
   // Inputs
   i_clk, i_reset, prbs_select
   );

   input i_clk;
   input i_reset;
   input [1:0] prbs_select;
   output [7:0] o_prbs_data_f, o_prbs_data_e, o_prbs_data_d, o_prbs_data_c,
                o_prbs_data_b, o_prbs_data_a, o_prbs_data_9, o_prbs_data_8,
                o_prbs_data_7, o_prbs_data_6, o_prbs_data_5, o_prbs_data_4,
                o_prbs_data_3, o_prbs_data_2, o_prbs_data_1, o_prbs_data_0;

   reg [511:0]         d,prbs_init_15, prbs_init_23, prbs_init_31, prbs_init_7; //d needs to be a reg as integers are only 32 bits

   reg [31:0]         d_7, d_15, d_23, d_31;
   reg [31:0]         prbs_ld_7[0:15];
   reg [31:0]         prbs_ld_15[0:15];
   reg [31:0]         prbs_ld_23[0:15];
   reg [31:0]         prbs_ld_31[0:15];
   wire [7:0]         prbs_data[0:15];

   integer         i,j;

   always @ (posedge i_clk)
     begin
        /*****************************************************
         * first we generate 512 bits of each of the patterns for initialization
         * ***************************************************/
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[6]^d[5])};//prbs7 inverted
        prbs_init_7 <= d;
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[14]^d[13])}; //prbs15
        prbs_init_15 <= d;
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[22]^d[17])}; //prbs 23
        prbs_init_23 <= d;
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[30]^d[27])}; //prbs31
        prbs_init_31 <= d;
        /******************************************************
         * then we stripe them across the generators
         * ********************************************/
        for (i=0;i<16;i=i+1)
          begin
             for (j=0;j<32;j=j+1)
               begin
                  d_7[j]  = prbs_init_7[j*16  + i];
                  d_15[j] = prbs_init_15[j*16 + i];
                  d_23[j] = prbs_init_23[j*16 + i];
                  d_31[j] = prbs_init_31[j*16 + i];
               end // for (j=0;j<32;j=j+1)
             prbs_ld_7[i]  <= d_7;
             prbs_ld_15[i] <= d_15;
             prbs_ld_23[i] <= d_23;
             prbs_ld_31[i] <= d_31;
          end // for (i=0;i<16;i=i+1)
     end // always @ (posedge i_clk)
   /***********************************************
    * then we instantiate the generators
    * *********************************************/
   generate
   genvar k;
      for (k=0;k<16;k=k+1)
        begin: tx_prbs_8_inst
           tx_prbs_8 tx_prbs_8_local 
             (.i_clk(i_clk), 
              .i_ld_data_7(prbs_ld_7[k]), 
              .i_ld_data_15(prbs_ld_15[k]), 
              .i_ld_data_23(prbs_ld_23[k]),
              .i_ld_data_31(prbs_ld_31[k]),
              .o_prbs_data(prbs_data[k]), 
              .i_load(i_reset), 
              .prbs_select(prbs_select));
        end
   endgenerate

   //and connect the generator outputs to the module outputs
   assign o_prbs_data_0 = prbs_data['h0];
   assign o_prbs_data_1 = prbs_data['h1];
   assign o_prbs_data_2 = prbs_data['h2];
   assign o_prbs_data_3 = prbs_data['h3];
   assign o_prbs_data_4 = prbs_data['h4];
   assign o_prbs_data_5 = prbs_data['h5];
   assign o_prbs_data_6 = prbs_data['h6];
   assign o_prbs_data_7 = prbs_data['h7];
   assign o_prbs_data_8 = prbs_data['h8];
   assign o_prbs_data_9 = prbs_data['h9];
   assign o_prbs_data_a = prbs_data['ha];
   assign o_prbs_data_b = prbs_data['hb];
   assign o_prbs_data_c = prbs_data['hc];
   assign o_prbs_data_d = prbs_data['hd];
   assign o_prbs_data_e = prbs_data['he];
   assign o_prbs_data_f = prbs_data['hf];

endmodule // tx_prbs


But wait, we need this to hook up to SFI5, which requires an additional deskew channel. No problem. On the deskew channel we need to ship out 8 bytes from each channel in turn, followed by a frame marker and an expansion header. This module wraps tx_prbs and adds the deskew channel.
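The frame arithmetic behind the deskew counter can be spelled out as follows. This is a small illustrative sketch: the module name sfi5_deskew_frame_tb and the localparam names are made up here, and the byte counts (2 bytes of A1, 2 of A2, 4 of expansion header) are taken from the case statement in the sfi5_tx module.

```verilog
// Simulation-only sketch (hypothetical testbench, not synthesizable).
// Prints the deskew frame length and which channel each count copies.
module sfi5_deskew_frame_tb;
   localparam CHANNELS          = 16;
   localparam BYTES_PER_CHANNEL = 8;
   localparam A1_BYTES          = 2;   // frame marker, first half
   localparam A2_BYTES          = 2;   // frame marker, second half
   localparam EH_BYTES          = 4;   // expansion header
   localparam DATA_BYTES  = CHANNELS * BYTES_PER_CHANNEL;
   localparam FRAME_BYTES = DATA_BYTES + A1_BYTES + A2_BYTES + EH_BYTES;

   integer count;

   initial begin
      $display("frame length = %0d bytes, counter wraps at 'h%0h",
               FRAME_BYTES, FRAME_BYTES - 1);
      // channels are copied from f down to 0, 8 bytes each
      for (count = 0; count < DATA_BYTES; count = count + BYTES_PER_CHANNEL)
        $display("counts 'h%0h..'h%0h copy channel %0h",
                 count, count + BYTES_PER_CHANNEL - 1,
                 CHANNELS - 1 - count / BYTES_PER_CHANNEL);
      $finish;
   end
endmodule
```

With these numbers the frame is 136 bytes long, which is why the counter in sfi5_tx wraps at 'h87.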

 

module sfi5_tx 
  (
   // Outputs
   o_ch_0, o_ch_1, o_ch_2, o_ch_3, o_ch_4, o_ch_5, o_ch_6, o_ch_7,
   o_ch_8, o_ch_9, o_ch_a, o_ch_b, o_ch_c, o_ch_d, o_ch_e, o_ch_f,
   o_ch_deskew, deskew,
   // Inputs
   i_clk_in, i_reset, prbs_select
   );

   input                i_clk_in;
   input                i_reset; 
   input [1:0] prbs_select;
   output [7:0]         o_ch_0;
   output [7:0]         o_ch_1;
   output [7:0]         o_ch_2;
   output [7:0]         o_ch_3;
   output [7:0]         o_ch_4;
   output [7:0]         o_ch_5;
   output [7:0]         o_ch_6;
   output [7:0]         o_ch_7;
   output [7:0]         o_ch_8;
   output [7:0]         o_ch_9;
   output [7:0]         o_ch_a;
   output [7:0]         o_ch_b;
   output [7:0]         o_ch_c;
   output [7:0]         o_ch_d;
   output [7:0]         o_ch_e;
   output [7:0]         o_ch_f;
   output [7:0]         o_ch_deskew;
output deskew;


   reg                         reset;
   reg [7:0]                 o_ch_0;
   reg [7:0]                 o_ch_1;
   reg [7:0]                 o_ch_2;
   reg [7:0]                 o_ch_3;
   reg [7:0]                 o_ch_4;
   reg [7:0]                 o_ch_5;
   reg [7:0]                 o_ch_6;
   reg [7:0]                 o_ch_7;
   reg [7:0]                 o_ch_8;
   reg [7:0]                 o_ch_9;
   reg [7:0]                 o_ch_a;
   reg [7:0]                 o_ch_b;
   reg [7:0]                 o_ch_c;
   reg [7:0]                 o_ch_d;
   reg [7:0]                 o_ch_e;
   reg [7:0]                 o_ch_f;
   reg [7:0]                 o_ch_deskew;


   reg [7:0]                 channel_0;                        
   reg [7:0]                 channel_1;                        
   reg [7:0]                 channel_2;                        
   reg [7:0]                 channel_3;                        
   reg [7:0]                 channel_4;                        
   reg [7:0]                 channel_5;                        
   reg [7:0]                 channel_6;                        
   reg [7:0]                 channel_7;                        
   reg [7:0]                 channel_8;                        
   reg [7:0]                 channel_9;                        
   reg [7:0]                 channel_a;                        
   reg [7:0]                 channel_b;                        
   reg [7:0]                 channel_c;                        
   reg [7:0]                 channel_d;                        
   reg [7:0]                 channel_e;                        
   reg [7:0]                 channel_f;
   reg [7:0]                 channel_deskew;

   reg [7:0]                 deskew_counter;


   wire [7:0] prbs_0,
              prbs_1,
              prbs_2,
              prbs_3,
              prbs_4,
              prbs_5,
              prbs_6,
              prbs_7,
              prbs_8,
              prbs_9,
              prbs_a,
              prbs_b,
              prbs_c,
              prbs_d,
              prbs_e,
              prbs_f;
              
reg deskew, deskew_int;


   tx_prbs tx_prbs_1
     (
      // Outputs
      .o_prbs_data_f                        (prbs_f),                 // Templated
      .o_prbs_data_e                        (prbs_e),                 // Templated
      .o_prbs_data_d                        (prbs_d),                 // Templated
      .o_prbs_data_c                        (prbs_c),                 // Templated
      .o_prbs_data_b                        (prbs_b),                 // Templated
      .o_prbs_data_a                        (prbs_a),                 // Templated
      .o_prbs_data_9                        (prbs_9),                 // Templated
      .o_prbs_data_8                        (prbs_8),                 // Templated
      .o_prbs_data_7                        (prbs_7),                 // Templated
      .o_prbs_data_6                        (prbs_6),                 // Templated
      .o_prbs_data_5                        (prbs_5),                 // Templated
      .o_prbs_data_4                        (prbs_4),                 // Templated
      .o_prbs_data_3                        (prbs_3),                 // Templated
      .o_prbs_data_2                        (prbs_2),                 // Templated
      .o_prbs_data_1                        (prbs_1),                 // Templated
      .o_prbs_data_0                        (prbs_0),                 // Templated
      // Inputs
      .i_clk                                (i_clk_in),                 // Templated
      .i_reset                                (reset),                 // Templated
      .prbs_select                        (prbs_select[1:0]));


   always @ (posedge i_clk_in)        
     begin
        reset <= i_reset;
          if (reset)
            deskew_counter <= 0; //for sim only                
          else
            deskew_counter <= (deskew_counter == 'h87) ? 0 : deskew_counter + 1;
        //deskew_counter_p1 <= deskew_counter;
        deskew_int <= (deskew_counter[7:4] == 'h7);
        deskew <= deskew_int;
          channel_0 <= prbs_0;
          channel_1 <= prbs_1;
          channel_2 <= prbs_2;
          channel_3 <= prbs_3;
          channel_4 <= prbs_4;
          channel_5 <= prbs_5;
          channel_6 <= prbs_6;
          channel_7 <= prbs_7;
          channel_8 <= prbs_8;
          channel_9 <= prbs_9;
          channel_a <= prbs_a;
          channel_b <= prbs_b;
          channel_c <= prbs_c;
          channel_d <= prbs_d;
          channel_e <= prbs_e;
          channel_f <= prbs_f;
          
          case (deskew_counter)
            'h00,'h01,'h02,'h03,'h04,'h05,'h06,'h07: channel_deskew <= prbs_f;
            'h08,'h09,'h0a,'h0b,'h0c,'h0d,'h0e,'h0f: channel_deskew <= prbs_e;
            'h10,'h11,'h12,'h13,'h14,'h15,'h16,'h17: channel_deskew <= prbs_d;
            'h18,'h19,'h1a,'h1b,'h1c,'h1d,'h1e,'h1f: channel_deskew <= prbs_c;
            'h20,'h21,'h22,'h23,'h24,'h25,'h26,'h27: channel_deskew <= prbs_b;
            'h28,'h29,'h2a,'h2b,'h2c,'h2d,'h2e,'h2f: channel_deskew <= prbs_a;
            'h30,'h31,'h32,'h33,'h34,'h35,'h36,'h37: channel_deskew <= prbs_9;
            'h38,'h39,'h3a,'h3b,'h3c,'h3d,'h3e,'h3f: channel_deskew <= prbs_8;
            'h40,'h41,'h42,'h43,'h44,'h45,'h46,'h47: channel_deskew <= prbs_7;
            'h48,'h49,'h4a,'h4b,'h4c,'h4d,'h4e,'h4f: channel_deskew <= prbs_6;
            'h50,'h51,'h52,'h53,'h54,'h55,'h56,'h57: channel_deskew <= prbs_5;
            'h58,'h59,'h5a,'h5b,'h5c,'h5d,'h5e,'h5f: channel_deskew <= prbs_4;
            'h60,'h61,'h62,'h63,'h64,'h65,'h66,'h67: channel_deskew <= prbs_3;
            'h68,'h69,'h6a,'h6b,'h6c,'h6d,'h6e,'h6f: channel_deskew <= prbs_2;
            'h70,'h71,'h72,'h73,'h74,'h75,'h76,'h77: channel_deskew <= prbs_1;
            'h78,'h79,'h7a,'h7b,'h7c,'h7d,'h7e,'h7f: channel_deskew <= prbs_0;
            'h80,'h81:                               channel_deskew <= 8'hf6; //A1
            'h82,'h83:                               channel_deskew <= 8'h28; //A2
            'h84,'h85,'h86,'h87:                     channel_deskew <= 8'haa; //EH
            default:                                 channel_deskew <= 8'hxx;
          endcase // case(deskew_counter)


          o_ch_0 <= channel_0;
          o_ch_1 <= channel_1;
          o_ch_2 <= channel_2;
          o_ch_3 <= channel_3;
          o_ch_4 <= channel_4;
          o_ch_5 <= channel_5;
          o_ch_6 <= channel_6;
          o_ch_7 <= channel_7;
          o_ch_8 <= channel_8;
          o_ch_9 <= channel_9;
          o_ch_a <= channel_a;
          o_ch_b <= channel_b;
          o_ch_c <= channel_c;
          o_ch_d <= channel_d;
          o_ch_e <= channel_e;
          o_ch_f <= channel_f;
          o_ch_deskew <= channel_deskew;

       end

endmodule // sfi5_tx

Now on to the receive side. Again we're set up for 4 patterns to match the transmit side. The errors are broken into ones and zeros errors; the reader can easily modify the code to just generate errors.

We start with an 8-bit PRBS checker.

module   rx_prbs_8 (
   // Outputs
   o_error, o_bad,
   // Inputs
   i_clk, i_reset, i_prbs_data, prbs_select
   );

   input                 i_clk;
   input                 i_reset;
   input [7:0]                 i_prbs_data;
   input [1:0]                 prbs_select;
   output [2:0]         o_error;
   output                 o_bad;

   reg                        o_bad;
   reg [2:0]                o_error;

   reg [31:0]                 prbs_15, prbs_23, prbs_31, prbs_7,d;
   reg [7:0]                 error_bit;
   reg [7:0]                 i_prbs_data_p3,i_prbs_data_p2,i_prbs_data_p1;
   integer                 i;

   always @ (posedge i_clk)
     begin
        if (i_reset) //for sims only since the reload will put this in a known state
          begin
             error_bit <= 0;
             i_prbs_data_p1 <= 8'h0;
             i_prbs_data_p2 <= 8'h0;
             i_prbs_data_p3 <= 8'h0;
             o_bad <= 1'h0;
             o_error <= 3'h0;
             prbs_15 <= 32'h0;
             prbs_23 <= 32'h0;
             prbs_31 <= 32'h0;
             prbs_7 <= 32'h0;
          end // if (i_reset)
        else
          begin
             //although we only get 8 bits per clock, we need a 32 bit history
             i_prbs_data_p1 <= i_prbs_data;
             i_prbs_data_p2 <= i_prbs_data_p1;
             i_prbs_data_p3 <= i_prbs_data_p2;
             /*****************************************
              *reload if the errors are excessive 
              * *****************************************/
             if (o_bad)
               begin
                  prbs_15 <= {i_prbs_data_p3,i_prbs_data_p2,i_prbs_data_p1,i_prbs_data};
                  prbs_23 <= {i_prbs_data_p3,i_prbs_data_p2,i_prbs_data_p1,i_prbs_data};
                  prbs_31 <= {i_prbs_data_p3,i_prbs_data_p2,i_prbs_data_p1,i_prbs_data};
                  prbs_7 <=  {i_prbs_data_p3,i_prbs_data_p2,i_prbs_data_p1,i_prbs_data};
               end // if (o_bad)
             else
               /*****************************************
                *compute next expected data for the prbs
                * **************************************/
               begin
                  d = prbs_7;
                  repeat (8) d = {d,d[6]^d[5]}; //prbs7
                  prbs_7 <= d;
                  d = prbs_15;
                  repeat (8) d = {d,d[14]^d[13]}; //prbs15
                  prbs_15 <= d;
                  d = prbs_23;
                  repeat (8) d = {d,d[22]^d[17]}; //prbs23
                  prbs_23 <= d;
                  d = prbs_31;
                  repeat (8) d = {d,d[30]^d[27]}; //prbs31
                  prbs_31 <= d;
               end // else: !if(o_bad)
             /*********************************************
              *compare the expected data to the received data
              * ************************************************/
             case (prbs_select)
               0:        error_bit <= prbs_7  ^ i_prbs_data_p1;//7 
               1:        error_bit <= prbs_15 ^ i_prbs_data_p1;//15 
               2:        error_bit <= prbs_23 ^ i_prbs_data_p1;//23 
               3:        error_bit <= prbs_31 ^ i_prbs_data_p1;//31 
             endcase // case(prbs_select)
             /*********************************************
              * count up the errors
              ********************************************/
             d=0;
             for (i=0;i<8;i=i+1)
               d =  (d + error_bit[i]);
             //more than 3 bit results is an error and will be reloaded anyway.
             o_error <= (d > 7) ? 7 : d ;//limit to 3 bit result
             o_bad <= d > 3;
          end // else: !if(i_reset)
     end // always @ (posedge i_clk)
endmodule // rx_prbs_8

Again we need to combine 16 of these into one module to check the whole 40 Gb/s stream. But what about the deskew channel? This is a special case. We need to go back to the sub-sampling idea. Each channel of the SFI5 is a sub-sample of the full 40 Gb/s stream, which means that each channel in turn is a PRBS stream with the same pattern. There is no requirement for each channel on the tx side to line up with a particular channel on the rx side. Depending on how the SERDES line up, the channels on the tx side will likely be split across rx channels. But each channel is a 1/16 sub-sample of the full pattern, so it is in turn the same pattern. That means we can ignore the deskew channel for this special case. More on the deskew channel later. For now, let's look at the 16 PRBS checkers being instantiated.

module sfi5_rx (
   // Outputs
   o_error, o_bad,
   // Inputs
   i_clk_in, i_reset, i_ch_0, i_ch_1, i_ch_2, i_ch_3, i_ch_4, i_ch_5,
   i_ch_6, i_ch_7, i_ch_8, i_ch_9, i_ch_a, i_ch_b, i_ch_c, i_ch_d,
   i_ch_e, i_ch_f, i_ch_deskew, prbs_select
   );


   input                i_clk_in;
   input                i_reset; //for sims
   input [7:0]                 i_ch_0, i_ch_1, i_ch_2, i_ch_3, i_ch_4, i_ch_5,
                        i_ch_6, i_ch_7, i_ch_8, i_ch_9, i_ch_a, i_ch_b,
                        i_ch_c, i_ch_d, i_ch_e, i_ch_f, i_ch_deskew;
   output [6:0]         o_error; //16 checkers x 7 errors max needs 7 bits
   output                 o_bad;
   input [1:0]                 prbs_select;
 
   wire [7:0]                 ch[0:15];
   wire [15:0]                 bad;
   reg                         o_bad;
   wire [2:0]                 error[0:15];
   reg [4:0]                 error_a, error_b, error_c, error_d;
   reg [6:0]                 o_error;                        
   
   //convert to an array for the inputs.
   assign                 ch['h0] = i_ch_0;
   assign                 ch['h1] = i_ch_1;
   assign                 ch['h2] = i_ch_2;
   assign                 ch['h3] = i_ch_3;
   assign                 ch['h4] = i_ch_4;
   assign                 ch['h5] = i_ch_5;
   assign                 ch['h6] = i_ch_6;
   assign                 ch['h7] = i_ch_7;
   assign                 ch['h8] = i_ch_8;
   assign                 ch['h9] = i_ch_9;
   assign                 ch['ha] = i_ch_a;
   assign                 ch['hb] = i_ch_b;
   assign                 ch['hc] = i_ch_c;
   assign                 ch['hd] = i_ch_d;
   assign                 ch['he] = i_ch_e;
   assign                 ch['hf] = i_ch_f;

   /*******************************************
    * instantiate the 16 prbs checkers
    * *****************************************/
   
   generate
      genvar                 k;
      for (k=0;k<16;k=k+1)
        begin: rx_prbs_inst
           rx_prbs_8 rx_prbs_8_local
             (.i_clk(i_clk_in), 
              .i_reset(i_reset), 
              .i_prbs_data(ch[k]), 
              .o_error(error[k]),
              .o_bad(bad[k]), 
              .prbs_select(prbs_select));
        end
   endgenerate

   /*********************************************
    * accumulate the error count
    * *****************************************/
   always @ (posedge i_clk_in)
     begin
        if (i_reset)
          begin
             o_bad <= 1'h0;
             error_a <= 5'h0;
             error_b <= 5'h0;
             error_c <= 5'h0;
             error_d <= 5'h0;
             o_error <= 7'h0;
          end // if (reset)
        else
          begin
             o_bad <= | bad;
             error_a <= error['h0] + error['h1] + error['h2] + error['h3]; 
             error_b <= error['h4] + error['h5] + error['h6] + error['h7]; 
             error_c <= error['h8] + error['h9] + error['ha] + error['hb]; 
             error_d <= error['hc] + error['hd] + error['he] + error['hf]; 
             o_error <= error_a + error_b + error_c + error_d;
          end // else: !if(reset)
     end // always @ (posedge i_clk_in)
      
endmodule // sfi5_rx

So that's a lot to take for granted. Let's run a simulation. This test bench combines the 16 SFI5 tx channels into one 40 Gb/s stream and checks that stream for correctness as a PRBS. It then adds a variable delay before demuxing the stream back into 16 rx channels. By adding a variable delay we can control how the data is striped across the incoming channels and, more importantly, test that the rx side is correct independent of the alignment of the data between the tx and rx sides. To test the locking of the checkers, the 'fiber delay', which changes the striping offset, gets changed periodically, and the rx side relocks. The pattern is then also changed through all 4 patterns, and each time the rx side will relock. The reset on the rx side is optional in an actual implementation, as the relock will take care of putting all of the logic in the correct state. Unfortunately, simulation needs some state to start with. Optionally, there is a commented-out direct connection between the tx and rx sides, which is useful for simulation debug.

module test_tx_rx (
                );
   reg [31:0] prbs_40;
   reg               error_40;
   reg               clk;
   reg [6:0]  channel_counter, rx_channel_counter;
   reg               reset;
   reg [7:0] byte_shifter_f,
             byte_shifter_e,
             byte_shifter_d,
             byte_shifter_c,
             byte_shifter_b,
             byte_shifter_a,
             byte_shifter_9,
             byte_shifter_8,
             byte_shifter_7,
             byte_shifter_6,
             byte_shifter_5,
             byte_shifter_4,
             byte_shifter_3,
             byte_shifter_2,
             byte_shifter_1,
             byte_shifter_0,
             rx_byte_shifter_f,
             rx_byte_shifter_e,
             rx_byte_shifter_d,
             rx_byte_shifter_c,
             rx_byte_shifter_b,
             rx_byte_shifter_a,
             rx_byte_shifter_9,
             rx_byte_shifter_8,
             rx_byte_shifter_7,
             rx_byte_shifter_6,
             rx_byte_shifter_5,
             rx_byte_shifter_4,
             rx_byte_shifter_3,
             rx_byte_shifter_2,
             rx_byte_shifter_1,
             rx_byte_shifter_0;
   reg [15:0] bit_shifter, rx_bit_shifter;
   reg               data_out, data_in;
   reg               tx_clk, rx_clk;
   wire [7:0] tx_prbs_f,
              tx_prbs_e,
              tx_prbs_d,
              tx_prbs_c,
              tx_prbs_b,
              tx_prbs_a,
              tx_prbs_9,
              tx_prbs_8,
              tx_prbs_7,
              tx_prbs_6,
              tx_prbs_5,
              tx_prbs_4,
              tx_prbs_3,
              tx_prbs_2,
              tx_prbs_1,
              tx_prbs_0,
              tx_deskew;
   reg [7:0]  rx_prbs_f,
              rx_prbs_e,
              rx_prbs_d,
              rx_prbs_c,
              rx_prbs_b,
              rx_prbs_a,
              rx_prbs_9,
              rx_prbs_8,
              rx_prbs_7,
              rx_prbs_6,
              rx_prbs_5,
              rx_prbs_4,
              rx_prbs_3,
              rx_prbs_2,
              rx_prbs_1,
              rx_prbs_0;
   reg [31:0] fiber,fiber_p1;
   wire [6:0] error;
   wire       bad;
   reg [1:0]             prbs_select;
   reg [15:0]             fiber_delay;
   
   initial begin
      clk = 0;
      channel_counter = 0;
      rx_channel_counter = 5;
      reset = 0;
      prbs_select = 0;
      fiber_delay = 15;
      #1000 reset = 1;
      #2000 reset = 0;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 prbs_select = 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 prbs_select = 2;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 prbs_select = 3;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 fiber_delay = fiber_delay + 1;
      #10_000 $stop;
   end
 
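   //the "40G" clk: a 12.5 ps half period if the time unit is 1 ns.
   //The simulator time precision must resolve 0.1 ps (e.g. 1ns/100fs) or this delay gets rounded.
   always clk = #0.0125 ~ clk;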


/*******************************
 * tx mux side
 * output is data out which connects to "fiber" variable delay
 * ******************************/
   
   always @ (posedge clk)
     begin
        data_out <= bit_shifter[15];
        channel_counter <= channel_counter + 1;
        if (channel_counter[3:0]=='hf)
          begin
             bit_shifter <= {byte_shifter_f[7],
                          byte_shifter_e[7],
                          byte_shifter_d[7],
                          byte_shifter_c[7],
                          byte_shifter_b[7],
                          byte_shifter_a[7],
                          byte_shifter_9[7],
                          byte_shifter_8[7],
                          byte_shifter_7[7],
                          byte_shifter_6[7],
                          byte_shifter_5[7],
                          byte_shifter_4[7],
                          byte_shifter_3[7],
                          byte_shifter_2[7],
                          byte_shifter_1[7],
                          byte_shifter_0[7]};
             if (channel_counter[6:4] =='h7)
               begin
                  byte_shifter_f <= tx_prbs_f;
                  byte_shifter_e <= tx_prbs_e;
                  byte_shifter_d <= tx_prbs_d;
                  byte_shifter_c <= tx_prbs_c;
                  byte_shifter_b <= tx_prbs_b;
                  byte_shifter_a <= tx_prbs_a;
                  byte_shifter_9 <= tx_prbs_9;
                  byte_shifter_8 <= tx_prbs_8;
                  byte_shifter_7 <= tx_prbs_7;
                  byte_shifter_6 <= tx_prbs_6;
                  byte_shifter_5 <= tx_prbs_5;
                  byte_shifter_4 <= tx_prbs_4;
                  byte_shifter_3 <= tx_prbs_3;
                  byte_shifter_2 <= tx_prbs_2;
                  byte_shifter_1 <= tx_prbs_1;
                  byte_shifter_0 <= tx_prbs_0;
                  tx_clk <= 1;
               end         
             else
               begin
                  byte_shifter_f <= {byte_shifter_f,1'b0};
                  byte_shifter_e <= {byte_shifter_e,1'b0};
                  byte_shifter_d <= {byte_shifter_d,1'b0};
                  byte_shifter_c <= {byte_shifter_c,1'b0};
                  byte_shifter_b <= {byte_shifter_b,1'b0};
                  byte_shifter_a <= {byte_shifter_a,1'b0};
                  byte_shifter_9 <= {byte_shifter_9,1'b0};
                  byte_shifter_8 <= {byte_shifter_8,1'b0};
                  byte_shifter_7 <= {byte_shifter_7,1'b0};
                  byte_shifter_6 <= {byte_shifter_6,1'b0};
                  byte_shifter_5 <= {byte_shifter_5,1'b0};
                  byte_shifter_4 <= {byte_shifter_4,1'b0};
                  byte_shifter_3 <= {byte_shifter_3,1'b0};
                  byte_shifter_2 <= {byte_shifter_2,1'b0};
                  byte_shifter_1 <= {byte_shifter_1,1'b0};
                  byte_shifter_0 <= {byte_shifter_0,1'b0};
                  tx_clk <= 1'b0; 
               end
          end // if (channel_counter[3:0]=='hf)
        else
          begin
             bit_shifter <= {bit_shifter,1'b0};
             tx_clk <= 0;
          end // else: !if(channel_counter[3:0]=='hf)
     end // always @ (posedge clk)
   //instantiate the tx 
   tx_prbs sfi5_tx_test (
                         .i_clk(tx_clk),
                         .i_reset(reset),
                         .prbs_select(prbs_select),
                         .o_prbs_data_0(tx_prbs_0),
                         .o_prbs_data_1(tx_prbs_1),
                         .o_prbs_data_2(tx_prbs_2),
                         .o_prbs_data_3(tx_prbs_3),
                         .o_prbs_data_4(tx_prbs_4),
                         .o_prbs_data_5(tx_prbs_5),
                         .o_prbs_data_6(tx_prbs_6),
                         .o_prbs_data_7(tx_prbs_7),
                         .o_prbs_data_8(tx_prbs_8),
                         .o_prbs_data_9(tx_prbs_9),
                         .o_prbs_data_a(tx_prbs_a),
                         .o_prbs_data_b(tx_prbs_b),
                         .o_prbs_data_c(tx_prbs_c),
                         .o_prbs_data_d(tx_prbs_d),
                         .o_prbs_data_e(tx_prbs_e),
                         .o_prbs_data_f(tx_prbs_f)
                         );

/**********************************
 * here we add the variable delay between the mux and demux
 * *******************************/
   always @ (posedge clk)
     begin
        fiber <= {fiber,data_out};
        fiber_p1 <= fiber;
        data_in <= fiber[fiber_delay];  //this is the delay #
        //for checking the stream as one 40g prbs
        if (error_40)
          prbs_40 <= {fiber,data_out};
        else
          case (prbs_select)
            0:           prbs_40 <= {prbs_40,prbs_40[6]^prbs_40[5]};
            1:           prbs_40 <= {prbs_40,prbs_40[14]^prbs_40[13]};
            2:           prbs_40 <= {prbs_40,prbs_40[22]^prbs_40[17]};
            3:           prbs_40 <= {prbs_40,prbs_40[30]^prbs_40[27]};
          endcase // case(prbs_select)
        if (prbs_40[31:0] == fiber[31:0])
          error_40 <= 0;
        else
          error_40 <= 1;
     end
   
/******************************************************
 * and the demux on the receive side
 * **************************************/
   
always @ (posedge clk)
  begin
     rx_channel_counter <= rx_channel_counter + 1;
     rx_bit_shifter <= {rx_bit_shifter,data_in};
     if (rx_channel_counter[3:0] == 4'hf)
       begin
          rx_byte_shifter_f <= {rx_byte_shifter_f,rx_bit_shifter['hf]};
          rx_byte_shifter_e <= {rx_byte_shifter_e,rx_bit_shifter['he]};
          rx_byte_shifter_d <= {rx_byte_shifter_d,rx_bit_shifter['hd]};
          rx_byte_shifter_c <= {rx_byte_shifter_c,rx_bit_shifter['hc]};
          rx_byte_shifter_b <= {rx_byte_shifter_b,rx_bit_shifter['hb]};
          rx_byte_shifter_a <= {rx_byte_shifter_a,rx_bit_shifter['ha]};
          rx_byte_shifter_9 <= {rx_byte_shifter_9,rx_bit_shifter['h9]};
          rx_byte_shifter_8 <= {rx_byte_shifter_8,rx_bit_shifter['h8]};
          rx_byte_shifter_7 <= {rx_byte_shifter_7,rx_bit_shifter['h7]};
          rx_byte_shifter_6 <= {rx_byte_shifter_6,rx_bit_shifter['h6]};
          rx_byte_shifter_5 <= {rx_byte_shifter_5,rx_bit_shifter['h5]};
          rx_byte_shifter_4 <= {rx_byte_shifter_4,rx_bit_shifter['h4]};
          rx_byte_shifter_3 <= {rx_byte_shifter_3,rx_bit_shifter['h3]};
          rx_byte_shifter_2 <= {rx_byte_shifter_2,rx_bit_shifter['h2]};
          rx_byte_shifter_1 <= {rx_byte_shifter_1,rx_bit_shifter['h1]};
          rx_byte_shifter_0 <= {rx_byte_shifter_0,rx_bit_shifter['h0]};
          if (rx_channel_counter[6:4] == 3'h7)
            begin
               rx_prbs_f <= rx_byte_shifter_f;
               rx_prbs_e <= rx_byte_shifter_e;
               rx_prbs_d <= rx_byte_shifter_d;
               rx_prbs_c <= rx_byte_shifter_c;
               rx_prbs_b <= rx_byte_shifter_b;
               rx_prbs_a <= rx_byte_shifter_a;
               rx_prbs_9 <= rx_byte_shifter_9;
               rx_prbs_8 <= rx_byte_shifter_8;
               rx_prbs_7 <= rx_byte_shifter_7;
               rx_prbs_6 <= rx_byte_shifter_6;
               rx_prbs_5 <= rx_byte_shifter_5;
               rx_prbs_4 <= rx_byte_shifter_4;
               rx_prbs_3 <= rx_byte_shifter_3;
               rx_prbs_2 <= rx_byte_shifter_2;
               rx_prbs_1 <= rx_byte_shifter_1;
               rx_prbs_0 <= rx_byte_shifter_0;
               rx_clk <= 1;
            end
          else
            rx_clk <= 0;
       end // if (rx_channel_counter[3:0] == 4'hf)
     else
       rx_clk <= 0;
    end // always @ (posedge clk)
   
/*************************************
 * and of course instantiate the rx sfi5
 * ********************************/
sfi5_rx sfi5_rx_test (
                      .i_clk_in(rx_clk),
                      .i_reset(reset),
                      .prbs_select(prbs_select),
                      .i_ch_0(rx_prbs_0),
                      .i_ch_1(rx_prbs_1),
                      .i_ch_2(rx_prbs_2),
                      .i_ch_3(rx_prbs_3),
                      .i_ch_4(rx_prbs_4),
                      .i_ch_5(rx_prbs_5),
                      .i_ch_6(rx_prbs_6),
                      .i_ch_7(rx_prbs_7),
                      .i_ch_8(rx_prbs_8),
                      .i_ch_9(rx_prbs_9),
                      .i_ch_a(rx_prbs_a),
                      .i_ch_b(rx_prbs_b),
                      .i_ch_c(rx_prbs_c),
                      .i_ch_d(rx_prbs_d),
                      .i_ch_e(rx_prbs_e),
                      .i_ch_f(rx_prbs_f),
/* to skip the mux/demux, comment out the rx_prbs connections above and use these instead
                      .i_ch_0(tx_prbs_0),
                      .i_ch_1(tx_prbs_1),
                      .i_ch_2(tx_prbs_2),
                      .i_ch_3(tx_prbs_3),
                      .i_ch_4(tx_prbs_4),
                      .i_ch_5(tx_prbs_5),
                      .i_ch_6(tx_prbs_6),
                      .i_ch_7(tx_prbs_7),
                      .i_ch_8(tx_prbs_8),
                      .i_ch_9(tx_prbs_9),
                      .i_ch_a(tx_prbs_a),
                      .i_ch_b(tx_prbs_b),
                      .i_ch_c(tx_prbs_c),
                      .i_ch_d(tx_prbs_d),
                      .i_ch_e(tx_prbs_e),
                      .i_ch_f(tx_prbs_f),

*/
                      
                      .i_ch_deskew(8'h0),
                      .o_error(error),
                      .o_bad(bad)
                      );

 always @ (error)
   $display("error is %d",error);
endmodule // test_tx_rx
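
The 40G stream check buried in the test bench (the prbs_40 / error_40 logic) is easier to follow on its own. Below is a minimal sketch of the same idea reduced to a PN7 with a 7 bit history. The module names, port names and the single-bit error injection are assumptions made for this illustration; they are not taken from the listing above. While in error, the checker reseeds its local LFSR from the received bits; once the two agree, it free-runs and compares.

```verilog
// Sketch only: prbs7_check_sketch and its ports are hypothetical names.
module prbs7_check_sketch (
   input      clk,
   input      din,      // serial data from the line
   output reg err       // 1 while the local LFSR disagrees with the line
);
   reg [6:0] hist;      // last 7 received bits (plays the role of "fiber")
   reg [6:0] lfsr;      // local reference LFSR (plays the role of "prbs_40")

   initial begin        // simulation needs some state to start with
      hist = 7'h00;
      lfsr = 7'h00;
      err  = 1'b1;
   end

   always @ (posedge clk)
     begin
        hist <= {hist[5:0], din};
        if (err)
          lfsr <= {hist[5:0], din};                // reseed from the line
        else
          lfsr <= {lfsr[5:0], lfsr[6] ^ lfsr[5]};  // free run, PN7 taps
        err <= (lfsr != hist);
     end
endmodule // prbs7_check_sketch

// Usage: feed it a PN7, then corrupt one bit and watch err.
module prbs7_check_demo;
   reg       clk, flip;
   reg [6:0] gen;       // PN7 source, same taps as prbs_generate
   wire      err;

   prbs7_check_sketch dut (.clk(clk), .din(gen[6] ^ flip), .err(err));

   always #5 clk = ~clk;
   always @ (posedge clk)
     gen <= {gen[5:0], gen[6] ^ gen[5]};

   initial begin
      clk = 0; flip = 0; gen = 7'h7f;
      repeat (30) @ (posedge clk);
      #1 $display("after 30 clocks: err=%b", err);
      flip = 1;                      // corrupt exactly one bit on the line
      @ (posedge clk) #1 flip = 0;
      repeat (20)
        @ (posedge clk) #1 $display("t=%0t err=%b", $time, err);
      $finish;
   end
endmodule // prbs7_check_demo
```

Because the reseed copies whatever is on the line, a corrupted bit can be loaded into the local LFSR and flag again when it reaches the taps, so one bad bit may show up as more than one error indication. Like any LFSR, the checker also sits quietly on an all 0s input, so a dead link looks locked unless that case is caught separately.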


Still have questions?