For really fast PRBS, we run multiple wide generators in parallel

That's great, but what if we want to go even faster, as in a PRBS for a 40G link with an SFI5 interface, and still make timing? It all comes down to one sentence at the heart of the explanation: a sub-sampled PRBS has the same pattern, but with a phase shift. So if a PRBS stream was going by and you only grabbed every other bit and checked the pattern, you would find you have the same pattern, but you would be at a different point in the pattern. Take every 4th bit, or every 16th bit, and it is still the same thing. The property holds when the sub-sampling step is a power of two; other steps generally give a different sequence. So what that means is we can run several smaller PRBS generators in parallel, and as long as we get the alignment right, they will mux to a higher bit rate PRBS of the same pattern.
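The sub-sampling property is easy to check in software before committing to hardware. The sketch below is a small Python model, not part of the original design: the function names and the seed are made up for illustration, and it only looks at PRBS7 with the same taps (bits 6 and 5) used in the Verilog on this page. It sub-samples the stream by 2, 4 and 16, and by 3 for comparison, and tests whether each result is a rotation of the original pattern.

```python
def prbs_bits(taps, seed, count):
    """Return `count` bits from a shift-left LFSR; taps are 0-indexed state bits."""
    hi, lo = taps
    mask = (1 << (hi + 1)) - 1
    state, out = seed, []
    for _ in range(count):
        new = ((state >> hi) ^ (state >> lo)) & 1
        state = ((state << 1) | new) & mask
        out.append(new)
    return out


def is_phase_shift(candidate, reference_period):
    """True if `candidate` (one full period) is a rotation of `reference_period`."""
    n = len(reference_period)
    doubled = reference_period + reference_period
    return any(doubled[s:s + n] == candidate for s in range(n))


PERIOD = 127  # PRBS7 repeats every 2**7 - 1 bits
reference = prbs_bits((6, 5), seed=1, count=PERIOD)


def decimate(step):
    """Keep every `step`-th bit until one full period has been collected."""
    stream = prbs_bits((6, 5), seed=1, count=PERIOD * step)
    return stream[::step]


results = {step: is_phase_shift(decimate(step), reference) for step in (2, 4, 16, 3)}
print(results)
```

In this model the power-of-two steps come back as the same pattern at a different phase, while the step of 3 does not, which is why a lane count of 16 works for the parallel generators.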

To begin, we build an 8 bit PRBS generator. This time I'm going to put in a few patterns with a pattern select. Although we only need 8 bits in the output, we need up to 31 bits of state, so the internals run wider.

module tx_prbs_8 
  (
   // Outputs
   o_prbs_data,
   // Inputs
   i_clk, i_ld_data_15, i_ld_data_23, i_ld_data_31, i_ld_data_7,
   i_load, prbs_select
   );

   localparam WIDTH = 8;
   input              i_clk;
   input [31:0]       i_ld_data_15, i_ld_data_23, i_ld_data_31, i_ld_data_7;
   output [WIDTH-1:0] o_prbs_data;
   input              i_load;
   input [1:0]        prbs_select;

   reg                load;
   reg [31:0]         prbs_data_15, prbs_data_23, prbs_data_31, prbs_data_7, d; //d is a scratch variable
   reg [WIDTH-1:0]    o_prbs_data;

   always @ (posedge i_clk)
     begin
        load <= i_load; //buffer for fanout
        if (load)
          begin
             prbs_data_15 <= i_ld_data_15;
             prbs_data_23 <= i_ld_data_23;
             prbs_data_31 <= i_ld_data_31;
             prbs_data_7 <= i_ld_data_7;
          end
        else
          begin
             d = prbs_data_15;
             repeat (WIDTH) d = {d,d[14]^d[13]};
             prbs_data_15 <= d;
             d = prbs_data_23;
             repeat (WIDTH) d = {d,d[22]^d[17]};
             prbs_data_23 <= d;
             d = prbs_data_31;
             repeat (WIDTH) d = {d,d[30]^d[27]};
             prbs_data_31 <= d;
             d = prbs_data_7;
             repeat (WIDTH) d = {d,d[6]^d[5]};
             prbs_data_7 <= d;
          end
        case (prbs_select)
          0: o_prbs_data <= prbs_data_7[WIDTH-1:0];
          1: o_prbs_data <= prbs_data_15[WIDTH-1:0];
          2: o_prbs_data <= prbs_data_23[WIDTH-1:0];
          3: o_prbs_data <= prbs_data_31[WIDTH-1:0];
        endcase // case(prbs_select)
        
     end// @ (posedge clk)
endmodule // tx_prbs_8
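To see what the wide update does, it can help to model it in software. The following is a rough Python sketch under a few assumptions: the helper names are invented here, the seed is arbitrary, and the `load` and output registers of `tx_prbs_8` are left out, so the model lines up with the Verilog only up to a fixed latency. It checks that eight unrolled steps per clock give the same bits as a one-bit-at-a-time generator, with bit 7 of each word being the oldest bit.

```python
MASK32 = 0xFFFFFFFF
WIDTH = 8
# 0-indexed tap positions, copied from the Verilog listing
TAPS = {7: (6, 5), 15: (14, 13), 23: (22, 17), 31: (30, 27)}


def serial_step(state, taps):
    """One bit time: shift left and insert the XOR of the two taps at bit 0."""
    hi, lo = taps
    new = ((state >> hi) ^ (state >> lo)) & 1
    return ((state << 1) | new) & MASK32


def wide_step(state, taps):
    """One clock of the wide generator: WIDTH serial steps unrolled."""
    for _ in range(WIDTH):
        state = serial_step(state, taps)
    return state


def run_wide(seed, taps, clocks):
    """Return one WIDTH-bit word per clock, taken from the low bits of the state."""
    state, words = seed, []
    for _ in range(clocks):
        state = wide_step(state, taps)
        words.append(state & ((1 << WIDTH) - 1))
    return words


def run_serial(seed, taps, bits):
    """Reference generator producing one bit per step."""
    state, out = seed, []
    for _ in range(bits):
        state = serial_step(state, taps)
        out.append(state & 1)
    return out


def unpack(words):
    """Flatten words to bits, bit 7 (the oldest) first."""
    return [(w >> b) & 1 for w in words for b in range(WIDTH - 1, -1, -1)]


SEED = 0x12345678  # arbitrary non-zero seed for the model
matches = {
    name: unpack(run_wide(SEED, taps, 32)) == run_serial(SEED, taps, 32 * WIDTH)
    for name, taps in TAPS.items()
}
print(matches)
```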

Great, but now we have to instantiate 16 of them, and have their phases interleaved to match the SFI5 interleaving pattern. With 8 bits per channel handed to the SERDES and 16 channels, we produce 128 bits of PRBS per clock, and with 32 bits of state in each of the 16 generators we need 512 bits of PRBS to initialize them. For the initialization, we start with a non-zero number and then run it through a PRBS generator 512 times. Since the output never changes, synthesis will optimize this away. prbs_init_7, prbs_init_15, prbs_init_23 and prbs_init_31 are the 4 initial values we use to preset the 16 PRBS generators so they are synced up and interleaved to give a full speed (40 Gb/s) pattern. The next step is to stripe the data across the generators. This is done in a for loop. The outputs are in the form prbs_ld_7[f], where the f is the generator number and the 7 is the pattern type. Although there's a bunch of code, again, it never changes, so it should synthesize to nothing. Finally, the 16 PRBS generators are instantiated with their 4 initial values for the 4 patterns.

module tx_prbs (/*autoarg*/
   // Outputs
   o_prbs_data_f, o_prbs_data_e, o_prbs_data_d, o_prbs_data_c,
   o_prbs_data_b, o_prbs_data_a, o_prbs_data_9, o_prbs_data_8,
   o_prbs_data_7, o_prbs_data_6, o_prbs_data_5, o_prbs_data_4,
   o_prbs_data_3, o_prbs_data_2, o_prbs_data_1, o_prbs_data_0,
   // Inputs
   i_clk, i_reset, prbs_select
   );

   input i_clk;
   input i_reset;
   input [1:0] prbs_select;
   output [7:0] o_prbs_data_f, o_prbs_data_e, o_prbs_data_d, o_prbs_data_c,
                o_prbs_data_b, o_prbs_data_a, o_prbs_data_9, o_prbs_data_8,
                o_prbs_data_7, o_prbs_data_6, o_prbs_data_5, o_prbs_data_4,
                o_prbs_data_3, o_prbs_data_2, o_prbs_data_1, o_prbs_data_0;

   reg [511:0]         d,prbs_init_15, prbs_init_23, prbs_init_31, prbs_init_7; //d needs to be a reg as integers are only 32 bits

   reg [31:0]         d_7, d_15, d_23, d_31;
   reg [31:0]         prbs_ld_7[0:15];
   reg [31:0]         prbs_ld_15[0:15];
   reg [31:0]         prbs_ld_23[0:15];
   reg [31:0]         prbs_ld_31[0:15];
   wire [7:0]         prbs_data[0:15];

   integer         i,j;

   always @ (posedge i_clk)
     begin
        /*****************************************************
         * first we generate 512 bits of each of the patterns for initialization
         * ***************************************************/
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[6]^d[5])};//prbs7 inverted
        prbs_init_7 <= d;
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[14]^d[13])}; //prbs15
        prbs_init_15 <= d;
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[22]^d[17])}; //prbs 23
        prbs_init_23 <= d;
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[30]^d[27])}; //prbs31
        prbs_init_31 <= d;
        /******************************************************
         * then we stripe them across the generators
         * ********************************************/
        for (i=0;i<16;i=i+1)
          begin
             for (j=0;j<32;j=j+1)
               begin
                  d_7[j]  = prbs_init_7[j*16  + i];
                  d_15[j] = prbs_init_15[j*16 + i];
                  d_23[j] = prbs_init_23[j*16 + i];
                  d_31[j] = prbs_init_31[j*16 + i];
               end // for (j=0;j<32;j=j+1)
             prbs_ld_7[i]  <= d_7;
             prbs_ld_15[i] <= d_15;
             prbs_ld_23[i] <= d_23;
             prbs_ld_31[i] <= d_31;
          end // for (i=0;i<16;i=i+1)
     end // always @ (posedge i_clk)
   /***********************************************
    * then we instantiate the generators
    * *********************************************/
   generate
   genvar k;
      for (k=0;k<16;k=k+1)
        begin: tx_prbs_8_inst
           tx_prbs_8 tx_prbs_8_local 
             (.i_clk(i_clk), 
              .i_ld_data_7(prbs_ld_7[k]), 
              .i_ld_data_15(prbs_ld_15[k]), 
              .i_ld_data_23(prbs_ld_23[k]),
              .i_ld_data_31(prbs_ld_31[k]),
              .o_prbs_data(prbs_data[k]), 
              .i_load(i_reset), 
              .prbs_select(prbs_select));
        end
   endgenerate

   //and connect the generator outputs to the module outputs
   assign o_prbs_data_0 = prbs_data['h0];
   assign o_prbs_data_1 = prbs_data['h1];
   assign o_prbs_data_2 = prbs_data['h2];
   assign o_prbs_data_3 = prbs_data['h3];
   assign o_prbs_data_4 = prbs_data['h4];
   assign o_prbs_data_5 = prbs_data['h5];
   assign o_prbs_data_6 = prbs_data['h6];
   assign o_prbs_data_7 = prbs_data['h7];
   assign o_prbs_data_8 = prbs_data['h8];
   assign o_prbs_data_9 = prbs_data['h9];
   assign o_prbs_data_a = prbs_data['ha];
   assign o_prbs_data_b = prbs_data['hb];
   assign o_prbs_data_c = prbs_data['hc];
   assign o_prbs_data_d = prbs_data['hd];
   assign o_prbs_data_e = prbs_data['he];
   assign o_prbs_data_f = prbs_data['hf];

endmodule // tx_prbs
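A quick way to gain confidence in the striping is to model it in software and compare the muxed lanes against a plain serial generator. The Python sketch below does that under stated assumptions: the function names are invented for this example, the pipeline registers and `i_reset` handling are not modelled, and the serializing order (oldest bit first, lane 15 before lane 0) is simply the order that falls out of the striping loop above. Whether that order matches the SFI5 lane assignment is not checked here.

```python
MASK32 = (1 << 32) - 1
MASK512 = (1 << 512) - 1
LANES = 16
WIDTH = 8
TAPS = {7: (6, 5), 15: (14, 13), 23: (22, 17), 31: (30, 27)}


def step(state, taps, mask):
    """Shift left by one and insert the XOR of the two tap bits at bit 0."""
    hi, lo = taps
    new = ((state >> hi) ^ (state >> lo)) & 1
    return ((state << 1) | new) & mask


def init_pattern(taps, seed=0x12345678):
    """Mirror of the 512-step initialization loop in tx_prbs."""
    d = seed
    for _ in range(512):
        d = step(d, taps, MASK512)
    return d


def stripe(init):
    """Lane i receives bits i, 16+i, 32+i, ... of the 512-bit pattern."""
    lanes = []
    for i in range(LANES):
        word = 0
        for j in range(32):
            word |= ((init >> (j * LANES + i)) & 1) << j
        lanes.append(word)
    return lanes


def clock_lanes(lanes, taps):
    """Advance every lane by WIDTH bits; return the new states and output words."""
    new_lanes = []
    for s in lanes:
        for _ in range(WIDTH):
            s = step(s, taps, MASK32)
        new_lanes.append(s)
    return new_lanes, [s & 0xFF for s in new_lanes]


def interleave(words):
    """Serialize one clock of lane outputs: oldest bit first, lane 15 before lane 0."""
    return [(words[i] >> b) & 1
            for b in range(WIDTH - 1, -1, -1)
            for i in range(LANES - 1, -1, -1)]


def check(taps, clocks=4):
    """True if the muxed lanes equal the serial stream that continues the init pattern."""
    init = init_pattern(taps)
    lanes, muxed = stripe(init), []
    for _ in range(clocks):
        lanes, words = clock_lanes(lanes, taps)
        muxed += interleave(words)
    ref_state, reference = init, []
    for _ in range(clocks * LANES * WIDTH):
        ref_state = step(ref_state, taps, MASK512)
        reference.append(ref_state & 1)
    return muxed == reference


print({name: check(taps) for name, taps in TAPS.items()})
```

Each lane runs the same polynomial as the full-rate stream, which is the sub-sampling property at work: lane i is just the full pattern sampled every 16th bit at its own phase.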

But wait, we need this to hook up to SFI5, which requires an additional deskew channel. No problem.
