Pseudo-random binary sequences (PRBS) and their Verilog implementation using a linear feedback shift register (LFSR), ultimately connecting to an SFI5 interface.
PRBS sequences often come in handy in FPGA work. They're almost perfectly balanced; in other words, the duty cycle, or ratio of ones to zeros, is very close to even. Although the pattern is known and predictable, it makes a good substitute for a true random number. In this discussion we will be talking about 'maximal LFSRs', which produce a maximum length sequence. A PRBS is sometimes also referred to as a pseudo-random number sequence, or PN. Usually the sequence is referred to by its length: a PN7, for example, has a pattern length of 2^7-1 = 127 bits.
So let's start with an example for a PN7. To build an LFSR we need a shift register and one or a few xor gates.
You can check out the Xilinx App Note for a similar approach using their SRLs. But we have more, so keep reading.
module prbs_generate (
   // Outputs
   prbs,
   // Inputs
   clk, reset
   );
   output prbs;
   input  clk, reset;
   parameter PN = 7, TAP1 = 6, TAP2 = 5;
   reg [PN-1:0] prbs_state;
   wire         prbs;
   integer      d;
   assign prbs = prbs_state[PN-1];
   always @ (posedge clk)
     if (reset) prbs_state <= 1; //anything but the all 0s case is fine.
     else prbs_state <= {prbs_state,prbs_state[TAP1]^prbs_state[TAP2]};
endmodule // prbs_generate
This code generates a PN7 by taking two 'taps', as they're called, from the shift register, exclusive-ORing them together, and feeding the result back into the shift register. Left to run on its own, it will generate a 127 bit long pattern and then repeat. In that pattern we have 64 1s and 63 0s. It is always one zero that is missing from the perfect balance. The pattern also contains all of the 7 bit combinations except the all 0s case. This gives us a maximum run length of 7 bits for 1s and 6 bits for 0s. This is useful to know if you are working on a link that has a run length limit, due to AC coupling for example. The all 0s case is a special case that should be avoided: if you start there, you will never leave.
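The numbers above are easy to check in simulation. The following is a minimal, simulation-only sketch (the module name pn7_properties and its variables are made up for this illustration) that steps the same PN7 recurrence through one period and counts ones, zeros and run lengths. With the seed of 1 used here it should print period=127 ones=64 zeros=63 max_run1=7 max_run0=6.

```verilog
// Simulation-only sketch: count the properties of one PN7 period.
// Uses the same taps as prbs_generate (TAP1 = 6, TAP2 = 5).
module pn7_properties;            // hypothetical testbench name
   reg [6:0] state;
   reg       bit_out;
   integer   i, ones, zeros, run1, run0, max1, max0, period;

   initial begin
      state = 7'd1;               // any non-zero seed gives the same counts
      ones = 0; zeros = 0;
      run1 = 0; run0 = 0; max1 = 0; max0 = 0;
      period = 0;
      for (i = 1; i <= 127; i = i + 1) begin
         bit_out = state[6];      // same output bit as prbs_generate
         if (bit_out) begin
            ones = ones + 1; run1 = run1 + 1; run0 = 0;
            if (run1 > max1) max1 = run1;
         end
         else begin
            zeros = zeros + 1; run0 = run0 + 1; run1 = 0;
            if (run0 > max0) max0 = run0;
         end
         state = {state[5:0], state[6] ^ state[5]};
         // remember the first step at which we are back at the seed
         if (state == 7'd1 && period == 0) period = i;
      end
      $display("period=%0d ones=%0d zeros=%0d max_run1=%0d max_run0=%0d",
               period, ones, zeros, max1, max0);
   end
endmodule
```

The runs are counted linearly rather than cyclically. That is safe for this seed because the period starts on a 1-to-0 boundary, so no run wraps around the end.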
The next part is optional, depending on what you're doing. If you're just checking the eye pattern or running an arbiter, the generator alone is fine, but you may want to check the data coming back. So we need a checker. We start by building another PRBS machine, but this time we add the ability to load the state of the PRBS to match the incoming data. When do you load? You have a choice. You can load it with an external signal, but I find it easiest to check the error rate and reload if it rises above a certain level, since at that point you're probably lost. This way "It just works."
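module prbs_check (
   // Outputs
   error,
   // Inputs
   clk, reset, prbs
   );
   output error;
   input  clk, reset, prbs;
   parameter PN = 7, TAP1 = 6, TAP2 = 5;
   reg          error;
   reg [PN-1:0] prbs_state; //locally generated copy of the pattern
   reg [PN-1:0] prbs_pipe;  //last PN received bits, newest in bit 0
   reg [7:0]    error_pipe; //last 8 error flags
   reg          load;
   reg [3:0]    check;      //temp: number of errors in error_pipe
   integer      d;
   always @ (posedge clk)
     if (reset)
       begin
          error <= 1'h0;
          error_pipe <= 8'h0;
          load <= 1'h0;
          prbs_state <= 1;
          prbs_pipe <= 1;
       end // if (reset)
     else
       begin
          prbs_pipe <= {prbs_pipe,prbs};
          //on a reload, copy the received bits (including the newest one) so
          //that prbs_state and prbs_pipe hold the same window of the pattern
          prbs_state <= (load) ? {prbs_pipe,prbs} : {prbs_state,prbs_state[TAP1]^prbs_state[TAP2]};
          //compare bits of the same age. Comparing against the live input
          //would leave a reloaded state behind the data, so it would never lock.
          error <= prbs_state[PN-1] ^ prbs_pipe[PN-1];
          error_pipe <= {error_pipe,error};
          check = 0; //blocking on purpose: check and d are temps, not state
          for (d=0;d<8;d=d+1) check = check + error_pipe[d];
          load <= (check > 3);
       end // else: !if(reset)
endmodule // prbs_check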
But what if we want to run faster than the FPGA clock and produce multiple bits per clock cycle, either to feed a mux for higher data rates or simply because we need more than one bit per clock? This time we're using a PN31.
module prbs_wide_generate (
   // Outputs
   prbs,
   // Inputs
   clk, reset
   );
   parameter WIDTH = 128,
     PN = 31,//not used but good to know
     TAP1 = 30, TAP2 = 27;
   output [WIDTH-1:0] prbs;
   input              clk, reset;
   reg [WIDTH-1:0]    prbs;
   reg [WIDTH-1:0]    d;//d is a temp variable
   always @ (posedge clk)
     if (reset) prbs <= 1; //anything but the all 0s case is fine.
     else
       begin
          d = prbs; //blocking assignment used on purpose here
          repeat (WIDTH) d = {d,d[TAP1]^d[TAP2]};//again blocking is intentional
          prbs <= d;
       end // else: !if(reset)
endmodule // prbs_wide_generate
There are a few things to note in this code. Even though the PN31 only needs 31 bits of state, the register is WIDTH bits wide because it doubles as the output word; only its lowest 31 bits are used to calculate the next word. More confusing is the use of blocking assignments. The few lines of code between the begin and end use both blocking and non-blocking assignments. The idea is to have the complete 128 bits calculated all in one clock cycle. I like to think of it as spinning the generator 128 times per clock cycle. If we used only non-blocking assignments, we would get just one rotation per clock cycle. This way, the current state is transferred to 'd', spun 128 times, and then transferred back to 'prbs'. Note you may get a warning on the register 'd' because it optimizes out in synthesis.
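To convince yourself that the unrolled loop really is the 1-bit generator spun 128 times, a small simulation-only sketch can compare the two. The module name wide_vs_serial and its variables are made up for this illustration; the taps are the PN31 taps used above. It advances the wide register the same way prbs_wide_generate does, runs a plain 31-bit shift register for 128 single steps alongside it, and collects that register's output bits into a word. The two words should match every time, so the sketch should print mismatches=0.

```verilog
// Simulation-only sketch: the wide step equals 128 steps of a 31-bit LFSR.
module wide_vs_serial;            // hypothetical testbench name
   parameter WIDTH = 128, TAP1 = 30, TAP2 = 27;
   reg [WIDTH-1:0] wide, collected, d;
   reg [30:0]      lfsr;          // plain PN31 shift register, 31 bits of state
   reg             newbit;
   integer         i, w, mismatches;

   initial begin
      wide = 1; lfsr = 31'd1; collected = 0; mismatches = 0;
      for (w = 0; w < 4; w = w + 1) begin
         // one "clock" of the wide generator: 128 spins in one step
         d = wide;
         repeat (WIDTH) d = {d[WIDTH-2:0], d[TAP1] ^ d[TAP2]};
         wide = d;
         // the 31-bit generator, one bit at a time, newest bit shifted into bit 0
         for (i = 0; i < WIDTH; i = i + 1) begin
            newbit    = lfsr[TAP1] ^ lfsr[TAP2];
            lfsr      = {lfsr[29:0], newbit};
            collected = {collected[WIDTH-2:0], newbit};
         end
         if (wide !== collected) mismatches = mismatches + 1;
      end
      $display("mismatches=%0d", mismatches);
   end
endmodule
```

Everything here uses blocking assignments because it is a testbench, not clocked logic. In the real generator only the temporary 'd' is blocking, and the final transfer to the register stays non-blocking.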
We will assume the above generator is driving the signal that we want to check.
module prbs_wide_check (
   // Outputs
   error,
   // Inputs
   prbs, clk, reset
   );
   parameter WIDTH = 128,
     PN = 31,//not used but good to know
     TAP1 = 30, TAP2 = 27;
   output [8:0]      error;
   input [WIDTH-1:0] prbs;
   input             clk, reset;
   reg [8:0]         error;
   reg [WIDTH-1:0]   prbs_state, check, d;//d is a temp variable
   reg               load;
   integer           i;
   always @ (posedge clk)
     if (reset)
       begin
          prbs_state <= 1; //anything but the all 0s case is fine.
          check <= 0;      //keeps X out of the reload path in simulation
          error <= 0;
          load <= 0;
       end
     else
       begin
          //when lost, restart from the received word instead of our own state
          //(assumed reload path: load is computed below and has to feed back here)
          d = load ? prbs : prbs_state; //blocking assignment used on purpose here
          repeat (WIDTH) d = {d,d[TAP1]^d[TAP2]};//again blocking is intentional
          prbs_state <= d;
          check <= prbs ^ prbs_state;
          d = 0;
          for (i=0;i<WIDTH;i=i+1) d = d + check[i]; //count the bit errors
          error <= d;
          load <= error > 25; //error rate to reload
       end // else: !if(reset)
endmodule // prbs_wide_check
This checker has an automatic reload built in. If you want an external load, you can 'or' in an additional external load signal, or you can remove the automatic reload and reload only on an external signal. The reset block is mostly for simulation. With the auto load, the checker will automatically align to the incoming data. The calculations for the error signals and reload have a few pipe stages that are optional, depending on the speed you are running at.
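As a sketch of the two reload options, the load decision could be pulled out into a small helper like the one below. The module reload_ctl and its ext_load and auto_en ports are hypothetical names introduced here, not part of the design above; the threshold of 25 is the one used in prbs_wide_check.

```verilog
// Hypothetical helper: combines automatic and external reload requests.
module reload_ctl (
   input            clk,
   input      [8:0] error,     // per-clock error count from the checker
   input            ext_load,  // external "force reload" request
   input            auto_en,   // 1 = also reload automatically on a high error rate
   output reg       load
   );
   parameter THRESHOLD = 25;   // same limit as prbs_wide_check
   always @ (posedge clk)
     load <= ext_load | (auto_en & (error > THRESHOLD));
endmodule
```

Tying auto_en high and ext_load low gives the automatic behavior described above. Tying auto_en low leaves the reload entirely under external control.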
Well that's great, but what if we want to go even faster, like a PRBS for a 40G link with an SFI5 interface, and still be able to make timing? It all comes down to one sentence at the heart of the explanation: a sub-sampled PRBS has the same pattern, but with a phase shift. So if a PRBS stream was going by and you only grabbed every other bit and checked the pattern, you would find you have the same pattern, but you would be at a different point in the pattern. If you take every 4th bit, or every 16th bit, still the same thing. This holds when the step is a power of two; other steps, such as every 3rd bit, generally give a different sequence. What that means is we can run several smaller PRBS generators in parallel (a power-of-two number of them), and as long as we get the alignment right, they will mux to a higher bit rate PRBS of the same pattern.
To begin, we build an 8 bit PRBS generator. This time I'm going to put in a few patterns with a pattern select. Although we only need 8 bits in the output, we need up to 31 bits of state, so the internals are running wider.
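A quick way to test the subsampling property for a given step is to record one period of the PN7 and check whether every step-th bit still obeys the PN7 recurrence o(t+7) = o(t) ^ o(t+1). The sketch below is simulation-only, and the module and function names are made up for this illustration. Steps of 2 and 16 should report zero mismatches, while steps such as 3 or 57 should report a non-zero count. So the property is safe to rely on for power-of-two steps, which is what a 16-lane split uses.

```verilog
// Simulation-only sketch: does every STEP-th bit of a PN7 still follow PN7?
module decimate_check;            // hypothetical testbench name
   parameter N = 127;             // PN7 period
   reg [0:N-1] seq;               // one full period, seq[t] = bit number t
   reg [6:0]   state;
   integer     t;

   // Count the positions where the subsampled stream breaks the recurrence.
   // Indices wrap modulo N because the pattern repeats.
   function integer mismatches;
      input integer step;
      integer k;
      begin
         mismatches = 0;
         for (k = 0; k < N; k = k + 1)
           if (seq[((k+7)*step) % N] !==
               (seq[(k*step) % N] ^ seq[((k+1)*step) % N]))
             mismatches = mismatches + 1;
      end
   endfunction

   initial begin
      state = 7'd1;
      for (t = 0; t < N; t = t + 1) begin
         seq[t] = state[6];
         state  = {state[5:0], state[6] ^ state[5]};
      end
      $display("step  2: %0d mismatches", mismatches(2));
      $display("step 16: %0d mismatches", mismatches(16));
      $display("step  3: %0d mismatches", mismatches(3));
      $display("step 57: %0d mismatches", mismatches(57));
   end
endmodule
```

A maximal LFSR has only one non-zero cycle, so a subsampled stream that obeys the same recurrence everywhere has to be the same pattern at a different phase.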
module tx_prbs_8 (
   // Outputs
   o_prbs_data,
   // Inputs
   i_clk, i_ld_data_15, i_ld_data_23, i_ld_data_31, i_ld_data_7, i_load,
   prbs_select
   );
   localparam WIDTH = 8;
   input              i_clk;
   input [31:0]       i_ld_data_15, i_ld_data_23, i_ld_data_31, i_ld_data_7;
   output [WIDTH-1:0] o_prbs_data;
   input              i_load;
   input [1:0]        prbs_select;
   reg                load;
   reg [31:0]         prbs_data_15, prbs_data_23, prbs_data_31, prbs_data_7, d;
   reg [WIDTH-1:0]    o_prbs_data;
   always @ (posedge i_clk)
     begin
        load <= i_load; //buffer for fanout
        if (load)
          begin
             prbs_data_15 <= i_ld_data_15;
             prbs_data_23 <= i_ld_data_23;
             prbs_data_31 <= i_ld_data_31;
             prbs_data_7 <= i_ld_data_7;
          end
        else
          begin
             d = prbs_data_15;
             repeat (WIDTH) d = {d,d[14]^d[13]};
             prbs_data_15 <= d;
             d = prbs_data_23;
             repeat (WIDTH) d = {d,d[22]^d[17]};
             prbs_data_23 <= d;
             d = prbs_data_31;
             repeat (WIDTH) d = {d,d[30]^d[27]};
             prbs_data_31 <= d;
             d = prbs_data_7;
             repeat (WIDTH) d = {d,d[6]^d[5]};
             prbs_data_7 <= d;
          end
        case (prbs_select)
          0: o_prbs_data <= prbs_data_7[WIDTH-1:0];
          1: o_prbs_data <= prbs_data_15[WIDTH-1:0];
          2: o_prbs_data <= prbs_data_23[WIDTH-1:0];
          3: o_prbs_data <= prbs_data_31[WIDTH-1:0];
        endcase // case(prbs_select)
     end // always @ (posedge i_clk)
endmodule // tx_prbs_8
Great, but now we have to instantiate 16 of them and interleave their phases to match the SFI5 interleaving pattern. With 8 bits per channel handed to the SERDES and 16 channels, we need 128 bits of PRBS per clock, and since each generator carries 32 bits of state, the initialization needs 16 x 32 = 512 bits. For the initialization, we start with a non-zero number and then run it through a PRBS generator 512 times. Since the result never changes, synthesis will optimize this away. prbs_init_7, prbs_init_15, prbs_init_23 and prbs_init_31 are the 4 initial values we use to preset the 16 PRBS generators so they are synced up and interleaved to give a full speed (40 Gb/s) pattern. The next step is to stripe the data across the generators. This is done in a for loop. The results are stored as prbs_ld_7[f], where f is the generator number and 7 is the pattern type.
Although there's a bunch of code, again, it never changes, so it should synthesize to nothing. Finally the 16 PRBS generators are instantiated with their 4 initial values for the 4 patterns.
module tx_prbs (/*autoarg*/
   // Outputs
   o_prbs_data_f, o_prbs_data_e, o_prbs_data_d, o_prbs_data_c,
   o_prbs_data_b, o_prbs_data_a, o_prbs_data_9, o_prbs_data_8,
   o_prbs_data_7, o_prbs_data_6, o_prbs_data_5, o_prbs_data_4,
   o_prbs_data_3, o_prbs_data_2, o_prbs_data_1, o_prbs_data_0,
   // Inputs
   i_clk, i_reset, prbs_select
   );
   input        i_clk;
   input        i_reset;
   input [1:0]  prbs_select;
   output [7:0] o_prbs_data_f, o_prbs_data_e, o_prbs_data_d, o_prbs_data_c,
                o_prbs_data_b, o_prbs_data_a, o_prbs_data_9, o_prbs_data_8,
                o_prbs_data_7, o_prbs_data_6, o_prbs_data_5, o_prbs_data_4,
                o_prbs_data_3, o_prbs_data_2, o_prbs_data_1, o_prbs_data_0;
   //d needs to be a reg as integers are only 32 bits
   reg [511:0]  d, prbs_init_15, prbs_init_23, prbs_init_31, prbs_init_7;
   reg [31:0]   d_7, d_15, d_23, d_31;
   reg [31:0]   prbs_ld_7[0:15];
   reg [31:0]   prbs_ld_15[0:15];
   reg [31:0]   prbs_ld_23[0:15];
   reg [31:0]   prbs_ld_31[0:15];
   wire [7:0]   prbs_data[0:15];
   integer      i,j;
   always @ (posedge i_clk)
     begin
        /*****************************************************
         * first we generate 512 bits of each of the patterns for initialization
         ***************************************************/
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[6]^d[5])};//prbs7 inverted
        prbs_init_7 <= d;
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[14]^d[13])}; //prbs15
        prbs_init_15 <= d;
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[22]^d[17])}; //prbs 23
        prbs_init_23 <= d;
        d = 512'h12345678;//needs to be non-zero in the lower bits.
        repeat (512) d = {d,(d[30]^d[27])}; //prbs31
        prbs_init_31 <= d;
        /******************************************************
         * then we stripe them across the generators
         ********************************************/
        for (i=0;i<16;i=i+1)
          begin
             for (j=0;j<32;j=j+1)
               begin
                  d_7[j] = prbs_init_7[j*16 + i];
                  d_15[j] = prbs_init_15[j*16 + i];
                  d_23[j] = prbs_init_23[j*16 + i];
                  d_31[j] = prbs_init_31[j*16 + i];
               end // for (j=0;j<32;j=j+1)
             prbs_ld_7[i] <= d_7;
             prbs_ld_15[i] <= d_15;
             prbs_ld_23[i] <= d_23;
             prbs_ld_31[i] <= d_31;
          end // for (i=0;i<16;i=i+1)
     end // always @ (posedge i_clk)
   /***********************************************
    * then we instantiate the generators
    *********************************************/
   generate
      genvar k;
      for (k=0;k<16;k=k+1)
        begin: tx_prbs_8_inst
           tx_prbs_8 tx_prbs_8_local
             (.i_clk(i_clk),
              .i_ld_data_7(prbs_ld_7[k]),
              .i_ld_data_15(prbs_ld_15[k]),
              .i_ld_data_23(prbs_ld_23[k]),
              .i_ld_data_31(prbs_ld_31[k]),
              .o_prbs_data(prbs_data[k]),
              .i_load(i_reset),
              .prbs_select(prbs_select));
        end
   endgenerate
   //and connect the generator outputs to the module outputs
   assign o_prbs_data_0 = prbs_data['h0];
   assign o_prbs_data_1 = prbs_data['h1];
   assign o_prbs_data_2 = prbs_data['h2];
   assign o_prbs_data_3 = prbs_data['h3];
   assign o_prbs_data_4 = prbs_data['h4];
   assign o_prbs_data_5 = prbs_data['h5];
   assign o_prbs_data_6 = prbs_data['h6];
   assign o_prbs_data_7 = prbs_data['h7];
   assign o_prbs_data_8 = prbs_data['h8];
   assign o_prbs_data_9 = prbs_data['h9];
   assign o_prbs_data_a = prbs_data['ha];
   assign o_prbs_data_b = prbs_data['hb];
   assign o_prbs_data_c = prbs_data['hc];
   assign o_prbs_data_d = prbs_data['hd];
   assign o_prbs_data_e = prbs_data['he];
   assign o_prbs_data_f = prbs_data['hf];
endmodule // tx_prbs
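The striping can be sanity-checked in isolation. The sketch below is simulation-only, with made-up names and PN7 only. It builds the 512 bit history the same way tx_prbs does, stripes it into 16 lane states using the same j*16 + i indexing, then advances the full-rate stream 16 bits at a time and checks that each lane, advanced by one bit with the ordinary PN7 taps, produces the matching full-rate bit. Under these assumptions it should print mismatches=0.

```verilog
// Simulation-only sketch: 16 striped PN7 lanes track the full-rate PN7 stream.
module stripe_check;              // hypothetical testbench name
   reg [511:0] d;                 // full-rate history, bit 0 = newest bit
   reg [31:0]  lane [0:15];       // per-lane state, as loaded into tx_prbs_8
   reg [31:0]  s;
   integer     i, j, n, mismatches;

   initial begin
      // 512 bits of full-rate PN7, same recipe as tx_prbs
      d = 512'h12345678;
      repeat (512) d = {d[510:0], d[6] ^ d[5]};
      // stripe: lane i takes every 16th bit, starting at bit i
      for (i = 0; i < 16; i = i + 1) begin
         for (j = 0; j < 32; j = j + 1) s[j] = d[j*16 + i];
         lane[i] = s;
      end
      mismatches = 0;
      for (n = 0; n < 64; n = n + 1) begin
         // advance the full-rate stream by 16 bits
         repeat (16) d = {d[510:0], d[6] ^ d[5]};
         // advance each lane by one bit and compare with the full-rate bit
         for (i = 0; i < 16; i = i + 1) begin
            s = lane[i];
            s = {s[30:0], s[6] ^ s[5]};
            lane[i] = s;
            if (s[0] !== d[i]) mismatches = mismatches + 1;
         end
      end
      $display("mismatches=%0d", mismatches);
   end
endmodule
```

Lane 15 ends up holding the oldest of each group of 16 bits and lane 0 the newest, so lane f has to go out on the wire first.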
But wait, we need this to hook up to SFI5, which requires an additional deskew channel. No problem. We need to ship out 8 bytes of each channel in turn, followed by a frame marker and an expansion header. This module wraps tx_prbs and adds the deskew channel.
module sfi5_tx (
   // Outputs
   o_ch_0, o_ch_1, o_ch_2, o_ch_3, o_ch_4, o_ch_5, o_ch_6, o_ch_7,
   o_ch_8, o_ch_9, o_ch_a, o_ch_b, o_ch_c, o_ch_d, o_ch_e, o_ch_f,
   o_ch_deskew, deskew,
   // Inputs
   i_clk_in, i_reset, prbs_select
   );
   input        i_clk_in;
   input        i_reset;
   input [1:0]  prbs_select;
   output [7:0] o_ch_0, o_ch_1, o_ch_2, o_ch_3, o_ch_4, o_ch_5, o_ch_6, o_ch_7,
                o_ch_8, o_ch_9, o_ch_a, o_ch_b, o_ch_c, o_ch_d, o_ch_e, o_ch_f;
   output [7:0] o_ch_deskew;
   output       deskew;
   reg          reset;
   reg [7:0]    o_ch_0, o_ch_1, o_ch_2, o_ch_3, o_ch_4, o_ch_5, o_ch_6, o_ch_7,
                o_ch_8, o_ch_9, o_ch_a, o_ch_b, o_ch_c, o_ch_d, o_ch_e, o_ch_f;
   reg [7:0]    o_ch_deskew;
   reg [7:0]    channel_0, channel_1, channel_2, channel_3,
                channel_4, channel_5, channel_6, channel_7,
                channel_8, channel_9, channel_a, channel_b,
                channel_c, channel_d, channel_e, channel_f;
   reg [7:0]    channel_deskew;
   reg [7:0]    deskew_counter;
   wire [7:0]   prbs_0, prbs_1, prbs_2, prbs_3, prbs_4, prbs_5, prbs_6, prbs_7,
                prbs_8, prbs_9, prbs_a, prbs_b, prbs_c, prbs_d, prbs_e, prbs_f;
   reg          deskew, deskew_int;
   tx_prbs tx_prbs_1
     (
      // Outputs
      .o_prbs_data_f (prbs_f),
      .o_prbs_data_e (prbs_e),
      .o_prbs_data_d (prbs_d),
      .o_prbs_data_c (prbs_c),
      .o_prbs_data_b (prbs_b),
      .o_prbs_data_a (prbs_a),
      .o_prbs_data_9 (prbs_9),
      .o_prbs_data_8 (prbs_8),
      .o_prbs_data_7 (prbs_7),
      .o_prbs_data_6 (prbs_6),
      .o_prbs_data_5 (prbs_5),
      .o_prbs_data_4 (prbs_4),
      .o_prbs_data_3 (prbs_3),
      .o_prbs_data_2 (prbs_2),
      .o_prbs_data_1 (prbs_1),
      .o_prbs_data_0 (prbs_0),
      // Inputs
      .i_clk (i_clk_in),
      .i_reset (reset),
      .prbs_select (prbs_select[1:0]));
   always @ (posedge i_clk_in)
     begin
        reset <= i_reset;
        if (reset) deskew_counter <= 0; //for sim only
        else deskew_counter <= (deskew_counter == 'h87) ? 0 : deskew_counter + 1;
        //deskew_counter_p1 <= deskew_counter;
        deskew_int <= (deskew_counter[7:4] == 'h7);
        deskew <= deskew_int;
        channel_0 <= prbs_0;
        channel_1 <= prbs_1;
        channel_2 <= prbs_2;
        channel_3 <= prbs_3;
        channel_4 <= prbs_4;
        channel_5 <= prbs_5;
        channel_6 <= prbs_6;
        channel_7 <= prbs_7;
        channel_8 <= prbs_8;
        channel_9 <= prbs_9;
        channel_a <= prbs_a;
        channel_b <= prbs_b;
        channel_c <= prbs_c;
        channel_d <= prbs_d;
        channel_e <= prbs_e;
        channel_f <= prbs_f;
        case (deskew_counter)
          'h00,'h01,'h02,'h03,'h04,'h05,'h06,'h07: channel_deskew <= prbs_f;
          'h08,'h09,'h0a,'h0b,'h0c,'h0d,'h0e,'h0f: channel_deskew <= prbs_e;
          'h10,'h11,'h12,'h13,'h14,'h15,'h16,'h17: channel_deskew <= prbs_d;
          'h18,'h19,'h1a,'h1b,'h1c,'h1d,'h1e,'h1f: channel_deskew <= prbs_c;
          'h20,'h21,'h22,'h23,'h24,'h25,'h26,'h27: channel_deskew <= prbs_b;
          'h28,'h29,'h2a,'h2b,'h2c,'h2d,'h2e,'h2f: channel_deskew <= prbs_a;
          'h30,'h31,'h32,'h33,'h34,'h35,'h36,'h37: channel_deskew <= prbs_9;
          'h38,'h39,'h3a,'h3b,'h3c,'h3d,'h3e,'h3f: channel_deskew <= prbs_8;
          'h40,'h41,'h42,'h43,'h44,'h45,'h46,'h47: channel_deskew <= prbs_7;
          'h48,'h49,'h4a,'h4b,'h4c,'h4d,'h4e,'h4f: channel_deskew <= prbs_6;
          'h50,'h51,'h52,'h53,'h54,'h55,'h56,'h57: channel_deskew <= prbs_5;
          'h58,'h59,'h5a,'h5b,'h5c,'h5d,'h5e,'h5f: channel_deskew <= prbs_4;
          'h60,'h61,'h62,'h63,'h64,'h65,'h66,'h67: channel_deskew <= prbs_3;
          'h68,'h69,'h6a,'h6b,'h6c,'h6d,'h6e,'h6f: channel_deskew <= prbs_2;
          'h70,'h71,'h72,'h73,'h74,'h75,'h76,'h77: channel_deskew <= prbs_1;
          'h78,'h79,'h7a,'h7b,'h7c,'h7d,'h7e,'h7f: channel_deskew <= prbs_0;
          'h80,'h81: channel_deskew <= 8'hf6; //A1
          'h82,'h83: channel_deskew <= 8'h28; //A2
          'h84,'h85,'h86,'h87: channel_deskew <= 8'haa; //EH
          default: channel_deskew <= 8'hxx;
        endcase // case(deskew_counter)
        o_ch_0 <= channel_0;
        o_ch_1 <= channel_1;
        o_ch_2 <= channel_2;
        o_ch_3 <= channel_3;
        o_ch_4 <= channel_4;
        o_ch_5 <= channel_5;
        o_ch_6 <= channel_6;
        o_ch_7 <= channel_7;
        o_ch_8 <= channel_8;
        o_ch_9 <= channel_9;
        o_ch_a <= channel_a;
        o_ch_b <= channel_b;
        o_ch_c <= channel_c;
        o_ch_d <= channel_d;
        o_ch_e <= channel_e;
        o_ch_f <= channel_f;
        o_ch_deskew <= channel_deskew;
     end
endmodule // sfi5_tx
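The first 128 entries of the deskew case statement follow a simple rule: 8 counts per lane, starting with lane f and ending with lane 0. As an illustration only (the function name deskew_lane and the testbench around it are made up here), the lane number can be computed from the counter instead of being listed, and a short simulation prints the mapping so it can be compared with the case statement.

```verilog
// Simulation-only sketch: lane carried on the deskew channel for counts 00..7f.
module deskew_map;                // hypothetical testbench name
   reg [7:0] c;

   // 8 counts per lane; lane f goes first, lane 0 last.
   function [3:0] deskew_lane;
      input [7:0] cnt;
      deskew_lane = 4'hf - cnt[6:3];
   endfunction

   initial
     for (c = 0; c < 128; c = c + 8)
       $display("counter %h..%h -> prbs_%h", c, c + 8'd7, deskew_lane(c));
endmodule
```

The first line printed should read "counter 00..07 -> prbs_f" and the last "counter 78..7f -> prbs_0". Counts 80 to 87 carry the A1, A2 and EH bytes and are not covered by this function.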
Now on to the receive side. Again we're set up for 4 patterns to match the transmit side. The checker shown here reports a single error count per byte; splitting that into ones errors and zeros errors is an easy modification for the reader.
We start with an 8 bit PRBS checker.
module rx_prbs_8 (
   // Outputs
   o_error, o_bad,
   // Inputs
   i_clk, i_reset, i_prbs_data, prbs_select
   );
   input        i_clk;
   input        i_reset;
   input [7:0]  i_prbs_data;
   input [1:0]  prbs_select;
   output [2:0] o_error;
   output       o_bad;
   reg          o_bad;
   reg [2:0]    o_error;
   reg [31:0]   prbs_15, prbs_23, prbs_31, prbs_7, d;
   reg [7:0]    error_bit;
   reg [7:0]    i_prbs_data_p3, i_prbs_data_p2, i_prbs_data_p1;
   integer      i;
   always @ (posedge i_clk)
     begin
        if (i_reset) //for sims only since the reload will put this in a known state
          begin
             error_bit <= 0;
             i_prbs_data_p1 <= 8'h0;
             i_prbs_data_p2 <= 8'h0;
             i_prbs_data_p3 <= 8'h0;
             o_bad <= 1'h0;
             o_error <= 3'h0;
             prbs_15 <= 32'h0;
             prbs_23 <= 32'h0;
             prbs_31 <= 32'h0;
             prbs_7 <= 32'h0;
          end // if (i_reset)
        else
          begin
             //although we only get 8 bits per clock, we need a 32 bit history
             i_prbs_data_p1 <= i_prbs_data;
             i_prbs_data_p2 <= i_prbs_data_p1;
             i_prbs_data_p3 <= i_prbs_data_p2;
             /*****************************************
              * reload if the errors are excessive
              *****************************************/
             if (o_bad)
               begin
                  prbs_15 <= {i_prbs_data_p3,i_prbs_data_p2,i_prbs_data_p1,i_prbs_data};
                  prbs_23 <= {i_prbs_data_p3,i_prbs_data_p2,i_prbs_data_p1,i_prbs_data};
                  prbs_31 <= {i_prbs_data_p3,i_prbs_data_p2,i_prbs_data_p1,i_prbs_data};
                  prbs_7 <= {i_prbs_data_p3,i_prbs_data_p2,i_prbs_data_p1,i_prbs_data};
               end // if (o_bad)
             else
               /*****************************************
                * compute next expected data for the prbs
                **************************************/
               begin
                  d = prbs_7;
                  repeat (8) d = {d,d[6]^d[5]}; //prbs7
                  prbs_7 <= d;
                  d = prbs_15;
                  repeat (8) d = {d,d[14]^d[13]}; //prbs15
                  prbs_15 <= d;
                  d = prbs_23;
                  repeat (8) d = {d,d[22]^d[17]}; //prbs23
                  prbs_23 <= d;
                  d = prbs_31;
                  repeat (8) d = {d,d[30]^d[27]}; //prbs31
                  prbs_31 <= d;
               end // else: !if(o_bad)
             /*********************************************
              * compare the expected data to the received data
              ************************************************/
             case (prbs_select)
               0: error_bit <= prbs_7 ^ i_prbs_data_p1;//7
               1: error_bit <= prbs_15 ^ i_prbs_data_p1;//15
               2: error_bit <= prbs_23 ^ i_prbs_data_p1;//23
               3: error_bit <= prbs_31 ^ i_prbs_data_p1;//31
             endcase // case(prbs_select)
             /*********************************************
              * count up the errors
              ********************************************/
             d=0;
             for (i=0;i<8;i=i+1) d = (d + error_bit[i]);
             //more than 3 bit results is an error and will be reloaded anyway.
             o_error <= (d > 7) ? 7 : d ;//limit to 3 bit result
             o_bad <= d > 3;
          end // else: !if(i_reset)
     end // always @ (posedge i_clk)
endmodule // rx_prbs_8
Again we need to combine 16 of these into one module to check the whole 40 Gb/s stream. But what about the deskew channel? This is a special case. We need to go back to the subsample idea. Each channel of the SFI5 is a subsample of the full 40 Gb/s stream. That means that each channel in turn is a PRBS stream with the same pattern. There is no requirement for each channel on the tx side to line up with a particular channel on the rx side. Depending on how the SERDES line up, the channels on the tx side will likely be split across rx channels. But each channel is a 1/16 subsample of the full pattern, which means it in turn is the same pattern. That means we can ignore the deskew channel for this special case. More on the deskew channel later. But for now let's look at the 16 PRBS checkers being instantiated.
module sfi5_rx (
   // Outputs
   o_error, o_bad,
   // Inputs
   i_clk_in, i_reset,
   i_ch_0, i_ch_1, i_ch_2, i_ch_3, i_ch_4, i_ch_5, i_ch_6, i_ch_7,
   i_ch_8, i_ch_9, i_ch_a, i_ch_b, i_ch_c, i_ch_d, i_ch_e, i_ch_f,
   i_ch_deskew, prbs_select
   );
   input        i_clk_in;
   input        i_reset; //for sims
   input [7:0]  i_ch_0, i_ch_1, i_ch_2, i_ch_3, i_ch_4, i_ch_5, i_ch_6, i_ch_7,
                i_ch_8, i_ch_9, i_ch_a, i_ch_b, i_ch_c, i_ch_d, i_ch_e, i_ch_f,
                i_ch_deskew;
   output [6:0] o_error; //7 bits: up to 16 channels x 7 errors each
   output       o_bad;
   input [1:0]  prbs_select;
   wire [7:0]   ch[0:15];
   wire [15:0]  bad;
   reg          o_bad;
   wire [2:0]   error[0:15];
   reg [4:0]    error_a, error_b, error_c, error_d;
   reg [6:0]    o_error;
   //convert to an array for the inputs.
   assign ch['h0] = i_ch_0;
   assign ch['h1] = i_ch_1;
   assign ch['h2] = i_ch_2;
   assign ch['h3] = i_ch_3;
   assign ch['h4] = i_ch_4;
   assign ch['h5] = i_ch_5;
   assign ch['h6] = i_ch_6;
   assign ch['h7] = i_ch_7;
   assign ch['h8] = i_ch_8;
   assign ch['h9] = i_ch_9;
   assign ch['ha] = i_ch_a;
   assign ch['hb] = i_ch_b;
   assign ch['hc] = i_ch_c;
   assign ch['hd] = i_ch_d;
   assign ch['he] = i_ch_e;
   assign ch['hf] = i_ch_f;
   /*******************************************
    * instantiate the 16 prbs checkers
    *****************************************/
   generate
      genvar k;
      for (k=0;k<16;k=k+1)
        begin: rx_prbs_inst
           rx_prbs_8 rx_prbs_8_local
             (.i_clk(i_clk_in),
              .i_reset(i_reset),
              .i_prbs_data(ch[k]),
              .o_error(error[k]),
              .o_bad(bad[k]),
              .prbs_select(prbs_select));
        end
   endgenerate
   /*********************************************
    * accumulate the error count
    *****************************************/
   always @ (posedge i_clk_in)
     begin
        if (i_reset)
          begin
             o_bad <= 1'b0;
             error_a <= 5'h0;
             error_b <= 5'h0;
             error_c <= 5'h0;
             error_d <= 5'h0;
             o_error <= 7'h0;
          end // if (reset)
        else
          begin
             o_bad <= | bad;
             error_a <= error['h0] + error['h1] + error['h2] + error['h3];
             error_b <= error['h4] + error['h5] + error['h6] + error['h7];
             error_c <= error['h8] + error['h9] + error['ha] + error['hb];
             error_d <= error['hc] + error['hd] + error['he] + error['hf];
             o_error <= error_a + error_b + error_c + error_d;
          end // else: !if(reset)
     end // always @ (posedge i_clk_in)
endmodule // sfi5_rx
So that's a lot to take for granted. Let's run a simulation. This test bench combines the 16 SFI5 tx channels into one 40 Gb/s stream and checks that stream for correctness as a PRBS. It then adds a variable delay before demuxing the stream back into 16 rx channels. By adding a variable delay we can control how the data is striped across the incoming channels and, more importantly, test that the rx side is correct independent of the alignment of the data between the tx and rx sides. To test the locking of the checkers, the 'fiber delay', which changes the striping offset, gets changed periodically, and the rx side relocks. The pattern is also stepped through all 4 patterns. Each time, the rx side will relock. The reset on the rx side is optional in an actual implementation, as the relock will take care of putting all of the logic in the correct state. Unfortunately simulation needs some state to start with. Optionally, there is a commented-out direct connection between the tx and rx sides, which is useful for simulation debug.
module test_tx_rx ( );
   reg [31:0] prbs_40;
   reg        error_40;
   reg        clk;
   reg [6:0]  channel_counter, rx_channel_counter;
   reg        reset;
   reg [7:0]  byte_shifter_f, byte_shifter_e, byte_shifter_d, byte_shifter_c,
              byte_shifter_b, byte_shifter_a, byte_shifter_9, byte_shifter_8,
              byte_shifter_7, byte_shifter_6, byte_shifter_5, byte_shifter_4,
              byte_shifter_3, byte_shifter_2, byte_shifter_1, byte_shifter_0,
              rx_byte_shifter_f, rx_byte_shifter_e, rx_byte_shifter_d, rx_byte_shifter_c,
              rx_byte_shifter_b, rx_byte_shifter_a, rx_byte_shifter_9, rx_byte_shifter_8,
              rx_byte_shifter_7, rx_byte_shifter_6, rx_byte_shifter_5, rx_byte_shifter_4,
              rx_byte_shifter_3, rx_byte_shifter_2, rx_byte_shifter_1, rx_byte_shifter_0;
   reg [15:0] bit_shifter, rx_bit_shifter;
   reg        data_out, data_in;
   reg        tx_clk, rx_clk;
   wire [7:0] tx_prbs_f, tx_prbs_e, tx_prbs_d, tx_prbs_c,
              tx_prbs_b, tx_prbs_a, tx_prbs_9, tx_prbs_8,
              tx_prbs_7, tx_prbs_6, tx_prbs_5, tx_prbs_4,
              tx_prbs_3, tx_prbs_2, tx_prbs_1, tx_prbs_0, tx_deskew;
   reg [7:0]  rx_prbs_f, rx_prbs_e, rx_prbs_d, rx_prbs_c,
              rx_prbs_b, rx_prbs_a, rx_prbs_9, rx_prbs_8,
              rx_prbs_7, rx_prbs_6, rx_prbs_5, rx_prbs_4,
              rx_prbs_3, rx_prbs_2, rx_prbs_1, rx_prbs_0;
   reg [31:0] fiber, fiber_p1;
   wire [6:0] error;
   wire       bad;
   reg [1:0]  prbs_select;
   reg [15:0] fiber_delay;
   initial
     begin
        clk = 0;
        channel_counter = 0;
        rx_channel_counter = 5;
        reset = 0;
        prbs_select = 0;
        fiber_delay = 15;
        #1000 reset = 1;
        #2000 reset = 0;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 prbs_select = 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 prbs_select = 2;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 prbs_select = 3;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 fiber_delay = fiber_delay + 1;
        #10_000 $stop;
     end
   always clk = #0.0125 ~ clk; //the "40G" clk
   /*******************************
    * tx mux side
    * output is data out which connects to "fiber" variable delay
    ******************************/
   always @ (posedge clk)
     begin
        data_out <= bit_shifter[15];
        channel_counter <= channel_counter + 1;
        if (channel_counter[3:0]=='hf)
          begin
             bit_shifter <= {byte_shifter_f[7], byte_shifter_e[7], byte_shifter_d[7], byte_shifter_c[7],
                             byte_shifter_b[7], byte_shifter_a[7], byte_shifter_9[7], byte_shifter_8[7],
                             byte_shifter_7[7], byte_shifter_6[7], byte_shifter_5[7], byte_shifter_4[7],
                             byte_shifter_3[7], byte_shifter_2[7], byte_shifter_1[7], byte_shifter_0[7]};
             if (channel_counter[6:4] =='h7)
               begin
                  byte_shifter_f <= tx_prbs_f;
                  byte_shifter_e <= tx_prbs_e;
                  byte_shifter_d <= tx_prbs_d;
                  byte_shifter_c <= tx_prbs_c;
                  byte_shifter_b <= tx_prbs_b;
                  byte_shifter_a <= tx_prbs_a;
                  byte_shifter_9 <= tx_prbs_9;
                  byte_shifter_8 <= tx_prbs_8;
                  byte_shifter_7 <= tx_prbs_7;
                  byte_shifter_6 <= tx_prbs_6;
                  byte_shifter_5 <= tx_prbs_5;
                  byte_shifter_4 <= tx_prbs_4;
                  byte_shifter_3 <= tx_prbs_3;
                  byte_shifter_2 <= tx_prbs_2;
                  byte_shifter_1 <= tx_prbs_1;
                  byte_shifter_0 <= tx_prbs_0;
                  tx_clk <= 1;
               end
             else
               begin
                  byte_shifter_f <= {byte_shifter_f,1'b0};
                  byte_shifter_e <= {byte_shifter_e,1'b0};
                  byte_shifter_d <= {byte_shifter_d,1'b0};
                  byte_shifter_c <= {byte_shifter_c,1'b0};
                  byte_shifter_b <= {byte_shifter_b,1'b0};
                  byte_shifter_a <= {byte_shifter_a,1'b0};
                  byte_shifter_9 <= {byte_shifter_9,1'b0};
                  byte_shifter_8 <= {byte_shifter_8,1'b0};
                  byte_shifter_7 <= {byte_shifter_7,1'b0};
                  byte_shifter_6 <= {byte_shifter_6,1'b0};
                  byte_shifter_5 <= {byte_shifter_5,1'b0};
                  byte_shifter_4 <= {byte_shifter_4,1'b0};
                  byte_shifter_3 <= {byte_shifter_3,1'b0};
                  byte_shifter_2 <= {byte_shifter_2,1'b0};
                  byte_shifter_1 <= {byte_shifter_1,1'b0};
                  byte_shifter_0 <= {byte_shifter_0,1'b0};
                  tx_clk <= 1'b0;
               end
          end // if (channel_counter[3:0]=='hf)
        else
          begin
             bit_shifter <= {bit_shifter,1'b0};
             tx_clk <= 0;
          end // else: !if(channel_counter[3:0]=='hf)
     end // always @ (posedge clk)
   //instantiate the tx
   tx_prbs sfi5_tx_test
     (
      .i_clk(tx_clk),
      .i_reset(reset),
      .prbs_select(prbs_select),
      .o_prbs_data_0(tx_prbs_0),
      .o_prbs_data_1(tx_prbs_1),
      .o_prbs_data_2(tx_prbs_2),
      .o_prbs_data_3(tx_prbs_3),
      .o_prbs_data_4(tx_prbs_4),
      .o_prbs_data_5(tx_prbs_5),
      .o_prbs_data_6(tx_prbs_6),
      .o_prbs_data_7(tx_prbs_7),
      .o_prbs_data_8(tx_prbs_8),
      .o_prbs_data_9(tx_prbs_9),
      .o_prbs_data_a(tx_prbs_a),
      .o_prbs_data_b(tx_prbs_b),
      .o_prbs_data_c(tx_prbs_c),
      .o_prbs_data_d(tx_prbs_d),
      .o_prbs_data_e(tx_prbs_e),
      .o_prbs_data_f(tx_prbs_f)
      );
   /**********************************
    * here we add the variable delay between the mux and demux
    *******************************/
   always @ (posedge clk)
     begin
        fiber <= {fiber,data_out};
        fiber_p1 <= fiber;
        data_in <= fiber[fiber_delay]; //this is the delay
        //for checking the stream as one 40g prbs
        if (error_40) prbs_40 <= {fiber,data_out};
        else
          case (prbs_select)
            0: prbs_40 <= {prbs_40,prbs_40[6]^prbs_40[5]};
            1: prbs_40 <= {prbs_40,prbs_40[14]^prbs_40[13]};
            2: prbs_40 <= {prbs_40,prbs_40[22]^prbs_40[17]};
            3: prbs_40 <= {prbs_40,prbs_40[30]^prbs_40[27]};
          endcase // case(prbs_select)
        if (prbs_40[31:0] == fiber[31:0]) error_40 <= 0;
        else error_40 <= 1;
     end
   /******************************************************
    * and the demux on the receive side
    **************************************/
   always @ (posedge clk)
     begin
        rx_channel_counter <= rx_channel_counter + 1;
        rx_bit_shifter <= {rx_bit_shifter,data_in};
        if (rx_channel_counter[3:0] == 4'hf)
          begin
             rx_byte_shifter_f <= {rx_byte_shifter_f,rx_bit_shifter['hf]};
             rx_byte_shifter_e <= {rx_byte_shifter_e,rx_bit_shifter['he]};
             rx_byte_shifter_d <= {rx_byte_shifter_d,rx_bit_shifter['hd]};
             rx_byte_shifter_c <= {rx_byte_shifter_c,rx_bit_shifter['hc]};
             rx_byte_shifter_b <= {rx_byte_shifter_b,rx_bit_shifter['hb]};
             rx_byte_shifter_a <= {rx_byte_shifter_a,rx_bit_shifter['ha]};
             rx_byte_shifter_9 <= {rx_byte_shifter_9,rx_bit_shifter['h9]};
             rx_byte_shifter_8 <= {rx_byte_shifter_8,rx_bit_shifter['h8]};
             rx_byte_shifter_7 <= {rx_byte_shifter_7,rx_bit_shifter['h7]};
             rx_byte_shifter_6 <= {rx_byte_shifter_6,rx_bit_shifter['h6]};
             rx_byte_shifter_5 <= {rx_byte_shifter_5,rx_bit_shifter['h5]};
             rx_byte_shifter_4 <= {rx_byte_shifter_4,rx_bit_shifter['h4]};
             rx_byte_shifter_3 <= {rx_byte_shifter_3,rx_bit_shifter['h3]};
             rx_byte_shifter_2 <= {rx_byte_shifter_2,rx_bit_shifter['h2]};
             rx_byte_shifter_1 <= {rx_byte_shifter_1,rx_bit_shifter['h1]};
             rx_byte_shifter_0 <= {rx_byte_shifter_0,rx_bit_shifter['h0]};
             if (rx_channel_counter[6:4] == 3'h7)
               begin
                  rx_prbs_f <= rx_byte_shifter_f;
                  rx_prbs_e <= rx_byte_shifter_e;
                  rx_prbs_d <= rx_byte_shifter_d;
                  rx_prbs_c <= rx_byte_shifter_c;
                  rx_prbs_b <= rx_byte_shifter_b;
                  rx_prbs_a <= rx_byte_shifter_a;
                  rx_prbs_9 <= rx_byte_shifter_9;
                  rx_prbs_8 <= rx_byte_shifter_8;
                  rx_prbs_7 <= rx_byte_shifter_7;
                  rx_prbs_6 <= rx_byte_shifter_6;
                  rx_prbs_5 <= rx_byte_shifter_5;
                  rx_prbs_4 <= rx_byte_shifter_4;
                  rx_prbs_3 <= rx_byte_shifter_3;
                  rx_prbs_2 <= rx_byte_shifter_2;
                  rx_prbs_1 <= rx_byte_shifter_1;
                  rx_prbs_0 <= rx_byte_shifter_0;
                  rx_clk <= 1;
               end
             else rx_clk <= 0;
          end // if (rx_channel_counter[3:0] == 4'hf)
        else rx_clk <= 0;
     end // always @ (posedge clk)
   /*************************************
    * and of course instantiate the rx sfi5
    ********************************/
   sfi5_rx sfi5_rx_test
     (
      .i_clk_in(rx_clk),
      .i_reset(reset),
      .prbs_select(prbs_select),
      .i_ch_0(rx_prbs_0),
      .i_ch_1(rx_prbs_1),
      .i_ch_2(rx_prbs_2),
      .i_ch_3(rx_prbs_3),
      .i_ch_4(rx_prbs_4),
      .i_ch_5(rx_prbs_5),
      .i_ch_6(rx_prbs_6),
      .i_ch_7(rx_prbs_7),
      .i_ch_8(rx_prbs_8),
      .i_ch_9(rx_prbs_9),
      .i_ch_a(rx_prbs_a),
      .i_ch_b(rx_prbs_b),
      .i_ch_c(rx_prbs_c),
      .i_ch_d(rx_prbs_d),
      .i_ch_e(rx_prbs_e),
      .i_ch_f(rx_prbs_f),
      /* skip the mux/demux
      .i_ch_0(tx_prbs_0),
      .i_ch_1(tx_prbs_1),
      .i_ch_2(tx_prbs_2),
      .i_ch_3(tx_prbs_3),
      .i_ch_4(tx_prbs_4),
      .i_ch_5(tx_prbs_5),
      .i_ch_6(tx_prbs_6),
      .i_ch_7(tx_prbs_7),
      .i_ch_8(tx_prbs_8),
      .i_ch_9(tx_prbs_9),
      .i_ch_a(tx_prbs_a),
      .i_ch_b(tx_prbs_b),
      .i_ch_c(tx_prbs_c),
      .i_ch_d(tx_prbs_d),
      .i_ch_e(tx_prbs_e),
      .i_ch_f(tx_prbs_f),
       */
      .i_ch_deskew(8'h0),
      .o_error(error),
      .o_bad(bad)
      );
   always @ (error) $display("error is %d",error);
endmodule // test_tx_rx