HTR logic (Main FPGA)

FEATURES SUPPORTED

• One HTR board includes two identical logic sub-modules (Top and Bottom).
• Record a programmable number NDD of QIE time-samples per channel and NTP of TP samples in response to a trigger. In any case: $0 \leq NTP \leq NDD$; $NTP+NDD \leq 21$ and $0 < NDD$.
• Trigger arrives less than 6 µs after data (adjustable latency)
• Process TTC commands
• Testing features via VME
• Reject triggers violating TDR rule 1 [No more than 1 trigger per 3 BXs] and 2 [No more than 2 triggers per 25 BXs]
• Overflow Warning (when buffer occupancy $> 75\%$) and Overflow flags.
• Empty Events generation (with correct EV#, BC# and ORBIT#) after Overflow Warning.
• L1-trigger path with error and CapID check (reset data if there is an error)
• Investigate fiber-to-fiber alignment with BC0
• TP Latency = ? Requirement is $\sim$ 14 clock ticks (40 MHz clock) for the whole FPGA.
• Max trigger rate (assuming periodic triggers) $\sim 1 / \{ [24 \times (NDD+NTP) + 20 + 6] \times 25 \text{ ns} \}$
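
As a rough illustration of the sample-count constraints and the maximum-rate estimate listed above, the following Python sketch evaluates both. The function names are hypothetical and the code is only a numerical check of the formulas, not part of the firmware.

```python
BX_NS = 25  # one 40 MHz clock tick, in ns


def check_sample_counts(ndd: int, ntp: int) -> bool:
    """True if 0 <= NTP <= NDD, NTP + NDD <= 21 and NDD > 0."""
    return 0 <= ntp <= ndd and ntp + ndd <= 21 and ndd > 0


def max_trigger_rate_hz(ndd: int, ntp: int) -> float:
    """Approximate maximum periodic trigger rate from the formula above."""
    if not check_sample_counts(ndd, ntp):
        raise ValueError("NDD/NTP violate the constraints listed above")
    ticks_per_event = 24 * (ndd + ntp) + 20 + 6
    return 1.0 / (ticks_per_event * BX_NS * 1e-9)


if __name__ == "__main__":
    # Example: 10 DAQ samples and no TP samples per channel.
    print(round(max_trigger_rate_hz(10, 0)), "Hz")
```

With NDD = 10 and NTP = 0 an event occupies 266 clock ticks (6.65 µs), which corresponds to roughly 150 kHz.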

Format of a single L2-Daq Data word:

```
L2Data_out[15:0] = { FiberAd[2:0]; QIEAd[1:0]; Link_Er, Link_DV , CapID[1:0], Range[1:0], Mantissa[4:0] }
```

FiberAd[2:0] indicates which fiber (0 to 7); QIEAd[1:0] indicates which QIE channel in the fiber (0 to 2)
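
The word layout above can be modelled with a small pack/unpack helper. This is a sketch that assumes the usual Verilog concatenation order (leftmost field in the most significant bits); the helper names are hypothetical.

```python
# Fields of L2Data_out[15:0], listed from MSB to LSB (assumed ordering).
FIELDS = (
    ("FiberAd", 3), ("QIEAd", 2), ("Link_Er", 1), ("Link_DV", 1),
    ("CapID", 2), ("Range", 2), ("Mantissa", 5),
)


def pack_l2_word(**values: int) -> int:
    """Pack the named fields into one 16-bit word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_l2_word(word: int) -> dict:
    """Split a 16-bit word back into its named fields."""
    fields = {}
    shift = 16
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


if __name__ == "__main__":
    w = pack_l2_word(FiberAd=7, QIEAd=2, Link_Er=0, Link_DV=1,
                     CapID=3, Range=1, Mantissa=31)
    print(hex(w), unpack_l2_word(w))
```

The field widths add up to 16 bits, so the table-driven loop needs no explicit bit positions.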
FE-HTR Data format

Adapted from: http://www-ppd.fnal.gov/tshaw.myweb/CMS/TestBeam2002/Data_format_v2.pdf


NB: orbit message is different and not decoded yet in HTR

It has been agreed to send some IDLE patterns over the link during each Abort gap
## HTR-DCC Data Format - (output of the HTR DAQ-path)

<table>
<thead>
<tr>
<th>Word Type</th>
<th>S1 S0</th>
<th>Byte 1</th>
<th>Byte 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>HEADER</td>
<td>1 1</td>
<td>Zeroes</td>
<td>EvN [7:0]</td>
</tr>
<tr>
<td>Ext. Header3</td>
<td>1 0</td>
<td>PipeLength[7:0]</td>
<td>HS RL EE SR OW OV HM CM</td>
</tr>
<tr>
<td>Ext. Header4</td>
<td>1 0</td>
<td>OrN [7:0]</td>
<td>HTR_sub_module_Number</td>
</tr>
<tr>
<td>Ext. Header5</td>
<td>1 0</td>
<td>TrigType[3:0] + BCN[11:8]</td>
<td>BCN [7:0]</td>
</tr>
<tr>
<td>Ext. Header6</td>
<td>1 0</td>
<td>Total number of TP words[7:0]</td>
<td>DLL_lock TTCready</td>
</tr>
<tr>
<td>TP-DATA1</td>
<td>1 0</td>
<td>TP[8:0] = {FiberAd[2:0]; ChAd[1:0]; 0; 0; CapID[1:0], QIE1Data[6:0] }</td>
<td></td>
</tr>
<tr>
<td>…</td>
<td>1 0</td>
<td>…</td>
<td>…</td>
</tr>
<tr>
<td>TP-DATAm</td>
<td>1 0</td>
<td>TP[8:0] = {FiberAd[2:0]; ChAd[1:0]; 0; 0; CapID[1:0], QIE1Data[6:0] }</td>
<td></td>
</tr>
<tr>
<td>DAQ-DATA1</td>
<td>1 0</td>
<td>DAQ-Data[15:0] = {FiberAd[2:0]; QIEAd[1:0]; Er; DV; CapID[1:0], QIEData[6:0] }</td>
<td></td>
</tr>
<tr>
<td>…</td>
<td>1 0</td>
<td>…</td>
<td>…</td>
</tr>
<tr>
<td>DAQ-DATAn</td>
<td>1 0</td>
<td>DAQ-Data[15:0] = {FiberAd[2:0]; QIEAd[1:0]; Er; DV; CapID[1:0], QIEData[6:0] }</td>
<td></td>
</tr>
<tr>
<td>Extra-Info1</td>
<td>1 0</td>
<td>Arrival time (BCN) of Bzero from Fiber1 [11:0] (to study the latency) or other info</td>
<td></td>
</tr>
<tr>
<td>Extra-Info8</td>
<td>1 0</td>
<td>Arrival time (BCN) of Bzero from Fiber8 [11:0] (to study the latency) or other info</td>
<td></td>
</tr>
<tr>
<td>Pre-Trailer</td>
<td>1 0</td>
<td>Zeroes</td>
<td>Zeroes</td>
</tr>
<tr>
<td>TRAILER</td>
<td>0 1</td>
<td>EvN [7:0]</td>
<td>Zeroes</td>
</tr>
</tbody>
</table>

$n = \#$ of DAQ-DATA words $\leq 24 \times 10$ (depends on Zero-suppression). NB: the number of words with [S1 S0] = [1 0] must be a multiple of 2.

- CM = Counter_Mode: if "0", real data; if "1", internally generated counter data. A trigger is needed, as in the real mode. Set from VME.
- HM = Histogramming mode; switching modes requires a firmware change.
- OV = Overflow. For debugging purposes (it should never happen).
- OW = Overflow Warning. It should be reported to the aTTS by the DCC.
- EE = Empty Event (consequence of a past OW). An Empty Event includes only the first 5 header words and the last 3 words.
- RL = Rejected previous L1A (when the previous L1A violates trigger rules i and ii of Trigger TDR 16.4.3).
- HS = when in Histogramming mode, indicates which Set of fibers is used: when this bit is 0, histograms are from fibers 1-4.
- TrigType: if (TTC_L1A) trig_type <= 1; if (VME_L1A) trig_type <= 2; ….
- EvN = counted internally; it should be a copy of the TTCrx EvN unless an L1A is rejected (RL flag). EvN doesn't increment with VME_L1A.

Quality of HTR data

Users of HTR data should verify that:

- bits Er = 0, DV = 1 in all DAQ-Data words
- DLL_lock = 1, TTCready = 1 in Ext. Header6
- the actual event size = WordCount[9:0] = (# of Daq-Data samples + # of TP samples) x 24 + 18
- For a given half-HTR, EvN [23:0] increases by 1 at every event
- EvN [23:0] is equal across all HTRs, if an EventCounterReset was issued by the TTC.

If these conditions are not met, some debugging is needed. Please report the data to Tullio@umd.edu
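
The checks listed above can be sketched in Python as follows. The bit positions of Er and DV assume that the DAQ-Data word is packed MSB-first in the order given in the format table (Er at bit 10, DV at bit 9), and the 24-bit wrap-around of EvN is likewise an assumption; all function names are hypothetical.

```python
from typing import List


def expected_word_count(n_daq_samples: int, n_tp_samples: int) -> int:
    """Event size formula quoted in the list above."""
    return (n_daq_samples + n_tp_samples) * 24 + 18


def find_problems(daq_words: List[int], dll_lock: int, ttc_ready: int,
                  word_count: int, n_daq_samples: int,
                  n_tp_samples: int) -> List[str]:
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []
    for i, word in enumerate(daq_words):
        er = (word >> 10) & 1  # assumed bit position of Er
        dv = (word >> 9) & 1   # assumed bit position of DV
        if er != 0 or dv != 1:
            problems.append(f"DAQ word {i}: Er={er}, DV={dv}")
    if dll_lock != 1 or ttc_ready != 1:
        problems.append("DLL_lock and TTCready are not both 1")
    expected = expected_word_count(n_daq_samples, n_tp_samples)
    if word_count != expected:
        problems.append(f"WordCount={word_count}, expected {expected}")
    return problems


def evn_follows(previous_evn: int, current_evn: int) -> bool:
    """True if EvN[23:0] increased by 1 (modulo 2**24, an assumption)."""
    return (previous_evn + 1) & 0xFFFFFF == current_evn
```

A non-empty list returned by `find_problems` is the kind of data worth reporting.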
Main HTR FPGA – top level

**INPUT STAGE**
- 8 Synchronization FIFOs
- Data_Sync
- Change Clock
- 0
- 1
- Fake_Data
- Fake_Mode

**Trigger_PATH**
- 24 Linear LUTs
- 24 L1-Filter
- 24 Compr. LUTs
- Muon LUTs

**DAQ_PATH**
- 24 + 24 L1 latency pipelines
- 24+24 Derand. Buffers
- Event Buffer
- Output stage
- Control #s FIFOs
- Pointer FIFO
- Daq_Ctrl

**MAIN_CTRL**
- Input-Spy
- Control/Status
- Hard_rst
- Soft_rst
- Start
- Stop
- Run_status
- Other (LEDs, TPs, switches...)

**MAIN_CTRL**
- TTC signals
- LocalBus (from VME)

**OTHER**
- TP-Spy
- DAQ-Spy
Synchronization FIFOs and Prog. Delays

Each incoming FE-bus is synchronous with its own Recovered_CK. This stage synchronizes the data to the System Clock x 2.
Consider the “Self-Addressing” architecture for these FIFOs (to reduce timing uncertainties).
To align TPG data, a delay must be added to the corresponding channel. The value of each delay is given by the gaps in the synchronization histogram. [Carlos].
Make sure that when the link is down there is a free-running RX_CK from the TLK2501, in order to write ER and DV.
Clock Change stage

Transform the 16-bit words at 80 MHz into the transmitted 32-bit word at 40 MHz (two 16-bit words per 32-bit word), and finally into three QIE words.

This stage reconstructs the same word that was latched at the GOL input, so it is easy to extract the sub-fields (Exp, Mantissa, etc.) of the 3 QIE channels.
Control module of Main FPGA - MAIN_CNTR

TBD: “Resync” : command interpreted as a re-synchronization of all sub-systems readout to the same event. Event and bunch counters as well as readout memories and pointers are reset.

[Block diagram: MAIN_CNTR, its connections to the board and to the rest of the FPGA. Labels recovered from the figure:]

- TP[5:2], TP[1]
- RJ45_L1A, RJ45_BC0
- LEMO_L1A, LEMO_Hard_Rst
- EvN, BCN, OrN
- Int_L1A (disabled by Stop, enabled by Start)
- {Hard_rst, Soft_rst, Start, Stop}
- Run_status
- Fake_L1A
- L1 latency pipeline
- Zero_latency_L1A
- Pipe_Length[7:0]
- HTRsubcardN[7:0]
- TTC signals: TTC_L1A, TTC_Hard_rst, TTC_Soft_rst, TTC_Start, TTC_Stop
- DataTrig FIFO (push), InSpy FIFOs, TP&DAQ FIFO
- Input_Data @ Clk2x, Fake_Data @ Clk2x
- TP_Data, DAQ_Data
- Cntr/Stat
- XLocBusSlave, LocalBus
- {Clk, Clk2x}
- HP_DATA[15:8], HP_CLK[2:1]
- LED[1:4]

NB: EvN counts all and only the L1Accept from TTCrx, independently from Start/Stop, thus it matches the TTCrx event_counter.
HCAL L1 (Trigger) Path - Proposal

The data coming from the front-end (QIE) are in the 7-bit Mantissa-Exponent format. They have a resolution of about 0.25 GeV and a maximum energy of 2714 GeV [CMS IN 2001/037, Fig.5]

The sequence of operations in the HCAL Trigger Primitive Generator (HTR board) is (see next pages):

0) Reset the TP if any of the following: 32 bit-Link not OK; CapIDError; GOL_reconstructed_data[2:1] ≠ (0, 1);
1) Linearize the data with a LUT to a 10-bit transverse energy value. With a resolution (LSB) of 0.5 GeV, this gives an end-of-scale value of 512 GeV.
2) Apply a filter (still under study) for Bunch Crossing Identification. The filter will likely sum two consecutive time-samples and then perform a peak-detection (and maybe apply a threshold).
3) Sum 1 to 7 linearized channels of transverse energy. In case of overflow the output is set to the maximum.
4) The next step is the compression for the transmission of the Trigger Primitives to the Regional Calorimeter Trigger.
   We use a LUT with 10-bit input.
5) The Muon window must be applied directly on the QIE-format, using a LUT with a 2-bit output, with the following meaning:
   00: energy below low-threshold
   01: energy between low-threshold and high-threshold
   10: energy above high-threshold
   11: unused

6) The BCID information allows the selection of the 2-bit vector corresponding to the peak of the event (this avoids flagging the tail of a more energetic event as a muon).
7) If there are multiple channels, the 2-bit vectors go into another LUT; this is to take care of cases where showers can leak into a cell and incorrectly set the muon bit: "If two muons are input, and both are below the low threshold or above the high threshold, then the output is 0. If both are above the low and below the high threshold, then the output is 1. If one is below the low threshold and one is above the low threshold _and_ below the high threshold, the output is 1. If one is above the high threshold, then the output is 0 irrespective of the value of the second." [W.Smith]
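
The two-channel rule quoted in step 7 reduces to: the output is 1 only if neither channel is above the high threshold and at least one is inside the window. A minimal Python sketch of that combination, and of the 16-entry LUT it implies, is given below. The names are hypothetical, and the output for the unused code 11 is assumed to be 0.

```python
BELOW, WITHIN, ABOVE = 0b00, 0b01, 0b10  # 2-bit codes from step 5


def combine_muon_codes(a: int, b: int) -> int:
    """Combine the 2-bit muon codes of two channels into one muon bit."""
    if ABOVE in (a, b):
        return 0  # one channel above the high threshold vetoes the muon bit
    return 1 if WITHIN in (a, b) else 0


# 4-bit input {a[1:0], b[1:0]} -> 1-bit output; code 0b11 is unused (assumed 0).
MUON_LUT = [combine_muon_codes(i >> 2, i & 0b11) for i in range(16)]

if __name__ == "__main__":
    for a in (BELOW, WITHIN, ABOVE):
        for b in (BELOW, WITHIN, ABOVE):
            print(f"{a:02b} {b:02b} -> {combine_muon_codes(a, b)}")
```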
Trigger-Path
Case without Sum of QIE-channels

* If the summing produces a "plateau" [e.g.: 0, 10, 10, 0], select all the relative maximum points.
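
The plateau rule can be illustrated with a behavioural Python sketch. It assumes that the filter sums each time-sample with the following one (the filter is described as still under study) and that samples at the edges of the window are never selected; the function names are hypothetical.

```python
from typing import List


def sum_consecutive(samples: List[int]) -> List[int]:
    """Sum each time-sample with the next one (assumed filter)."""
    return [samples[i] + samples[i + 1] for i in range(len(samples) - 1)]


def relative_maxima(values: List[int]) -> List[int]:
    """Indices of relative maxima; every point of a plateau is selected."""
    peaks = []
    n = len(values)
    i = 1
    while i < n - 1:
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1  # extend over a run of equal values
        rises = values[i - 1] < values[i]
        falls = j + 1 < n and values[j + 1] < values[i]
        if rises and falls:
            peaks.extend(range(i, j + 1))
        i = j + 1
    return peaks


if __name__ == "__main__":
    sums = sum_consecutive([0, 2, 8, 2, 0, 0])  # [2, 10, 10, 2, 0]
    print(sums, relative_maxima(sums))
```

A pulse that is symmetric around one sample naturally produces such a plateau after the two-sample sum, which is why both points are kept.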
New Trigger-Path : Sum n(<8) QIE-channels

* If overflow, set the result to the max value. Investigate 11-bit sum and LUT. Note that there is a unique compression LUT for a group of channels. It must be possible to mask each channel participating in the sum, so that a sync histogram can be made for each independent channel (input data from each deserialiser) [Carlos]
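
A minimal sketch of the masked, saturating sum is shown below. It assumes a 10-bit result (maximum 1023) and a mask in which a true entry means that the channel takes part in the sum; both the function name and the mask convention are hypothetical.

```python
from typing import List, Sequence

MAX_10BIT = (1 << 10) - 1  # 1023, assumed full-scale value of the sum


def masked_saturating_sum(channels: List[int], enabled: Sequence[bool]) -> int:
    """Sum the enabled channels (fewer than 8) and clip to the 10-bit maximum."""
    if not 1 <= len(channels) <= 7 or len(channels) != len(enabled):
        raise ValueError("expected 1 to 7 channels with one mask bit each")
    total = sum(value for value, use in zip(channels, enabled) if use)
    return min(total, MAX_10BIT)


if __name__ == "__main__":
    print(masked_saturating_sum([600, 600], [True, True]))   # saturates
    print(masked_saturating_sum([600, 600], [True, False]))  # single channel
```

Enabling one channel at a time reproduces the per-channel view needed for the sync histogram.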
Doing the Peak-detection in parallel with the Sum decreases the latency, and it can be used to off-set the latency of the following sum of trigger towers.
NB: during configuration this LUT has non-zero outputs, so need to disable (Stop command) the SLB board (software specification).
This is not done in this module in order to reduce the latency.
LUT Initialization Scheme over VME-Local Bus

NB: to configure a given LUT, the VME access is always made at the same address. The VME FPGA on the board then generates the correct address with a counter.
Input LUT initialization at compilation time

NOTE:
The Input LUTs in the Trigger Path use Single Port Distributed RAM. The 128x11 memory needed is mapped to 11 128x1 distributed RAM primitives as shown in the figure. The first bit of all 128 locations is mapped to one instance of the primitive, the second bit of all 128 locations is mapped to the second instance, and so on. This information is needed to initialize the RAM contents at power up. The initial values can then be put in the UCF file accordingly. Each of the 11 instances has to be initialized separately.
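
The bit-slicing described in the note can be sketched as follows: given the 128 LUT words of 11 bits each, the script produces one 128-bit initialization value per primitive. It assumes that address 0 corresponds to the least significant bit of each initialization value; the function name is hypothetical, and the exact UCF syntax and instance names are not shown here.

```python
from typing import List


def bit_slice_inits(lut: List[int], width: int = 11) -> List[str]:
    """Return one 128-bit hex string per bit plane of a 128-entry LUT."""
    if len(lut) != 128:
        raise ValueError("expected 128 LUT entries")
    inits = []
    for bit in range(width):
        plane = 0
        for addr, word in enumerate(lut):
            plane |= ((word >> bit) & 1) << addr  # address 0 -> LSB (assumed)
        inits.append(f"{plane:032X}")
    return inits


if __name__ == "__main__":
    identity = list(range(128))  # illustrative contents only
    for bit, init in enumerate(bit_slice_inits(identity)):
        print(f"bit {bit:2d}: {init}")
```

Generating the values with a script avoids transposing 11 bit planes by hand.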
L2-DAQ Path

No energy extraction algorithm ⇒ QIE-data + address. The 24 QIE-channels (in parallel with the 24 TP-channels) are temporarily stored in a pipeline (circular buffer) to wait for the L1A trigger decision. The storage time is programmable between 2 and 255 clock ticks. Each L1A trigger selects a block of NDD time-samples per QIE-channel and NTP time-samples per TP-channel. The selected data of the 24 channels must be inserted into the HTR/DCC data format. EV#, BC#, ORBIT# must correspond to the appropriate data.
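
The programmable latency pipeline can be pictured as a circular buffer whose output is the sample written a fixed number of clock ticks earlier. The sketch below models a single channel; the class name is hypothetical and the buffer is assumed to start filled with zeroes.

```python
class LatencyPipeline:
    """Circular buffer that delays its input by `length` clock ticks."""

    def __init__(self, length: int) -> None:
        if not 2 <= length <= 255:
            raise ValueError("pipeline length must be between 2 and 255")
        self.buffer = [0] * length  # assumed to be cleared at reset
        self.pointer = 0

    def clock(self, sample: int) -> int:
        """Store one new sample and return the one written `length` ticks ago."""
        delayed = self.buffer[self.pointer]
        self.buffer[self.pointer] = sample
        self.pointer = (self.pointer + 1) % len(self.buffer)
        return delayed


if __name__ == "__main__":
    pipe = LatencyPipeline(3)
    print([pipe.clock(x) for x in [1, 2, 3, 4, 5]])
```

Reading and writing the same location on each tick keeps the delay equal to the buffer length without a separate read pointer.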

As of Pre-prod v25 firmware, the circuit starts writing to the Event Buffer only when it is empty and starts reading it about 5 clock ticks later. This implies that, at a very high trigger rate, the HTR can send event fragments to the DCC almost continuously. Investigate how to align the DAQ-channels using SLB info.
Trigger acceptance & Empty events

1. Wait for L1A arrival

2. Increase EvN (should match TTCrx)

3. Does L1A violate Trigger Rules (i) and (ii)?
   - YES: Reject L1A; set RL=1 (RL=1 stored with the L1A following the rejected L1A)
   - NO: Overflow Warning?
     - YES: Store no data, only L1A pointer with EmEv=1 (need pointer to calculate Word_count)
     - NO: Store data and L1A with EmEv=0

4. Pointer_FIFO Empty?
   - YES: The 2 processes are decoupled by drnd_buffer and pointer_fifo
   - NO: Read Pointer_out

5. Is EmEv=1 in pointer_out?
   - YES: Generate Empty Event block with proper flag
   - NO: Generate Normal Data block
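
The flow above can be sketched as a small behavioural model. It assumes that the trigger rules are "no more than 1 trigger per 3 BXs" and "no more than 2 triggers per 25 BXs", that only accepted L1As count towards the rules, and that BX times are given as a monotonically increasing count; the class and field names are hypothetical.

```python
from typing import Optional, Tuple


class TriggerAcceptor:
    """Behavioural model of L1A acceptance and Empty Event flagging."""

    def __init__(self) -> None:
        self.evn = 0
        self.accepted_bx = []   # BX times of accepted L1As
        self.pending_rl = 0     # RL is stored with the next accepted L1A
        self.pointer_fifo = []

    def l1a(self, bx: int, overflow_warning: bool) -> Optional[dict]:
        """Process one L1A; return the stored pointer entry or None if rejected."""
        self.evn = (self.evn + 1) & 0xFFFFFF  # step 2: always increase EvN
        recent = self.accepted_bx[-2:]
        rule_1 = bool(recent) and bx - recent[-1] < 3
        rule_2 = len(recent) == 2 and bx - recent[0] < 25
        if rule_1 or rule_2:
            self.pending_rl = 1
            return None
        self.accepted_bx.append(bx)
        entry = {"evn": self.evn, "bx": bx, "rl": self.pending_rl,
                 "em_ev": 1 if overflow_warning else 0}
        self.pending_rl = 0
        self.pointer_fifo.append(entry)
        return entry

    def read_block(self) -> Optional[Tuple[str, dict]]:
        """Steps 4 and 5: pop one pointer and pick the block type."""
        if not self.pointer_fifo:
            return None
        entry = self.pointer_fifo.pop(0)
        return ("empty" if entry["em_ev"] else "normal", entry)
```

The write side (`l1a`) and the read side (`read_block`) only communicate through the pointer FIFO, mirroring the decoupling noted in step 4.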
Derandomizer and trigger rules

The Derandomizer is not a simple FIFO as it handles the case of two L1A within a 4-tick interval (Trigger TDR 16.4.3). Such an interval is smaller than the number of time samples (∼10) to be collected (Trigger TDR 7.3.1), thus overlapping.

Example with:
- # of samples = 5
- WR_ADDR of the first word = 11

Example with:
- two L1As separated by two BXs
- # of samples = 5
- WR_ADDR of the first word = 11
Generation of the Overflow Warning Flag

Derandomizer buffers are synchronous dual-port RAMs, 36 bits wide and 512 words deep. They are the first elements that can overflow in the HTR, in case of a high trigger rate.

Let: \[ \text{word\_count} = \text{drnd\_wr\_addr} - \text{drnd\_rd\_addr} \]

The overflow_warning flag (OvW) is generated by the simple scheme shown below:

OvW is set when: \[ \text{word\_count} \geq 384 = 75\% \text{ of 512} \]

OvW is cleared when: \[ \text{word\_count} \leq 256 = 50\% \text{ of 512} \]

This scheme introduces a form of hysteresis, which prevents the OvW flag from toggling repeatedly when the buffers are around 75% full.
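
A behavioural sketch of this two-threshold scheme is given below. It interprets the upper threshold (384) as the set condition and the lower one (256) as the clear condition, and it assumes that the address difference wraps modulo the buffer depth; the class name is hypothetical.

```python
DEPTH = 512
SET_LEVEL = 384    # 75% of 512
CLEAR_LEVEL = 256  # 50% of 512


class OverflowWarning:
    """Overflow warning flag with hysteresis between 50% and 75% occupancy."""

    def __init__(self) -> None:
        self.ovw = False

    def update(self, drnd_wr_addr: int, drnd_rd_addr: int) -> bool:
        # Address difference taken modulo the depth (assumed wrap-around).
        word_count = (drnd_wr_addr - drnd_rd_addr) % DEPTH
        if word_count >= SET_LEVEL:
            self.ovw = True
        elif word_count <= CLEAR_LEVEL:
            self.ovw = False
        # Between the two levels the flag keeps its previous value.
        return self.ovw


if __name__ == "__main__":
    flag = OverflowWarning()
    for wr in (100, 384, 300, 256, 380):
        print(wr, flag.update(wr, 0))
```

Occupancies between 256 and 384 words leave the flag unchanged, so a buffer hovering near 75% does not make OvW toggle.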
Example of minimum L1A spacing that does not violate the Trigger rules

[Diagram showing time stamps and spacing intervals]
Example of DAQ-output sequence

In this example there are no rejected L1As

Note that the OW included in each Data Block latches the value of the internal OW at the beginning of each block transmission.
Pedestal calculation

Study the possibility of calculating the pedestal as an average of the input values during the abort gap. For instance, accumulate 32 inputs and then divide by 32 (simple shift).
If the abort gap is used for something else (IDLE patterns, etc.), calculate the pedestal from data taken during the minor gaps (~38 BXs). For instance, accumulate 16 inputs and then divide by 16 (simple shift); this protects from random latency effects, etc.
This must be done per channel.
Pedestal values should be sent out to the DCC and/or used for zero suppression and energy filtering.
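
The accumulate-and-shift average can be sketched for one channel as follows. The function name is hypothetical, and the result is truncated (not rounded), as a plain right shift would do in hardware.

```python
from typing import List


def pedestal_estimate(samples: List[int], log2_n: int = 5) -> int:
    """Average the first 2**log2_n samples using a right shift."""
    n = 1 << log2_n
    if len(samples) < n:
        raise ValueError(f"need at least {n} samples")
    return sum(samples[:n]) >> log2_n


if __name__ == "__main__":
    abort_gap = [3, 4] * 16                  # 32 illustrative inputs
    print(pedestal_estimate(abort_gap))      # 32-sample average
    print(pedestal_estimate(list(range(16)), log2_n=4))  # 16-sample average
```

Restricting the number of inputs to a power of two is what makes the division a simple shift.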
Address Section
Details on http://www.physics.umd.edu/hep/HTR/preprod/Xilinx_addr_map.html

The Local Bus has 21 address lines in total.
The address space therefore has $2^{21}$ locations (addresses 0 to $2^{21} - 1$), divided into four segments as shown above.
More Design guidelines

Synchronous design:

```verilog
// Template for a clocked process with an asynchronous, active-high reset.
always @(posedge clk or posedge rst) begin
  if (rst) begin
    // reset_instructions
  end else begin
    // ....
  end
end
```

Source Synchronous data transmission (of DAQ data) with inverted clock.
RX_ERR stretched to LED.
Assign SLEW RATE = SLOW for all outputs (investigate FAST for clock outputs)
Register all data outputs of the chip.
Each Verilog module has the same name as its file.
Use parameterized modules when possible (this is more difficult when instantiating embedded features like RAMs).
Where possible, avoid underscores “_” in long names: LongName is better than long_name (it helps when searching for paths of internal signals).

ISE 5.1i settings

If using DCI (Digitally Controlled Impedance), set Match Cycle = 2 under Generate Programming File → Startup Options (see Answer 12573).
Force FFs in IO cells, especially for Trigger Primitives.
It would be better to use a script.
Port Headers Style

The Verilog primitive modeling style establishes a preferred order for defining port interfaces within each
interface grouping (e.g., PCI bus, RS232 bus, LocalBus):
1. Outputs first
2. Inputs second
3. Control signals third
4. Clock and reset last.

When instantiating a module in a higher-level module, do not map ports by order; use named connections in this format:

```
module_name instance_name ( .InternalPortName1(ext_port_expr1), .InternalPortName2(ext_port_expr2) );
```

More Conventions

- Polarity
  - Use an _n (negative) or _i (inverted) suffix to denote signals that are active low.
- Registers
  - Use _reg or _d to denote a registered signal.
  - If registers are pipelined, use a number to indicate depth of pipeline:
    - data_1r data_2r data_3r -- depth of pipeline
    - enb_1rn enb_2rn enb_3rn -- depth + polarity

Maintaining a consistent style is highly recommended.