# The 10-ps Wave Union TDC: Improving FPGA TDC Resolution beyond Its Cell Delay

Jinyuan Wu and Zonghan Shi

Abstract— There are two major issues in the delay chain based FPGA TDC due to uneven internal delay in the carry chain. (1) The bin widths are uneven and depend on temperature and power supply voltage, which must be calibrated as frequently as possible. The auto-calibration functional block developed in this work provides semi-continuous calibration that converts the TDC measurements from bins to picoseconds. (2) In many applications, the TDC resolution is limited by the "ultra-wide bins", corresponding to the carry chain crossing at the boundaries of the logic array blocks. The apparent widths of these ultra-wide bins can be several times bigger than the average bin width. The "wave union launchers" described in this paper are designed to make multiple measurements with a single delay chain structure, effectively to sub-divide the ultra-wide bins in each raw measurement. Several TDC schemes with resolutions in 20 to 10 picoseconds range implemented in today's low cost FPGA have been tested.

# Index Terms—Front End Electronics, TDC, FPGA Firmware

# I. INTRODUCTION

Chain structures existing in most of today's FPGA families can be used for time-to-digital conversion (TDC) [1-6]. A good review of delay chain based FPGA TDCs is given in Reference [2]. Unlike application-specific integrated circuit (ASIC) devices, in which designers can choose to delay the hit input, the clock, or both, an FPGA can usually delay only the hit input, and the register array is driven by a single clock, as shown in Fig. 1(a).



Fig. 1. Delay chain based FPGA TDC and differential nonlinearity plot.

Manuscript received October 30, 2008. This work was supported in part by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the United States Department of Energy and University of Chicago's Fermilab Strategic Collaborative Initiative.

The authors are with Fermi National Accelerator Laboratory, Batavia, IL 60510 USA (phone: 630-840-8911; fax: 630-840-2950; e-mail: jywu168@fnal.gov).

A special feature of the FPGA TDC is its large differential nonlinearity (DNL), shown in Fig. 1(b) as the apparent width of each TDC bin. There are several origins of DNL. (1) The first and most significant one is the logic array block (LAB) structure. When the input signal in the carry chain passes across the LAB boundaries (and also the half-LAB boundaries in some FPGA families), the added extra delays cause periodic "ultra-wide bins". Based on our measurements, in an Altera Cyclone II device (EP2C8T144C6) [7], the typical raw bin width is about 60ps, while the ultra-wide bins can be as large as 165ps. (2) Another origin of the DNL is the delays in the clock distribution network [2]. The clock signals do not drive the different flip-flops in the register array exactly simultaneously. (3) There is also a logic or firmware origin of the DNL. The carry chain in an FPGA is actually a small lookup table allowing users to specify different carry logic. The delay cells can be specified as either non-inverting or inverting buffers. With inverting delay cells, the input signal passes through the delay chain with alternating opposite logic transitions that have different propagation delays, causing different widths of the even and the odd bins. In some cases, the DNL of the even-odd bins can be a good feature that helps to improve the overall measurement resolution.

Two major issues must be solved for the practical FPGA TDC "turn-key" applications. (1) The bin widths are uneven and depend on temperature and power supply voltage, which must be calibrated as frequently as possible. The autocalibration functional block developed in this work provides semi-continuous calibration that converts the TDC measurements from bins to picoseconds. The auto calibration scheme will be discussed in Section II. (2) In many applications, the TDC resolution is limited by the "ultra-wide bins". The "wave union launchers" described in this paper are designed to make multiple measurements with a single delay chain structure, effectively to sub-divide the ultra-wide bins in each raw measurement.

A wave union launcher creates a pulse train or "wave union" with several 0-to-1 or 1-to-0 logic transitions for each input hit and feeds the wave union into the TDC delay chain/register structure, making multiple measurements. There are two types of wave union launchers: (1) the Finite Step Response (FSR) ones and (2) the Infinite Step Response (ISR) ones. This classification is an analogue of Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) for linear systems, except that the inputs for the wave union launchers are logic steps.

A FSR wave union launcher, like FIR linear systems, employs no feedback and generates a pulse train with finite length and limited number of logic transitions. An ISR wave union launcher, like IIR linear systems, uses feedback to generate an infinite pulse train.

In our work, we have studied two wave union launchers: the "wave union launcher A", an FSR launcher with two usable logic transitions, and the "wave union launcher B", an ISR launcher based on a ring oscillator. They are discussed in the remaining sections of this paper.

# II. AUTOMATIC CALIBRATION BLOCK

It is known that the propagation delay of a delay cell depends on temperature and power supply voltage. In an ASIC TDC it is possible to compensate for the delay variation using an analog method, i.e., to generate a control voltage from the phase difference between an external crystal oscillator and the internal ring oscillator and to use the control voltage to fine-tune the internal cell delays via negative feedback.

In an FPGA TDC, analog compensation is not convenient and digital calibration is preferable. There are at least two approaches to digital calibration: the double registration approach [4] and the statistical approach [8].

In the double registration scheme, the total delay time of the delay line is designed to be longer than the clock period  $t_{\rm p}$ . Sometimes, an input logic transition will be recorded by the register array twice. If the positions of the two registered logic transitions are  $N_1$  and  $N_2$ , respectively, then the average cell delay is  $t_{\rm d} = t_{\rm p}/(N_2 - N_1)$ . The advantage of this scheme is its fast response time. However, it does not provide bin-by-bin calibration when the bin widths are different since only the average cell delay is measured in this scheme.
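The double registration formula can be sketched in a few lines of Python. This is only an illustration: the function name and the example bin positions (5 and 47) are hypothetical, and only the 2500ps clock period is taken from the text.

```python
def average_cell_delay_ps(clock_period_ps: float, n1: int, n2: int) -> float:
    """Average cell delay t_d = t_p / (N2 - N1) from one doubly registered transition."""
    if n2 <= n1:
        raise ValueError("expected N2 > N1 (second registration further along the chain)")
    return clock_period_ps / (n2 - n1)


# Hypothetical example: a 2500 ps clock and a transition registered at bins 5 and 47.
t_d = average_cell_delay_ps(2500.0, 5, 47)
print(f"average cell delay: {t_d:.2f} ps")
```

Because only the difference of the two positions enters the result, the method returns one number for the whole chain, which is why it cannot resolve individual bin widths.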

The statistical approach provides bin-by-bin calibration and we will focus on this method in the rest of this section.

The automatic calibration functional block we developed in our work is shown in Fig. 2.



Fig. 2. The automatic calibration functional block

After power up or system reset, all TDC inputs are fed with calibration hits. The timing of these hits should have no correlation with the clock signal driving the TDC, so the hits should be generated from an independent oscillator. It is also possible to use real event hits as calibration hits if the hit rate of the real events is sufficiently high.

The input from the TDC encoder in our design is a 6-bit number, representing the bin number of the logic transition of the input signal with a possible range of 0 to 63. A 64-bin DNL histogram is booked in the FPGA internal memory. If the total number of hits is known, then the count in each bin can be used as its bin width. For example, if 16384 hits are booked into the histogram and these hits are assumed to be evenly spread over 2500ps, the period of the 400MHz clock driving the TDC, then the width of a bin with N counts is N\*2500ps/16384 = N\*0.1526ps.

Once all hits are booked into the histogram, a sequence controller starts to build the lookup table (LUT) in the FPGA internal memory. The LUT is integrated from the DNL histogram so that it outputs the actual time of the center of the addressed bin. The time value of the first bin is half of the width of the first bin. Then another half bin width of the first bin and the half bin width of the second bin are added to get the center time of the second bin. This sequence is repeated for remaining bins. Once the LUT is built, the outputs of the LUT are the TDC times calibrated to the temperature and power supply condition during booking the previous 16K hits.
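The histogram-to-LUT integration described above can be sketched as follows. This is a software illustration only, not the firmware sequence controller; the function name `build_lut` and the six-bin example histogram are assumptions made for brevity (the design in the text uses 64 bins).

```python
def build_lut(histogram, period_ps=2500.0):
    """Integrate a DNL histogram into a table of bin-center times in ps."""
    total_hits = sum(histogram)
    if total_hits == 0:
        raise ValueError("histogram is empty")
    ps_per_count = period_ps / total_hits  # 2500/16384 = 0.1526 ps for 16K hits
    lut = []
    leading_edge_ps = 0.0  # time at which the current bin starts
    for count in histogram:
        width_ps = count * ps_per_count
        lut.append(leading_edge_ps + width_ps / 2.0)  # center of this bin
        leading_edge_ps += width_ps
    return lut


# Hypothetical 6-bin histogram with 16384 hits in total.
example_hist = [0, 0, 4096, 8192, 4096, 0]
example_lut = build_lut(example_hist)
print(example_lut)
```

Empty bins before the active range map to 0 and empty bins after it map to the full clock period, which matches the behavior of the LUT outside the active range.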

In normal operation, a new DNL histogram can be booked as real event data are taken. Each time a new DNL histogram is booked with 16K hits, a new calibration LUT can be built and used for subsequent events. In the real implementation, the DNL histogram booking/LUT building process and the current LUT are in two ping-pong memory pages. The pages are swapped in a single clock cycle so that no service dead time occurs during the LUT update.

An example of the LUT is shown in Fig. 3, which is integrated from Fig. 1(b).



Fig. 3. An example of the LUT in auto calibration functional block

Through the LUT, the bin number from the TDC encoder is converted to a time from 0 to 2500ps. The spacing between two adjacent bins is the sum of the half widths of the two bins. In our example above, a hit can arrive with a bin number from 4 to about 44 (see also Fig. 1(b)). Outside this range, the bin widths are not measured and the LUT treats these bins as corresponding to either 0 or 2500ps. Fortunately, most hits will land in the active range because the active range itself is defined by the non-zero bins booked in the DNL histogram, which are created by input hits. There is a small chance that a hit may land just one bin outside the boundaries due to temperature or power supply drift. In this case, interpreting the hit time as 0 or 2500ps is a good approximation.

# III. THE WAVE UNION TDC A

A wave union scheme, "wave union launcher A", has been tested as a proof of concept. The wave union launcher A belongs to the FSR type. It generates a pulse train with three logic transitions of which two are encoded. We will discuss its implementation and test result in this section.

# A. The Wave Union Launcher A

The wave union launcher A is implemented in a LAB with 16 logic elements as shown in Fig. 4(a). It is connected to the remaining 48 cells in the 64-cell carry chain/register array.



Fig. 4. The wave union launcher A (a) and its screen dump (b)

When the input level is 0, a logic pattern or "wave union" with two 1-0 transitions and one 0-1 transition is formed in the launcher and the pattern is held in place. When the input level becomes 1, the wave union is unleashed to propagate in the carry chain. At the leading edge of the clock (400MHz in our test), a snap shot is recorded in the register array. A screen dump of such snap shots is shown in Fig. 4(b). In the screen dump, the pattern propagates from left to right and each line corresponds to a hit event. The top portion of the screen dump shows hits with nearly same arrival times relative to the clock while the bottom portion shows events with different arrival times.

The nominal separation of the two logic transitions marked in Fig. 4(b) is 13 bins and they are encoded for further process. Given that the period of the ultra-wide bins is 8, at least one of the two logic transitions will be in a normal bin. This arrangement effectively subdivides the ultra-wide bins or improves the sensitivity of the TDC. If one transition is in an ultra-wide bin and is not sensitive to the arrival time change, the other transition will be in normal bins and maintain the sensitivity. The effect can be seen in Fig. 5.
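The claim that the two transitions cannot both sit in ultra-wide bins can be checked with a short sketch. It assumes, as a simplification, exactly one ultra-wide bin per 8 bins at a fixed but unknown offset; the names below are illustrative only.

```python
ULTRA_WIDE_PERIOD = 8   # period of the ultra-wide bins, from the text
SEPARATION = 13         # nominal separation of the two transitions, from the text
CHAIN_LENGTH = 64


def is_ultra_wide(bin_index: int, offset: int) -> bool:
    """Assumed model: one ultra-wide bin every 8 bins, at position `offset` modulo 8."""
    return bin_index % ULTRA_WIDE_PERIOD == offset


# Search every possible offset and every position for a case where both
# transitions would land in ultra-wide bins at the same time.
both_wide = [
    (b, offset)
    for offset in range(ULTRA_WIDE_PERIOD)
    for b in range(CHAIN_LENGTH - SEPARATION)
    if is_ultra_wide(b, offset) and is_ultra_wide(b + SEPARATION, offset)
]
print("cases with both transitions in ultra-wide bins:", len(both_wide))
```

The search comes back empty because 13 is not a multiple of 8. Any separation that is not a multiple of the ultra-wide bin period has the same property under this model.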



Fig. 5. The bin width plot of plain TDC and the wave union TDC A

One of the two logic transitions is encoded and its DNL histogram is booked and marked with "Plain TDC" in Fig. 5 in which ultra-wide bins are seen. The sum of the bin numbers of the two transitions is also booked and marked with "Wave Union TDC A" in Fig. 5. Now as expected, no bins are ultra-wide anymore. The sum of the bin numbers spreads from about 20 to about 100, about twice of the range for plain TDC.
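A toy timing model illustrates why summing the two bin numbers removes the ultra-wide bins. The model is an assumption, not the measured device: every eighth cell is 165ps wide and all others are 60ps, the chain is treated as periodic, and the second transition trails the first by the delay of 13 cells. The sum of the two bin numbers changes whenever either transition crosses a cell boundary, so the boundaries of the summed measurement are the union of both boundary sets.

```python
PERIOD_CELLS = 8
WIDE_PS, NORMAL_PS = 165.0, 60.0   # bin widths quoted in the text
SEPARATION_CELLS = 13

# Assumed layout of one period: one ultra-wide cell followed by seven normal cells.
cell_ps = [WIDE_PS] + [NORMAL_PS] * (PERIOD_CELLS - 1)
period_ps = sum(cell_ps)


def boundaries_one_period():
    """Times at which a transition crosses a cell boundary within one period."""
    edges, t = [], 0.0
    for width in cell_ps:
        edges.append(t)
        t += width
    return edges


def max_bin_width(boundary_times):
    """Largest gap between consecutive boundaries, treating the chain as periodic."""
    pts = sorted(set(round(t % period_ps, 6) for t in boundary_times))
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(pts[0] + period_ps - pts[-1])  # wrap-around gap
    return max(gaps)


first = boundaries_one_period()
offset_ps = sum(cell_ps[i % PERIOD_CELLS] for i in range(SEPARATION_CELLS))
second = [t + offset_ps for t in first]

plain_max = max_bin_width(first)
union_max = max_bin_width(first + second)
print(f"plain: {plain_max} ps, summed: {union_max} ps")
```

In this idealized model the widest bin drops from 165ps to 60ps. The measured values in Fig. 5 differ because real cell delays are not uniform.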

When both of the transitions are in normal bins, nominally they are in opposite odd-even bins, given their separation of 13. When the odd and even bins have different width, they effectively subdivide each other and therefore, further improve the TDC resolution.

It is interesting to compare our scheme with the interleaving scheme used in ASIC TDCs to improve resolution. Both schemes use multiple measurements to reduce measurement errors. But in our design, the two measurements are made in the same delay line/register array structure to save logic resources. In an ASIC, since fine delay control is possible, the time difference of the two TDC inputs is usually exactly half of the bin width, which yields the best gain of a factor of 2 in resolution. In an FPGA, fine propagation delay control is very hard. The resolution improvement is roughly a factor of 1.4 (square root of 2), provided there are sufficient variations of the bin widths. This is the reason why it is preferable to use inverting buffers in the carry chain, which cause the even-odd bin width difference. In this design, improving resolution is a secondary purpose, while subdividing the ultra-wide bins is the primary one.

# B. Calibration

The sum of the bin numbers of the two transitions improves sensitivity of measurement by subdividing the ultra-wide bins. However it only guarantees monotonic dependency on arrival time and calibration is required to convert it to actual time. The auto calibration functional block described in previous section is used and the DNL histogram and the LUT now have 128 bins. The lookup tables of the plain TDC and the wave union TDC A are compared in Fig. 6.



Fig. 6. The lookup tables of plain TDC and the wave union TDC A

The slope of the LUT which can be interpreted as average bin width for wave union TDC A is about half of the slope for plain TDC.

# C. Test Result

A simple bench top test has been done in an Altera Cyclone II device (EP2C8T144C6). The input to the TDC is a pulse train with a 53.11 MHz repetition rate that is generated by a crystal oscillator of 53.11 MHz. The TDC is driven at 400 MHz that is derived from a separate crystal oscillator of 50 MHz. The times of the 53.11 MHz edges are digitized with three TDC schemes and the time differences of consecutive edges are booked in histograms as shown in Fig. 7.



Fig. 7. Histograms of the time differences with different TDC schemes

Only the fine time part, i.e., the lower bits representing the time difference modulo 2500ps, is plotted. As expected, pulse edges with a 53.11 MHz repetition rate have a fine time separation of $1171\,{\rm ps} = 8 \times 2500\,{\rm ps} - (10^6\,{\rm ps}/53.11)$.
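The fine time separation can be reproduced with a few lines of arithmetic. The variable names are illustrative; the numbers are those given in the text.

```python
CLOCK_PERIOD_PS = 2500.0          # 400 MHz TDC clock
INPUT_FREQ_MHZ = 53.11            # crystal oscillator driving the TDC input

input_period_ps = 1e6 / INPUT_FREQ_MHZ              # about 18828.8 ps
fine_separation_ps = 8 * CLOCK_PERIOD_PS - input_period_ps

# Equivalent view: the negated input period taken modulo the clock period.
fine_separation_mod_ps = (-input_period_ps) % CLOCK_PERIOD_PS

print(f"input period: {input_period_ps:.1f} ps")
print(f"fine time separation: {fine_separation_ps:.1f} ps")
```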

The un-calibrated raw TDC, i.e., the output of the encoder has a delta t RMS error of about 58ps. The error comes from both incorrect linear assumption of the time value of each bin and existence of ultra-wide bins. After calibration, the RMS error for the plain TDC is reduced to 40ps. However, the ultra-wide bins still artificially emphasize or de-emphasize some delta t values. The RMS error from the wave union TDC A is further reduced to 25ps and the histogram becomes nearly a bell-shaped distribution since the structure due to ultra-wide bin is eliminated.

# IV. THE WAVE UNION TDC B

Another version of the wave union TDC, the "wave union TDC B" has also been tested. The "wave union launcher B" used in this test is simply a ring oscillator enabled by the input and it belongs to ISR type. After the input level turns from 0 to 1, a pulse train with unlimited length and logic transitions is generated. More measurements can be made for one input so that the average of these measurements yields better TDC resolution.

# A. The Wave Union Launcher B

The wave union launcher B is a NAND gate with feedback through a delay buffer, as shown in Fig. 8(a). When the input is 0, the output is held at a constant logic 1 level. After arrival of the input, it starts to oscillate, launching a wave union into the carry chain. The carry chain/register array structure takes 16 snap shots of the oscillation bit patterns in 16 clock cycles at 400MHz, as shown in Fig. 8(b).



Fig. 8. The wave union launcher B and the screen dump

The phase of the oscillation is determined by the arrival time of the input signal, which is the quantity to be measured by the TDC. The oscillation frequency of the ring oscillator is designed to be around 400 MHz and it does depend on temperature and power supply voltage as shown in Fig. 8(b).

However, if the drift of temperature and power supply voltage is sufficiently slow, it is reasonable to assume that the oscillation frequency is stable and can be measured to a good accuracy after long time. Then in the 16 snap shots, the locations of the logic transitions can be utilized to compute the arrival time of the input signal to a higher resolution.

# B. Data Processing in FPGA

A priority encoder is implemented for the output of the lower 48 registers. Only 1-to-0 logic transitions are encoded and the bin numbers are chosen from 16 to 63, with 16 representing earlier and 63 later arrival times, respectively. If more than one valid logic transition exists in one snap shot, the transition with the smaller bin number is chosen.

Assuming the oscillation period of the ring oscillator is  $t_P$  and the arrival time of the input signal is  $t_0$ , the time of the n-th oscillation edge t can be written:

$$t = t_0 + t_P n \tag{1}$$

After arrival of the input signal, 16 snap shots are taken driven by the TDC clock with period  $t_{\rm C}$  and the corresponding clock cycle numbers are assigned as m=0 to 15. If the measured TDC value (after calibration) of an oscillation edge is  $T_{\rm M}(m)$ , the actual time t of the edge is:

$$t = T_{\rm M}(m) + t_{\rm C}m \tag{2}$$

Combining these two equations together, the arrival time of the input signal can be written:

$$t_0(m) = T_{\rm M}(m) + t_{\rm C} m - t_P n(m) \tag{3}$$

Integer n(m) is the number of the oscillation edge being encoded at clock cycle m and the initial value n(0) = 0 is chosen. It can be seen that if both the clock and ring oscillation periods and the oscillation edge number are known, the input signal arrival time can be calculated by any of the 16 measurements. The process of computing  $t_0(m)$  is called delay correction which will be discussed next.

The period of the ring oscillator is designed to be different from the clock period so that  $T_{\rm M}(m)$  in the 16 snap shots can have different values to ensure that not all measurements are in an ultra-wide bin. This causes an issue of keeping track of the oscillation edge number n(m). Most of the time, if an oscillation edge is seen in a clock cycle, the next oscillation edge is seen in the next clock cycle, assuming the difference between the oscillation period and the clock period is not too big. But sometimes an oscillation edge can be seen twice in two clock cycles and, on the other hand, an oscillation edge may be skipped when the oscillation is faster than the clock. These cases can be illustrated with an example with several hits shown in Fig. 9.



Fig. 9. Raw measurements of the wave union TDC B

Hits with different arrival times are detected, and 16 data points are taken for each hit in the example above. We define three types of jumps between two adjacent measurement points: U-type, V-type and W-type. At a given clock cycle other than m = 0, if the current raw TDC measurement from the encoder is in the range [16, 31] and the measurement of the previous cycle is in [48, 63], the jump is defined as U-type. A jump in the opposite direction, from [16, 31] to [48, 63], is defined as W-type. All other jumps, which are usually very small, are defined as V-type. The two edges in a U-type jump are actually the same oscillation edge, which propagates in the delay chain and is recorded twice, first in the [48, 63] range and then in the [16, 31] range, i.e., n(m) = n(m-1). Similarly, the n(m) value increases by 1 and 2 for V-type and W-type jumps, respectively. The above can be summarized:

$$\begin{array}{lll}
\text{U-type:} & T_M(m): [48,63] \rightarrow [16,31] & n(m)-n(m-1)=0 \\
\text{V-type:} & T_M(m): \text{all the other jumps} & n(m)-n(m-1)=1 \\
\text{W-type:} & T_M(m): [16,31] \rightarrow [48,63] & n(m)-n(m-1)=2
\end{array} \tag{4}$$
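The jump classification of Equation (4) and the resulting edge numbers n(m) can be sketched as below. This is a software illustration with hypothetical function names and made-up raw bin sequences, not the firmware implementation.

```python
LOW_RANGE = range(16, 32)    # raw bins [16, 31]
HIGH_RANGE = range(48, 64)   # raw bins [48, 63]

# Increment of the oscillation edge number n(m) for each jump type, Equation (4).
EDGE_INCREMENT = {"U": 0, "V": 1, "W": 2}


def classify_jump(prev_bin: int, cur_bin: int) -> str:
    """Classify the jump between two adjacent raw measurements."""
    if prev_bin in HIGH_RANGE and cur_bin in LOW_RANGE:
        return "U"   # same edge recorded twice
    if prev_bin in LOW_RANGE and cur_bin in HIGH_RANGE:
        return "W"   # one edge skipped
    return "V"       # next edge seen in the next clock cycle


def edge_numbers(raw_bins):
    """Return n(m) for each snap shot, starting from n(0) = 0."""
    n = [0]
    for prev_bin, cur_bin in zip(raw_bins, raw_bins[1:]):
        n.append(n[-1] + EDGE_INCREMENT[classify_jump(prev_bin, cur_bin)])
    return n


# Made-up raw bin sequences for illustration.
n_with_u = edge_numbers([40, 45, 50, 55, 18, 23])
n_with_w = edge_numbers([20, 25, 52])
print(n_with_u, n_with_w)
```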

In actual implementation, instead of computing the multiplications in Equation (3), the last two terms are replaced with an accumulation while processing the measurement points one by one.

$$t_{0}(m) = T_{M}(m) + T_{A}(m)$$

$$T_{A}(m) = T_{A}(m-1) + \begin{cases} t_{C} = D_{U} & \text{U-type} \\ t_{C} - t_{P} = D_{V} & \text{V-type} \\ t_{C} - 2t_{P} = D_{W} & \text{W-type} \end{cases} \tag{5}$$
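The accumulation in Equation (5) can be sketched as follows, taking the jump types as given. The function name and the example values (a 2500ps clock, a hypothetical 2600ps ring oscillator period and a true arrival time of 2400ps) are assumptions chosen so that the result is easy to check by hand.

```python
def delay_correct(t_m, jumps, d_u, d_v, d_w):
    """Apply Equation (5): t0(m) = T_M(m) + T_A(m), with T_A(0) = 0.

    t_m   : calibrated measurements T_M(m) in ps, one per snap shot
    jumps : jump type ('U', 'V' or 'W') between snap shots m-1 and m
    """
    step = {"U": d_u, "V": d_v, "W": d_w}
    t_a = 0.0
    t0 = [t_m[0]]
    for m in range(1, len(t_m)):
        t_a += step[jumps[m - 1]]   # accumulate instead of multiplying
        t0.append(t_m[m] + t_a)
    return t0


# Hypothetical periods: t_C = 2500 ps, t_P = 2600 ps.
T_C, T_P = 2500.0, 2600.0
d_u, d_v, d_w = T_C, T_C - T_P, T_C - 2 * T_P

# Synthetic measurements for an arrival time of 2400 ps with edge numbers
# n = 0, 1, 2, 2, 3, i.e. T_M(m) = 2400 + T_P * n - T_C * m.
t_m = [2400.0, 2500.0, 2600.0, 100.0, 200.0]
jumps = ["V", "V", "U", "V"]
t0 = delay_correct(t_m, jumps, d_u, d_v, d_w)
print(t0)
```

All five delay-corrected values come out equal to the arrival time, which is the constancy the correction is meant to achieve.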

The input values  $D_{\rm U}$ ,  $D_{\rm V}$  and  $D_{\rm W}$  for the accumulation are calculated by averaging over many measurement points. We have used a simple IIR low pass filter with an exponential impulse response function as our average calculator:

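$$D_{U}(i) = D_{U}(i-1) + 2^{-L}([T_{M}(i-1) - T_{M}(i)] - D_{U}(i-1)) \quad \{U\}$$

$$D_{V}(i) = D_{V}(i-1) + 2^{-M}([T_{M}(i-4) - T_{M}(i)]/4 - D_{V}(i-1)) \quad \{3V\}$$

$$D_{W} = 2D_{V} - D_{U} \tag{6}$$

The differences are taken as previous minus current measurement so that, with Equation (5), $t_0(m)$ stays constant across a jump: a U-type jump lowers $T_{M}$ by $t_{C}$ and therefore gives $D_{U} = t_{C}$.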

The average calculator given here needs no initialization and no end-of-accumulation sequence as a regular average calculator does. After a sufficiently long time, the output of the average calculator converges to the average and becomes available at all times. A parameter, L or M above, allows the users to choose an appropriate trade-off between the speed of convergence and the precision of the average. The average calculator for  $D_{\rm U}$  is enabled whenever a U-type jump is seen. The  $D_V$  average calculator is enabled when the previous 3 jumps are all V-type, so that the input is 4 times the V-type jump value, which improves computation precision. The value  $D_{\mathrm{W}}$  is derived from  $D_{\mathrm{U}}$  and  $D_{\rm V}$  since the W-type jump will not happen if the ring oscillator frequency is slower than the clock frequency, while the U- and V-type jumps always exist no matter whether the oscillator frequency is faster or slower than the clock frequency.
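The behavior of such an exponential average calculator can be sketched in floating point as below. The firmware would use fixed-point shifts instead; the function name, the shift value of 4 and the constant sample values are assumptions for illustration only.

```python
def iir_average(prev_avg: float, sample: float, shift: int) -> float:
    """One update of avg += 2**(-shift) * (sample - avg)."""
    return prev_avg + (sample - prev_avg) / (1 << shift)


# Hypothetical noise-free jump values: D_U = 2500 ps and D_V = -100 ps.
d_u_avg = 0.0
d_v_avg = 0.0
for _ in range(200):
    d_u_avg = iir_average(d_u_avg, 2500.0, shift=4)   # enabled on U-type jumps
    d_v_avg = iir_average(d_v_avg, -100.0, shift=4)   # enabled after V-type runs

d_w_avg = 2 * d_v_avg - d_u_avg   # W-type value derived, not measured
print(f"D_U={d_u_avg:.3f}  D_V={d_v_avg:.3f}  D_W={d_w_avg:.3f}")
```

A larger shift averages over more samples and gives a more precise result, at the cost of slower convergence after power-up or after a drift in temperature or supply voltage.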

After the delay correction process described above, the delay corrected measurements  $t_0(m)$  should have a constant value for a given hit. An example of several hits with various input arrival times is shown in Fig. 10.



Fig. 10. The delay corrected measurement values in wave union TDC B

An improvement of the overall measurement resolution can be anticipated by averaging of these 16 arrival times:

$$t_{0av} = \frac{1}{16} \sum_{m=0}^{15} t_0(m) \tag{7}$$

An obvious drawback of the wave union TDC B is its longer dead time, since it needs 16 clock cycles to collect all data points.

# V. TEST OF WAVE UNION TDC B

A test module, the FAST\_TDC module has been built for the Test Beam/Fast Timing project at Fermi National Accelerator Laboratory. The photograph of the module is shown in Fig. 11.

The FAST\_TDC module is a 6U VMEbus board with two types of analog input stage schemes that accept four input channels each. One scheme that has been tested is a single-ended analog pulse to differential logic signal converter using a differential amplifier (AD8351, Analog Devices Inc.). Differential signals from the input stage are received by an Altera Cyclone II FPGA device (EP2C8T144C6), in which 8 blocks of wave union TDC B as described in the previous section are implemented. The digitized time data are sent via differential serial links to another FPGA (EP2C5Q208C7) for DAQ interface functions. The DAQ FPGA interfaces with a synchronous dynamic random access memory (SDRAM) device and stores the hit time data in it. The DAQ FPGA also interfaces to the Ethernet circuits so that data stored and processed in it can be output via Ethernet to a personal computer. An RS232 serial port is also implemented in the DAQ FPGA for operational control and monitoring purposes. An A24/D32 interface can also be implemented for DAQ systems using VMEbus.



Fig. 11. The FAST\_TDC module

The lower four channels of the BNC connectors on the module are configured to accept NIM inputs and to convert them to differential logic signals. A NIM pulse generator is used to generate pulses at about 10 kHz repetition frequency, and two copies of the pulses are created via a LeCroy 429A logic fan-in/fan-out module. The time differences of the two input pulses are measured with the TDC module.

We have used BNC adapters shown in the photograph of the module to change relative delays. The inner conductor length of the BNC adapters is about 32mm and the propagation delay added by each adapter is approximately 140ps.

Histograms are generated in the DAQ FPGA and output as a BMP picture file via Ethernet which can be received on PC by a web browser such as Microsoft Internet Explorer.

Each NIM input is measured by a group of four TDC blocks and an average time of the four measurements is calculated. The difference of the two average times is used to book the histogram.

For each input pulse, an absolute time relative to the TDC clock derived from an on board 50 MHz crystal oscillator is measured. The absolute time of a single channel should have all possible values since there is no correlation between the input signal and the 50MHz crystal. However, the time difference of two NIM inputs should have correlation as shown in Fig. 12.



Fig. 12. The histogram of time differences

The full scale in Fig. 12 is 1/4 of the 400MHz clock period, i.e., 2500 ps/4 = 625 ps. The histogram has 256 bins and the width of each bin is 2.44 ps (=625ps/256).

The left-most peak (which partially rolls over to the right edge) corresponds to the time difference of two input NIM signals without any BNC adapters. The second peak from the left corresponds to one BNC adapter inserted into one of the NIM signals, and the third peak corresponds to two BNC adapters inserted, as shown in the photograph of the module. As expected, the separation between peaks is about 140ps.

Each peak contains 16K events and the RMS resolution of each peak is about 10ps. The peak is close to but not exactly a Gaussian shape because there are only limited small error sources. The tail of the distribution won't extend to infinity, which is a non-Gaussian feature in a good way.

# VI. SUMMARY AND DISCUSSIONS

The use of multiple measurements to subdivide ultra-wide bins and to improve resolution in carry chain delay based FPGA TDCs has been studied. Parameters of several TDC schemes are compared in Table I.

TABLE I
PARAMETERS OF SEVERAL TDC SCHEMES

Device: EP2C8T144C6, Price: \$28 (April 2008), Operating Frequency: 400MHz, Total Logic Elements: 8256

| Scheme            | Max bin width | Av bin width | ΔT RMS error | Dead Time | Delay Chain Length | Logic Element Usage |
|-------------------|---------------|--------------|--------------|-----------|--------------------|---------------------|
| Un-calibrated TDC | 165ps         | 60ps         | 58ps         | 2.5ns     | 64                 | 1621 (20%)          |
| Plain TDC         | 165ps         | 60ps         | 40ps         | 2.5ns     |                    |                     |
| Wave Union TDC A  | 65ps          | 30ps         | 25ps         | 5ns       |                    |                     |
| Wave Union TDC B  |               |              | 10ps         | 45ns      |                    | 6851 (83%), 8 CH    |

It is not easy in FPGA to use analog method to compensate the delay variation due to change of temperature and power supply voltage. So a digital "after-the-fact" calibration approach becomes a natural choice. With a handy calibration scheme, the wave union TDC A design is done by concentrating on subdividing the ultra-wide bins, without worrying about subsequent complexity of DNL.

It should be possible to bring benefits of the digital "after-the-fact" calibration approach to the ASIC TDC design practice. In ASIC TDC design, if the digital calibration approach is taken, delay cells can operate at full speed with rail-to-rail power supply instead of using control transistors to slow them down for delay adjustment. Usually, non-inverting delay cells which have two inverter delays are chosen to avoid the even-odd bin DNL. If the DNL can be calibrated after digitization, simple inverters with shortest propagation delay can be used as delay cells.

The scheme of wave union launcher can also be carried to ASIC TDC. Interleaving in TDC usually is done by using two delay chain/register array structures. Using wave union launcher, it is possible to make multiple measurements in the same structure to save silicon area and power consumption resources.

# VII. CONCLUSION

Several improvements for carry chain delay based FPGA TDCs have been studied. A measurement resolution of 10 ps between two NIM signals has been achieved, which permits FPGA TDCs to be used in areas such as time-of-flight measurement.

Cross transplanting between FPGA and ASIC TDC has been beneficial to both areas. The delay chain architecture was first explored in ASIC TDC and now used in FPGA TDC. On the other hand, digital calibration approach and wave union launcher scheme studied in this work could also help future ASIC TDC developments.

# ACKNOWLEDGEMENT

The authors wish to thank Mike Albrow, Erik Ramberg, Anatoly Ronzhin, Robert DeMaat, Sten Hansen, Rajendran Raja and Holger Meyer of Fermilab, Fukun Tang, Henry Frisch, Jean-Francois Genat and Chien-Min Kao of the University of Chicago, and Qi An of the University of Science and Technology of China for their helpful input over the years.

# REFERENCES

- [1] A. Amiri, A. Khouas & M. Boukadoum, "On the Timing Uncertainty in Delay-Line-based Time Measurement Applications Targeting FPGAs," in *Circuits and Systems, 2007, IEEE International Symposium on*, 27-30 May 2007, Page(s): 3772 - 3775.
- [2] J. Song, Q. An & S. Liu, "A high-resolution time-to-digital converter implemented in field-programmable-gate-arrays," in *IEEE Transactions on Nuclear Science*, 2005, Page(s): 236 - 241, vol. 53.
- [3] M. Lin, G. Tsai, C. Liu, S. Chu, "FPGA-Based High Area Efficient Time-To-Digital IP Design," in *TENCON 2006. 2006 IEEE Region 10 Conference*, Nov. 2006, Page(s): 1 - 4.
- [4] J. Wu, Z. Shi & I. Y. Wang, "Firmware-only implementation of time-to-digital converter (TDC) in field programmable gate array (FPGA)," in *Nuclear Science Symposium Conference Record, 2003 IEEE*, 19-25 Oct. 2003, Page(s): 177 - 181, vol. 1.
- [5] S. S. Junnarkar, et al., "An FPGA-based, 12-channel TDC and digital signal processing module for the RatCAP scanner," in *Nuclear Science Symposium Conference Record, 2005 IEEE*, Volume 2, 23-29 Oct. 2005, Page(s): 919 - 923.
- [6] M. D. Fries & J. J. Williams, "High-precision TDC in an FPGA using a 192 MHz quadrature clock," in *Nuclear Science Symposium Conference Record, 2002 IEEE*, 10-16 Nov. 2002, Page(s): 580 - 584, vol. 1.
- [7] Altera Corporation, "Cyclone II Device Handbook", (2007) available via: {http://www.altera.com/}
- [8] R. Pelka, J. Kalisz & R. Szplet, "Nonlinearity correction of the integrated time-to-digital converter with direct coding," in *IEEE Transactions on Instrumentation and Measurement*, Volume 46, Apr. 1997, Page(s): 449 - 453.