# Flexible digital front-end for high resolution PET scanner

Pedro Guerra, Jose Luis Rubio, Juan Enrique Ortuño, Georgios Kontaxakis, Maria Jesus Ledesma and Andres Santos

Abstract—A new high-resolution low-cost animal positron emission tomograph (PET) is under development. Recent advances in the flexibility and size of modern FPGAs allow the analog gamma-ray detection electronics to be replaced by their digital counterpart, enabling a new framework where optimal approaches to $\gamma$-event detection are possible. In particular, an optical optimum filter is implemented for time stamping, and timing resolution is estimated based on an accurate model of the detection and processing electronics. We have assembled a prototype of a flexible PET front-end that includes data acquisition, data streaming and slow control. This prototype has been used to demonstrate overall system functionality and to estimate the streaming capabilities of the front-end microcontroller.

*Index Terms*— High Resolution Positron emission tomography, Digital front-end electronics, Real Time Signal Processing Architectures.

## I. INTRODUCTION

**P**ositron emission tomography (PET) is an imaging modality whose main applications are in the fields of oncology, cardiology and neurology. In contrast with other well-known modalities, such as conventional Magnetic Resonance (MR) or X-ray computed tomography (CT), the generated image is connected to the function of the tissue cells and not to their structure. This has been an important breakthrough in oncology, where PET has proved to be a powerful tool for the detection of tumors at early stages, when their only manifestation is in the form of biochemical disorders.

The underlying technology combines high-energy physics with advanced signal processing techniques in order to generate images of clinical use. Patients are injected with a drug labeled with a short-lived radioactive isotope, such as fluorine or oxygen, where the drug determines the actual chemical path that will be visualized and the radioisotope provides the means to image such paths. In the case of PET, the isotope decay produces a positron, an antimatter particle, which annihilates with an electron and generates two high-energy gamma rays. These emitted quanta propagate in opposite directions due to momentum conservation, thus defining a line of response (LOR). The supporting electronics identify these pairs of gamma rays as single events that impact on opposite detectors, also called gamma cameras, within a very short time window, on the order of a few nanoseconds. The detection of millions of these coincidences from different points of view provides a measurement of line-integral activity, opening the door to the application of tomographic reconstruction algorithms [1].

Traditionally, the acquisition front-end electronics, which are responsible for gamma event detection and characterization, relied on an ASIC to do most of the work [2-4], while an FPGA in combination with a microprocessor assumed mostly control duties. However, in recent years FPGAs have grown in capacity, and the current trend is towards replacing the analog processing of the scintillation pulse by its digital equivalent implemented on an FPGA.

In this work we describe a prototype of the acquisition front-end under development, whose final aim is the design and assembly of a compact, low-cost and flexible detector for small animal imaging. Currently we assume a detector consisting of a single scintillation layer attached to a position sensitive photomultiplier (PS-PMT) with Anger readout, although the interface will be flexible enough to accommodate other configurations.

This paper is structured in four sections. The first section presents the CAD tools that have been used for this work and the material employed for the assembly of the prototype. The second section describes the main components of a generic tomograph and the different elements of the implemented prototype, which comprises ADC acquisition, a processing FPGA, a microcontroller and a remote PC, plus the corresponding SW. The third section deals with time stamp generation, a crucial topic for proper coincidence detection. The paper closes with the conclusions and ongoing work.

#### II. MATERIAL AND METHODS

## A. Software Tools

The pulse detection core, described in VHDL, has been optimized and verified through cosimulation with Modelsim SE (Mentor Graphics, Wilsonville OR, USA) and Simulink 5.0 (The Mathworks, Natick MA, USA) using the software package *XtremeDSP*® from Xilinx. Simulink has been used to provide realistic input stimuli to the VHDL simulator through the detailed modeling of the analog elements of the front-end (crystal layers, PS-PMT, analog electronics and ADCs). Several scenarios have been described with Simulink to evaluate different aspects of the algorithms, and the verification has been fully automated with a sequence of Matlab scripts [5].

Manuscript received April 9, 2005. This work has been supported by the Spanish Education and Science Ministry through the FPU Research Grant program, by the Spanish Thematic Network IM3 (G03/185) and by the project TEC2004-07052-C02-02.

P. Guerra, J.L. Rubio, J.E. Ortuño, G. Kontaxakis, M.J. Ledesma, and A. Santos are with the Universidad Politécnica de Madrid, at the Biomedical Image Technology Lab, ETSI Telecomunicación, Ciudad Universitaria s/n, E-28040 Spain, phone: +34915495700-4242, e-mail: pguerra@die.upm.es

The control microprocessor has been programmed with Dynamic C v.9.0 (Rabbit Semiconductor, Davis CA, USA). This is an extension to standard C that includes constructs for cooperative and preemptive multi-tasking, as well as for protecting writes to variables during power failures, thus facilitating real-time programming on embedded systems.

The front-end device is remotely interfaced via TCP/IP with internal tools developed with Borland Builder 6.0 (Borland Software, Scotts Valley CA, USA).

## B. Hardware

The DSP core processes the data acquired at 40 MHz by an 8-channel free-running ADC from Texas Instruments (Dallas, TX, USA), gating and characterizing single gamma events at a maximum count rate of 2 Mcps. These events are queued and read by the microprocessor through its IO expansion bus [6,7] and ultimately sent to the host computer through a UDP connection.

A very preliminary prototype of the complete system has been assembled using development kits from different vendors, in order to debug the developed SW and validate the HW/SW integration:

- RM3200 from Rabbit Semiconductor, a module based on the popular 8-bit microcontroller RM3000 with a Fast Ethernet interface.
- Digilab 2SB, a module from Digilent Inc. based on the Xilinx Spartan2-200k.
- ADS5120EVM, an evaluation board from TI (Texas Instruments, Dallas TX, USA) for the ADS5120, an 8-channel 40MHz ADC.

#### **III. SYSTEM ARCHITECTURE**

In its simplest form, a PET tomograph consists of two detection heads, one physically opposed to the other, which rotate around the object of study, as shown in Figure 1. In order to discriminate between coincident and non-coincident events, all acquisition subsystems must be synchronized to a common clock and to a common time reference.

The gamma rays emitted by the radioactive source are stopped by a high density material, usually a scintillator crystal, producing a short flash of light. This flash is amplified with a photomultiplier (PMT) or an avalanche photodiode, generating an electrical pulse that is related to the time and the actual crystal of interaction.

In the case of position sensitive photomultipliers (PS-PMT) there are multiple readout anodes available. Each readout anode produces an electric pulse that is related to the light fraction collected by a sector of the PS-PMT cathode, as shown in Figure 2. Although individual acquisition of each anode is feasible, the usual approach is to combine the PS-PMT outputs with a resistive network and use some variant of the Anger equations to extract the interaction position [8, 9].
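As an illustration of the Anger approach, the following minimal sketch assumes the common four-signal variant, in which the resistive network delivers two signals per axis and the position is the normalized difference of opposite signals. The function name `anger_position` and the exact normalization are assumptions made for this example; the variants described in [8, 9] may differ.

```python
def anger_position(x_plus, x_minus, y_plus, y_minus):
    """Return (x, y, energy) from four integrated Anger signals.

    x and y are normalized to the range [-1, 1]; energy is the sum of
    the four signals. The normalization is an assumption of this sketch.
    """
    sum_x = x_plus + x_minus
    sum_y = y_plus + y_minus
    if sum_x <= 0 or sum_y <= 0:
        raise ValueError("non-positive signal sum: no valid event")
    x = (x_plus - x_minus) / sum_x
    y = (y_plus - y_minus) / sum_y
    return x, y, sum_x + sum_y


# Example: an event closer to the X+ side, centered along Y.
position = anger_position(3.0, 1.0, 2.0, 2.0)
```

Because each coordinate is a ratio, the estimated position does not depend on the absolute pulse amplitude, only on how the light is shared between opposite outputs.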

The system architecture described in this section has been selected in order to clarify concepts, although our aim is to design a flexible and scalable front-end that can be adapted to more complex geometries and detection strategies, such as those described in [10, 11].



Fig. 1: Diagram of a simple tomograph with synchronized clocks for coincidence detection and a rotating gantry. A stepper motor provides the rotation of two detection heads. Each head consists of scintillator material, a PS-PMT powered by a high voltage (HV) source, acquisition electronics and the communication interface.



Fig. 2: The scintillation light pulse from a given crystal irradiates an area of the PS-PMT surface, which includes several photocathodes. Each of these collects and amplifies the energy, delivering a current pulse through its anode that is a function of the impact location and the PS-PMT properties.



Fig. 3: Overall subsystem architecture. Data acquired by the ADC is processed by the FPGA producing results that are transferred to an external computer with the support of the microcontroller.

#### A. Subsystem Architecture

In our implementation, as shown in Figure 3, we assume that the PS-PMT outputs are reduced to 4 Anger signals $(X^+,X^-,Y^+,Y^-)$, which are sampled at 40 MHz. The input data stream is processed on a sample-by-sample basis in order to identify and characterize individual scintillation pulses. Each detected pulse generates a short packet of data that is stored in a queue. Once there are enough events waiting to be transmitted, an interrupt is generated, triggering the execution of the corresponding handler, which reads the data out of the queue and sends it to the host computer.
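The queue-and-interrupt mechanism can be pictured with a small software model. This is only a sketch under stated assumptions: the class `EventQueue`, its `watermark` parameter and the callback standing in for the interrupt handler are hypothetical names, and the real queue lives in FPGA logic rather than in software.

```python
from collections import deque


class EventQueue:
    """FIFO that calls `on_ready` once `watermark` packets are waiting."""

    def __init__(self, watermark, on_ready):
        self.watermark = watermark
        self.on_ready = on_ready      # plays the role of the interrupt handler
        self._fifo = deque()

    def push(self, packet):
        """Store one event packet and raise the 'interrupt' if needed."""
        self._fifo.append(packet)
        if len(self._fifo) >= self.watermark:
            self.on_ready(self)

    def drain(self):
        """Read out and remove every queued packet, oldest first."""
        packets = list(self._fifo)
        self._fifo.clear()
        return packets


batches = []  # stands in for the transmit path to the host computer
queue = EventQueue(watermark=3, on_ready=lambda q: batches.append(q.drain()))
for event_id in range(7):
    queue.push(event_id)
```

Grouping several events per handler invocation amortizes the interrupt overhead, at the price of a small latency for the events that wait below the watermark.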

## B. FPGA DSP module

Fig. 4: Block diagram of the implemented algorithm

The detection algorithms implemented on the FPGA are relatively simple and follow the classical approach described in the literature. The block comprises the following submodules (Figure 4):

- ADC control, which generates the proper signaling to the external ADC, and transforms its output to a standard two's complement representation. It includes logic for self test and debug.
- Pre-processing, which includes polarity correction and base line restoration (BLR). Polarity correction inverts the input pulse in case of negative polarity, and BLR restores the base line. Base line deviations occur when the activity is very high, although their impact is more severe with analog processing, where gated integrators do not fully discharge, than with digital processing, where a reset to 0 is costless.
- Pulse detection, which is based on the comparison of the instantaneous energy to an externally programmed reference. Once the energy crosses this reference a gating signal of programmable length is generated together with other synchronization signals.
- Delay line Z<sup>-n</sup>, needed to align the gating window with the incoming signals.
- Integration block, which computes the numerator and denominator of the Anger expressions based on the gating signal.
- Timing block, which computes an accurate time stamp for the input pulse, with an estimated resolution of a few nanoseconds, where the actual value depends on the scintillation crystal used in the detection.
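The interplay between pulse detection, the delay line and the integration block can be sketched in software as follows. This is a behavioral illustration only, not the VHDL implementation: the function name `detect_events`, its parameters and the choice of the sum of the four signals as instantaneous energy are assumptions made for the example, and the input is taken to be already polarity- and baseline-corrected.

```python
from collections import deque


def detect_events(samples, threshold, gate_len, delay):
    """Gate and integrate pulses from (x+, x-, y+, y-) sample tuples.

    threshold : energy reference whose upward crossing opens the gate
    gate_len  : gate length in samples
    delay     : depth of the delay line (Z^-n) in samples
    Returns one dict per event with the Anger numerators and denominators.
    """
    zero = (0, 0, 0, 0)
    line = deque([zero] * delay, maxlen=delay + 1)  # delay line
    events = []
    remaining = 0            # samples left in the current gate
    acc = [0, 0, 0, 0]       # per-channel integrators
    start = None
    prev_energy = 0
    for n, sample in enumerate(samples):
        line.append(sample)
        delayed = line[0]    # sample from `delay` clock ticks ago
        energy = sum(sample)
        crossing = prev_energy <= threshold < energy
        prev_energy = energy
        if remaining == 0 and crossing:
            remaining = gate_len
            acc = [0, 0, 0, 0]   # digital integrators reset at no cost
            start = n
        if remaining > 0:
            for ch in range(4):
                acc[ch] += delayed[ch]
            remaining -= 1
            if remaining == 0:
                xp, xm, yp, ym = acc
                events.append({
                    "trigger": start,
                    "num_x": xp - xm, "den_x": xp + xm,
                    "num_y": yp - ym, "den_y": yp + ym,
                })
    return events


zero = (0, 0, 0, 0)
pulse = [(1, 0, 1, 0), (6, 2, 4, 4), (3, 1, 2, 2), (1, 0, 0, 1)]
events = detect_events([zero] * 4 + pulse + [zero] * 6,
                       threshold=10, gate_len=4, delay=1)
```

In this example the threshold is crossed on the second sample of the pulse, and the one-sample delay lets the gate still capture the first sample, which is the purpose of the delay line.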

These processing blocks, described in VHDL, have been synthesized into a 200k Xilinx Spartan2e FPGA; the final floorplan is shown in Figure 5. According to the timing reports, the implemented design may run at up to 55 MHz; however, design functionality has been tested with a 40 MHz clock rate. As shown in Figure 5, the *Timing Module*, whose functionality is described in detail in Section IV, is the largest block of the design and consists of a FIR filter, a fixed-point divider and some additional control logic.

## C. Embedded SW description

The RM3200 provides seamless communication between the acquisition FPGA and the remote controller. The embedded software consists of three concurrent threads:

- TCP server for control
- UDP client for data streaming
- Error management

The TCP server provides the commands to control and configure the different user registers of the FPGA, as shown in Figure 3. The UDP client, on the other hand, is signaled by the interrupt handler to transfer data from the FPGA queue to the computer's disk. The error thread is used for error reporting.



Fig. 5: Floorplan of the implemented design.

Hardware and software functionality, as well as their integration, have been thoroughly tested. Close attention has been paid to the microcontroller behavior under very high detection rates, in order to guarantee the responsiveness of the control thread when the processor utilization is very high. Critical routines have been optimized in assembly code.

The peak streaming bandwidth from the board to the PC has been measured at around 2 Mbps, or 12.5 kcps, which is compatible with the results published for a similar setup [12].
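The relation between the two figures can be checked with a short calculation. The sketch below assumes that the whole measured bandwidth carries event payload (protocol overhead is ignored), which implies an event packet of about 20 bytes; this size is inferred from the two reported figures and is not stated elsewhere in this work. The variable names are illustrative.

```python
link_rate_bps = 2e6        # measured peak streaming bandwidth
event_rate_cps = 12.5e3    # equivalent event rate reported with it

# Payload carried per event if all the bandwidth is event data (assumption).
bits_per_event = link_rate_bps / event_rate_cps
bytes_per_event = bits_per_event / 8

# Bandwidth needed to stream the maximum count rate of the DSP core (2 Mcps).
peak_detector_cps = 2e6
required_mbps = peak_detector_cps * bits_per_event / 1e6

print(bits_per_event, bytes_per_event, required_mbps)
```

Under these assumptions, streaming every event at the maximum detection rate of 2 Mcps would need roughly 320 Mbps, about 160 times the measured microcontroller throughput.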

## D. PC's SW description

The application running on the PC is the counterpart of the RM3200 SW, so that we have two main threads:

- TCP client, which issues control commands
- UDP server, which accepts datagrams from the acquisition module and stores the streaming data on the hard disk.

The control commands include the following:

- Acquisition start/stop
- FPGA register read/write, which supports device configuration
- File download/upload for future extensions of the embedded SW that would provide a downloadable web interface to the user.

## IV. TIMESTAMP GENERATION

Accurate time stamp generation is crucial in order to properly classify detected events. Most PET scanners rely on a mixed-signal time-to-digital converter (TDC) [4,13]. As digitization progresses, the current trend is to replace the external device with digital signal processing, to such an extent that nowadays the generation of the time stamp exclusively by digital means is an active topic of research [14-16].

In order to achieve a time resolution finer than the sampling period, some form of interpolation is required, as well as a criterion that defines the start of the pulse. Criteria based on the absolute energy are simple to implement but suffer from time walk due to signal amplitude. Correlation-based methods, on the other hand, are more robust at the cost of higher complexity.

Our approach has been to implement a digital version of the constant fraction discrimination (CFD) method. This method works on the absolute energy $E$ or a shaped version $\hat{E}$ of it,

$$\hat{E}(t) = h_s(t) \otimes E(t) \tag{1}$$

and consists of finding the zero crossing of the signal that results from subtracting a delayed fraction of the signal from the original one. Since convolution is linear, this is equivalent to finding the zero crossing of the signal filtered with $h'_s(t) = h_s(t) - f \cdot h_s(t - T)$:

$$\begin{aligned} \hat{E}'(t) &= h_s(t) \otimes \left(E(t) - f \cdot E(t - T)\right) \\ &= h'_s(t) \otimes E(t) \end{aligned} \tag{2}$$
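The equivalence of the two forms in (2) follows from the linearity of convolution and can be verified numerically. The following sketch uses arbitrary illustrative values for the energy samples, the shaping filter, the fraction $f$ and the delay $T$; none of them are taken from the implemented design, and the helper names are hypothetical.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def delayed_fraction_difference(x, fraction, delay):
    """Return x(n) - fraction * x(n - delay), zero-padded at the end."""
    out = list(x) + [0.0] * delay
    for n, value in enumerate(x):
        out[n + delay] -= fraction * value
    return out


energy = [0.0, 1.0, 4.0, 3.0, 2.0, 1.0, 0.0]   # illustrative pulse E(n)
h_s = [0.25, 0.5, 0.25]                        # illustrative shaping filter
f, T = 0.5, 2                                  # illustrative fraction and delay

# First line of (2): shape the delayed-fraction difference of the signal.
shaped_difference = convolve(h_s, delayed_fraction_difference(energy, f, T))
# Second line of (2): apply the combined filter h'_s directly to E(n).
h_s_prime = delayed_fraction_difference(h_s, f, T)
direct = convolve(h_s_prime, energy)

max_deviation = max(abs(a - b) for a, b in zip(shaped_difference, direct))
```

In a hardware implementation the second form is attractive because the subtraction is folded into the filter coefficients, so a single FIR filter produces the bipolar signal.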



Fig. 6: Output of the implemented filter for a simulated LSO pulse. The actual zero crossing is estimated via linear interpolation.

In the analog domain, the implementation of CFD, which is usually the main building block of the TDC, imposes practical limitations on the filters that may be considered. In the digital domain, however, it is feasible to implement a filter that is optimum in some sense. This view has been addressed by different authors, but most of them neglect the fact that, due to the detection process, the main source of noise is not Gaussian but Poisson. In this work the method described in [17] has been applied to design a robust timing filter. According to these authors, the optical optimum filter $h$ (3) combines the knowledge of the expected signal $\lambda(n)$, the dark current noise $\lambda_o$ and the thermal noise $N_0/2$ as follows:

$$h = \frac{\lambda(n)}{\lambda(n) + \lambda_o + \frac{N_0}{2}} \tag{3}$$
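To make the behavior of (3) concrete, the sketch below evaluates the filter for a hypothetical expected signal. The single-exponential pulse shape, its amplitude and the 1.6-sample decay constant (roughly an LSO-like 40 ns decay sampled every 25 ns) are assumptions made for illustration, as are the noise levels and the function names.

```python
import math


def optical_filter(expected, dark_current, thermal):
    """Evaluate h(n) = lambda(n) / (lambda(n) + lambda_o + N0/2).

    expected     : samples of the expected signal lambda(n)
    dark_current : lambda_o
    thermal      : N0/2
    """
    return [lam / (lam + dark_current + thermal) for lam in expected]


def exponential_pulse(n_samples, decay_samples, amplitude):
    """Hypothetical expected signal: a single-exponential decay."""
    return [amplitude * math.exp(-n / decay_samples) for n in range(n_samples)]


expected = exponential_pulse(n_samples=8, decay_samples=1.6, amplitude=100.0)

# Low noise: the coefficients approach 1, i.e. plain sample integration.
h_low_noise = optical_filter(expected, dark_current=1e-6, thermal=1e-6)

# High noise: the coefficients become proportional to the expected signal,
# which is the shape of a classical matched filter.
h_high_noise = optical_filter(expected, dark_current=5e5, thermal=5e5)
shape_ratio = (h_high_noise[0] / h_high_noise[1]) / (expected[0] / expected[1])
```

The two limiting cases show how the filter moves between uniform weighting and signal-shaped weighting as the noise terms grow relative to the expected signal.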

As time stamp estimator, the value  $\hat{\tau}$  that maximizes the convolved signal is taken,

$$\widehat{\tau} = \arg\max_{\tau} \{h * x\} \tag{4}$$

Since the maximum of $h * x$ coincides with a zero of its derivative $h' * x$, this can be rewritten as interpolating the zero crossing of the filtered signal (5), as shown in Figure 6.

$$\widehat{\tau} = \arg_{\tau} \{ (h' * x)(\tau) = 0 \} \tag{5}$$

In the timing module, the zero crossing detection triggers the interpolation that computes the fractional part of the time stamp. If a zero crossing is not found within a certain window relative to the pulse detection trigger, an error is generated to indicate that the input pulse is probably corrupted due to pulse pile-up.
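A minimal software sketch of this step is given below. It assumes a positive-to-negative zero crossing, linear interpolation between the two samples that bracket it, and a 25 ns sample period (40 MHz sampling); the function name `fine_timestamp`, the window parameter and the use of `None` as the error indication are hypothetical choices for this example.

```python
def fine_timestamp(filtered, trigger, window, sample_period_ns=25.0):
    """Interpolate the zero crossing of the filtered signal.

    filtered : samples at the output of the timing filter
    trigger  : sample index of the pulse detection trigger
    window   : number of samples after the trigger to search
    Returns the time stamp in ns, or None if no crossing is found
    (the pulse is then probably corrupted by pile-up).
    """
    stop = min(trigger + window, len(filtered) - 1)
    for n in range(trigger, stop):
        a, b = filtered[n], filtered[n + 1]
        if a > 0 >= b:                     # crossing between n and n + 1
            fraction = a / (a - b)         # linear interpolation
            return (n + fraction) * sample_period_ns
    return None


filtered = [0, 2, 5, 3, -1, -4, -2, 0]
stamp = fine_timestamp(filtered, trigger=1, window=5)
missed = fine_timestamp(filtered, trigger=1, window=2)
```

The integer part of the result comes from the sample counter and the fractional part from a single division, which is consistent with the fixed-point divider being part of the timing module.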

We have used the front-end models and simulation platform described in [5] to compare the properties of the time stamp estimated with the proposed filter and with the one described in [18]. As shown in Figure 7, the proposed method has much higher linearity, a fact that enables direct comparison of time stamps obtained by different detection modules. The alternative method, on the other hand, has some limitations that are already pointed out by its authors in the original document [18].



Fig. 7: Comparison of the proposed filter estimations vs. the estimator used in the ClearPET. A known delay is introduced into the signal and the estimator value is computed for real LSO signals.

This higher linearity of the timing translates, as shown in Figure 8, into a lower estimation error, which enables reducing the coincidence time window to 5 ns in an LSO detector. Such a reduction has a positive impact on random coincidence rejection and overall image quality.
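The role of the coincidence window can be illustrated with a simple sorter that pairs the time stamps of two opposite heads. The sketch assumes that both lists are sorted, that time stamps are expressed in nanoseconds on a common time reference, and that the window is the maximum allowed absolute difference between the two stamps; the function name and the pairing policy are assumptions, not the sorter used in this work.

```python
def find_coincidences(head_a, head_b, window_ns=5.0):
    """Pair sorted time stamps (ns) that differ by at most window_ns."""
    pairs = []
    i = j = 0
    while i < len(head_a) and j < len(head_b):
        dt = head_a[i] - head_b[j]
        if abs(dt) <= window_ns:
            pairs.append((head_a[i], head_b[j]))   # accepted coincidence
            i += 1
            j += 1
        elif dt < 0:
            i += 1      # event on head A has no partner: single
        else:
            j += 1      # event on head B has no partner: single
    return pairs


head_a = [100.0, 250.0, 400.0, 900.0]
head_b = [103.0, 260.0, 396.5, 1200.0]
pairs = find_coincidences(head_a, head_b)
```

A narrower window accepts fewer unrelated singles that happen to fall close in time, which is why a better timing resolution reduces the random coincidence rate.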



Fig. 8: LSO single timing resolution for the implemented HW with simulated LSO pulses as input. Estimations are unbiased and limited to $\pm 2$ ns. This enables reducing the coincidence time window to 5 ns.

Moreover, equation 3 shows that the optimum filter is a ratio where both numerator and denominator depend on the shape of the scintillation pulse. From this observation we can conclude that sample integration is a valid energy estimator when noise is low, and we may expect the filter to be robust to pulse shape variations. This last statement, which requires further study, would allow us to use a single filter even when more than one pulse shape is expected, as is the case with *phoswich* detectors.

#### V. CONCLUSIONS AND FUTURE WORK

A prototype of the digital front-end electronics of a PET detector has been designed, assembled and evaluated. Current state-of-the-art detection algorithms have been implemented on the FPGA in order to detect and characterize pulses generated by scintillation crystals coupled to a PS-PMT. We have particularly addressed the problem of generating reliable time stamps with a resolution finer than the sampling period, eliminating the need for an external mixed-signal device. The complete substitution of external devices and analog processing by their digital equivalent in the FPGA increases flexibility and enables module reuse in different detector configurations.

The implemented detection algorithms have been evaluated and verified using a battery of test benches that analyze the VHDL response to realistic synthetic input stimuli obtained through detailed modeling of the acquisition chain.

Every pulse produces a data packet that is queued within the FPGA and read by an external microprocessor that handles the streaming to the host computer. The streaming capabilities of this processor have been measured and, as expected, the functionality is correct but the current solution is not able to handle the huge amount of data generated by the detector in operating conditions. Therefore we plan to extend the detection module with a dedicated path for data streaming. This communication channel will be independent of the slow control and will be based on a HW implementation of the TCP/IP protocol.

We have also started the design of a new platform that integrates the controller inside the FPGA, a solution that increases design flexibility and communication bandwidth. A real-time operating system will run on top of the FPGA resources, providing an additional level of flexibility.

## ACKNOWLEDGMENTS

The authors want to thank Xilinx Inc. for the generous donations of software and hardware through the Xilinx University Program. The authors wish to thank M. Desco and J.J. Vaquero from the Hospital General Universitario Gregorio Marañon for their fruitful comments and motivating discussions.

#### REFERENCES

- [1] J. M. Ollinger and J. A. Fessler, "Positron-emission tomography", *IEEE Signal Processing Magazine*, vol. 14, pp. 43-55, 1997.
- [2] D. F. Newport and J. W. Young, "An ASIC implementation of digital front-end electronics for a high resolution PET scanner", *IEEE Transactions on Nuclear Science*, vol. 40, pp. 1017-1019, 1993.
- [3] D. M. Binkley, et al., "A Custom CMOS Integrated Circuit For PET Tomograph Front-end Applications" at IEEE Nuclear Science Symposium, pp. 867-871, 1993.
- [4] J. W. Young, J. C. Moyers, and M. Lenox, "FPGA based front-end electronics for a high resolution PET scanner" at *IEEE Nuclear Science Symposium*, vol. 2, pp. 902-906, 1999.
- [5] P. Guerra, et al., "Modeling the acquisition front-end in high resolution gamma-ray imaging" at *IEEE Nuclear Science Symposium* [In CDROM], Rome, 2004.
- [6] Rabbit Semiconductor, RM3000 Microprocessor User's Manual. USA, 2002.
- [7] Rabbit Semiconductor, "TN227:Interfacing External I/O with Rabbit 2000/3000 Designs" Rabbit Semiconductor.
- [8] R. Engels, U. Clemens, G. Kemmerling, and J. Schelten, "High spatial resolution scintillation detector based on the H8500 photomultiplier" at *IEEE Nuclear Science Symposium*, vol. 1, pp. 692-695, Portland, 2003.
- [9] V. Popov, S. Majewski, and A. G. Weisenberger, "Readout electronics for multianode photomultiplier tubes with pad matrix anode layout" at *IEEE Nuclear Science Symposium*, vol. 3, pp. 2156-2159, Portland, 2003.
- [10] J. E. Ortuño, J. J. Vaquero, G. Kontaxakis, M. Desco, and A. Santos, "Preliminary Studies on the Design and Simulation of High Resolution Small Animal PET Scanners with Octagonal Geometry" at *IEEE Nuclear Science Symposium*, pp. 2053-2057, Portland, 2003.
- [11] J. F. Butler, et al., "CdZnTe Detector Arrays For Nuclear Medicine Imaging" at IEEE Nuclear Science Symposium, pp. 565-568, 1993.
- [12] J. Imrek, et al., "Development of an FPGA-Based Data Acquisition Module for Small Animal PET" at *IEEE Nuclear Science Symposium* [In CDROM], Rome, 2004.
- [13] Acam-Messelectronic GmbH, "TDC-GP1:Functional description" Stutensee-Blankenloch, 2001.
- [14] A. Bousselham, C. Robson, P. E. Ojala, and C. Bohm, "A Flexible Data Acquisition Module for a High resolution PET Camera" at *IEEE-NPSS Real Time Conference* [In CDROM], Stockholm, 2005.
- [15] R. Fontaine, et al., "Real Time Digital Signal Processing Implementation for APD-Based PET Scanner with Phoswich Detectors" at IEEE-NPSS Real Time Conference [In CDROM], Stockholm, 2005.
- [16] D. Novak, et al., "Ethernet Based Distributed Data Acquisition System for a Small Animal PET" at IEEE-NPSS Real Time Conference [In CDROM], Stockholm, 2005.
- [17] E. Geraniotis and H. Poor, "Robust Matched Filters for Optical Receivers", *IEEE Transactions on Communications*, vol. 35, pp. 1289-1296, 1987.
- [18] M. Streun, et al., "Coincidence detection by digital processing of freerunning sampled pulses", Nuclear Instruments and Methods in Physics Research Section A, vol. 487, pp. 530-534, 2002.