


From the CTN Special Issue of the American Journal of Drug and Alcohol Abuse.




Zero-Inflated and Hurdle Models of Count Data with Extra Zeros: Examples from an HIV-Risk Reduction Intervention Trial.

American Journal of Drug and Alcohol Abuse 2011;37(5):367-375. [doi: 10.3109/00952990.2011.597280]

Mei-Chen Hu, PhD, Martina Pavlicova, PhD, Edward V. Nunes, MD (all from Columbia University, GNY Node).

In clinical trials of behavioral health interventions, outcome variables often take the form of counts, such as days using substances or episodes of unprotected sex. Classically, count data are assumed to follow a Poisson distribution; in practice, however, such data often display greater heterogeneity in the form of excess zeros (zero-inflation), greater spread in the values (overdispersion), or both. Greater sample heterogeneity may be especially common in community-based effectiveness trials, where broad eligibility criteria are used to achieve a generalizable sample. This article reviews the characteristics of the Poisson model and of the related models developed to handle overdispersion (the negative binomial (NB) model), zero-inflation (the zero-inflated Poisson (ZIP) and Poisson hurdle (PH) models), or both (the zero-inflated negative binomial (ZINB) and negative binomial hurdle (NBH) models). All six models were used to model the effect of an HIV-risk reduction intervention on the count of unprotected sexual occasions (USOs), using data from a previously completed clinical trial among female patients (N = 515) participating in community-based substance abuse treatment (National Drug Abuse Treatment Clinical Trials Network protocol CTN-0015). Goodness of fit and the estimates of treatment effect derived from each model were compared. The ZINB model provided the best fit, yielding a medium-sized intervention effect.
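
For readers less familiar with these models, the two ways of handling extra zeros can be written out for the Poisson case. This is a sketch in standard textbook notation, not taken from the article: the symbols \lambda (Poisson mean), \pi (probability of a structural zero), p_0 (probability of a zero in the hurdle model), \mu and \alpha (negative binomial mean and dispersion) are assumptions introduced here for illustration.

```latex
% Zero-inflated Poisson (ZIP): zeros arise from two sources,
% a structural-zero class (probability \pi) and the Poisson process itself.
P(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
P(Y = y) = (1 - \pi)\, \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 1, 2, \dots

% Poisson hurdle (PH): all zeros come from a single binary process;
% positive counts follow a Poisson distribution truncated at zero.
P(Y = 0) = p_0, \qquad
P(Y = y) = (1 - p_0)\, \frac{e^{-\lambda} \lambda^{y} / y!}{1 - e^{-\lambda}}, \quad y = 1, 2, \dots

% Overdispersion: the Poisson model forces Var(Y) = E(Y) = \lambda, whereas
% a common negative binomial parameterization (NB2) allows
\operatorname{Var}(Y) = \mu + \alpha \mu^{2}, \qquad \alpha > 0 .
```

Replacing the Poisson kernel with a negative binomial one in either form gives the ZINB and NBH models, which accommodate excess zeros and overdispersion together.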

Conclusions: This article illustrates the consequences of applying models with different distributional assumptions to the same data. Taken together, the results underline the importance of finding, for any given data set, the most appropriate model for the outcome data in order to arrive at the most accurate estimate of the effect of a treatment intervention. If the model used does not closely fit the shape of the data distribution, the estimate of the intervention effect may be biased, either over- or underestimating it. Investigators designing clinical trials should be encouraged to hypothesize in advance the distribution of the outcome counts, based on their knowledge of the population and the intervention being tested as well as on prior data. (Article (Peer-Reviewed), PDF, English, 2011)
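
The point about model fit can be illustrated with a small simulation. The sketch below is not the article's analysis and does not use trial data: it generates synthetic zero-inflated counts, fits an intercept-only Poisson model and an intercept-only ZIP model, and compares them by AIC. The sample size, the parameter values, the EM fitting routine, and all function names are assumptions chosen for illustration; a real analysis would include covariates and use a statistical package.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_zip(n, pi, lam, seed=0):
    """Synthetic ZIP data: structural zero with probability pi, else Poisson(lam)."""
    rng = random.Random(seed)
    return [0 if rng.random() < pi else sample_poisson(lam, rng) for _ in range(n)]


def poisson_loglik(y, lam):
    """Log-likelihood of an intercept-only Poisson model."""
    return sum(-lam + k * math.log(lam) - math.lgamma(k + 1) for k in y)


def zip_loglik(y, pi, lam):
    """Log-likelihood of an intercept-only zero-inflated Poisson model."""
    total = 0.0
    for k in y:
        if k == 0:
            total += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            total += math.log(1 - pi) - lam + k * math.log(lam) - math.lgamma(k + 1)
    return total


def fit_zip_em(y, iters=200):
    """Fit (pi, lam) by EM. Assumes y contains at least one positive count."""
    n = len(y)
    n_zero = sum(1 for k in y if k == 0)
    total = sum(y)
    pi, lam = 0.5 * n_zero / n, total / n  # crude starting values
    for _ in range(iters):
        # E-step: expected number of zeros that are structural.
        structural = n_zero * pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing proportion and the Poisson mean.
        pi = structural / n
        lam = total / (n - structural)
    return pi, lam


# Synthetic example (hypothetical values, not the CTN-0015 data).
y = simulate_zip(n=500, pi=0.4, lam=3.0, seed=1)

lam_pois = sum(y) / len(y)        # Poisson MLE is the sample mean
pi_hat, lam_hat = fit_zip_em(y)

aic_pois = 2 * 1 - 2 * poisson_loglik(y, lam_pois)
aic_zip = 2 * 2 - 2 * zip_loglik(y, pi_hat, lam_hat)

print(f"Poisson: mean={lam_pois:.2f}, AIC={aic_pois:.1f}")
print(f"ZIP:     pi={pi_hat:.2f}, lambda={lam_hat:.2f}, AIC={aic_zip:.1f}")
```

On data generated this way, the plain Poisson fit absorbs the excess zeros into a lower mean, while the ZIP fit separates the zero class from the count process and attains a lower AIC. This mirrors, in a simplified setting, how a poorly fitting distribution can distort an estimated effect.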

Keywords: CTN platform/ancillary study | CTN protocol development | HIV/AIDS | Outcomes evaluation | Research design | Sexual risk behavior | Sexually transmitted diseases | Statistical models | Women | American Journal of Drug and Alcohol Abuse (journal)

Document No: 733, PMID: 21854279, PMCID: PMC3238139 (available 9/1/2012).

Submitted by CTN Dissemination Librarians, 8/23/2011.


Hu, Mei-Chen
Nunes, Edward V.
Pavlicova, Martina
NIDA-CTN-0015
Greater New York (Lead)
Florida Node Alliance
Ohio Valley
Pacific Northwest
Southern Consortium

Supported by a grant from the National Institute on Drug Abuse to the University of Washington Alcohol and Drug Abuse Institute.
The materials on this site have neither been created nor reviewed by NIDA.
Updated 8/2011