Part I: Introduction
Vernon, S. and Reeves, W. 2006. The challenge of integrating disparate high-content data: epidemiological, clinical and laboratory data collected during an in-hospital study of chronic fatigue syndrome.
This paper introduces the CDC’s recent effort to produce a model for CFS. That effort resulted in the simultaneous publication of 14 research papers in the journal Pharmacogenomics in April 2006. Almost 20 researchers, most of them working for free, analyzed a large data set gathered during a two-day hospital visit by CFS patients and controls in Wichita, Kansas in 2003.
Vernon and Reeves, the two lead investigators overseeing this effort, begin the paper by placing CFS in the context of other complex diseases such as AIDS, asthma and cancer. Refreshingly, and almost for the first time that I can remember, they also posit a link between the fatigue seen in CFS and that often seen in cancer. They argue that ‘complex’ chronic diseases such as these alter the homeostatic mechanisms of the body. The CDC proposes that these processes are so complex that studying them requires a multi-disciplinary approach.
The CDC’s approach was to take a fairly large set of CFS patients and controls, subject them to a wide variety of tests, including gene expression tests, and then give the data to four groups of investigators and ask them to come up with models that would contribute to the classification, diagnosis and treatment of CFS.
For several years now the CDC has been using a new (and rather arduous) method for finding its study participants called community sampling. Designed to eliminate recruitment bias and provide ‘bullet-proof’ control groups, community sampling involves randomly telephoning large numbers of people in order to build a randomly recruited representative set of study participants. In this case the CDC first screened 56,000 people in Wichita, Kansas for CFS via telephone. Of these they took a core group of 7162 people with fatigue and evaluated them further via telephone and in a clinic over three years. After discarding those with exclusionary factors (another disease, most psychiatric disorders, idiopathic fatigue) they were left with only 70 people who had CFS.
Why is this number so small? In their effort to produce a truly representative study group the CDC appears to have uncovered a set of CFS patients not ordinarily seen in CFS clinics. A high percentage (80%), for instance, had gradual onset. One study found that a high percentage of these CFS patients did not meet the CDC criteria for CFS after three years. It’s possible the CDC has uncovered a population of CFS patients who, while they may or may not be well, are able, at least according to the CDC criteria, to transition out of the disease more quickly than the CFS patients who find their way to the clinics and centers. This is probably not a surprising finding; most chronic diseases produce varying degrees of debilitation, and the most severely affected patients, naturally, are seen more often by physicians. Most of the people who participated in these studies appeared to have had CFS for quite a while; the average CFS duration for one group was 12 years.
Of the 70 people with CFS, 58 (83%) agreed to come to a hospital for a two-day testing period. They were matched with 58 controls of the same sex, age, race and body mass index. The CDC also included 59 people who were fatigued but did not meet the criteria for CFS, 41 people with depression who met the criteria for CFS, and 39 people with depression and fatigue but not CFS. According to this paper, 99 people with CFS (the 58 plus the 41 with depression) participated in these studies.
(Table of study groups: CFS with depression; fatigued but not CFS; fatigued but not CFS, with depression.)
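The recruitment numbers above are easy to lose track of, so here is a minimal sketch that simply recomputes the proportions from the counts quoted in the text. The variable names are my own and purely illustrative; nothing here comes from the CDC's analysis.

```python
# Counts as quoted in the text above; the variable names are illustrative.
screened = 56_000          # people screened by telephone in Wichita
fatigued = 7_162           # core group with fatigue, evaluated further
cfs_eligible = 70          # left with CFS after exclusions
cfs_hospital = 58          # agreed to the two-day hospital stay
cfs_with_depression = 41   # met CFS criteria and also had depression


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


steps = [
    ("fatigued, of those screened", fatigued, screened),
    ("CFS, of those fatigued", cfs_eligible, fatigued),
    ("CFS, of those screened", cfs_eligible, screened),
    ("came to hospital, of those with CFS", cfs_hospital, cfs_eligible),
]
for label, part, whole in steps:
    print(f"{label}: {part} / {whole} = {percent(part, whole):.2f}%")

# The paper's total of 99 CFS participants matches 58 + 41.
total_cfs = cfs_hospital + cfs_with_depression
print("CFS participants in total:", total_cfs)
```

Run as is, this reproduces the 83% participation figure and the total of 99, and shows that the 70 eligible CFS patients amount to one in 800 of the people originally screened.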
When we look at the physical characteristics of the groups we see that the CFS group was quite ‘large’; about 85% of them were overweight, with 43% being obese and 12% morbidly obese. Since obesity, and morbid obesity in particular, is itself a risk factor for many problems, this factor alone could have skewed the results. The CDC controlled for it, however, by using an equally heavy healthy control group (81% overweight). The median age was about 50, about 85% were women and the group was almost entirely white (95%).
How does this stack up against obesity rates in the nation as a whole? The CDC estimates that about 65% of US adults are overweight and about 30% are obese. This puts the overweight and obese rates in CFS at about 35% higher than normal in relative terms, perhaps not a surprising finding given the debility, isolation and stress associated with CFS. The higher median age of the CFS patients probably also contributed to the increase. One of the studies indicates this could partially reflect impaired metabolism in CFS patients. On the other hand, Caucasians have a lower obesity rate than some minorities and this population was almost entirely white.
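The comparison with national rates can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the figures quoted above (85% overweight and 43% obese in the CFS group, 65% and 30% among US adults) and assuming the comparison is meant in relative rather than percentage-point terms; the function name is my own.

```python
def relative_excess(observed_pct, reference_pct):
    """Relative increase of observed over reference, in percent."""
    return 100 * (observed_pct - reference_pct) / reference_pct


# CFS group versus US adults, using the percentages quoted in the text.
overweight_excess = relative_excess(85, 65)
obese_excess = relative_excess(43, 30)

print(f"overweight: {overweight_excess:.1f}% higher than the national rate")
print(f"obese: {obese_excess:.1f}% higher than the national rate")
```

On these inputs the relative excess works out to roughly 31% for overweight and 43% for obesity; in percentage points the gaps are 20 and 13.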
The tests are obviously an extremely important part of the study. It is these data points, after all, that the investigators will use to build their models. Obviously the more data points the better, but there are financial limits; the CDC had to choose which data points they felt would contribute most to our understanding of CFS.
Over two days all participants underwent the following:
Physical tests – temperature, height, weight and body mass index
Laboratory tests – Complete blood count (CBC); blood chemistry (C-reactive protein, ALT, SGPT, albumin, AP, AST, bilirubin, calcium, CO2, chloride, creatinine, glucose, potassium, TP, sodium, BUN); HPA axis – salivary cortisol, androstenedione, SHBG, testosterone, ACTH, DHEA, DHEA-S, T3, reverse T3, T4, TSH, insulin-like growth factor, estradiol and progesterone (women); Cytokines – TNF-a, IL-6, sR-IL-6; Catecholamines – norepinephrine, epinephrine, normetanephrine, neuropeptide Y; Mineralocorticoids – renin, aldosterone
Autonomic Nervous System Status – blood pressure, heart rate – lying down, standing
Medical Outcomes Survey Short Form – measures functional impairment
Multi-dimensional Fatigue Inventory – general, physical, mental fatigue, etc.
CDC Symptom Inventory
Cambridge Neuropsychological Test Automated Battery – cognitive test that measures short-term memory, pattern recognition, reaction time, etc.
Two-night sleep study
Peripheral blood gene expression – 20,000 genes
(Gene polymorphism data were not mentioned in this introduction)
There’s a lot here; the CDC called it ‘an exhaustive list’ of clinical, epidemiological and laboratory data. If one excludes the gene expression results, however, this was not the largest set of laboratory measures ever collected on CFS patients in a single study. A one-week intensive twin study done by the Buchwald team at the University of Washington measured pathogen prevalence and immune factors, and included brain imaging, sophisticated tests of orthostasis and aerobic functioning, and a three-day sleep study. The Buchwald study also led to a slew of papers, but most of the findings were not significant. A follow-up study was done, but the results have not yet been published.
The larger size of this study apparently precluded such an intensive effort. This study was unique in several ways, however. It is the first attempt to integrate gene expression data with laboratory and clinical data to build a statistical model of CFS. That, indeed, must have been exhausting given the enormous amounts of data generated by the gene expression studies. Some of these studies are an order of magnitude more complex than anything attempted before.
The laboratory data covered a wide array of neuroendocrine factors plus other markers believed to reflect the allostatic status of the cardiovascular, immune and other systems in the body. C-reactive protein, for instance, is used in one study to assess the inflammatory status of the body. While the laboratory data on neuroendocrine factors are extensive, the data on the immune and cardiovascular systems are not. The immune data mostly consist of pro-inflammatory cytokines known to interact with the neuroendocrine system. Some systems are not covered at all; there are no measures of oxidative stress, for instance, in these studies.
With the exception of the gene expression data, much of this data is not new. Vernon and Reeves noted that studies in CFS (including the gene expression studies) have generally uncovered only ‘subtle’ perturbations in different systems, in particular the central nervous system, immune system and metabolism. Given this history the CDC did not expect to find anything other than subtle abnormalities in their laboratory data. They hoped, however, that an analysis of this large data set would reveal patterns of disruption that differentiate CFS patients from controls. The CDC, at this point anyway, seems to believe that CFS as we know it today is the result of multiple failures, some perhaps subtle, that combine to create the illness. One of the studies suggests that CFS may occur when certain levels of allostatic load are reached. The CDC’s choice of the neuroendocrine system as its focus indicates that they believe the interactions in this large and complex system are probably central to the disease. This system regulates many of the processes that occur in the body, including those of the immune system.
This is not the first time researchers have attempted to differentiate CFS patients from controls using common laboratory measures such as the complete blood count (CBC). In what was termed a ‘landmark paper’ by the editors of the Journal of Chronic Fatigue Syndrome, Suhadolnik et al. found that some CBC and immune measures, including several involving the RNase L pathway, were able to differentiate CFS patients from controls (Suhadolnik et al. 2004).
Four teams analyzed the data. Their makeup was novel: Team 1 had computer science, physics and statistics experts as well as an immunologist and a psychiatrist; Team 2 had chemical engineering and bioengineering experts as well as an immunologist, a pathologist and a molecular biologist; Team 3 had mathematics and computational chemistry experts as well as a cardiologist and an infectious disease specialist; and Team 4 had mathematics and bioinformatics experts as well as an epidemiologist and a pediatrician.
Each treated the data very differently. Team 1, which produced four papers, attempted to produce a classification of CFS that displayed its heterogeneity and then tried to match that with gene expression and genetic profiles. Team 2, which produced three papers, attempted to find the central factors that differentiated the four subgroups. Team 3, which produced two papers, used the symptom questions to determine the validity of the current CFS classification. Team 4 used the lab, gene and genetic data to differentiate the four different groups according to allostatic load.
A Different Approach
The CDC believes three factors can explain why our understanding of CFS has progressed so slowly over the past 20 years. They cite the standard figure of 3,000 peer-reviewed papers, but this figure is misleading in that only about 20% of those are studies exploring CFS pathophysiology.
First, they believe that patient recruitment from specialty and referral clinics results in ‘recruitment bias’. They believe that not only are different kinds of patients drawn to CFS clinics in general, but that each clinic has its own unique set of patients. This could make comparing results between studies problematic. They went so far as to say that this approach, which has dominated CFS research, ‘precludes (a) critical comparison of results’, i.e. makes it impossible to critically compare results from one research group to the next. Strong words!
Second, they believe the control groups so important to the study process have mostly been flawed; either they are not present or they are ‘controls of convenience’. Getting healthy controls can be difficult. Oftentimes they are drawn from hospital workers or students.
Third, they believe the process of diagnosing CFS (i.e. the definition) is flawed.
Most researchers would surely agree with all three points but might question the significance of the first two. They are standard problems in many research studies. It is possible, however, that they are accentuated in CFS. If CFS is full of subsets then it is possible that certain subsets are drawn towards certain clinics. Patients with depression might be more likely to end up in clinics led by psychologists, etc.
A consensus seems to be gathering that the biggest problem in CFS research is a less than precise definition that allows for the inclusion of subsets which end up obscuring research findings. One of the goals of the CDC’s approach was to ‘capture’ that heterogeneity and thus provide a better classification scheme for CFS. This can only be done with large numbers of CFS patients and large data sets. One wonders if 99 CFS patients, 41 of them with depression, was a large enough number to do so.
The CDC cast a rather wide net; their inclusion of patients with idiopathic fatigue meant they took a look at the causes of fatigue in general. Their inclusion of CFS patients with major depression – a subset of patients often excluded from research studies – further broadened their sample base. One could argue that a larger sample of just CFS patients would have aided them greatly in finding verifiable subsets. Oddly enough, however, the CDC appears to have been handicapped in this regard by their sampling protocol. In the end, if I am reading the data correctly, they only had 70 pure CFS patients that were eligible for this study.
The conclusion is fascinating. It says that the integration of different body system measures and clinical features will allow us to identify the subsets present in CFS and the disturbed physiological pathways at work in them; that it will demonstrate that CFS (and other illnesses) with disabling fatigue can be medically explained. The CDC believes that algorithms (complex formulas) that integrate multi-systemic data will produce an objective diagnostic marker, decipher the pathophysiology and create custom therapies for CFS patients.
Given the future tense used it’s obvious that none of the above were achieved in these studies; i.e. they didn’t identify verifiable subsets, they didn’t medically explain CFS, they didn’t give us a diagnostic marker, etc. Vernon and Reeves, however, believe the results indicate that they are on the right track and that given time they will. The present results were satisfying enough that they are embarking on new studies in Georgia to expand and verify them.
But does the CDC think they are on the right track? There is an odd disclaimer at the end of each Pharmacogenomics paper that I can’t remember seeing on other CDC-authored papers. It was not found on any of Vernon’s gene studies or on her recent infectious mononucleosis study. It states that:
‘The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the funding agency’
Are Vernon and Reeves somewhat out on a limb here? Is the agency not sure of this creative (and expensive) new approach to CFS? Or is it just standard agency boilerplate that the CDC tacks onto the more theoretical papers published by its researchers?
Suhadolnik, R. A., Peterson, D., Reichenbach, N., Roen, G., Metzger, M., McCahan, J., O’Brien, K., Welsch, S., Gabriel, J., Gaughan, J. and McGregor, N. 2004. Clinical and biochemical characteristics differentiating chronic fatigue syndrome from major depression and healthy control populations: relation to dysfunction of the RNase L pathway. Journal of Chronic Fatigue Syndrome 12: 5-35.