The Big Reveal

We gathered and tracked PHQ-9 test results from forty of our clients. Did we improve their mood—or did we just collect a lot of paper?

In our first blog entry on this topic, we talked about why we believe that tracking treatment outcomes numerically with reliable tools such as the PHQ-9 is key to giving a client the best chance of achieving his or her treatment goals. Now you may be thinking: do these people really believe they can derive meaningful information about something as complex as a person’s psyche from a 9-item questionnaire? Yes and no. Yes, because the questions on the PHQ-9 reflect the criteria for diagnosing depression set out in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), a standard reference for psychiatric providers. The questions are designed to accomplish two tasks: indicate whether a person is experiencing depression symptoms and, if so, assess the severity of those symptoms by assigning them a numerical value. The higher the PHQ-9 score, the more frequent and severe the depressive symptoms. A score of 0-4 usually indicates that the person is minimally depressed and may not need treatment. A score of 20-27, at the highest end of the spectrum, indicates severe depression, which may require medication and, if appropriate, psychotherapy.
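For readers who like to see the arithmetic, here is a minimal sketch of how a PHQ-9 total maps to a severity band. The post only names the lowest (0-4) and highest (20-27) bands; the three middle bands below follow the cut-offs in the standard published PHQ-9 scoring guide, and the function names are our own illustrative choices, not part of any official tool.

```python
def phq9_total(item_scores):
    """Sum the nine item responses, each scored 0 to 3."""
    if len(item_scores) != 9:
        raise ValueError("the PHQ-9 has exactly 9 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def phq9_severity(total):
    """Map a total score (0-27) to a severity label.

    The 0-4 and 20-27 bands are the ones named in the post; the middle
    bands are assumed from the standard PHQ-9 scoring guide.
    """
    if not 0 <= total <= 27:
        raise ValueError("a PHQ-9 total must be between 0 and 27")
    if total <= 4:
        return "minimal"
    if total <= 9:
        return "mild"
    if total <= 14:
        return "moderate"
    if total <= 19:
        return "moderately severe"
    return "severe"


# Hypothetical responses, for illustration only.
responses = [1, 1, 0, 1, 0, 0, 0, 0, 0]
total = phq9_total(responses)
print(total, phq9_severity(total))
```

A label like this is only a starting point for the clinical conversation, never a treatment decision in itself.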

So where does ‘no’ come in? If we based our treatment plans on questionnaire data alone, a client with a score of 3 could conceivably be turned away without treatment, and a client with a score of 27 could end up in the nearest psychiatric emergency room. We believe that using data intelligently requires far more than collecting and interpreting a score, no matter how reliable the questionnaire. The difference between assigning diagnoses and providing quality psychiatric care lies in thoroughly assessing the person behind the number, not just the number itself. For example, if a client has a score of 3 and has reported that he has trouble sleeping, has little energy, and is not concentrating well several days a week, he may have a sleep issue, an anxiety issue, or a troubling life transition, all of which benefit from psychiatric care. Equally, his score could mean that he’s having terrible allergies or that his mother-in-law has staged a surprise 10-day visit. No 9-item questionnaire can tease out that kind of detail, nor should it. From a best-practice perspective, the PHQ-9 represents a statistically reliable snapshot of how a client’s mood has fared over the previous two weeks. It’s neither a time machine nor a crystal ball; it can’t tell us if a person is suffering the aftermath of an abusive childhood, struggling with new parenthood, or questioning the value of his chosen career. But it is a way to start a conversation about how that person can feel better, and how we can help that happen.

The graphs below represent 40 randomly selected Pondworks clients aged 18 and older who have completed at least two PHQ-9s. This sample represents a mix of genders, ethnicities, races, religions, ages, professions, and diagnoses. Clients in the sample may be taking a variety of medications, engaging in supportive psychotherapy, or combining the two. To ensure that we were holding our data to appropriate scientific standards, we used vetted methods and formulas that meet the basic standards of statistical validity applied by major research institutions.

Figure 1

Figure 1 tracks changes in client scores between two PHQ-9 tests: one given at the beginning of treatment, and a second given between 60 and 119 days later. The X axis represents the range of scores collected from the 40 clients. The Y axis shows the number of clients, and the blue bars indicate how many clients in the sample reported each score on their first test.

So what do the numbers say? Lower PHQ-9 scores mean fewer symptoms. Per the graph, our study sample reflects a marked downward shift in scores between test 1 and test 2. Also of note: no one in the sample reported severe depression symptoms on their second test. The mean (average) score for all clients on their first test was 11. For the second, it was 6, a reduction of roughly 45% between the two tests. Similarly, the median score (the value that falls exactly in the middle when the scores are sorted) on test 1 was 10. For test 2, it was 5, a 50% reduction.
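The percentage reduction quoted for the median is simple to reproduce. Here is a rough sketch of the calculation: the 10 and 5 are the median scores reported above, while the two short score lists are made-up numbers included only to show the mean and median being computed from raw data. The function name `percent_reduction` is our own.

```python
from statistics import mean, median


def percent_reduction(before, after):
    """Return the drop from `before` to `after` as a percentage of `before`."""
    if before <= 0:
        raise ValueError("the starting score must be greater than zero")
    return (before - after) / before * 100


# Median scores reported in the post: 10 on test 1, 5 on test 2.
print(percent_reduction(10, 5))  # 50.0

# Hypothetical scores for five clients, for illustration only.
first_test = [14, 9, 11, 6, 20]
second_test = [7, 4, 8, 3, 10]

print(round(percent_reduction(mean(first_test), mean(second_test)), 1))
print(round(percent_reduction(median(first_test), median(second_test)), 1))
```

Note that this compares the two group averages. Averaging each client's individual percentage change is a different calculation and can give a different answer.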

According to the provider interpretation guidelines for the PHQ-9, clinically significant change is defined as a post-treatment score of less than or equal to 9 combined with an improvement of 50%. Nineteen clients (47.5% of the total sample) reported a score of 9 or under on their first test. That number increased to twenty-nine (72.5% of the total sample) on test 2. The median score fell by 50% between test 1 and test 2 (from 10 to 5), and the mean fell from 11 to 6. After running the data through a two-tailed t-test, we confirmed that there was a statistically significant separation of scores between tests 1 and 2. Translation: We’re doing something right, and that’s great news. But as much as we’d like to sit back and bask in the glow of our statistical success story, there’s more work to be done. In our next blog installment, we scrutinize our methods to determine how meaningful those results really are.
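For anyone who wants to try these two checks on their own numbers, here is a hedged sketch, not the exact procedure we ran. It makes several assumptions: that "improvement of 50%" means a drop of at least 50% from the first score, that the criterion is applied to each client individually, and that the t-test is a paired one (each client's first score compared with the same client's second score). The score pairs are hypothetical, and the function names are our own.

```python
from math import sqrt
from statistics import mean, stdev


def clinically_significant_change(first_score, second_score):
    """True if the second score is 9 or lower AND at least 50% below the first.

    Reading "improvement of 50%" as "at least 50%" is an assumption.
    """
    if first_score <= 0:
        return False  # no room to improve from a score of zero
    improvement = (first_score - second_score) / first_score
    return second_score <= 9 and improvement >= 0.5


def paired_t_statistic(first_scores, second_scores):
    """t statistic for the mean per-client difference (first minus second)."""
    if len(first_scores) != len(second_scores) or len(first_scores) < 2:
        raise ValueError("need two equally long lists with at least 2 clients")
    differences = [a - b for a, b in zip(first_scores, second_scores)]
    standard_error = stdev(differences) / sqrt(len(differences))
    if standard_error == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(differences) / standard_error


# Hypothetical (first test, second test) pairs, for illustration only.
pairs = [(16, 8), (12, 9), (20, 10), (8, 4)]
for first, second in pairs:
    print(first, second, clinically_significant_change(first, second))

t_value = paired_t_statistic([p[0] for p in pairs], [p[1] for p in pairs])
print(round(t_value, 2))
```

The sketch stops at the t statistic. Turning it into a two-tailed p-value means comparing it against a t distribution with n - 1 degrees of freedom, which a statistics package or a printed t-table can do.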

