Bias and Errors in Research
No one study is absolute proof of a hypothesis. It takes multiple studies with increasingly stronger designs to provide evidence of relationships. However, the study design is only one element in determining the quality of a particular study. In addition to the type of evidence the article provides—cross-sectional, longitudinal, or experimental—one must consider the overall validity of the study and reliability of the measurements used in the study. Despite researchers’ best efforts, it is impossible to produce a perfect study. The results of a study are limited by who the participants were and how the data were collected. The following section highlights a few of the most common problems found in research. When reading articles, one should ask how well the author controlled for these biases and errors.
Validity and Reliability
To evaluate the quality of a study, one must understand the difference between validity and reliability. If a test or scale measures what it was intended to measure, it is a valid measurement tool. Data produced by valid tools or tests are considered accurate, or true. Clinical measures are considered valid if they are close matches to the gold standard way of measuring a variable. For example, the 12-minute run is a valid test of cardiorespiratory fitness because there is a strong correlation between athletes’ measured maximal oxygen consumption and their performance on the run. This is called criterion validity. Measures that cannot be validated by comparing to a criterion may still be valid if they are logical and based on well-established evidence. These measures have what is known as face validity and construct validity.
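The criterion validity described above is usually quantified with a correlation coefficient between the field test and the gold standard. The following sketch computes a Pearson correlation in pure Python; the VO2max and run-distance numbers are hypothetical values invented for illustration, not data from any actual validation study.

```python
# Criterion validity: correlate a field test with the gold standard measure.
# All data below are hypothetical, made up for illustration only.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical lab-measured VO2max (mL/kg/min) and 12-minute run distance (m)
vo2max   = [38.0, 42.5, 45.1, 50.2, 54.8, 58.3, 61.0]
run_dist = [2100, 2300, 2450, 2650, 2900, 3050, 3200]

r = pearson_r(vo2max, run_dist)
print(f"criterion validity r = {r:.2f}")
```

A coefficient near 1.0 here would support the run as a valid surrogate for the laboratory test; in practice, researchers report the coefficient alongside the sample it was validated on.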
Reliability is sometimes confused with validity. A test or measure is reliable if it produces similar results time after time on the same participants under the same conditions. Consider a clinical skill, such as consistently grading manual muscle tests. The ability to apply the 0-to-5 scale fairly over and over is the reliability. Practice will improve one’s intratester reliability on clinical measures, meaning that one has more consistency when taking measures. Intertester reliability is how one compares to another evaluator. Research studies often need several people to take the same measures on a sample. Ensuring that the intertester reliability is high adds credibility to the data. The validity and reliability of a measure can be determined using statistics. These correlations test how well the data match a criterion (validity) or a repeated test using the same measure (reliability). Some examples include the standard error of estimate, Cronbach’s alpha, intraclass correlations, and kappa coefficients.
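Of the statistics listed above, Cronbach's alpha is one of the easiest to compute by hand: it compares the variance of individual items on a scale with the variance of participants' total scores. The sketch below implements the standard formula in pure Python; the item ratings are hypothetical numbers chosen only to demonstrate the calculation.

```python
# Cronbach's alpha as one example of a reliability (internal consistency)
# statistic. Item scores below are hypothetical, for illustration only.

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def cronbach_alpha(items):
    """items: one list of scores per scale item (columns = participants)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]       # per-participant total
    item_var = sum(variance(scores) for scores in items)   # sum of item variances
    return k / (k - 1) * (1 - item_var / variance(totals))

# 4 survey items rated by 6 participants (rows = items, columns = participants)
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [3, 5, 4, 4, 2, 4],
    [5, 4, 3, 4, 3, 5],
]
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```

Values closer to 1 indicate that the items behave consistently; many texts treat roughly 0.7 or higher as acceptable for research scales, although the threshold depends on the field.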
Threats to Reliability
In studies that test the effect of a treatment on a dependent variable, it is critical that the dependent variable be measured reliably. If the measurement tool is not reliable, the researcher cannot say for sure that a change in the dependent variable was due to the treatment. It could be the result of the instability of the measurements. This is why studies should only be conducted with measures that have proven reliability in the specific population (age, gender, athletic/nonathletic) with which the researcher is working.
Even when a test or survey is reliable, a study’s procedures can reduce that reliability. For example, if a list of words is given to participants for a memory test and the same words are used on multiple occasions, it is likely that the participants will have learned the words. The improvement of participants over time in this case was due not to a treatment but, rather, to exposure to the test. This is called a learning effect and is common with physical skills or tests that require comprehension or math. Researchers can reduce learning effects by changing the order of tests in the assessment, increasing the time between data collections, or using different versions of validated tests. Poor inter- and intratester reliability is another threat. Authors should report statistics that measured the reliability of evaluators and describe what procedures the evaluators used to ensure consistency. Another common issue for reliability is participant fatigue. Especially with survey measures or tests that require physical exertion, participants’ attention, interest, or motivation may decrease if the testing is too long. Ways to minimize fatigue include breaking long data collection sessions into smaller sections, allowing rest breaks between tests, and scheduling data collection early in the day.
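Changing the order of tests, as suggested above, is usually done by counterbalancing: rotating the test sequence so that each test appears in each position equally often across participants. A minimal sketch, with placeholder test names of my own invention:

```python
# Counterbalancing test order to spread learning and fatigue effects
# evenly across tests. Test names are hypothetical placeholders.

def latin_square_orders(tests):
    """Rotate the test list so each test occupies each position exactly once."""
    k = len(tests)
    return [tests[i:] + tests[:i] for i in range(k)]

tests = ["balance", "agility", "vertical jump"]
for participant, order in enumerate(latin_square_orders(tests), start=1):
    print(f"participant {participant}: {order}")
```

With 3 tests this yields 3 orders, so participants would be assigned to orders in blocks of 3; any learning or fatigue effect then falls equally on every test rather than always on the last one.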
A test or survey is valid if it measures what it was intended to measure (ie, it is accurate). An entire study is considered valid if it really tested what it claimed to be testing. This is called internal validity. A research study lacks internal validity if there are problems with the participants, the reliability or validity of the measurements, or uncontrolled variables. Sample bias is one of the most difficult aspects of internal validity for a researcher to control. Recruiting and keeping participants in a study is time consuming, and who the participants are and how they were recruited play a large part in how useful the data will be. Health care studies tend to attract volunteers who have an interest in the topic. They may have prior knowledge or beliefs that could influence their compliance or the outcome. For example, if a doctor is seeking participants for a study on the effects of surgery on regaining function, the length and degree of disability someone has experienced may make them more or less willing to participate. Those with longer histories and more complications are more likely to consider the surgery than a patient who might do well with the surgery but feels that he or she should explore other options first. Particularly in studies that involve physical therapy, participants will assume that the exercise should be improving their scores and will give better effort to prove it or will rate their perceptions of pain or benefit more positively. The Hawthorne effect occurs when participants are influenced by the attention given to them rather than by the actual effect of the intervention. Another problem comes from participants who do not finish the study. Potentially, something about the people who drop out is different from those who stay in. It could be that they have more or less severe injuries, they do not think there was any benefit to the treatment, or they were less motivated to comply with the procedures.
When participants leave the control group at a higher rate than the treatment group, comparisons become difficult because the groups are less similar than they were at the start. Social desirability is also a threat to internal validity. When participants provide self-reported data on health measures, such as exercise, diet, or following medical orders, they tend to put themselves in the best light possible. They may not be consciously lying but may have selective memory for how well they are doing with these behaviors. This is a key weakness of any study relying on survey data as the only measure of a dependent variable. A stronger method is to combine surveys with objective data (eg, measuring pain with a scale and counting the number of pain pills taken since the last visit).
An experiment or intervention with high internal validity has controlled for other factors that could change the dependent variable. These other factors are called confounding variables. A confounding variable is something besides the independent variable that could explain why a change occurred. If all confounding variables are not considered by the researcher beforehand, the conclusion is less reliable because there could be another cause-and-effect relationship besides the one between the independent and dependent variables. Figure 3-19 illustrates the relationship between the 3 variables. Imagine you are doing a study to see which treatment is better for delayed-onset muscle soreness pain—ice or heat. You measured the pain level of your participants before and after a 10-minute treatment. The treatment groups were randomized, and all participants had similarly high levels of pain 24 hours after the eccentric exercise. After his treatment, one of the participants tells you that he does not think the ice or heat mattered as much as the ibuprofen he took that morning. You are panicked now because you realize that you did not control for any treatments the participants might have tried on their own. Without tight control over the confounding variables, you cannot say for certain that any decrease in pain was due to the treatment you provided.
There are 2 ways to control for confounding variables. But first, the researcher must identify as many factors as possible that could impact the dependent variable when planning the study. Then, they can use procedures that minimize their effect. For example, in the delayed-onset muscle soreness study, you could have given participants clear instructions about what they could and could not do during the study. You could have double-checked this by asking participants prior to the treatment whether they had taken anything for pain. The other option for controlling the influence of a third variable is to measure it during the study and use a special statistical analysis called analysis of covariance. This test allows you to see the effect of the independent variable on the dependent variable when the confounding variable is kept stable. This analysis is commonly used in epidemiology because there are many factors that could influence people’s health. Age, gender, diet, education, occupation, level of physical activity, and medications are a few examples of confounding variables measured by researchers. If a confounding variable is discovered after the data have been collected, there is little that can be done about it, and the researcher must acknowledge the limitation in the discussion.
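A full analysis of covariance requires a statistics package, but the core idea of holding the confounder stable can be shown with simple stratification: compare the groups separately within each level of the confounder and then average. The sketch below revisits the ice-versus-heat example; every number is hypothetical and invented for illustration, and stratifying on a binary confounder is a crude stand-in for ANCOVA, not the full procedure.

```python
# The idea behind covariate adjustment: compare treatment groups while
# the confounder is held stable. Here we stratify by a binary confounder
# (ibuprofen use) instead of running a full ANCOVA. All data hypothetical.

def mean(xs):
    return sum(xs) / len(xs)

# (treatment, took_ibuprofen, pain_reduction) for hypothetical participants
records = [
    ("ice",  True,  5), ("ice",  True,  6), ("ice",  False, 2),
    ("ice",  False, 3), ("heat", True,  5), ("heat", False, 2),
    ("heat", False, 2), ("heat", False, 3),
]

def group_mean(treatment, ibuprofen=None):
    """Mean pain reduction for a treatment, optionally within one stratum."""
    return mean([r[2] for r in records
                 if r[0] == treatment and (ibuprofen is None or r[1] == ibuprofen)])

# Naive comparison ignores the confounder and appears to favor ice:
naive_diff = group_mean("ice") - group_mean("heat")

# Within each ibuprofen stratum, the apparent advantage shrinks:
adj_diff = mean([group_mean("ice", s) - group_mean("heat", s)
                 for s in (True, False)])

print(f"unadjusted difference:      {naive_diff:.2f}")
print(f"stratum-adjusted difference: {adj_diff:.2f}")
```

In this made-up sample, more ice participants happened to have taken ibuprofen, so the unadjusted comparison overstates the ice effect; holding ibuprofen use stable shrinks the difference, which is exactly the adjustment ANCOVA performs with a continuous covariate.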