What is a variable? And what do the terms independent variable and dependent variable mean? This primer is geared to explaining the basics, with several examples and some exercises to help you test your understanding.
For some people writing a dissertation, the concepts of independent and dependent variables are second nature. To others’ ears, these terms may sound like jargon: something that sounds important but remains confusing. In either case, understanding the meaning of these two terms is a prerequisite for quantitative research, whether your research eventually takes the form of a journal article, thesis or dissertation, or a term paper. Before discussing the differences between these two types of variables and how they are related to each other, it is important to understand what a variable is. Even that, however, can seem confusing, given the myriad ways that researchers employ and discuss variables. So let’s start from the beginning.
Vogt (2005) states that a variable is “loosely, anything studied by a researcher” (p. 377). Within quantitative research, a variable is defined much more precisely. The two most common ways of thinking of a variable are the numerical viewpoint and a certain experimental viewpoint (where only the variable's operational presence or absence is relevant). This article is limited to discussing a variable from the first viewpoint. From an applied mathematical perspective, a variable is a measurable aspect of a thing, represented as a changing or potentially changing quantity. Four criteria are implied by this definition. First, there must be a phenomenon or object of interest. Second, that phenomenon or object must have an aspect. Third, that aspect must be measurable. Finally, the quantity must have the potential to change or be characterized by difference (e.g., between men and women). The criteria for what constitutes a variable can be illustrated using a simple example: the measurement of the temperature of water over time. This example is a variable because there is an object being measured: water. There is an aspect of that object: its temperature. The temperature is represented as a measurable quantity, and lastly, the temperature changes as the water is heated; as the word suggests, it is “variable.”
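The four criteria can be made concrete with a short sketch. The Python below is only an illustration, not a formal test: the Measurement record, the is_variable helper, and all of the readings are hypothetical names and made-up values introduced here. In particular, "has more than one observation" is used as a crude stand-in for "has the potential to change."

```python
from dataclasses import dataclass
from numbers import Real
from typing import Optional, Sequence


@dataclass
class Measurement:
    """Hypothetical record: a thing, one aspect of it, and recorded values."""
    thing: Optional[str]      # criterion 1: the phenomenon or object of interest
    aspect: Optional[str]     # criterion 2: the aspect of that object
    values: Sequence[object]  # the observations of that aspect


def is_variable(m: Measurement) -> bool:
    """Rough check of the four criteria (an illustrative assumption, not a standard test)."""
    has_thing = bool(m.thing)
    has_aspect = bool(m.aspect)
    # Criterion 3: every observation is a number (booleans are excluded).
    measurable = len(m.values) > 0 and all(
        isinstance(v, Real) and not isinstance(v, bool) for v in m.values
    )
    # Criterion 4 (crude proxy): a single data point cannot show change or difference.
    can_change = len(m.values) > 1
    return has_thing and has_aspect and measurable and can_change


# Made-up readings of water temperature over time (degrees Celsius).
water = Measurement("water", "temperature", [20.0, 45.5, 71.0, 98.5])
# A price with no object attached: the price of what?
price = Measurement(None, "price", [4.0, 4.5, 5.0])
# One reading only: a data point rather than a variable.
single = Measurement("water", "temperature", [20.0])

print(is_variable(water), is_variable(price), is_variable(single))
```

Only the first record satisfies all four checks; the second has no object of interest, and the third has no room for change.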
I’m sure you can come up with your own obvious examples of quantifiable variables: the height of a child as she ages, your own body weight, the cost of living in the United States over the last decade. If you think about it, each of these meets the criteria. Of course, in areas of quantitative research nobody is particularly interested in the temperature of heated water or an individual’s fluctuating body weight. What academics are interested in are those complex social and scientific problems, and a basic understanding of these problems involves an equally fundamental understanding of the four criteria.
While in the media, stand-alone variables are all over the place, in the social and behavioral sciences, stand-alone variables usually don’t say enough. Variables, for the most part, usually come in twos, threes, fours, and more. Researchers are more often interested in the interaction of variables. Does a certain management style lead to better team performance? How much of the gender-wage gap is a result of discrimination and how much is a result of work-force experience, the amount of education received, and occupational segregation? Which of multiple social or psychological variables correlates with an increase in homicide rates? When researchers study variables in combination, then they are able to tell their story.
The terms “independent” and “dependent” refer to the relationship between these two types of variables. The terms have meaning only with respect to each other. In the case of the dependent variable, its value or behavior is considered reliant, to an extent, upon the value of the independent variable but not the other way around. That is why it is considered “dependent.” The independent variable, on the other hand, is truly independent from the dependent variable. Its value does not change according to the value of the dependent variable.
A simple cause-effect example between two variables can illustrate the concept of dependence/independence. Imagine you are studying the water levels of a small lake in the mountains, and you predict the water levels of the lake in late spring are dependent on the amount of snowfall accumulated over the fall and winter months in the elevations higher than the lake. You hypothesize that the more snow the higher elevations receive, the higher the water level will be. Each spring, for several years, you measure water levels and snow levels.
In this example, water level is the dependent variable and snow level is the independent variable. Water level is dependent on the amount of snow. More snow causes more runoff, which in turn flows into the lake, causing the water level to rise. Conversely, the amount of snow is independent of the amount of water in the lake. That is, an increase in water does not magically cause more snow to fall.
Looking at the data out of context, the labels independent and dependent may seem completely arbitrary. An increase in the amount of accumulated snow corresponds to an increase in the water levels of the lake and vice versa. Unless it is the case that an increase in water levels corresponds to a decrease in snow levels (perfectly plausible since not all precipitation is in the form of snow), we must return to the non-numerical context to understand independence and dependence. Behind the idea of independent and dependent variables is the notion of, if not direct cause and effect, then at least influence or contribution. The independent variable is thought to influence or contribute to the value of the dependent variable.
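A small numerical sketch shows why the numbers alone cannot assign the labels. The Pearson correlation coefficient is symmetric: it comes out the same whichever variable is listed first. The pearson_r helper and the snow and water readings below are made up for illustration and are not taken from any real lake.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical yearly readings, for illustration only.
snow_cm = [120.0, 95.0, 150.0, 80.0, 135.0]  # accumulated snowfall
water_m = [3.1, 2.6, 3.8, 2.3, 3.4]          # late-spring lake level

r_snow_water = pearson_r(snow_cm, water_m)
r_water_snow = pearson_r(water_m, snow_cm)

# The two values agree: the statistic does not say which variable depends on which.
print(r_snow_water, r_water_snow)
```

Swapping the arguments changes nothing, so the decision about which variable is independent has to come from what we know about snow, runoff, and lakes rather than from the coefficient.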
Although not all independent and dependent variables are causally related, the notion of cause and effect can help clarify the idea of “independence” in the independent variable and “dependence” in the dependent variable. In the mountain lake example, cause-effect is fairly obvious, and therefore, it is relatively easy to understand what the independent and dependent variables are. However, as Kalof, Dan, and Dietz (2008) state, “It can be hard to understand what variables are independent (causes) and what variables are dependent (effects) when we are reading research or thinking about the implications of theory” (p. 36).
Take two somewhat related examples. In the first example, we are still dealing with a relatively simple scenario. Imagine you are studying the relationship between how much time a teacher spends with a student and that student’s gender. Which is the independent variable and which is the dependent variable? Might time spent influence gender? Or might gender influence time spent? More specifically, is it possible being male or female is somehow dependent on how much time a teacher spends with boys and girls respectively? Or might the time spent with boys and girls somehow be dependent on whether the student is male or female? Putting aside the very real possibility that gender does not matter or that there are other factors at play, the question of which variable should be understood as a dependent variable and which should be understood as an independent variable seems fairly clear. Gender could influence time spent, but time spent certainly does not affect whether someone is a male or female. Gender is the independent variable and time spent the dependent variable.
Now on to the more difficult example. Take time spent and student attitude. Is a student’s attitude in some manner dependent on the time a teacher spends with him or her? Or does the student’s attitude influence how much time a teacher spends with that student? It is not so obvious. Certainly, it is plausible that a student who receives less time and therefore less attention from a teacher might not have as good an attitude compared to the situation where that student had received more time from the teacher. But it is also plausible to think that a teacher would spend more time with students who display positive attitudes. In this case, either variable—attitude or time spent—could theoretically function as the independent variable or the dependent variable. How these two variables would be related, and therefore labeled, requires a more sophisticated statistical approach.
One helpful strategy Kalof, Dan, and Dietz (2008) suggest is to look at the time ordering of the variables: “If one variable is describing things that occur before the things described by another variable happen, then the first variable can be usually taken as the independent variable and the second as the dependent variable” (p. 37). To illustrate, imagine higher unemployment rates (independent variable) are thought to contribute to an increase in suicide rates (dependent variable). A signal that the unemployment rate is an independent variable would be that the years or months marking the data for unemployment rates would be earlier than the years or months marking the data for suicide rates. Unemployment rate data would precede suicide rate data because the underlying thought would be: Insofar as unemployment might trigger suicide, first, people become unemployed. Then they become distraught. Then they commit suicide.
Of course, data is often collected at the same time. Case in point: a researcher distributing a survey instrument in which items on the survey pertain to both the independent variable and the dependent variable. In this case, even if you can’t answer the question of which data comes first, you can still ask yourself which phenomenon comes first.
Certainly not all studies are interested in quantifying causality. This is particularly true for correlational studies. Just because two variables covary does not mean one variable causes the other. Other criteria must be met for conclusions to be drawn about cause and effect. More often with correlational studies, researchers are interested in prediction rather than causation. In cases where the researcher is interested in the degree to which one variable predicts or helps explain the behavior of another variable, without reference to quantifying the degree of causality, the labels predictor variable or explanatory variable can be used in place of independent variable, and criterion variable or response variable in place of dependent variable. These alternative variable labels are based on similar notions of independence and dependence. That is, the values of a given response variable, say, will vary (or not) based on the values of the predictor variable, the caveat being that such "dependent" variance is not readily discernible as an effect. Thus, for example, an increase of days in a growing season (predictor variable) may correspond to an increase in the sugar content of apples (response variable). The number of days is not necessarily the cause of increased sugar content. Not only is the number of days a proxy for other "seasonal" variables such as temperature and amount of sunlight, but there may be other factors at play, known as confounding variables (e.g., soil quality and amount of water). Still, the number of days is completely independent of how much sugar is in the apples of a particular crop. Despite the differences these labels imply, the terms independent variable and dependent variable are often used without any explicit discussion of cause and effect.
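The predictor/response relationship in the apple example can be sketched as a simple least-squares line. Everything below is an assumption made for illustration: the fit_line helper, the season lengths, and the sugar values (in arbitrary units) are invented, and a straight line is only one of many ways a researcher might model the relationship. The fitted line describes prediction and makes no claim about cause.

```python
from typing import Sequence, Tuple


def fit_line(predictor: Sequence[float], response: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of response = intercept + slope * predictor."""
    n = len(predictor)
    if n != len(response) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(predictor) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    if sxx == 0:
        raise ValueError("predictor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, response))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: growing-season length (days) and apple sugar content (arbitrary units).
season_days = [150.0, 160.0, 170.0, 180.0, 190.0]    # predictor variable
sugar_content = [11.0, 11.6, 12.1, 12.9, 13.4]       # response variable

slope, intercept = fit_line(season_days, sugar_content)

# Predicted sugar content for a 175-day season, under the fitted line.
predicted = intercept + slope * 175.0
print(slope, intercept, predicted)
```

The slope says how much the response is expected to differ per additional day in the season. Because season length stands in for temperature and sunlight, and because soil quality and water are not in the model, the slope should not be read as the effect of days on sugar.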
Hopefully this has clarified the concepts of independent and dependent variables. One good way to see if you understand the difference is to test yourself with the three scenarios below, in which you are asked to label the independent and dependent variables.
Here are six possible quantifiable variables. Is each one a quantifiable
variable? True or false? Why or why not?
1. The color of the sky over a two-year period.
2. Price for six months.
3. The likeability of the president on a scale of 1 to 5 during his first year in office.
4. The city of Buffalo during the last ten winter seasons.
5. The number of people in poverty in the United States for the year 2009.
6. Differences between men and women who work in talk radio in the average number of words in their spoken sentences.
1. False. Color may be an aspect of the sky and it may change, but it is not quantifiable (unless we assign some kind of value to each category: 0=blue, 1=gray, etc.).
2. False. There is no “thing” – the price of what?
3. True. There is a person. The aspect of the person, “likeability,” is quantifiable (1-5), and can change over time.
4. False. What is the aspect that is measurable? Its population? Its temperature? Despite there being a "thing" (a city) and the hint of a possibility of change (over a ten-year period), there is no quantifiable aspect in this example.
5. False. This seems like a variable; after all, the number of people in poverty is an aspect of the United States, can be represented as a number, and has the potential to vary or show difference. However, because the figure is for a single year, there is no possibility of change. It is a data point rather than a variable.
6. True. This is a little abstract, but the number of words is an aspect of the sentence production according to gender. Although there is no change over time, there is potential variation according to gender.
Corporation XYZ wishes to study the correlation between safety training and the number of accidents at various production sites. Those responsible for the training believe that safety training will help reduce the number of accidents. The number of safety training hours each work team receives and the number of accidents each team experiences are recorded for a period of a year after training occurs. What are the independent and dependent variables?
Independent variable: number of safety training hours
Dependent variable: number of accidents
Explanation of Scenario 1
Training is thought to predict a decrease in the number of accidents. If the labels were reversed, the number of accidents would be thought to predict how many training hours employees would receive. But that would not be the point of the observational study. Notice which data comes first: training hours. A set of data recorded prior to a second set is often represented by the independent variable.
A researcher examining communication in the work place believes
face-to-face communication compared to phone or e-mail communication
is more effective in conveying information in one-on-one situations. In an experiment, the researcher tests a) how much information subjects
retain when they are given that information b) face-to-face, c) over
the phone, and d) through e-mail. Of the four variables given, which
is the dependent variable?
Dependent variable: The amount of information subjects retain
Explanation of Scenario 2
The researcher is examining the “effectiveness” of certain types of communication. Although different types of communication do not cause the ability to remember information, the various types are thought to have an influence on the ability of subjects to remember. If the amount of information subjects retained were the independent variable, then it would somehow cause or influence the form of communication. There are three independent variables corresponding to the three types of communication.
A researcher is studying the use of a certain grammatical feature according to the socioeconomic status of a speaker. In a pilot study involving speakers from a particular city, the lower the socioeconomic status of participants in the sample examined, the more often this grammatical feature appeared. The researcher wants to test the hypothesis that socioeconomic status influences the frequency of the grammatical feature for a given socioeconomic group in urban populations in the Pacific Northwest, and is planning on drawing a random sample of speakers from three cities in the region. What are the primary independent variable and dependent variable?
Independent variable: socioeconomic status
Dependent variable: frequency of grammatical feature
Explanation of Scenario 3
The example states that the researcher aims to test whether
socioeconomic status “influences” the frequency of the grammatical
feature. In that sense, it is an independent variable. Technically
speaking, being working class or upper class, for example, does not
cause one to speak a certain way. In that sense, the independent
variable should be considered more of a predictor variable, which is
analogous to an independent variable.
Frequency of a grammatical feature is the dependent variable or
response variable, since its values either change or not depending
on the status of the speaker.
Kalof, L., Dan, A., & Dietz, T. (2008). Essentials of social research. Berkshire, England: Open University.
Vogt, W.P. (2005). Dictionary of statistics and methodology. Thousand Oaks, CA: Sage.