There are many datasets out there. To figure out which ones fit your topic or interest, it can help to ask questions about what your ideal dataset would look like:
- Who or what is being studied? (e.g., people, towns, countries, etc.)
- What about them is being studied? (e.g., people's height, towns' populations, countries' exports, etc.)
- Where? (e.g., people in Iowa, towns in Ethiopia, countries in Southern Asia)
- When? (e.g., the 1990s, the Middle Ages, during World War II, etc.)
- How often is the data collected? (e.g., once, weekly, monthly, yearly, etc.) Note: in dataset speak this can involve some special jargon:
- Cross-sectional data is collected at one particular point in time across a group of individuals (for instance, if you measured the height of 10 5th-grade children on April 10th, 2022) (Jupp, 2011, p. 53).
- Longitudinal data is collected at multiple times across the same group of individuals (for instance, if you followed the same 10 children from 5th grade to 6th grade and measured their heights every month) (Jupp, 2011, p. 165).
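- Panel data is a kind of longitudinal data in which the same sample of individuals (the "panel") is measured at each point in time (Jupp, 2011, p. 212). If a new sample is drawn each time instead (for instance, if every year you measured the height of 10 children from the incoming 5th grade class), the result is usually called repeated cross-sectional data rather than panel data.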
- Format? What dataset file formats (e.g., an Excel file, CSV, etc.) work with the statistical software (e.g., SPSS, Stata) you're using or have access to through TCNJ?
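To make the "how often" question concrete, here is a minimal Python sketch that looks at who was measured and when, and reports which collection pattern a dataset follows. The function name `describe_collection`, the column names `id` and `when`, and all of the sample rows are made up for illustration; a real dataset will use its own identifiers and date fields.

```python
from collections import defaultdict


def describe_collection(rows):
    """Report how often, and on whom, the rows were collected.

    Each row is a dict with the hypothetical keys "id" (who was
    measured) and "when" (the collection date or period).
    """
    # Group the measured individuals by collection time.
    ids_by_time = defaultdict(set)
    for row in rows:
        ids_by_time[row["when"]].add(row["id"])

    if len(ids_by_time) <= 1:
        return "one point in time"

    groups = list(ids_by_time.values())
    if all(group == groups[0] for group in groups):
        return "same individuals, multiple times"
    if not set.intersection(*groups):
        return "different individuals, multiple times"
    return "partly overlapping individuals, multiple times"


# Illustrative (invented) records: 10 children measured on one day.
one_day = [{"id": f"child{i}", "when": "2022-04-10"} for i in range(10)]

# The same 10 children measured every month.
followed = [
    {"id": f"child{i}", "when": month}
    for month in ("2022-04", "2022-05", "2022-06")
    for i in range(10)
]

# 10 new children from each year's incoming class.
new_class_each_year = [
    {"id": f"{year}-child{i}", "when": str(year)}
    for year in (2020, 2021, 2022)
    for i in range(10)
]

for name, rows in [
    ("one_day", one_day),
    ("followed", followed),
    ("new_class_each_year", new_class_each_year),
]:
    print(name, "->", describe_collection(rows))
```

Running a check like this on a candidate dataset's ID and date columns can be a quick way to confirm that it was collected the way your question needs before you commit to it.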
Try to be specific about your topic/question! For example:

| | Less specific | More specific |
|---|---|---|
| Topic | The health of U.S. adults over time. | Records of heart disease among U.S. adults in the last 10 years, broken down by year. |
| What | "Health" is too broad. What's an example of health? | Heart disease |
| When | "Over time" is too vague. What time period are you interested in? | Last 10 years |

Jupp, V. (2011). The SAGE dictionary of social research methods. London, England: SAGE Publications, Ltd. doi: 10.4135/9780857020116