Code
data("who2", package = "tidyr")PSYC 859 Data Management and Visualization
January 8, 2026
This assignment is due by 1/21/2026 at 8am.
For each question either write the code you would use or copy and paste it from the RStudio syntax window. Additionally, paste any (reasonable) output generated by the code. If there is a lot of output, paste enough to see that you were able to get the correct answer.
Load the who2 dataset from the tidyr package into your working environment.
This dataset records counts of tuberculosis by country and year. The other values correspond to a method of diagnosis, sex, and age group. The method of diagnosis codes are: rel = relapse, sn = negative pulmonary smear, sp = positive pulmonary smear, ep = extrapulmonary. For example, sp_m_014 corresponds to positive pulmonary smear (sp), male sex (m), and ages 0-14 (014).
You will see many NA values in who2 because some country-year combinations do not report counts for certain diagnosis/sex/age groups. When you summarize totals, remember to use na.rm = TRUE so missing values do not turn your results into NA.
Additionally, load the Pew relig_income dataset from the tidyr package:
This dataset describes the relationship between income and religion.
Using the Pew dataset
head() and tail(). Is the dataset considered tidy? Why or why not?str()) and class (class()) of the data. If this is not tidy, verbally describe how it would look if it were a tidy dataset. If it is tidy, is it in a format that you would store the data in long-term?tidyr verbs to tidy the Pew dataset. Your tidy dataset should have columns named religion, income, and count. Show the first five rows.Using the who2 dataset
who2 dataset using the methods described in the section above.tidyr package, tidy the who2 dataset so there is a case_group variable and a cases variable. Use those names throughout the rest of the assignment. Show the first five rows of your new dataframe by using head(). Remember, a tidy dataset consists of:case_group variable are still considered not tidy because they represent three observations in one (case type, sex, and age). Use tidyr verbs to separate this variable into type, sex, and age (hint: the values are separated by _).country, year, type, sex, and age. Report whether any combinations appear more than once.sex and age to be more descriptive of what they represent (male or female, 0-4, 5-14, etc.). You can use a combination of dplyr::mutate() and dplyr::recode() to recode the values. Use ?recode if you get stuck.tidyr::unite() by recombining sex and age into a single variable (e.g., sex_age). This is just to practice unite() before you mess it up again in the next step. Show the first five rows of your new dataframe.tidyr verbs to recreate a ‘messy’ dataframe. It should look exactly the same (or similar) as when you first loaded the who2 dataset into your environment. Show the first five rows of your new dataframe. If the code throws a ‘duplicate identifiers error’ you should use dplyr::distinct() in your pipeline.dplyr verbs (use your cases variable).sex and age (use your cases variable). Show your output. If you get NA values for everything, remember to remove those empty values somewhere in your pipeline! Hint Look at the default arguments for the functions you use to summarize your data.country and sex, then add a variable that is the relative prevalence by sex in each country.