PSYC 859 Syllabus
Course Description
This graduate course provides an applied introduction to data management and data visualization in the social sciences. To take full advantage of modern statistical methods (e.g., structural equation models), competency in data management, semi-automated processing, and data wrangling is a prerequisite. Likewise, prior to employing inferential statistics, exploratory visualization and analysis are essential to facilitate data cleaning and to form an initial understanding of patterns in the data. This course will cover both the principles and practice of data management, visualization, and exploratory analysis for summarizing quantitative data. In addition, students will learn data science skills to manage and visualize “big data,” where the size or complexity of the dataset defies traditional techniques.
Applications of data management, visualization, and analysis will use the R statistical programming language. R is quickly becoming the lingua franca in data science across disciplines and offers unparalleled tools for data analysis and visualization.
Learning Objectives
- Design and implement reproducible data workflows for managing, cleaning, and documenting complex datasets using modern R-based tools.
- Use exploratory data analysis and visualization as tools for scientific reasoning, including identifying patterns, anomalies, uncertainty, and data limitations.
- Apply evidence-based principles of graphical perception and design to create clear, accurate, and interpretable quantitative graphics.
- Critically evaluate and redesign data visualizations in scientific research and public communication, justifying design choices and trade-offs.
- Communicate substantive insights through publication-quality and presentation-ready visualizations, integrating narrative, annotation, and appropriate use of modern visualization technologies.
Prerequisites
PSYC 830 (Statistical Methods in Psychology I) or graduate course equivalent.
Students are expected to have a basic understanding of R or another high-level programming language (e.g., Python). Minimally, students should understand basic principles of computer programming, including:
- conditional logic (if/else) and logical operators (e.g., equality)
- basic data types (in R, vectors, lists, data.frames, matrices, and arrays)
- flow control (for/while loops, next, break)
- import and export of data from text files
- subsetting data using basic R syntax such as x[1:10, c(1,5)].
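As a rough self-check, the short base R sketch below touches each of the skills listed above. The data frame `scores`, its columns, and the temporary CSV file are made up for illustration; nothing here comes from course materials.

```r
# Hypothetical data: a small data frame (one of R's basic data types)
scores <- data.frame(
  id    = 1:6,
  group = c("a", "b", "a", "b", "a", "b"),
  score = c(12, 25, NA, 31, 8, 19)
)

# Flow control and conditional logic: label each score, skipping missing values
labels <- character(nrow(scores))
for (i in seq_len(nrow(scores))) {
  if (is.na(scores$score[i])) {
    labels[i] <- "missing"
    next  # move on to the next row
  }
  if (scores$score[i] >= 20 && scores$group[i] == "b") {
    labels[i] <- "high b"
  } else {
    labels[i] <- "other"
  }
}
scores$label <- labels

# Export to and import from a text file
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)
reloaded <- read.csv(path)

# Subsetting by row and column position: rows 1 to 3, columns 1 and 3
subset_13 <- reloaded[1:3, c(1, 3)]
print(subset_13)
```

If each line of this sketch reads as familiar, the programming prerequisite is likely met; if several lines are unclear, the early chapters of R for Data Science are a reasonable place to review.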
Required Textbooks
- Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Cheshire, CT: Graphics Press.
- Wickham, H. (2025). ggplot2: Elegant graphics for data analysis (3rd ed.). New York: Springer. Please use the online (not-yet-published) version here: https://ggplot2-book.org.
- Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (2nd ed.). O’Reilly Media. Available online: https://r4ds.hadley.nz.
Recommended Textbooks
- Chang, W. (2019). R graphics cookbook (2nd ed.). Sebastopol, CA: O’Reilly. (Especially useful for those with little background in R and ggplot2). Available online: https://r-graphics.org.
- Cleveland, W. S. (1994). The elements of graphing data (2nd ed.). Summit, NJ: Hobart Press.
Required Software
- R (free): http://cran.r-project.org/
- RStudio (free): http://www.rstudio.com/
- Inkscape (free): https://inkscape.org/en/
- GIMP (free): http://www.gimp.org
- R packages to install: tidyverse (includes ggplot2), shiny, plotly (provides the ggplotly() function), broom, tidymodels
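One way to check which of these packages still need installing is sketched below. It relies on `ggplotly()` being a function in the plotly package rather than a package of its own; the vector name `course_pkgs` is illustrative. The install call is left commented out so the snippet only reports what is missing.

```r
# Packages named in the list above (ggplotly() is a function in the plotly package)
course_pkgs <- c("tidyverse", "shiny", "plotly", "broom", "ggplot2", "tidymodels")

# Compare against what is already installed in the current library paths
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(course_pkgs, installed)

if (length(missing_pkgs) == 0) {
  message("All course packages are installed.")
} else {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install from CRAN:
  # install.packages(missing_pkgs)
}
```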
Class Structure
With a few exceptions, class will be structured into three blocks as follows:
- 9:00-10:10am Figure critique, lecture, and discussion of readings
- 10:10am-10:20am Break
- 10:20am-11:30am R demonstration and practical exercise
During the R demonstration, we will work on a data-related project together, so please bring a laptop with the above software and packages loaded. Let me know if this will be a difficulty for you so that we can arrange alternative plans.
Evaluation
Although the course will review the principles of effective data visualization (e.g., graphical perception), the course is primarily intended to facilitate your applied skills managing and visualizing data. Consequently, there will be no formal exams or quizzes. Instead, your grade will be based primarily on figure critiques, take-home exercises, participation, and three projects. Students are encouraged to bring a dataset that is relevant to their research for use in each project. If possible, a challenging dataset (one that defies simple management in a spreadsheet format) will provide better opportunities to learn advanced data management and visualization skills.
All projects are to be completed individually. Although you are encouraged to discuss data management and visualization challenges with your classmates, you’ll get the most benefit from the course by developing projects yourself.
15% Participation, as defined by attending class, contributing to reading discussions, engagement with lab exercises, and otherwise contributing to scholarly discourse.
10% Figure critiques and data exercises. In the first part of the semester, students will complete take-home data exercises to become familiar with managing, tidying, and wrangling data in R. In the latter half of the semester, students will be expected to submit a critique of at least one figure or table. The critique can be brief, perhaps in bullet form, but should highlight key strengths and limitations of the display, as well as suggestions for alternative visualizations. One (or perhaps a few) figures will be discussed at the beginning of class before the discussion of readings.
Assignments are due by 8am on Wednesdays so that there is time to review them and incorporate them into class discussion.
5% Data quality assurance and processing proposal. Due: 1/29 (Week 4)
15% Data quality assurance project (code and output). Due: 2/12 (Week 6)
15% Conceptual figure/infographic project. Due: 3/5 (Week 9)
5% Final project proposal. Due: 3/26 (Week 11)
10% Final presentation of data visualization project. Due: 4/23 (Week 14)
25% Data visualization final product. Due: 4/27
Schedule
Week 1 (1/8): Introduction to data management and tidy data
Conceptual readings
- Briney, K., Coates, H., & Goben, A. (2020). Foundational Practices of Research Data Management. Research Ideas and Outcomes, 6, e56508. https://doi.org/10.3897/rio.6.e56508
- Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some Simple Guidelines for Effective Data Management. The Bulletin of the Ecological Society of America, 90, 205-214. https://doi.org/10.1890/0012-9623-90.2.205
- (Optional) Borghi, J. A., & Gulick, A. E. V. (2021). Data management and sharing: Practices and perceptions of psychology researchers. PLOS ONE, 16, e0252047. https://doi.org/10.1371/journal.pone.0252047
Practical readings
- Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.).
- Ch. 2: Workflow: basics. https://r4ds.hadley.nz/workflow-basics.html
- Ch. 4: Workflow: code style. https://r4ds.hadley.nz/workflow-style.html
- Ch. 5: Data tidying. https://r4ds.hadley.nz/data-tidy.html
- Ch. 6: Workflow: scripts and projects. https://r4ds.hadley.nz/workflow-scripts.html
- Ch. 7: Data import. https://r4ds.hadley.nz/data-import.html
- (Supplementary) Tidyr pivoting vignette: https://tidyr.tidyverse.org/articles/pivot.html
- Tidyr cheatsheet: https://rstudio.github.io/cheatsheets/tidyr.pdf
- RStudio Data Import cheat sheet: https://rstudio.github.io/cheatsheets/data-import.pdf
Week 2 (1/15): Data aggregation, manipulation, joins
Practical readings
- Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.).
- Ch. 3: Data transformation: https://r4ds.hadley.nz/data-transform.html
- Ch. 12: Logical vectors: https://r4ds.hadley.nz/logicals.html
- Ch. 13: Numbers: https://r4ds.hadley.nz/numbers.html
- Ch. 14: Strings: https://r4ds.hadley.nz/strings.html
- Ch. 16: Factors: https://r4ds.hadley.nz/factors.html
- Ch. 19: Joins: https://r4ds.hadley.nz/joins.html
- (Optional) Ch. 15: Regular expressions. https://r4ds.hadley.nz/regexps.html
- (Optional) Ch. 17: Dates and times. https://r4ds.hadley.nz/datetimes.html
- (Optional) Ch. 20: Import: spreadsheets. https://r4ds.hadley.nz/spreadsheets.html
Week 3 (1/22): Data processing and quality assurance, custom functions, basics of automation
Conceptual readings
- Van den Broeck, J., Argeseanu Cunningham, S., Eeckels, R., & Herbst, K. (2005). Data Cleaning: Detecting, diagnosing, and editing data abnormalities. PLoS Medicine, 2(10), e267.
- (Optional) Broman, K. W., & Woo, K. H. (2018). Data Organization in Spreadsheets. The American Statistician, 72, 2-10.
Practical readings
- Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.).
- Ch. 25: Functions. https://r4ds.hadley.nz/functions.html
- Ch. 26: Iteration. https://r4ds.hadley.nz/iteration.html
- R validate package. Review https://cran.r-project.org/web/packages/validate/vignettes/cookbook.html
- Introduction to pointblank package: https://bookdown.org/pdr_higgins/rmrwr/data-exploration-and-validation-with-the-pointblank-package.html
Week 4 (1/29): Advanced data manipulation and management, tracking work in R markdown
Conceptual readings
- Wilson, G., Aruliah, D. A., Brown, C. T., Chue Hong, N. P., Davis, M., Guy, R. T., Haddock, S. H. D., Huff, K. D., Mitchell, I. M., Plumbley, M. D., Waugh, B., White, E. P., & Wilson, P. (2014). Best Practices for Scientific Computing. PLoS Biology, 12(1), e1001745. https://doi.org/10.1371/journal.pbio.1001745
- Karl Broman steps to reproducible research: http://kbroman.org/steps2rr/
Practical readings
- Gandrud (2015). Chs. 2 and 4 in Reproducible Research with R and RStudio.
- Getting started with Quarto: https://quarto.org/docs/get-started/hello/rstudio.html
- Introduction to targets package in R: https://books.ropensci.org/targets/walkthrough.html
Week 5 (2/5): Principles of data visualization and graphical grammar
Conceptual readings
- Tufte, The visual display of quantitative information, Chs. 1-3.
- Cairo, The functional art, Ch. 1.
Practical readings
- Wickham, ggplot2 (3rd edition).
- The grammar. https://ggplot2-book.org/mastery.html
- Ch. 13: Building a plot layer by layer. https://ggplot2-book.org/layers.html
Week 6 (2/12): Visual and graphical perception
Conceptual readings
- Cleveland, W. S., & McGill, R. (1984). Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. Journal of the American Statistical Association, 79, 531-554.
- Kosslyn, S. M. (2006). Graph design for the eye and mind, Ch. 1.
- Munzner, T. (2014). Visualization Analysis and Design, Ch. 6.
Week 7 (2/19): Graphic design, layout, style, use of color
Conceptual readings
- Tufte, Beautiful evidence, Ch. 5: The Fundamental Principles of Analytical Design
- Tufte, The visual display of quantitative information, Chs. 4 and 5.
- Franconeri, S. L., Padilla, L. M., Shah, P., Zacks, J. M., & Hullman, J. (2021). The Science of Visual Data Communication: What Works. Psychological Science in the Public Interest, 22(3), 110–161.
Practical readings
- Boone & Evans. UX Color Theory: a short practical intro about use of color.
- Review: http://colorbrewer2.org/
- Color palette app: https://coolors.co/app
Optional readings
- Munzner, T. (2014). Visualization Analysis and Design, Ch. 10: Map Color and Other Channels.
- Cleveland, W. (1985). The elements of graphing data, Ch. 2: Principles of graph construction.
- Wilke, C. O. Fundamentals of Data Visualization.
- Chapter 17 — The principle of proportional ink: https://clauswilke.com/dataviz/proportional-ink.html
- Chapter 18 — Overlapping points: https://clauswilke.com/dataviz/overlapping-points.html
- Chapter 19 — Common pitfalls of color use: https://clauswilke.com/dataviz/color-pitfalls.html
- Chapter 20 — Redundant coding: https://clauswilke.com/dataviz/redundant-coding.html
Week 8 (2/26): A tour of quantitative visualization
Conceptual readings
- Tufte, The visual display of quantitative information, Chs. 6-8.
- Heer, J., Bostock, M., & Ogievetsky, V. (2010). A Tour through the Visualization Zoo. Communications of the ACM, 53(6), 59-67.
Practical readings
- Wickham, ggplot2 (3rd edition; online draft).
- Ch. 2: First steps. https://ggplot2-book.org/getting-started.html
- Layers (overview). https://ggplot2-book.org/toolbox.html
- Ch. 3: Individual geoms. https://ggplot2-book.org/individual-geoms.html
- Ch. 4: Collective geoms. https://ggplot2-book.org/collective-geoms.html
- Ch. 5: Statistical summaries (incl. distributions + overplotting). https://ggplot2-book.org/statistical-summaries.html
- Ch. 6: Maps. https://ggplot2-book.org/maps.html
- Ch. 8: Annotations. https://ggplot2-book.org/annotations.html
Week 9 (3/5): Visualizing continuous data (in ggplot2)
Conceptual readings
- Tufte, The visual display of quantitative information, Ch. 9.
- Cleveland, W. S. (1993). Visualizing Data, Chs. 1-2.
Practical readings
- Wickham, ggplot2 (3rd edition; online draft).
- Ch. 10: Position scales and axes. https://ggplot2-book.org/scales-position.html
- Ch. 11: Color scales and legends. https://ggplot2-book.org/scales-colour.html
- Ch. 12: Other aesthetics. https://ggplot2-book.org/scales-other.html
Week 10 (3/12): Visualizing count and categorical data (in ggplot2)
Practical readings
- Friendly, M. Working with Categorical Data in R.
- Emerson, J. W., Green, W. A., Schloerke, B., Crowley, J., Cook, D., Hofmann, H., & Wickham, H. (2012). The Generalized Pairs Plot.
3/19: No class (Spring break)
Week 11 (3/26): Maximizing clarity: preparing graphics for presentation and publication
Conceptual readings
- Tufte, E. R. Beautiful evidence, Ch. 7: The Cognitive Style of PowerPoint
Practical readings
- Wickham, ggplot2 (3rd edition; online draft).
- Ch. 9: Arranging plots. https://ggplot2-book.org/arranging-plots.html
- Chang, W. R Graphics Cookbook (2nd ed.).
- Annotations. https://r-graphics.org/CHAPTER-ANNOTATE.html
- Appearance of plots. https://r-graphics.org/CHAPTER-APPEARANCE.html
- Video: Hans Rosling: The best stats you’ve ever seen
- Review: Six Simple Techniques for Presenting Data: Hans Rosling’s TED 2006
Optional readings
- Patchwork site: https://patchwork.data-imaginist.com/articles/patchwork.html
4/2: No class (Well-being day)
Week 12 (4/9): Visualizing and understanding fit (and misfit) of statistical models
Week 13 (4/16): Exploratory statistics for understanding data: clustering, multidimensional scaling, dimension reduction
Week 14 (4/23): Final presentations of data projects
Class Attendance
You are advised to attend all lectures because some material presented in lecture will not be in the readings. Additionally, the lectures will give you a sense of what to focus on in the readings and how to integrate information across topics.
University Policy: As stated in the University’s Class Attendance Policy, no right or privilege exists that permits a student to be absent from any class meetings, except for these University Approved Absences:
- Authorized University activities: The University Approved Absence Office (UAAO) website provides information and FAQs for students and faculty related to University Approved Absences.
- Disability/religious observance/pregnancy, as required by law and approved by the Equal Opportunity and Compliance Office (EOC).
- Significant health condition and/or personal/family emergency as approved by the Office of the Dean of Students, Gender Violence Service Coordinators, and/or the Equal Opportunity and Compliance Office (EOC).
Use of AI Tools (e.g., Large Language Models)
Recent advances in large language models (LLMs) and AI-assisted coding tools (e.g., ChatGPT, GitHub Copilot) have made them highly effective aids for data wrangling, visualization, and programming in R. In professional research and applied data science settings, such tools are increasingly used to accelerate development, diagnose bugs, and explore alternative implementations.
At the same time, a central goal of this course is for students to develop their competency in data management, exploratory analysis, and visualization. Becoming proficient in these areas will allow students to implement these skills independently and to critically evaluate code written by collaborators or generated by AI tools.
Permitted uses
Students may use AI tools for the following purposes:
- Debugging code, including identifying syntax errors, logical errors, or unexpected behavior.
- Requesting explanations of how a piece of code works, why it produces a particular result, or why it fails.
- Requesting suggestions for code improvement, refactoring, or alternative approaches to a task.
These uses are consistent with how AI tools are used responsibly in real research workflows.
Expectations and responsibilities
When using AI tools, students are expected to:
- Actively evaluate and understand any code they submit, regardless of its source.
- Ensure they can explain what the code does and why it works, including key functions, assumptions, and consequences.
- Make independent decisions about whether to adopt AI-suggested code, rather than copying it uncritically.
- Remain responsible for correctness, clarity, and reproducibility of all submitted work.
Submitting code that the student does not understand is inconsistent with the learning objectives of the course.
Prohibited uses
The following are not permitted:
- Submitting AI-generated code or analyses without understanding or review.
- Using AI tools as a substitute for engaging with core course concepts (e.g., visualization principles, data cleaning logic).
- Representing AI-generated work as understanding or reasoning that the student cannot demonstrate if asked.
Transparency
For major assignments and the final project, students may be asked to include a brief AI use statement (1-3 sentences) describing whether and how AI tools were used (e.g., “used for debugging,” “used to explore alternative ggplot layouts”). This is not punitive; its purpose is to promote transparency and reflective practice.
Laptops and mobile devices; video/audiotaping
You are encouraged to bring a laptop to class for course-related use during the lecture and practical part of each meeting. Please ensure, however, that laptops and mobile devices are silent during class. In addition, please refrain from texting, checking social media, or otherwise dividing your attention with personal matters. If you would like to make an audio or video recording of any lecture, please obtain the instructor’s permission first.
Equal Opportunity and Compliance - Accommodations
The Equal Opportunity and Compliance (EOC) Accommodations Team receives requests for accommodations for disability, pregnancy and related conditions, and sincerely held religious beliefs and practices through the University’s Policy on Accommodations. The EOC Accommodations Team determines eligibility and reasonable accommodations consistent with state and federal laws.
Counseling and Psychological Services (CAPS)
UNC-Chapel Hill is strongly committed to addressing the mental health needs of a diverse student body. The Heels Care Network website is a place to access the many mental health resources at Carolina. CAPS is the primary mental health provider for students, offering timely access to consultation and connection to clinically appropriate services. Go to the CAPS website or visit their facilities on the third floor of the Campus Health building for an initial evaluation to learn more. Students can also call CAPS 24/7 at 919-966-3658 for immediate assistance.