---
title: "Data Quality Assurance and Visualization Final Project"
subtitle: "Project Assignment"
date: last-modified
---

PSYC 859 Spring 2026

## Due dates

- Proposal (separate submission): 3/26/2026
- In-class presentation (separate assignment): 4/23/2026
- Final project submission: 4/27/2026

## Project guidelines

Building on your submitted final project proposal, execute your data quality assurance and visualization plan. The final submission should make your workflow understandable and runnable by someone else (here, me), and it should clearly show how your QA process, exploratory work, and polished visualizations fit together.

This project should largely follow the structure of your final proposal, but with the emphasis now on implementation and outputs rather than background or justification alone. In other words, I should be able to see the data pipeline you built, the checks you conducted, the exploratory reasoning those checks supported, and the final visual products that emerged from that process.

If your implementation differs from your proposal in any meaningful way (e.g., different variables, different merge strategy, revised hypotheses, additional cleaning steps, dropped datasets, or a different visualization plan), include a short addendum explaining what changed and why.

You may combine code, commentary, and graphics in a Quarto or R Markdown document for submission, or you may submit these pieces separately as long as the entry point and organization are clear.

If you include any data with your submission, it should be deidentified and otherwise appropriate to share in the course context.

## Project materials to be submitted

Submit a single `.zip` containing your full project folder (preferred). A Git repository snapshot or link is also fine if it contains the full structure and all required outputs.

Minimum contents:

1. **Code**: All scripts and/or Quarto/R Markdown documents used to execute the QA and visualization pipeline.
2. **Filesystem snapshot**: A `tree.txt` (preferred) or screenshot showing the project folder structure.
3. **QA outputs**: The key tables/logs/reports used to identify problems and document data quality (e.g., missingness summaries, out-of-range listings, duplicate-ID checks, merge fidelity checks, artifact reports).
4. **EDA outputs**: Your exploratory graphics, either embedded in a report or included as separate files, with a brief statement of what each plot was meant to help you understand.
5. **Publication-quality figures**: At least 2 polished graphics intended for others, exported at high resolution.
   - Preferred formats: vector graphics (`.pdf` or `.svg`) or raster graphics at 300 DPI or higher (`.png`, `.tiff`).
6. **Documentation**: A `README` (or equivalent) describing:
   - what the data are,
   - how to run the project,
   - the main cleaning/processing decisions, and
   - where the key outputs are located.
7. **Design memo**: A brief write-up explaining the design choices for your publication-quality graphics.
8. **Addendum**: If needed, a short note describing departures from the proposal.
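Several of the QA outputs listed above can come from one small reporting step. The minimal base-R sketch below is only an illustration of the kind of output meant, not a required template. `qa_report()` is a hypothetical helper name, and the toy data frame and plausible-range bounds are made up purely to show the shape of the results.

```r
# Hypothetical helper: summarize common QA problems in a data frame.
# `ranges` is a named list of c(min, max) plausible bounds per column.
qa_report <- function(df, id_col, ranges) {
  ids <- df[[id_col]]
  # Duplicate-ID check: IDs that appear on more than one row
  dup_ids <- unique(ids[duplicated(ids) | duplicated(ids, fromLast = TRUE)])
  # Missingness summary: count and proportion of NA per column
  n_missing <- vapply(df, function(x) sum(is.na(x)), integer(1))
  missingness <- data.frame(
    variable     = names(df),
    n_missing    = unname(n_missing),
    prop_missing = unname(n_missing) / nrow(df)
  )
  # Out-of-range listing: one row per (ID, variable, value) outside its bounds
  out_of_range <- do.call(rbind, lapply(names(ranges), function(v) {
    x <- df[[v]]
    bad <- !is.na(x) & (x < ranges[[v]][1] | x > ranges[[v]][2])
    data.frame(id = ids[bad], variable = rep(v, sum(bad)), value = x[bad])
  }))
  list(duplicate_ids = dup_ids, missingness = missingness,
       out_of_range = out_of_range)
}

# Toy data for illustration only
toy <- data.frame(
  id   = c(101, 102, 102, 103, 104),
  age  = c(34, 29, 29, NA, 212),
  phq9 = c(5, 30, 30, 12, NA)
)
report <- qa_report(toy, "id", list(age = c(18, 100), phq9 = c(0, 27)))
report$out_of_range
```

Each element of the returned list can then be saved with `write.csv()` to a QA output folder (for example, a hypothetical `output/qa/`) so that it can be included in the submission.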

Notes:

- Include a clear entry point (for example, `run_pipeline.R`, `render_project.qmd`, or a main Quarto document).
- Keep raw/original data unmodified; store derived products separately (for example, `data/raw` vs `data/processed`), consistent with your proposal.
- It is fine if your final submission is organized as a small, reproducible project rather than a single monolithic file.
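One possible shape for an entry point is sketched below. It assumes a layout with numbered scripts in an `R/` folder. The script names, `run_pipeline()`, and `write_file_listing()` are hypothetical placeholders, not course requirements. The listing function is a base-R stand-in for the `tree` command-line tool, which is not installed on every system.

```r
# Hypothetical entry point, e.g. saved as run_pipeline.R at the project root.
# Script names are placeholders; substitute your own.
pipeline_steps <- c(
  "R/01_import.R",     # read from data/raw without modifying it
  "R/02_clean.R",      # write analysis-ready files to data/processed
  "R/03_qa_checks.R",  # write QA tables/logs
  "R/04_eda.R",        # exploratory figures
  "R/05_figures.R"     # publication-quality figures
)

run_pipeline <- function(steps, root = ".") {
  for (step in steps) {
    message("Running ", step)
    # Each script runs in a fresh environment, so it must read its
    # inputs from disk rather than rely on objects left by earlier steps
    source(file.path(root, step), local = new.env())
  }
  invisible(steps)
}

# Write one relative file path per line, as a simple tree.txt
write_file_listing <- function(root = ".", out = file.path(root, "tree.txt")) {
  paths <- sort(list.files(root, recursive = TRUE))
  writeLines(paths, out)
  invisible(paths)
}
```

With this layout, running the whole project from the root folder would be `run_pipeline(pipeline_steps)` followed by `write_file_listing()`. Passing data between steps through files is slower than keeping everything in memory, but it lets each step be rerun and checked in isolation.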

## Key concepts to be demonstrated

This project should demonstrate both data QA skills and data visualization skills. Much of the earlier QA project can be reused, but it should now be strengthened by the addition of exploratory and publication-oriented graphics.

### Data QA skills

Successful projects should demonstrate:

1. Thoughtful folder and file setup (file management)
2. Data import into `R`, preserving unadulterated data
3. Data tidying (if necessary)
4. Data wrangling/manipulation (filter, select, arrange, transform, split/apply/combine, etc.)
5. Deliberate data storage choices (e.g., file formats, and keeping raw and processed data separate) that support QA and analysis
6. Dataset joins/alignment (if applicable), including fidelity checks
7. Data QA for invalid values, missingness, artifacts, implausible values, and statistical outliers
8. Use of graphical methods to look under the hood of the data and diagnose problems
9. Clear documentation of manual or programmatic edits made to produce analysis-ready data
10. Use of custom functions and/or existing `R` packages to support validation, reporting, and visualization
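A merge fidelity check (item 6) can be as simple as counting what matched, what did not, and whether row counts changed. The base-R sketch below shows one way to do this, assuming a single key column. `check_merge()` is a hypothetical helper, not a function from any package, and the two small tables are invented for illustration.

```r
# Hypothetical helper: join two tables by a key and report how well they matched.
check_merge <- function(x, y, by) {
  # Keys are usually expected to be unique in each table before a one-to-one join
  dup_x <- sum(duplicated(x[[by]]))
  dup_y <- sum(duplicated(y[[by]]))
  # Keys present in one table but not the other
  only_x <- setdiff(x[[by]], y[[by]])
  only_y <- setdiff(y[[by]], x[[by]])
  # Full outer join keeps unmatched rows visible (with NAs) instead of dropping them
  merged <- merge(x, y, by = by, all = TRUE)
  list(
    merged    = merged,
    dup_keys  = c(x = dup_x, y = dup_y),
    only_in_x = only_x,
    only_in_y = only_y,
    n_rows    = c(x = nrow(x), y = nrow(y), merged = nrow(merged))
  )
}

demo_x <- data.frame(id = c(1, 2, 3), score = c(10, 12, 9))
demo_y <- data.frame(id = c(2, 3, 4), group = c("a", "b", "b"))
fid <- check_merge(demo_x, demo_y, "id")
fid$n_rows

# In a pipeline, you might stop if expectations are violated, e.g.:
# stopifnot(all(fid$dup_keys == 0))
```

Saving the unmatched keys and row counts alongside the merged data gives a small, reviewable record of how the join behaved.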

### Data visualization skills

Your project should also demonstrate the following:

1. **Visualize the same data in at least three (somewhat) different ways.**
   - This could include, for example, aggregated versus faceted views, raw data plus a summary layer, or an overview plot paired with a more detailed diagnostic plot.

2. **Create at least 10 EDA graphics with yourself as the audience.**
   - These do not need to be publication-ready.
   - Their purpose is to help you understand the data, identify data-quality problems, evaluate hypotheses, or decide how best to communicate the results.
   - For each EDA graphic, include a brief statement (1-2 sentences is enough) about what question, uncertainty, or hypothesis the figure was meant to address.

3. **Demonstrate the use of one exploratory multivariate analysis**, such as principal component analysis, multidimensional scaling, or clustering.
   - This can count toward the 10 EDA graphics.
   - Use it to look for larger multivariate structure that might be missed in simpler univariate or bivariate plots.

4. **Create at least 2 graphics for others (publication quality).**
   - These should be polished, legible, and intentionally designed for an audience beyond yourself.

5. **Create at least one plot with multiple layers** that result either from distinct datasets or from different levels of aggregation.
   - For example, raw observations plus summary statistics, or participant-level data overlaid with condition-level estimates.

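6. **Create and interpret at least one generalized pairs plot**, that is, a pairs plot (scatterplot matrix) extended to show pairwise relationships among both quantitative and categorical variables.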
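Several of these requirements can be sketched with very little code. The example below uses R's built-in `iris` data only as a stand-in for your own dataset. It shows a PCA with a scree plot, a generalized pairs plot, and a layered plot (raw observations plus group means) exported in both vector and 300 DPI raster form. `run_pca()`, `draw_layered()`, and the output file names are hypothetical. The pairs plot uses `GGally::ggpairs()` and is skipped if GGally is not installed. None of these plots are meant as finished publication figures.

```r
# PCA on standardized numeric variables (base R only)
run_pca <- function(df) {
  num  <- df[vapply(df, is.numeric, logical(1))]
  rows <- which(stats::complete.cases(num))  # listwise deletion, for this sketch only
  pca  <- stats::prcomp(num[rows, , drop = FALSE], center = TRUE, scale. = TRUE)
  list(pca = pca, rows = rows,
       var_explained = pca$sdev^2 / sum(pca$sdev^2))
}

res <- run_pca(iris)
# Scree plot: proportion of variance captured by each component
plot(res$var_explained, type = "b",
     xlab = "Component", ylab = "Proportion of variance")
# Scores on the first two components, colored by group
plot(res$pca$x[, 1:2], col = as.integer(iris$Species[res$rows]), pch = 19)

# Generalized pairs plot mixing numeric and categorical columns
if (requireNamespace("GGally", quietly = TRUE)) {
  print(GGally::ggpairs(iris, columns = 1:5))
}

# Two layers from different levels of aggregation:
# jittered raw observations plus one mean per group
draw_layered <- function() {
  stripchart(Sepal.Length ~ Species, data = iris, vertical = TRUE,
             method = "jitter", pch = 1, col = "grey50")
  group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
  points(seq_along(group_means), group_means, pch = 18, cex = 2)
}

out_dir <- tempdir()  # placeholder; use your project's figures folder
pdf(file.path(out_dir, "layered.pdf"), width = 7, height = 5)  # vector output
draw_layered()
invisible(dev.off())
png(file.path(out_dir, "layered.png"), width = 7, height = 5,
    units = "in", res = 300)  # raster output at 300 DPI
draw_layered()
invisible(dev.off())
```

If you use ggplot2 for final figures, `ggplot2::ggsave()` with its `dpi` argument serves the same purpose as the explicit `pdf()`/`png()` device calls here.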

## Design memo for publication-quality figures

For your 2 publication-quality graphics, include a short design memo (roughly 1-2 pages total is sufficient). The memo may be a separate file or part of your main report.

Address the following:

- Who is the intended audience for each figure?
- What is the main takeaway you want that audience to see?
- What design alternatives did you consider?
- How does the figure maximize readability and interpretability?
- How does the figure use principles from graphical perception and analytical design?
- When relevant, how does the figure depict both central tendency/belief and uncertainty?

To the extent possible, use course terminology in this memo, drawing on ideas from Tufte, Cleveland, and Bertin.

## Evaluation emphasis

Projects will be evaluated primarily on the following:

- **Reproducibility and organization**: The workflow is structured, documented, and runnable.
- **Quality of QA pipeline**: The submission shows thoughtful checking, problem detection, and clear handling of data issues.
- **Strength of exploratory work**: EDA graphics are purposeful and help reveal structure, anomalies, or substantive patterns.
- **Quality of final figures**: Publication-quality graphics are legible, well designed, and appropriate for their intended audience.
- **Integration of QA, EDA, and final communication**: The final product shows a coherent progression from raw data to polished insight.
- **Transparency of decisions**: Important design, cleaning, and analytic decisions are documented.
