• Preparation
    • Purpose
    • Required analysis file variables
    • Analysis-specific sample restrictions
    • Ask yourself
    • Potential further analyses
  • Analysis
    • Step 1: Set up a matrix to hold teacher, new teacher, and student results.
    • Step 2: Load the Teacher_Year_Analysis data file.
    • Step 3: Create dummy variables for major teacher race/ethnicity categories.
    • Step 4: Restrict the teacher sample.
    • Step 5: Review teacher variables.
    • Step 6: Get teacher sample sizes.
    • Step 7: Store percentages by race for all teachers and newly hired teachers.
    • Step 8: Load the Student_School_Year data file to get student data.
    • Step 9: Make the file unique by sid and school_year.
    • Step 10: Restrict the student sample.
    • Step 11: Review student variables.
    • Step 12: Create dummy variables for major student race/ethnicity categories.
    • Step 13: Get student sample sizes.
    • Step 14: Store percentages by race for students.
    • Step 15: Replace the dataset with the matrix of results.
    • Step 16: Graph the results.
    • Step 17: Save the chart in Stata Graph and EMF formats.

Examine the Distribution of Teachers and Students by Race



Purpose

This analysis compares the shares of all teachers, newly hired teachers, and students by race.

Required analysis file variables

  • tid
  • school_year
  • t_new_hire
  • t_race_ethnicity
  • sid
  • s_race_ethnicity

Analysis-specific sample restrictions

  • For the student and teacher samples, keep only records for which race information is not missing.
  • For the student and teacher samples, keep only years for which teacher new hire information is available.
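Before fixing the analysis year, it can help to confirm which school years actually carry new-hire information. The sketch below is one possible check, not part of the original analysis; it uses only the variables listed above, and the names has_new_hire and has_race are illustrative.

```stata
* Sketch: check which school years have new-hire and race information
* before choosing the year used in the sample restriction.
use "${analysis}/Teacher_Year_Analysis.dta", clear

* Cross-tabulate year by new-hire status, showing missing values
tab school_year t_new_hire, mi

* Share of teacher-year records with non-missing values, by year
* (has_new_hire and has_race are illustrative names, not source variables)
gen byte has_new_hire = !missing(t_new_hire)
gen byte has_race = !missing(t_race_ethnicity)
tabstat has_new_hire has_race, by(school_year) statistics(mean n)
```

Years in which has_new_hire averages zero have no new-hire information and would be excluded by the restriction above.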

Ask yourself

  • Is the racial composition of your teacher workforce similar to the racial composition of your student body? Is there a difference in racial composition between all teachers and newly hired teachers?
  • If there is a difference between teachers and students, what impact might this have on student learning?

Potential further analyses

You may wish to replicate this analysis for specific schools or groups of schools.
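One way to do this is to restrict both files to the schools of interest before running the steps below. This is a minimal sketch under a stated assumption: school_code is a hypothetical school identifier that is not among the required variables above, and the codes shown are placeholders.

```stata
* ASSUMPTION: school_code is a hypothetical school identifier; substitute
* the variable your analysis files actually use. The codes are placeholders.
local schools 101, 102, 103

* Teacher file: keep only teachers in the selected schools
use "${analysis}/Teacher_Year_Analysis.dta", clear
keep if inlist(school_code, `schools')

* Student file: apply the same restriction before dropping other variables
use "${analysis}/Student_School_Year.dta", clear
keep if inlist(school_code, `schools')
```

In the student file, the restriction would need to come before the step that keeps only sid, school_year, and s_race_ethnicity, since that step drops every other variable.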


Step 1: Set up a matrix to hold teacher, new teacher, and student results.

matrix race = J(4, 4, .)
matrix colnames race = race teacher new_teacher student

Step 2: Load the Teacher_Year_Analysis data file.

use "${analysis}/Teacher_Year_Analysis.dta", clear
isid tid school_year

Step 3: Create dummy variables for major teacher race/ethnicity categories.

gen t_black = (t_race_ethnicity == 1)
gen t_asian = (t_race_ethnicity == 2)
gen t_latino = (t_race_ethnicity == 3)
gen t_white = (t_race_ethnicity == 5)

Step 4: Restrict the teacher sample.

keep if school_year == 2015
keep if !missing(t_race_ethnicity)
keep if !missing(t_new_hire)

Step 5: Review teacher variables.

tab school_year t_race_ethnicity, mi
tab t_new_hire t_white, mi row
tab t_new_hire t_black, mi row
tab t_new_hire t_latino, mi row
tab t_new_hire t_asian, mi row

Step 6: Get teacher sample sizes.

* Count teacher-year records
summ tid
local teacher_years = string(r(N), "%6.0fc")
* Keep one record per teacher and count unique teachers
bys tid: keep if _n == 1
summ tid
local unique_teachers = string(r(N), "%6.0fc")

Step 7: Store percentages by race for all teachers and newly hired teachers.

local i = 1
foreach race of varlist t_white t_black t_latino t_asian {
    * Column 1: race category index
    matrix race[`i', 1] = `i'
    * Column 2: percent of all teachers
    summ `race'
    matrix race[`i', 2] = 100 * r(mean)
    * Column 3: percent of newly hired teachers
    summ `race' if t_new_hire == 1
    matrix race[`i', 3] = 100 * r(mean)
    local i = `i' + 1
}

Step 8: Load the Student_School_Year data file to get student data.

use "${analysis}/Student_School_Year.dta", clear

Step 9: Make the file unique by sid and school_year.

keep sid school_year s_race_ethnicity
duplicates drop
isid sid school_year

Step 10: Restrict the student sample.

keep if school_year == 2015
keep if !missing(s_race_ethnicity)

Step 11: Review student variables.

tab school_year s_race_ethnicity, mi

Step 12: Create dummy variables for major student race/ethnicity categories.

gen s_black = (s_race_ethnicity == 1)
gen s_asian = (s_race_ethnicity == 2)
gen s_latino = (s_race_ethnicity == 3)
gen s_white = (s_race_ethnicity == 5)

Step 13: Get student sample sizes.

* Count student-year records
summ sid
local student_years = string(r(N), "%9.0fc")
* Keep one record per student and count unique students
bys sid: keep if _n == 1
summ sid
local unique_students = string(r(N), "%9.0fc")

Step 14: Store percentages by race for students.

local i = 1
foreach race of varlist s_white s_black s_latino s_asian {
    * Column 4: percent of students
    summ `race'
    matrix race[`i', 4] = 100 * r(mean)
    local i = `i' + 1
}

Step 15: Replace the dataset with the matrix of results.

clear
svmat race, names(col)
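Before graphing, it may be worth checking the stored percentages directly. The following sketch lists the four result rows and computes the percentage-point gap between student and teacher shares; gap_student_teacher is an illustrative name and is dropped afterwards so the graph step is unaffected.

```stata
* Sketch: inspect the stored percentages before graphing.
* Rows 1-4 hold the white, black, Latino, and Asian categories, in that order.
list race teacher new_teacher student in 1/4, noobs clean

* Percentage-point gap between student and teacher shares
* (gap_student_teacher is an illustrative name, not part of the analysis)
gen gap_student_teacher = student - teacher
list race gap_student_teacher in 1/4, noobs clean
drop gap_student_teacher
```

A large positive gap for a group indicates that its share of students exceeds its share of teachers.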

Step 16: Graph the results.

#delimit ;
graph bar teacher new_teacher student,
    bar(1, fcolor(dknavy) lcolor(dknavy))
    bar(2, fcolor(dknavy*.7) lcolor(dknavy*.7))
    bar(3, fcolor(maroon) lcolor(maroon))
    blabel(bar, position(outside) color(black) format(%10.0f))
    over(race, relabel(1 "White" 2 "Black" 3 "Latino" 4 "Asian"))
    title("Share of Teachers and Students", span)
    subtitle("by Race", span)
    ytitle("Percent", size(medsmall))
    ylabel(0(20)100, labsize(medsmall) nogrid)
    legend(order(1 "All Teachers" 2 "Newly Hired Teachers" 3 "Students")
        position(6) symxsize(2) symysize(2) rows(1)
        size(medsmall) region(lstyle(none) lcolor(none) color(none)))
    graphregion(color(white) fcolor(white) lcolor(white))
    plotregion(color(white) fcolor(white) lcolor(white) margin(5 5 2 0))
    note(" " "Notes: Sample includes teachers and students in the 2014-15 school year,"
        "with `unique_teachers' unique teachers and `unique_students' unique students.",
        size(vsmall));
#delimit cr

Step 17: Save the chart in Stata Graph and EMF formats.

graph export "${graphs}/Share_Teachers_Students_by_Race.emf", replace
graph save "${graphs}/Share_Teachers_Students_by_Race.gph", replace

