Making Text Analytics Accessible to Writing Faculty

William Marcellino (Frederick S. Pardee RAND Graduate School, US)

Ever-cheaper computing power, along with increasingly sophisticated statistical and machine learning approaches to text, offers a potential revolution in writing instruction and assessment. We can efficiently mine large corpora of genre and disciplinary examples to extract their defining content and functional features, and then concretely visualize those features in students' writing moves. Writing instruction and assessment remain largely a human-only art, but they could become a more data-driven practice, in which context-rich human analytical attention is augmented by machine analysis. Enabling this leap requires three things.

First, writing instruction as a field needs a workable consensus on the relationship between humans and machines: beyond seeing machines as threats to faculty labor, what does a student-centered, fruitful union of human and machine analysis look like? Second, effective practice requires a synthesis of disciplinary approaches: writing instruction and assessment must borrow methods and technology from corpus linguistics, digital rhetoric, computer science, and machine learning. Finally, these methods and technologies must be broadly accessible: analytics and machine learning need to reach the humanities-trained majority of writing faculty, not just a few cross-trained practitioners with a foot in another discipline.

I'll illustrate using RAND-Lex, a text analytics and machine learning tool suite developed at the RAND Corporation. Through a pilot effort at the University of South Florida, RAND-Lex is making scalable analytics accessible for both writing instruction and the digital humanities. Of particular interest to this audience may be "stance comparison": the use of corpus-based analytics to detect the lexicogrammatical (style and stance) features that characterize genre and disciplinary writing, so that those features can be compared with student writing.
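RAND-Lex's own stance-comparison method is not described here, so the following is only a minimal sketch of one common corpus-linguistics technique for this kind of comparison: keyness scoring with Dunning's log-likelihood (G2). It assumes a small, hand-picked list of stance and hedging markers. The names `STANCE_MARKERS`, `tokenize`, `log_likelihood`, and `stance_comparison` are hypothetical illustrations, not RAND-Lex's API.

```python
import math
import re
from collections import Counter

# Hypothetical stance/hedging markers, chosen for illustration only.
STANCE_MARKERS = {"may", "might", "perhaps", "clearly", "suggests",
                  "likely", "indeed", "arguably"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G2 for a word seen a times in corpus 1 (size c)
    and b times in corpus 2 (size d)."""
    e1 = c * (a + b) / (c + d)  # expected count in corpus 1
    e2 = d * (a + b) / (c + d)  # expected count in corpus 2
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def stance_comparison(genre_texts, student_texts, markers=STANCE_MARKERS):
    """Score each marker by signed G2: positive means relatively more
    frequent in the genre corpus, negative means more frequent in
    student writing. Returns (word, score) pairs, highest first."""
    genre = Counter(t for text in genre_texts for t in tokenize(text))
    student = Counter(t for text in student_texts for t in tokenize(text))
    n_genre, n_student = sum(genre.values()), sum(student.values())
    if n_genre == 0 or n_student == 0:
        raise ValueError("both corpora must contain at least one token")
    results = []
    for word in sorted(markers):
        a, b = genre[word], student[word]
        if a + b == 0:
            continue  # marker absent from both corpora
        g2 = log_likelihood(a, b, n_genre, n_student)
        direction = 1 if a / n_genre > b / n_student else -1
        results.append((word, direction * g2))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

For example, comparing a genre corpus that hedges ("The evidence suggests that this may be the case.") with student text that asserts ("This is clearly true.") gives positive scores to hedges such as "may" and "suggests" and a negative score to "clearly". Log-likelihood is used here because it behaves reasonably on small and unequal corpus sizes; a real stance analysis would rely on a much richer, validated feature set and on far larger corpora.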