Web-Scale Assessment – plus: Cheap, Easy, Robust & Secure

I was very pleased to see that Brandt Redd’s latest blog post does an excellent job of communicating something I struggled to articulate for a long time: namely, that post-enterprise, “modern” architectures, built on top of things like AWS (Amazon Web Services), GAE (Google App Engine) and iTunes, are the only rational starting point for something like the challenge of delivering hundreds of thousands, or even millions, of more-or-less simultaneous end-of-course, college entrance or other important (‘high stakes’) educational assessments.

I had also searched for an “open source” assessment design and delivery system: something that made item writing easy and manageable, and that worked with a delivery system that would not have a per-seat or per-user-per-test delivery charge associated with it. I did find “Tao”, and the team of earnest and hardworking French, Belgian and other academics had made a good go of doing some really “innovative” work. But this “open source e-testing platform” proved, in my own experimentation at least, to be unsuitable for my “use cases” on several fronts. It does have special strengths which reflect its heritage (1); at the same time, the administrative burden and complexity of deployment reminded me of another era. It is, in some ways, a good example of what is possible with a lot of grad students (they have their own object-oriented data layer!) and proven LAMP technology. But like the proprietary systems I have seen from major vendors (AIR, Measured Progress, Pacific Metrics, Internet Test Systems, ETS, CA&L and Pearson), the basic approach is the use of traditional “enterprise technology”, which indeed can be designed and delivered to support tens of thousands of users (although I think the model with Tao is a distributed one: many little Taos can operate independently, or in groups, and test results can be aggregated, etc.). I am sure that with work many of these could add an order of magnitude (or more?) to those sorts of capacity numbers, but it won’t be pretty, and it won’t be nearly as robust as … an entirely different approach (see below).

And to be clear, I am seeking the sort of assessment delivery experience that could also serve as a mechanism for study and review, and that can be used “offline” for these purposes as well as for assessment, because it seems to me the only way to be certain you can achieve an equal experience for all e-examinees on a high stakes test is to remove internet latency of any kind from the experience.

In the US, the major “common core state standards” assessment implementation consortia are both going pell-mell down the path of conflating “computer based testing” and “online testing”. Only ExamSoft, among major vendors, seems to have approached the problem with offline test delivery in mind.

And so, to make a long story short, I have a new project: I will create an assessment design and delivery system with these characteristics:

  1. easy-to-use item authoring for a range of item types. Not fancy “technology enhanced” item types, mind you, meant to lay claim to “taking advantage of the computer” (see, you drag-and-drop the X onto the choice!), but items that include student-produced responses that are not always constrained by the choices in a multiple choice question, and not just for math, but for language arts, critical thinking and rhetorical skills; items that are supported by significant data, diagrams, texts or other artifacts which can be examined in detail; and items that can be auto-scored by machine intelligence, intelligence that looks at multiple dimensions of an answer from an expert perspective (and not that of an over-worked teacher on the evening shift, trying to score 32 essays per hour);
  2. support for item families, item templates, and DOMC item delivery, providing enhanced item security while maximizing the utility of item development work across the broadest possible set of use cases;
  3. simple assessment creation and assembly. No automated test assembly; stand-alone apps and models can do that well enough. But the system should support metadata and authoring activities that minimize the difficulty of creating test forms and managing items across their life cycle, and yet be simple enough that a teacher can put together anything from a short quiz to a mid-term or final with less work than doing it with a word processor. Or at least not much more 🙂
  4. assessment delivery should leverage web-scale services, and the tests themselves should run offline on (firstly) iPads, and in future, well, perhaps on other devices that provide high resolution displays, consistent and very precise UI controls, and a rich set of APIs to create and manage the test delivery experience.
  5. in addition to linear tests, the system should support adaptive testing for diagnostic purposes, and “review mode” subject matter study using “knowledge maps”, with (multi-threaded) guided topic-progressions to support exploration, reinforcement and enrichment.
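
To make the “item templates” and “item families” idea from the list above a little more concrete, here is a minimal Python sketch. It assumes a template is simply a function that turns a seed into one member of the family; the names `GeneratedItem` and `addition_variant` are hypothetical and not part of any existing system.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedItem:
    """One member of an item family (hypothetical structure)."""
    prompt: str
    key: str


def addition_variant(seed):
    """Build one two-digit addition item; the same seed always yields the same item."""
    rng = random.Random(seed)  # private generator, so results are reproducible per seed
    a = rng.randint(10, 99)
    b = rng.randint(10, 99)
    return GeneratedItem(prompt=f"What is {a} + {b}?", key=str(a + b))


# A small family: three variants of the same underlying item.
family = [addition_variant(seed) for seed in range(3)]
for item in family:
    print(item.prompt, "->", item.key)
```

Because each variant is reproducible from its seed, a results file would only need to record the template and the seed to reconstruct exactly what an examinee saw.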

So the solution here seems as “obvious” to me as it is doable. I shall create, as a proof of concept, a small test design and delivery system in three parts:

  1. A desktop app for item and test authoring. To keep the project small and manageable, I will start with a simple set of item types found in common middle school standardized tests. In fact, as a single authoring app it will be more of a test creation app than anything else. Items can be composed separately using whatever tools the author knows (Word, HTML, a spreadsheet program), and then the “Test Maker Pro” (TMP) desktop app (Windows and Mac both would be nice) can be used to create a “form” or test.
  2. An iPad test delivery system, “Test Taker Pro” (TTP), that can ingest and deliver the assessments designed in TMP. The system will obviously use iTunes to distribute the test delivery app; it should also use a “back end as a service” provider to allow the completed tests to be fired off to the test owner for detailed analysis and enhanced score reporting, and to send the full test results back to the student for consumption in the app. The test delivery app should:
    • Deliver multi-section, timed or un-timed tests.
    • Support core data types used in K-12 standardized testing today (2013), and also new CCSS-inspired critical thinking, free-form and student-produced response item types. Item sets that share significant “passages”, data tables, diagrams or other artifacts are important.
    • Support time-on-task logging and (for forensic purposes) item choice logging. (more later)
    • Deliver immediate (raw and formula) scores to test takers, supplemented by more nuanced reporting and feedback after the “test results file” and “test event and log files” are (invisibly and automatically) shipped back to the test owner for analysis.
    • Display test material, instructions, and the like in a high quality manner, and leverage the full capabilities of the device to allow delivery of high stakes tests in a proctored environment.

  3. The back-end system should support use of data analytics and visualization tools, and make it easy to process the “test results files” and “test event and log files” to compile everything from item statistics to individual student progress reports.
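
As a rough illustration of the immediate raw and formula scores mentioned above, here is a minimal Python sketch. It assumes every item is multiple choice and uses the conventional correction-for-guessing rule (rights minus wrongs divided by the number of choices less one, with omitted items costing nothing). The `Response` record and its field names are hypothetical, not a defined results-file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """One examinee answer; all field names are hypothetical."""
    item_id: str
    chosen: Optional[str]    # None means the item was omitted
    correct: str
    n_choices: int           # number of answer options for the item
    seconds_on_item: float   # time-on-task, as the delivery app might log it


def raw_score(responses):
    """Raw score: the count of correctly answered items."""
    return sum(1 for r in responses if r.chosen == r.correct)


def formula_score(responses):
    """Formula score: each wrong answer costs 1/(k-1); omitted items cost nothing."""
    total = 0.0
    for r in responses:
        if r.chosen is None:
            continue  # omitted items neither add nor subtract
        if r.chosen == r.correct:
            total += 1.0
        else:
            total -= 1.0 / (r.n_choices - 1)
    return total


responses = [
    Response("q1", "B", "B", 5, 41.0),
    Response("q2", "A", "C", 5, 63.5),
    Response("q3", None, "D", 5, 12.0),
    Response("q4", "D", "D", 5, 30.2),
]
print(raw_score(responses))      # 2
print(formula_score(responses))  # 1.75
```

Student-produced response items have no fixed number of choices, so a real scorer would need a separate rule for them; that is left out here.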

Evolving Structure of a Web Scale Assessment System (proposed)

And lastly: I am tempted to commit to making this whole project open source, and I think the “test creation” system certainly should be. With an open spec for the test content data structure, alternative implementations, and conversions from other formats, will be easier to create and promote. The test delivery system might be better served, for security reasons, by not having its source code available to the general public. Then again, there could easily be a “reference implementation” made open source, for “study mode” and in-classroom “quizzes”, alongside a “secure, timed test” delivery system that has additional security elements built in (for secured, password-protected tests that cannot be launched before a proctor supplies the examinee with a particular passphrase). I will think about this as the project unfolds.
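
To give a feel for what an open spec for the test content data structure might look like, here is a small Python sketch that models a multi-section form and round-trips it through JSON. Everything in it is an assumption made for illustration: the class names (`Item`, `Section`, `AssessmentForm`), the field names, and the choice of JSON as the interchange format. None of it is a settled design.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Item:
    item_id: str
    item_type: str                      # e.g. "multiple_choice" or "student_produced"
    prompt: str
    choices: List[str] = field(default_factory=list)
    key: Optional[str] = None           # correct answer, when machine-scorable
    stimulus_id: Optional[str] = None   # shared passage or table for item sets


@dataclass
class Section:
    title: str
    time_limit_minutes: Optional[int]   # None means the section is untimed
    items: List[Item] = field(default_factory=list)


@dataclass
class AssessmentForm:
    form_id: str
    title: str
    sections: List[Section] = field(default_factory=list)

    def to_json(self):
        """Serialize the whole form, nested sections and items included."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text):
        """Rebuild a form from the JSON produced by to_json."""
        data = json.loads(text)
        sections = [
            Section(s["title"], s["time_limit_minutes"],
                    [Item(**i) for i in s["items"]])
            for s in data["sections"]
        ]
        return cls(data["form_id"], data["title"], sections)


form = AssessmentForm("demo-001", "Sample Quiz", [
    Section("Math", 20, [
        Item("m1", "multiple_choice", "What is 7 x 8?", ["54", "56", "64"], "56"),
        Item("m2", "student_produced", "Solve for x: 2x + 3 = 11", key="4"),
    ]),
])
restored = AssessmentForm.from_json(form.to_json())
print(restored == form)  # True
```

A plain, documented format along these lines is what would let someone write a converter from another authoring tool without ever seeing the authoring app's source.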

======

1.   “The TAO framework has the ambition to provide a modular and versatile framework for collaborative distributed test development and delivery with the potential to be extended and adapted to virtually every evaluation purpose that could be handled by the means of computer-based assessment.” So for example, if you want to deliver 5,000 tests to 15 year olds in 17 OECD countries in an array of languages (where your items are otherwise ‘the same’), and where each country will have many, many test sites of varying sizes and degrees of sophistication, TAO offers a path to a solution (with no small amount of distributed technical support required). My focus is on a single test experience delivered and managed through the web, with no technical expertise required of users and managers. For a flavor of what this world looks like, see http://www.tao.lu/news/community-news/tao-24-way


