Forty-six years later, now more than ever, I have to ask: What’s Goin’ on?
I was interested to read the piece in the MIT Technology Review,
A million-dollar prize certainly grabbed some headlines, but the details of the winning solution are clear signposts to the future: more image annotations (that is, more trained doctors and technicians), plus partitioning the basic problem into (a) finding nodules and (b) diagnosing cancer. Indeed, the future of low-dose CT scans is certainly looking stronger. And while progress with machine learning, medical imaging, and diagnostic medicine is not always linear (or straightforward, as we read here), 3D images that capture relative tissue density and other characteristics clearly provide a highly construct-relevant feature set, one that is making advances in this area steady and promising. (Editorial: is this argument convincing? Other work that relies on indirect features and characteristics, computational linguistics in this case, is not yet keeping up…)
Since Google’s acquisition of Kaggle, I have not taken a new look at the Google tool set for creating deep learning networks, but the idea of introducing a “semantic data layer” based on a semantic grammar approach to rubric construction might offer a promising path to better machine understanding of text and speech.
Often, in the context of large-scale testing programs, “critical thinking assessment” is represented more by “information synthesis”, “reading comprehension”, “problem solving” or other exercises that require an examinee to make a claim and cite evidence and reasoning to support it.
In some contexts this is also called “Argumentative Writing”, much as the “analyze an argument” question on the GMAT was once a common “analytical writing” task. But only one program comes to mind, the CAE’s Collegiate Learning Assessment Plus (or Minus or Pro or whatever the marketing types want to call it this year), that does or did (at one point) break out “problem solving” and “analytic reasoning & evaluation” as dimensions on a rubric for a performance task, although they may have moved toward a generalized “analysis and problem solving” dimension in current exams.
In any event, the big news today is that I have discovered EXACTLY the self-paced, student-centric, topic-organized critical thinking product and platform I have long envisioned as a replacement for the beloved “SRA Reading Cards” of my youth. A group in Chicago has created a modern, digital version of this tool: a set of topics, organized by subject matter and sequenced by grade and difficulty, that (hopefully) are as interesting and “teachful” as the SRA reading card stories and articles were. Only here, students WRITE about what they read, not just answer MCQs. And they are taught to cite evidence, make claims, explain reasoning, and even identify counter-arguments! Great stuff.
Read more about them at ThinkCERCA.com.
In my longstanding quest for the best tool set for rapid, flexible and powerful development (where for me “powerful” includes robust SQL database support, strong dictionary / NoSQL data support, access to powerful statistics libraries, machine learning libraries, NLP and other libraries, and support for web apps and RESTful back-end services), I have settled on Python3, MySQL / Aurora, PyCharm + Sublime, NLTK, NumPy, Flask and the rest. I am already happy with my productivity, and have recently recognized a need to move from OS X to Linux for more of the heavy lifting. Which, naturally, means everything is going to AWS…
AWS has improved in a hundred ways in the last three years, and when I was last certified I already thought it was the best thing ever. On January 31 I hope to be re-certified, but I have already begun to migrate my personal projects and infrastructure to the cloud… I expect this to take months as I interleave it with ongoing development and research projects.
All of which is good, and AWS goes a long way for me toward making infrastructure into software. But there is another area I want to understand better and apply in my quest for more efficiency, and this raises the question in the title: Chef or Puppet? I won’t throw in Ansible or Salt as this article does. Based on some of what I am reading, my Python penchant might argue one way, whereas my striving to use workplace-relevant tools and approaches across the board may weight my choice against what is optimal for my current daily workload.
Another factor might be how well AWS integrates with either product, which would weigh more heavily than my personal Python needs, as Ruby and other toolsets are likely to be more important to many of my future clients / employers.
I also should give a big SHOUT OUT (is that big? ) to James and his team at LinuxAcademy who continue, five or seven years in, to innovate and do a fantastic job of providing top-flight hands-on training for AWS / Linux / Azure devs, sysops and architects. Fantastic performance for a small firm that obviously has their priorities right!
But back to Chef vs. Puppet: I will find or create a comparison and figure out whether either is going to save me cycles, make me more efficient, or just slow me (and my small team) down!
Bias and variance, precision and recall: these are concepts that begin to have an intuitive feel after a few months, or maybe even just a couple of weeks, of crawling around in actual data, predictive models, and the study of where prediction and reality meet. But it was nice to read recently a short piece that brings these concepts clearly into focus, and frames them in terms of model behavior. This is something I will keep handy to share where my own jabbering on the subject is likely to be less clear and certainly less concise. The source of the article was (via re-post) the KDnuggets blog, which is an excellent resource.
There are, perhaps unsurprisingly, many good “nuggets” on the KDnuggets blog / web site. And this latest item does a good job of explaining what is at some point intuitive to people who work with machine learning models regularly. Perhaps this is particularly relevant to modeling and mining “text”, the work I have been doing in machine learning, because it certainly is spot on. And this is more a way of describing how the math models the real world, and how the data is reflected in the math, so I expect this view is likely helpful to anyone modeling data.
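For readers who want the precision and recall half of this in concrete terms, here is a minimal sketch in plain Python. It is not taken from the KDnuggets article; the function name `precision_recall` and the toy labels are made up for illustration, and the only thing assumed is the standard definition of the two measures from true positives, false positives and false negatives.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one class, from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually something else.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: actually positive but predicted something else.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 1 = "essay meets the rubric", 0 = "does not".
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these toy labels the model flags five items as positive and three of them are right (precision 0.60), while it finds three of the four true positives (recall 0.75). A model that over-predicts the positive class tends to trade precision for recall in just this way.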
The somewhat “click-bait” sounding title — “4 Reasons Your Machine Learning Model is Wrong” is only modestly apologized for with the “(and How to Fix It)” suffix, but makes me worry fake-aggressive, pretend-demeaning discourse could be among the worst forms of carry-over of 2016 into 2017.
I will instead remember that genuinely aggressive, demeaning discourse is worse… and continue to appreciate the sharing that sites like this do for the larger community.
Happy New Year!
So much progress, so much exemplary assessment. So much data.
This will take some number of weeks to process… and I will try to link to the best bits. So far the Economist treatment is looking pretty thorough.
And of course the PISA web site itself has a visualization tool… I think…
The interactive “problem solving” exercises, using “MicroDYN” systems and “finite-state automata” in particular, look really interesting.
Having posted three or four times on the question of how best to create modern, scalable, flexible and robust web applications (here, here and here), I finally decided that Ruby was going to require too much ramp-up time. When I read about Flask, I decided that my tool set for the next quantum of time (five years?), replacing my joy working with Objective-C / iOS / Xcode / SQLite, would be Python3 / MySQL (Aurora) / Flask (for web framework) / PyCharm, NLTK, Pandas and one day scikit-learn and the rest of it (for now I will stick with RapidMiner and LightSIDE as my black box for text-based machine learning).
And one last thing: Docker. There are many, many reasons that this presents to me the best-of-breed toolbox of modern application frameworks and tools. To describe more fully, at the “example” level and not at the technology / component level, why these tools, frameworks and yes, deployment choices are perfect for me, there could be no more perfect example than this: I wanted to put together a simple web-based registration scheme, a public-facing set of web pages (maybe just one or two) that do a “signup”.
In relatively few days, or perhaps a couple dozen hours, I was able to create from scratch an entire (simple) web app using Flask, Python3 and MySQL (I went with out-of-the-box MySQL rather than Aurora, as I will need to learn more about Aurora and that will be fun but not short) that is now live here (and with a link on my home page).
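To give a feel for how small such a signup app can be, here is a rough sketch of a one-page registration flow in Flask. It is not the code behind the live app. The route `/signup`, the table and column names, and the validation rules are all invented for illustration, and an in-memory SQLite database stands in for MySQL so that the sketch runs without a database server.

```python
import sqlite3

from flask import Flask, request

app = Flask(__name__)

# Assumption: in-memory SQLite replaces MySQL purely to keep the sketch self-contained.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE signups (email TEXT PRIMARY KEY, name TEXT NOT NULL)")

# Hypothetical minimal form; a real page would use a template and styling.
FORM = """
<form method="post" action="/signup">
  <input name="name" placeholder="Name">
  <input name="email" placeholder="Email">
  <button type="submit">Sign up</button>
</form>
"""


@app.route("/signup", methods=["GET", "POST"])
def signup():
    # GET shows the form; POST tries to register the visitor.
    if request.method == "GET":
        return FORM
    name = request.form.get("name", "").strip()
    email = request.form.get("email", "").strip().lower()
    # Deliberately naive validation, just enough for a demonstration.
    if not name or "@" not in email:
        return "Please provide a name and a valid email address.", 400
    try:
        with db:  # commits on success, rolls back if the insert fails
            db.execute(
                "INSERT INTO signups (email, name) VALUES (?, ?)",
                (email, name),
            )
    except sqlite3.IntegrityError:
        # The primary key on email rejects duplicate registrations.
        return "That email address is already registered.", 409
    return "Thanks for signing up!", 201
```

Swapping in MySQL would mostly mean replacing the connection and the `?` placeholders with whatever the chosen MySQL driver expects. A production signup would also need things this sketch leaves out, such as CSRF protection and email confirmation.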
Just as creating a quick-and-simple one-page web site using Bootstrap proved to me the value of that framework, this “demonstration app” has validated my ideas and met my initial goals. I did have a couple of false starts with other technologies, but this one looks good. And like Bootstrap, you really only get the benefit of it when working with experts. My lame web site reflects my one-day rush to get some pages up, and a pre-packaged template that was free (or cheap enough to be equivalent). But if I wanted to do something serious with that web site, I would hire an expert.
Similarly, web security is awfully complex these days, and a side benefit (or main benefit for some) of using Docker is built-in “isolation” that is a good starting point for enhanced security, and as such is a foundational component of this new technology stack. But as with Bootstrap and Aurora, I am going to need to spend more time with Docker to understand it, and for this project, my Russian friend Yury took care of Docker (and everything else) so that I could get the project done quickly. But I will return to all of this in 2017. And with any luck it will all be even more “mainstream” then than it is today. Aurora, for one, seems like another major competitive advantage for AWS, which already has too many to count!