Listening to the Data – Four ways to tweak your Machine Learning models

Bias and variance, precision and recall: these are concepts that, after a few months (or maybe even just a couple of weeks) of crawling around in actual data, predictive models, and the study of where prediction and reality meet, begin to have an intuitive feel.  But it was nice to read recently a short piece that brings these concepts clearly into focus and frames them in terms of model behavior.  This is something I will keep handy to share where my own jabbering on the subject is likely to be less clear and certainly less concise.  The source of the article was (via re-post) the KDnuggets blog, which is an excellent resource.

There are, perhaps unsurprisingly, many good “nuggets” on the KDnuggets blog / web site. And this latest item does a good job of explaining what at some point becomes intuitive to people who work with machine learning models regularly.  Perhaps this is particularly relevant to modeling and mining “text” (the work I have been doing in Machine Learning), because it certainly is spot on there. But it is more a way of describing how the math models the real world, and how the data is reflected in the math, so I expect this view is likely helpful to anyone modeling data.
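For anyone who has not yet built that intuition, here is a minimal sketch of how precision and recall fall out of a set of predictions. It is not taken from the KDnuggets article; the function name and the eight made-up labels are my own assumptions, chosen only to make the arithmetic easy to follow.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one class from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)  # hits
    fp = sum(1 for t, p in pairs if p == positive and t != positive)  # false alarms
    fn = sum(1 for t, p in pairs if p != positive and t == positive)  # misses
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for eight documents: 1 = "relevant", 0 = "not relevant".
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the model is right about two of the three documents it flags (precision) but finds only two of the four relevant ones (recall), which is the kind of trade-off the article frames in terms of model behavior.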

The somewhat “click-bait” sounding title, “4 Reasons Your Machine Learning Model is Wrong”, is only modestly apologized for with the “(and How to Fix It)” suffix, and it makes me worry that fake-aggressive, pretend-demeaning discourse could be among the worst forms of carry-over from 2016 into 2017.

I will instead remember that genuinely aggressive, demeaning discourse is worse… and continue to appreciate the sharing that sites like this do for the larger community.

Happy New Year!



The new PISA! Still Reviewing and Reading…

So much progress, so much exemplary assessment. So much data.

This will take some number of weeks to process… and I will try to link to the best bits. So far the Economist treatment is looking pretty thorough.

And of course the PISA web site itself has a visualization tool… I think…

The interactive “problem solving” exercises, using “MicroDYN” systems and “finite-state automata” in particular, look really interesting.

The test is here;  more results and more start here.

My Technology Stack for new Web Apps

I have posted three or four times on the question of how best to create modern, scalable, flexible and robust web applications (here, here and here). I finally decided that Ruby was going to require too much ramp-up time and, when I read about Flask, settled on my tool set for the next quantum of time (five years?), to replace my joy working with Objective-C / iOS / Xcode / SQLite: Python 3 / MySQL (Aurora) / Flask (as the web framework) / PyCharm, NLTK, Pandas and one day scikit-learn and the rest of it (for now I will stick with RapidMiner and LightSIDE as my black box for text-based machine learning).


And one last thing:  Docker.  There are many, many reasons that this presents to me the best-of-breed toolbox of modern application frameworks and tools.  To more fully describe — at the “example” level and not at the technology / component level — why these tools, frameworks and yes, deployment choices are perfect for me, there could be no more perfect example than this:  I wanted to put together a simple web-based registration scheme: a public facing set of web pages — maybe just one or two — that do a “signup”.

In relatively few days (or perhaps a couple dozen hours) I was able to create from scratch an entire (simple) web app using Flask, Python 3 and MySQL that is now live here (and with a link on my home page). I went with out-of-the-box MySQL rather than Aurora, as I will need to learn more about Aurora, and that will be fun but not short.
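To give a feel for how small the core of a signup really is, here is a rough sketch of the "validate and store" step. This is not the code from my app: the table layout, the `register` helper and the loose email check are all hypothetical, and the standard-library `sqlite3` module stands in for MySQL so the example runs anywhere. In a Flask app, something like `register` would be called from the route that handles the form post.

```python
import re
import sqlite3

# Deliberately loose check (assumption): something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def create_schema(conn):
    """Create the (hypothetical) signups table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signups ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " email TEXT NOT NULL UNIQUE)"
    )


def register(conn, name, email):
    """Validate and store one signup; return (ok, message)."""
    name = name.strip()
    email = email.strip().lower()
    if not name:
        return False, "name is required"
    if not EMAIL_RE.match(email):
        return False, "email looks invalid"
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO signups (name, email) VALUES (?, ?)",
                (name, email),
            )
    except sqlite3.IntegrityError:
        # The UNIQUE constraint on email rejected a duplicate.
        return False, "email already registered"
    return True, "registered"


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
create_schema(conn)
print(register(conn, "Ada", "ada@example.com"))
print(register(conn, "Ada again", "ADA@example.com"))
print(register(conn, "Bob", "not-an-email"))
```

Letting the database enforce uniqueness, rather than checking first and inserting second, keeps the duplicate handling in one place; the parameterized query (the `?` placeholders) is the basic defense against SQL injection.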


Just as creating a quick-and-simple one-page web site using Bootstrap proved to me the value of that framework, this “demonstration app” has validated my ideas and met my initial goals.  I did have a couple of false starts with other technologies, but this one looks good. And like Bootstrap, you really only get the full benefit of it when working with experts. My lame web site reflects my one-day rush to get some pages up, and a pre-packaged template that was free (or cheap enough to be equivalent).  But if I wanted to do something serious with that web site, I would hire an expert.

Similarly, web security is awfully complex these days, and a side-benefit (or main benefit for some) of using Docker is built-in “isolation” that is a good starting point for enhanced security, and as such is a foundational component of this new technology stack.  But as with Bootstrap and Aurora, I am going to need to spend more time with Docker to understand it, and for this project my Russian friend Yury took care of Docker (and everything else) so that I could get the project done quickly. But I will return to all of this in 2017.  And with any luck these tools will all be even more “mainstream” then than they are today.  Aurora, for one, seems like another major competitive advantage for AWS, which already has too many to count!


Reuters shines some light…

A new piece from Renee Dudley and Reuters.   Some I know were disappointed by “yet another multiple choice test”, or with the continued dominance of “the number-two pencil”, while others were nevertheless satisfied with this somewhat new version of the test if only for a couple (handful?) of content-related changes they were happy to see.  Some less so.  Some were enraged.

Read “Crash Course: College Board faces rocky path after CEO pushes new vision for SAT“.

TensorFlow is released: Google Machine Learning for Everyone

Google posted information about TensorFlow, the open-source release of a key bunch of machine learning tools, on their Google research blog here.

Given the great piles of multi-dimensional tables (or arrays) of data that machine learning typically involves, and (at least for us primitive users) the tremendous shovel work involved in massaging and pushing around these giant piles of data files (sorting out the arcane naming schemes devised to help with this problem is almost a worse problem itself), the appellation of “Tensor Flow” for a tool to help with this is at first blush very promising. That is, rather than just a library of mathematical algorithm implementations, I am expecting something that can help make the machine learning work itself more manageable.

I suspect that just figuring out what this is will cost me a few days… but I have much to learn.



Advantages of Bootstrap: multi-language support add-on

I was very impressed with how quick and easy it was to create a web site with “twitter bootstrap.js”, but only today did a fellow grad student teach me another benefit:  support for multi-language web sites.  [I found tutorials at very useful, and the one below basically had my site up and running in less than its running time (it is well over an hour and designed for non-technical novices). The others from this teacher were on individual Bootstrap 3 features, rather than the one-stop-shop provided below.]

There are many ways to do multi-lingual web sites, of course.  I was really intrigued with / impressed with EasyLing, which seemed to have a pretty painless and “standards-based” way to approach the problem (but probably overkill for many uses).  And I discovered a WordPress multiple-language plugin (?) or service / API, but it was apparently discontinued over a year ago… still, it seemed there should be a better way.

And Wencheng Hu, an experienced translator and web developer, provided me the key solution element:  a complete Bootstrap 3 solution for language labels and names, etc., documented here at  complete with flag icons (in three sizes) for the 43 languages it supports!

How well this works for a “web app” or even a “blog” on an ongoing basis, I am not sure.  Probably a blog should just be done as separate blogs, for a variety of reasons, but swapping out labels and UI elements in an application is often useful (if, for a variety of reasons, painful).  My 8-language support for my “language learning” apps on iOS taught me a lot about the challenges of making an app work for 8 different audiences…
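The label-swapping idea itself is simple enough to sketch in a few lines. This is not the Bootstrap solution mentioned above, just an illustration of the general pattern on the server side; the `LABELS` table, the `label` helper and the sample strings are all made up for the example.

```python
# Hypothetical label table: one dictionary of UI strings per language code.
LABELS = {
    "en": {"signup": "Sign up", "email": "Email address", "submit": "Submit"},
    "es": {"signup": "Registrarse", "email": "Correo electrónico"},
}
DEFAULT_LANG = "en"


def label(key, lang):
    """Look up a UI label, falling back to the default language, then the key."""
    for code in (lang, DEFAULT_LANG):
        text = LABELS.get(code, {}).get(key)
        if text is not None:
            return text
    return key  # last resort: show the raw key rather than crash


print(label("signup", "es"))  # translated label
print(label("submit", "es"))  # missing in Spanish, falls back to English
print(label("signup", "fr"))  # unknown language, falls back to English
```

The fallback chain is where much of the "painful" part hides: with 8 languages, some label is always missing somewhere, and showing the default-language text is usually better than showing nothing.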

I am very interested to see the final details of my implementation for my new web site.  I will post the implementation particulars, as one-page bootstrap web sites are a very handy and useful tool for SMEs, and this neat trick certainly works quite well for a small number of languages. Further, popular and inexpensive hosting services support the necessary tools with ease and by default, by and large, so the follow up post should be quite short. And soon!

Status of Critical Thinking in the Workplace

Status of Critical Thinking in the Workplace: the Most Important Skill for Business Growth

This blog post by Person is a welcome gesture, both highlighting the importance of Critical Thinking to actual work and business and calling it to the attention of Higher Ed.

Tidbits:  Specifically, when it comes to skills like critical thinking, it is consistently rated by employers as being a skill of increasing importance, and yet a recent study showed 49% of employers rate their employees’ critical thinking skills as only average or below average.

The graphic also was interesting in its display of how 2 year grads compared with 4 year grads on the measure of “critical thinking skills”: 4 year college graduates were more likely to be rated “excellent” for critical thinking than 2 year grads (28% vs 4%), but for the “adequate” skill level, community college graduates fared better, 73% to 63%.  The graphic also showed the percent of 2 year grads rated as deficient, but I could not help but wonder if there was a bias in the reported scores for 2 year grads (e.g., graded on a curve), as this curve looked a bit strange.

This blog will feature more references to Critical Thinking going forward, as my research into better instruction and measurement for Critical Thinking and Problem Solving skills makes progress.