Big Data, at all sorts of levels in learning, reveals secrets we never imagined
we could discover. It reveals things to you, the user, searcher, buyer and
learner. It also reveals things about you to the seller, ad vendors, tech giants
and educational institutions. Big data is now big business, where megabytes
mean megabucks. Given that less than 2% of all information is now non-digital,
it is clear where the data mining will unearth its treasure: online. As we do
more online (searching, buying, selling, communicating, dating, banking,
socializing and learning), we create more and more data that provides fuel for
algorithms that improve with big numbers. The more you feed these algorithms,
the more useful they become.
Among the fascinating examples is Google’s success with big data in
their translation service, where a trillion-word data set provides the feed for
translations between over a dozen languages. Amazon’s recommendation engine
looks at what you bought, what you didn’t buy, how long you looked at things
and what books are bought together. This big data driven engine accounts for a
third of all Amazon sales. With Netflix, their recommendation engine accounts
for an astonishing three quarters of all new orders. Target, the US
retailer, knows (creepily) when someone is pregnant without the mother-to-be
telling them. This led to an irate father threatening legal action when his daughter
received a mail voucher for baby clothes. He returned a few days later,
sheepishly apologizing!
Why is Big Data such a big deal in learning?
Online learning, by definition, is data; it can also produce data. This is one
of the great advantages of being online: it is a two-way form of communication.
For many years data has been gathered and used in online learning. De facto
standards even emerged to make this data interoperable, namely SCORM and now
TinCan.
However, something new has happened: the awareness that the data produced by
online learning is much more powerful than we ever imagined. It can be gathered
and used to solve all sorts of difficult problems in learning, problems that
have plagued education and training: formative assessment, drop-out, course
improvement, productivity, cost reduction and so on.
Learning = Large data
So how relevant is big data to learning? We need to start with an
admission, that big data in learning is really just ‘Large data’. We’re not
dealing with the unimaginable amounts of relevant data that Google bring to
bear when you search or translate. The datasets we’re talking about come from
individual learners, courses, individual institutions and sometimes, but rarely,
from groups of institutions, national tests and examinations and, rarer still,
from international tests or large complexes of institutions.
Ten level taxonomy of data
Data can be harvested at 10 different levels:
1. Data on brain
We’ve seen the commercial launch of some primitive toys using brain sensors
(see my previous post) but we’ve yet to see brain and situational data really
hit the world of learning. Learning is wholly about changing the brain, so one
would expect brain research, at some point, to accelerate learning through
cheap, consumer brain- and body-based technology. South Korea is developing
software and hardware that may profoundly change the way we learn. With the
development of an ‘emotional sensor set’ that measures EEG, EKG and, in total,
7 kinds of biosignals, along with a situational sensor set that measures
temperature, acceleration, gyro and GPS, they want to literally read our brains
and bodies to accelerate learning. There are problems with this approach. It’s
not yet clear that the EEG and other brain data gathered by sensors measure
much more than ‘cognitive noise’ and general increases in attention or stress.
Nor is it clear how we causally relate these physiological states to learning,
other than through the simple reduction of stress. The measures are like simple
temperature gauges that go up and down. However, the promise is that a
combination of these variables does the job.
2. Data on learner
This is perhaps the most fruitful
type of data as it is the foundation for both learners and teachers to improve
the speed and efficacy of learning. At the simplest level one can have
conditional branches that take input from the learner and other data sources to
branch the course and provide routes and feedback to the learner (and teacher).
Beyond this, rule-sets and algorithms can be used to provide much more
sophisticated systems that present, screen-by-screen, the content of the
learning experience. There are many ways in which adaptive learning can be
executed. See this paper from Jim Thompson on Types on Adaptive Learning. In adaptive
learning systems, the software acts as a sort of satnav, in that it knows who
you are, what you know, what you don’t know, where you’re having difficulty and
a host of data about other, useful learner-specific variables. These variables
can be used by the software, learner or teacher to improve the learning
journey.
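The simplest level described above, conditional branching on learner data, can
be sketched in a few lines of Python. This is only an illustration under stated
assumptions: the `LearnerState` record, the `next_step` and `record_answer`
functions and the `max_failures` threshold are all hypothetical names invented
here, not taken from any particular adaptive learning product.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerState:
    """Hypothetical record of what the system knows about one learner."""
    mastered: set = field(default_factory=set)           # topics already known
    failed_attempts: dict = field(default_factory=dict)  # topic -> wrong answers so far


def next_step(state, course_topics, max_failures=2):
    """Pick the next screen using simple conditional branches."""
    for topic in course_topics:
        if topic in state.mastered:
            continue  # skip what the learner already knows
        if state.failed_attempts.get(topic, 0) >= max_failures:
            return ("remedial", topic)  # learner is struggling: branch to extra help
        return ("lesson", topic)  # first topic not yet mastered
    return ("complete", None)  # nothing left to learn in this course


def record_answer(state, topic, correct):
    """Update the learner data after each assessed answer."""
    if correct:
        state.mastered.add(topic)
    else:
        state.failed_attempts[topic] = state.failed_attempts.get(topic, 0) + 1
```

A teacher-facing view could read the same `LearnerState` to see where a learner
is stuck. Real adaptive systems replace these hand-written rules with far
richer models.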
3. Data on course components
One can look at specific components of the learning experience in a course,
such as video, use of forums, specific assessment items and so on. Peter Kese
of Viidea is an expert in the analytics from recorded lectures and his results
are fascinating. Gathering data from recorded lectures
improves lectures, as one can spot the points at which attention drops and
where key images, points and slides raise attention and keep the learners
engaged. When Andrew Ng, a co-founder of
Coursera, looked at the data from his ‘Machine Learning’ MOOC, he noticed that
around 2000 students had all given the same wrong answer – they had inverted
two algebraic equations. What was wrong, of course, was the question. This is a
simple example of an anomaly in a relatively small but complete data set that
can be used to improve a course. The next stage is to look for weaknesses in
the course in a more systematic way using algorithms designed to look
specifically for repeatedly failed test items. At this level we can pinpoint
learner disengagement, weak and even erroneous test items, leading to course
improvement. At a more sophisticated level, in a networked learning solution
where the learning experiences are presented to the learner based on algorithms,
screen-by-screen, items can be promoted or demoted within the network.
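A systematic search for suspect test items, of the kind suggested above, might
look like the following minimal sketch. It flags any item where a single wrong
answer accounts for a large share of all responses, which is the pattern in the
Andrew Ng example. The function name `flag_suspect_items`, the input layout and
the `min_attempts` and `share` thresholds are assumptions made for
illustration, not a description of how Coursera analyses its data.

```python
from collections import Counter, defaultdict


def flag_suspect_items(responses, answer_key, min_attempts=20, share=0.5):
    """Return items where one wrong answer dominates the responses.

    responses: iterable of (learner_id, item_id, answer) tuples.
    answer_key: dict mapping item_id -> the answer marked as correct.
    """
    answers_by_item = defaultdict(Counter)
    for _learner, item, answer in responses:
        answers_by_item[item][answer] += 1

    flagged = []
    for item, counts in answers_by_item.items():
        total = sum(counts.values())
        if total < min_attempts:
            continue  # too little data to judge this item
        wrong = {a: n for a, n in counts.items() if a != answer_key[item]}
        if not wrong:
            continue  # everybody got it right
        top_answer, top_count = max(wrong.items(), key=lambda kv: kv[1])
        if top_count / total >= share:
            flagged.append((item, top_answer, top_count / total))
    # Most suspicious items first
    return sorted(flagged, key=lambda row: row[2], reverse=True)
```

A flagged item is only a prompt for a human to look at the question. The data
cannot say whether the question, the teaching or the answer key is at fault.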
4. Data on course
A course can produce data that also shows weak spots. It can also show
dropout rates and perhaps indications of the cause of those dropouts. One can
gather pre-course data about the nature of the learners (age, gender,
ethnicity, geographical location, educational background, employment profile
and existing competences). During the course, one can gather data on time taken
on tasks, note taking, when learning takes place and for how long, as well as
physiological data such as eye tracking and signals from the brain. This
pre-course or initial diagnostic data can be used to determine what is
presented in the course. At a more sophisticated level, it can be used as the
course progresses, much as a satnav provides continuous data when you drive.
Course output data from summative assessment is also useful. However, the big
data approach pushes us towards not relying solely on this, as was so often the
case in the past. This is important for two reasons: the learner themselves,
knowing what they’ve achieved and not achieved, and the tutor, teacher or
trainer, who can use personal data to provide formative assessment,
interventions and advice based on such data. In one sales course for a major US
retailer, sales staff are given sales training in a 3D simulation which
delivers sales scenarios with a wide range of customers and customer needs.
Individual competences are taught, practiced and tracked, so that the actual
performance of the learners is measured within the simulation. Sales in the
stores where staff received the simulation training were 6% greater than in the
control group, where staff did traditional training. This is a good example of
fine-grained data being gathered.
5. Data on groups of courses
MOOCs, in particular, have raised the stakes in data-driven design and
delivery of courses. In truth, less data is gathered about learners than one
would imagine by the likes of Coursera and Udacity but MOOC mania has
accelerated the interest in data-driven reflection. The
University of Edinburgh have produced a data-heavy report on their six 2013
Coursera MOOCs taken by over 300,000 learners. The report has good data, tries
to separate out active learners from window shoppers and is not short on
surprises. It’s a rich resource and a follow up report is promised. This is in
the true spirit of Higher Education: open, transparent and looking to innovate
and improve. Rather than summarise the report, I’ve plucked out the Top Ten
surprises that point towards the future development of MOOCs. If I were looking
at MOOCs, I’d pore over this data carefully. That, combined with the useful
information on resources expended by the University, is an invaluable business
planning tool. Lori Breslow, Director of the MIT Teaching and Learning
Laboratory, has looked at how data generated by MOOC users can provide clues on
how to design the future of learning. She used massive data from “Circuits and
Electronics” (6.002x), edX’s MOOC launched in March 2012, which includes IP
addresses of 155,000 enrolled students, clickstream data on each of the 230
million interactions students had with the platform, scores on homework
assignments, labs and exams, 96,000 individual posts on a discussion forum and
an end-of-course survey to which over 7,000 students responded.
6. Data on institution
At this organizational level, it is vital that institutions gather
data that is much more fine-grained than just assessment scores and numbers of
students who leave. Many institutions, arguably most, have problems with
drop-outs, either across the institution or on specific courses. One way to tackle
this issue is to gather data to identify deep root causes, as well as spot
points at which interventions can be planned.
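As a rough illustration of spotting intervention points, the sketch below
counts how many learners were last seen in each week of a course and reports
the week after which most of them disappeared. The input format (a mapping from
learner to last active week), the function names and the rule that anyone
active in the final week counts as a completer are all assumptions made for
this example. It assumes a course of at least two weeks. It says nothing about
root causes, which need richer data.

```python
from collections import Counter


def dropout_by_week(last_active_week, course_weeks):
    """Count learners whose last recorded activity fell in each week.

    last_active_week: dict mapping learner_id -> last week with any activity.
    Learners active in the final week are treated as completers, not drop-outs.
    """
    counts = Counter(
        week for week in last_active_week.values() if week < course_weeks
    )
    return [(week, counts.get(week, 0)) for week in range(1, course_weeks)]


def worst_week(last_active_week, course_weeks):
    """Week after which most learners disappeared (ties go to the earliest week)."""
    table = dropout_by_week(last_active_week, course_weeks)
    return max(table, key=lambda row: row[1])[0]
```

An institution could run this per course and plan support, reminders or tutor
contact just before the week it returns.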
7. Data on groups of institutions
Perhaps we should be a bit realistic about the word ‘big’ in an educational
context, as it is unlikely that many, other than a few large, multinational
private companies, will have truly ‘big’ data. Skillsoft, Blackboard, Laureate
and others may be able to muster massive data sets, but a typical school,
college or university may not. The MOOC providers, such as Coursera and
Udacity, are another group that has the ability and reach to gather
significantly large amounts of data about learners.
8. Data on national
National data is gathered by governments and organisations to diagnose problems
and successes and reflect on whether policies are working. This is most often
input data, such as numbers of students applying for courses, who those
students are and so on. Then there’s output data, usually measured in terms of
exams and certification. This misses much in terms of actual improvement and
often leads to an obsession with testing that takes attention away from the
more useful data about the processes of learning and teaching.
9. Data on international
At the international level, the United Nations, UNESCO and others collect data,
such as PISA, PIAAC and OECD data, produced to compare countries’ performance.
It is not at all clear that this data is as reliable as its authors claim.
Within countries, politicians then take these statistics, exaggerate their
significance, cherry-pick the comparative countries (Singapore but not Finland)
and use them to design and implement policies that can potentially do great
harm. PISA, for example, has huge differences in demographics, socio-economic
ranges and linguistic diversity within the tested nations. The skews in the
data include the selection of one flagship city (Shanghai) to compare against
entire nations. Immigration skews include numbers of immigrants, the effect of
selective immigration, migration towards English-speaking nations and
first-generation language issues. There’s also the issue of taking longer to
read irregular languages and selectivity in the curriculum (see Leaning Tower
of PISA: 7 skews).
10. Data on web
Google, Amazon, Wikipedia, YouTube, Facebook and others gather huge
amounts of data from users of their services. This data is then used to improve
the service. Indeed, I have argued that Google search, Google translate, Wikipedia,
Amazon and other services now play an important pedagogic role in real
learning. There are lessons here for education in terms of the importance of
data. One should always be looking to gather data on online learning and Google
Analytics is a wonderful tool.
Conclusion
Big data is changing learning by providing a sound basis for learners, teachers,
managers and policy makers to improve their systems. Too much is hidden so more
and more open data is needed. Data must
be open. Data must be searchable. Data must also be governed and managed. There
is also the issue of visualization. Big data is about decision making by the
learner, teacher or at an organizational, national or international level and
must be understood through visualization. However, data is also being used to do
great harm. Big data in the hands of small minds can be dangerous (see When Big Data goes bad: 6 Epic fails).
Comments:
What is interesting is that LMSs track data about learners in the main to monitor progress, completions etc., whereas Google, Amazon etc. track data so they can adapt the service they provide to you and serve more relevant content. This includes ads of course but they do personalise your experience, e.g. Google provides you with personalised search results, based on history, location and even what your friends like or plus one. It seems to me that in learning we need to look at the way the web is creating adaptive and personalised experiences by tracking and analysing data. Lots of potential as you have set out.
Exactly Steve. The Big Data revolution is not the 'bums on seats' stuff, primitive inputs and outputs, but personalised data that improves the efficacy of learning.
Is part of the problem the fact that L&D don't understand the small data enough yet?
The data which Steve refers to is often manipulated to show our 'learners' are happy, know 45% more than they did at the start, or that they have completed the mandatory modules.
This idea of showing 'busyness' over business will keep L&D away from any top tables and a forelock-tugging provider of training whilst the business learns from itself and the big data it's creating.
Very true. There's Big Data, Wrong data & Bad data
Hello Donals and Steve, I am from Colombia and I'm reading and getting to know and understand what a MOOC is. I have taken a MOOC in Coursera and some others with other platforms but now I have realized that the so-called MOOC has a lot of theory. I would like you to help me please to understand whether the C and X MOOCs are classified within this taxonomy you discussed here or the X and C classification is different. Thanks a lot and, please help me to understand.