Thursday 20 March 2014

Budget 2014: Data institute unveiled but digital largely overlooked

Chancellor George Osborne announced yesterday the creation of a national institute for big data, set to be named after computer pioneer and mathematician Alan Turing.

In his 2014 Budget statement, Osborne pledged £42 million over five years for the new Alan Turing Institute.

He said that it would undertake new research in methods of collecting, organising and analysing large sets of data.

Documents released alongside the Budget statement said: “Big Data analysis can allow businesses to enhance their manufacturing processes, target their marketing better, and provide more efficient services.”

A tender to run the institute will be released later this year, and a spokesperson for the Treasury said that it could either be housed in a new facility or at an existing university.

Funding will be provided by the Department for Business, Innovation and Skills with its chief reporting to science minister David Willetts.

During World War II, Turing worked at the Government Code and Cypher School at Bletchley Park – the forerunner of GCHQ – devising the techniques which cracked the German Enigma code.
He killed himself in 1954, two years after being convicted of homosexual acts – illegal at the time.
Last year he was granted a Royal Pardon.

Duncan Higgins of Virgin Media Business said he was concerned at the lack of other references to digital technology in yesterday’s Budget announcement.

He said: “To use Mr Osborne’s words, if we want to ‘outsmart the rest of the world’, we need to be more switched on to the power of digital.

“And given that Britain’s digital economy will be 10% of GDP by 2016, it’s surprising that there was just one mention of the word ‘technology’ and no mention at all of ‘digital’ in the Chancellor’s Budget.”

Monday 10 March 2014

The Beginning of Forensic Science for me

I am currently recruiting data scientists, computer scientists, statisticians, biostatisticians, physicists, computational social scientists, economists, engineers, operations researchers and insight analysts across the UK, EMEA, Asia Pacific and the Americas.

So, for my own career development, I will be doing a new online course in my spare time on Forensic Science (Level 3).

The course will be covering four main areas:

  • Introduction to Forensics
  • Criminology
  • Human Anatomy and Physiology
  • The Future of Forensics

It’s going to take me a bit of time but I think it will be a very interesting challenge. 

Keep an eye on this page for updates as I work through it, or follow me on Twitter - @adam_rab

A plain English guide to how natural language processing will transform computing

Buzz phrases such as “artificial intelligence,” “machine learning” and “natural language processing” are becoming increasingly commonplace within the tech industry. There is a lot of ambiguity around these phrases, so I’ll explain the substance behind the technologies and why I believe they’re transforming the way we live, work and play.

When I graduated from the University of Cambridge in 2007, I left with a compelling sense that the technology I’d been working with for the last five years had the potential to change the world. I’d recently completed a Ph.D. on the application of a new set of tools and techniques from the emerging field of Machine Learning (ML) to a range of tasks involving human languages — a field known as Natural Language Processing (NLP). If this sounds confusing, I’m not surprised! Many of the concepts are inherently complex. However, to try to make things clearer: ML is about building software capable of learning how to perform tasks that are too complex to be solved via traditional programming techniques. For example, during my research I built programs that were able to recognize topics in news text, grade essays, and filter spam email. When the tasks are language focused, we call it NLP.
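To make the spam-filtering example a little more concrete, here is a minimal sketch of the idea in Python using scikit-learn. It is purely illustrative, not the system built during that research, and the example messages and labels are invented.

```python
# Minimal illustrative sketch of learning a spam filter from labelled examples
# rather than hand-written rules. The messages and labels below are invented
# purely for illustration; this is not the original research code.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set: 1 = spam, 0 = not spam
messages = [
    "Win a free prize now, click here",
    "Cheap loans approved instantly",
    "Meeting moved to 3pm, agenda attached",
    "Can you review my draft before Friday?",
]
labels = [1, 1, 0, 0]

# Convert raw text into word-count features, then fit a Naive Bayes classifier.
# The "expertise" of the resulting filter comes from the examples, not the coder.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

# The trained model generalises to messages it has never seen
print(model.predict(["Claim your free prize today"]))    # likely [1] (spam)
print(model.predict(["Agenda for Friday's meeting"]))    # likely [0] (not spam)
```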

This represents a fundamental shift in the way software engineers build complex systems. Historically, coding has been about distilling the expert knowledge of the programmer into a series of logical structures that cause the system to respond in predictable ways. For instance, accounting systems follow rules, encoded by software engineers, that automate the process of recording and managing accounts. However, many of the tasks we come up against in our information-saturated digital world require a level of sophistication that can’t be captured in a series of human-engineered logical rules. For instance, if I’m building a system to translate a sequence of text from one language into another, there’s no manageable set of rules I can encode that will solve that problem. However, if I create a framework that allows the software to learn from examples of previously translated sequences to make new translations, then the problem can be solved, at least in principle. In other words, the system distills the expertise it needs to complete the task from the data upon which it’s trained, rather than directly from the programmer, whose authorial role has now fundamentally changed. Evidently this new way of creating complex systems requires a lot of data, but happily the amount of available electronic data for training ML systems is growing at an irrepressible rate.
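As a toy illustration of that contrast (my own sketch, not anyone’s production system), the fragment below “learns” a tiny English-to-French word lexicon purely by counting co-occurrences in a handful of invented parallel sentence pairs. Real machine translation is vastly more sophisticated, but the principle is the same: the knowledge comes from the example data rather than from hand-coded rules.

```python
# Toy sketch: instead of hand-writing translation rules, count which French
# word co-occurs most often with each English word in example sentence pairs.
# The sentence pairs are invented for illustration; real systems use far
# richer models and far more data.
from collections import Counter, defaultdict

parallel = [
    ("black cat", "chat noir"),
    ("black dog", "chien noir"),
    ("white cat", "chat blanc"),
]

cooccur = defaultdict(Counter)
for english, french in parallel:
    for e in english.split():
        for f in french.split():
            cooccur[e][f] += 1

# For each English word, pick the French word it co-occurred with most often.
lexicon = {e: counts.most_common(1)[0][0] for e, counts in cooccur.items()}

print(lexicon["cat"])    # 'chat'  -- learned from the data, never coded by hand
print(lexicon["black"])  # 'noir'
```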

It may be clear that such systems have potentially profound philosophical implications for their authors. They cause us to question commonly held definitions of understanding, intelligence and even free will. To take a simple example from my own experience, when building an ML system to grade essays, does it matter that the machine doesn’t “understand” the content of the essay in the same way a human being would? If you can demonstrate mathematically that the system is as reliable as an expert examiner, does it matter that the method by which it determines grades is based on subtle interactions between thousands of underlying “features”, without an overseeing sentient mind? What role does sentience actually play in the tasks most of us carry out on a daily basis anyway?

Whatever the philosophical implications, software built around these new technologies is changing our lives, even if we don’t yet know it, and I believe this transformation heralds good news for us as consumers and citizens. These new systems will enable our personal devices to better adapt and anticipate what we need, right down to an individual level. The days of the generic tech experience are numbered. People will expect something completely tailored to them, from text-prediction algorithms that understand the context of what you’re writing to concierge systems that learn to preempt what you want to find, say or do next. In 20 years I believe we’ll be surrounded by invisible systems that mine a wealth of data about every aspect of our lives, constantly learning, adapting and enhancing our decision making, health and general wellbeing.

There are downsides, of course. Data privacy and protection must be taken extremely seriously, and people are understandably wary of computers that can “think” and learn like humans. If algorithms start taking on the roles of teachers, personal assistants and others, does this distance us from each other? I believe we need to wrestle with these questions honestly and openly, and that the debate will ultimately lead us to a better understanding of what it means to be human in a technological world. Academic-sounding ideas like ML and NLP have clear implications for the tech industry and the way we live that extend far beyond our universities and research labs.