Broadgate Big Data Dictionary Part One

Posted on : 26-07-2012 | By : richard.gale | In : Data



We have been interested in Big Data concepts and technology for a while. There is a great deal of interest and discussion with our clients and associates on the subject of obtaining additional knowledge & value from data.

As with most emerging ideas there are different interpretations and meanings for some of the terms and technologies (including the thinking that ‘big data’ isn’t new at all but just a new name for existing methods and techniques).

With this in mind we thought it would be useful to put together a few terms and definitions that people have asked us about recently to help frame Big Data.

We would really like to get feedback, useful articles & different views on these to help build a more definitive library of Big Data resources. We’ve started with a few basic terms and next month will cover some of the firms developing solutions – this is just a starting point…

Analytics 

Big Data Analytics is the processing and searching through large volumes of unstructured and structured data to find hidden patterns and value. The results can be used to further scientific or commercial research, identify customer spending habits or find exceptions in financial, telemetric or risk data to indicate hidden issues or fraudulent activity.

Big Data Analytics is often carried out with software tools designed to sift and analyse large amounts of diverse information being produced at enormous velocity. Statistical tools for predictive analysis and data mining are used to search the data and to build and refine algorithms.
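As a minimal, hypothetical sketch of the sort of exception-finding this involves, the Python below flags transactions that sit far outside an account’s normal spending pattern. The figures, field names and two-standard-deviation threshold are our own assumptions for illustration, not a reference to any particular product.

```python
import statistics

# Hypothetical transaction amounts for one account (illustrative data only)
amounts = [25.0, 30.0, 27.5, 22.0, 28.0, 31.0, 26.0, 29.0, 24.0, 33.0, 950.0]

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)

# Flag anything more than two standard deviations from the mean as a
# candidate exception worth investigating (e.g. possible fraud)
exceptions = [a for a in amounts if abs(a - mean) > 2 * stdev]

print(f"mean={mean:.2f}, stdev={stdev:.2f}, exceptions={exceptions}")
```

Real analytics platforms apply far richer statistical and machine-learning models, but the principle is the same: sift a large volume of data for the values that do not fit the pattern.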

Big Data

The term Big Data describes amounts of data that are too big for conventional data management systems to handle. The volume, velocity and variety of data overwhelm databases and storage. The result is that either data is discarded or unable to be analysed and mined for value.

Gartner has coined the term ‘Extreme Information Processing’ to describe Big Data – we think that’s a pretty good term to describe the limits of capability of existing infrastructure.

There has always been Big Data in the sense that data volumes have always exceeded the ability of systems to process it. The tool sets to store, analyse and make sense of the data generally lag behind the quantity and diversity of information sources.

The actual amounts and types of data Big Data relates to are constantly being redefined, as database and hardware manufacturers keep moving those limits forward.

Several technologies have emerged to manage the Big Data challenge. Hadoop has become a favourite tool to store and manage the data, traditional database manufacturers have extended their products to deal with the volumes, variety and velocity and new database firms such as ParAccel, Sand & Vectorwise have emerged offering ultra-fast columnar data management systems. Some firms, such as Hadapt, have a hybrid solution utilising tools from both the relational and unstructured world with an intelligent query optimiser and loader which places data in the optimum storage engine.

Business Intelligence

The term Business Intelligence (BI) has been around for a long time, and the growth of data, and then Big Data, has focused more attention on this space. The essence of BI is to obtain value from data to help build business benefits. Big Data itself could be seen as BI – it is a set of applications, techniques and technologies that are applied to an entity’s data to help produce insight and value from it.

There are a multitude of products that help build Business Intelligence solutions – ranging from the humble Excel to sophisticated (aka expensive) solutions requiring complex and extensive infrastructure to support. In the last few years a number of user friendly tools such as Qlikview and Tableau have emerged allowing tech-savvy business people to exploit and re-cut their data without the need for input from the IT department.

Data Science

This is, perhaps, the most exciting area of Big Data. This is where the Big Value is extracted from the data. One Data Scientist partner of ours described it as follows: “Big Data is plumbing and Data Science is the value driver…”

Data Science is a mixture of scientific research techniques, advanced programming and statistical skills (or hacking), philosophical thinking (perhaps previously known as ‘thinking outside the box’) and business insight. Basically it is being able to think of new and different questions to ask, be technically able to translate them into a machine-based format, process the data, interpret the results and then ask new questions based on the results of the previous set…

A diagram by blogger Drew Conway describes some of the skills needed – which perhaps explains the shortage of skills in this space!

In addition, Pete Warden (creator of the Data Science Toolkit) and others have raised a note of caution about the term Data Science – “anything that needs science in the name is not a real science” – while confirming the need for a definition of what Data Scientists actually do.

Database

Databases can generally be divided into structured and unstructured.

Structured databases are the traditional relational database management systems such as Oracle, DB2 and SQL-Server. They are fantastic at organising large volumes of transactional and other data, can load and query the data at speed, and enforce integrity in the transactional process to ensure data quality.
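As a minimal sketch of that structured, transactional model, the Python below uses the built-in sqlite3 module to stand in for a full RDBMS such as Oracle or SQL-Server; the table and column names are assumptions purely for illustration.

```python
import sqlite3

# An in-memory database stands in for a full RDBMS; the schema is illustrative only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, account TEXT, amount REAL)")

# Load rows inside a transaction - either every row commits or none do
try:
    with conn:  # commits on success, rolls back automatically on an exception
        conn.executemany(
            "INSERT INTO trades (account, amount) VALUES (?, ?)",
            [("A1", 100.0), ("A2", 250.5), ("A1", -40.0)],
        )
except sqlite3.Error:
    print("Load failed - transaction rolled back, data left consistent")

# A structured query over the loaded data
for account, total in conn.execute("SELECT account, SUM(amount) FROM trades GROUP BY account"):
    print(account, total)
```

The fixed schema and the all-or-nothing transaction are exactly the strengths (and constraints) that distinguish this world from the unstructured technologies described next.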

Unstructured databases are technologies that can deal with any form of data thrown at them and then distribute it across a highly scalable platform. Hadoop is a good example of this type of product, and a number of firms now produce, package and support the open-source software.

Feedback Loops

Feedback loops are systems where the output from the system is fed back into it to adjust or improve the system’s processing. Feedback loops exist widely in nature and in engineering systems – think of an oven: heat is applied until a specific temperature, measured by a thermostat, is reached; the thermostat then tells the heating element to shut down, and when the thermostat senses it is getting too cold the element turns on again… and so on.
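That oven-and-thermostat loop can be sketched in a few lines of Python; the temperatures and heating/cooling rates below are made-up numbers, chosen only to show the feedback mechanism at work.

```python
TARGET = 180.0      # desired oven temperature (made-up units)
temperature = 20.0  # starting temperature
heating_on = True

for minute in range(30):
    # The system acts: the oven heats up or cools down...
    temperature += 8.0 if heating_on else -3.0

    # ...and the thermostat feeds the measurement back to control the heater
    if heating_on and temperature >= TARGET:
        heating_on = False   # hot enough - switch the element off
    elif not heating_on and temperature < TARGET - 5.0:
        heating_on = True    # cooled too far - switch it back on

    print(f"minute {minute:2d}: {temperature:5.1f}, heater {'on' if heating_on else 'off'}")
```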

Feedback loops are an essential part of extracting value from Big Data. Building in feedback and then incorporating Machine Learning methods starts to allow systems to become semi-autonomous, which lets the Data Scientists focus on new and more complex questions whilst testing and tweaking the feedback from their previous systems.

Hadoop

Hadoop is one of the key technologies to support the storage and processing of Big Data. Hadoop grew out of Google’s published work on its distributed Google File System and MapReduce processing model. It is an open source product under the Apache banner but, like Linux, is distributed by a number of commercial vendors that add support, consultancy and advice on top of the product.

Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
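The classic illustration of the map/reduce paradigm is a word count. The sketch below is plain Python in the style of a Hadoop Streaming job (where the mapper and reducer are stand-alone scripts fed via standard input); it shows the programming model only and is not tied to any particular Hadoop distribution.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in the input fragment."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce step: sum the counts for each word (input must be sorted by key)."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Run locally: the sort below simulates the shuffle/sort phase that Hadoop
    # performs between the map and reduce steps across the nodes of the cluster.
    mapped = sorted(mapper(sys.stdin), key=lambda kv: kv[0])
    for word, total in reducer(mapped):
        print(word, total)
```

On a real cluster each node would run the map step over its local fragment of the data, and the framework would sort and route the intermediate pairs to the reducers – the same division of labour described in the paragraph above.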

So Hadoop could almost be seen as a (big) bucket into which you can throw any form and quantity of data: it will organise the data, know where it resides, and be able to retrieve and process it. It also accepts that there may be holes in the bucket and can use additional resources to patch itself up – all in all a very clever bucket!

Hadoop runs on a scheduling basis: when a question is asked it breaks up the query, sends the pieces out to different parts of the distributed network in parallel, and then waits for and collates the answers.

 

We will continue this theme next month and then start discussing some of the technology organisations involved in more detail, covering topics such as Hive, Machine Learning, MapReduce, NoSQL and Pig.

 

Technology process and control frameworks – Time to Modernise

Posted on : 26-07-2012 | By : john.vincent | In : General News



It seems that the cuts are never ending. Only last week we heard of more reductions in staff headcount at a number of financial institutions as organisations continue to react to an erosion of business revenues and consequently further pressure on costs. The absolute number of city jobs lost is difficult to get (even with a reasonable tolerance margin), but regardless, it’s a lot.

Of course, removing people isn’t the only pressure. Infrastructure costs continue to be re-evaluated. Do we really need what would have been a previously routine upgrade to that front office trading system? What about the global finance system that we’ve always desired? New data centre anyone?…Maybe not this decade.

But whilst we (arguably) “return to reality” in the world of technology infrastructure and application services, what about the realignment of human capital? Despite a few well publicised missives during the cuts in technology resources, the reductions have gone largely unnoticed (unless you are unfortunate enough to be on the receiving end). But, like all of financial services, it does have the feel of a “numbers game”. Times are good, then Buy IT…times are bad, Sell IT.

Of course, there has been significant delayering of technology organisations, a removal of non-critical or discretionary/change functions and a focus on only what is designated mandatory (i.e. regulatory change). What next? We’ve touched previously on areas such as cloud service delivery and outsourcing, so let’s leave that for now. No, in order to both address the inevitable future efficiency demands, and also to build a platform for growth, technology organisations need to revisit the fundamental inner workings of their DNA.

Over the years we have all been schooled in the need for good control frameworks and processes, such as ITIL, COBIT and CMMi. The table below lists some of these and the historical timeline.

However, technology organisations can often go way too far down the path of adopting such processes, creating over-complication, reduced clarity and unwarranted resource overhead (and therefore cost). In March of last year we wrote an article called A Framework for Success in which we explored the need for a Quality Management Framework rather than an all-encompassing and embedded methodology.

The problem is that in some ways a “cottage industry” has developed around the whole control framework piece. In particular, some organisations and individuals have almost turned frameworks such as ITIL into a religion. To be clear, we do support ITIL, and Service Management will play a big part as the delivery of technology services shifts to multiple execution venues in the coming years (see ITSM and the Cloud).

However, we strongly believe that over the last 10-15 years what we have seen is an over-complicated adoption of some of these methodologies, leaving their original purpose and value to the organisation way behind.

It is fairly straightforward…the main issue is that there is a direct correlation between the level of process adoption/maturity and the size of the organisation needed to support it. This has spawned a new breed of roles, responsibilities and job titles such as Environment Manager, Release Manager, Service Introduction Manager, Service Protection Manager etc… By doing this, an operating model of increased complexity and interdependence is created which, if left to grow organically, can become unwieldy, cumbersome and create processing inefficiencies. If you have gone down this path, ask yourself: is the value to the organisation clear in terms of business service, accountability and the number of FTEs supporting it?

As we have stated in our previous articles, organisations should apply techniques such as LEAN (or to be honest, simple practicality) to streamline and remove wastage in the implementation of these control frameworks, be that process, people or technology.  This is a tough job…particularly given the cultural and emotional issues of dismantling what is often somewhere between a “favourite son” or a “safety net”.

However, if implemented properly, pragmatically and with a degree of realism, it will not only drive short term efficiencies but also provide a much needed alignment for future service delivery when the upturn in demand returns.

 

The new Technology Pioneers

Posted on : 25-07-2012 | By : john.vincent | In : Cloud, Finance



The delivery of technology services is going through a massive change. This is driven by a number of factors, such as the commoditisation of infrastructure and applications, cloud computing, the rise of the consumer, social media, data analytics…the list goes on.

One thing is becoming clear. Technology innovation is on the up...business leaders are looking to the new breed of entrepreneurs/companies and seeing competitive advantage, investment opportunity and, in some cases (such as in financial services), a threat to their traditional business. So what about the people behind this new revolution?

We’ve been espousing innovation for many years (see our previous article about structuring it in the enterprise). Over several decades companies have adopted new technology to power business development, both on the front office application side and in the underlying infrastructure (bringing with it new, and often lucrative, careers on the client and supply side).

In retrospect though, how many of the projects that client-side technology organisations worked on would we post into the box labelled “innovation”? Sure, there were significant change projects, application upgrades, operating model adjustments, consolidation, efficiency drives, sourcing etc… which were high impact, but innovative? Bar a few, we would categorise them as more “evolutionary”.

Could more have been achieved…probably not. Remember that for years technology organisations have been both an enabler and a cost centre. In the organisation’s value chain they have traditionally not connected closely enough with the business revenue side, apart from via “Relationship Managers” who help to translate business requirements into a portfolio of change. It needs to be a discussion on a more equal footing.

Back to the new breed. We have recently been attending various “meet ups” around the city, covering topics from Big Data to new start-ups seeking VC support. What struck us were two things:

  1. Firstly, the number of talented individuals (mainly young) with great ideas and a lack of fear. They tend to have a much shorter timeline for achieving a result and will try out innovative ideas in a rapid/agile way…if it doesn’t work they “fail fast” and move on.
  2. Secondly, we met a number of entrepreneurs who had spent very little time in corporate life before embarking on their new venture. Some worked at the most prestigious consultancies such as McKinsey, others within coveted investment banks…many were educated at the top universities. However, all had decided they could achieve more themselves.

So what does this mean for those of us struggling to do more with less within corporate IT? Well, of course, all new technologies will need to be implemented and supported, and organisations are faced with the practical application of these to solving specific business challenges, rather than theoretical solutions.

But the most important thing is to have an awareness that technology services, and the people either starting out or on the peripheries, are changing in both skills and aspiration. Whilst organisations are cutting resources and focusing on either regulation or keeping the lights on, there is a danger that future skills requirements and evolving technology will pass them by. This is a huge risk to incumbent service providers.

There is significant investment underway all over the city in new technology, not just in the Silicon Roundabout area but throughout, with new tech companies, innovation “hangouts” and a myriad of collaborative events. Indeed, only this week it was announced that the Olympic Press and Broadcast Centres could be turned into a technology hub, generating thousands of jobs in the IT industry and investing £340m into East London.

Technology is cool again…whilst the media bashes the bankers for their bonuses and various indiscretions, how much noise was there about a potential $100m for Marissa Mayer taking the helm at Yahoo, or the $1bn paid by VMware for Nicira, a company that had operated largely in stealth mode and only really shipped product last year (very smart and potentially disruptive network technology), or the numerous technology executives dominating the billionaire lists?

Where the technology empowerment and business convergence will lead us is still hazy, but it’s exciting times for those at the frontier.