GDPR – The Countdown Conundrum

Posted on : 30-01-2018 | By : Tom Loxley | In : Cloud, compliance, Cyber Security, data security, Finance, GDPR, General News, Uncategorized


Crunch time is just around the corner and yet businesses are not prepared, but why?

General Data Protection Regulation (GDPR) – “a new set of rules set out by the European Union which aims to simplify data protection laws and provide citizens across all member states with more control over their personal data”

It is estimated that just under half of businesses are unaware of incoming data protection laws that they will be subject to in just four months’ time, or how the new legislation affects information security.

Following a government survey, the lack of awareness about the upcoming introduction of GDPR has led the UK government to issue a warning to the public over businesses’ shortfall in preparation for the change. According to the Digital, Culture, Media and Sport Secretary, Matt Hancock:

“These figures show many organisations still need to act to make sure the personal data they hold is secure and they are prepared for our Data Protection Bill”

GDPR comes into force on 25 May 2018 and potentially huge fines face those who are found to misuse, exploit, lose or otherwise mishandle personal data – as much as four percent of company turnover. Organisations could also face penalties if they’re hacked and attempt to hide what happened from customers.

There is also a very real and emerging risk of a huge loss of business. Third-party compliance and assurance is now common practice, and your clients will want to know that you are compliant with GDPR as part of doing business with you.

Yet regardless of the reputational risk, potential loss of business and fines that come with being non-GDPR compliant, the government survey has found that many organisations aren’t prepared for – or aren’t even aware of – the incoming legislation and how it will impact their information and data security strategy.

Not surprisingly, considering the ever-changing landscape of regulatory requirements they have had to adapt to, the finance and insurance sectors are said to have the highest awareness of the incoming security legislation. Conversely, only one in four businesses in the construction sector is said to be aware of GDPR, with awareness in manufacturing also poor. According to the report, just under half of businesses overall – and a third of charities – have made changes to their cybersecurity policies as a result of GDPR.

If your organisation is one of those unsure of its GDPR compliance strategy, areas to consider may include:

  • Creating new or improving existing cybersecurity procedures
  • Hiring new staff (or creating new roles and responsibilities for your additional staff)
  • Making concerted efforts to update security software
  • Mapping your current data state, what you hold, where it’s held and how it’s stored

In terms of getting help, this article is a great place to start: What is GDPR? Everything you need to know about the new general data protection regulations

However, if you’re worried your organisation is behind the curve, there is still time to ensure that you do everything needed to be GDPR compliant. There is an abundance of free guidance available from the National Cyber Security Centre and the ICO on how to ensure your corporate cybersecurity policy is correct and up to date.

The ICO suggests that, rather than being fearful of GDPR, organisations should embrace it as a chance to improve how they do business. The Information Commissioner, Elizabeth Denham, stated:

“The GDPR offers a real opportunity to present themselves on the basis of how they respect the privacy of individuals, and over time this can play more of a role in consumer choice. Enhanced customer trust and more competitive advantage are just two of the benefits of getting it right”

If you require pragmatic advice on the implementation of GDPR data security and management, please feel free to contact us for a chat. We have assessed and guided a number of our clients through the maze of regulations, including GDPR. Please contact Thomas.Loxley@broadgateconsultants.com in the first instance.

 

The Ultimate Way to Move Beyond Trading Latency?

Posted on : 30-03-2016 | By : richard.gale | In : Finance, Innovation


A number of power surges and outages have been experienced in the East Grinstead area of the UK in recent months. The utility companies involved have traced the cause to one of three high-capacity feeds to a global investment bank’s data centre facility.

The profits created by the same bank’s London-based proprietary trading group have increased tenfold over the same period.

This bank employs 1% of the world’s best post-doctoral theoretical physics graduates to help build its black box trading systems.

Could there be a connection? Wild & unconfirmed rumours have been circulating within the firm of a major breakthrough in removing the problem of latency – the physical limit on the time it takes a signal to travel down a wire, ultimately governed by the speed of light.

For years traders have been trying to reduce execution latency to provide competitive advantage in a highly competitive, fast-moving environment. The focus has moved from seconds to milli- and now microsecond savings.

Many financial services & technology organisations have attempted to solve this problem by reducing data hops and routing, going as far as placing their hardware physically close to the source of data (such as in an exchange’s data centre) to minimise latency, but no one has solved the issue – yet.

It sounds like this bank may have gone one step further. It is known that at the boundary of the speed of light, physics as we know it changes (quantum mechanics is an example, where the time/space continuum becomes ‘fuzzy’). Conventional physics states that travelling faster than the speed of light – and so seeing into the future – would require infinite energy and is therefore not possible.

Investigation with a number of insiders at the firm has resulted in an amazing and almost unbelievable insight. They have managed to build a device which ‘hovers’ over the present and immediate future – little detail is known about it but it is understood to be based on the previously unproven ‘Alcubierre drive’ principle. This allows the trading system to predict (in reality observe) the next direction in the market providing invaluable trading advantage.

The product is still in test mode, as trading ahead of data the system has already traded against is producing outages: it then tries to correct the error in the future data, which again changes the data, ad infinitum… The prediction model only allows a small glimpse into the immediate future, which also limits the window of opportunity for trading.

The power requirements for the equipment are so large that it has had to be moved to the data centre environment, where consumption can be more easily hidden (or not, as the power outages showed).

If the bank does really crack this problem then it will have the ultimate trading advantage – the ability to see into the future and trade with ‘inside’ knowledge legally. Unless another bank is doing something similar in the ‘trading arms race’, the bank will quickly become dominant and the other banks may go out of business.

The US Congress has apparently discovered some details of this mechanism and is requesting that the bank disclose details of the project. The bank is understandably reluctant to do so, having spent over $80m developing the system, and wants to make some return on its investment.

If this system goes into true production mode surely it cannot be long before Financial Regulators outlaw the tool as it will both distort and ultimately destroy the markets.

The project even has a code-name…. Project “Prima Aprilis”

No one from the company was available to comment on the accuracy of the claims.

Broadgate’s Crystal Ball – Our predictions for 2016

Posted on : 18-12-2015 | By : richard.gale | In : General News


During the past few weeks, 2016 trend predictions have flooded our news feeds. After compiling and combining them with our view on the approaching changes, here’s Broadgate’s view on IT in 2016.


Adaptive Security Architecture

In the context of companies’ growing awareness of the importance of security and the need to build it into all business processes, end-to-end, Gartner predicts that the near future will bring more tools for going on the offensive, leveraging predictive modelling to, for example, allow apps to protect themselves (!). Therefore, go on the offensive and build security into every project, product, process and service, instead of treating it as an add-on or an afterthought, or having separate “security” projects.

 

IoT and Big Data Science

IoT will gradually overtake everything and generate data-rich insights about us. Gartner notes that the rapid growth in the number of sensors embedded in various technologies of both personal and professional use will lead to the generation of tons of intelligence on our daily patterns. The more ‘things’ and areas of our lives IoT takes over, the more data is going to be collected. According to Gartner, by 2020 the number of devices connected to the Internet is expected to reach 25 billion. As each year moves us closer to the reality of IoT-scale big data and even bigger insights, it will be challenging to find efficient ways of digging through, and making sense of, the constant streams of data being generated.

As we stated this time last year, talking about the ‘future’ of 2015 –  Loading large amounts of disparate information into a central store is all well and good but it is asking the right questions of it and understanding the outputs is what it’s all about. If you don’t think about what you need the information for then it will not provide value or insight to your business. We welcome the change in thinking from Big Data to Data Science.

 

Connected Devices

Our bodies are going to be increasingly connected to the Internet through smart devices within the next couple of years. This is reality, not sci-fi; those who claim that wearables will struggle to find their place in everyday life in 2016 should familiarise themselves with the outcomes of Gartner’s October Symposium/ITxpo. It is predicted that in two years, 2 million employees, primarily those engaged in physically demanding or dangerous work, will be required to wear health & fitness tracking devices as a condition of employment (Gartner). According to a different source, in nine years 70% of us are going to use wearables (IDC).

 

The Hybrid Cloud

Following our 2015 prediction of cloud becoming the default coming true, going into 2016 the integration of on-premises infrastructure and the public cloud is becoming an operating standard; demand for the hybrid cloud is growing at a rate of 27% (MarketsandMarkets). Google’s hire of Diane Greene, co-founder of VMware, to head up Google Cloud shows Google’s commitment to offering services to enterprise cloud customers. A hybrid Kubernetes scheme is said to be part of the deal (Knorr, Infoworld), which will likely have a significant impact on the growth of the hybrid cloud in 2016.

 

The outsourcing of personal data

Barely a week goes by without another retailer or bank losing customer information by getting hacked. This is becoming a serious and expensive problem for firms; each one is having to put complex defence mechanisms in place to protect itself.

We think the outsourcing of responsibility (and sensitive data) to specialist firms will be a growing trend in 2016. These firms can have high levels of security controls and will have the processing ability to support a large number of clients.

Obviously, one potential issue is that these organisations will themselves be targeted by criminals, and when one does get breached the impact will be much greater…

 

We are truly excited to see what 2016 will surprise us with!

Broadgate Predictions for 2015

Posted on : 29-12-2014 | By : richard.gale | In : Innovation


We’ve had a number of lively discussions in the office and here are our condensed predictions for the coming year.  Most of our clients work with the financial services sector so we have focused on predictions in these areas.  It would be good to know your thoughts on these and your own predictions.

 

Cloud becomes the default

There has been widespread resistance to the cloud in the FS world. We’ve been promoting the advantages of demand based or utility computing for years and in 2014 there seemed to be acceptance that cloud (whether external applications such as SalesForce or on demand platforms such as Azure) can provide advantages over traditional ‘build and deploy’ set-ups. Our prediction is that cloud will become the ‘norm’ for FS companies in 2015 and building in-house will become the exception and then mostly for integration.

‘Intrapreneur’ becomes widely used (again)

We first came across the term intrapreneur in the late ’80s in the Economist magazine. It highlighted some forward-thinking organisations’ attempts to change culture: to foster, employ and grow internal entrepreneurs – people who think differently and have a start-up mentality within large firms – to make those firms more dynamic and fast moving. The term came back into fashion in the tech boom of the late ’90s, mainly through large consulting firms desperate to hold on to their young, smart workforce that was being snapped up by Silicon Valley. We have seen the resurgence of that movement, with banks competing with tech for the top talent and the consultancies trying to find enough people to fulfil their client projects.

Bitcoins or similar become mainstream

Crypto-currencies are fascinating. Their emergence in the last few years has only really touched the periphery of finance, starting as an academic exercise, being used by underground and cyber-criminals, and then adopted by tech-savvy consumers and firms. We think there is a chance a form of electronic currency may become more widely used in the coming year. There may be a trigger event – such as rapid inflation combined with currency controls in Russia – or a significant payment firm, such as MasterCard or PayPal, starting to accept it.

Bitcoins or similar gets hacked so causing massive volatility

This is almost inevitable. The algorithms and technology mean that Bitcoins will be hacked at some point. This will cause massive volatility, loss of confidence and then their demise, but a stronger currency will emerge. The reason it is inevitable is that the tech used to create Bitcoins relies on the speed of computer hardware to slow their creation. If someone works around this or utilises a yet undeveloped approach such as quantum computing then all bets are off. Also, perhaps more likely, someone will discover a flaw or bug in the creation process, short-cut the process or just up the numbers in their account and become (virtually) very rich very quickly.

Mobile payments, via a tech company, become mainstream

This will be one of the strongest growth areas in 2015. Apple, Google, PayPal, Amazon, the card companies and most of the global banks are desperate to get a bit of the action. Whoever gets it right – with trusted, easy-to-use, great products – will make a huge amount of money, tie consumers to their brand and also know a heck of a lot more about them and their spending habits. Payments will only be the start; banking accounts and lifestyle finance will follow. This one product could transform technology companies (as they are the ones most likely to succeed) beyond recognition and make existing valuations seem minuscule compared to their future worth.

Mobile payments get hacked

Almost as inevitable as Bitcoins getting hacked. Who knows when or how, but it will happen – although the impact will not be as great as it will be on the early crypto-currencies.

Firms wake up to the value of Data Science over Big Data

Like cloud many firms have been talking up the advantages of big data in the last couple of years. We still see situations where people are missing the point. Loading large amounts of disparate information into a central store is all well and good but it is asking the right questions of it and understanding the outputs is what it’s all about. If you don’t think about what you need the information for then it will not provide value or insight to your business. We welcome the change in thinking from Big Data to Data Science.

The monetisation of an individual’s personal data results in a multi-billion dollar valuation for an unknown start-up

A long sentence… but the value of people’s data is high and the price firms currently pay for it is low to nothing. If someone can start to monetise that data it will transform the information industry. There are companies and research projects out there working on approaches and products. One or more will emerge in 2015 to be bought by one of the existing tech players or become that multi-billion dollar firm. They will have the converse effect on Facebook, Google etc., which rely on that free information to power their advertising engines.

Cyber Insurance becomes mandatory for firms holding personal data (OK maybe 2016)

It wouldn’t be too far fetched to assume that all financial services firms are currently compromised, either internally or externally. Most firms have encountered either direct financial or indirect losses in the last few years. Cyber or Internet security protection measures now form part of most companies’ annual reports. We think, in addition to the physical, virtual and procedural protection there will be a huge growth in Cyber-Insurance protection and it may well become mandatory in some jurisdictions especially with personal data protection. Insurance companies will make sure there are levels of protection in place before they insure so forcing companies to improve their security further.

Regulation continues to absorb the majority of budgets….

No change then.

We think 2015 is going to be another exciting year in technology and financial services and are really looking forward to it!

 

Highlights of 2014 and some Predictions for 2015 in Financial Technology

Posted on : 22-12-2014 | By : richard.gale | In : Innovation


A number of emerging technology trends have impacted financial services in 2014. Some of these will continue to grow and enjoy wider adoption through 2015 whilst additional new concepts and products will also appear.

Financial Services embrace the Start-up community

What has been apparent, in London at least, is the increasing connection between tech and FS. We have been pursuing this for a number of years by introducing great start-up products and people to our clients, and the growing influence of TechMeetups, Level39 etc. within the financial sector follows this trend. We have also seen some interesting innovation with seemingly legacy technology – our old friend Lubo from L3C offers mainframe ‘on demand’ and cut-price, secure Oracle databases on an IBM S3 in the cloud! Innovation and digital departments are now the norm in most firms, staffed with clever, creative people encouraging often slow-moving, cumbersome organisations to think and (sometimes) act differently and embrace different ways of thinking. Will FS fall out of love with tech in 2015? We don’t think so. There will be a few bumps along the way but the potential, upside and energy of start-ups will start to move deeper into large organisations.

Cloud Adoption

FS firms are finally facing up to the cloud. Over the last five years we have bored too many people within financial services talking about the advantages of the cloud. Our question ‘why have you just built a £200m datacentre when you are a bank, not an IT company?’ was met with many answers, but two themes were ‘Security’ and ‘We are an IT company’…. Finally, driven by user empowerment (see our previous article on ‘user frustration vs. empowerment’), banks and other financial organisations are ’embracing’ the cloud, mainly with SaaS products and IaaS using private and public clouds. The march to the cloud will accelerate over the coming years. Looking back from 2020 we will see massively different IT organisations within banks. The vast majority of infrastructure will be elsewhere, development will take place by the business users and the ‘IT department’ will be a combination of rocket-scientist data gurus and procurement experts managing and tuning contracts with vendors and partners.

Mobile Payments

Mobile payments have been one of the most discussed subjects of the past year. Not only do mobile payments enable customers to pay without getting their wallets out, but using a phone or wearable will be the norm in the future. With new entrants coming online every day, offering mobile payment solutions that are faster and cheaper than competitors’ is on every bank’s agenda. Labelled ‘disruptors’ due to the disruptive impact they are having on businesses within the financial services industry (in particular banks), many of these new entrants are either large non-financial brands with a big customer base or start-up companies with fresh new solutions to existing issues.

One of the biggest non-financial companies to enter the payments sector in 2014 was Apple. Some experts believe that Apple Pay has the power to disrupt the entire sector. Although Apple Pay has 500 banks signed up, and there is competition among card issuers to get their card set as the default option on Apple devices, some banks are still worried that Apple Pay and other similar services will make their branches less important. If Apple chose to go into retail banking seriously by offering current accounts then the banks would have plenty more to worry them.

Collaboration

The fusion of development, operations and business teams to provide agile, focussed solutions has been one of the growth areas in 2014. The ‘DevOps’ approach has transformed many otherwise slow, ponderous IT departments, getting them talking to the business and operational consumers of their systems and providing better, faster and closer-fit applications and processes. This trend is only going to grow, and 2015 may be the year it really takes off. The repercussions for 2016 are that too many projects will become ‘DevOpped’ and start failing through focussing on short-term solutions rather than long-term strategy.

Security

Obviously the Sony Pictures hack is on everyone’s mind at the moment, but cyber attack from countries with virtually unlimited will, if not resources, is a threat that most firms cannot defend against. Most organisations have had a breach of some type this year (and the others probably don’t know it’s happened). Security has risen up to the boardroom and threat mitigation is now published in most firms’ annual reports. We see three themes emerging to combat this.

– More of the same, more budget and resource is focussed on organisational protection (both technology and people/process)
– Companies start to mitigate with the purchase of Cyber Insurance
– Governments start to move from defence/inform to attacking the main criminal or politically motivated culprits

We hope you’ve enjoyed our posts over the last few years and we’re looking forward to more in 2015.

Twitter.com/broadgateview

 

 

Broadgate Big Data Dictionary

Posted on : 28-10-2014 | By : richard.gale | In : Data


A couple of years back we were getting to grips with big data and thought it would be worthwhile putting a couple of articles together to help explain what the fuss was all about. Big Data is still here and its adoption is growing, so we thought it would be worthwhile updating and re-publishing. Let us know what you think.

We have been interested in Big Data concepts and technology for a while. There is a great deal of interest and discussion with our clients and associates on the subject of obtaining additional knowledge & value from data.

As with most emerging ideas there are different interpretations and meanings for some of the terms and technologies (including the thinking that ‘big data’ isn’t new at all but just a new name for existing methods and techniques).

With this in mind we thought it would be useful to put together a few terms and definitions that people have asked us about recently to help frame Big Data.

We would really like to get feedback, useful articles & different views on these to help build a more definitive library of Big Data resources.

Analytics 

Big Data Analytics is the processing and searching through large volumes of unstructured and structured data to find hidden patterns and value. The results can be used to further scientific or commercial research, identify customer spending habits or find exceptions in financial, telemetric or risk data to indicate hidden issues or fraudulent activity.

Big Data Analytics is often carried out with software tools designed to sift and analyse large amounts of diverse information being produced at enormous velocity. Statistical tools used for predictive analysis and data mining are utilised to search and build algorithms.
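As a minimal, product-agnostic illustration of the kind of exception-finding described above, a few lines of R can flag outlying values in a stream of transaction amounts – the sort of anomaly that would then be investigated as a possible error or fraud. The data here is simulated purely for the example:

# Simulated transaction amounts with a few injected anomalies
set.seed(42)
amounts <- c(rnorm(1000, mean = 100, sd = 20), 950, 1200, 2500)

# Flag anything more than four standard deviations from the mean
z_scores  <- (amounts - mean(amounts)) / sd(amounts)
anomalies <- amounts[abs(z_scores) > 4]
print(anomalies)   # the injected outliers surface as exceptions for review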

Big Data

The term Big Data describes amounts of data that are too big for conventional data management systems to handle. The volume, velocity and variety of data overwhelm databases and storage. The result is that either data is discarded or unable to be analysed and mined for value.

Gartner has coined the term ‘Extreme Information Processing’ to describe Big Data – we think that’s a pretty good term to describe the limits of capability of existing infrastructure.

There has always been “big data” in the sense that data volumes have always exceeded the ability for systems to process it. The tool sets to store & analyse and make sense of the data generally lag behind the quantity and diversity of information sources.

The actual amounts and types of data that Big Data relates to are constantly being redefined, as database and hardware manufacturers keep moving those limits forward.

Several technologies have emerged to manage the Big Data challenge. Hadoop has become a favourite tool to store and manage the data, traditional database manufacturers have extended their products to deal with the volumes, variety and velocity and new database firms such as ParAccel, Sand & Vectorwise have emerged offering ultra-fast columnar data management systems. Some firms, such as Hadapt, have a hybrid solution utilising tools from both the relational and unstructured world with an intelligent query optimiser and loader which places data in the optimum storage engine.

Business Intelligence

The term Business Intelligence (BI) has been around for a long time, and the growth of data and then Big Data has focused more attention on this space. The essence of BI is to obtain value from data to help build business benefits. Big Data itself could be seen as BI – it is a set of applications, techniques and technologies that are applied to an entity’s data to help produce insight and value from it.

There are a multitude of products that help build Business Intelligence solutions – ranging from the humble Excel to sophisticated (aka expensive) solutions requiring complex and extensive infrastructure to support. In the last few years a number of user friendly tools such as Qlikview and Tableau have emerged allowing tech-savvy business people to exploit and re-cut their data without the need for input from the IT department.

Data Science

This is, perhaps, the most exciting area of Big Data. This is where the Big Value is extracted from the data. One of our data scientist friends described it as follows: “Big Data is the plumbing and Data Science is the value driver…”

Data Science is a mixture of scientific research techniques, advanced programming and statistical skills (or hacking), philosophical thinking (perhaps previously known as ‘thinking outside the box’) and business insight. Basically it’s being able to think of new or different questions to ask, be technically able to translate them into a machine-based format, process the results, interpret them and then ask new questions based on the results of the previous set…

A diagram by blogger Drew Conway describes some of the skills needed – which maybe explains the lack of skills in this space!

 

In addition, Pete Warden (creator of the Data Science Toolkit) and others have raised caution about the term Data Science (“anything that needs science in the name is not a real science”) but confirm the need for a definition of what Data Scientists do.

Database

Databases can generally be divided into structured and unstructured.

Structured databases are the traditional relational database management systems such as Oracle, DB2 and SQL Server, which are fantastic at organising large volumes of transactional and other data, with the ability to load and query the data at speed and with transactional integrity to ensure data quality.

Unstructured databases are technologies that can deal with any form of data that is thrown at them and then distribute it out to a highly scalable platform. Hadoop is a good example of this type of product, and a number of firms now produce, package and support the open-source software.

Feedback Loops

Feedback loops are systems where the output from the system is fed back into it to adjust or improve the system’s processing. Feedback loops exist widely in nature and in engineering systems – think of an oven: heat is applied to warm it to a specific temperature, which is measured by a thermostat; once the correct temperature is reached the thermostat tells the heating element to shut down, until feedback from the thermostat says it is getting too cold and it turns on again… and so on.
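To make the idea concrete, here is a deliberately simple R sketch of the oven/thermostat loop described above (the temperatures and heating rates are arbitrary numbers chosen for illustration):

# A toy feedback loop: an 'oven' heats when below target and drifts cooler when the element is off
target  <- 200      # desired temperature
temp    <- 20       # starting temperature
heating <- TRUE

for (minute in 1:30) {
  temp <- if (heating) temp + 15 else temp - 5         # effect of the element being on or off
  if (temp >= target)     heating <- FALSE             # feedback: thermostat switches the element off
  if (temp < target - 10) heating <- TRUE              # feedback: thermostat switches it back on
  cat(sprintf("minute %02d: %3.0f degrees, element %s\n", minute, temp, if (heating) "on" else "off"))
}

The same shape – measure the output, compare it with the goal, adjust the input – is what a big data feedback loop does, just with models and data rather than a heating element.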

Feedback loops are an essential part of extracting value from Big Data. Building in feedback and then incorporating Machine Learning methods starts to allow systems to become semi-autonomous; this lets the Data Scientists focus on new and more complex questions whilst testing and tweaking the feedback from their previous systems.

Hadoop

Hadoop is one of the key technologies supporting the storage and processing of Big Data. It grew out of Google’s work on its distributed Google File System and MapReduce processing model, which Hadoop re-implements as open source. It is an open source product under the Apache banner but, like Linux, is distributed by a number of commercial vendors that add support, consultancy and advice on top of the product.

Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework.

So Hadoop could almost be seen as a (big) bucket into which you can throw any form and quantity of data, and it will organise it, know where that data resides and be able to retrieve and process it. It also accepts that there may be holes in the bucket and can patch itself up by using additional resources – all in all a very clever bucket!

Hadoop runs on a scheduling basis, so when a question is asked it breaks up the query, shoots the pieces out to different parts of the distributed network in parallel, and then waits for and collates the answers.

Hive

Hive provides a high level, simple, SQL type language to enable processing of and access to data stored in Hadoop files. Hive can provide analytical and business intelligence capability on top of Hadoop. The Hive queries are translated into a set of MapReduce jobs to run against the data. The technology is used by many large technology firms in their products including Facebook and Last.FM. The latency/batch related limitations of MapReduce are present in Hive too but the language allows non-Java programmers to access and manipulate large data sets in Hadoop.

Machine Learning

Machine learning is one of the most exciting concepts in the world of data. The idea is not new at all, but the focus on utilising feedback loops of information, and algorithms that take actions and change depending on the data without manual intervention, could improve numerous business functions. The aim is to find new or previously unknown patterns and linkages between data items to obtain additional value and insight. An example of machine learning in action is Netflix, which is constantly trying to improve its movie recommendation system based on a user’s previous viewing, their characteristics and also the behaviour of other customers with a similar set of attributes.
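As a toy illustration of letting an algorithm find structure rather than specifying it by hand, base R’s kmeans() function can group customers by their attributes without being told what the groups are; a recommendation engine such as the Netflix example works on the same principle (find similar customers, suggest what they watched), just at vastly greater scale. The customer data below is simulated purely for the example:

set.seed(1)
# Simulated customer attributes: age and average monthly spend
customers <- data.frame(age   = c(rnorm(50, 25, 3), rnorm(50, 55, 5)),
                        spend = c(rnorm(50, 40, 10), rnorm(50, 120, 25)))

# Ask for two groups without describing them in advance
clusters <- kmeans(scale(customers), centers = 2)
table(clusters$cluster)   # two customer segments emerge from the data itself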

MapReduce

Mapreduce is a framework for processing large amounts of data across a large number of nodes or machines.

http://code.google.com/edu/parallel/img/mrfigure.png
Map Reduce diagram (courtesy of Google)

Mapreduce works by splitting out (or mapping) requests into multiple separate tasks to be performed on many nodes of the system and then collates and summarises the results back (or reduces) to the outputs.

MapReduce is based on the Java language and is the basis of a number of the higher-level tools (Hive, Pig) used to access and manipulate large data sets.

Google (amongst others) developed and use this technology to process large amounts of data (such as documents and web pages trawled by its web crawling robots). It allows the complexity of parallel processing, data location and distribution and also system failures to be hidden or abstracted from the requester running the query.
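The canonical MapReduce example is a word count. The R sketch below only mimics the pattern on a single machine – the real framework distributes the map and reduce steps across many nodes and handles failures – but the shape of the computation is the same:

docs <- c("big data is big", "data science needs data")

# Map: emit one entry per word in every document
words <- unlist(lapply(docs, function(d) strsplit(d, " ")[[1]]))

# Shuffle: group a count of 1 under each distinct word (the key)
grouped <- split(rep(1, length(words)), words)

# Reduce: sum the counts for each word
counts <- sapply(grouped, sum)
print(counts)   # big 2, data 3, is 1, needs 1, science 1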

MPP

MPP stands for massively parallel processing and it is the concept which gives the ability to process the volumes (and velocity and variety) of data flowing through systems. Chip processing capabilities are always increasing but to cope with the faster increasing amounts of data processing needs to be split across multiple engines. Technology that can split out requests into equal(ish) chunks of work, manage the processing and then join the results has been difficult to develop.  MPP can be centralised with a cluster of chips or machines in a single or closely coupled cluster or distributed where the power of many distributed machines are used (think ‘idle’ desktop PCs overnight usage as an example). Hadoop utilises many distributed systems for data storage and processing and also has fault tolerance built in which enables processing to continue with the loss of some of those machines.

NoSQL

NoSQL really means ‘not only SQL’; it is the term used for database management systems that do not conform to the traditional RDBMS model (transaction-oriented data management systems based on the ACID principle). These systems were developed by technology companies in response to challenges raised by the high volumes of data. Amazon, Google and Yahoo built NoSQL systems to cope with the tidal wave of data generated by their users.

Pig

Apache Pig is a platform for analysing huge data sets. It has a high-level language called Pig Latin which is combined with a data management infrastructure which allows high levels of parallel processing. Again, like Hive, the Pig Latin is compiled into MapReduce requests. Pig is also flexible so additional functions and processing can be added by users for their own specific needs.

Real Time

The challenges in processing the “V”s of big data (volume, velocity and variety) have meant that some requirements have been compromised. In the case of Hadoop and MapReduce this has been the interactive or instant availability of the results. MapReduce is batch orientated in the sense that requests are sent for processing, where they are scheduled to be run and the output then summarised. This works fine for the original purposes, but now the demand to become more real-time or interactive is growing. With a ‘traditional’ database or application, users expect the results to be available instantly or pretty close to instant. Google and others have inspired more interactive interfaces to Hadoop-scale data: Google’s Dremel has led to Apache Drill, and Twitter has released Storm. We see this as one of the most interesting areas of development in the Big Data space at the moment.

 

Over the next few months we have some guest contributors penning their thoughts on the future for big data, analytics and data science.  Also don’t miss Tim Seears’s (TheBigDataPartnership) article on maximising value from your data “Feedback Loops” published here in June 2012.

For the technically minded Damian Spendel also published some worked examples using ‘R’ language on Data Analysis and Value at Risk calculations.

These are our thoughts on the products and technologies – we would welcome any challenges or corrections and will work them into the articles.

 

Data Analysis – An example using R

Posted on : 31-08-2014 | By : richard.gale | In : Data


With the growth of Big Data and Big Data Analytics the programming language R has become a staple tool for data analysis. Based on modular packages (4000 are available), it offers sophisticated statistical analysis and visualization capabilities. It is well supported by a strong user community. It is also Open Source.

For the current article I am assuming an installation of R 3.1.1[1] and RStudio[2].

The article will cover the steps taken to provide a simple analysis of highly structured data using R.

The dataset I will use in this brief demonstration of some basic R capabilities is from a peer-to-peer Lending Club in the United States[3]. The dataset is well structured with a data dictionary and covers both loans made and loans rejected. I will use R to try and find answers to the following questions:

  • Is there a relationship between Loan amounts and funded amounts?
  • Is there a relationship between the above and the number of previous public bankruptcies?
  • What can we find out about rejections?
  • What can we find out about the geographic distribution of the Lending Club loans?

During the course of this analysis we will use basic R commands to:

  • Import data
  • Plot data using scatterplots and regression lines
  • Use functions to perform heavy lifting
  • Summarize data using simple aggregation functions
  • Plot data using choropleths

Having downloaded the data to our working directory we’ll import the three files using read.csv and merge them together using rbind() (row bind):

>data_lending0 <- read.csv("LoanStats3a.csv", header = FALSE)

>data_lending1 <- read.csv("LoanStats3b.csv", header = FALSE)

>data_lending2 <- read.csv("LoanStats3c.csv", header = FALSE)

>data_full <- rbind(data_lending0, data_lending1, data_lending2)

We can now explore the data using some of R’s in-built functions for metadata – str (structure), names (column names), unique (unique values of a variable).
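For example (the exact column names depend on how the CSV headers were read in – addr_state is one of the fields in the Lending Club data dictionary):

>str(data_full)        # data types and a preview of each column

>names(data_full)      # the column names

>unique(data_full$addr_state)   # distinct values of one variable, e.g. the state codes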

The first thing I will do is use ggplot to build a simple scatter plot showing the relationship between the funded amount and the loan amount.

>install.packages("ggplot2")

>library(ggplot2)

>ggplot(data_full, aes(x=loan_amnt, y=funded_amnt)) + geom_point(shape=1) + geom_smooth(method=lm)

The above three lines install the package ggplot2 from a CRAN mirror, load the library into the R environment, and then use the “Grammar of Graphics” to build a plot using the data_full dataset, with the x-axis showing the principal (loan_amnt) and the y-axis showing the lender’s contribution (funded_amnt). With geom_smooth we add a line to help see patterns – in this case a line of best fit.

In R Studio we’ll now see the following plot:

 

 

This shows us clearly that the Lending Club clusters loans at the lower end of the spectrum and that there is a clear positive correlation between loan_amnt and funded_amnt – for every dollar you bring you can borrow a dollar; there is little scope for leverage here. Other ggplot functions will allow us to tidy up the labelling and colours, but I’ll leave that as an exercise for the interested reader.
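For instance, axis labels and a title can be added with labs() – a sketch using standard ggplot2 functions, with wording of my own choosing:

>ggplot(data_full, aes(x=loan_amnt, y=funded_amnt)) + geom_point(shape=1) + geom_smooth(method=lm) + labs(x="Loan amount ($)", y="Funded amount ($)", title="Lending Club: loan vs funded amount")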

 

 

The next step is to add an additional dimension – and investigate the link between principals and contributions in the light of the applicants’ known public bankruptcies.

>ggplot(data_full, aes(x=loan_amnt, y=funded_amnt, color=pub_rec_bankruptcies)) + geom_point(shape=19,alpha=0.25) + geom_smooth(method=lm)

Here, I’ve used the color element to add the additional dimension and attempted to improve legibility of the visualization by making the points more transparent.

 

 

Not very successfully – it doesn’t help us further. Maybe sampling could improve the visualization, or a more focused view…

 

 

 

 

Let’s have a quick look at the rejection statistics:

> rejections <- rbind(read.csv("RejectStatsA.csv"), read.csv("RejectStatsB.csv"))

> nrow(rejections)/nrow(data_full)

[1] 6.077912

For every application – six rejections.

Another popular method of visualization is the choropleth (“many places”) map. In this case, we’ll build a map showing outstanding loans by state.

The bad news is that the Lending Club data uses two-letter codes, while the state data we’ll use from the maps package (install.packages, library etc.…) uses the full name. Fortunately, a quick search provides a function “stateFromLower”[4] that will perform the conversion for us. So, I run the code that creates the function, then add a new column called state (“Wyoming”) to the data_full dataset by using stateFromLower to convert the addr_state column (“WY”):

> data_full$state <- stateFromLower(data_full$addr_state)
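The stateFromLower helper itself isn’t reproduced in this post; a minimal version can be built from R’s built-in state.abb and state.name vectors – this is an assumption about how the referenced function behaves, not the author’s code:

> stateFromLower <- function(x) state.name[match(toupper(x), state.abb)]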

Then, I aggregate the principals by state:

> loan_amnts <- aggregate(data_full$funded_amnt, by=list(data_full$state), FUN=sum)

Load the state data:

> data(state)

The next code leans heavily on a succinct tutorial provided elsewhere[5]:

> map("state", fill=FALSE, boundary=TRUE, col="red")

> mapnames <- map("state", plot=FALSE)$names

Remove regions:

> region_list <- strsplit(mapnames, ":")

> mapnames2 <- sapply(region_list, "[", 1)

Match the region-less mapnames to the loan amounts:

> m <- match(mapnames2, tolower(state.name))

> loans <- loan_amnts$x

Bucketize the aggregated loans:

> loan.buckets <- cut(loans, c(500000, 1000000, 5000000, 10000000, 15000000, 20000000, 30000000, 40000000, 90000000, 100000000))

Define the colour schema:

> clr <- rev(heat.colors(13))

Draw the map and add the title:

> map("state", fill=TRUE, col=clr[loan.buckets], projection = "polyconic")

> title("Lending Club Loan Amounts by State")

Create a legend:

> leg.txt <- c("$500000", "$1000000", "$5000000", "$10000000", "$15000000", "$20000000", "$30000000", "$40000000", "$90000000", "$100000000")

> legend("topright", leg.txt, horiz = FALSE, fill = clr)

With a few simple lines of code R has demonstrated that it is quite a powerful tool for generating visualizations that can aid in understanding and analyzing data. We were able to understand something about the Lending Club – it almost seems like we have clear KPIs in terms of rejections and in terms of loan amounts to funding amounts. I think we can also see a link between the Lending Club’s presence and poverty[6].

This could be a starting point for a more detailed analysis into the Lending Club, to support investing decisions, or internal business decisions (reduce rejections, move into Wyoming etc.).

Guest author: Damian Spendel – Damian has spent his professional life bringing value to organisations with new technology. He is currently working for a global bank helping them implement big data technologies. You can contact Damian at damian.spendel@gmail.com

Broadgate 2013 Predictions – how did we do?

Posted on : 30-12-2013 | By : richard.gale | In : Innovation


In December 2012 we identified some themes we thought would be important for the coming year. Let’s see how we got on…

1. Infrastructure Services continue to commoditise – for many organisations, Infrastructure as a Service (IaaS) is now mainstream. Technology advancement will continue to move the underlying infrastructure more towards a utility model and reduce costs in terms of software, hardware and resource.

This has happened and is continuing to grow; most organisations have the infrastructure in place to support IaaS with private clouds and virtualised environments. However, the flexibility and agility benefits have not always been realised, as large organisations’ IaaS platforms have sometimes been weighed down by the legacy change and build processes of the previous model. To circumvent this, many businesses are looking at public cloud for more flexible capacity. This will be the big growth area of 2014, especially with financial services organisations that previously have been hesitant in adopting public cloud solutions.

2. Application/Platform rationalisation – for many large firms there is still a large amount of legacy cost in terms of both disparate platforms, often aligned by business unit, and their sheer size/complexity. The next year will see an increase in rationalisation of application platforms to drive operational efficiency.

In 2013 the understanding and scale of the problem became more apparent but, with limited change/transformation budgets (in financial services mainly due to the burden of regulatory compliance requirements), not much action. Now these complex webs of legacy applications are starting to fail and seriously constrain business growth. 2014 will be a ‘crunch’ year when these expensive problems have to be tackled head on, either through wholesale re-architecting or by giving someone else the problem of running them whilst new solutions are built.

3. Big Data/ Data Science grows and market starts to consolidate – 2012 was the year that Big Data technologies went mainstream…2013 will see an increased focus on Data Science resource and technology to maximise the analytical value. There will also be some consolidation at the infrastructure product level.

In financial services we saw a fair amount of discussion, some large proof of concept projects focusing on consolidation (many seem to be targeting the risk and finance areas), but not the levels of take up we expected. MasterCard have come in with a big data restaurant review concept. We may have been slightly premature with this one. We think the understanding of Data Science is starting to go mainstream and, as with Cloud, the demand will come more from the business rather than IT architects in 2014.

4. Data Centre/Hosting providers continue growth – fewer and fewer companies are talking about building their own data centres now, even the very large ones. With the focus on core business value, infrastructure will continue to be hosted externally driving up the need for provider compute power.

Many organisations either use more flexible external hosting solutions or have an excess of capacity in their existing data centres. This will continue and grow in pace in 2014.

5. More rationalisation of IT organisations – 2012 saw large reductions in operational workforce, particularly in financial services. With revenues under more pressure this year (and in line with point 1) we will see further reductions in resource capacity and relocation to low cost locations, both nearshore and within the UK.

In the financial services sector this may be at an end. There will be growth in demand for IT skills in 2014 but there will be some reductions particularly in the infrastructure/BAU space due to the continued commoditisation of technology and move to XaaS services.

6. Crowd-funding services continue to gain market share – there have been many new entrants to this space over recent years with companies such as Funding Circle, Thin-Cats, Bank-to-the-Future and Kickstarter all doing well. We see this continuing to grow as access to funds from traditional lenders is still hard. The question is at what point the traditional lenders will step in.

This one was an easy prediction, as a low starting point combined with the banks’ reluctance to lend, low interest rates and increasing interest in the tech sector inevitably led to high levels of growth. 2014 will continue this trend but with a higher degree of regulation after the first high-profile failure of a lending exchange…

7. ‘Instant’ Returns on investment required – growth of SaaS & BYOD is changing the perception of technology. People as consumers are now accustomed to an instant solution to a problem (by downloading an app or purchasing a service with a credit card). This, combined with historic patchy project successes, means that long lead-time projects are becoming harder to justify; IT departments are having to find near instant solutions to business problems.

Business users are leading IT departments on the adoption of SaaS in particular. IT is playing catch-up and the race will continue. We are not sure what 2014 will bring on this. It could be that IT departments regain control or, alternatively, are bypassed on a more frequent basis by impatient, IT savvy business users.

8. Technology Talent Wars – with start-ups disrupting traditional players in areas such as data analytics, social media and mobile payment apps, barriers to entry eroding and salaries on the rise, we see talent shifting away from industries such as financial services and choosing new technology companies instead.

Relatively low demand from financial services firms (except for a few specific skills such as security) has deferred this. This is more likely to impact 2014 change and innovation programmes now.

9. Samsung/Android gain more ground over Apple – we already have seen the Apple dominance, specifically in relation to the Appstore, being eroded and this will continue as the potential of a more open platform becomes apparent to both developers and users of technology.

This has happened and will continue unless Apple can come up with some new magic. Phones and tablets are the new battleground; other operating systems such as Windows and potentially Jolla could disrupt the trend in 2014.

10. The death knell sounds for RIM/Blackberry – not much more to say. Most likely they will be acquired by one of the big new technology companies to gain access to the remaining smart phone users.

The only thing to add to this is that there may be a ‘dead-cat’ bounce for Blackberry in 2014.

 

Once again we hope you have enjoyed our monthly articles and have had a successful 2013. We wish you all the same for 2014!

 

From a single view of a customer to a global view of an individual – bespoke banking for the mass market

Posted on : 02-09-2013 | By : richard.gale | In : Innovation


Customer interaction with banks can be complex. Historically this has resulted in lost opportunities for both institutions and their clients with neither obtaining full value from the relationship. Forward looking banks are addressing this through changes in thinking and technology.

Banks have many touch points with their existing & potential clients:

  • Accounts –  such as current, saving, loan, share trading, business/personal, mortgages
  • Products – such as life insurance, pensions or advisory services
  • Channels – face-to-face, telephone, ATM, web application, mobile, social media and a multitude of formats in advertising and marketing
  • History – banks have bought and absorbed many different, divergent firms and may not have fully integrated across people, process & systems

The potential complexity of this interaction, combined with the sometimes disjointed nature of these organisations, means that connections are not made: opportunities can be lost and customers can feel undervalued, whilst the potential for fraud increases.

Change – Cultural & organisational integration

Most banks are huge organisations with thousands of staff based around the globe. To scale the organisation, roles have become more specialised and most people have deep skills in relatively narrow fields of the bank’s overall capability.

This has worked well and has enabled the global growth of the organisation but opportunities are being missed to further grow customers and clients through the consolidation of information and consistency of customer experience.

That additional value can be enabled by a cultural shift towards a ‘one bank’ philosophy. Most banks have these programmes in place and they seem to work at the infrastructure level, but what is also needed is a different way of thinking that gives people an incentive to consider other areas of the bank that could help their customer.

To enable this to work there would need to be a supporting framework in place:

  1. Knowledge of the other areas/business units/geography – a simple view of a complex environment is critical
  2. Open & effective communication channels – the mechanism is less important than the knowledge that it is available and there are people listening and willing to help
  3. Communication needs to be valued, and be seen to be valued, at all levels within the business

Improve – Customer Relations

Timely, accurate & complete customer intelligence is critical. Who, what and where are your customers? What do they do, what do they like & dislike and what are their dreams? Gaining this insight into your customer’s mind and tailoring communications & solutions to match this will make them want to do more business with you.

A major factor in achieving this will be to collate & analyse all possible information, so having a single point (such as a customer relationship team) accountable for ensuring its accuracy & completeness will help this process.

Having a more complete set of information in regard to your customer will help understand their needs and, with a consistent approach to communication, also help avoid alienating them through providing inaccurate or inappropriate information or advice.

As important as consistency & completeness is the longevity of the relationship. Customers in the past have generally stayed with the same bank for a considerable time; this ‘stickiness’ is now being eroded through:

  • Improved knowledge – of other options available
  • Legislation – forcing switching of accounts to be made easier
  • Changing attitudes – people are commoditising purchasing and usage based on value and quality ahead of buying from a single company
  • Technology – information from many sources & companies are available on a phone or tablet

The relationship between a customer and a bank is similar to any long-term partnership: it is based on a set of core features – trust, openness, well-being and equality, amongst others.

Thinking about these principles when engaging with a customer will only help the relationship endure.

Integrate – Infrastructure, systems & applications

Large scale, standardised technology has been the norm for banks interacting with their customers. This works and has been the only real way to handle the millions of transactions from thousands of customers in the past.

That same core technology still underpins the banking world but with the advances in capability & speed and parallel reduction in cost there is an opportunity to build a view of the individual and then start providing bespoke services on a manufacturing scale.

The move to more customer-centric technology should enable the standard bank account holder to experience a ‘Savile Row’ world for a Marks & Spencer price.

An impact of this may be that the Private banking and Wealth management divisions of banks will have to raise their level of service to differentiate from the ‘norm’.

The use of data analytics to search through the volumes of data and to analyse and extract insight and value from it is an essential tool in achieving the bespoke solution.

Big Data databases and tool-kits can help provide the framework but knowledgeable teams of people with both the understanding of the customers and technology will be required to provide answers and the next set of questions to achieve an even greater level of customer satisfaction, retention and growth.

Sinking in a data storm? Ideas for investment companies

Posted on : 30-06-2013 | By : richard.gale | In : Data


All established organisations have oceans of data and only very basic ways to navigate a path through it.

This data builds up over time through interaction with clients, suppliers and other organisations. It is usually stored in different ways on disconnected systems and documents.

Trying to identify what the data means on a single system is a big enough challenge; trying to do this across a variety of applications is a much bigger problem, with different meanings and interpretations of the fields and terms in each system.

How can a company get a ‘360’ view of their client when they have different identifiers in various applications and there is no way of connecting them together? How can you measure the true value of your client when you can only see a small amount of the information you hold about them?

Many attempts have been made to join and integrate these data sets (through architected common data structures, data warehouses, messaging systems, business intelligence applications etc.) but it has proved a very expensive and difficult problem to solve. These kinds of projects take a long time to implement and the business has often moved on by the time they are ready. In addition, early benefits are hard to find, so these sorts of projects can often fall victim to termination if a round of cost cutting is required.

So what can be done? Three of the key problems are identification of value from data, duration & costs of data projects and ability to deal with a changing business landscape.

There is no silver bullet, but we have been working with a number of Big Data firms and have found that a key value they offer is the ability to quickly load large volumes of data (traditional database content as well as unstructured documents, text and multi-media). The technology is relatively cheap and the hardware required is generic and inexpensive, and can be easily sourced from cloud vendors.

Using a Hadoop based data store on Amazon cloud or a set of spare servers enables large amounts of data to be uploaded and made available for analysis.

So that can help with the first part: having disparate data in one place. But how do you start extracting additional value from that data?

We have found a good way is to start asking questions of the data – “What is the total value of business client X does with my company?”, “What is our overall risk if this counterparty fails?” or “What is my cost of doing business with supplier A vs. supplier B?” If you start building question sets against the data, then test and retest, you can refine the questions, data and results, and answers with higher levels of confidence start appearing. What often happens is that the answers create new questions, which in turn create new answers, and so on.
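As a small-scale illustration of this question-and-refine loop, the same kind of question can be prototyped in a few lines of R before being scaled up on a Hadoop-style platform (the table and column names here are invented purely for the example):

# Hypothetical extract: one row per transaction, pulled from the combined data store
transactions <- data.frame(client  = c("X", "X", "Y", "X", "Z"),
                           product = c("FX", "Bonds", "FX", "Equities", "FX"),
                           value   = c(1.2, 3.4, 0.8, 2.1, 5.0))

# "What is the total value of business client X does with my company?"
sum(transactions$value[transactions$client == "X"])

# ...which immediately suggests the next question: how is that business split by product?
aggregate(value ~ client + product, data = transactions, FUN = sum)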

There is nothing new about using data sets to enquire and test but the emerging Big Data technologies allow larger, more complex sets of data to be analysed and cheaper cloud ‘utility’ computing power makes the experimentation economically viable.

What is also good about this is that as the business grows and moves on – to new areas, systems or processes – loading the new data sets should be straightforward and fast. The questions can be re-run and results reappraised quickly and cheaply.

As we have discussed previously, we think the most exciting areas within Big Data are data science and analytics – finding which questions to ask and refining the results.

Visualisation of these results is another area where we see some exciting developments and we will be writing an article on this soon.