The ultimate way to move beyond trading latency?

Posted on : 29-03-2019 | By : richard.gale | In : Finance, Uncategorized

A number of power surges and outages have been experienced in the East Grinstead area of the UK in recent months. The utility companies involved have traced the cause to one of the three high-capacity feeds into a global investment bank’s data centre facility.

The profits created by the same bank’s London-based proprietary trading group have increased tenfold over the same period.

This bank employs 1% of the world’s best post-doctoral theoretical physics graduates to help build its black-box trading systems.

Could there be a connection? Wild and unconfirmed rumours have been circulating within the firm that a major breakthrough has been made in removing the problem of latency – the physical limit on the time it takes a signal to travel down a wire, ultimately governed by the speed of light.

For years traders have been trying to reduce execution latency to gain an edge in a highly competitive, fast-moving environment. The focus has moved from seconds to millisecond and now microsecond savings.

Many financial services and technology organisations have attempted to solve the problem by reducing data hops, optimising routing and even placing their hardware physically close to the source of the data (such as in an exchange’s data centre) to minimise latency, but no one has solved the issue – yet.

It sounds like this bank may have gone one step further. It is known that at the boundary of the speed of light, physics as we know it changes (quantum mechanics is one example, where the space/time continuum becomes ‘fuzzy’). Conventional physics states that travelling faster than the speed of light – and so seeing into the future – would require infinite energy, and so is not possible.

Investigation with a number of insiders at the firm has produced an amazing, almost unbelievable insight. They have managed to build a device which ‘hovers’ over the present and immediate future – little detail is known about it, but it is understood to be based on the previously unproven ‘Alcubierre drive’ principle. This allows the trading system to predict (in reality, observe) the next direction of the market, providing an invaluable trading advantage.

The product is still in test mode: trading ahead of data it has already traded against produces outages in the system, as it then tries to correct the error in the future data, which again changes the data, ad infinitum… The prediction model only allows a small glimpse into the immediate future, which also limits the window of opportunity for trading.

The power requirements for the equipment are so large that it has had to be moved into the data centre environment, where consumption can be more easily hidden (or not, as the power outages showed).

If the bank really does crack this problem then it will have the ultimate trading advantage – the ability to see into the future and trade with ‘inside’ knowledge legally. Unless another bank is doing something similar in the ‘trading arms race’, this bank will quickly become dominant and the others may go out of business.

The US Congress has apparently discovered some details of the mechanism and is requesting that the bank disclose details of the project. The bank is understandably reluctant to do so, as it has spent over $80m on development and wants to make some return on its investment.

If this system goes into true production, surely it cannot be long before financial regulators outlaw the tool, as it will both distort and ultimately destroy the markets.

Of course the project has a codename…. Project Tachyons

No one from the company was available to comment on the accuracy of the claims.

AI Evolution: Survival of the Smartest

Posted on : 21-05-2018 | By : richard.gale | In : Innovation, Predictions

Artificial intelligence is getting very good at identifying things: let it analyse a million pictures and it can tell with amazing accuracy which show a child crossing the road. But AI is hopeless at generating images of people, or of anything else, by itself. If it could do that, it would be able to create realistic but entirely synthetic pictures depicting people in various settings, which a self-driving car could use to train itself without ever going out on the road.

The problem is that creating something entirely new requires imagination – and until now that has been a step too far for machine learning.

There is an emerging solution, first conceived by Ian Goodfellow during an academic argument in a bar in 2014… The approach, known as a generative adversarial network, or “GAN”, takes two neural networks – the simplified mathematical models of the human brain that underpin most modern machine learning – and pits them against each other to identify flaws and gaps in each other’s model.

Both networks are trained on the same data set. One, known as the generator, is tasked with creating variations on images it has already seen – perhaps a picture of a pedestrian with an extra arm. The second, known as the discriminator, is asked to identify whether the example it sees is like the images it has been trained on or a fake produced by the generator – basically, is that three-armed person likely to be real?

Over time, the generator can become so good at producing images that the discriminator can’t spot fakes. Essentially, the generator has been taught to recognize, and then create, realistic-looking images of pedestrians.
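
To make the generator/discriminator loop concrete, here is a minimal sketch of the idea in PyTorch, trained on toy two-dimensional points rather than pedestrian images. The network sizes, learning rates and data are illustrative assumptions only, not the set-up used by any of the researchers mentioned in this article:

import torch
import torch.nn as nn

torch.manual_seed(0)

# "Real" data: points from a 2-D Gaussian that the generator must learn to imitate.
def real_batch(n=128):
    return torch.randn(n, 2) * 0.5 + torch.tensor([2.0, 2.0])

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
discriminator = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    # Train the discriminator: real samples labelled 1, generated samples labelled 0.
    real = real_batch()
    fake = generator(torch.randn(128, 8)).detach()
    d_loss = loss_fn(discriminator(real), torch.ones(128, 1)) + \
             loss_fn(discriminator(fake), torch.zeros(128, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Train the generator: try to make the discriminator label its fakes as real.
    fake = generator(torch.randn(128, 8))
    g_loss = loss_fn(discriminator(fake), torch.ones(128, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

print(generator(torch.randn(5, 8)))  # generated points should now cluster near (2, 2)

With convolutional networks and image data in place of this toy distribution, the same loop is what produces the photorealistic output described below.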

The technology has become one of the most promising advances in AI in the past decade, able to help machines produce results that fool even humans.

GANs have been put to use creating realistic-sounding speech and photorealistic fake imagery. In one compelling example, researchers from chipmaker Nvidia primed a GAN with celebrity photographs to create hundreds of credible faces of people who don’t exist. Another research group made not-unconvincing fake paintings that look like the works of van Gogh. Pushed further, GANs can reimagine images in different ways – making a sunny road appear snowy, or turning horses into zebras.

The results aren’t always perfect: GANs can conjure up bicycles with two sets of handlebars, say, or faces with eyebrows in the wrong place. But because the images and sounds are often startlingly realistic, some experts believe there’s a sense in which GANs are beginning to understand the underlying structure of the world they see and hear. And that means AI may gain, along with a sense of imagination, a more independent ability to make sense of what it sees in the world. 

This approach is starting to provide machines with something along the lines of imagination, which in turn will make them less reliant on human help. It will also help blur the line between what is real and what is fake… In an age when we are already plagued by ‘fake news’ and doctored pictures, are we ready for seemingly real but constructed images and voices?

THE NEXT BANKING CRISIS? TOO ENTANGLED TO FAIL…

Posted on : 29-10-2015 | By : Jack.Rawden | In : Finance

Many miles of newsprint (and billions of pixels) have been generated discussing the reasons for the near collapse of the financial system in 2008. One of the main reasons cited was that each of the ‘mega’ banks had such a large influence on the market that they were too big to fail: a crash of one could have destroyed the entire banking universe.

Although the underlying issue still exists – there are a small number of huge banking organisations – vast amounts of time and legislation have been focused on reducing the risks these banks pose by forcing them to hoard capital, so limiting the external impact of a failure. An unintended consequence has been that banks are less likely to lend, constricting firms’ ability to grow and slowing the recovery, but that’s a different story.

We think the focus on capital provisions and risk management, although positive, does not address the fundamental issue: the banking system is so interlinked and entwined that one part failing can still bring the whole system down.

Huge volumes of capital are moved around on a daily basis and there are trillions of dollars ‘in flight’ at any one time. Most of this passes between banks or divisions of banks. One of the reasons for the collapse of the UK arm of Lehman’s was that it sent billions of dollars (used to settle the next day’s obligations) back to New York each night. On the morning of 15th September 2008 the money did not come back from the US and the company shut down. The intraday flow of capital is one of the potential failure points of the current system.

Money goes from one trading organisation to another in return for shares, bonds, derivatives or FX, but the process is not instant, other organisations are usually involved along the way, and the money and/or securities are often in the possession of those intermediaries during the process.

This “counterparty risk” is now one of the areas that banks and regulators are focusing on. What would happen if a bank performing an FX transaction on behalf of a hedge fund stopped trading? Where would the money go? Who would own it and, as importantly, how long would it take for the true owner to get it back? The other side of the transaction would still be in flight, so where would the shares or bonds go? Assessing the risk of a counterparty defaulting whilst ensuring the trading business continues is a finely balanced tightrope walk for banks and other trading firms.

So how do organisations and governments protect against this potential ‘deadly embrace’?

Know your counterparty; this has always been important and is a standard part of any due diligence for trading organisations. Just as important are the following:

Know the route and the intermediaries involved; companies need as much knowledge of the flow of money, collateral and securities as they do of the end points. How are the transactions being routed, and who holds the trade at any point in time? Some of these flows will only pause for seconds with one firm, but there is always a risk of a breakdown or the failure of an organisation, so ‘knowing the flow’ is as important as knowing the client.

Know the regulations; of course trading organisations spend time understanding the regulatory framework, but in cross-border transactions especially there can be gaps, overlaps and multiple interpretations, with each country or trade body reading the rules differently. Highlighting these and having a clear understanding of the impact and process ahead of an issue is vital.

Understand the impact of timing and time zones; trade flows can generally run 24 hours a day, but markets are not always open in all regions, so money or securities can get held up in unexpected places. Again, making sure there are processes in place to overcome these snags and delays along the way is critical.

Trading is getting more complex, more international, more regulated and faster. All these present different challenges to trading firms and their IT departments. We have seen some exciting and innovative projects with some of our clients and we are looking forward to helping others with the implementation of systems and processes to keep the trading wheels oiled…

The Blockchain Revolution

Posted on : 28-08-2015 | By : richard.gale | In : Cyber Security

We’ve been excited by the potential of blockchain, and in particular bitcoin technology, for a while now (Bitcoins: When will they crash?  More on Bitcoins..  Is someone mining on my machine? ). We even predicted that bitcoins would start to go mainstream in our 2015 predictions. We may be a little ahead of ourselves there, but the possibilities of the blockchain, the technology underpinning crypto currencies, are starting to gather momentum in the financial services world.

Blockchain technology contains the following elements, which are essential to any financial transaction:

  1. Security – Blockchain data is secure because each part of the chain is cryptographically linked to the one before it, and many copies of that data are stored among the many thousands of ‘miners’. Even if a proportion of these miners were corrupt, with criminal intent, the consensus of the majority would ensure integrity (a minimal sketch of this chaining appears after the list)
  2. Full auditability – Every block in the chain holds current and historic information relating to that transaction; the chain itself records everything that has ever happened to it. The data is stored in multiple places, so there is a very high degree of assurance that the record is full and correct
  3. Transparency – All information is available in a consistent way to anyone with a valid interest in the data
  4. Portability – The information can be made available anywhere in the world; apart from certain governments’ legislation there are few or no barriers to trading using blockchain technology
  5. Availability – There are many copies of each blockchain held in virtually every part of the world, so a blockchain should always be available for use
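
The security and auditability points above rest on each block carrying a hash of its predecessor, so that rewriting history invalidates everything that follows. A minimal, purely illustrative sketch in Python (standard library only, with no mining, consensus or networking):

import hashlib, json, time

def block_hash(block):
    payload = {k: block[k] for k in ("timestamp", "data", "previous_hash")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def make_block(data, previous_hash):
    block = {"timestamp": time.time(), "data": data, "previous_hash": previous_hash}
    block["hash"] = block_hash(block)
    return block

def chain_is_valid(chain):
    # Re-deriving every hash exposes any retrospective edit to the data.
    for prev, current in zip(chain, chain[1:]):
        if current["previous_hash"] != prev["hash"] or current["hash"] != block_hash(current):
            return False
    return True

chain = [make_block("genesis", "0" * 64)]
chain.append(make_block({"from": "A", "to": "B", "amount": 100}, chain[-1]["hash"]))
chain.append(make_block({"from": "B", "to": "C", "amount": 40}, chain[-1]["hash"]))

print(chain_is_valid(chain))       # True
chain[1]["data"]["amount"] = 999   # tamper with history...
print(chain_is_valid(chain))       # ...and the chain no longer validates: False

A real blockchain adds distributed storage of the chain and a consensus mechanism among the miners, which is what provides the integrity and availability described above.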

The blockchain technology platform is flexible enough to incorporate additional functions and processes without compromising its underlying strengths.

All major banks and a number of innovative startups are looking at ways blockchain can change the way transactions are executed. There are significant opportunities for both scale and efficiency using this technology. Areas being researched include:

  • Financial trading and settlement. Fully auditable, automated chain of events with automated payments, reporting and completion globally and instantly
  • Retail transactions. End to end transactions delivered automatically without the opportunity of loss or fraud
  • Logistics and distribution. Automatically attached to physical and virtual goods with certified load information enabling swift transit across nations
  • Personal data. Passports, medical records and government related information can be stored encrypted but available and trusted
There are still some significant challenges with blockchain technology:
  1. Transactional throughput – limited compared with existing banking standards (tens of transactions per second at present rather than tens of thousands)
  2. Fear and lack of understanding of the technology – this is slowing down thinking and adoption
  3. Lack of skills to design and build – scarce resources in this space and most are snapped up by start-ups
  4. Complexity and lack of transparency – Even though the technology itself is transparent, the leap from the decades-old processes used in banks’ back offices, for example, to a blockchain programme can be a large one. In the case of time-critical trading or personal information, security concerns about who can view the data come to the fore.
  5. Will there be something else that replaces it – will the potentially large investment in the technology be wasted by the ‘next big thing’?

We think blockchain could have a big future. Some people are even saying it will revolutionize government, cutting spending by huge amounts. If blockchain transactions were used to buy things, then sales tax and the various amounts due to retailers, wholesalers and manufacturers could be paid immediately and automatically. The salesperson could have their blockchain credit straight away too.

Blockchains could remove huge levels of inefficiency and much of the potential for fraud. They could also put a significant number of jobs at risk, as reflected in John Vincent’s article on the future of employment.

ASSURITY: Cyber Value at Risk calculations

Posted on : 30-07-2015 | By : richard.gale | In : Cyber Security, Innovation

If the assumption that cyber attacks are inevitable is true, then what can you do? One approach is to pour unlimited amounts of money into the black hole of IT security. Another, more sensible, approach is risk-based: weighing the likelihood, the form and the cost of an attack against the cost of avoidance or mitigation.

Our ASSURITY Information Risk Assessment calculates Cyber Value at Risk (CVaR) based on a number of criteria including industry, size, profile, interfaces and level of regulation, among other factors. What it provides is the hard facts and costs that company directors demand to ensure they are obtaining value from their information security investment and that it is directed to the right places.

Building a credible method of estimating and quantifying risk is essential to the process of risk management. The very public breaches at Sony, Target & Ashley Madison mask the multitude that do not make the press. In the UK there is little incentive to highlight a breach, but new legislation will change that for organisations in the next year. So, given that cyber attacks are “inevitable”, how can the economic impact be calculated for a particular organisation?

The World Economic Forum recently released its report “Partnering for Cyber Resilience: Towards the Quantification of Cyber Threats”, which calls for the application of VaR modelling techniques to cyber security. The report describes the characteristics a good cyber-oriented economic risk model should have, but it doesn’t specify any particular model. Here, we consider the concept of “value at risk”, what it means and how it can be applied to cyber risk, and describe how a CVaR model is implemented in our ASSURITY product.

At Broadgate we have carried out a significant number of security assessments, so we can draw on that data and supplement it with simulated information based on a set of assumptions and factors related to an organisation. We combine this with value-at-risk techniques from the financial markets to build out Cyber VaR, modelling:

  • Assets – the network infrastructure of an organisation
  • Values – the potential losses located in those assets: service disruption, intellectual property, compliance failures etc.
  • Market changes – increases and decreases in the incidence of attacks and their effectiveness

Using this data and historic information, the CVaR can be calculated with growing certainty, so the risks and costs of an attack can be computed with confidence. The challenges lie in modelling the network, the values and the market changes!
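
As a purely illustrative sketch of the underlying idea – not the ASSURITY model itself, whose factors and weightings are proprietary – a simple Monte Carlo simulation can combine an assumed attack frequency and an assumed loss distribution for each asset, then read the annual loss off at a chosen confidence level. Every number below is invented for demonstration:

import numpy as np

rng = np.random.default_rng(42)
simulations = 100_000

# Hypothetical asset classes: (expected attacks per year, median loss in £, loss spread)
assets = {
    "customer_database": (0.6, 250_000, 1.0),
    "trading_platform": (0.3, 900_000, 0.8),
    "corporate_email": (1.5, 40_000, 1.2),
}

annual_losses = np.zeros(simulations)
for frequency, median_loss, sigma in assets.values():
    # Number of successful attacks per simulated year: Poisson; size of each loss: lognormal.
    counts = rng.poisson(frequency, size=simulations)
    for i, n in enumerate(counts):
        if n:
            annual_losses[i] += rng.lognormal(np.log(median_loss), sigma, size=n).sum()

confidence = 0.95
cvar = np.quantile(annual_losses, confidence)
print(f"Simulated annual cyber loss at {confidence:.0%} confidence: £{cvar:,.0f}")

In practice the hard work is exactly what the text above describes: estimating credible frequencies, loss distributions and their movement over time for a specific organisation.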

So why does CVaR matter? Cyber security, like most control mechanisms, comes down to risk management. Risk management needs real information and figures in order to be useful to a business. Without them it is just guesswork, which can end up focusing on the wrong areas, resulting in overspending and gaps in defences.

Different organisations, sectors and organisational profiles have differing risk profiles and exposures. Companies also have different risk appetites (which change at different stages of their development). So understanding YOUR Cyber Value at Risk is a significant aid to understanding the risks to your organisation, the potential losses and how to focus your cyber investment. Broadgate’s ASSURITY product can help articulate the risks, the costs and the best path to resolution.

The ASSURITY product is differentiated from other methodologies by being the most complete and accurate assessment that organisations can undertake to really understand their security risk exposure.

If you would like to find out more about the product and to arrange a demo, please contact jo.rose@broadgateconsultants.com or call +44(0)203 326 8000 to speak to one of our security consultants.

 

Broadgate Big Data Dictionary

Posted on : 28-10-2014 | By : richard.gale | In : Data

A couple of years back we were getting to grips with big data and thought it would be worthwhile putting a couple of articles together to help explain what the fuss was all about. Big Data is still here and its adoption is growing, so we thought it would be worthwhile updating and re-publishing them. Let us know what you think.

We have been interested in Big Data concepts and technology for a while. There is a great deal of interest and discussion with our clients and associates on the subject of obtaining additional knowledge & value from data.

As with most emerging ideas there are different interpretations and meanings for some of the terms and technologies (including the thinking that ‘big data’ isn’t new at all but just a new name for existing methods and techniques).

With this in mind we thought it would be useful to put together a few terms and definitions that people have asked us about recently to help frame Big Data.

We would really like to get feedback, useful articles & different views on these to help build a more definitive library of Big Data resources.

Analytics 

Big Data Analytics is the processing and searching through large volumes of unstructured and structured data to find hidden patterns and value. The results can be used to further scientific or commercial research, identify customer spending habits or find exceptions in financial, telemetric or risk data to indicate hidden issues or fraudulent activity.

Big Data Analytics is often carried out with software tools designed to sift and analyse large amounts of diverse information being produced at enormous velocity. Statistical tools used for predictive analysis and data mining are utilised to search and build algorithms.

Big Data

The term Big Data describes amounts of data that are too big for conventional data management systems to handle. The volume, velocity and variety of data overwhelm databases and storage. The result is that data is either discarded or left unanalysed and unmined for value.

Gartner has coined the term ‘Extreme Information Processing’ to describe Big Data – we think that’s a pretty good term to describe the limits of capability of existing infrastructure.

There has always been “big data” in the sense that data volumes have always exceeded the ability of systems to process them. The tool sets to store, analyse and make sense of the data generally lag behind the quantity and diversity of information sources.

The actual amounts and types of data that Big Data relates to are constantly being redefined, as database and hardware manufacturers keep moving those limits forward.

Several technologies have emerged to manage the Big Data challenge. Hadoop has become a favourite tool to store and manage the data, traditional database manufacturers have extended their products to deal with the volumes, variety and velocity and new database firms such as ParAccel, Sand & Vectorwise have emerged offering ultra-fast columnar data management systems. Some firms, such as Hadapt, have a hybrid solution utilising tools from both the relational and unstructured world with an intelligent query optimiser and loader which places data in the optimum storage engine.

Business Intelligence

The term Business Intelligence (BI) has been around for a long time, and the growth of data and then Big Data has focused more attention on this space. The essence of BI is to obtain value from data to help build business benefits. Big Data itself could be seen as BI – a set of applications, techniques and technologies applied to an entity’s data to help produce insight and value from it.

There are a multitude of products that help build Business Intelligence solutions – ranging from the humble Excel to sophisticated (aka expensive) solutions requiring complex and extensive infrastructure to support. In the last few years a number of user friendly tools such as Qlikview and Tableau have emerged allowing tech-savvy business people to exploit and re-cut their data without the need for input from the IT department.

Data Science

This is, perhaps, the most exciting area of Big Data. This is where the Big Value is extracted from the data. One of our data scientist friends described it as follows: “Big Data is the plumbing and Data Science is the value driver…”

Data Science is a mixture of scientific research techniques, advanced programming and statistical skills (or hacking), philosophical thinking (perhaps previously known as ‘thinking outside the box’) and business insight. Basically, it is being able to think of new or different questions to ask, be technically able to translate them into a machine-based format, process the results, interpret them and then ask new questions based on the results of the previous set…

A diagram by blogger Drew Conway describes some of the skills needed – which maybe explains the lack of skills in this space!

 

In addition, Pete Warden (creator of the Data Science Toolkit) and others have raised caution about the term Data Science – “Anything that needs science in the name is not a real science” – while confirming the need for a definition of what Data Scientists do.

Database

Databases can generally be divided into structured and unstructured.

Structured databases are the traditional relational database management systems such as Oracle, DB2 and SQL Server, which are fantastic at organising large volumes of transactional and other data, with the ability to load and query the data at speed and with integrity in the transactional process to ensure data quality.

Unstructured databases are technologies that can deal with any form of data thrown at them and then distribute it out across a highly scalable platform. Hadoop is a good example, and a number of firms now produce, package and support the open-source product.

Feedback Loops

Feedback loops are systems where the output from the system is fed back into it to adjust or improve the system’s processing. Feedback loops exist widely in nature and in engineering – think of an oven: heat is applied to warm it to a specific temperature, which is measured by a thermostat; once the correct temperature is reached the thermostat tells the heating element to shut down, until feedback from the thermostat says it is getting too cold and it turns on again… and so on.
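
In code, the oven is simply a loop in which the measured output drives the next input. A toy sketch in Python, with invented heating and cooling rates:

# Toy bang-bang feedback loop in the spirit of the oven example above.
# The physics (heating and cooling rates, 2-degree tolerance) is invented for illustration.
target, temperature, heater_on = 180.0, 20.0, False

for minute in range(60):
    # Feedback: the measurement decides whether the heater is switched on or off.
    if temperature < target - 2:
        heater_on = True
    elif temperature > target + 2:
        heater_on = False
    temperature += 8.0 if heater_on else -3.0  # crude heating / cooling model
    print(f"minute {minute:2d}: {temperature:6.1f} C, heater {'on' if heater_on else 'off'}")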

Feedback loops are an essential part of extracting value from Big Data. Building in feedback and then incorporating Machine Learning methods starts to allow systems to become semi-autonomous; this lets Data Scientists focus on new and more complex questions whilst testing and tweaking the feedback from their previous systems.

Hadoop

Hadoop is one of the key technologies supporting the storage and processing of Big Data. Hadoop grew out of Google’s published work on its distributed Google File System and MapReduce processing tools. It is an open-source product under the Apache banner but, like Linux, is distributed by a number of commercial vendors that add support, consultancy and advice on top of the product.

Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework.

So Hadoop could almost be seen as a (big) bucket into which you can throw any form and quantity of data: it will organise the data, know where it resides, and be able to retrieve and process it. It also accepts that there may be holes in the bucket and can patch itself up by using additional resources – all in all, a very clever bucket!

Hadoop runs on a scheduling basis, so when a question is asked it breaks the query up, shoots the pieces out to different parts of the distributed network in parallel, and then waits for and collates the answers.

Hive

Hive provides a high level, simple, SQL type language to enable processing of and access to data stored in Hadoop files. Hive can provide analytical and business intelligence capability on top of Hadoop. The Hive queries are translated into a set of MapReduce jobs to run against the data. The technology is used by many large technology firms in their products including Facebook and Last.FM. The latency/batch related limitations of MapReduce are present in Hive too but the language allows non-Java programmers to access and manipulate large data sets in Hadoop.

Machine Learning

Machine learning is one of the most exciting concepts in the world of data. The idea is not new at all, but the focus on utilising feedback loops of information, and algorithms that take actions and change depending on the data without manual intervention, could improve numerous business functions. The aim is to find new or previously unknown patterns and linkages between data items to obtain additional value and insight. An example of machine learning in action is Netflix, which is constantly trying to improve its movie recommendation system based on a user’s previous viewing, their characteristics, and the viewing of other customers with a similar set of attributes.
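
As a toy illustration of the Netflix-style idea, the sketch below scores unseen films for a user by weighting other users’ ratings by how similar their tastes are (cosine similarity). The ratings, names and films are invented:

import numpy as np

users = ["alice", "bob", "carol"]
films = ["Alien", "Amelie", "Blade Runner", "Brief Encounter"]
ratings = np.array([
    [5, 1, 4, 0],   # alice (0 = not yet watched)
    [4, 0, 5, 1],   # bob
    [1, 5, 0, 4],   # carol
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def recommend(user_idx):
    # Weight every other user's ratings by how similar their taste is to this user's...
    weights = np.array([cosine(ratings[user_idx], ratings[j]) if j != user_idx else 0.0
                        for j in range(len(users))])
    scores = weights @ ratings
    # ...then suggest the best-scoring film this user has not already rated.
    unseen = ratings[user_idx] == 0
    return films[int(np.argmax(np.where(unseen, scores, -np.inf)))]

print(recommend(0))   # alice's tastes are closest to bob's, so expect "Brief Encounter"

A production system adds feedback: each new viewing updates the ratings matrix, which changes the next round of recommendations without manual intervention.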

MapReduce

Mapreduce is a framework for processing large amounts of data across a large number of nodes or machines.

Map Reduce diagram (courtesy of Google): http://code.google.com/edu/parallel/img/mrfigure.png

MapReduce works by splitting (or mapping) requests into multiple separate tasks to be performed on many nodes of the system, and then collating and summarising (or reducing) the results back into the outputs.

The Hadoop implementation of MapReduce is based on the Java language and underpins a number of the higher-level tools (Hive, Pig) used to access and manipulate large data sets.

Google (amongst others) developed and use this technology to process large amounts of data (such as documents and web pages trawled by its web crawling robots). It allows the complexity of parallel processing, data location and distribution and also system failures to be hidden or abstracted from the requester running the query.
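
The classic illustration of the pattern is a word count. The sketch below runs both phases in a single Python process, purely to show the shape of the computation that Hadoop distributes across a cluster:

from collections import defaultdict
from functools import reduce

documents = [
    "big data needs big tools",
    "map reduce splits big jobs",
]

# Map phase: each document is processed independently and emits (word, 1) pairs.
def map_phase(doc):
    return [(word, 1) for word in doc.split()]

# Shuffle: group the intermediate pairs by key (the word).
grouped = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        grouped[word].append(count)

# Reduce phase: combine the values for each key into a single result.
def reduce_phase(word, counts):
    return word, reduce(lambda a, b: a + b, counts)

print(dict(reduce_phase(w, c) for w, c in grouped.items()))
# {'big': 3, 'data': 1, 'needs': 1, 'tools': 1, 'map': 1, 'reduce': 1, 'splits': 1, 'jobs': 1}

Because each map task touches only its own document and each reduce task only its own key, both phases can be spread across as many machines as are available – which is exactly the property Hadoop exploits.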

MPP

MPP stands for massively parallel processing, and it is the concept that gives the ability to process the volume (and velocity and variety) of data flowing through systems. Chip processing capabilities are always increasing, but to cope with the even faster-growing amounts of data, processing needs to be split across multiple engines. Technology that can split requests into equal(ish) chunks of work, manage the processing and then join the results has been difficult to develop. MPP can be centralised, with a cluster of chips or machines in a single or closely coupled cluster, or distributed, where the power of many distributed machines is used (think of ‘idle’ desktop PCs being used overnight as an example). Hadoop utilises many distributed systems for data storage and processing and also has fault tolerance built in, which enables processing to continue despite the loss of some of those machines.

NoSQL

NoSQL really means ‘not only SQL’; it is the term used for database management systems that do not conform to the traditional RDBMS model (transaction-oriented data management systems based on the ACID principles). These systems were developed by technology companies in response to the challenges raised by high volumes of data. Amazon, Google and Yahoo built NoSQL systems to cope with the tidal wave of data generated by their users.

Pig

Apache Pig is a platform for analysing huge data sets. It has a high-level language called Pig Latin which is combined with a data management infrastructure which allows high levels of parallel processing. Again, like Hive, the Pig Latin is compiled into MapReduce requests. Pig is also flexible so additional functions and processing can be added by users for their own specific needs.

Real Time

The challenges of processing the “V”s of big data (volume, velocity and variety) have meant that some requirements have been compromised. In the case of Hadoop and MapReduce, the compromise has been the interactive or instant availability of results. MapReduce is batch-oriented, in the sense that requests are sent for processing, scheduled to run and then the output summarised. This works fine for the original purposes, but the demand for more real-time or interactive results is growing. With a ‘traditional’ database or application, users expect results to be available instantly or pretty close to it. Google and others are developing more interactive interfaces to this kind of data: examples include Apache Drill (inspired by Google’s Dremel) and Storm, which Twitter has released. We see this as one of the most interesting areas of development in the Big Data space at the moment.

 

Over the next few months we have some guest contributors penning their thoughts on the future of big data, analytics and data science. Also, don’t miss Tim Seears’s (TheBigDataPartnership) article on maximising value from your data, “Feedback Loops”, published here in June 2012.

For the technically minded, Damian Spendel has also published some worked examples using the ‘R’ language for Data Analysis and Value at Risk calculations.

These are our thoughts on the products and technologies – we would welcome any challenges or corrections and will work them into the articles.

 

Data Analysis – An example using R

Posted on : 31-08-2014 | By : richard.gale | In : Data

With the growth of Big Data and Big Data Analytics the programming language R has become a staple tool for data analysis.   Based on modular packages (4000 are available), it offers sophisticated statistical analysis and visualization capabilities.   It is well supported by a strong user community.   It is also Open Source.

For the current article I am assuming an installation of R 3.1.1[1] and RStudio[2].

The article will cover the steps taken to provide a simple analysis of highly structured data using R.

The dataset I will use in this brief demonstration of some basic R capabilities is from a peer-to-peer Lending Club in the United States[3].  The dataset is well structured with a data dictionary and covers both loans made and loans rejected.   I will use R to try and find answers to the following questions:

  • Is there a relationship between Loan amounts and funded amounts?
  • Is there a relationship between the above and the number of previous public bankruptcies?
  • What can we find out about rejections?
  • What can we find out about the geographic distribution of the Lending Club loans?

During the course of this analysis we will use basic commands R to:

  • Import data
  • Plot data using scatterplots and regression lines
  • Use functions to perform heavy lifting
  • Summarize data using simple aggregation functions
  • Plot data using choropleths

Having downloaded the data to our working directory we’ll import the three files using read.csv and merge them together using rbind() (row bind):

>data_lending0 <- read.csv("LoanStats3a.csv", header = FALSE)

>data_lending1 <- read.csv("LoanStats3b.csv", header = FALSE)

>data_lending2 <- read.csv("LoanStats3c.csv", header = FALSE)

>data_full <- rbind(data_lending0, data_lending1, data_lending2)

We can now explore the data using some of R’s in-built functions for metadata – str (structure), names (column names), unique (unique values of a variable).

The first thing I will do is use ggplot to build a simple scatter plot showing the relationship between the funded amount and the loan amount.

>install.packages("ggplot2")

>library(ggplot2)

>ggplot(data_full, aes(x=loan_amnt, y=funded_amnt)) + geom_point(shape=1) + geom_smooth(method=lm)

The above three lines install the package ggplot2 from a CRAN mirror, load the library into the R environment, and then use the “Grammar of Graphics” to build a plot from the data_full dataset, with the x-axis showing the principal (loan_amnt) and the y-axis showing the lender’s contribution (funded_amnt). With geom_smooth we add a line to help see patterns – in this case a line of best fit.

In R Studio we’ll now see the following plot:

[Scatter plot: funded_amnt against loan_amnt, with a fitted regression line]

This shows us clearly that the Lending Club clusters loans at the lower end of the spectrum and that there is a clear positive correlation between loan_amnt and funded_amnt – for every dollar you bring you can borrow a dollar; there is little scope for leverage here. Other ggplot functions will allow us to tidy up the labelling and colours, but I’ll leave that as an exercise for the interested reader.

The next step is to add an additional dimension – and investigate the link between principals and contributions in the light of the known public bankruptcies of the applicants.

>ggplot(data_full, aes(x=loan_amnt, y=funded_amnt, color=pub_rec_bankruptcies)) + geom_point(shape=19,alpha=0.25) + geom_smooth(method=lm)

Here, I’ve used the color element to add the additional dimension and attempted to improve the legibility of the visualization by making the points more transparent.

[Scatter plot: funded_amnt against loan_amnt, coloured by pub_rec_bankruptcies]

Not very successfully – it doesn’t help us much further; maybe sampling could improve the visualization, or a more focused view…

Let’s have a quick look at the rejection statistics:

> rejections <- rbind(read.csv("RejectStatsA.csv"), read.csv("RejectStatsB.csv"))

> nrow(rejections)/nrow(data_full)

[1] 6.077912

Roughly six rejections for every loan made.

Another popular method of visualization is the choropleth (“many places”) map. In this case, we’ll build a map showing outstanding loans by state.

The bad news is that the Lending Club data uses two-letter codes, while the state data we’ll use from the maps package (install.packages, library etc.…) uses full names. Fortunately, a quick search provides a function “stateFromLower”[4] that will perform the conversion for us. So, I run the code that creates the function and then add a new column called state (“wyoming”) to the data_full dataset by using stateFromLower to convert the addr_state column (“WY”):

> data_full$state <- stateFromLower(data_full$addr_state)

Then, I aggregate the funded amounts by state:

> loan_amnts <- aggregate(data_full$funded_amnt, by=list(data_full$state), FUN=sum)

Load the state data:

> data(state)

The next code leans heavily on a succinct tutorial provided elsewhere[5]:

> map("state", fill=FALSE, boundary=TRUE, col="red")

> mapnames <- map("state", plot=FALSE)$names

Remove regions:

> region_list <- strsplit(mapnames, ":")

> mapnames2 <- sapply(region_list, "[", 1)

Match the region-less map names to the state names and pull out the aggregated loan amounts:

> m <- match(mapnames2, tolower(state.name))

> loans <- loan_amnts$x

Bucketize the aggregated loans:

> loan.buckets <- cut(loans, c("500000", "1000000", "5000000", "10000000", "15000000", "20000000", "30000000", "40000000", "90000000", "100000000"))

Define the colour schema:

> clr <- rev(heat.colors(13))

Draw the map and add the title:

> map("state", fill=TRUE, col=clr[loan.buckets], projection = "polyconic")

> title("Lending Club Loan Amounts by State")

Create a legend:

> leg.txt <- c("$500000", "$1000000", "$5000000", "$10000000", "$15000000", "$20000000", "$30000000", "$40000000", "$90000000", "$100000000")

> legend("topright", leg.txt, horiz = FALSE, fill = clr)

With a few simple lines of code, R has demonstrated that it is quite a powerful tool for generating visualizations that can aid in understanding and analyzing data. We were able to learn something about the Lending Club – it almost seems as if we have clear KPIs in terms of rejections and in terms of loan amounts against funded amounts. I think we can also see a link between the Lending Club’s presence and poverty[6].

This could be a starting point for a more detailed analysis into the Lending Club, to support investing decisions, or internal business decisions (reduce rejections, move into Wyoming etc.).

Guest author: Damian Spendel – Damian has spent his professional life bringing value to organisations with new technology. He is currently working for a global bank helping them implement big data technologies. You can contact Damian at damian.spendel@gmail.com

Is it possible to prevent those IT Failures?

Posted on : 30-05-2014 | By : richard.gale | In : Cyber Security

Last month we counted down our Top 10 Technology Disasters. Here are some of our tips on project planning  which may help avoid failure in the future.

Objectives

What is the project trying to achieve? The objectives should be clear, and everyone involved in the project, including the recipients of the solution, needs to know what they are. Having unclear or unstated goals will not only reduce the chances of success but also make it unclear what ‘success’ is if it occurs.

Value 

The value of the project to the organisation needs to be known and ‘obvious’. Too many projects start without this basic condition.

If the organisation is no better off after the project has been completed then there is little point starting it. Better off can be defined in many ways – business advantage/growth, cost savings/efficiency, internal/external push (e.g. something will break or an auditor or regulator requires it to be done).

Projects are too often initiated for unclear or obscure reasons, ranging from “we have some budget to spend on something”, through “we would like to play with this new technology and need a project to enable us to do this”, to “we’ve started so we’ll finish” when the business has changed or moved on to other priorities.

Having a clear understanding of the value of the work and a method of measuring success through and after the project has delivered should be a fundamental part of any change process.

Scale

Large projects are difficult. Some projects need to be large – there would be little point building half of London’s Crossrail tunnels – but large projects seem more likely to fail (or at least get more publicity when they do). Complexity rises exponentially as projects grow, due to the increase in the connectivity of the risks, issues, logistics and number of people involved.

Breaking projects down into manageable pieces increases the likelihood of successful outcomes. The projects need to be woven into an overall programme or framework to ensure the sum of the parts does end up equalling the whole though.

Duration

In a similar vein to Scale above, projects with an extended duration are less likely to achieve full value. Businesses are not static; they change over time, and their objectives and goals change with them. The longer a project runs, the more likely it is that what the business requires now is not what is being delivered.

As outlined above, some projects are so large that they will run for multiple years; if they do, clear milestones need to be set on a much shorter timescale to avoid a loss of control (in terms of scope, time and cost). Regular review points should also be built into lengthy projects to reconfirm that business objectives are still being met – ultimately, that the change is still required…

Accountability

Nothing new here, but someone with both an interest in the outcome and the seniority to ensure acceptance should be accountable for the success of the project. If the key stakeholder is not engaged in owning and driving the project through to completion, then the chances of a successful outcome are greatly diminished.

Empowerment

The other side of Accountability is Empowerment. Successful projects need empowered teams that understand the objectives of the project and their important part within it, and that are able to make decisions to guide it to completion. Projects run with a top-down or command-and-control philosophy may succeed, but the person making all the decisions needs to be right all the time. Teams go into reactive or ‘follow without questioning’ modes of operating, which increases the likelihood that a wrong decision will be made and accepted, resulting in project failure.

In conclusion: make sure the project goals are clear and that it is adding value to the business, keep it short, ensure senior leadership buy-in, and make sure the team can make the right decisions! If only it were that easy…

What Tech companies can learn from Banks – There’s no such thing as a free lunch.

Posted on : 23-12-2013 | By : richard.gale | In : Finance, Innovation

Rewind to the 1990’s 

In the early ’90s I moved from a Californian software start-up to a venerable merchant bank in the City. There were a number of changes in culture, which included being admonished for not wearing a jacket when walking through reception, but the most thrilling was the staff restaurant… It was free and you could eat as much as you liked. I couldn’t believe my luck! Older hands complained that they could no longer have a glass or two of wine with lunch (also free) and that the breakfasts ‘weren’t like they used to be’. I was amazed that the bank could afford to give away so much, and munched my way through most of the decade there.

Banking Innovation

Apart from the impact on my waistline it was an exciting time. Historically, banking had been a straightforward affair, but now there was ever-increasing demand for new, innovative financial services, and the profitability of these could be immense. I worked with the derivatives team for a while and they made huge amounts of money creating and selling bonds to give exposure to emerging stock markets. These securities were complex, and an increasing number of mathematical wizards and PhDs came into the department, attracted by the dual carrots of banking becoming fun and fashionable (again) and vast quantities of money delivered in bonuses.

Complexity

As more rivals joined the market the products became more complex and I, for one, soon lost the ability to work out what the underlying securities and risks were. I’m sure the clients had a better understanding than I did, but I was not always convinced.

The drive for new and more exotic products accelerated and the intake of post-doctorates rose further. There were a number on the team that really were ‘rocket scientists’… The bank was increasing sales and profitability through the creation and trading of these products and everything was good.

Financial services firms moved into new sectors in which they did not always have the same level of expertise, with varying degrees of success.

Concentration

As the demand for products grew and the supply of brainpower was limited, the inevitable happened and the price of skilled product innovators and traders went up. There were also a significant number of acquisitions, with global banks buying companies or teams with the lure of huge bonuses. This led to a concentration of skills within a small number of large organisations.

Costs

In addition to the high price of the rainmakers, the cost of settling, accounting for and monitoring the trades was rising. Small departments that could rely on shared knowledge (and a degree of shouting) became too large, and compliance forced a separation of roles (particularly after the Barings affair). This resulted in a much-increased level of fixed people costs for the banks, which was fine when business was growing but a heavy burden if growth slowed, as it was difficult to reduce the number of people without the processes failing.

Over-stretched

It is probably fair to say that a number of financial services firms over-stretched themselves financially – and some legally and morally too – in pursuit of continued profitability and growth.

The now global, complex web of front- and back-office interactions – teams of people trading and ensuring the successful completion of trades – needed to be fed with ever more new types of products. Some of these (such as sliced-and-diced packages of mortgage-backed security derivatives) contributed to the banking crisis of 2008. Over-ambitious expansion plans through acquisition and merger increased unwise leverage further.

Lessons learnt

Financial services is now one of the most highly regulated and controlled industries in the world. The cost of doing business is extremely high, as reflected in the large amounts of money being spent on regulatory and compliance projects. This is resulting in a smaller number of larger organisations running most of the global banking sector, with reduced opportunities and, perhaps, less inclination to be innovative.

 

Fast-forward to now

 

Tech companies are massively innovative with new and exciting products emerging all the time

Technology products are the most exciting and most accessible they have ever been.

Tech companies can be immensely valuable and command huge stock market valuations

Too many to mention here… Google, Facebook, Twitter, Amazon, anything new etc.

Tech companies have virtually unlimited amounts of money available to them.

Tech companies attract the top talent

Tech companies are fashionable, can pay well and have the additional attraction of huge bonuses in the form of share options.

 How these applications work is  a mystery to the average consumer

We use, buy and promote products often without understanding or even caring about where our data is going because they make our lives easier or more fun.

 There are a small number of large companies dominating the market

Any small, innovative company is snapped up by one of the global giants; they have very deep pockets and price is almost irrelevant to them compared with market share.

Costs are increasing

The older, more established firms (Microsoft, Oracle etc.) have large cost bases, which impacts their ability to innovate and also means a lot of hungry mouths to feed, eating into money potentially better used elsewhere, such as on R&D. ‘Newer’ tech firms may not have reached that point yet but will sometime soon (Google’s employee numbers increased from 20,000 in 2010 to 50,000+ in 2013). Not all of those people can be working on front-line product innovation.

Tech companies provide free lunch

Most technology companies are desperate to retain their valuable staff and so provide many mechanisms to do this: free massages, childcare and lunches. I have been reliably informed that Google provides an unlimited buffet at its London campus, including lobster…

 

When will technology companies stop serving free lunches?

The technology sector is part-way through a sustained boom. How long it will last is anyone’s guess, but it would be good to think that the leaders of these companies can learn from the past and from the mistakes made by some of the banks. They should think about how they are growing, what areas they are getting into, how well they understand the products and risks, what impact this has on the agility and complexity of their business, and how they can prevent a drift towards complacency. How to stay aligned with the interests of your customers whilst remaining profitable in the interests of your shareholders – it’s a difficult challenge when high levels of growth have been the norm.

If Tech companies do not recognise this and change then living up to those company slogans may get harder as employee numbers swell and profits get squeezed.

 
