We look at football scores on the BBC web page
as an example of messy data that is not neatly
organised in tables.
There are countless ways to approach this. I began by
thinking of applying regular expressions to strip out the bits
of data I was interested in, which would have taken a few
iterations to get what I wanted, but then I tried using Excel
instead, which worked with no bother!
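
For comparison, here is a rough sketch of what the regex route could look like in base R. The input lines and the "Home 2-1 Away" layout are assumptions made up for illustration; the real BBC page text may be laid out differently, and the names `score_lines`, `pattern` and `results` are just placeholders.

```r
# Made-up result lines, assumed to look like "Home 2-1 Away".
# These are illustrative only, not real scores or the real page format.
score_lines <- c(
  "Arsenal 2-1 Chelsea",
  "Stoke City 0-0 West Ham United",
  "Leicester City 1-3 Arsenal"
)

# Four capture groups: home team, home goals, away goals, away team.
pattern <- "^(.+) ([0-9]+)-([0-9]+) (.+)$"

# regexec() finds the match and its groups; regmatches() extracts them.
# Each element is c(full match, group 1, ..., group 4).
parts <- do.call(rbind, regmatches(score_lines, regexec(pattern, score_lines)))

results <- data.frame(
  home       = parts[, 2],
  home_goals = as.integer(parts[, 3]),
  away_goals = as.integer(parts[, 4]),
  away       = parts[, 5],
  stringsAsFactors = FALSE
)

print(results)
```

Anything that does not fit the assumed pattern is silently dropped here, which is exactly why this route tends to need a few iterations on real page text.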
I then got the data into R and applied the dplyr and tidyr
packages to get useful outputs, so that I could see how
often Arsenal won at home and away. The approach shown can
easily be extended to show how often there are 3 or more
goals, how often there are clean sheets, how often both
teams score, or whatever else we want.
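
As a minimal sketch of that kind of summary, assuming the results have already been tidied into one row per match: the data frame below is made up for illustration (not real scores), and the column names `home`, `away`, `home_goals` and `away_goals` are my own choice rather than anything the BBC page provides. Here I have read "3 or more goals" as goals in the match as a whole.

```r
library(dplyr)
library(tidyr)

# Made-up matches for illustration only, one row per match.
matches <- data.frame(
  home       = c("Arsenal", "Chelsea", "Arsenal", "Everton"),
  away       = c("Chelsea", "Arsenal", "Everton", "Arsenal"),
  home_goals = c(2L, 1L, 0L, 1L),
  away_goals = c(1L, 1L, 0L, 3L),
  stringsAsFactors = FALSE
)

# Reshape to one row per team per match, so home and away games
# can be summarised in the same way.
long <- matches %>%
  pivot_longer(cols = c(home, away), names_to = "venue", values_to = "team") %>%
  mutate(
    scored   = if_else(venue == "home", home_goals, away_goals),
    conceded = if_else(venue == "home", away_goals, home_goals)
  )

# Summarise one team, split by venue.
arsenal <- long %>%
  filter(team == "Arsenal") %>%
  group_by(venue) %>%
  summarise(
    played       = n(),
    won          = sum(scored > conceded),
    three_plus   = sum(scored + conceded >= 3),   # 3 or more goals in the match
    clean_sheets = sum(conceded == 0),
    both_scored  = sum(scored > 0 & conceded > 0),
    .groups = "drop"
  )

print(arsenal)
```

Each extra question is just one more line inside `summarise()`, which is what makes the approach easy to extend.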
For example, we could add when red cards are shown and
look at how often a team with 10 players ends up losing,
to see how that differs from normal. Basically, we can use
data to guide us.
Finally, we look at JSON and some of the services out
there that can provide us with football data in a more
machine-friendly way, and at a webpage which provides us
with the kind of data that we just produced. Being able to
do it ourselves, though, means that we can do things that
they haven't.