Archive for the ‘Data Analysis and Visualisation’ Category

A brain systems visualisation tool

January 4, 2011

brainSCANr.

This looks like a fantastic piece of visualisation – but one that should also prove useful as a research tool.

The Brain Systems, Connections, Associations, and Network Relationships (a phrase with more words than strictly necessary in order to bootstrap a good acronym) assumes that somewhere in all the chaos and noise of the more than 20 million papers on PubMed, there must be some order and rationality.

To that end, we have created a dictionary of hundreds of brain region names, cognitive and behavioral functions, and diseases (and their synonyms!) to find how often any two phrases co-occur in the scientific literature. We assume that the more often two terms occur together (at the exclusion of those words by themselves, without each other), the more likely they are to be associated.

Are there problems with this assumption? Yes, but we think you’ll like the results anyway. Obviously the database is limited to the words and phrases with which we have populated it. We also assume that when words co-occur in a paper, that relationship is a positive one (i.e., brain areas A and B are connected, as opposed to not connected). Luckily, there is a positive publication bias in the peer-reviewed biomedical sciences that we can leverage to our benefit (hooray biases)! Furthermore, we cannot dissociate English homographs; thus, a search for the phrase “rhythm” (to ascertain the brain regions associated with musical rhythm) gives the strongest association with the suprachiasmatic nucleus (that is, for circadian rhythms!).

Despite these limitations, we believe we have created a powerful visualization tool that will speed research and education, and hopefully allow for the discovery of new, previously unforeseen connections between brain, behavior, and disease.
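For the quantitatively minded, the scoring idea is simple enough to sketch in a few lines of Python. The Jaccard-style measure and the hit counts below are invented for illustration – brainSCANr’s actual formula may well differ:

```python
def association_score(n_a: int, n_b: int, n_ab: int) -> float:
    """Jaccard-style association between two literature terms.

    n_a  -- papers mentioning term A
    n_b  -- papers mentioning term B
    n_ab -- papers mentioning both terms

    Dividing by the union penalises papers that use either term alone,
    mirroring the "at the exclusion of those words by themselves" idea.
    """
    union = n_a + n_b - n_ab
    return n_ab / union if union else 0.0

# Invented PubMed-style hit counts for two hypothetical term pairs:
print(association_score(n_a=5200, n_b=3100, n_ab=740))  # frequently co-occurring
print(association_score(n_a=5200, n_b=3100, n_ab=12))   # rarely co-occurring
```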

H/T: Marsha Lucas

Mismeasuring scientific quality (and an argument in favour of diversity of measurement systems)

December 27, 2010

There was a short piece here recently on the misuse of impact factors to measure scientific quality, and how this in turn leads to dependence on drugs like Sciagra™ and other dangerous variants such as Psyagra™ and Genagra™.

Here’s an interesting and important post from Michael Nielsen on the mismeasurement of science. The essence of his argument is straightforward: unidimensional reduction of a multidimensional variable set is going to lead to significant loss of important information (or at least that’s how I read it):

My argument … is essentially an argument against homogeneity in the evaluation of science: it’s not the use of metrics I’m objecting to, per se, rather it’s the idea that a relatively small number of metrics may become broadly influential. I shall argue that it’s much better if the system is very diverse, with all sorts of different ways being used to evaluate science. Crucially, my argument is independent of the details of what metrics are being broadly adopted: no matter how well-designed a particular metric may be, we shall see that it would be better to use a more heterogeneous system.

Nielsen notes three problems with centralised metrics (whether that means relying solely on an h-index, citation counts, publication counts, or whatever else you fancy):

Centralized metrics suppress cognitive diversity: Over the past decade the complexity theorist Scott Page and his collaborators have proved some remarkable results about the use of metrics to identify the “best” people to solve a problem (ref, ref).

Centralized metrics create perverse incentives: Imagine, for the sake of argument, that the US National Science Foundation (NSF) wanted to encourage scientists to use YouTube videos as a way of sharing scientific results. The videos could, for example, be used as a way of explaining crucial-but-hard-to-verbally-describe details of experiments. To encourage the use of videos, the NSF announces that from now on they’d like grant applications to include viewing statistics for YouTube videos as a metric for the impact of prior research. Now, this proposal obviously has many problems, but for the sake of argument please just imagine it was being done. Suppose also that after this policy was implemented a new video service came online that was far better than YouTube. If the new service was good enough then people in the general consumer market would quickly switch to the new service. But even if the new service was far better than YouTube, most scientists – at least those with any interest in NSF funding – wouldn’t switch until the NSF changed its policy. Meanwhile, the NSF would have little reason to change their policy, until lots of scientists were using the new service. In short, this centralized metric would incentivize scientists to use inferior systems, and so inhibit them from using the best tools.

Centralized metrics misallocate resources: One of the causes of the financial crash of 2008 was a serious mistake made by rating agencies such as Moody’s, S&P, and Fitch. The mistake was to systematically underestimate the risk of investing in financial instruments derived from housing mortgages. Because so many investors relied on the rating agencies to make investment decisions, the erroneous ratings caused an enormous misallocation of capital, which propped up a bubble in the housing market. It was only after homeowners began to default on their mortgages in unusually large numbers that the market realized that the ratings agencies were mistaken, and the bubble collapsed. It’s easy to blame the rating agencies for this collapse, but this kind of misallocation of resources is inevitable in any system which relies on centralized decision-making. The reason is that any mistakes made at the central point, no matter how small, then spread and affect the entire system.

What is of course breathtaking is that scientists, who spend so much time devising sensitive measurements of complex phenomena, can sometimes suffer a bizarre cognitive pathology when it comes to how the quality of science itself should be measured. The sudden rise of the h-index is surely proof of that. Nothing can substitute for the hard work of actually reading the papers and judging their quality and creativity. Grillner and colleagues recommend that “Minimally, we must forego using impact factors as a proxy for excellence and replace them with in-depth analyses of the science produced by candidates for positions and grants. This requires more time and effort from senior scientists and cooperation from international communities, because not every country has the necessary expertise in all areas of science.” Nielsen makes a similar recommendation.
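It is worth seeing just how little the h-index captures. The sketch below computes it and feeds it two invented citation records; Nielsen’s point about information loss predicts – correctly – that the metric cannot tell them apart:

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h of the papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Two invented citation profiles that collapse onto the same score:
steady = [6, 6, 6, 6, 6, 6]              # six solidly cited papers
skewed = [900, 450, 120, 40, 10, 6, 1]   # two landmark papers plus a tail

print(h_index(steady))  # 6
print(h_index(skewed))  # 6 -- identical h-index, very different careers
```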

The World Of Big Data – The Daily Dish | By Andrew Sullivan

December 20, 2010

Great post on ‘The World Of Big Data’ by Andrew Sullivan – reproduced in full below.

In passing, a Government truly interested in developing the smart economy would engage in massive data dumps, with the presumption that just about every piece of data it holds (excluding the most sensitive pieces of information) – from ministerial diaries to fuel consumption records for Garda cars to activity logs for mobile phones to numbers of toilet rolls used in Government Departments – would be dumped in real time onto externally interrogable databases. This would be geek heaven and would generate new technological applications beyond prediction. And the activity would be local – could an analyst sitting in Taiwan really make sense of local nuances? The applications would be universal, portable and saleable, however. They would seed a local high-tech industry – maybe even a local Irish Google. Can’t see the Civil Service going for it, though…

Elizabeth Pisani explains (pdf) why large amounts of data collected by organizations like Google and Facebook could change science for the better, and how it already has. Here she recounts the work of John Graunt from the 17th century:

Graunt collected mortality rolls and other parish records and, in effect, threw them at the wall, looking for patterns in births, deaths, weather and commerce. … He scraped parish rolls for insights in the same way as today’s data miners transmute the dross of our Twitter feeds into gold for marketing departments. Graunt made observations on everything from polygamy to traffic congestion in London, concluding: “That the old Streets are unfit for the present frequency of Coaches… That the opinions of Plagues accompanying the Entrance of Kings, is false and seditious; That London, the Metropolis of England, is perhaps a Head too big for the Body, and possibly too strong.”

She concludes:

A big advantage of Big Data research is that algorithms, scraping, mining and mashing are usually low cost, once you’ve paid the nerds’ salaries. And the data itself is often droppings produced by an existing activity. “You may as well just let the boffins go at it. They’re not going to hurt anyone, and they may just come up with something useful,” said [Joe] Cain.

We still measure impact and dole out funding on the basis of papers published in peer-reviewed journals. It’s a system which works well for thought-bubble experiments but is ill-suited to the Big Data world. We need new ways of sorting the wheat from the chaff, and of rewarding collaborative, speculative science.

[UPDATE] Something I noticed in The Irish Times:

PUBLIC SECTOR: It’s ‘plus ça change’ in the public service sector, as senior civil servants cling to cronyism and outdated attitudes, writes GERALD FLYNN:

…it seems now that it was just more empty promises – repeating similar pledges given in 2008. As we come to the end of yet another year, there is still no new senior public service structure; no chief information officer for e-government has been appointed; no reconstitution of top-level appointments has taken place; and no new public service board has been appointed [emphasis added].

So nothing will happen.

Europe geographically stereotyped: some nifty geographical data visualisation

September 28, 2010

Europe geographically stereotyped (via Flowing Data):

Graphic designer Yanko Tsvetkov takes on such notions of Europe in his series of stereotype maps, which themselves are stereotypes of stereotypes.

Original here (with high res maps).

How do ranking systems for universities rate against each other? (from Flowing Data)

September 14, 2010

Various ways to rate a college.

There’s been lots of moaning on this blog about the utility of the ranking systems for universities (see tag cloud at right for examples). One issue of particular interest is the extent to which the ranking systems all reliably measure the same underlying construct – that of ‘university quality’. Without the required intercorrelational and other statistical analyses, it is difficult to know what to make of the differences between the systems.

Here’s a great post from Flowing Data that illustrates the problem well (source post at the Chronicle of Higher Education). The differing systems just don’t overlap very well – which suggests they are not all measuring the same thing – no matter what anybody says!
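What would the missing analysis look like? At minimum, a rank correlation between the positions the same institutions receive in two systems. Here is a minimal sketch, with invented ranks and assuming SciPy is available:

```python
from scipy.stats import spearmanr

# Hypothetical positions of the same eight universities in two league tables.
system_a = [1, 2, 3, 4, 5, 6, 7, 8]
system_b = [2, 5, 1, 7, 3, 8, 4, 6]

rho, p_value = spearmanr(system_a, system_b)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A rho near 1 would suggest a shared construct; a low rho suggests the
# systems are measuring different things.
```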

[Chart: Measures for different college ratings]

There are a bunch of college ratings out there to help students decide what college to apply to (and give something for alumni to gloat about). The tough part is that there doesn’t seem to be any agreement on what makes a good college. Alex Richards and Ron Coddington describe the discrepancies.

Notice how few measures are shared by two or more raters. That indicates a lack of agreement among them on what defines quality. Much of the emphasis is on “input measures” such as student selectivity, faculty-student ratio, and retention of freshmen. Except for graduation rates, almost no “outcome measures,” such as whether a student comes out prepared to succeed in the work force, are used.

This, on top of spotty data across universities, makes it difficult to know which rankings to follow, especially for schools that are close to each other in the ratings. This goes for other types of ratings too. Any headline that starts with “Best states/countries/schools/programs/etc to…” requires some salt, since rankings can change dramatically depending on the measures.

But you already knew that, right?

One thing is for sure though. UCLA and Cal stat departments are the best programs to be in. That’s fact.

[Thanks, Ron]

A slightly daft but nifty piece of data visualisation: Every country must be number one at something…

For the art-loving neuroscientist: From Scientific American – Michelangelo’s secret message in the Sistine Chapel: A juxtaposition of God and the human brain

Douglas Fields has a very interesting post at Scientific American – it seems Michelangelo hid illustrations of the dissected central nervous system in many of his Sistine Chapel paintings:

At the age of 17 he began dissecting corpses from the church graveyard. Between the years 1508 and 1512 he painted the ceiling of the Sistine Chapel in Rome. Michelangelo Buonarroti—known by his first name the world over as the singular artistic genius, sculptor and architect—was also an anatomist, a secret he concealed by destroying almost all of his anatomical sketches and notes. Now, 500 years after he drew them, his hidden anatomical illustrations have been found—painted on the ceiling of the Sistine Chapel, cleverly concealed from the eyes of Pope Julius II and countless religious worshipers, historians, and art lovers for centuries—inside the body of God.

This is the conclusion of Ian Suk and Rafael Tamargo, in their paper in the May 2010 issue of the scientific journal Neurosurgery.

The brainstem dissection looks convincing to me, most especially the pons and medulla, but as Fields notes: ‘The mystery is whether these neuroanatomical features are hidden messages or whether the Sistine Chapel is a Rorschach test upon which anyone can extract an image that is meaningful to themselves. The authors of the paper are, after all, neuroanatomists. The neuroanatomy they see on the ceiling may be nothing more than the man on the moon.’

“Concealed Neuroanatomy in Michelangelo’s Separation of Light From Darkness in the Sistine Chapel,” by Ian Suk and Rafael J. Tamargo, Neurosurgery, Vol. 66, No. 5, pp. 851–861.