Take this ontology and shove it. Or, why classification matters.

I was recently called out by one of my colleagues for saying that ontologies were boring, this despite my own doctoral work on knowledge representation. Motivating my glib comment was an image of a group of pasty-faced individuals gathered around a large boardroom table and discussing which angel fit on which pin. Events from this past weekend are a reminder why such glibness is not helpful.

The American Psychiatric Association has just approved a set of updates, revisions and changes to the reference manual (DSM5) used to diagnose mental disorders. Among the changes are those redefining the inclusion and exclusion criteria for autistic disorders. By changing which children are classified as having an autistic disorder, these revisions will make parents feel more or less comfortable having a child carrying the diagnosis. Just as importantly, insurance companies and school programs might shift the criteria that determine which child and family gets what kind of support and at what cost. In the near term, clinical trials for the treatment of autism may not include the same patients as they would have prior to this retaxonomization.

So, are ontologies boring? Perhaps. But they certainly belong to the class of hugely important societal constructs.

Hat tip: David Osterbur.


Learning from the FDA

It is not too often that one is driven to read a report from a regulatory agency. Even rarer are the instances when we find prismatic examples of engineering and organizational leadership in these reports. That makes this strategic plan of the FDA Information Management and Office of Information Management all the more remarkable. As an inducement to read the full report, here are some results that the informatics and IT departments of many organizations, academic and industrial, would envy.

  • Reducing the number of servers from 397 to 265 (by a virtualization and hosting approach).
  • Not coincidentally, availability (i.e. not downtime) increased from 98.3% to 99.9996% (the difference between over 6 days of unscheduled downtime and roughly 30 seconds of unscheduled downtime).
  • Billions of intrusion attempts against FDA IT Systems annually with no major information security breaches.
  • Supporting annual 5-15% increases in IT capability without increases in budget.
  • Training budget for personnel eliminated and training activities of personnel increased based on savings from reduced external consultant fees.
  • Annual decrease in cost of data storage.
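The availability figures in the list above can be checked with a little arithmetic. The following is a minimal sketch, assuming a 365-day year as the reporting window (the report's own window is not stated here); the function name is made up for illustration. Under that assumption, 98.3% works out to a bit over 6 days of downtime per year and 99.9996% to about two minutes per year, so the 30-second figure presumably refers to a shorter window, such as a quarter.

```python
# Assumption: availability is measured over a 365-day year.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def annual_downtime_seconds(availability_percent: float) -> float:
    """Convert an availability percentage into seconds of downtime per year."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


# Downtime implied by the two availability figures quoted in the list.
before_days = annual_downtime_seconds(98.3) / SECONDS_PER_DAY
after_seconds = annual_downtime_seconds(99.9996)

print(f"98.3%    -> {before_days:.1f} days of downtime per year")
print(f"99.9996% -> {after_seconds:.0f} seconds of downtime per year")
```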


No Publication Without Taxation?

I recently obtained a copy of a presentation by Elsevier representatives describing how they price their publications for different academic (i.e. university) customers. They describe how they place each institution into one of a small number of pricing tiers. How do they decide this categorization? A major component of the decision is the research intensity of the institution to which they were selling access to their publications. What was the measure of research intensity? Publication volume as measured by the SCOPUS database (an Elsevier product).

To better understand this policy, I contacted Elsevier and spoke by phone with a very cordial and clear executive. I asked why it seemed that the more we published, the more we were going to be charged by Elsevier to read these publications.

He explained that research intensity served as a proxy for the value that institutions might place on these publications. After all, if we value these publications, then we are more likely to download/use them. And pricing should reflect value.

I then asked why they were not simply measuring the download rate for each journal to charge directly for usage rather than using a proxy measure. The executive explained that their advisory board had recommended against using the download rate, as that might reduce usage, thereby impeding scholarship.

So solicitous of the publisher. I knew it would be futile to bring up the effect on scholarship of inflation rates of publication prices that would make the rate of medical care inflation appear flat by comparison. A continual record of inflation that reduces every year the fraction of their own scholarly output made available to scholars throughout the world's academic institutions.


Yet Another Healthcare Research Steeplechase Barrier

I was recently informed that I have to take an on-line course about conflict of interest and then document all commercial activities. I was then informed of the same duty three more times. Because I am engaged in research at multiple universities, hospitals and medical schools, these education and reporting activities have to be done multiple times and of course the forms and "educational" syllabi are all different.

Like all large, heavily funded (and rewarded) organized activities, healthcare research has acquired over time some practices that originally made sense but have evolved into anti-productive structures. There have been a number of highly publicized examples of truly egregious behavior by researchers who have hidden their conflicts of interest. Individuals who used the podium of academia and the cachet of their presumed impartiality to declaim, opine, and publish regarding a device, therapy, or diagnostic procedure where they stood to gain financially. For decades, it was made abundantly clear to all investigators that any such conflict was to be disclosed in grant applications, in publications and presentations. Yet it was left up to the investigator to decide whether or not there was indeed a conflict. This ultimately became an all too carefully parsed taxonomical challenge and, rather than waste time in such parsing, several of us went the route of full transparency. For example, I have listed on the web all my commercial activities so that my colleagues and the public can decide whether or not there is indeed a conflict.

Presumably because not everyone has traversed this route of transparency, there are now a host of new regulations for annual disclosure. Some of them appear obvious (e.g. disclosure of equity ownership or payment for speaking), others less so (e.g. reimbursement for cab fare to attend a commercial conference where there are no speaking fees). In the end these disclosures are important, and yet is each of the healthcare research institutions so different as to require different reporting and educational mechanisms? It seems there is an opportunity for an enterprising company or apparatchik to create a single, authoritative form and set of educational materials. Let's just hope they keep it simple.


Hungry for DNA Games?

Thirty teams world-wide are apparently hungry enough and willing to contribute to making genomic medicine possible. Their efforts will help reduce to practice a game that to date is only within reach of star teams. May the odds be ever in our favor.


Billions and billions of gene expression measurements.

Let's say you are looking for a disease biomarker. Hopefully, one better than prostate specific antigen. Next time you or your student reach for a pipette to see if a gene is expressed in a particular tissue or disease, perhaps you should first check with the public databases of gene expression. As outlined in this article, we now have hit the one million array mark. That is, one million arrays measuring gene expression across thousands of conditions (tissues, diseases, pharmacological or environmental perturbation). And each array has tens of thousands of genes so these corpora have billions of gene expression measurements. That means you'll immediately be able to see if your favorite gene is uniquely expressed in a tissue in a specific disease. Or not.

Another way to think about these corpora is that they constitute one of the largest open access biomedical libraries. A model for clinical research to emulate?

Hat tip: Atul Butte.

Growth Microarray Data


Unstandardized standards

An insightful naïf learning about the difficulty of sharing an electronic health record from one hospital to another might reasonably ask "Why don't they just create a standard for data sharing so that I can install or delete health apps at will and view my data on several different electronic health record systems?" An expert will then inform that impertinent naïf that it's much more complicated than she understands and that the standards already exist. When challenged, the expert will cite several august committees which have ratified standards such as the Continuity of Care Document (CCD). At this point, our naive protagonist should refer the expert to this blog entry by Josh Mandel. If by then the expert is not holding his hands to his ears, he will explain that all standards are evolving entities and that these challenges are just the expected missteps on the path of convergent evolution to interoperable samadhi.


Mad Men and Mad Nerds

Should we enable the following conversation between any website and your web browser (e.g. Safari, Internet Explorer, Firefox, Chrome)?

Your web browser: "Hello website potentiallyInteresting.com."

Website: "Hello web browser."

Your web browser: "Dear Website, please do not track who I am. Do not even try to remember that I came to visit you."

Website: "You mean you don't want me to track you even though you did not switch on all your privacy controls such as switching off acceptance of 'cookie' files?"

Your web browser: "That's right. You are an upstanding website and when I say to you 'DNT:1' that means that I don't want you to track me. That's what the user who controls me wants."

Website: "I certainly am upstanding and I respect your statement of the user's wishes. Consider this whole visit forgotten."

Your web browser: "Thanks for your understanding."

Website: "Do I know you?"

It turns out that there is a great deal of controversy about these five characters: DNT:1. On the one side are the user advocates who argue that websites should honor the do not track directive. On the other side are the companies who fund large parts of the web infrastructure because of the advertising revenues (~70 billion) that are generated by being able to track the traffic of users across their Internet properties. These companies now wonder where their investments will go if few users allow their websites to track them. Their concerns are not misplaced. After all, how many of us would choose to keep commercial breaks on broadcast television if all we had to do was flip a switch? Of course, those same companies could choose to have the following alternative ending to the above conversation:

Website: "I certainly am upstanding and I respect your statement of the user's wishes. But we have to pay the bills. If you will not allow tracking, I just cannot show you my contents. Sorry."

Your web browser: "OK then. I'll let my user know that she has to allow tracking if she wants to see your contents."

Website: "I will be waiting. But don't expect me to recognize you."
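Both endings of this conversation can be sketched in a few lines. What follows is a minimal illustration rather than any real site's implementation: the browser sends the request header DNT with the value 1, and the website decides what to do with it. The function names, the `content_requires_tracking` flag and the three outcome strings are all hypothetical, chosen only to mirror the dialogue.

```python
from typing import Mapping


def wants_no_tracking(headers: Mapping[str, str]) -> bool:
    """Return True when the request carries the header 'DNT: 1'."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def decide(headers: Mapping[str, str], content_requires_tracking: bool = False) -> str:
    """Pick one of three hypothetical outcomes for a visit."""
    if not wants_no_tracking(headers):
        return "serve_and_track"
    if content_requires_tracking:
        # The alternative ending: no tracking, no content.
        return "refuse_until_tracking_allowed"
    # The first ending: honor the request and forget the visit.
    return "serve_without_tracking"


print(decide({"DNT": "1"}))
print(decide({"DNT": "1"}, content_requires_tracking=True))
print(decide({"User-Agent": "ExampleBrowser"}))
```

Nothing in the header itself compels the website to honor it; the sketch only shows that the choice between the two endings is a policy decision on the website's side.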

Ultimately, the debate is going to turn on the valuation that the public places on its autonomy and privacy relative to the broadest access to web content. For those of us in the arts, sciences or businesses of curation of the various forms of knowledge and data, the outcome of this multi-billion dollar debate will affect our work for decades to come.

Hat tip: Ben Adida


Meta-directory of the Countway community

Countway Photo Day from CBMI on Vimeo.

Dearth of Death: A Fatal Wound to Medical Research?

My esteemed colleague L.J. Wei often reminds us that health outcomes which are not as hard-edged as death can be misleading. For example, the early press, decades ago, about the uncovering of early cancer by the Prostate Specific Antigen (PSA) was used to justify the surgical removal of hundreds of thousands of prostates. In hindsight, neither the PSA test nor many of the ensuing expensive and occasionally morbid surgeries made a significant dent in lifespan.

One might therefore reasonably conclude that the government, whether the Census Bureau, the Social Security Administration or the Department of Health and Human Services, would place the highest premium on the accurate reporting of death, and its causes, for our citizens. Surely, those data are the incontrovertible evidentiary base for our public health monitoring, medical treatment evaluations (whether of drug, device or procedure), and projections of the fundamental demographics of our nation. So, it might be all too easy for most of us to overlook or dismiss the following innocuous-appearing, bureaucratese-laden announcement:

IMPORTANT NOTICE The National Technical Information Service (NTIS) has been notified by the Social Security Administration (SSA) of an upcoming important change in the Death Master File data. NTIS, a cost-recovery government agency, disseminates the DMF data on behalf of SSA. Please see the attachment, provided by SSA, for an explanation of the change. The implementation date of this change is November 1, 2011. Should you have any questions, please email me at wstrickland@ntis.gov and I will be happy to forward any questions not answered by the attachment to the Social Security Administration for reply.

What does this mean? It means that there is no longer a single, federal authoritative source of death records. Most of the operational details have now devolved to individual states without guarantees of consistency of reporting or a one-stop-shop for researchers looking for the national distribution of the Grim Reaper. Will we have to resort to crowd-sourcing death now in order to perform accurate population research?

Hat tip: Shawn Murphy

Death workflow


Copyright + book = disappearing act?

Many of us worry that the last three decades of scholarship will be lost to posterity because we have yet to provide an institutionalized set of mechanisms for preserving the digital output of academia that are as durable and accessible as our accreted analog/paper-based records. Now here comes word of an equally worrisome trend: empirical evidence that worries about the effect of copyright on access to important cultural, literary and scientific works may be well-founded. A recent article in the Atlantic Monthly describes a yawning gulf in the sale of new books from recent decades, such that Amazon sells more new books from 1850 than from 1950. In their analyses, the investigators provide intriguing circumstantial evidence suggesting that the copyright laws may be responsible. More detail in this lecture by Paul Heald.

Hat tip: David Osterbur

Amazon copyright hole


The passing of clean taxonomies.

Among the most productive constructs of the Enlightenment are the modern taxonomies. These have been helpful in bringing order to the chaos of signs and symptoms and other clinical findings and were central tools in achieving our 20th century understanding of pathophysiology. They also have an influential role to play in reimbursement for medical services. With the dawn of high-throughput molecular diagnostics, many of us recognize that we are going to be able to be far more precise in our diagnostic and therefore therapeutic approach to diseases and their prevention.

Nonetheless, as we approach the systematization of medicine, we will be reminded often that nature may not hew to the simplified models that we are developing. This recent study in the New England Journal of Medicine does just that by demonstrating directly that within a "single" tumor there exists a large multiplicity of tumor types, each with its own genomic characteristics and therefore particular therapeutic responsiveness (or lack of it). It can be argued that this is another instance of the tension between the "neats" and the "scruffies" but perhaps it is a foreshadowing of the decreased effectiveness of taxonomies as a cognitive tool for biomedical discovery and clinical care. If indeed the underlying substructure of physiology is best represented by a probabilistic network model that can best be grasped and managed through the use of computational tools, we have to seriously re-evaluate both our approach to disease definition and biomedical education.


City as organism

This video from Geneva, Switzerland is a beautiful instance of the repurposing of data. Shown are the data flows between cell phones across the city over night and day. This glimpse of the interactions over time also suggests new frontiers in real-time epidemiology. If these (anonymized) data could be tagged with symptoms (e.g. cough, sneeze), could we track the spread of infections? Public health would then start to look a lot more like intensive care medicine, providing real-time monitoring of cities or nations (taking the "pulse" of the population, evaluating the activity and coherence of its "neural" activity). Will there be a new research and medical discipline that fuses the sciences of population ecology and population health? And should populations be empowered to forgo such intensive study?

(Hat tip: Joshua Parker).


Ville Vivante from Interactive Things on Vimeo.

p.s. As someone who grew up in Geneva, I was quite surprised to see a lot of activity at 2 AM. Is this the consequence of the Swiss work ethic?

Get paid to play

Earlier, I described the SHRINE distributed query system across 6 million patients with 10 billion facts. If you are a member of the Harvard Medical School faculty (with employment at one of the affiliated hospitals) you now have the opportunity to get money and glory (more the latter than the former) to spin clinical data into biomedical gold. Details on the context can be found here: http://catalyst.harvard.edu/services/pilotfunding/shrine.html

If you have questions, use this email contact.


This terminology goes one louder.

There has been considerable controversy about the merit and risk of upgrading the terminology that is used in the USA to bill for most healthcare transactions: ICD9 to ICD10. However, given some of the concerns about the adequacy of ICD10, many are now advocating that we skip ICD10 (with costs of millions of dollars per large hospital and tens of billions of dollars, nationwide) and immediately proceed "one louder" to ICD11. It is argued that the investment will then be far more durable and with a more favorable impact on cost and quality accounting in healthcare. Others argue that we should go for the bird in the hand. No doubt many librarians could opine knowledgeably about the costs and benefits of changing classification systems and Linnaeus would be impressed by how many now labor to classify diseases, drugs and procedures.

Hat tip: Ken Mandl


Research by the numbers

What if you could mine the 10 billion medical facts across 6 million (anonymous) patients in five Harvard-affiliated hospitals to ask an important and timely question? What are the other diseases or disorders associated with autism? How has the pharmacological treatment of inflammatory diseases changed over the last five years? Are there gender differences in the prevalence of infections in autoimmune diseases? How is the prevalence of diabetes mellitus changing in young adults?

Now, for the first time, if you are an eligible faculty member (or one of their fellows) in one of the five hospitals, you can productively seek answers to these questions. The Shared Health Research Information Network (SHRINE) helps researchers overcome one of the greatest problems in population-based research: compiling large groups of well-characterized patients. Eligible investigators may use the SHRINE web-based query tool to determine the aggregate total number of patients at participating hospitals who meet a given set of inclusion and exclusion criteria. The criteria are currently demographics, diagnoses, medications, and selected laboratory values. Because counts are aggregate, patient privacy is protected.
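To make the idea of an aggregate count query concrete, here is a minimal sketch. It is not SHRINE's actual interface or data model: the `Patient` class, the fact labels such as "dx:autism", the hospital names and the function names are all invented for illustration. It only shows the shape of the query: each hospital counts the patients who have every inclusion criterion and none of the exclusion criteria, and only those counts, never patient records, are returned.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Patient:
    # Hypothetical coded facts: demographics, diagnoses, medications, labs.
    facts: FrozenSet[str]


def count_matching(patients: Iterable[Patient],
                   include: Iterable[str],
                   exclude: Iterable[str] = ()) -> int:
    """Count patients with all inclusion facts and no exclusion facts."""
    include_set, exclude_set = set(include), set(exclude)
    return sum(
        1 for p in patients
        if include_set <= p.facts and not (exclude_set & p.facts)
    )


def aggregate_counts(hospitals: Dict[str, List[Patient]],
                     include: Iterable[str],
                     exclude: Iterable[str] = ()) -> Dict[str, int]:
    """Return one count per hospital; no patient-level data leaves a site."""
    include, exclude = list(include), list(exclude)
    return {name: count_matching(pts, include, exclude)
            for name, pts in hospitals.items()}


# Invented toy data for two hypothetical hospitals.
hospitals = {
    "hospital_a": [
        Patient(frozenset({"dx:autism", "dx:epilepsy", "sex:male"})),
        Patient(frozenset({"dx:autism", "sex:female"})),
    ],
    "hospital_b": [
        Patient(frozenset({"dx:autism", "dx:epilepsy", "sex:female"})),
        Patient(frozenset({"dx:diabetes", "sex:male"})),
    ],
}

counts = aggregate_counts(hospitals, include=["dx:autism"], exclude=["dx:epilepsy"])
total = sum(counts.values())
print(counts, total)
```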

So, whether you are seeking a study cohort, preliminary studies for a grant proposal, or evaluating an epidemiological hypothesis, take this new tool for a spin and start translating this large mass of hard-won data into useful biomedical knowledge.


Origin of The Theses

For those of you in the throes of defining your doctoral theses, there are some wise words from Enrico Coiera of UNSW which can help you move to the end game. Here is the relevant Twitter stream.


Let's bring genome-scale sequencing into the clinic—safely and responsibly

Children's Hospital Boston today announced the launch of the CLARITY Challenge, a $25,000 competition intended to advance standards for genomic analysis and interpretation and the reporting of clear, actionable results to clinicians and patients. The competition marks the first time a healthcare institution has sent out a broad call for the development of consistent and clear ways of applying genomic insights to everyday pediatric and adult patient care.

CLARITY (Children’s Leadership Award for the Reliable Interpretation and appropriate Transmission of Your genomic information) competitors will be tasked with discovering the unknown genetic basis of the disorders faced by three pediatric patients and, in the process, creating best practices for interpreting and presenting genomic sequence results to patients and their families and physicians in meaningful ways that can help guide healthcare decisions.


Ancestral betrayal

We share many things with our ancestors, including a fraction of their genetic code. This genetic link to the past has further invigorated an already large industry and hobby in the exploration of genealogy and historical provenance. This piece from today's news shows how these same records can be used to leverage your ancestors to identify you. In this instance, a murder suspect is potentially fingered by his ancestors from the Mayflower.

Hat tip: Ben Reis


Costly questions

Mark Twain once said "Only one thing is impossible for God: To find any sense in any copyright law on the planet."

I was reminded of this upon reading this brief but highly illuminating article describing the delayed but then aggressively pursued commercialization of a widely taught and adopted mental health questionnaire. A newer, perhaps even superior (but free and public domain) questionnaire with a few overlapping questions has been "de-rezzed" after a copyright dispute with the vendor of the earlier questionnaire.

The right questions may help disrupt the progress of disease, but not as effectively as a well-timed exercise of copyright on questions can disrupt progress (with apologies to Mark Twain).

Hat tip: Atul Butte