An Interview with CNI's Cliff Lynch
Created by Matt Pasiewicz (EDUCAUSE) on April 14, 2006
In this 67-minute recording, I sit down with CNI's Cliff Lynch for a wide-ranging discussion about interesting activities at CNI, and gather some thoughts about large-scale digitization projects, net neutrality, and microformats. We'll hear about advances in the research community, talk about a number of federal policy issues, and hear some thoughts about the opportunity costs associated with decisions affecting the scholars and librarians of today. We'll also chat a bit about NCSU's deployment of Endeca and the final report of UC's Bibliographic Services Task Force (PDF).
This interview is provided courtesy of CNI and was recorded at their 2006 Spring Task Force Meeting. The Coalition for Networked Information (CNI) is an organization dedicated to supporting the transformative promise of networked information technology for the advancement of scholarly communication and the enrichment of intellectual productivity. You can learn more about CNI at their web site, http://www.cni.org
Transcript
Matt: Hi, this is Matt with EDUCAUSE, and joining us now is CNI's Cliff Lynch. Cliff, it's a real pleasure to be with you here at CNI's Spring Task Force meeting. What are some of the major themes that you see coming out of this month's meeting?
Cliff: Well, there has been a lot happening lately, and one of the things that I find striking when I have looked at the program with that in mind, trying to extract themes, is how much a number of things seem to be progressing. And this goes really across the board, everything from the deployment of the Shibboleth authorization infrastructure, which has been a slow and fairly painful process but seems to be picking up steam, all the way through to a set of public policy kind of initiatives that seem to be finally starting to approach some conclusion. We've got the so-called orphan works inquiry, which has been going on for, I guess, close to a year now.
Matt: Cool. Can you give us a sense of CNI's priorities these days? What is on the agenda and, looking forward, do you see the organization's course shifting in the future?
Cliff: Well, let's see. I think the last time we talked was October of 2005, and certainly in the period since then we haven't made any radical changes, and I don't honestly expect any near-term radical changes. Our program is changing more now in terms of emphasis, focus, tactics. One of the areas we are spending a tremendous amount of time on is the whole set of issues around e-science and cyberinfrastructure and e-research.
As I talk to member institutions, a lot of them now have planning initiatives under way to try and work out institutional strategies in this area.
Matt: Cool. Before we began our interview I remember you spoke briefly about your international travels, and I know that internationally, CNI certainly has a role collaborating with a number of institutions. Do you see that accelerating or growing in prominence over the next few years?
Cliff: Well, that is a tough issue for CNI to manage. Let me give you the organizational parameters of this first ... we have a goodly number of members from Canada, we have some members from the UK and a couple scattered around continental Europe, but really not more than four or five, I think. Then we have one or two members elsewhere. So we are very much a United States and Canada organization in terms of our membership footprint.
Matt: As you mentioned, it has been some months since we last spoke, and a lot can change on internet time. So I am wondering what piques your interest most these days, and what has changed since our last interview ... what are some of the most pressing issues of great concern to the CNI community?
Cliff: I mentioned a couple of the things that have, if not changed radically, then certainly progressed in important ways, like the orphan works inquiry. There are some new things starting to emerge that I am very concerned about, and that I think could have implications for the CNI community. On the public policy side there is this whole issue of so-called "net neutrality", or, put another way, about penalizing, either financially or performance-wise, organizations that want to deliver bandwidth-intensive applications out on the net. I am actually a little surprised at the way the press has cast this so far, which is really sort of the phone companies and the cable companies on one side, fighting it out with people like Google and [Amazon] and maybe some of the media companies on the other.
I haven't heard much mention of higher education here, I haven't heard much mention of cultural memory organizations or of public broadcasting. All these institutions need to reach people off campus with increasingly sophisticated high-performance applications, and it is not just distance education scenarios. It's scenarios where you are going to have students and faculty trying to reach institutions from home to do their work.
Matt: And you mentioned the potential for scholarly societies to recognize the potential impact of digital technologies on evaluating the tenure and promotion procedures of organizations. How long do you think it will be before we start to see significant change in institutions, or even for AN INSTITUTION to really begin to change significantly?
Cliff: I think we're already seeing signs of that. People are so conservative, of course, about the uncertainties associated with promotion and tenure, and you can understand why. You know, you go out there as an assistant professor, you work incredibly hard for five or six years, and then you come up for tenure. And it is sort of like your whole life is on the line there, and what a terrible time to discover that, well, the specific tenure committee at your institution isn't really very enthralled with digital technology as a means of communicating or documenting scholarship.
Matt: Cool, cool. Well, I don't want to dive too deeply into buzzword bingo here, but I know there is a lot of talk these days about Web 2.0 and Library 2.0. What is your assessment of this activity, and what are some areas that we might want to pay special attention to?
Cliff: I guess first I need to confess a little complicity, perhaps, in this 2.0 business, in the sense that a couple of weeks ago CNI co-sponsored a day-long meeting called Reading 2.0, where we were trying to foster a conversation between some of the major organizations doing, if you'll allow me the terms, digital libraries and library automation, and some of the folks from Google and Yahoo and Adobe, O'Reilly publishing, groups like that. And it was really quite interesting. I think everybody learned a great deal about other participants and how they are approaching some of this next-generation technology. And I think it was pretty helpful.
Matt: À la microformats?
Cliff: Precisely. Basically these are all these specialized MLs [markup languages] of various kinds that people are starting to work with in the various scientific communities. Basically all that does is reduce the ambiguity and give an extra leg up to being able to compute over the literature and link it to factual databases. I can't resist mentioning, by the way, on the business of microformats, that it is not a term that I was familiar with until a few weeks ago, when it came up at that Reading 2.0 conference, where someone from Yahoo was giving a short talk on their use of microformats and was absolutely unaware that this whole thing, under the guise of specialized disciplinary markup, was starting to play out through a whole set of different scholarly disciplines. So those are exactly the kinds of crossovers that we were hoping to get out of that meeting, and we did get some out of it.
Matt: Do you have any thoughts on ways that libraries might employ social software? In our last interview I kind of hinted at that ... do you sense any urgency in understanding this issue?
Cliff: I see a few libraries experimenting with this. There was a project, which I think we had a session on at our meeting last December, out of the University of Minnesota.
Matt: The GroupLens Project?
Cliff: Yes, the GroupLens Project work, where they were trying to deploy recommender technology. There is a session here at this meeting describing some of the work that the University of California has done trying to think through how you might graft a recommender system on to the Melvyl catalog.
Matt: You mentioned the importance of the great degree of sensitivity offered by libraries in this regard. Doesn't that in some respects prompt the need for it even more? When I talk about a sense of urgency, you have the private sector doing lots of activity on this front without, up until the recent Google decision, much regard for some of the privacy elements of their search engine utilization, and not a whole lot of regard for caretaking of personal data. Any thoughts on that front? Do you think that there needs to be more urgency, or some sort of vision for how we can join together as these kinds of institutions to provide a protected, sort of trusted alternative?
Cliff: Well, let me respond to a couple of different parts of that question. First off, I think that there is a growing recognition among the public broadly, and not just people in higher education, but really the broad public, as a response to this recent business with Google and the Department of Justice.
Matt: Interesting. Well, the final report of the University of California Bibliographic Services Task Force seemed to attract a lot of attention, and I'd like to ask a few questions about it from your perspective. First and foremost, what were some of the highlights for you, and why do you think it has proven such a popular document?
Cliff: I don't know about popular, but let me talk to that a little bit.
I think that it was wonderful for the University of California to undertake an inquiry of that scale, really sort of going all the way back to first principles about what they're doing, and it's particularly significant to me to see the University of California doing that because they've got enough scale as a system to make a lot of different choices. They have alternatives that a single research university, except in very few cases, probably doesn't have economically, and so in a sense they can look at collective thinking that may reflect out into shared services that other research universities can invest in downstream. It's worth noting, by the way, that the Library of Congress has also undertaken, I don't want to say a similar, but a related kind of inquiry, and there's a draft report for that that's going to be coming out shortly as well. Now, I think that if you look at the kinds of questions they were asking and the kind of conclusions they were drawing, at least as I interpret it, they're going into a couple of different areas. One is about degree of centralization: historically every campus has had its own online catalog, and then there's also been a union catalog, Melvyl. One question is whether that continues to make sense, particularly in an era where the online catalog is becoming less and less important as an end-user information discovery tool. It's not being replaced by things like Google, but things like Google are drawing traffic away from it. As long as we have big book collections, especially big physical book collections, scholars are going to need online catalogs, but clearly there's....
Matt: Or APIs to them.
Cliff: Yeah. Clearly there's a shift in emphasis here, and I think it begs a question that other libraries are asking: do we really need a public online catalog for our collection, or is some kind of a union catalog through OCLC WorldCat or something good enough?
And if we do need it for our own collection, how much resource should we continue to sink into this? Should we at this point say, you know, we've got a system that works well enough and we're going to just put it on maintenance? So that's one complex of questions they're dealing with. Another one is whether they should do more centralized things in terms of cataloging, and that again is a very important economic tradeoff. Historically we've moved more and more to centralized cataloging over the years, more and more use of copy cataloging, and I think they're probably going to get even more explicit about it. At the same time you've got those trends happening, here are a couple of other trends which I think are also very much implicated in their report and which I think are very significant. The first one is that we're increasingly recognizing that things other than published books and journal articles are important to scholarship. For example, image databases are playing a much greater role in both research and teaching, and you've seen the rise of systems like ARTstor recently, too, to start making larger collections of images available to the community. Now, if we're really going to have image databases as sort of first-class research and teaching resources, we need to describe those images. Right now there isn't the same kind of canon of images that's been created by the system of publishing over the last umpteen years, which lends itself to centralized copy cataloging. We've got a lot of people holding collections of images that have partial overlap, and describing these is a hugely expensive project. Yet I think there's a recognition that catalogers are going to spend more of their time doing description of things other than published books or journal articles, and that we need to deal with that.
Another piece of it is that for textual materials we are steadily moving away from a world of surrogates. In other words, you have a physical book, which is a considerable amount of nuisance to lay your hands on, and then you have a little digital record describing the physical book with a few subject headings and an author name and things like that. If you move away from that world to a world where you have the whole book as a digital file, do you really still need that bibliographic description, or can you derive it from the book, if the book is appropriately structured? So that suggests a whole reallocation of effort and expense away from kind of classic bibliographic control over time, particularly the expensive things like the generation of subject headings. If you can do full-text search and computations on the whole book and its index, there's some question about how much it's really worth investing in subject headings, so they frame that question as well. These are, you know, sort of big questions that look at gradual but inexorable changes in the kind of world in which traditional bibliographic control operates, and it's really important to ask them. I don't think you're going to see really abrupt cutovers as a result of that report. You don't do abrupt cutovers when you've got stewardship responsibility for eight or ten million physical volumes and the process of converting these to digital is going to play out over a period of decades or longer. But we're starting to see things like that report challenge us to look at gradual reallocations of emphasis and reallocation of resource away from the old and towards the future. And I think it's a very important report in that sense.
Matt: Yeah. Cool, cool. And how long do you think it will be before we see realization of one or more of the more aggressive options that were laid out in the report? And let's take a look at the OPAC side in particular.
Cliff: I don't really have a good sense of that, and I'm not trying to be evasive. It's just that there's a whole process that's going to run inside the University of California about sorting through what to do with that report. I'm not close to that process.
Matt: Whether it'll be at the University of California or elsewhere, perhaps. I mean, do you see instances where others are sort of leaping on to this and saying "this is something that we have to do", or is everybody sort of saying "hold steady, we don't have the budget to even consider anything like this"?
Cliff: I think it's more gradual. I think that within UC it will result, over time, in a reallocation of resources. But I think, you know, they'll probably go through a pretty complicated internal process of figuring out how to implement those recommendations and which ones they're going to implement. There are probably not an enormous number of other institutions who can do things along those lines other than around the margin. You are clearly seeing more and more of the research libraries investing in the description of non-textual materials. And I think, actually, you know, ARTstor may be an important development in this context, because what ARTstor has done is build up a reference collection of described images, so to the extent to which, as a library, you want to license that collection, it comes with all its description. So you're sort of back in the economics of copy cataloging, right? You describe it once centrally and everyone piggybacks on that. And as the ARTstor database grows and becomes more widely accepted, I expect that some institutions may start to think about it as a framework for developing a shared pool of described images. It's just a slightly different economic model and organizational model than the one we had for books.
Matt: And you talked a little bit about the marginalization of the catalog a little earlier.
Meanwhile we've seen what NCSU has done with Endeca. Certainly not all libraries can afford to try and reproduce that kind of technology. What do you think the future holds with regard to taking library systems like these to the next level? Are they... are we going to continue to see people experiment in these kinds of ways, or are we going to see scaling back, as you mentioned earlier, where people have to consider the tradeoffs of building other value-added layers versus trying to innovate on these sort of more familiar technologies? Any thoughts on that?
Cliff: Well, I mean, the Endeca thing is a very nice piece of work, and I'm not sure that I entirely agree with you that other libraries can't do that. I think that certainly NCSU paid some price for leadership and trailblazing, as leaders and trailblazers always do. But I think you could see that kind of technology move much more into these kinds of mainstream offerings over time, and it may look a lot less exotic three years from now, five years from now, than it does today among library catalogs. I think that the really hard question, and I don't know that I really know the answer to it, is how much resource to continue to put into enhancing your online catalog in a world where the collection of information you're trying to manage and provide access to continues to diversify and multiply and move away from physical books. And that gets into questions as well, which don't have crisp answers, about what's the appropriate scope of an online catalog. You know, there are some institutions who say it's books and journals, not journal articles, but journals, you know, the names of journals that they subscribe to. There are other institutions that have started to put in their manuscripts and photographs and things like that and, you know, this is very disconcerting because the scale of the objects is off. A book seems like a bigger object than an individual photograph somehow.
So, you know, there's some argument that maybe the right thing to do is to catalog your photographs at a collection level and put them in the online catalog, but then that's not always that useful for finding things. So there's a big scope issue on which people are struggling: what are you putting in your online catalog, and what is it a catalog of these days? Is it a catalog of things you provide access to? Is it a catalog that includes free things on the net that you think might be important to your user base? And I think your choices there probably interact with your choices about how much investment to make in advanced retrieval mechanisms and interfaces for your online catalog going forward.
Matt: And do you expect the consolidating base of system vendors to provide more compelling solutions at a faster pace, or do you expect to see a consortium of libraries join together to work on open source alternatives that try to emulate a given set of features? Any sense of how the market is going to develop on this front?
Cliff: I don't know about the vendor marketplace. It has always been a pretty fragmented marketplace with a reasonably large number of, relatively speaking, small vendors. One of the consequences of that is that it's been a marketplace that has a lot of trouble finding money for R&D, particularly for the R part. I mean, there's plenty of development, but it's development that's targeted at something that turns into a product one or two releases out. It's not, you know... let's take a flyer on this research thing and, you know, maybe it'll turn into a product in ten years and maybe it won't. There just isn't, by and large, the financial base to do a lot of that in the library automation community. I don't know that there's going to be a lot of heavy consolidation in there. I think most of the vendors that are in there right now are fairly viable. There was a considerable round of consolidation back in the...
probably late 90s, I guess, the latter half of the 90s, and I don't see that we're necessarily ripe for that again. I also don't see a lot of new vendors coming into that field, though. Now, the open source question is really interesting, and I'll say some things that probably a lot of other people will disagree with. There are two sets of reasons proposed for doing open source in higher education. One reason, and it would be, I think, the reason why you'd do a system like Sakai, for example, or a system like DSpace, is because this is a system that is doing a bunch of core functions, like teaching and collaboration or like stewardship of institutional assets. You need to be able to adapt and evolve those systems in a very complex environment that involves lots of moving parts and lots of external systems, and you need to be able to continue to innovate, really on a distributed basis, because nobody really fully understands the problems and the solutions yet. We're still finding our way in terms of how technology can transform teaching and learning. We'll be doing that for, I imagine, 50 years or longer. Certainly we're just at the beginning of understanding how to do stewardship of things like complex data sets. So for those things open source to me is compelling. It's something we need to do because we have to do it, and we have to find the funding to do it. Now, there's another set of arguments that are advanced about open source, which basically say that we've got some kind of reasonably stable system that we're purchasing from one vendor or a group of vendors today, and it's expensive, and if we could only do an open source version of it, it would be cheaper. I am much less confident about those arguments. That's the argument that's being made around Kuali, the administrative system. I think that's the kind of argument that's typically advanced when we talk about "let's build an open source library system".
Many of the functions of an open source library system we understand very well now. There are functions like circulation and acquisition and inventory management, this kind of stuff, which 30 years ago we were trying to understand how to do in an automated world. Now we have very stable systems for that. Even the online catalog, unless you really want to invest in a whole new cycle of research and R&D there, is fairly stable among these vendors. So I am, you know, deeply skeptical that doing an open source library system would really save much money, and more to the point, it seems to me there's an enormous opportunity cost. I think the places where we do best to focus our efforts on open source are the places we have to be. Talent is scarce. And so I'd much rather see that talent and that time go into contributing to these mission-critical systems that we're just trying to understand, like our stewardship systems, than into recapitulating what look to me to be mostly pretty viable and well-thought-out and solid commercial products.
Matt: And what about the recent D-Lib issue that was built around the "what would you do with a million books" theme? Any especially interesting information to relay from that series of articles?
Cliff: Well, this is the March 2006 issue and, you know, I think it very much ties into the conversation we were having a little earlier about moving away from thinking about the future of text as still things that humans read one at a time. And many of those articles explore themes about translation, correlation, cross-referencing of very large corpora of texts and what that can contribute to scholarship and, you know, I think they're well worth reading on that basis. I think that people who are interested in this might also find some of the literature about text mining, in the life sciences particularly, pretty eye-opening. And some of the work that's going on there.
Matt: And what would be some pointers to some of that for our listeners?
Cliff: Well, in the UK, the National Centre for Text Mining that was just set up is hosted at the University of Manchester, and I happen to know about this because I'm on their advisory board. Their remit is primarily focused on biological and life sciences, so if you hit their site that would be one place you could pick up a lot of pointers to that. In the US there are a lot of research groups working in this area, and the National Library of Medicine has done a great deal of work in this area as well, so I think a couple of Google searches will get you [headed in the right direction].
Matt: Cool! A few months ago you spoke to the folks at SURF about the implications of digital libraries. Can you share a few highlights about that presentation?
Cliff: Sure. Actually, what they asked me to talk about there, and I kind of welcomed it as an opportunity to organize some thoughts, was specifically what impact libraries were likely to have on education, and I think I may have taken this in a direction that they weren't entirely expecting. But let me just give you a couple of bits of the stuff I was thinking about there. One of the things I'm very struck by, and we may have talked about this a bit in October, is that personal storage is getting very big and very cheap. So consider the notion, for example, of when you issue each schoolchild his or her first laptop, which you know probably ought to happen some time -- grade five, six, somewhere around there, I don't know -- why not give them a good starter library of ten or twenty thousand books and select journal articles and a comprehensive encyclopedia and a few other things to fill up some of the disk space on there? Why not? That's clearly doable this decade. And the only things that are getting in our way are some copyright things or licensing things, and the will to do it. Now, next point: wireless is getting very ubiquitous, particularly in schools and universities.
You know, you can point to some funny situations now where we've actually achieved the wired classroom, although we've done it wirelessly rather than with Ethernet to every chair in the lecture hall, and having achieved this, some faculty are really quite unhappy about it, and are banning, you know, laptop use, or want the ability to turn off wireless in their classroom. Others are adapting to it and thriving with it, but the net effect of this is that a lot more information is going to be persistently available, either on your laptop or through your network connection. And you're going to have things that compute on this, that index it well, that know what you've seen and what you haven't seen. Now, flash backwards to the... probably early 1970s, the great calculator controversy. Right? We started to get cheap hand-held calculators, and so the question was how much time should we spend teaching children to be really fluent in long division and things like that, when they can punch it out on these calculators. Should they be able to use these calculators in class? Should they be able to use them in exams? Well, now the question is going to be like this, except it's going to be about whether you are able to use the World Wide Web and all the digital libraries in the world in exams, in class. I think it's going to create intense pressure on some of our assumptions about mastery versus memorization, about what the goals of specific educational processes are, about the ability to find facts as opposed to the memorization of facts, as opposed to the ability to analyze and evaluate facts. So I think this is going to kick off, I hope it's going to kick off, honestly, some really hard thinking about the role, the extent of memorization, and the necessity of memorization in various educational settings.
I have to share this one with you, and you should talk to Rosenzweig more about this. He did a wonderful, wonderful article a couple of months ago about something he calls H-Bot, which is basically a history robot. And one of the things he did was he gimmicked up a version of this to try and find answers, just by doing some computation on the web, to, I think it was, the high school American history standardized test. And he did pretty well, which suggests to me something about where that falls on the analysis versus memorization scale. I'm not an expert on the teaching of American history, but I think these things start to underscore the questions there, and they're really hard questions. I don't know what the right answers to them are.
Matt: Yeah, yeah, indeed. Very cool. And what can you share, circling back to research now, regarding the... Towards 2020 Science publication recently issued by Microsoft Research and the offshoot of that, Nature's issue on 2020 computing?
Cliff: I'm still working my way through some of this. There was a lot of material there, especially in the Nature special issue, but I think that what I took away as just kind of a compelling message was that, for all that information has transfigured the practice of science over the last 20 or 30 years, the changes that are coming are going to be at least as dramatic. There is a, I don't remember whether it's an attachment or auxiliary file or what, but there's a file that comes with the Microsoft Science 2020 Report that's sort of a timeline roadmap for things that they think can happen in biology between now and 2020. And keep in mind we're talking 15 years now, we're not talking, you know, decades or centuries, we're talking... you know, this is going to happen in the lifetimes of most of us. If you look at this stuff they map out about the increased understanding of biological processes, and then our ability to control them, it's really pretty staggering.
You start seeing that the distance between that and some of the more speculative stuff that people like Vernor Vinge and Ray Kurzweil talk about, singularities, you know, really maybe isn't that great after all. It's pretty striking stuff. I mean, I can't imagine what a biologist of the 1950s or 60s would think of biology today, and the way in which it's turning into, in some ways, an information science.
Matt: Cool, cool. Well, Cliff, it has once again been a really great pleasure speaking with you, and I'm sensitive to the time commitments here, but do you have any closing thoughts?
Cliff: It's always fun to catch up, and I'm struck by how widely our conversations, both last time and this time, have ranged, from, you know, things that are fundamentally public policy questions all the way through to things involving the practice of science. I would just urge, I guess, people who are interested to keep your eyes open broadly for these changes. I hope in future we talk again and we'll have a chance to perhaps probe further into some of the social implications of this.
Matt: Absolutely. Well, thanks again for taking time to speak with us, and we wish CNI continued success in addressing this wide range of issues.
Cliff: Thanks, always a pleasure.