BLACKSBURG, July 31, 2001 - Dr. Sobral is the director of the Virginia Bioinformatics Institute. His research interests are in comparative plant genomics, bioinformatics, and proteomics, and in the dissection of plant-microbe interactions.

Could you tell us a bit about the origins of the Virginia Bioinformatics Institute?

Dr. Sobral: It is clear that in the biology of this new century, information infrastructure will be absolutely required to enable new discoveries. The only way to achieve that is through centralization of components such as high performance computing and data storage, and then making that infrastructure available to geographically distributed collaborators. The Virginia Bioinformatics Institute represents the Commonwealth of Virginia's creation of, and long-term commitment to, such infrastructure. We are administratively and physically housed at Virginia Tech, but lie outside of the traditional departmental boundaries. In a sense, it's a model similar to the Johns Hopkins Applied Physics Lab, a separate research institute within the Johns Hopkins University.

When was this initiative undertaken and what is your progress so far?

Dr. Sobral: July 1st was our first anniversary. I was hired to direct the institute and give it a strategic direction. We've essentially focused the institute on studying the biological aspects of the interrelationships between hosts, pathogens, and their environment, whether of interest to biomedicine, agriculture, environmental science or defense. We believe that evolution has had fairly few "good ideas" that repeat over and over, whether it's a bacterium infecting a mouse or a fungus infecting a plant.

Will the institute focus primarily on basic research?

Dr. Sobral: The institute is one hundred percent committed to research and economic development. We take the economic development part very, very seriously, just like we take the research part very, very seriously. We see the research providing the IP (intellectual property) basis for economic development, and we're putting in place a variety of collaborations and partnerships to facilitate that economic development, whether through our own activities, spinning out commercial opportunities, or through the attraction of industries to Virginia to collaborate with the high performance infrastructure that the Virginia Bioinformatics Institute is putting in place.

How large is the institute and how are you funding your hiring?

Dr. Sobral: We were expected to hire 25 people by the end of year one, and actually hired them four months ahead of schedule. That's based on a Commonwealth investment, which ramps up to thirteen million dollars a year by 2004, and then plateaus as a recurring investment by the Commonwealth. In year two, we would add 25 more people, but because we've already attracted about ten million dollars of extramural research grants and contracts in the first year of operation, we will actually be hiring a further 25 in addition to the 25 that the state will support. So we should have about 75 people at about this time next year. Eventually we would like to have between 300 and 400 people total in the institute by 2007 or 2008.

What percentage of current staff are trained biologists, how many are trained in computational methods, and how many have experience in both?

Dr. Sobral: We found some rare gems who are trained in both areas. These individuals developed their experience from their own interests, rather than coming through a process of education. Our long-term plan is to have about 40% doing experiments in the wet chemistry sense, with about 60% engaged in theoretical or quantitative work. When I say quantitative, I'm including IT staff, whether they're a systems administrator, a database administrator, a software developer, or a theoretician.

Are we talking about mostly PhD-level scientists?

Dr. Sobral: We've been very fortunate in that the programmers that we currently have at VBI have at least PhDs and some of them even have PhDs in two areas, for example a programmer who has a PhD in biochemistry and physics. Our research computational staff all have PhDs. Now, some of our programmers who are more applied do not have PhDs (our systems administrator, for example), but their roles are really more infrastructural than research.

How would your future hires differ, if at all, from the positions that you currently have?

Dr. Sobral: Not at all. We'll still be looking for a 60-40 split. Obviously some areas will fill up faster than others. Our work in biochemical pathways, biochemical simulation and modeling, led by Pedro Mendes, is growing extremely rapidly. I don't see them hitting a barrier in terms of their growth; what I see is that since they're an early-growth group, and have a lot of interest globally in what they're doing, they may end up being a significant part of the organization. The estimates that we make are very much plans and projections, and in a sense we don't know whether by 2008 we might have three times as much funding coming in the door from the Commonwealth. I give you these projections to give you a feel of where our targets lie. A lot of what actually happens falls under the old proverb: "the proof is in the pudding."

How difficult is it to find people with the right training?

Dr. Sobral: It's very competitive. Some might be missing a little bit of the training, but if they are willing to build bridges to people who have the parts that they're missing, they certainly can come on board and evolve because of the unique environment that we provide them, where they can dedicate 100% of their time to research. As part of a group, individual researchers will be able to advance their research agenda more quickly than they would as solitary workers. The opportunity that presents itself here is such that we have not lost anybody yet because we could not offer a unique opportunity or a good salary. We have lost people due to personal preference: "I don't want to live in Blacksburg, I need to live in a metropolis with at least 8 million people." We've lost people because they don't want to live in the United States, because we're hiring globally. But we haven't lost anybody because they came in and said "You know, there's just a better opportunity for me somewhere else," so we're very bullish.

Many of the people that you hire don't have formal training in bioinformatics, but clearly have the potential to work in that area. What skills are you looking for in that person?

Dr. Sobral: Much like with the computational people, what we're looking for first of all is research excellence. That's non-negotiable. But once we see the research excellence, we look for people who are eager to go to the next level. And by that I mean their explicit willingness to build a bridge from experimental to computational science, and their interest in the opportunities that would not be available as an individual researcher. We want individuals that recognize the need to be part of a team.

If we don't see a committed desire to partner, to collaborate and to build bridges, we don't care how good their science is or how well-funded they are. The next thing we're looking for is whether this person is really willing to be part of multidisciplinary team research. It is difficult for some formally trained academics who were taught to show the world their contribution as an individual. Biology is changing from cottage industry science (where that kind of thing works very well) to multidisciplinary team science. So what we're trying to do is find those people who have that sort of vision.

You said before that biologists are generally not well trained in quantitative methods, but it seems that in the future, life scientists are going to need some degree of training in this area to remain attractive to employers. What would you recommend to those scientists?

Dr. Sobral: I think the short answer is to take the quantitative courses. Don't fool around. Really take them. If your engineering school offers them, go there. If your computer science department offers them, go there. And that means serious quantitative skills in statistics and mathematics. Wherever they are, go get them, because they will definitely be useful.

What if you can't?

Dr. Sobral: If you don't, then your chance to get into bioinformatics is to join a group at a lower level and learn on the job. That's certainly possible, and I welcome people to do that. But they aren't going to be able to take a research leadership role in bioinformatics per se if they don't already have some of that training.

Tell us about your background. How did you get involved in bioinformatics? What was your training?

Dr. Sobral: My undergraduate work was in agricultural engineering in Brazil. I then went to Iowa State University and got a PhD in Genetics. After a post-doc, I started as an independent investigator studying plant and microbial genomics at the California Institute of Biological Research in La Jolla, remaining there until 1996 as a staff scientist. A critical point was that I was trained in the quantitative sciences. One of the critical challenges for biology is that it is moving from an essentially descriptive science to a predictive and theoretical one. Most of the people who have received biological education recently have not been exposed to the kinds of quantitative skills that you would need to play a major role in bioinformatics. And it's not just a matter of computers, it's a matter of math and the physical sciences. I think that one of the unique opportunities for me was that in fact I did have a strong quantitative background, although by no means would I consider myself an IT expert. Many biologists have been trained as observational scientists and they have either a lack of interest or a lack of belief in theoretical modeling. What's happening now in biology happened in physics a long time ago: the early advances in physics were based on people's observations, but nowadays almost all of modern physics is essentially theoretical, even though there is a lot of experimentation to test the theories. I think that modern biology is going in that same direction: theory, experimentation and simulation.

In the future, will bioinformatics be handled by specialists, or will all life scientists need some degree of quantitative training?

Dr. Sobral: My personal feeling is that if you fast-forward many years, biologists are going to be more similar to current physicists. A very small number of them will be directing very large infrastructures to do critical experiments, to validate or not validate, as it were, a particular theory or opinion. And then the majority of the people will actually be working on the existing data, creating theory and making predictive models on data generated at an industrial scale. At that point in time, I see that a lot of the infrastructural components of bioinformatics will probably be more of an off-the-shelf solution. Much like you and I use Microsoft Word or Excel without really knowing how to build a system like that ourselves, I think that the biologists of the future will have that kind of infrastructure. What they'll really need is the capability to look at the data, ask questions of the data, interpret the results, and make predictions and move the theory forward based on those analyses. So, yeah, I think they will need to be comfortable with the quantitative aspects of analyzing the kinds of biological data that we're capable of producing nowadays. And long-term, we won't need them to build a new database or a new software tool, because that infrastructure will be available off the shelf.

Published by Public Relations, July 30, 2001