April 2006

Introduction

Bioinformatics is at a crossroads. We work in a field that changes every day, moving increasingly from specific solutions created by researchers working alone or in small groups to larger, often geographically dispersed programs enabled by collaborative computing and open software. This book represents an important development: it gives the reader an opportunity to discover how open, reusable Java code can solve large bioinformatics problems in a robust, well-engineered way. I work with one of the authors of this book every day on the National Cancer Institute's cancer Biomedical Informatics Grid (caBIG™) project, and I can attest that they are well suited to share with their readers both their experience in developing and using bioinformatics software and their interest in solid software engineering and interoperability.

Background and history

In its short history, bioinformatics has become an increasingly important part of how scientists in biological research go about their work. This has led to an explosion of interest in the subject, and a similar explosion in the tools and data resources that researchers must learn and use. Historically, bioinformatics tools have been idiosyncratic, custom-developed by their end users (or those close to them) in an iterative fashion until the specific immediate problem was solved. This has led to a balkanization of informatics systems, sometimes yielding multiple, incompatible systems at a single institution for a single application.
Java for Bioinformatics and Biomedical Applications describes the work of the U.S. National Cancer Institute (NCI, National Institutes of Health, U.S. Department of Health and Human Services) and a large number of cancer centers across the U.S. under the caBIG™ (cancer Biomedical Informatics Grid) program, as well as standard bioinformatics applications. The goal of NCI caBIG™ is to create a standards-based, interoperable network of individuals, applications, and data to enhance the pace of cancer research. The caBIG™ program uses J2EE and open source standards for all software development work. This book examines the tools and technologies being developed under caBIG™ to meet the goal of eliminating suffering and death from cancer by 2015, as formulated by the former NCI Director, Dr. Andrew von Eschenbach. In doing so, it offers a glimpse into the efforts of thousands of people across the country (molecular biologists, medical practitioners, and software developers, to name a few) to bring the promise of translational research to individuals with cancer.
From a software perspective, the book takes a functional approach to teaching the Java platform and its features for enterprise-level application development. Under this approach, the syntactical and operative elements of the language, and any software libraries used (for example, BioJava and Apache), are taught not in isolation but in the context of discrete, definable research problems, so that the reader can see how the different parts of the language fit together in the big picture. All examples are derived from practical problems in biomedical and clinical data retrieval and analysis encountered during routine bioinformatics and cancer research. Further, the book illustrates how individual bioinformatics applications (such as BLAST and Genscan) can be stitched together into a pipeline, so that users can direct the output of one tool (for example, gene predictions from Genscan) into another for further analysis (say, homology searching with BLAST).
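To make the pipeline idea concrete, the following is a minimal sketch of chaining two analysis stages in Java so that the output of the first becomes the input of the second. It does not call Genscan or BLAST and is not taken from the book: predictGenes and searchHomologs are hypothetical stand-ins (a naive ATG-to-stop-codon scan and an exact substring match), and the class name ToolPipeline and the sample sequences are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ToolPipeline {

    // Hypothetical stand-in for a gene predictor such as Genscan: scans left to
    // right and returns each stretch that starts at ATG and ends at the first
    // in-frame stop codon. Real gene prediction is far more involved.
    static List<String> predictGenes(String dna) {
        List<String> genes = new ArrayList<>();
        int start = dna.indexOf("ATG");
        while (start >= 0) {
            int end = findInFrameStop(dna, start + 3);
            if (end < 0) {
                start = dna.indexOf("ATG", start + 1); // no stop codon: try the next ATG
            } else {
                genes.add(dna.substring(start, end));
                start = dna.indexOf("ATG", end);       // resume after this gene
            }
        }
        return genes;
    }

    // Returns the index just past the first stop codon read in frame, or -1 if none.
    private static int findInFrameStop(String dna, int from) {
        for (int j = from; j + 3 <= dna.length(); j += 3) {
            String codon = dna.substring(j, j + 3);
            if (codon.equals("TAA") || codon.equals("TAG") || codon.equals("TGA")) {
                return j + 3;
            }
        }
        return -1;
    }

    // Hypothetical stand-in for a homology search such as BLAST: reports the
    // reference entries that contain a predicted gene as an exact substring.
    static Map<String, List<String>> searchHomologs(List<String> genes,
                                                    Map<String, String> references) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        for (String gene : genes) {
            List<String> names = new ArrayList<>();
            for (Map.Entry<String, String> ref : references.entrySet()) {
                if (ref.getValue().contains(gene)) {
                    names.add(ref.getKey());
                }
            }
            hits.put(gene, names);
        }
        return hits;
    }

    // Stitches the two stages together: the prediction output feeds the search.
    static Map<String, List<String>> run(String dna, Map<String, String> references) {
        Function<String, List<String>> predict = ToolPipeline::predictGenes;
        Function<String, Map<String, List<String>>> pipeline =
                predict.andThen(genes -> searchHomologs(genes, references));
        return pipeline.apply(dna);
    }

    public static void main(String[] args) {
        Map<String, String> references = new LinkedHashMap<>();
        references.put("refA", "TTATGAAATAGCC"); // made-up reference sequences
        references.put("refB", "GGGCCC");
        System.out.println(run("CCATGAAATAGGG", references));
    }
}
```

Run as written, the program prints {ATGAAATAG=[refA]}. The point of the sketch is the composition in run: because each stage is an ordinary function from one data type to the next, a real wrapper around an external tool could replace either stand-in without changing how the stages are connected.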