Core Center Resources:
Bioinformatics & Biostatistics
Informatics – Jack London, Ph.D., KCC, TJU
The Informatics Shared Resource (ISR) supports the cancer center’s basic, clinical, and translational research. It was formerly known as the Shared Computer Facility. The renaming reflects a shift in the facility's focus from providing central computing hardware platforms for investigators to providing services for the management and processing of data and information. These services include creating research software, managing data files, acquiring hardware and software, and providing computer-related consultation.
The ISR has supported cancer center research by developing web-based database applications for clinical trials, biospecimen research repositories, and various research projects. It provides management of very large genomic experimental data sets, acquisition of hardware, development of software tools to facilitate inter- and intra-institutional collaborative research programs, and assistance with design proposals for research-related computing.
The clinical trials applications provide information on available clinical trials, the registration of participants on these trials, and the fully automated reporting of trial adverse events. The biospecimen repository applications provide investigators with information characterizing specimens available for research.
The ISR has also developed applications and strategies for storing and retrieving microarray experimental results. For geographically dispersed research collaborations, it has acquired video conferencing systems and developed web-based document sharing applications.
Informatics Shared Resource staff consult with investigators to recommend hardware and software, ranging from desktop units to high-performance computing clusters. Interoperability among independently developed informatics systems is crucial to advancing cancer research through the sharing of research data and resources. Recognizing this, the ISR has been an active participant in the NCI’s cancer Biomedical Informatics Grid (caBIG™) since the initiative’s inception.
This resource has been a funded developer and/or adopter in the Integrated Cancer Research Workspace and the Tissue Bank and Pathology Tools Workspace. KCC is among the first cancer centers to deploy caBIG™ tools.
The shared resource director, Dr. London, is a member of the Data Sharing and Intellectual Capital Work Group. Staff members have served as unfunded participants in the Clinical Trials Management Systems Workspace to keep abreast of developments in this group for possible adoption at the center.
The ISR also supports research-related cancer center programs, such as the clinical trials applications, seminar conferencing services, and the database application for tracking seminars. These are made available to Jefferson Cancer Network hospitals and to cancer center community outreach programs, such as the Buddy Program.
Equipment
- The ISR is dedicated to using open source solutions when possible. The shared hardware includes ten servers (8 Linux/Apache/PostgreSQL and 2 Microsoft Windows/IIS/MSSQL) and over a terabyte of network file storage. This networked storage is backed up to tape daily. KCC faculty and staff are encouraged to use the network-accessible shared disks for storing documents and data. Because the ISR backs these disks up daily, users risk losing at most one day's file updates. Furthermore, this shared mass storage relieves investigators with very large disk storage needs of the expense, in both dollars and time, of maintaining large disk “farms.”
- The ISR uses high-speed RAID storage systems, which provide fail-over capability through redundant drives. Maintaining network file storage for KCC members also permits controlled-access sharing of data and documents. Local area networking and internet access are provided and supported by the University (i.e., TJU is responsible for everything up to the wall jack). The ISR occupies 1,010 square feet in three rooms. The main computing machinery room occupies 120 square feet and contains the host computers and associated operating equipment. Office space for the shared resource staff occupies 890 square feet in two adjoining rooms. There is also a common area for conferences and group work sessions, shared by all occupants of the 8th floor of the Bluemle Life Sciences Building.
Clinical Trials Applications
The ISR developed and maintains web-based database applications for clinical trials research. These applications include:
- The Clinical Trial Information Repository application, which has databases for clinical trial information and patients registered on these trials. This system is integrated with the University’s Office of Human Research clinical research database.
- The automated electronic Serious Adverse Event Reporting system (eSAEy), which incorporates NCI Cancer Therapy Evaluation Program Toxicity Criteria and electronic signatures for reporting adverse events that occur during KCC clinical research trials. This system is integrated with the Clinical Trial Information Repository and the University’s Office of Human Research adverse event reporting database.
- The study calendar system, TreatmentCal, which allows tracking of patients enrolled on KCC studies.
These applications utilize open source PostgreSQL relational databases and Apache web services on Linux database and web server platforms. They are accessible on both PC and Mac desktops via standard web browsers (including Internet Explorer, Netscape, Firefox, and Safari).
Multilevel security is provided by username/password authentication with browser-level log-ins, and by authorization at the application and database table permission levels. The transmission of confidential information, such as patient data, is protected with 128-bit encryption. These applications comply with HIPAA restrictions on the dissemination of Protected Health Information, and the electronic signature function adheres to the provisions of the FDA’s 21 CFR Part 11.
Biostatistics – Terry Hyslop, Ph.D., KCC, TJU
The Biostatistics Shared Resource (BSR) supports Kimmel Cancer Center investigators in the design, conduct, and analysis of cancer-related clinical, translational, and scientific investigations. It also reviews cancer-related clinical trial proposals for the Cancer Clinical Research Review Committee (CCRRC). The BSR is staffed by five PhD-level faculty biostatisticians and three MS-level biostatisticians.
The BSR provides consultation and expertise in several areas:
- Study design, including the validity of the overall design, feasibility of meeting objectives, sample size, study duration, and planned data analysis.
- Staffing recommendations for data management and analysis support.
- Data analysis, preparation of reports, and assistance with manuscript writing.
- Development of new biostatistical methods.
The general goals of the BSR are to ensure that study designs, monitoring, and analyses use state-of-the-art methods, and to help developmental studies supported by the Center achieve peer-reviewed funding. The BSR grew during the recent grant cycle, adding faculty and staff with bioinformatics expertise. The University’s Strategic Plan commits resources to ensure continued investment in the BSR. Projected growth areas include clinical trials design, bioinformatics, and the analysis of high-throughput data.
Equipment and Facilities
- Each member of the BSR has a Pentium computer, most of them laptops with docking stations and flat-panel monitors. Each member also has access to multiple shared computer drives that facilitate collaboration on grants, data analyses, and manuscripts; these drives are set up and maintained by the KCC Informatics Shared Resource. Password-controlled, web-based access to shared documents facilitates collaboration on larger applications and projects.
- The BSR has assembled a statistical software library. It includes SAS (Windows and Linux, SAS/Genetics), Systat, S-Plus, Stata, SUDAAN, StatXact, LogXact, Egret, CART, DBMS/COPY, nQuery, and PASS. FORTRAN programming is available as required, including the NAG mathematical subroutine library. In addition, the Division uses multiple freely available packages from R and Bioconductor (bioconductor.org). With assistance from the Informatics Shared Resource, Bioconductor has been set up on a shared Linux server and is in use there for larger computational projects. Finally, a web-based file sharing initiative allows targeted shared access to files across the campus as well as outside the University.
Bioinformatics – Douglas O’Neal, Ph.D., DBI, UD
The BioIT Center supports the computational and data management needs of the DBI research community and is anchored by three major systems:
- High-Performance Compute Cluster: The core of DBI’s HPC offering is a Linux-based Beowulf cluster with 122 compute nodes providing 286 processor cores. A mix of system types allows researchers to choose the systems best suited to the particular requirements of a computation. The nodes include:
  - 113 x Sun Fire V60x, dual-processor 2.8GHz Xeon (64-bit), 2GB memory
  - 7 x Sun Fire X4100 M2, quad-processor Opteron (64-bit), 8GB memory
  - 1 x Sun Fire X4600 M2, 16-processor Opteron (64-bit), 64GB memory
  - 1 x Sun Fire X4600 M2, 16-processor Opteron (64-bit), 128GB memory
  All compute nodes are connected via gigabit Ethernet and have access to a 1.1TB shared RAID array. A parallel Myrinet network connects 48 of the nodes, allowing low-latency data transfers. The HPC cluster is commonly used for protein sequence alignment (NCBI BLAST, EMBOSS) and molecular modeling (Gaussian, GAMESS). It also supports other bioinformatics applications and custom applications suited to parallel processing. Fair-use scheduling of resources is provided by the Sun Grid Engine queue management system.
- Database Server Cluster: The BioIT Center uses a cluster of six Sun Fire X4100 M2 servers as a repository of experimental data in relational databases. Each server has quad-processor 64-bit Opteron CPUs and 16GB memory. Both MySQL and Oracle database systems are available, allowing researchers to organize, store, and evaluate their data. An Apple 10.5TB RAID array, a Sun 840GB RAID array, and a Sun 480GB JBOD array provide ample storage space for current data. Direct access to stored data is available through command-line and web-based clients, but results are usually shared through customized web pages. Data security is a high priority, and access to results other than through these methods is strictly limited.
- 3-D Visualization Studio: The Visualization Studio is an immersive 3-D graphics room with a 7’ x 15’ rear-projection screen. It displays an edge-blended image with a total resolution of 2240 x 1024 pixels. The display is driven by two servers:
  - An eight-processor Silicon Graphics Prism visualization supercomputer with four graphics pipelines provides a Linux environment with the power of the SGI graphics software.
  - A dual-core HP AMD64 system with a high-end NVIDIA graphics processor allows a variety of Windows software to be used.
  An integrated tracking system allows the graphics software to follow a researcher’s motion and adjust the display, so that the researcher can walk around or through 3-D objects. Both systems are available for molecular and biological modeling projects.
The BioIT Center also supports multiple special-purpose servers, including web servers, a secure FTP server for data transfer, streaming video servers, and an email server. The BioIT staff will work with researchers to design and purchase servers dedicated to specific groups or projects. If desired, these systems can be housed in the BioIT computer room and managed by the center’s staff.
The BioIT staff bring several types of expertise to aid researchers. Dr. Doug O’Neal, manager of the BioIT Center, is experienced in scientific computing and system administration. Dr. Mihailo Kaplarević provides database and scientific programming support for the Center’s researchers. Both combine scientific training with information technology experience, which lets them bridge research needs and the computer hardware and software resources at DBI. Eric Garrison provides web site and desktop computing support.
Christiana Care Center for Outcomes Research – William Weintraub, MD, Director
The Christiana Care Center for Outcomes Research (CCOR) is a multidisciplinary research group of 7 clinicians/epidemiologists and 5 biostatisticians. Its expertise spans clinical medicine, epidemiology, biostatistics, and informatics. CCOR supports ongoing research programs in cardiovascular medicine, nephrology, women’s health, infectious disease, and general internal medicine. It has particular expertise in cost-effectiveness and health status assessments in clinical trials. CCOR biostatisticians are experienced in:
- propensity score methods and multiple imputation of missing data
- data mining and cluster analysis
- structural equation modeling and latent growth curve modeling
- survival analysis for multiple events
- cost-effectiveness analysis, including Bayesian sensitivity analysis for cost-effectiveness models
- simulation with Markov modeling and patient-level stochastic analysis
This expertise is critical to successful comparative effectiveness research.
Informatics is an integral component of CCOR, providing a continuum of data, information, and communications across the research environment. CCOR's informatics specialists work closely with investigators, clinicians, and statisticians to understand domain perspectives. They supply the data and systems understanding needed to prepare appropriate datasets for the prescribed statistical approach. The team is proficient in integrating data from diverse internal and external sources, including outpatient electronic health records (EHRs) from multiple practices and vendors, directly accessed acute care systems, the Christiana Care Health System data warehouse, and databases created for prospective studies.