A Platform for Collaborative e-Science Applications
Marian Bubak
ICS / Cyfronet AGH, Krakow, PL
KUKDM 2009, Zakopane, 12-13 March 2009

Outline
• Motivation
• Idea of an “experiment”
• Virtual laboratory
• Examples of experiments
• Summary and challenges

System-Level Science
• An approach to scientific investigation which, besides analysing individual phenomena, integrates different, interdisciplinary sources of knowledge about a complex system in order to understand the system as a whole.
• Foster, I., Kesselman, C., Scaling System-Level Science: Scientific Exploration and IT Implications. IEEE Computer 39 (11), 2006

ViroLab Users
[Use-case diagram]
• Actors: Experiment developer, Scientist, Clinical virologist
• Use-case groups: Experiment planning, Experiment use, Decision Support System use, Results management
• Use cases and their annotations:
  – Data Source registration: adds data resources inside the virtual lab
  – ViroLab Gem Development: publishes a new ViroLab Gem
  – Experiment Planning: uses various ViroLab Gems and available data resources to create experiments
  – Experience feedback: helps the developer improve the experiment
  – Experiment Execution: runs prepared experiments to obtain results
  – Results sharing: discusses and analyses the results
  – Results storing: stores the results in the laboratory data store
  – Decision support: the DSS relies on rules to give information on drug resistance
• Results regarding drug resistance of virus mutants may become new rules for the DSS

Experiment
• Experiment: a process that combines data with a set of activities that act on that data

Experiment Lifecycle
• Experiment pipeline: a collaborative planning and execution process that may create a new experiment

On top of any infrastructure …
[Layered architecture diagram]
• Users: Experiment developer, Scientist, Clinical Virologist
• Interfaces: Experiment Planning Environment, ViroLab Portal, Patient Treatment Support
• Experiment scenario
• Runtime: Virtual Laboratory runtime components (required to select resources and execute experiment scenarios)
• Services:
  – Computational services: services (WS, WTS, WS-RF), components (MOCCA), jobs (EGEE, AHE)
  – Data services: DAS data sources, standalone databases
• Infrastructure: Grids, Clusters, Computers, Network

Collaborative Applications on VLvl

Virus Genotype Analysis
Objective: load the nucleotide sequence of an HIV strain and report its mutations and drug ranking information
• Gems used:
  – Alignment
  – Subtype detection
  – Drug ranking
  – Data Access Service
• Input: virus nucleotide sequence
• Output: various analyses
http://virolab.cyfronet.pl/trac/vlvl/wiki/ExperimentDemo

Experiment Plan

    # Parameters
    patientID = 6
    region = "rt"

    # Genotype retrieval
    remoteDB = DACConnector.new("DAS", "virolab.hlrs.de")
    sequences = remoteDB.executeQuery(
      "select nucleotides from nt_sequence where patient_ii=#{patientID.to_s};")

    # Alignment + mutation detection
    regaDBMutationsTool = GObj.create('regadb.RegaDBMutationsTool')
    regaDBMutationsTool.align(sequences, region.upcase)
    mutations = regaDBMutationsTool.getResult

    # Subtyping
    regaDBSubtypingTool = GObj.create('regadb.RegaDBSubtypingTool')
    regaDBSubtypingTool.subtype(sequences[0])
    puts regaDBSubtypingTool.getResult

    # Drug ranking (the slide does not show where the `drs` gem object is created)
    puts drs.drs('retrogram', region, 100, mutations)

Protein Folding
Objective: demonstrate the use of the Virtual Laboratory for proteomics applications
• Input: protein and chain ID
• Output: 3D structure of the protein
• Gems used:
  – Protein Data Bank (PDB) Web Service
  – Early-stage protein folding: Bryliński M, Jurkowski W, Konieczny L, Roterman I.
Limited conformational space for early-stage protein folding simulation. Bioinformatics 20(2), 199-205 (2004)
  – DAC and WebDAV for result storage
http://virolab.cyfronet.pl/trac/exampleExperiments/wiki/exex/Folding

Data Mining with Weka
Objective: analyse the quality of various classification algorithms on large datasets using the Weka data mining library and the MOCCA component framework.
• Input: sample dataset
• Output: quality of predictions
• Gems used:
  – Web services for data retrieval, conversion, splitting and testing
  – MOCCA components wrapping algorithms from Weka
  – WebDAV server for data storage
http://virolab.cyfronet.pl/trac/exampleExperiments/wiki/exex/WekaAdv

Summary
• Collaborative applications
  – A new method of collaborative application development
  – Abstract layers to hide technological changes
  – Semantic description of applications
  – Integration of provenance recording and tracking
• Virtual laboratory
  – Semantic description of resources
  – Deployment on available Grid systems, clusters, and single CEs
  – Integration of WS, WSRF, components, jobs
  – Secure data access and integration

Challenges
• Developing Web-based tools for managing laboratories, running experiments, gathering results, refining methods and achieving scientific goals within virtual groups.
• Supporting the collaborative aspects of research with dedicated, specialized tools that let different groups of users (administrators, hardware providers, experiment developers, scientists) complete their tasks within the overall goal of their groups.
• Providing a collaborative space for developing scientific collaborative applications, one that facilitates publishing and sharing of software content.
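
Returning to the Data Mining with Weka experiment: its flow (split a dataset, train a classifier, measure the quality of predictions) can be illustrated with a minimal, self-contained Ruby sketch. This is not the experiment from the slides. Nothing here calls Weka, MOCCA or the ViroLab runtime; the toy dataset and the helpers `split`, `train_nearest_centroid`, `predict` and `accuracy` are hypothetical stand-ins invented for illustration, and a nearest-centroid classifier is swapped in for the Weka algorithms.

```ruby
# Hypothetical stand-in for the experiment flow: split -> train -> test.
# None of these helpers belong to Weka, MOCCA, or the ViroLab runtime.

Sample = Struct.new(:features, :label)

# Deterministic split: the first `ratio` share of the samples is used for
# training, the remainder is held out for testing.
def split(samples, ratio)
  cut = (samples.size * ratio).floor
  [samples.take(cut), samples.drop(cut)]
end

# "Training" computes one mean feature vector (centroid) per label.
def train_nearest_centroid(training)
  training.group_by(&:label).transform_values do |group|
    dims = group.first.features.size
    (0...dims).map { |d| group.sum { |s| s.features[d] } / group.size.to_f }
  end
end

# Squared Euclidean distance between two feature vectors.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Predict the label whose centroid lies closest to the given features.
def predict(model, features)
  label, _centroid = model.min_by { |_l, c| squared_distance(c, features) }
  label
end

# Quality measure: fraction of held-out samples classified correctly.
def accuracy(model, held_out)
  correct = held_out.count { |s| predict(model, s.features) == s.label }
  correct / held_out.size.to_f
end

# Toy dataset (made up): two well-separated classes, :a and :b.
DATASET = [
  Sample.new([1.0, 1.2], :a), Sample.new([5.0, 5.1], :b),
  Sample.new([0.8, 1.0], :a), Sample.new([5.2, 4.9], :b),
  Sample.new([1.1, 0.9], :a), Sample.new([4.8, 5.3], :b),
  Sample.new([0.9, 1.1], :a), Sample.new([5.1, 5.0], :b)
].freeze

training, held_out = split(DATASET, 0.75)
model = train_nearest_centroid(training)
puts "accuracy: #{accuracy(model, held_out)}"
```

In the virtual laboratory each of these steps would be a separate gem (a Web service or a MOCCA component) rather than a local method, so that the same experiment script can compare several classifiers by swapping the training step.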

Dataneum
• Authorization model enabling e-Scientists to expose the input data and results of their research to the wider scientific community without violating the security restrictions inherent in grid computing infrastructures or collaborative working environments.
• Integrated infrastructure and community web portal for:
  – Authoring,
  – Publishing,
  – Managing,
  – Sharing,
  – Referencing,
  – Accessing,
  – Reusing,
  – Annotating,
  scientific data.

Thanks to
• Runtime system: Tomasz Gubala, Marek Kasztelnik, Piotr Nowakowski, Eryk Ciepiela, Asia Kocot
• Middleware: Maciej Malawski, Tomasz Bartynski, Jan Meizner
• Presentation and collaboration tools: Wlodzimierz Funika, Dariusz Krol, Daniel Harezlak, Alfredo Tirado
• Data Access: Matthias Assel, Stefan Wesner, (Aenne Loehden, Bettina Krammer), Piotr Nowakowski
• Provenance: Bartosz Balis, Jakub Wach, Michal Pelczar
• Integration: Tomasz Gubala, Marek Kasztelnik

• VLvl description, demos, downloads: virolab.cyfronet.pl
• ViroLab Project (Peter Sloot): www.virolab.org