
CHAITAN BARU

Co-Founder, CLDS,
San Diego Supercomputer Center

Since September 2019, Chaitan Baru has been on assignment at the National Science Foundation as Senior Science Advisor in the Office of Integrative Activities, Office of the Director, NSF. His responsibilities include providing technical leadership for the Open Knowledge Network (OKN) track of the NSF Convergence Accelerator. He was previously on assignment at NSF, from August 2014 to August 2018, as the first (and only) Senior Advisor for Data Science in the Computer and Information Science and Engineering Directorate (CISE). During that time, he co-chaired the NSF Harnessing the Data Revolution Big Idea; played a leadership role in the NSF BIGDATA program; and advised the NSF Big Data Regional Innovation Hubs and TRIPODS programs. He was instrumental in establishing the partnership between the NSF BIGDATA program and the public cloud providers AWS, Google, and Microsoft in 2017, with IBM joining in 2018.


He is on assignment at NSF from UCSD, where he is Senior Advisor, Data Science Research Initiatives for the Office of Research Affairs, San Diego Supercomputer Center, and the Halicioglu Data Science Institute.

Baru has served in a number of roles at SDSC, including Co-Program Director, Data and Knowledge Systems; Division Director, Science R&D; and Associate Director of Data Initiatives. He established and directs the Advanced Cyberinfrastructure Development lab and the Center for Large-scale Data Systems Research (CLDS). He has a broad set of research interests in the area of translational data science that includes topics in applied and application-oriented research in data management and data analytics.


He has led/co-led a number of data cyberinfrastructure initiatives, including as Principal Investigator (PI) of the OpenTopography project; Cyberinfrastructure Lead, Tropical Ecology, Assessment and Monitoring Network (TEAM); Co-Investigator of the Cyberinfrastructure for Comparative Effectiveness Research project (CYCORE); Member of the founding Senior Management Team of the National Ecological Observatory Network (NEON) and Co-PI of the NEON Cyberinfrastructure Testbed; Co-PI of the CUAHSI Hydrologic Information Systems (CUAHSI-HIS); Director, NEES Cyberinfrastructure Center (NEESit); PI/Project Director, Geosciences Network (GEON); and member of the How Much Information? project.


Prior to joining SDSC in 1996, Baru was at IBM, where he led one of the development teams for DB2 Parallel Edition Version 1 (released Dec 1995); and at the University of Michigan, where he served on the faculty of the EECS Department. He received his B.Tech in Electronics Engineering from the Indian Institute of Technology, Madras, and M.E. and Ph.D. in Electrical Engineering from the University of Florida, Gainesville.

Employment


  • 2019 – Senior Science Advisor, Convergence Accelerator, Office of Integrative Activities, Office of the Director, National Science Foundation.

  • 2018 – 2019 Senior Advisor, Data Science Research Initiatives, UC San Diego

  • 2014 – 2018 Senior Advisor for Data Science, Computer and Information Science and Engineering Directorate, National Science Foundation.

  • 2013 – 2018 Associate Director, Data Initiatives, San Diego Supercomputer Center, UC San Diego.

  • 2011 – Director, Center for Large-scale Data Systems Research (CLDS), SDSC.

  • 2007 – Distinguished Scientist and Director, Advanced Cyberinfrastructure Development (ACID) Group, SDSC.

  • 2004 – 2007  Division Director, Science R&D Division, SDSC.

  • 2004 –  Member of SDSC Senior Management Team.

  • 2004 – Member, California Institute for Telecommunications and Information Technology (Calit2), now Qualcomm Institute.

  • 2001 – 2004 Co-Director, Data and Knowledge Systems Program, SDSC.

  • 2000 – 2001 Assistant Director, Data Intensive Computing Environments (DICE) group.

  • 1996 – 2000 Senior Principal Scientist, Data Intensive Computing Environments (DICE) group, SDSC, General Atomics.

  • 1992 – 1995 Advisory Programmer, Database Technology Institute, IBM Almaden Research Labs, San Jose, CA (1995). Advisory Development Analyst and Group Lead, Database Technology Group, IBM Toronto Labs (1992-1995).

  • 1985 – 1992 Assistant Professor, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor.


Cyberinfrastructure Leadership Activities

  1. Project Director, The Geosciences Network (GEON), 2002-2010. PI of a large NSF Information Technology Research (ITR) grant involving 12 collaborating institutions. The project was renewed as GEON 2.0 and also resulted in another spinoff activity, viz., OpenTopography.org.

  2. Director of Cyberinfrastructure for NSF National Earthquake Engineering Simulations (NEESit), 2007-2009. Initially served as the NEESit Cyberinfrastructure Advisor, 2006.

  3. Cyberinfrastructure Lead/PI for the Tropical Ecology, Assessment and Monitoring Network (TEAM), 2007-present. TEAM is a project managed by Conservation International, originally funded by the Moore Foundation.

  4. Member of Senior Management Team, NSF National Ecological Observatory Network (NEON, www.neoninc.org), 2005-2007. Cyberinfrastructure Lead for the NEON Testbed, and co-PI of the NEON Cyberinfrastructure Diagnostic Testbed.

  5. Lead, KatrinaSafe Database Project. Worked in close collaboration with American Red Cross to develop KatrinaSafe, a database to assist victims of Hurricane Katrina. This led to the development of DisasterSafe, hosted at SDSC—a standard service offered by Red Cross for victims of any disaster.

  6. Member, IRIS Data Management System Standing Committee, 2007-2009. IRIS is the NSF data archive for seismological data.

  7. Co-Director, National Laboratory for Advanced Data Research (NLADR, www.nladr.org). Joint activity with the National Computational Science Alliance (NCSA), with Dr. Michael Welge as the other co-Director.

  8. Executive Director, SDSC/Calit2 Synthesis Center (www.syncenter.org), 2005-2008. Joint facility consisting of SDSC staff and equipment located at Calit2.

  9. Member of Cyberinfrastructure Advisory Committee, Long-Term Ecological Research Network (LTER, www.lternet.org), 2006.

  10. Member of Advisory Board, CLEANER Project Office, 2005-2006.

  11. Co-Convener, NSF Earth Science CyberInfrastructure (ES-CI) Task Force (with Lee Allison and Tom Jordan), 2004.

  12. SDSC PI for CUAHSI Hydrologic Information System (http://his.cuahsi.org), 2004-2008.

  13. Member of Leadership Team, Biomedical Informatics Research Network (BIRN, www.nbirn.net), 2001-2004. One of the co-Investigators of the original BIRN Coordinating Center (BIRN-CC).

Collaborations

  1. CARTA: Cyberinfrastructure and bioinformatics lead for the UCSD / Salk Institute-led ORU Center for Advanced Research and Training in Anthropogeny led by Profs. Ajit Varki, Margaret Schoeninger, and Rusty Gage (Salk). Funded by the Mathers Foundation. Duration: 2007—ongoing.

  2. CYCORE: Co-PI of Cyberinfrastructure for Comparative Effectiveness Research project funded by NIH. Project is led by Dr. Kevin Patrick (SOM & Calit2) in collaboration with M.D. Anderson Cancer Center, Houston. Duration: October 2009—September 2010.

  3. CISA3: Co-PI with Profs. Tom Levy and Falko Kuester of the Mediterranean Archaeology Network (MedArchNet). Funded by the UCSD Chancellor's Collaboratory initiative for the 2009-2010 academic year.

  4. 911: PI of NSF-funded project on Spatiotemporal Analysis of 911 Call Stream Data, with Prof. William Hodgkiss, SIO, as co-PI. Duration: 2004-2008.

  5. Hazards: Coordinated a hazards initiative on campus with funding from OVCR, JSOE, SIO, and SDSC. Pre-proposal on Cyberinfrastructure Center for Urgent Response to Emergencies (CICURE) submitted to the NSF STC program (not selected). Other proposal planning activities are under way.

  6. WIISARD: Co-investigator (with Prof. Leslie Lenert as PI) on the original Wireless Internet Information System for Medical Response in Disasters project. Funded by NIH, 2005-2007. Was responsible for the data management component.

  7. BIRN: Collaborated with Prof. Mark Ellisman as co-Investigator on the original BIRN-CC project, funded by NIH, 2001. Was responsible for the data integration component.

  8. I2T: PI of the NSF-funded Information Integration Testbed (I2T) project with Prof. Yannis Papakonstantinou (CSE) as co-PI. Duration: 2002-2004.

  9. UC-SGH: Co-PI on a proposal for a Center of Excellence on Disasters to the UC School of Global Health with Prof. Craig Van Dyke, UCSF (PI), and Profs. Gretchen Kalonji (UCOP) and Nicholas Sitar (UCB). (not selected).

  10. RISC MRU: Co-PI on a multi-campus research unit (MRU) proposal on Rapid Information for Science during Catastrophes (RISC), led by Prof. Emily Brodsky, UCSC. (not selected).

Software Development

  1. One of the group leaders and developers of IBM DB2 Parallel Edition Version 1.0, released commercially in December 1995.

  2. One of the designers of the SDSC Storage Resource Broker (SRB). Version 1 was released in September 1997.

  3. One of the designers of the Data Integration Cart™ technology for ontology-based data integration (invention disclosure filed: 2007).

U.S. Patents

  1. Persistent Archives, R. Moore, A. Rajasekar, C. Baru, B. Ludaescher, A. Gupta, R. Marciano, US Patent 7,349,915, March 25, 2008. Licensed to Nirvana Storage.

  2. Persistent Archives, R. Moore, A. Rajasekar, C. Baru, B. Ludaescher, A. Gupta, R. Marciano, US Patent 6,963,875, November 8, 2005. Licensed to Nirvana Storage.

  3. System and method for construction, storage, and transport of presentation-independent multimedia content, C. Baru, J. Chase, T. Elvins, R. Fassett, E. Nebel, Patent No. 7,028,252, March 22, 2001. Assigned to Oracle Corporation.

  4. Method and apparatus for achieving uniform data distribution in a parallel database system, C. Baru and F. Koo, Patent No. US5970495, IBM, Oct. 19, 1999.

  5. Method and apparatus for implementing partial declustering in a parallel database system, C. Baru, G. Fecteau, J. Kirton, L. Kollar, F. Koo, Patent No. US5878409, IBM, March 2, 1999.

Ph.D. Committees Chaired

  • Ophir Frieder, Robert L. McDevitt, K.S.G., K.C.H.S. and Catherine H. McDevitt L.C.H.S. Chair of Computer Science, Georgetown University.

  • Piyush Goel. Co-Founder of Everypath.com, San Jose, CA.

  • Sriram Padmanabhan, Google Cloud. Was Distinguished Engineer, IBM Silicon Valley Labs.

Committee Memberships

  • Chair, CloudBank Advisory Board, 2019 -

  • EarthCube External Advisory Group, 2019 -

  • Others: Co-chair, SPEC Research Group on Big Data Benchmarking, 2014–2016; Lead, TeraGrid Data Working Group, 2001–2002; Review Committee, Canada Research Chairs program, NSERC, Canada, 2000–2002; Architecture Working Group, California Digital Library, 1998–2000; Grants Selection Committee (GSC) for Computer and Information Sciences, NSERC, Canada, 1994–1997; IBM representative on the Transaction Processing Performance Council's TPC-D Benchmark Standard Subcommittee, 1993–1995.


Expertise: 

  • Data science

  • Data management

  • Knowledge networks

  • Data analytics


Website: 

http://acid.sdsc.edu/users/chaitan-baru

Blog: medium.com/@chaitanbaru