Over the years, scientific applications have become
more complex and more data intensive. In particular,
large-scale simulations and scientific experiments in
areas such as physics, biology, astronomy, and earth
sciences demand highly distributed resources to
satisfy their extensive computational requirements.
Increasing data requirements and the distributed
nature of the resources have made I/O the major
bottleneck for end-to-end application performance. Existing
systems fail to address issues such as reliability,
scalability, and efficiency in dealing with wide-area
data access, retrieval, and processing. We explore
data-intensive distributed computing and study
challenges in data placement in distributed
environments. After analyzing different application
scenarios, we develop new data scheduling
methodologies and identify the key attributes
for reliability, adaptability, and performance
optimization of distributed data placement tasks.