
Parallel & Distributed Computing

1. Load Balancing and Task Scheduling in Parallel, Distributed, and Cluster Computing Environments (in collaboration with Dr. Javid Taheri, Sydney University)

Scheduling and load balancing are two important problems in the area of parallel computing. Efficient solutions to these problems will have profound theoretical and practical implications that will affect other parallel computing problems of a similar nature. Little research has attempted a generalized approach to these problems. The major difficulties stem from interprocessor communication and the delays caused by inter-dependencies between the subtasks of a given application. The mapping problem arises when the dependency structure of a parallel algorithm differs from the processor interconnection of the parallel computer, or when the number of processes generated by the algorithm exceeds the number of processors available. The problem is further complicated when the parallel computer system contains heterogeneous components (e.g., different processor and link speeds, such as in cluster and Grid architectures). This project intends to investigate the development of new classes of algorithms for solving a variety of scheduling and load-balancing problems in both static and dynamic scenarios.
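As a concrete illustration of the simplest static case, the sketch below implements the classic longest-processing-time-first (LPT) greedy heuristic, which assigns each task to the currently least-loaded processor. It is a minimal sketch only: the task lengths and processor count are hypothetical, and it ignores the communication costs and inter-task dependencies that make the real problem hard.

    import heapq

    def lpt_schedule(task_times, num_procs):
        """Greedy LPT list scheduling: longest task first, each
        assigned to the least-loaded processor. Ignores
        communication costs and task dependencies."""
        heap = [(0.0, p) for p in range(num_procs)]  # (load, processor)
        heapq.heapify(heap)
        assignment = {}
        for task, t in sorted(enumerate(task_times), key=lambda x: -x[1]):
            load, proc = heapq.heappop(heap)
            assignment[task] = proc
            heapq.heappush(heap, (load + t, proc))
        makespan = max(load for load, _ in heap)
        return assignment, makespan

    # Hypothetical task lengths on a 4-processor system
    tasks = [5.0, 3.5, 8.0, 2.0, 7.5, 1.0, 4.0]
    mapping, makespan = lpt_schedule(tasks, 4)
    print(mapping, makespan)

Heterogeneous processors, link speeds, and dependency structures, the cases targeted by this project, all break the simple "least-loaded" notion used here.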

2. Scheduling Communications in Cluster Computing Systems (in collaboration with Dr. Javid Taheri, Sydney University)

Clusters of commodity computer systems have become the fastest-growing choice for building cost-effective high-performance parallel computing platforms. The rapid advancement of computer architectures and high-speed interconnects has facilitated many successful deployments of such clusters. Previous studies have reported that the cluster interconnect significantly impacts the performance of parallel applications. High-speed interconnects not only unlock the potential performance of the cluster, but also allow clusters to achieve a better performance/cost ratio than clusters built on traditional local area networks. To this end, this project aims to study how computations and communications influence the performance of such systems. Applications range from the compute-intensive to the communication-intensive, and an understanding of such applications, and of how they map efficiently onto clusters, is important.
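For intuition only, the toy model below (our own illustration; all parameters are hypothetical) predicts parallel runtime as the computation divided across the nodes plus a communication overhead that grows with the node count, showing how the interconnect eventually caps the achievable speedup.

    def predicted_speedup(t_comp, t_comm_per_node, num_nodes):
        """Crude model: perfectly divisible computation plus a
        per-node communication cost on the cluster interconnect."""
        t_parallel = t_comp / num_nodes + t_comm_per_node * num_nodes
        return t_comp / t_parallel

    # Hypothetical workload: 100 s of computation, 0.5 s of
    # communication overhead added per participating node.
    for p in (1, 2, 4, 8, 16, 32):
        print(p, round(predicted_speedup(100.0, 0.5, p), 2))

Under these made-up numbers the speedup improves up to about 16 nodes and then degrades as communication dominates, which is exactly the compute-intensive versus communication-intensive trade-off mentioned above.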

3. Parallel Machine Learning and Stochastic Optimization Algorithms (in collaboration with Dr. Javid Taheri, Sydney University)

Optimization algorithms can be used to solve a wide range of problems that arise in the design and operation of parallel computing environments (e.g., data mining, scheduling, routing). However, many classical optimization techniques (e.g., linear programming) are not suited to parallel processing problems because of their restricted nature. This project is investigating the application of new and unorthodox optimization techniques such as fuzzy logic, genetic algorithms, neural networks, simulated annealing, ant colony optimization, tabu search, and others. These techniques, however, are computationally intensive and require enormous computing time. Parallel processing has the potential to reduce this computational load and enable the efficient use of these techniques on a wide variety of problems.
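As one minimal sketch of how parallel processing helps here: the candidate evaluations in population-based techniques such as genetic algorithms are independent of one another, so they can be farmed out to worker processes. The toy objective function, population size, and worker count below are hypothetical.

    import random
    from multiprocessing import Pool

    def fitness(candidate):
        # Toy objective: minimise the sphere function sum(x_i ** 2),
        # expressed here as a score to maximise.
        return -sum(x * x for x in candidate)

    def evaluate_population(population, workers=4):
        # Evaluations are independent, so they parallelise trivially.
        with Pool(workers) as pool:
            return pool.map(fitness, population)

    if __name__ == "__main__":
        population = [[random.uniform(-5, 5) for _ in range(10)]
                      for _ in range(100)]
        scores = evaluate_population(population)
        print(max(scores))

The same pattern applies to simulated annealing (parallel restarts) and ant colony optimization (parallel ant walks), since each trial solution can be evaluated independently.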

4. Autonomic Communications in Parallel and Distributed Computing Systems (in collaboration with Dr. Javid Taheri, Sydney University)

The rapid advancement of computer architectures and high-speed interconnects has facilitated the successful deployment of many types of parallel and distributed systems. Previous studies have reported that the design of the interconnect significantly impacts the performance of parallel applications. High-speed interconnects not only unlock the potential performance of the computing system, but also allow such systems to achieve a better performance/cost ratio. To this end, this project aims to study how computations and communications influence the performance of such parallel and distributed computing systems.

5. Quality of Service in Distributed Computing Systems

There is a need to develop a comprehensive framework to determine what QoS means in the context of distributed systems and the services that will be provided through such infrastructure. What complicates the scenario is the fact that distributed systems provide a whole range of services, not only high-performance computing. There is a great need to develop QoS metrics for distributed systems that capture this complexity and provide meaningful measures for a wide range of applications. This will likely mean that new classes of algorithms and simulation models need to be developed, ones able to characterize the variety of workloads and applications and thereby better explain the behaviour of distributed computing systems under different operating conditions.
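Purely as a hypothetical illustration of what one such metric might look like, the sketch below combines normalised latency, throughput, and availability into a single weighted score; the weights and targets are invented for the example.

    def qos_score(latency_ms, throughput_mbps, availability,
                  target_latency=50.0, target_throughput=100.0,
                  weights=(0.4, 0.3, 0.3)):
        """Hypothetical composite QoS metric in [0, 1]."""
        # Latency: 1.0 at or below target, degrading as it grows.
        lat = min(1.0, target_latency / max(latency_ms, 1e-9))
        # Throughput: fraction of target, capped at 1.0.
        thr = min(1.0, throughput_mbps / target_throughput)
        w_lat, w_thr, w_avail = weights
        return w_lat * lat + w_thr * thr + w_avail * availability

    print(qos_score(latency_ms=80.0, throughput_mbps=90.0, availability=0.999))

A realistic framework would need many more dimensions (jitter, fairness, cost, deadline satisfaction) and application-specific weightings, which is precisely the open problem this project addresses.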

6. Healing and Self-Repair in Large Scale Distributed Computing Systems

As the complexity of distributed systems increases, there will be a need to endow such systems with capabilities that allow them to keep operating in disaster scenarios. What makes this problem very complex is the heterogeneous nature of today’s distributed computing environments, which may be made up of hundreds or thousands of components (computers, databases, etc.). In addition, a user in one location might not have control over other parts of the system. It is therefore logical that “smart” algorithms (protocols) are needed to achieve an acceptable level of fault tolerance and to account for a variety of disaster-recovery scenarios.
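One classic building block for such protocols is a heartbeat failure detector: every node periodically announces that it is alive, and a peer that misses enough heartbeats is suspected of having failed. The sketch below shows the detection logic only (node names and the timeout are hypothetical); the recovery and repair actions triggered by a suspicion are where the hard research problems lie.

    import time

    class HeartbeatDetector:
        """Suspects a node of failure if no heartbeat arrives
        within `timeout` seconds (detection logic only)."""

        def __init__(self, nodes, timeout=3.0):
            now = time.monotonic()
            self.timeout = timeout
            self.last_seen = {n: now for n in nodes}

        def heartbeat(self, node):
            # Called whenever a heartbeat message arrives from `node`.
            self.last_seen[node] = time.monotonic()

        def suspected(self):
            now = time.monotonic()
            return [n for n, t in self.last_seen.items()
                    if now - t > self.timeout]

    detector = HeartbeatDetector(["node-a", "node-b"], timeout=3.0)
    detector.heartbeat("node-a")
    print(detector.suspected())  # empty until a node misses its deadline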

 


 



Last changed: May 24, 2013