A massively parallel architecture comprises a large number of processing elements (PEs), but a 1:1 mapping between the PEs and the elements of the data set can seldom be achieved. To overcome this problem, a processor virtualization mechanism is needed. This work presents a theoretical study of performance optimization for the specific virtualization process used when only a small amount of memory is associated with each PE. It also presents an algorithm for optimizing performance and for determining the maximum processing speed of low-cost cellular array processors that implement the above-mentioned virtualization mechanism. These results are then used to improve the execution of morphological image processing tasks on the first hardware prototype of PAPRICA, a special-purpose cellular array processor.