Hi Gordon
Finally got round to doing your test, as this kinda thing interests me.
Done on my now rather vintage Mac mini (specs in sig). I'll give my findings first and then explain them, but I think the issues this kind of usage can lead to are down to how the DAW addresses virtual instruments rather than the instrument itself (in most cases anyway; that's not to say you don't have a fixable issue).
1 instance of ST4, 5 MIDI channels, 5 presets (all C7 Grand Binaural)
CPU = 103.9% (1 thread), 13.5% (of total CPU). Unplayable!
1 instance of ST4, 1 MIDI channel, 1 preset of C7 Grand Binaural
CPU = 34.8% (1 thread), 3.84% (of total CPU). Plays nicely
5 instances of ST4, 1 MIDI channel each, 1 preset of C7 Grand Binaural each
CPU = 35% (1 thread), 5.10% (of total CPU). Plays nicely
Caveat 1 - my sample content is on a 5400 rpm drive, so I don't expect a massive track count out of it. If I need to, I render down or freeze, but that doesn't happen often as I'm not a heavy sample user.
Caveat 2 - the I/O buffer is set to 256 samples
Now, I know why it works like this, and it may be the same for other DAWs. Logic has a slightly irksome way of distributing load across CPU cores/threads: it always reserves thread 8 for 'live input', i.e. whichever MIDI track is armed or selected, so that the CPU is ready to take the strain of an incoming performance, playing back samples and recording incoming MIDI. That's why my 1 ST4 with 5 C7s is unplayable: that one instance of ST4 just overloads the live thread, even though the remaining threads are basically idle.
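If it helps to picture it, here's a rough Python sketch of that idea. It's a toy model only: the function names, the thread numbers for the non-live instances and the assumption that each plug-in instance runs entirely on one thread are mine, for illustration, and not taken from Logic's documentation. The 103.9% figure is the one measured above; the ~35% per separate instance is an assumption based on the single-preset result.

```python
def thread_loads(assignment):
    """Sum the load (%) landing on each thread from (thread, load) pairs."""
    totals = {}
    for thread, load in assignment:
        totals[thread] = totals.get(thread, 0.0) + load
    return totals


def overloaded_threads(assignment, limit=100.0):
    """Return the threads whose summed load exceeds the limit."""
    return sorted(t for t, load in thread_loads(assignment).items() if load > limit)


# Case 1: one multitimbral ST4 instance with 5 presets, all on the live thread (8).
one_instance = [(8, 103.9)]

# Case 2: five separate instances, assumed ~35% each. Only the selected one
# sits on the live thread; the other thread numbers are hypothetical.
five_instances = [(8, 35.0), (1, 35.0), (2, 35.0), (3, 35.0), (4, 35.0)]

print(overloaded_threads(one_instance))    # thread 8 is over the limit
print(overloaded_threads(five_instances))  # nothing is over the limit
```

The point is just that total CPU can look nearly idle while one thread is already past its limit, which is what the figures above suggest.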
By way of a final (unreal world) test, I could get up to 13 MIDI channels of the "Cellos Sustain" preset in a single multitimbral instance of ST4 before my CPU hit 100% on the one live thread, so all of this depends heavily on the preset used as well.
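For a very rough sense of scale, the two multitimbral results can be turned into an average cost per channel on the live thread. This is back-of-envelope arithmetic only: the variable names are mine, and it assumes load scales evenly per channel, which the single-preset figure of 34.8% suggests it doesn't quite do.

```python
# Average live-thread load per channel, from the multitimbral tests above.
# Assumes load divides evenly across channels (a simplification).
c7_per_channel = 103.9 / 5       # 5 x C7 Grand Binaural in one instance
cellos_per_channel = 100.0 / 13  # 13 x Cellos Sustain at roughly the 100% mark

print(f"C7 Grand Binaural: ~{c7_per_channel:.1f}% per channel")
print(f"Cellos Sustain:    ~{cellos_per_channel:.1f}% per channel")
```

On those numbers, the C7 preset costs somewhere between two and three times as much per channel as the cellos.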
Anyway, hope that sheds some light.