Singleton pattern

The traditional singleton creation

+(XXClass*)sharedInstance {
  static XXClass* XXClass_sharedInst = nil;
  @synchronized(self) {
    if (XXClass_sharedInst == nil) {
      XXInitialize(&XXClass_sharedInst);
    } 
  }
  return XXClass_sharedInst;
}

is extremely inefficient because @synchronized not only locks a mutex but also installs an exception handler (a try/catch block). If you don't care about exceptions and are willing to use C functions, pthread_once() is a better alternative:

static XXClass* XXClass_sharedInst = nil;
static pthread_once_t XXClass_onceControl = PTHREAD_ONCE_INIT;
static void XXInitializeOnce(void) { XXInitialize(&XXClass_sharedInst); }
...
+(XXClass*)sharedInstance {
  pthread_once(&XXClass_onceControl, &XXInitializeOnce);
  return XXClass_sharedInst;
}

pthread_once() is implemented using a spin lock (which leads to a syscall_thread_switch() kernel call in the worst case).
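To illustrate the once-only guarantee, here is a minimal plain-C sketch in which several threads race to obtain the shared object. The names shared_value, initialize_once, shared_instance and count_initializations are hypothetical stand-ins for XXClass_sharedInst, XXInitializeOnce and +sharedInstance; they are not part of any real API.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for XXClass_sharedInst and XXInitialize(). */
static int shared_value = 0;   /* plays the role of the singleton */
static int init_calls = 0;     /* how many times the initializer ran */
static pthread_once_t once_control = PTHREAD_ONCE_INIT;

/* Runs at most once, no matter how many threads call shared_instance(). */
static void initialize_once(void) {
    init_calls++;
    shared_value = 42;
}

/* Equivalent of +sharedInstance: initialize lazily, then return. */
static int *shared_instance(void) {
    pthread_once(&once_control, initialize_once);
    return &shared_value;
}

/* Thread entry point: just fetch the shared instance. */
static void *worker(void *arg) {
    (void)arg;
    return shared_instance();
}

/* Starts several racing threads and reports how often the
   initializer actually ran. */
int count_initializations(void) {
    enum { THREADS = 8 };
    pthread_t threads[THREADS];
    for (int i = 0; i < THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(threads[i], NULL);
    return init_calls;
}
```

Under these assumptions count_initializations() should report a single call however the threads interleave, because pthread_once() makes every other caller wait until the first initializer has returned.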

If it is safe to create multiple copies of the singleton and destroy the extra ones, you can even use an atomic compare-and-swap (CAS):

+(XXClass*)sharedInstance {
  static XXClass* XXClass_sharedInst = nil;
  if (XXClass_sharedInst == nil) {
    XXClass* tmp;
    XXInitialize(&tmp);
    if (!OSAtomicCompareAndSwapPtrBarrier(nil, tmp, (void*volatile*)&XXClass_sharedInst))
      XXDestroy(tmp);
  }
  return XXClass_sharedInst;
}

Unfortunately, OSAtomicCompareAndSwapPtrBarrier() is itself implemented using a spin lock, so in principle this is no faster than pthread_once(), and it is more error-prone. If you are absolutely obsessed with performance you could use the ARM LDREX/STREX instructions directly, but these are unlikely to help much, since by then the bottleneck will have shifted elsewhere.
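For comparison, the create-then-swap idea can be sketched in portable C. This sketch swaps in C11 <stdatomic.h> for OSAtomicCompareAndSwapPtrBarrier(), so it is not the code discussed above, and Widget, widget_create, widget_destroy, widget_shared and live_widgets_after_race are hypothetical names used only for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for XXClass. */
typedef struct { int value; } Widget;

static _Atomic(Widget *) shared_widget = NULL;
static atomic_int live_widgets = 0;   /* objects created minus destroyed */

static Widget *widget_create(void) {
    Widget *w = malloc(sizeof *w);
    if (w == NULL) abort();
    w->value = 42;
    atomic_fetch_add(&live_widgets, 1);
    return w;
}

static void widget_destroy(Widget *w) {
    free(w);
    atomic_fetch_sub(&live_widgets, 1);
}

/* Lazily creates the shared object. Several threads may each build a
   candidate, but only one CAS succeeds; the losers destroy their copy. */
Widget *widget_shared(void) {
    Widget *current = atomic_load(&shared_widget);
    if (current == NULL) {
        Widget *candidate = widget_create();
        Widget *expected = NULL;
        if (atomic_compare_exchange_strong(&shared_widget, &expected, candidate)) {
            current = candidate;        /* we won the race */
        } else {
            widget_destroy(candidate);  /* someone else won */
            current = expected;         /* a failed CAS stores the winner here */
        }
    }
    return current;
}

static void *worker(void *arg) {
    (void)arg;
    return widget_shared();
}

/* Races several threads, then reports how many objects are still alive. */
int live_widgets_after_race(void) {
    enum { THREADS = 8 };
    pthread_t threads[THREADS];
    for (int i = 0; i < THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&live_widgets);
}
```

However many candidates are built during the race, only the installed one should survive. The extra allocations are the price of this approach, which is why it only applies when creating and destroying surplus copies is harmless.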