Binary classifiers as the maximally quantized decision function for AI safety — a paper exploring whether we can prevent catastrophic AI output even if full alignment is intractable
