
The UComm Digital Experience Group has introduced artificial-intelligence captioning in UNL's MediaHub video and audio management system.
Based on the Whisper speech-to-text project from OpenAI, the new captioning system is now available to process each media file uploaded to UNL MediaHub. Whisper is generally regarded as the most accurate automatic speech recognition (ASR) software available, but no ASR is perfect; any caption track will benefit from review by a human who knows the people and the event being captioned. After upload, the caption track is generated within a few minutes. The person uploading the file should then review and approve the caption track, since spellings of names may require correction, as may any indistinctly pronounced words and phrases in the original file. Audio descriptions are also included, and these too should be reviewed for clarity and accuracy.
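To illustrate the kind of output such a pipeline produces: Whisper-style ASR tools emit timed transcript segments, which can then be rendered as a WebVTT caption track for playback. The sketch below is a minimal, hypothetical example of that conversion step; the segment data and function names are invented for illustration and do not reflect MediaHub's actual implementation.

```python
# Hypothetical sketch: converting Whisper-style transcript segments
# (start/end times in seconds plus text) into a WebVTT caption track.
# The segment data below is invented for illustration.

def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT HH:MM:SS.mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def segments_to_vtt(segments) -> str:
    """Build a WebVTT document from (start, end, text) segments."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")  # blank line terminates each cue
    return "\n".join(lines)

# Example segments, similar in shape to what an ASR tool might produce.
segments = [
    (0.0, 2.5, "Welcome to the lecture."),
    (2.5, 6.0, "Today we discuss automatic captioning."),
]
print(segments_to_vtt(segments))
```

A human reviewer would edit the cue text (names, indistinct phrases) before the track is approved for publication.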
UNL has long provided high-quality human-created captions in MediaHub through a third-party service on a pass-through charge basis. As automated captioning has grown more accurate, however, the effort required to correct its errors has declined. The third-party service will continue to be available, but it is no longer presented in MediaHub as the first option for captioning.
UComm and ITS are providing this AI captioning service as a common good, without charge to the UNL community.
Credit goes primarily to Tommy Neumann, software developer in DXG, for building the code and user interface to enable this new service.
More details at: https://mediahub.unl.edu