- pricing
Billed by audio duration // 0.024 USD per minute // ap-northeast-2 (Seoul)
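As a rough worked example of the rate above, the sketch below multiplies audio length by the per-minute price. It assumes purely linear pricing; any minimum charge per request, volume tier, or free tier is ignored, and the function name `estimate_cost_usd` is made up for illustration.

```python
# Per-minute rate taken from the pricing note (ap-northeast-2).
RATE_USD_PER_MINUTE = 0.024


def estimate_cost_usd(duration_seconds, rate_per_minute=RATE_USD_PER_MINUTE):
    """Estimate the transcription cost in USD, assuming linear per-minute pricing."""
    return duration_seconds / 60 * rate_per_minute


if __name__ == "__main__":
    # A 12-minute video, the same length as the one in the performance note.
    print(f"{estimate_cost_usd(12 * 60):.3f} USD")
```

Under that assumption, a 12-minute file comes to roughly 0.29 USD.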
- performance
Converting a 12-minute MP4 video to SRT takes about 1 minute.
- input
Amazon S3 // MP3, MP4, WAV, FLAC, AMR, OGG, and WebM.
- output
Amazon S3 // SRT, Text, VTT
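The S3-in, S3-out flow can be sketched with boto3 roughly as follows. This is a minimal sketch, not a full pipeline: the bucket names, job name, and the helper names `build_job_request` and `start_job` are hypothetical, `ko-KR` is an assumed default language, and the job is only started (polling for completion is left out). Subtitle files are requested through the `Subtitles` parameter of `start_transcription_job`.

```python
from urllib.parse import urlparse

# Input formats listed in the notes, lowercased to match MediaFormat values.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "amr", "ogg", "webm"}


def build_job_request(job_name, media_uri, output_bucket, language_code="ko-KR"):
    """Build keyword arguments for start_transcription_job (hypothetical helper)."""
    path = urlparse(media_uri).path
    extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported media format: {extension!r}")
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,  # assumption: Korean audio
        "MediaFormat": extension,
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        "Subtitles": {"Formats": ["srt", "vtt"]},  # subtitle files next to the transcript
    }


def start_job(request, region_name="ap-northeast-2"):
    """Submit the job; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("transcribe", region_name=region_name)
    return client.start_transcription_job(**request)


if __name__ == "__main__":
    # Placeholder bucket and object names; replace with real ones before calling start_job.
    request = build_job_request(
        "demo-job",
        "s3://example-input-bucket/videos/talk.mp4",
        "example-output-bucket",
    )
    print(request)
```

Building the request separately from the API call keeps the format check testable without AWS credentials.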
- alternative
Naver CLOVA Note - https://clovanote.naver.com/
Google STT AI - https://cloud.google.com/speech-to-text?hl=ko
OpenAI Whisper - https://platform.openai.com/docs/guides/speech-to-text
- note
If you plan to integrate with other AWS services, Amazon Transcribe is the one to use. It has a decisive advantage in network cost and architecture.
Real-time speech recognition is also possible, but it is not considered here because its performance and accuracy are poor.
- reference
https://aws.amazon.com/transcribe/
https://aws.amazon.com/ko/blogs/korea/amazon-transcribe-now-supports-speech-to-text-in-korean/
- comparison
Service | Pricing (60s) | File size limit |
Amazon Transcribe | 32 KRW | 2 GB |
OpenAI Whisper | 8 KRW | 25 MB |
Clova Voice | 60 KRW | 2 GB |
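To apply the table to a concrete file, the sketch below filters out services whose limit is smaller than the file and prices the rest. The numbers are copied from the table; treating 1 MB as 10^6 bytes, treating the limit as a file size limit, and scaling the 60-second price linearly are assumptions, and `compare` is a made-up helper name.

```python
MB = 1000 ** 2  # assumption: decimal units
GB = 1000 ** 3

# service -> (KRW per 60 s, limit in bytes), copied from the comparison table.
SERVICES = {
    "Amazon Transcribe": (32, 2 * GB),
    "OpenAI Whisper": (8, 25 * MB),
    "Clova Voice": (60, 2 * GB),
}


def compare(duration_seconds, file_size_bytes):
    """Return {service: estimated cost in KRW} for services that accept the file."""
    minutes = duration_seconds / 60
    return {
        name: price_per_minute * minutes
        for name, (price_per_minute, limit) in SERVICES.items()
        if file_size_bytes <= limit
    }


if __name__ == "__main__":
    # A 12-minute, 100 MB video: too large for the 25 MB limit.
    for name, cost in compare(12 * 60, 100 * MB).items():
        print(f"{name}: {cost:.0f} KRW")
```

For larger video files the cheapest option drops out on the size limit alone, unless the audio is extracted or split first.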