Overview
This node listens for incoming messages on the Zalo platform, a popular messaging app. It can monitor both user-to-user and group messages based on configuration. Additionally, it supports voice message recognition by converting voice audio to text using either an online cloud-based engine (Wit.ai) or an offline engine (Vosk). This makes it useful for automating workflows triggered by specific types of Zalo messages, including extracting text from voice notes.
Common scenarios include:
- Triggering workflows when receiving direct messages or group chat messages.
- Automatically transcribing voice messages into text for further processing.
- Filtering out messages sent by the bot itself if desired.
- Using different speech-to-text engines depending on internet availability or accuracy needs.
Practical example:
- A customer support workflow that listens to user messages and voice notes, transcribes voice messages, and routes them to appropriate agents or systems.
Properties
| Name | Meaning |
|---|---|
| Event Types | Types of messages to listen for: "User Messages" (direct messages), "Group Messages" (group chats). |
| Self Listen | Whether the node also triggers on messages sent by the bot's own account. |
| Enable Voice Recognition | Whether to enable voice-to-text recognition for voice messages. |
| STT Engine | Choose the speech-to-text engine: "Wit.ai (Cloud-based, High Accuracy)" or "Vosk (Offline, Node.js)". |
| Wit.ai Credentials Required | Notice indicating that Wit.ai credentials are required when using the Wit.ai engine. |
| Vosk Model | Select the language model used by the Vosk engine. Only installed models are shown. Vietnamese and English models are supported but not installed by default. |
Output
The node outputs JSON data representing the received message event enriched with additional fields:
- message: The original message object received from Zalo.
- isVoiceMessage: Boolean indicating if the message is a voice message.
- voiceUrl: URL to the voice message audio file (if applicable).
- voiceToText: An array of recognized text strings obtained from transcribing the voice message.
- audioFileName: The filename used for storing the audio locally during processing.
- audioFilePath: The local path to the stored audio file.
- voiceProcessing: An object containing details about the voice transcription process, including any errors or reasons for skipping processing.
If voice recognition is enabled and successful, the output includes the transcribed text(s) from the voice message. Audio files handled during transcription are stored temporarily for processing; they are not emitted as binary data in the node's output.
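As a rough illustration of how a downstream step might consume this output, the sketch below types the documented fields and extracts a single transcript string. The interface name `ZaloMessageOutput` and the helper `getTranscript` are hypothetical, and the inner shapes of `message` and `voiceProcessing` are not documented here, so they are left loosely typed.

```typescript
// Hypothetical typing of one output item, based only on the field list above.
// Which fields are optional is an assumption: voice-related fields are treated
// as absent for non-voice messages.
interface ZaloMessageOutput {
  message: Record<string, unknown>;
  isVoiceMessage: boolean;
  voiceUrl?: string;
  voiceToText?: string[];
  audioFileName?: string;
  audioFilePath?: string;
  voiceProcessing?: Record<string, unknown>;
}

// Returns the transcript as one string, or null when the item is not a voice
// message or no non-empty text was recognized.
function getTranscript(item: ZaloMessageOutput): string | null {
  if (!item.isVoiceMessage) return null;
  const parts = (item.voiceToText ?? [])
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
  return parts.length > 0 ? parts.join(" ") : null;
}
```

Returning `null` rather than an empty string lets a later step distinguish "nothing to route" from an actual transcript; in the null case, `voiceProcessing` is the place to look for the reason.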
Dependencies
- Requires valid authentication to Zalo via cookie and other credentials.
- For voice recognition:
  - Wit.ai engine requires a valid Wit.ai API key credential configured in n8n.
  - Vosk engine requires installation of the Vosk Node.js module and pre-downloaded language models placed in a specific directory.
  - FFmpeg must be installed and accessible for audio format conversion.
- Node.js environment with access to filesystem for temporary audio storage.
- Network access to Wit.ai API if using the cloud-based engine.
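The voice recognition requirements above can be summarized as a small preflight check. This is only a sketch of the decision logic, not the node's actual implementation: the type `VoiceSetup`, its field names, and the function `missingPrerequisites` are all hypothetical, and the caller is assumed to have already determined each flag (for example, whether FFmpeg is reachable on the host).

```typescript
// Hypothetical names throughout; this mirrors the dependency list above.
type SttEngine = "wit" | "vosk";

interface VoiceSetup {
  engine: SttEngine;
  hasWitApiKey: boolean;         // a Wit.ai credential is configured in n8n
  installedVoskModels: string[]; // model names found in the models directory
  ffmpegAvailable: boolean;      // FFmpeg is installed and accessible
}

// Returns the prerequisites that are missing for the selected engine.
// An empty array means voice recognition can be attempted.
function missingPrerequisites(setup: VoiceSetup): string[] {
  const missing: string[] = [];
  // FFmpeg is needed for audio conversion regardless of the engine.
  if (!setup.ffmpegAvailable) missing.push("FFmpeg");
  if (setup.engine === "wit" && !setup.hasWitApiKey) {
    missing.push("Wit.ai API key credential");
  }
  if (setup.engine === "vosk" && setup.installedVoskModels.length === 0) {
    missing.push("Vosk language model");
  }
  return missing;
}
```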
Troubleshooting
- Missing Wit.ai credentials: When selecting Wit.ai as the STT engine, ensure the API key credential is configured; otherwise, transcription will fail.
- Vosk model not installed: If Vosk is selected but no model is installed, voice recognition will be skipped. Install the required model files in the designated folder.
- FFmpeg not found: Audio processing depends on FFmpeg; if missing, voice recognition cannot proceed.
- Authentication failure: Invalid or expired Zalo cookie or credentials will prevent the node from connecting and listening to messages.
- Timeouts or API errors: Network issues or API limits may cause transcription requests to fail; check logs for HTTP error codes like 401, 400, 413, or 429 and adjust credentials or usage accordingly.
- Self Listen setting: If enabled, the node also triggers on messages sent by the bot itself, which can create a feedback loop when the workflow replies in the same conversation.
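When reading logs for the API errors mentioned above, it can help to map each status code to a likely cause. The sketch below uses general HTTP semantics for 400, 401, 413, and 429; it is not taken from the Wit.ai documentation or the node's source, and the function names `describeSttHttpError` and `isRetryable` are hypothetical.

```typescript
// Maps an HTTP status from a transcription request to a troubleshooting hint.
// Interpretations follow standard HTTP status meanings (an assumption about
// how the cloud API uses them).
function describeSttHttpError(status: number): string {
  switch (status) {
    case 400:
      return "Bad request: the audio payload or request parameters were rejected.";
    case 401:
      return "Unauthorized: check the Wit.ai API key credential.";
    case 413:
      return "Payload too large: the audio exceeds the size the API accepts.";
    case 429:
      return "Too many requests: a rate limit was reached, retry later.";
    default:
      return `Unexpected HTTP status ${status}: check the node logs.`;
  }
}

// Rate limiting and server-side errors are usually worth retrying;
// credential and payload problems are not.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}
```

Separating the hint from the retry decision keeps the two concerns independent: a 401 or 413 needs a configuration or input change, while a 429 typically only needs a delay.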