Various embodiments include methods and devices for implementing automatic grammar augmentation to improve voice command recognition accuracy in systems with a small-footprint acoustic model. Alternative expressions that may capture acoustic model decoding variations may be added to a grammar set. An acoustic model-specific statistical pronunciation dictionary may be derived by running the acoustic model on a large general speech dataset, and a command-specific candidate set containing potential grammar expressions may then be constructed. Greedy-based and cross-entropy-method (CEM) based algorithms may be utilized to search the candidate set for augmentations that improve recognition accuracy.
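
As a rough illustration of the greedy search over the candidate set, the following Python sketch starts from the original grammar and repeatedly adds the candidate expression that most improves accuracy on a development set, stopping when no remaining candidate helps. The names `accuracy` and `greedy_augment`, the exact-match recognition rule, and the toy data are all assumptions made for illustration and are not taken from the embodiments. An actual system would presumably score candidates using the acoustic model's decoded outputs and its own matching rule.

```python
from typing import Dict, List, Optional, Tuple

Grammar = Dict[str, str]                # expression -> command label
Utterance = Tuple[str, Optional[str]]   # (decoded text, true command or None)


def accuracy(grammar: Grammar, dev_set: List[Utterance]) -> float:
    """Fraction of utterances whose exact-match lookup equals the true label.

    Exact matching is a simplifying assumption; an out-of-grammar
    utterance (label None) counts as correct only if nothing matches.
    """
    correct = sum(1 for decoded, label in dev_set if grammar.get(decoded) == label)
    return correct / len(dev_set)


def greedy_augment(
    base: Grammar, candidates: Grammar, dev_set: List[Utterance]
) -> Tuple[Grammar, float]:
    """Add one candidate per round, always the one with the best accuracy."""
    grammar = dict(base)
    remaining = dict(candidates)
    best = accuracy(grammar, dev_set)
    while remaining:
        # Score every remaining candidate as a single addition.
        scored = [
            (accuracy({**grammar, expr: cmd}, dev_set), expr)
            for expr, cmd in remaining.items()
        ]
        top_score, top_expr = max(scored)
        if top_score <= best:
            break  # no candidate strictly improves accuracy
        grammar[top_expr] = remaining.pop(top_expr)
        best = top_score
    return grammar, best


# Hypothetical toy data: decoded strings paired with the intended command.
base = {"play music": "PLAY", "stop music": "STOP"}
candidates = {
    "pay music": "PLAY",
    "play musik": "PLAY",
    "stop musik": "STOP",
    "top music": "STOP",
}
dev_set = [
    ("play music", "PLAY"),
    ("pay music", "PLAY"),
    ("pay music", "PLAY"),
    ("play musik", "PLAY"),
    ("stop music", "STOP"),
    ("stop musik", "STOP"),
    ("top music", None),    # out-of-grammar speech; accepting it is an error
    ("hello there", None),
]

augmented, score = greedy_augment(base, candidates, dev_set)
print(sorted(augmented), score)
```

In this toy setup the candidate "top music" is never added, because accepting it would turn an out-of-grammar utterance into a false accept and lower the development-set accuracy. A CEM-based search could explore the same candidate set by sampling subsets rather than adding one expression at a time.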