This study proposes a 3D Convolutional Neural Network-Deep Neural Network (3DCNN-DNN) method to recognize and predict traffic states from aerial videos. First, the roadway was divided into D sections, each associated with an m-second video clip, and the traffic state of each section was recognized with a typical 3DCNN structure named C3D (Convolutional 3D). Then, a traffic state matrix Φ was constructed over D road sections and ? historic time periods, which transformed traffic state prediction into a classification task whose input is Φ and whose output is one of a limited number of traffic states. A model prototype for short-term traffic state prediction was developed based on a DNN. Traffic video sets were then assembled, and the DNN prediction prototype was tested and optimized by varying the number of hidden layers, the number of neurons, and the training batch size. The resulting optimal model, named DNN*, has 4 hidden layers with 64/128/128/64 neurons and a training batch size of 64. The test results show that C3D reaches an average F1 score of 95.71% when recognizing traffic states from aerial videos. The prediction precision of DNN* is 91.18%, an improvement of 6.86%, 57.85%, 62.26%, 26.47% and 43.14% over DNN-Linear classification, K-Means, KNN (K-nearest neighbor), SVM (support vector machines) and linear classification, respectively. C3D provides an accurate traffic state matrix, and 3DCNN-DNN can effectively recognize and predict traffic states from aerial road videos.
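The step that turns prediction into classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes C3D has already produced one integer-coded state per (section, clip) pair, uses a hypothetical window length `n_hist` in place of the number of historic periods, and assumes the label is the next-period state of a single, hypothetically chosen `target_section`. It also lists the layer shapes implied by the DNN* configuration (64/128/128/64 hidden neurons).

```python
from typing import List, Tuple

# Hypothetical integer encoding of the traffic states recognized by C3D;
# the paper's actual state categories are not specified here.
State = int
Matrix = List[List[State]]

# Hidden-layer widths and batch size reported for DNN*.
DNN_STAR_HIDDEN = (64, 128, 128, 64)
DNN_STAR_BATCH_SIZE = 64


def build_samples(
    states: List[List[State]],  # states[d][t]: recognized state of section d in clip t
    n_hist: int,                # number of historic periods kept in Phi (assumed name)
    target_section: int,        # section whose next state is predicted (assumption)
) -> List[Tuple[Matrix, State]]:
    """Slide a window over time, pairing each D x n_hist matrix Phi with
    the target section's state in the period right after the window."""
    num_sections = len(states)
    num_periods = len(states[0])
    if any(len(row) != num_periods for row in states):
        raise ValueError("every section needs the same number of clips")
    samples = []
    for t in range(n_hist, num_periods):
        # Phi: one row per road section, one column per historic period.
        phi = [states[d][t - n_hist:t] for d in range(num_sections)]
        samples.append((phi, states[target_section][t]))
    return samples


def flatten(phi: Matrix) -> List[float]:
    """Flatten Phi row by row into the DNN's input feature vector."""
    return [float(x) for row in phi for x in row]


def dnn_layer_shapes(input_dim: int, num_states: int) -> List[Tuple[int, int]]:
    """(in, out) sizes of each fully connected layer of a DNN* sketch:
    input -> 64 -> 128 -> 128 -> 64 -> num_states (softmax classes)."""
    sizes = [input_dim, *DNN_STAR_HIDDEN, num_states]
    return list(zip(sizes[:-1], sizes[1:]))


if __name__ == "__main__":
    # Toy data: 2 sections, 5 clips, states coded 0..2 (hypothetical).
    toy = [[0, 0, 1, 2, 2], [1, 1, 1, 2, 0]]
    for phi, label in build_samples(toy, n_hist=3, target_section=0):
        print(flatten(phi), "->", label)
    print(dnn_layer_shapes(input_dim=2 * 3, num_states=3))
```

With 2 sections and a 3-period window, each Φ flattens to 6 features, so the sketched network maps 6 inputs through the 64/128/128/64 hidden layers to 3 output classes. Flattening Φ is one simple way to feed a fully connected DNN; it discards the explicit section/time layout, which a fully connected network would have to learn from data.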