Zhiyuan Zou bio photo

Email

Twitter

LinkedIn

Github

February 21, 2022

A study on Wordle-based Analysis

——COMAP Mathematical Contest in Modeling (MCM)

Completed in collaboration with Jiasheng Xu and Shuai Wu

About the process

In the American College Mathematical Modeling Contest, my primary responsibility was to incorporate stochastic process elements into the time series model to predict the vocabulary difficulty of a future day. We collected word data from the past year and analyzed the impact on the correct guessing rate of words under special circumstances separately. Additionally, combining our knowledge of mathematical statistics, we designed a tool to test coefficients and analyze the model's accuracy, ultimately obtaining effective predictive results.

Figure: Flow Chart of Our Work


Abstract

Wordle is a globally popular puzzle game. It is important to analyze how to attract more potential players and forecast the game’s future trends. We used cluster analysis to divide the number of wordle players into different period from January 7, 2022 to Decem- ber 31, 2022, namely, rising period, declining period and stable period.

To forecast the overall player count, we divided the influence factors into game trends and other factors. We obtained a 90% confidence interval and 50% confidence interval for the number of players. For example, we predict that the number of players on March 1, 2023, would be 13,671 to 22,657 in 90% confidence level and 16,737 to 19,591 in 50% confidence level.

We also focused on the distribution of the number of pass, which is influenced by dates and word attributes. To analyze the impact of word attributes on the distribution of Hard Mode, we selected a stable zone with a relatively stable proportion of Hard Mode. Thus we can transformed the problem into the impact of word attributes on the distribution. We found that the following attributes are key influencing factors: whether a word con- tains repeated letters, whether two identical letters exist,whether the first letter is a vowel, whether the ending is “y/ly” or “e,” whether “sh/ch/th/ph/gh” exist, and whether three identical letters exist.

Using fuzzy recognition, we obtained difficulty scores and classifications based on word attributes. On the other hand, we found that the impact of dates on distribution is insignificant. And we can get a combination of the two factors can provide distribution for specific words and dates.

Finally, we provided recommendations for the game’s future development, suggesting that the game should introduce more interesting topics that can bring people a joyful mood, such as words with three letters or word “lucky.”

Keywords: cluster analysis, fuzzy recognition, confidence interval;


You could download this paper.