Cross-Cultural Blog and Forum Dataset
This dataset includes English-language text from two social media sources pertaining to three different countries: India, Singapore, and the U.K. It was introduced and described in the paper below.
Because the text is all in the same language, direct comparisons can be made between the data for the three countries. Furthermore, one set contains text written by authors from the countries, whereas the other contains text written about the countries by travelers, offering two different perspectives on each country.
Note that the dataset includes only the processed text extracted from the sources. The original web structure and formatting are no longer intact.