How many niche topics is HailyAI trained on?

Last modified: April 22, 2022
HailyAI was trained on the Pile, an 800GB dataset of diverse text for language modeling. The Pile introduced new datasets derived from the following sources: PubMed Central, ArXiv, GitHub, the FreeLaw Project, Stack Exchange, the US Patent and Trademark Office, PubMed, Ubuntu IRC, HackerNews, YouTube, PhilPapers, and NIH ExPorter. It also introduced OpenWebText2 and BookCorpus2, which are extensions of the original OpenWebText and BookCorpus datasets.

Source: Leo Gao et al., “The Pile: An 800GB Dataset of Diverse Text for Language Modeling,” ArXiv:2101.00027 [Cs], December 31, 2020, http://arxiv.org/abs/2101.00027.
