Datasets Collection

This post was last modified around 3 years ago; some information may be outdated.

👉 Note: Resources for DS & ML & DL.


Create artificial dataset

Source of datasets

Specific Datasets



  • IWSLT'15 English-Vietnamese data (the small dataset hosted by Stanford).
  • NLP-progress - Vietnamese
  • PhoBERT: Pre-trained language models for Vietnamese.
  • PhoW2V (2020): Pre-trained Word2Vec syllable- and word-level embeddings for Vietnamese.
  • ViText2SQL (EMNLP 2020 Findings): A dataset for Vietnamese Text2SQL semantic parsing.
  • VnCoreNLP (NAACL 2018): A Vietnamese NLP pipeline of word (and sentence) segmentation, POS tagging, named entity recognition and dependency parsing.

Sample datasets


  • TimeSynth: A Multipurpose Library for Synthetic Time Series Generation in Python.
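
To give a rough idea of what an artificial time series dataset can look like, here is a minimal sketch that uses only the Python standard library. It is not TimeSynth's API: the function name `synthetic_series` and all of its parameters are hypothetical, chosen for illustration. It samples a sinusoid on a regular time grid and adds Gaussian noise.

```python
import math
import random


def synthetic_series(num_points=200, stop_time=20.0, frequency=0.25,
                     noise_std=0.3, seed=0):
    """Return (times, values) for a noisy sinusoid on a regular time grid.

    Hypothetical helper for illustration only; not part of TimeSynth.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    rng = random.Random(seed)  # local generator, so a given seed is reproducible
    step = stop_time / (num_points - 1)
    times = [i * step for i in range(num_points)]
    values = [
        # clean signal plus zero-mean Gaussian noise
        math.sin(2 * math.pi * frequency * t) + rng.gauss(0.0, noise_std)
        for t in times
    ]
    return times, values


times, values = synthetic_series()
print(len(times), len(values))
```

Fixing the seed makes the generated dataset repeatable, which helps when comparing models on the same artificial data. Setting `noise_std=0` returns the clean signal.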
