torchfm.dataset

torchfm.dataset.avazu

class torchfm.dataset.avazu.AvazuDataset(dataset_path=None, cache_path='.avazu', rebuild_cache=False, min_threshold=4)

Avazu Click-Through Rate Prediction Dataset

Data preparation

Remove infrequent features (those appearing in fewer than min_threshold instances) and treat them as a single feature.
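The thresholding rule can be sketched in plain Python as follows. This is an illustration under stated assumptions, not torchfm's implementation: the helper names build_feature_mapper and encode_row are hypothetical, rows are assumed to be already split into per-field string values, and index 0 is assumed to stand for the shared "infrequent" feature (the library may reserve a different index).

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_feature_mapper(rows: Sequence[Sequence[str]], min_threshold: int) -> List[Dict[str, int]]:
    """Per field, give indices 1..k to values seen at least min_threshold times.

    Index 0 is reserved for the single shared infrequent feature
    (an assumption of this sketch).
    """
    num_fields = len(rows[0])
    counts = [Counter() for _ in range(num_fields)]
    for row in rows:
        for field, value in enumerate(row):
            counts[field][value] += 1
    mappers = []
    for field_counts in counts:
        # Values with count < min_threshold get no entry, so they fall back to 0.
        frequent = sorted(v for v, c in field_counts.items() if c >= min_threshold)
        mappers.append({value: idx for idx, value in enumerate(frequent, start=1)})
    return mappers


def encode_row(row: Sequence[str], mappers: List[Dict[str, int]]) -> List[int]:
    """Map each raw value to its index, or to 0 if it was infrequent or unseen."""
    return [mapper.get(value, 0) for value, mapper in zip(row, mappers)]


rows = [["a", "x"], ["a", "y"], ["b", "x"]]
mappers = build_feature_mapper(rows, min_threshold=2)
print(encode_row(["b", "y"], mappers))  # [0, 0]
```

With min_threshold=2, a value seen only once in its field (such as "b" or "y") is encoded exactly like a value never seen during preprocessing, which is what "treat them as a single feature" amounts to.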

Parameters
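  • dataset_path – Path to the Avazu train file.

  • cache_path – Path to the LMDB cache.

  • rebuild_cache – If True, the LMDB cache is rebuilt from dataset_path.

  • min_threshold – Frequency threshold: features appearing in fewer than min_threshold instances are treated as infrequent.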

Reference

https://www.kaggle.com/c/avazu-ctr-prediction

torchfm.dataset.criteo

class torchfm.dataset.criteo.CriteoDataset(dataset_path=None, cache_path='.criteo', rebuild_cache=False, min_threshold=10)

Criteo Display Advertising Challenge Dataset

Data preparation
  • Remove infrequent features (those appearing in fewer than min_threshold instances) and treat them as a single feature.

  • Discretize numerical values with the log² transformation (the squared logarithm, floor(log(v)^2) for v > 2) proposed by the winner of the Criteo competition.
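The numeric discretization step can be illustrated with the hypothetical helper below. It assumes the rule described in the winning team's write-up: a value v greater than 2 becomes floor(ln(v)^2). How empty fields are handled (a "NULL" token) and how values of 2 or less are kept (a "small_" prefix) are choices made for this sketch, not documented torchfm behavior.

```python
import math


def discretize_numeric(raw: str) -> str:
    """Map one raw Criteo integer field to a categorical token (sketch)."""
    if raw == "":
        # Missing value: a dedicated token (assumption of this sketch).
        return "NULL"
    v = int(raw)
    if v > 2:
        # Squared natural log, truncated to an integer bucket.
        return str(int(math.log(v) ** 2))
    # Small (including negative) values stay distinct from the log buckets.
    return f"small_{v}"


for raw in ["", "1", "3", "10", "100", "1000"]:
    print(repr(raw), "->", discretize_numeric(raw))
```

The prefix matters because the buckets overlap with small raw values: without it, a raw value of 2 and a raw value of 5 (floor(ln(5)^2) = 2) would map to the same token.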

Parameters
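  • dataset_path – Path to the Criteo train.txt file.

  • cache_path – Path to the LMDB cache.

  • rebuild_cache – If True, the LMDB cache is rebuilt from dataset_path.

  • min_threshold – Frequency threshold: features appearing in fewer than min_threshold instances are treated as infrequent.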

Reference:

https://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset

https://www.csie.ntu.edu.tw/~r01922136/kaggle-2014-criteo.pdf

torchfm.dataset.movielens

class torchfm.dataset.movielens.MovieLens1MDataset(dataset_path)

MovieLens 1M Dataset

Data preparation

Treat samples with a rating less than 3 as negative samples.
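A minimal sketch of this labeling rule, assuming ratings are held in a plain list. binarize_ratings is a hypothetical name, and treating every rating of 3 or more as positive is this sketch's reading of the rule, since the text only specifies the negative case.

```python
from typing import List


def binarize_ratings(ratings: List[float]) -> List[float]:
    """Return 0.0 (negative) for ratings below 3 and 1.0 (positive) otherwise."""
    return [0.0 if r < 3 else 1.0 for r in ratings]


print(binarize_ratings([1, 2, 3, 4, 5]))  # [0.0, 0.0, 1.0, 1.0, 1.0]
```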

Parameters
  • dataset_path – MovieLens dataset path

Reference:

https://grouplens.org/datasets/movielens

class torchfm.dataset.movielens.MovieLens20MDataset(dataset_path, sep=',')

MovieLens 20M Dataset

Data preparation

Treat samples with a rating less than 3 as negative samples.

Parameters
  • dataset_path – MovieLens dataset path

  • sep – Column separator used when reading dataset_path
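The role of the sep parameter can be sketched with a small hypothetical parser that splits each line of a ratings file on the separator and keeps the first three columns (user ID, movie ID, rating). It assumes the column layouts published by GroupLens: the 20M ratings.csv is comma-separated with a header row (userId,movieId,rating,timestamp), while the 1M ratings.dat uses "::" and has no header. parse_ratings and has_header are illustrative names, and the sample rows are illustrative only.

```python
from typing import Iterable, List, Tuple


def parse_ratings(
    lines: Iterable[str], sep: str = ",", has_header: bool = True
) -> List[Tuple[int, int, float]]:
    """Split each line on sep and keep (user_id, movie_id, rating); later columns are ignored."""
    triples = []
    for i, line in enumerate(lines):
        line = line.strip()
        if not line or (has_header and i == 0):
            continue  # skip blank lines and the header row
        user, movie, rating = line.split(sep)[:3]
        triples.append((int(user), int(movie), float(rating)))
    return triples


# Illustrative rows in the 20M (comma, header) and 1M ("::", no header) layouts.
ml20m = ["userId,movieId,rating,timestamp", "1,2,3.5,1112486027"]
ml1m = ["1::1193::5::978300760"]
print(parse_ratings(ml20m))  # [(1, 2, 3.5)]
print(parse_ratings(ml1m, sep="::", has_header=False))  # [(1, 1193, 5.0)]
```

A single parser with a configurable separator is why one class can serve both layouts; only the separator and header handling differ.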

Reference:

https://grouplens.org/datasets/movielens