torchfm.dataset
torchfm.dataset.avazu
class torchfm.dataset.avazu.AvazuDataset(dataset_path=None, cache_path='.avazu', rebuild_cache=False, min_threshold=4)
Avazu Click-Through Rate Prediction Dataset
- Data preparation
Map infrequent features (those appearing in fewer than min_threshold instances) to a single shared feature
- Parameters
dataset_path – path to the Avazu training file
cache_path – LMDB cache path
rebuild_cache – if True, the LMDB cache is rebuilt
min_threshold – infrequent feature threshold
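The preparation step above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names build_feature_mapper and encode are hypothetical, and reserving index 0 for the shared infrequent bucket is an assumption made for this example.

```python
from collections import Counter


def build_feature_mapper(rows, min_threshold=4):
    """Build one value -> index mapping per field (hypothetical helper).

    rows is a list of equal-length lists of categorical values.
    Values seen fewer than min_threshold times get no entry, so they
    fall back to the shared bucket when encoding.
    """
    num_fields = len(rows[0])
    counts = [Counter() for _ in range(num_fields)]
    for row in rows:
        for field, value in enumerate(row):
            counts[field][value] += 1

    mappers = []
    for field_counts in counts:
        frequent = sorted(v for v, c in field_counts.items() if c >= min_threshold)
        # Assumption for this sketch: index 0 is the shared "infrequent" bucket,
        # so frequent values start at index 1.
        mappers.append({v: i + 1 for i, v in enumerate(frequent)})
    return mappers


def encode(row, mappers):
    """Replace each value with its index, or 0 if it was infrequent."""
    return [mappers[field].get(value, 0) for field, value in enumerate(row)]


rows = [["a", "x"]] * 4 + [["b", "x"], ["c", "y"]]
mappers = build_feature_mapper(rows, min_threshold=4)
encoded = [encode(row, mappers) for row in rows]
```

Here "b", "c", and "y" each occur once, so all three are encoded as the shared index 0, while "a" and "x" keep their own indices.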
torchfm.dataset.criteo
class torchfm.dataset.criteo.CriteoDataset(dataset_path=None, cache_path='.criteo', rebuild_cache=False, min_threshold=10)
Criteo Display Advertising Challenge Dataset
- Data preparation
Map infrequent features (those appearing in fewer than min_threshold instances) to a single shared feature
Discretize numerical values with the log2 transformation proposed by the winner of the Criteo competition
- Parameters
dataset_path – path to the Criteo train.txt file
cache_path – LMDB cache path
rebuild_cache – if True, the LMDB cache is rebuilt
min_threshold – infrequent feature threshold
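The discretization of numerical values might look like the sketch below. It rests on an assumption that should be checked against the source: "log2" is read here as the squared natural logarithm, floor(ln(v)^2), rather than a base-2 logarithm. The helper name discretize_numeric and the handling of missing and small values are likewise illustrative choices.

```python
import math


def discretize_numeric(raw):
    """Map a raw numeric field (as a string) to a discrete bucket label.

    Assumption: "log2" means the squared natural log, floor(ln(v) ** 2).
    If a base-2 logarithm is intended, use math.log2(v) instead.
    """
    if raw == "":
        # Missing values get their own bucket.
        return "NULL"
    v = int(raw)
    if v > 2:
        return str(int(math.log(v) ** 2))
    # Small values map to non-positive labels, so they cannot collide
    # with the log buckets, which are always >= 1 for v >= 3.
    return str(v - 2)


buckets = [discretize_numeric(x) for x in ["", "1", "3", "100", "1000"]]
```

The transformation compresses the long tail of large counts into a small number of buckets, which can then be treated as categorical features and thresholded like the others.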
torchfm.dataset.movielens
class torchfm.dataset.movielens.MovieLens1MDataset(dataset_path)
MovieLens 1M Dataset
- Data preparation
Treat samples with a rating less than 3 as negative samples; all other samples are positive
- Parameters
dataset_path – MovieLens dataset path
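As a rough sketch of the labeling rule described above, ratings can be converted to binary targets as follows. The helper name binarize_ratings is hypothetical, and treating a rating of exactly 3 as positive follows the wording here ("less than 3"); the actual boundary handling should be confirmed against the source.

```python
def binarize_ratings(ratings, threshold=3):
    """Return 0 for ratings below threshold (negative), 1 otherwise.

    Assumption: a rating equal to the threshold counts as positive,
    following the "less than 3" wording of the documentation.
    """
    return [0 if rating < threshold else 1 for rating in ratings]


labels = binarize_ratings([1, 2, 3, 4, 5])
```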
- Reference: