Faster, Slimmer Word Embeddings

A (not new) way of compressing and representing word embeddings for fast parallel parsing.

Artificial Intelligence

Description

Enable 10-20x faster loading of word embeddings by compressing vector values into 3 bytes and providing ~4MB[1] chunk boundaries for concurrent reads.

Loading word embeddings is a bottleneck. For exploratory NLP tasks on a personal machine, parsing word embeddings again and again becomes a significant time cost. Embeddings are also RAM hungry -- forget about loading multiple embeddings into memory at once! By exploring how to parse embeddings as fast as possible, I've developed a new byte layout for word embeddings that is smaller and allows loaders to leverage multiple cores in a simple way. But before we dive into all of the fun engineering, here are some results and links to these compacted embeddings:

Original                         Original Size (GB)*   Compressed   Compressed Size (GB)*   Original Parse Time (s)**   Compressed Parse Time (s)****
GloVe.840B.300                   5.3                   glove.bin    1.9                     156.42                      6.13 (2.15)
GoogleNews-vectors-negative300   3.4                   googl.bin    2.6                     82.34                       8.49 (3.43)

*: These are non-gzipped file sizes, but the download links point to gzipped files.

**: Elapsed wall-clock time for parsing the original embeddings with a single-threaded program; compare to the compressed parse time, which comes from a parallel parse.

***: Results collected on a 4-core MacBook Pro: 2.7 GHz Intel Core i7, 16 GB 1600 MHz DDR3, 256 KB L2 cache (per core), 6 MB L3 cache, APPLE SSD SD512E, 64-bit Java 1.8 HotSpot(TM).

****: Times in parentheses are averages of repeated runs, excluding the first run (repeated runs benefit from the page cache).
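
To make the description above concrete, here is a minimal Java sketch of the two ideas it mentions: packing each vector value into 3 bytes and parsing independent ~4 MB chunks on separate threads. The fixed-point encoding, the class and method names, and the record layout (vectors only, with the word table kept elsewhere) are assumptions for illustration; the actual byte layout lives in the GitHub repository and blog post linked below.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PackedEmbeddingSketch {

    // --- 3-byte values (hypothetical fixed-point encoding) ---
    // Embedding values are small reals, so clamp to [-8, 8) and store each one as a
    // signed 24-bit fixed-point number (resolution ~1e-6). The project's real byte
    // layout may differ; this only illustrates the 4-byte-to-3-byte idea.
    static final float SCALE = (1 << 23) / 8.0f;

    static void packValue(float v, byte[] out, int off) {
        int q = Math.round(Math.max(-8.0f, Math.min(8.0f - 1e-6f, v)) * SCALE);
        out[off]     = (byte) (q >> 16);
        out[off + 1] = (byte) (q >> 8);
        out[off + 2] = (byte) q;
    }

    static float unpackValue(byte[] in, int off) {
        int q = ((in[off] & 0xFF) << 16) | ((in[off + 1] & 0xFF) << 8) | (in[off + 2] & 0xFF);
        q = (q << 8) >> 8; // sign-extend the 24-bit value to 32 bits
        return q / SCALE;
    }

    // --- Parallel parse over ~4 MB chunks ---
    // Assumes the writer pads each chunk so that no vector record straddles a chunk
    // boundary; then every chunk can be decoded by an independent task with no
    // coordination between threads.
    static final int CHUNK_BYTES = 4 * 1024 * 1024;
    static final int DIMS = 300;               // vector dimensionality
    static final int RECORD_BYTES = 3 * DIMS;  // one packed vector (words kept in a separate table here)

    static List<float[]> parseParallel(String path) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel channel = raf.getChannel()) {
            long size = channel.size();
            List<Future<List<float[]>>> parts = new ArrayList<>();
            for (long start = 0; start < size; start += CHUNK_BYTES) {
                final long chunkStart = start;
                final long chunkLen = Math.min(CHUNK_BYTES, size - start);
                parts.add(pool.submit(() -> parseChunk(channel, chunkStart, chunkLen)));
            }
            List<float[]> vectors = new ArrayList<>();
            for (Future<List<float[]>> part : parts) {
                vectors.addAll(part.get());
            }
            return vectors;
        } finally {
            pool.shutdown();
        }
    }

    static List<float[]> parseChunk(FileChannel channel, long start, long length) throws IOException {
        // FileChannel.map is safe to call from multiple threads.
        MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
        byte[] record = new byte[RECORD_BYTES];
        List<float[]> vectors = new ArrayList<>();
        while (buf.remaining() >= RECORD_BYTES) { // any trailing padding is shorter than a record
            buf.get(record);
            float[] vec = new float[DIMS];
            for (int d = 0; d < DIMS; d++) {
                vec[d] = unpackValue(record, 3 * d);
            }
            vectors.add(vec);
        }
        return vectors;
    }
}

Fixed chunk boundaries are what keep the parallelism simple: each worker maps its own slice of the file and decodes it without coordinating with any other thread.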

Links

Faster Word Embeddings Github

Faster Word Embeddings Blog Post
