Faster, Slimmer Word Embeddings

A (not new) way of compressing and representing word embeddings for fast parallel parsing.

Artificial Intelligence

Description

Enable 10-20x faster loading of word embeddings by compressing vector values into 3 bytes and providing ~4MB[1] chunk boundaries for concurrent reads.

Loading word embeddings is a bottleneck. For exploratory NLP tasks on a personal machine, parsing word embeddings again and again becomes a significant time cost. Embeddings are also RAM hungry -- forget about loading multiple embeddings into memory at once! By exploring how to parse embeddings as fast as possible, I've developed a new byte layout for word embeddings that is smaller and allows loaders to leverage multiple cores in a simple way. But before we dive into all of the fun engineering, here are some results and links to these compacted embeddings:

Original                         Original Size (GB)*   Compressed   Compressed Size (GB)*   Original Parse Time (s)**   Compressed Parse Time (s)****
GloVe.840B.300                   5.3                   glove.bin    1.9                     156.42                      6.13 (2.15)
GoogleNews-vectors-negative300   3.4                   googl.bin    2.6                     82.34                       8.49 (3.43)

*: These are non-gzipped file sizes, but the download links point to gzipped files.

**: Elapsed wall-clock time for parsing the original embeddings with a single-threaded program; compare to the compressed parse time, which comes from a parallel parse.

***: Results collected on a 4-core MacBook Pro: 2.7 GHz Intel Core i7, 16 GB 1600 MHz DDR3, 256 KB L2 cache (per core), 6 MB L3 cache, APPLE SSD SD512E, 64-bit Java 1.8 HotSpot(TM).

****: Times in parentheses are averages of repeated runs, excluding the first run (repeated runs benefit from the page cache).
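
To make the description above concrete, here is a minimal Java sketch of the two ideas it mentions: packing each vector value into 3 bytes and parsing independent ~4 MB chunks on separate threads. The fixed-point encoding, the class and method names, and the record layout (vectors only, with the word table kept elsewhere) are assumptions for illustration; the actual byte layout lives in the GitHub repository and blog post linked below.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PackedEmbeddingSketch {

    // --- 3-byte values (hypothetical fixed-point encoding) ---
    // Embedding values are small reals, so clamp to [-8, 8) and store each one as a
    // signed 24-bit fixed-point number (resolution ~1e-6). The project's real byte
    // layout may differ; this only illustrates the 4-byte-to-3-byte idea.
    static final float SCALE = (1 << 23) / 8.0f;

    static void packValue(float v, byte[] out, int off) {
        int q = Math.round(Math.max(-8.0f, Math.min(8.0f - 1e-6f, v)) * SCALE);
        out[off]     = (byte) (q >> 16);
        out[off + 1] = (byte) (q >> 8);
        out[off + 2] = (byte) q;
    }

    static float unpackValue(byte[] in, int off) {
        int q = ((in[off] & 0xFF) << 16) | ((in[off + 1] & 0xFF) << 8) | (in[off + 2] & 0xFF);
        q = (q << 8) >> 8; // sign-extend the 24-bit value to 32 bits
        return q / SCALE;
    }

    // --- Parallel parse over ~4 MB chunks ---
    // Assumes the writer pads each chunk so that no vector record straddles a chunk
    // boundary; then every chunk can be decoded by an independent task with no
    // coordination between threads.
    static final int CHUNK_BYTES = 4 * 1024 * 1024;
    static final int DIMS = 300;               // vector dimensionality
    static final int RECORD_BYTES = 3 * DIMS;  // one packed vector (words kept in a separate table here)

    static List<float[]> parseParallel(String path) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel channel = raf.getChannel()) {
            long size = channel.size();
            List<Future<List<float[]>>> parts = new ArrayList<>();
            for (long start = 0; start < size; start += CHUNK_BYTES) {
                final long chunkStart = start;
                final long chunkLen = Math.min(CHUNK_BYTES, size - start);
                parts.add(pool.submit(() -> parseChunk(channel, chunkStart, chunkLen)));
            }
            List<float[]> vectors = new ArrayList<>();
            for (Future<List<float[]>> part : parts) {
                vectors.addAll(part.get());
            }
            return vectors;
        } finally {
            pool.shutdown();
        }
    }

    static List<float[]> parseChunk(FileChannel channel, long start, long length) throws IOException {
        // FileChannel.map is safe to call from multiple threads.
        MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
        byte[] record = new byte[RECORD_BYTES];
        List<float[]> vectors = new ArrayList<>();
        while (buf.remaining() >= RECORD_BYTES) { // any trailing padding is shorter than a record
            buf.get(record);
            float[] vec = new float[DIMS];
            for (int d = 0; d < DIMS; d++) {
                vec[d] = unpackValue(record, 3 * d);
            }
            vectors.add(vec);
        }
        return vectors;
    }
}

Fixed chunk boundaries are what keep the parallelism simple: each worker maps its own slice of the file and decodes it without coordinating with any other thread.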

Links

Faster Word Embeddings Github

Faster Word Embeddings Blog Post
