𝔊𝔴𝔢𝔯𝔫 @gwern · 19h
(What's really hilarious about all this is that even though we've spent a painful month on this, to a reader, this all looks 𝘦𝘹𝘢𝘤𝘵𝘭𝘺 the same: images already displayed and could be popped up! PDFs & web pages already popped up! Videos already popped up! What's new here?)
Impressions: 2,164 · Engagements: 73 · Engagement rate: 3.4%
𝔊𝔴𝔢𝔯𝔫 @gwern · 22h
(Also, LibreOffice by default always dumps the converted doc into ./... along with all files inside the document. Apparently they *used* to do the sensible thing of inlining them into the HTML, as I expected it to, but they changed that, for unclear reasons. At least there's an option.)
Impressions: 2,674 · Engagements: 12 · Engagement rate: 0.4%
𝔊𝔴𝔢𝔯𝔫 @gwern · 22h
A nightmare feature cross-cutting across all systems, while hitting a bunch of random bugs. 180MB HTML files. Linux Chromium PDF view ignores Adobe commands but only in iframes. LibreOffice crashes on one specific spreadsheet. No mobile PDF. Safari event bugs. Nitter error lies.
Impressions: 1,619 · Engagements: 24 · Engagement rate: 1.5%
𝔊𝔴𝔢𝔯𝔫 @gwern · 22h
We also finally fixed the <video> poster problems & made them clickable *and* lazy (believe it or not, they implemented posters without lazy image options); provide HTML versions of doc/docx/csv/xlsx to render in addition to PDF (but disabling PDF on mobile bc lol mobile); etc.
Impressions: 1,208 · Engagements: 14 · Engagement rate: 1.2%
𝔊𝔴𝔢𝔯𝔫 @gwern · 22h
eg scribe.rip/the-forgotten-… is basically ~180MB of animated GIFs.
This has 3 benefits: we can transclude the .html, which is usually <1MB, and the images/videos won't immediately download; no text encoding overhead (−18MB); & we can optimize the files eg w/gifsicle (−15MB?).
Impressions: 1,391 · Engagements: 47 · Engagement rate: 3.4%
𝔊𝔴𝔢𝔯𝔫 @gwern · 23h
One side-effect of expanding file-transcludes to documents is that the HTML snapshots can be 182MB. A user should not download 200MB just by scrolling a little out of curiosity!
So, we have a script to unpack SingleFile snapshots & make them lazy: github.com/gwern/gwern.ne…
Impressions: 776 · Engagements: 38 · Engagement rate: 4.9%
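A minimal sketch of the unpacking idea described in the post above, not the linked script itself: it assumes the snapshot embeds images as base64 `data:` URIs in `<img src="...">` attributes, writes each one out as a separate file, and marks the tag `loading="lazy"` so the browser only fetches it when needed. The function name `externalize_images`, the regex, and the hashed file-naming scheme are all illustrative assumptions; a real tool would use an HTML parser and also handle CSS backgrounds, `<video>`, and fonts.

```python
import base64
import hashlib
import re
from pathlib import Path

# Matches <img ... src="data:image/TYPE;base64,PAYLOAD" ...> (double-quoted src only).
IMG_DATA_URI = re.compile(
    r'<img\b(?P<before>[^>]*?)\bsrc="data:image/(?P<ext>[a-z0-9.+-]+);base64,'
    r'(?P<payload>[^"]+)"(?P<after>[^>]*)>',
    re.IGNORECASE,
)


def externalize_images(html: str, asset_dir: Path) -> str:
    """Move inline base64 images into asset_dir and return the rewritten HTML."""
    asset_dir.mkdir(parents=True, exist_ok=True)

    def replace(match: re.Match) -> str:
        raw = base64.b64decode(match.group("payload"))
        ext = match.group("ext").lower().split("+")[0]  # e.g. svg+xml -> svg
        # Content-hash file names so identical images are stored once.
        name = hashlib.sha1(raw).hexdigest()[:16] + "." + ext
        (asset_dir / name).write_bytes(raw)
        before, after = match.group("before"), match.group("after")
        # Only add the lazy-loading hint if the tag does not already set one.
        lazy = "" if "loading=" in (before + after).lower() else ' loading="lazy"'
        return f'<img{before}src="{asset_dir.name}/{name}"{lazy}{after}>'

    return IMG_DATA_URI.sub(replace, html)
```

Because the images become ordinary files, they no longer carry the base64 text-encoding overhead and can be optimized separately, which matches the benefits listed in the reply above.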
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 15
𝘔𝘢𝘥 𝘔𝘢𝘹 3: 𝘉𝘦𝘺𝘰𝘯𝘥 𝘛𝘩𝘶𝘯𝘥𝘦𝘳𝘥𝘰𝘮𝘦 is our 𝘓𝘰𝘩𝘦𝘯𝘨𝘳𝘪𝘯.
I am taking no questions on this fact.
Impressions: 3,918 · Engagements: 87 · Engagement rate: 2.2%
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 13
Another useful script: just asking GPT-4-V if an image would look better with some more whitespace margin: github.com/gwern/gwern.ne…
(I often upload copied figures or screenshots or generated graphs where they really need another 50px around the edges to look good.)
Impressions: 1,584 · Engagements: 106 · Engagement rate: 6.7%
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 13
That is, an LZMA compressor may not be good enough because it is still weak on small variations of boilerplate, while an LLM would be able to see through that and compress it all away, revealing the real redundancy due to poor design/language/ecosystem.
Impressions: 1,762 · Engagements: 11 · Engagement rate: 0.6%
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 13
blog.andrewcantino.com/blog/2012/06/1… Although if Javascript is the 2nd-most incompressible language just naively looking at source compression, this idea may need some work.
JS may be very complex, but most of it seems 'accidental'. Need corpus metrics? Smarter LLM-based compression metrics?
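For reference, the "naive source compression" baseline mentioned here can be sketched in a few lines, assuming the metric is simply compressed size divided by original size under LZMA. The function name `compression_ratio` is hypothetical, and this is only the weak baseline the thread criticizes, not the proposed LLM-based metric.

```python
import lzma


def compression_ratio(source: bytes) -> float:
    """Return compressed size / original size; lower means more redundancy."""
    if not source:
        raise ValueError("cannot measure an empty input")
    # preset=9 asks for the strongest standard LZMA setting.
    return len(lzma.compress(source, preset=9)) / len(source)
```

Exact repetition compresses almost to nothing under this measure, but near-duplicate boilerplate with small variations compresses much less well, which is the weakness the follow-up post points at.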
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 3
Proposal: 'novelty nets'. Generative models generate too many 'same-y' samples, but doing some sort of explicit nearest-neighbor lookup is too slow. So: approximate the function of distance-calculation with another NN. (It's NNs all the way down... 😉)
gwern.net/idea#novelty-n…
Impressions: 2,598 · Engagements: 43 · Engagement rate: 1.7%
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 3
Related: SVG LLMs translating between representations seems like the 'natural' way to do vector image generation.
Proposal: gwern.net/idea#vector-ge…
Impressions: 2,918 · Engagements: 87 · Engagement rate: 3.0%
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 3
I finally watched _Real Genius_ and watching the final scene with Kent flooded out of a collapsing house by popcorn, I thought to myself, 'this is 1985; I bet that's *real*.'
It was: took 3 months of popping popcorn. avclub.com/william-athert… pic.twitter.com/kzZqrD7HQH
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 2
Example of gallery usage: tag-directories now display image files by default so you can just scroll through images like so: gwern.net/doc/ai/nn/diff… Even with the JPG optimizations, this is still 131MB of images; thank goodness for laziness.
Impressions: 2,400 · Engagements: 28 · Engagement rate: 1.2%
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 1
PARLIAMENTARIANS [admiring stylish outfit for plotting]: "This Guy Fawkes!"
Impressions: 2,042 · Engagements: 17 · Engagement rate: 0.8%
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 1
(It might seem obvious that it would have such things, since it seems to have everything, but it didn't occur to me that it would until GPT-4 suggested using SSIM. My install didn't actually have that, but PSNR works well enough.)
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 1
I finally got around to figuring out a heuristic for when to convert PNG→JPG images. Turns out ImageMagick ships with perceptual similarity/losses (eg PSNR), so you can just write a script to compare everything!
Saves ~540MB so far. Also makes 'gallery' pages more feasible.
Impressions: 4,690 · Engagements: 55 · Engagement rate: 1.2%
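To illustrate the kind of check such a heuristic relies on, here is a minimal sketch of PSNR itself, computed over two equal-length sequences of 8-bit pixel values. In practice the number would come from ImageMagick (its `compare` tool accepts `-metric PSNR`); the pure-Python function `psnr`, the helper `jpg_is_acceptable`, and the 35 dB threshold are all assumptions for illustration, not the values used in the post's script.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], candidate: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; higher means the images are closer."""
    if len(original) != len(candidate) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between corresponding pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(original, candidate)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def jpg_is_acceptable(psnr_db: float, threshold_db: float = 35.0) -> bool:
    """Hypothetical rule: keep the JPG only if it stays close enough to the PNG."""
    return psnr_db >= threshold_db
```

A conversion script would encode each PNG as a JPG, measure the PSNR between the two, and keep the smaller JPG only when the score clears the chosen threshold.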
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 1
(Instead, we've mostly seen complaints from the Chinese side that they are data-impoverished and it's unfair how Western DL can scrape all this English text data so easily. Apparently 1.4b terminally-online Chinese... don't... write... enough... text? That seems interesting?)
Impressions: 2,495 · Engagements: 91 · Engagement rate: 3.6%
𝔊𝔴𝔢𝔯𝔫 @gwern · Feb 1
One consequence of pretending these falsified predictions about Chinese DL never happened is undertheorizing of *why*.
eg. *every* China bull case claimed that China bigtech's DL would be supercharged by privacy-invasion/huge private datasets. But... that didn't happen? at all?
Impressions: 2,622 · Engagements: 88 · Engagement rate: 3.4%
Engagements (showing 22 days with daily frequency)
Engagement rate: 4.6% (Feb 22: 3.1%)
Link clicks: 1.6K (Feb 22: 198; average 71 per day)
Retweets without comments: 0 (Feb 22: 0; average 0 per day)