Doom Captcha (2021) (vivirenremoto.github.io)
471 points by EndXA 29 days ago | 130 comments



Unfortunately, just this week someone fine-tuned the Mistral-7B LLM to play DOOM :P

https://news.ycombinator.com/item?id=39813174


For very modest definitions of playing. Perhaps it'd be more impressive if they recorded a demo file and let that play back without the realtime overhead? Even so, it can only move forward and back, turn, and fire. And it only knows to face away from the wall it's collided with. This is so far below even basic Doom bots that I'd be afraid to call it playing.

The ASCII intermediate interpretation also seems unnecessary and very limiting. But perhaps that's to keep it near realtime, looks like 1 FPS?

And why run on a Mac? Why not a beefy PC with a GPU that can do the calculations faster?

Still, does seem like a fun challenge. Maybe with further tuning or training it can level up.


Reminded me of "Growing Living Rat Neurons To Play... DOOM?"

https://www.youtube.com/watch?v=bEXefdbQDjw


Are there any models fine-tuned to play an open-source game that is non-GPL, so that it can be deployed to the App Store for interesting bot-play ideas?


How could this possibly be in the training set?


It’s not. The fine tuning taught the LLM how to give single-character responses (move/fire keyboard controls) in response to a sequence of ASCII-art-ized frames of the game being played.


Is it actually ASCII art or just a textual encoding? The art representation is nice for looking under the hood and seeing something pretty, but I feel like it is very far from an optimal way to textually encode Doom for a language model to process. Especially since there is no pitching the camera, you can encode all of the information you need to represent a frame in a single line of ASCII. If they are actually using an ASCII art representation, I bet they would get way better performance encoding the frame as a single line of text.


I never realised you could encode each column of Doom as a single character, but of course you can! I suppose the one thing missing would be distance, but if you get 8 bits per character you could reserve the upper bits to represent approximate distance.
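A minimal Python sketch of that packing, assuming a made-up set of content codes (`CONTENT`), a 2048-unit distance cutoff and 3 upper bits for distance. None of these values come from the fine-tuning project linked above:

```python
# Hypothetical content codes for the first thing a screen column "sees".
CONTENT = {"empty": 0, "wall": 1, "door": 2, "imp": 3, "barrel": 4}
CODE_TO_NAME = {v: k for k, v in CONTENT.items()}

MAX_DISTANCE = 2048.0  # assumed cutoff in map units; farther lands in the last bucket
DISTANCE_BUCKETS = 8   # 3 upper bits -> 8 coarse distance levels


def encode_column(content: str, distance: float) -> int:
    """Pack one column into a byte: low 5 bits = content, upper 3 bits = distance bucket."""
    bucket = min(int(distance / MAX_DISTANCE * DISTANCE_BUCKETS), DISTANCE_BUCKETS - 1)
    return (bucket << 5) | CONTENT[content]


def decode_column(byte: int) -> tuple[str, int]:
    """Recover (content, distance bucket) from a packed byte."""
    return CODE_TO_NAME[byte & 0x1F], byte >> 5


def encode_frame(columns: list[tuple[str, float]]) -> bytes:
    """One byte per screen column, so a whole frame becomes a single 'line'."""
    return bytes(encode_column(c, d) for c, d in columns)
```

Not every resulting byte is a printable character, so feeding this to a tokenizer would need one more mapping step, which is left out here.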

That's weirdly inspiring! What other games can I make where the visuals are conceptually no more than a line of characters, but which can get macroexpanded into immersive graphics?


Another point to note is that you aren't stuck with a single character to encode a column of Doom as text. You could also do something like a letter to represent the content, followed by a number to represent the distance.

I think the only weird part about that is that certain letter-number pairs may be a single token with some other semantics in the model, and other letter-number pairs would be a pair of tokens. I think that could impact the performance of the model (but probably not by a huge amount).
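The letter-plus-number variant could look something like this sketch. The letters, the 0-9 distance buckets and the space separator are all assumptions, and whether the separator actually gives more predictable tokens hasn't been measured:

```python
# Hypothetical one-letter codes for column content.
LETTERS = {"empty": "E", "wall": "W", "door": "D", "imp": "I", "barrel": "B"}


def encode_frame_text(columns: list[tuple[str, float]], max_distance: float = 2048.0) -> str:
    """Encode each column as <letter><digit>, where the digit is a 0-9 distance bucket."""
    parts = []
    for content, distance in columns:
        bucket = min(int(distance / max_distance * 10), 9)  # clamp far columns to 9
        parts.append(f"{LETTERS[content]}{bucket}")
    return " ".join(parts)
```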


I suppose the save states of a game are a compressed representation of the world to a degree.


If you just click through the links you’ll see the actual input to the LLM https://twitter.com/SammieAtman/status/1772075251297550457

Nothing you are saying is technically incorrect. But, optimal performance was not the goal. The goal was to see if this crazy stupid concept would actually work. And, it does!


Ah, I think I clicked the actual post link and saw nothing, and backed out. Thanks for the direct link to the video.

And yeah I totally get not aiming for optimal performance. I think it would be interesting to see how a language model could perform with a format that is less visually catered though. Like, textually there is little association between columns, it's just a string of characters, and some of them happen to be newline characters. A more densely packed encoding would play more into the logic and reasoning encoded into the model, rather than just trying to parse out ASCII art.


That’s so dang cool


Absolutely love it. Unusual captchas are great.

Reminded me of this one: http://random.irb.hr/signup.php


Funny. I made a captcha challenge of calculus problems for a comment section on my personal blog page. But 5 years after college, I couldn't remember how to even do them myself so I changed it :-/


Wolfram Alpha can do it for you.


You don't actually need much. On a form that used to get spam, I just added a "write 42 here" field so anyone who actually cares to read would be able to fill it in. Spam fell to 0.

(For a site with a slightly higher profile this wouldn't be enough, but for a minor corner of the internet with no ill intent actually aimed at it, that turned out to be enough to block the fuzzing "fill all the forms" spam.)
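The server side of the "write 42 here" trick is about one line. A sketch, assuming a hypothetical form field named `human_check` and the submitted form available as a plain dict:

```python
def passes_42_check(form: dict[str, str], field: str = "human_check") -> bool:
    """Accept the submission only if the (hypothetical) check field says 42."""
    # Tolerate stray whitespace; anything else, including a missing field, fails.
    return form.get(field, "").strip() == "42"
```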


As contrasting experience, I did that (a simple math problem) on our contact form and it did NOT drop spam to zero; our spammers were too smart for that. Even an actual reCAPTCHA didn't completely eliminate it (although it mostly did, enough that it's fine for us).


Similarly an empty input field that is css'd to be outside the viewport is often filled by spambots but not humans. But I like the edge case UX of your idea more.


Just watch out that Chrome’s autofill doesn’t fill it in. Cost us a huge chunk of new signups until we found out. Chrome ignores autofill directives under some circumstances.


It's also visible for users with CSS overrides and/or other browser impairments. The more I think about it the more strongly I prefer the "type 42" explicit input field.


You can label it “leave this field empty”, with a placeholder or similar - then it’s the same explicit instruction as “type 42”.
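The honeypot variant is the mirror image: reject the submission if the hidden field has any content at all. A sketch where `leave_this_empty` is a hypothetical field name; a name that doesn't resemble email/name/address fields is presumably less attractive to autofill, but as noted above, browsers don't always honour that:

```python
HONEYPOT_FIELD = "leave_this_empty"  # hypothetical name, chosen to look unlike autofill targets


def is_probably_bot(form: dict[str, str]) -> bool:
    """Bots that fill every input trip the honeypot; humans leave it blank."""
    return form.get(HONEYPOT_FIELD, "").strip() != ""
```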


The question I got was surprisingly simple: it asked to find "the least real root of the polynomial p(x) = (x+5)(x-4)(x+1)". A determined attacker can quickly hack together something with Tesseract and feed it into even GPT-3.5 to get the correct answer to questions like these.

I guess that means the captcha is doing its job, since running LLMs isn't very cheap or scalable. But any harder problem means you start filtering a significant chunk of human users. Based on the other replies to your comment, it seems that the questions at their current difficulty already stop a lot of human users, yet allow a determined attacker with the setup I described to pass through easily.


I'm not sure how you'd determine the least real root to that, given all three have equally zero imaginary component.


They of course mean the minimum out of the set of the real roots.
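Reading "least" as "smallest of the real roots", the factored form gives the answer directly:

```latex
\begin{aligned}
p(x) &= (x+5)(x-4)(x+1) = x^3 + 2x^2 - 19x - 20 \\
p(x) = 0 &\iff x \in \{-5,\, -1,\, 4\} \\
\text{least real root} &= \min\{-5,\, -1,\, 4\} = -5
\end{aligned}
```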


I suppose the square root of negative infinity has the property of being unreal in several distinct ways, but yeah, the least real? I dunno /s


I remember an old (and now defunct) fan site that hit you with lore questions as a captcha. Though I'd guess an LLM could answer them.


Can I play by an audio call if I'm visually impaired?


Yes, when you hear a monster roar you say BANG!


The first one I got was 7 * 7 + (-3). That’s trivial, elementary-school-level math, and did they really need LaTeX to render that?

Then I refreshed the page, and was hit with calculus involving trig functions.


Or the one on esolangs.org where you need to evaluate some random Befunge code.


After reloading a dozen times I finally got one that I could solve:

-3 * 3 + (-3) = ?


I just got one I think I can solve: 0 + 7 + 0 = ?

Where's my calculator?


Bond, Jim Bond ?


I got "find the last real zero of the polynomial..." but what does last mean? Largest? Last as the polynomial's factors are given? Something else?

Edit: oh wait. It's "least". I really have no idea then :)


It let me through despite trying to attack a cacodemon with a pistol.

With it being so famously portable, I was expecting this to actually run Doom in the browser and complete a simple map.


I'm still waiting for someone to make the Mona Lisa Captcha: https://www.youtube.com/watch?v=WqnXp6Saa8Y


Absolute banger. But the auto-aim on vertical axis is missing. You should be able to have the crosshair under an enemy and still hit them. But in any case, nicely done!


Funny enough, when I've tried to introduce (indoctrinate) friends to DOOM, "how do I aim up" has consistently been the biggest hangup.

This makes sense when I try to indoctrinate my teenager who grew up on Halo and Call of Duty. But I began noticing this hangup in the late 90s with friends my own age.


Here's the real Doom player!


Why isn't it actually Doom? Surely there are multiple JS Dooms to choose from.


"Finish UV Hangar in < 13 seconds."

Easily achievable[0], thoroughly obnoxious[1]. Just like all captchas.

[0] God help you if you're on a touchscreen.
[1] For most people. Especially after the novelty wears off.


Doom is still under copyright protection last I knew. The source is GPL, but have the assets ever been liberally licensed? I think they're more abandonware.

I'm sure you could still do it, but personally I try to respect copyright strictly for any projects I'm going to share. It just feels annoying to have copyright nonsense hanging over me otherwise.


Well certainly we don't need the full game assets for a captcha. The shareware version would do just fine and that's always been free.


Yeah maybe the shareware, but I'm not sure what the license is on that either.

It's free to play, sure, but is it free to use the assets for whatever you feel like and redistribute on your website? At a guess: no.


This made me curious, so I looked at the original Doom shareware distributions on archive.org. They do include a license that allows free distribution but prohibits commercial use and generally seems to want you to not do anything other than run the software as designed. Although there are several different versions of the license and I didn't look through all of them, it's possible that some distributions were made with less restrictive licenses.

This surprised me because I thought that id's original shareware releases actually had more permissive licenses than that. Maybe the original Commander Keen did.

I guess maybe id/ZeniMax/Microsoft could theoretically sue you. But in practice the shareware assets are used completely freely without issue all over the internet.


Even better, Freedoom.


Yeah kind of bummed me out.


You should try for a full 3D implementation of Doom! I'm sure it's been ported to JavaScript at least a dozen times.


Why stop there when you could just use a WebAssembly port of the actual game with a hacked-in portal to the actual site somewhere... :P


For bonus points fire up a Windows VM that will run the original Doom files...

Or maybe a remote desktop into an OS with a sandboxed browser that runs a Windows VM that ...


I want a doom progress window that allows a user to play doom while waiting for a task to complete


Now I want Men In Black mode, where your job is to identify the threat posed by the popup and shoot accordingly:

Alien doing pull ups? Fine. 8 year old girl holding a Quantum Physics book in a dark alley? That's sus...


Having re-watched that movie recently, he's not wrong -- that's a deeply odd book for an apparent 8 year old girl to be holding. And with the amount of aliens that look like humans across the movies...


I always thought he passed the test there, and the guys that just mindlessly shot failed.


Well of course he passed - they immediately after offer him the job and neuralize everyone else.


Typical cop assuming any behaviour they can't explain must be malevolent.


They call it entrapment - the officials put him in a position where he believes he's required to shoot in order to pass a test, but he sees no reason to. So finally he has to go with his gut and shoot the most probable target, even though he wouldn't have shot anyone if he hadn't been placed in that situation with those expectations.


Can you make one based on the WoW fishing minigame? I.e., they need to click on the bobber at the right time.

I'm not expecting it to last longer, but there really should be some decent fishing bots at this point.


Related:

DOOM Captcha - https://news.ycombinator.com/item?id=27264988 - May 2021 (173 comments)


I've always thought there is room for mini web games in 2024. Not having a decent site to simply play some little games on is a bummer. I would appreciate games like this to play between my coding sessions. And I am obviously not interested in downloading games; I am interested in web-native games.


Newgrounds still exists.

But AFAICT there is basically 0 money in browser games now, which is why only romantics and masochists still work on them.


This is fun. I have been having trouble with Google captchas recently, so I'd be happy if more were like this.


Google has been contracting for the military doing AI for over a decade; I'm pretty sure targeting objects w/ a computer in a combat-type situation isn't going to stop anyone. They have aimbots for most FPS games too.

Still cool and unique though


You can beat it by rapidly clicking left to right and back. Maybe add a rate of fire and change the vertical position of the enemies.
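A rate of fire could be enforced along these lines. This is a sketch, not how the demo works, and the 0.25 s cooldown is an arbitrary assumption; the timestamps would presumably also need checking server-side, since anything enforced only in the page can be bypassed:

```python
from typing import Optional


class FireRateLimiter:
    """Ignore shots that arrive faster than a fixed rate of fire (a weapon cooldown)."""

    def __init__(self, min_interval_s: float = 0.25):  # hypothetical cooldown
        self.min_interval_s = min_interval_s
        self._last_shot: Optional[float] = None

    def try_fire(self, t: float) -> bool:
        """Return True if a shot at time t (seconds) is allowed, and record it."""
        if self._last_shot is not None and t - self._last_shot < self.min_interval_s:
            return False  # still cooling down; spam clicks are dropped
        self._last_shot = t
        return True
```

Usage: with the default cooldown, clicks at 0.0, 0.1, 0.2, 0.3 and 0.6 s fire only at 0.0, 0.3 and 0.6 s, so sweeping the mouse while spam-clicking no longer covers every enemy.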


Missing 2021 tag


Ah Gordon… just put on your hazmat suit and walk into the chamber in order to prove you’re a human.


This is a fun idea, but it doesn't seem to work in any browser I tried. Maybe adblock is breaking it?


You have to click on "ON" or "OFF" to start. Unintuitive.


Thanks. That was the issue. I was clicking on the text that says "click to start"


I did that a few times myself :)


Works for me on iOS Safari with AdGuard.


There needs to be hostages or barrels that you shouldn’t shoot because you’ll die.


Haha, the Windows screensaver Easter egg level is a nice touch ;)


Amazing. I wish it was claimed to be secure!


I'm not sure it's possible to make it secure. To render the positions of the enemies, the browser receives 4 coords. To submit the captcha, the browser submits 4 coords – the same ones it received. Perhaps you could track the variance between the exact position and the position the user selected, as well as timing. But would it be enough?
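One way to flesh out the variance-and-timing idea, assuming the server remembers the 4 enemy positions it sent and the client submits click coordinates plus a reaction time per enemy. `looks_human` and all of its thresholds are hypothetical:

```python
import math
from statistics import pstdev


def looks_human(targets, clicks, reaction_times_s,
                hit_radius=20.0, min_spread=0.5, min_reaction_s=0.15):
    """Heuristic check of a submission against the server's own target positions.

    targets, clicks: lists of (x, y) pixel coordinates, paired in order.
    All thresholds are placeholders that would need tuning on real traffic.
    """
    if not clicks or not (len(targets) == len(clicks) == len(reaction_times_s)):
        return False
    errors = [math.dist(t, c) for t, c in zip(targets, clicks)]
    # Every click has to land on its enemy.
    if any(e > hit_radius for e in errors):
        return False
    # Near-identical error on every click (e.g. coords echoed back verbatim) looks scripted.
    if pstdev(errors) < min_spread:
        return False
    # Reaction times below a plausible human floor are suspicious.
    return all(rt >= min_reaction_s for rt in reaction_times_s)
```

A bot that adds random jitter and waits a plausible amount would still pass, which is exactly the doubt in the last sentence.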


I think it's safe enough for many cases. Amazing concept to have little classic games as Captcha.


Not really Doom, a few years old, and now broken apparently. IIRC it was basically just a mouse only shooting gallery mini-game.

EDIT: Not broken, just not obvious one must click the sound options to start. Still just a mouse gallery mini-game. Doubtful you'd even need AI to solve it.


Well let's be honest, a human (YOU I assume) couldn't even figure out how to start the game, so if AI can solve it, we're in real trouble.


So a CAPTCHA that keeps humans out? Sadly that is all too common


Best captcha I've ever seen <3


wouldn't do much to prevent bots


If they switch to canvas rendering and include some twist (e.g. shoot x but not y, limit input rate, etc.), then I think considerable computing effort would be necessary to break the lock.


I don't think it's that considerable, I made a script to defeat it with vision in only a few minutes:

https://gist.github.com/enlyth/a177e4102b0da37a73587e15dbd68...

This could be further optimized to not scan the whole screen, and faking some human-like mouse movements shouldn't be that hard either.


Wow, that's pretty impressive to me and I think it's awesome that you were able to put this together quickly. I admit that I don't have a CV background, so maybe this is easier for a programmer who's already experienced in that area.


To be fair I don't think you need CV in this specific case where the problem space is very limited.

1. There's no lighting, so the enemies have specific, fixed pixel colours that don't appear in any of the backgrounds. Scan and target these.

2. Enemies appear in a specific zone in the canvas. Makes scan faster, combines with below.

If there's expected ambiguity, one can a. detect a few interesting background properties by looking at pixels where enemies never appear (e.g. corners), and/or b. use a couple of other pixels relative to the candidate match (maybe neighbours, maybe not; could just as well be 20px down, 10 left) to discriminate.
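Points 1 and 2 plus the extra-pixel check from option b fit in a few lines. A sketch, assuming the frame is already available as rows of (r, g, b) tuples; the colour values in `ENEMY_COLORS` are placeholders, not taken from the demo:

```python
Color = tuple[int, int, int]

# Placeholder values: sample the real ones from a screenshot of the sprites.
ENEMY_COLORS: set[Color] = {(181, 49, 32), (143, 23, 15)}


def find_targets(frame, zone, stride=2, confirm_offset=(0, 1)):
    """Scan only the zone where enemies can appear and return (x, y) pixels whose
    colour, and that of a second pixel at confirm_offset, match an enemy colour.

    frame: list of rows, each a list of (r, g, b) tuples.
    zone: (x0, y0, x1, y1), with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = zone
    dx, dy = confirm_offset
    hits = []
    for y in range(y0, y1, stride):
        for x in range(x0, x1, stride):
            if frame[y][x] not in ENEMY_COLORS:
                continue
            cx, cy = x + dx, y + dy
            # The second pixel filters out stray single-pixel colour matches.
            if 0 <= cy < len(frame) and 0 <= cx < len(frame[0]) and frame[cy][cx] in ENEMY_COLORS:
                hits.append((x, y))
    return hits
```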

Side story: one day my team was tasked with doing textual document content recognition for some biz. Everyone was like "oh it's going to be $$$ to pull out CV+OCR and have the OCR learn the specific font".

Turns out the document in question was:

    - an extremely standardised gov format
    - produced only by gov administration
    - of a known fixed, overall size with clear identifiable boundaries
    - printing known, standardised list of fields at fixed position
    - with a known, standard font specifically made for quick automatic recognition
    - containing only /[A-Za-z0-9]/ chars (plus a few I can't recall, but essentially dash, plus, slash...)
    - on a known, standardised background
    - the only variable is the quality of the scan and the size parameters

So I put together a file upload form, piped the image through a reasonable ImageMagick filter sequence to turn it into a background-free monochrome image, looked for corners/borders, resized and rotated it, scanned through the image until I hit a black pixel, then looked at lit/unlit pixel patterns (think 7-segment display in reverse).
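The "7-segment display in reverse" step might look like this for digits: threshold a grayscale glyph cell, probe seven fixed points, and look the lit/unlit pattern up in a table. The probe positions and threshold are assumptions, and a real document font would need its own probe layout per character:

```python
from typing import Optional

# Seven-segment order: a (top), b (top right), c (bottom right), d (bottom),
# e (bottom left), f (top left), g (middle), as fractions of the cell size.
PROBES = [(0.5, 0.05), (0.9, 0.27), (0.9, 0.73), (0.5, 0.95),
          (0.1, 0.73), (0.1, 0.27), (0.5, 0.5)]

DIGIT_BY_SEGMENTS = {
    (1, 1, 1, 1, 1, 1, 0): "0",
    (0, 1, 1, 0, 0, 0, 0): "1",
    (1, 1, 0, 1, 1, 0, 1): "2",
    (1, 1, 1, 1, 0, 0, 1): "3",
    (0, 1, 1, 0, 0, 1, 1): "4",
    (1, 0, 1, 1, 0, 1, 1): "5",
    (1, 0, 1, 1, 1, 1, 1): "6",
    (1, 1, 1, 0, 0, 0, 0): "7",
    (1, 1, 1, 1, 1, 1, 1): "8",
    (1, 1, 1, 1, 0, 1, 1): "9",
}


def read_glyph(gray, cell, threshold=128) -> Optional[str]:
    """Read one digit from a grayscale image (rows of 0-255 values).

    cell: (x, y, w, h) of the glyph, already deskewed and cropped.
    Returns the digit as a string, or None if the pattern is unknown.
    """
    x, y, w, h = cell
    bits = tuple(
        1 if gray[y + int(fy * (h - 1))][x + int(fx * (w - 1))] < threshold else 0
        for fx, fy in PROBES
    )
    return DIGIT_BY_SEGMENTS.get(bits)
```

Returning `None` for an unknown pattern is what gives an explicit "I can't make any sense out of this" result instead of a confident wrong answer.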

Cobbled the thing together in a couple of afternoons, with a quick, simple UI to have the user crop/rotate the doc (putting it mostly upright). It was stupidly fast to run and the success rate was very high. Interestingly enough, the failure mode was very good, as it could reliably tell "ok, I can't make any sense out of this", whereas OCR claimed success but output gibberish.

You can get surprisingly far with very little when you have known knowns.


Nah, a proper anecdote should end with 'and you could check one checkbox on the gov site and, instead of the scan, you would receive the "printed" PDF/A with the text layer intact'.

But yeah, there is always a way to optimize. Even with a clean-room implementation (i.e. not looking at the source of that DOOM captcha), you can easily narrow recognition down to a couple of 2x2 blocks and just pattern match them against a known background (i.e. not a monster).
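The known-background idea is essentially background subtraction on small tiles. A minimal sketch, assuming a clean background frame is available and both frames are equal-sized 2D lists of pixel values:

```python
def changed_blocks(frame, background, block=2):
    """Return the top-left (x, y) of every block-by-block tile that differs from the background."""
    h, w = len(frame), len(frame[0])
    hits = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            tile_differs = any(
                frame[y + j][x + i] != background[y + j][x + i]
                for j in range(block)
                for i in range(block)
            )
            if tile_differs:
                hits.append((x, y))  # something (presumably a monster) covers this tile
    return hits
```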


And if you analyzed the user's cursor movements (on desktop), reaction time, and positional accuracy, it could be a genuinely decent CAPTCHA.
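Cursor analysis could start with something as crude as path straightness. A sketch with hypothetical thresholds; the assumption that human mouse traces are rarely ruler-straight is a heuristic, not a measured result:

```python
import math


def path_straightness(points):
    """Straight-line distance divided by travelled distance for a trace of (x, y) samples.

    1.0 means a perfectly straight path; lower values mean a more curved path.
    """
    if len(points) < 2:
        return 1.0
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled


def suspicious_trace(points, min_samples=5, max_straightness=0.99):
    """Hypothetical rule: flag traces that teleport (too few samples) or are ruler-straight."""
    return len(points) < min_samples or path_straightness(points) > max_straightness
```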


I'm in awe at the late stages of this cat and mouse game. I write a lot of bots and scrapers, and I feel thoroughly out-gunned against a bunch of PhD data scientists.

DataDome talk about detection: https://youtu.be/xJGBfSGIsjw


I know this is just for fun, but I think this could be a genuinely good solution if it was heavily obfuscated, and the enemy positions were streamed from the server.


The author knows, it's just a bit of fun. Read the page.


This comment made me vividly think of that "no silly hats!" cartoon by Don Hertzfeldt from 20-ish years ago.


...what are you comparing to?


This crashed my Firefox. Anyone else?


Nope, worked fine for me on 124.0.1 w/ several extensions


love the super shotgun code.


Who else is clicking "click to start" like me? It turns out you have to choose one of the buttons. I thought they were there to let me enable/disable the sound, but they also both act as start buttons.

Didn't know a simple interface with a sound switch and a game start button can be designed this badly.


I think the easiest way to fix would be to add a colon, so that you see you have to pick an option:

    Click to start:
    [sound on] [sound off]


Even better:

    [Start with sound] [Start without sound]


100% this. Buttons represent verbs.


A button may be a verb, but it doesn't have to be. Generally, buttons represent one of three things:

1. an action – this is usually a verb in the imperative mood (e.g. Reply, Save, Add to basket)

2. a status – those omit the verb and only specify the new state of an object, which might be a lot of things, like a noun (Spam), an adjective (Favourite) or maybe an adverb (In progress, Later)

3. a navigation item – on the Web, this is better represented as a link, so let's not go into this here

I would argue that "with/without sound" is a clear example of a status here.


It's still way clearer to not omit the verb. "Report spam" vs just "Spam".

Also links are not buttons. There's nothing to get into here. It's straight up wrong from every perspective to think of a link as a button even from an accessibility standpoint.


Or make the "click to start" text clickable and start the game with sound. Anyone who wants it muted will make sure to first click the mute symbol, and then the ambiguity resolves itself anyway.


MathDoku does that and I hate it, because sometimes cookies expire and it plays loud music in the middle of the night when I start it. What's wrong with:

  [  CLICK TO START  ]
  [x] Allow sound

Keep it simple.


I think most people would agree your solution is preferable, but the spirit of this subthread was "what's the smallest change that would improve things" rather than "how could it be redesigned from scratch?"

I would also argue the MathDoku problem is different. That sounds like a mode confusion type issue, where the user expects a certain level of automation but it has been disabled by the system without adequate feedback.


What's wrong with "start with sound" and "start without sound"? That's a guaranteed single click, whereas with a checkbox you need either one or two clicks.


Who else is missing the forest for the trees? It turns out you have to focus on the merit of the contribution instead of inconsequential UI design optimization.

Didn’t know a simple demo (with disclaimers) from someone who is clearly doing something novel could be commented on this badly.


I'd argue that if it confuses the user it's not inconsequential. And also, something can be both innovative and at the same time have room for improvement. Companies are literally chasing down user feedback.

A user's feedback is one of the best things that can ever happen to your program; the worst is to never ever get used by anyone, and the second worst is to have the users walk away with no idea why.


>inconsequential UI design optimization

I certainly was confused and had a hard time starting it. If a significant amount of people can't even figure out how to start the game, the problem isn't inconsequential.


I agree with you, but this is distracting from the merits of the demo. Also, this is currently #2 on the front page so clearly many people are able to navigate the demo UI, even if it is suboptimal.


I decided to leave only a secondary comment at the bottom of the thread for the same reason as yours and still got 14 ups (i.e. thanks) in a short time before this branch bubbled up. People definitely get confused and that's worth talking about before the merits of the demo, because you have to run it somehow. I almost left too, thinking it was broken, hugged or something. It is distracting and we'll live through it :)


maybe the bots won't know either


I couldn't get it started for a while because I clicked "click to start", like it says on the tin.


There is bad UI and then there is UI so bad that you lose focus on the actual thing and just wonder how a UI can be so bad. This is the latter.


> inconsequential UI design optimization

I tapped "click to start" on my phone a few times, saw nothing happened and assumed it didn't work on mobile and tapped back to come read the comments. I am neuroatypical, though, maybe I don't count as human.


> I tapped "click to start" on my phone a few times, saw nothing happened and assumed it didn't work on mobile

Same reaction here.


Agreed, I really like this demo, seems like a fun concept that adds some sparkle to a typically mundane thing.

Getting so pedantic about a minor point seems like it does more to stifle creativity and innovation than it does to help.


Twist: the real captcha is detecting whether the user first presses "click to start".


Yeah it doesn't even need the option IMHO, I don't think sound is needed here...


Doom without sound is not doom. Sound is absolutely needed


E1M1 is absolutely a part of the experience.


If I were you I'd change my name into Hypercube only


Human Intelligence eventually figures it out, no matter how bad the interface is.


skill issue, literally filtered by two buttons on the screen


You mean the buttons are the real CAPTCHA?


That's a funny idea lmao


Not me.


> Don't take this too seriously, this is a little project for fun, if do you know how to code it's pretty easy to break the security of this.

As opposed to standard "click the traffic light" type captchas which are almost impossible for modern AI to break.

I think the doom captcha is probably more secure than standard captchas simply by virtue of its obscurity.


> which are almost impossible for modern AI to break.

... and for humans, sometimes :)

"Standard" captchas sometimes also bring up major philosophical questions like "what is a bicycle?".


just spam click... autowin.



