New o1 model can solve complex tasks due to a new processing step before answering.
See full article...
I think one of the biggest problems with anthropomorphizing LLMs is that we judge them by things that are trivial for humans but make them look dumb (the counting-the-r's-in-"strawberry" example that has been beaten to death). Clearly the value of a new tool is not in the things it cannot do, but in the things it can do. Instead of focusing on human tasks that require human intelligence, the focus should be on the things an LLM already does 10x better than humans. They don't offer human intelligence, and may never, but they offer an orthogonal kind of intelligence that can complement ours.
And that would be the core of the issue; it’s very hard for humans not to correlate language with general intelligence. In many ways an image generator is as impressive as an LLM from a technical perspective, yet no one would claim that there is intelligence hiding behind the output. It’s something about language that makes us blind, and unfortunately that might make us take the long road to finding what this tech is useful for.
So the way you distinguish that from your average internet poster is that the grammar is better?
Machines can't think... same as bears can't wrestle.
Who wants to go in the ring?
lol, in the "believe it or not" part, I definitely choose "not".
For people who don't click on tweets:
[screenshot of the tweet attached]
There's the very best kind of evidence: people choosing to use it instead of doing the work themselves. Surely if, for example, it made their lives harder rather than easier, they would not, so very very consistently, continue making this choice.
I'm confused because I specifically EXCLUDED production coding.
I completely agree with what you said which is why I excluded production level coding.
I like the framing, but what makes you say that humans have native System 2?
We certainly aren't born with carefully measured reasoning. We don't get there as children either.
Adults overwhelmingly still have flawed reasoning skills.
Furthermore, the skills (reasoning, critical thinking, logic, math) are often learned. Taught in school or whatever, using patterns, definitions, metaphors, and especially algorithms.
And then the way we actually think (what literally goes through our heads) feels more like System 1 applied iteratively: tossing out words that feel right, then repeatedly checking whether they're actually accurate, true, or valid, and then trying again... Much like the concept behind this LLM, which is also slow, deliberate, and better at math than 99% of people.
I don't do production level coding so I don't really even speak in those terms. I thought I could simply exclude that and people would know that I'm only referring to hobby/prototyping/testing/etc.
Lots of people need to leverage coding to play around with an idea, where the code doesn't have to meet the same standards as production-level code.
As a non-SW developer, I feel like different levels of accuracy and reliability are needed depending on what the code is for. Production-level code still covers everything from UI code to game code to low-level firmware, each of which needs a different degree of accuracy and reliability.
The coding I do is infrequent, when I want to test out an idea or troubleshoot some hardware. I don't need the same kind of bulletproofing or security that production-level code needs.
I'm certainly going to sanity-check the code with test cases, but it doesn't have to be perfect. It's probably better if it has a glitch or two to keep me from getting complacent. Haha.
So the code AI produces generally works well for my use case, especially when what I'm doing is well represented in the training data.
But I do feel there is a lot of low-hanging fruit they haven't picked. For example, it should run the Python code itself: most of the time I simply feed the error message back and it fixes its own code.
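That manual loop is easy enough to automate. Here's a minimal sketch of what I mean, assuming a hypothetical ask_model() helper that wraps whatever chat API you happen to use (the function name and the prompts are mine, not any particular vendor's API):

```python
# Rough sketch of the "feed the traceback back" loop, done automatically.
# ask_model() is a placeholder: it takes a prompt string and is expected to
# return a complete Python script as a string.
import subprocess
import tempfile


def ask_model(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return generated Python code."""
    raise NotImplementedError


def run_until_it_works(task: str, max_attempts: int = 5) -> str:
    prompt = f"Write a Python script that does the following:\n{task}"
    for _ in range(max_attempts):
        code = ask_model(prompt)
        # Write the generated script to a temp file and try to run it.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(
            ["python", path], capture_output=True, text=True, timeout=60
        )
        if result.returncode == 0:
            return code  # it ran cleanly; still sanity-check the output yourself
        # Otherwise, hand the error back and ask for a corrected version.
        prompt = (
            "Your previous script failed with this error:\n"
            f"{result.stderr}\n"
            "Please return a corrected version of the full script."
        )
    raise RuntimeError("Gave up after too many failed attempts")
```

Obviously this is only for throwaway hobby scripts run somewhere you don't mind breaking; blindly executing generated code is exactly the kind of thing production code should never do.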
So, in terms of actually making money, I wish they'd focus on improving the models at the things they're good at and stop trying to make them general.
Now, that being said, I have been impressed with o1-preview at answering the dumb questions I sometimes ponder.
Firmware gets the highest level of rigor and testing because it's either safety-critical or there is the potential to brick the device if there are mistakes in it.
Infotainment systems in cars and UI/apps get less rigor because you can generally update them later if bugs are found. My Rivian UI is so buggy that I think every owner gets their own specific bug. The one assigned to me: there is no switch to turn ATMOS on/off; it's not where it's supposed to be. And that's despite the fact that VW is pouring a bunch of money into Rivian to get access to their SW. I don't even want to experience VW's software if they think Rivian's is better.
Games are way too complicated to be perfect. Starfield, for example, is a buggy, unoptimized mess, but I still find it enjoyable despite that.
There are also two ways that AI can help.
The one I was advocating for is removing barriers for non-coders to test game ideas, app ideas, etc., where I felt like "close" was all they really needed.
The other, which I didn't advocate for because it's way outside my area of expertise, is helping with validation and verification of code, to stop companies from shipping buggy code.
Occasionally, I do use an AI to ask questions about refactoring and other tasks.
<snip for length>