It would be interesting to have a Large Language Model (LLM) fine-tuned on ProleWiki and leftist books; it could be very useful for debunking arguments about leftist ideology. However, local models don't yet do search with cited sources, which makes them hard to trust. Citations would let users check whether the model is accurately representing what it cites or just making things up. In the future, when a local platform with search capabilities for LLMs becomes available, it would be interesting to prioritize leftist sources in the search results; collaboratively curating a list of reliable leftist sources could help with that. What are your thoughts on this?
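To give a concrete picture of the citation part, here's a toy retrieval sketch in Python. Everything in it is a placeholder (the URLs, the texts, the choice of plain TF-IDF over real embeddings); the point is just that answers come back attached to a source a human can check:

```python
# Toy sketch of "cited sources": retrieve passages from a local corpus and
# return them with their source URL, so a claim can be checked against what
# was actually retrieved. Corpus contents below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    ("prolewiki.org/wiki/Example_article", "Placeholder text of an article..."),
    ("marxists.org/portugues/example.htm", "Placeholder text of another source..."),
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform([text for _, text in corpus])

def retrieve(query: str, k: int = 2):
    """Return the top-k (url, passage) pairs most similar to the query."""
    q_vec = vectorizer.transform([query])
    scores = cosine_similarity(q_vec, doc_matrix)[0]
    ranked = sorted(zip(scores, corpus), key=lambda p: p[0], reverse=True)
    return [(url, text) for _, (url, text) in ranked[:k]]

# The retrieved passages would be fed to the model as context, and the URLs
# shown to the user so the answer can be verified against the source.
for url, passage in retrieve("dialectical materialism"):
    print(url, "->", passage[:60])
```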
i'll admit, i thought about making a commie llm (so like, comradegpt? lol) some time ago
i wonder if i should do it at some point, especially if we're told to learn about ai and neural networks at uni
I started crawling the Portuguese version of the MIA and there was definitely enough data there alone to fine-tune one (about 30 GB, though a lot of it was PDFs). Just be aware that it requires a lot of data preparation and each training run takes a while on consumer hardware, which is why I postponed it. If there's interest here I'd be happy to collaborate.
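If anyone wants a feel for the data-prep side, this is roughly what the PDF-to-text step looks like. The paths are made up, and a real pipeline would also need deduplication, OCR for scanned files, and cleanup of headers/footers:

```python
# Rough sketch: pull plain text out of crawled PDFs so it can go into a
# fine-tuning dataset. Directory names are hypothetical.
from pathlib import Path
from pypdf import PdfReader  # pip install pypdf

SRC = Path("mia_crawl")  # hypothetical directory of crawled files
DST = Path("mia_text")
DST.mkdir(exist_ok=True)

for pdf_path in SRC.rglob("*.pdf"):
    try:
        reader = PdfReader(pdf_path)
        # extract_text() can return None on empty/image-only pages
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
    except Exception as exc:  # corrupt or scanned PDFs get skipped here
        print(f"skipping {pdf_path}: {exc}")
        continue
    (DST / f"{pdf_path.stem}.txt").write_text(text, encoding="utf-8")
```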
Also seems like a fun way to read a lot of diverse texts.
Edit: but be aware that LLMs are inherently unreliable and we shouldn't trust them blindly the way libs trust ChatGPT.
How long does a training run take? Just so you know, I'm a total newbie at LLM creation/AI/neural networks.
So… if they’re inherently unreliable, why make them? Genuine question.