Question if you know: does a Lemmy instance have to be publicly accessible to work? Like, if I make an instance on my homelab, can the instance "fetch" content and serve it faster locally? Could I reply to a post and have others see it? Etc.
At the end of the day, the vast majority of what needs to be saved is text. If media content is embedded, the server just has to save the path to the file, not the file itself.
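To illustrate (simplified field names, not Lemmy's actual schema): the post row stores a short URL string, while the image bytes live in pictrs or on some remote server.

```python
# Illustrative only: the post row keeps a pointer to the media,
# not the media itself. Field names are made up for the example.
post_row = {
    "id": 42,
    "title": "Look at this picture",
    "body": "a few hundred bytes of markdown",            # the actual text
    "url": "https://example.com/pictrs/image/abc.webp",   # ~50 bytes, not 2 MB
}
```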
Feels like this would benefit from some sort of fuzzy deduplication in the pictrs storage. I bet there are a lot of similar pics in there. E.g. if one pic or GIF is very similar to another, say just a different quality, size, or compression level, it should keep only one copy. It might already do this for identical files uploaded by different people, since those can be compared trivially via hashing, but I doubt it does similarity-based deduplication.
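For a sense of what similarity-based dedup looks like, here's a rough Python sketch using a perceptual hash (the third-party imagehash and Pillow packages; the threshold and directory are made up, and this is not how pictrs actually works):

```python
# Naive perceptual-hash dedup sketch. Visually similar images (resized,
# recompressed, etc.) produce hashes a few bits apart, so a small
# Hamming-distance threshold catches near-duplicates.
from pathlib import Path
from PIL import Image
import imagehash

THRESHOLD = 5  # max Hamming distance to call two images "the same"

def find_near_duplicates(media_dir):
    seen = []  # (hash, path) pairs; O(n^2) pairwise scan, a BK-tree scales better
    for path in Path(media_dir).rglob("*"):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".gif", ".webp"}:
            continue
        try:
            h = imagehash.phash(Image.open(path))  # 64-bit perceptual hash
        except OSError:
            continue  # unreadable or corrupt file
        for seen_hash, seen_path in seen:
            if h - seen_hash <= THRESHOLD:  # subtraction = Hamming distance
                yield path, seen_path
                break
        else:
            seen.append((h, path))

for dup, original in find_near_duplicates("./pictrs/files"):
    print(f"{dup} looks like {original}")
```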
How has your Lemmy experience been on a self-hosted instance? I'm currently using lemmy.world and it's very error-prone; would self-hosting reduce those errors at the expense of anything? Does federation take long, or do you find you're getting federated content quickly enough?
The experience has been pretty good, to be honest. No instability, easy updates, etc. I find federated content quite quickly, because I use this script to populate the "All" feed.
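For anyone curious what such a script does, here's a rough sketch against the Lemmy v3 HTTP API (not the script verbatim; endpoint shapes are from the 0.18-era API, so check them against your version, and the instance URL, bot credentials, and community list are placeholders):

```python
# Subscribe a bot account to a list of remote communities so their
# posts flow into the local "All" feed.
import requests

INSTANCE = "https://lemmy.example.com"
COMMUNITIES = ["!selfhosted@lemmy.world", "!linux@lemmy.ml"]

# Log in as a dedicated bot account to get a JWT.
jwt = requests.post(
    f"{INSTANCE}/api/v3/user/login",
    json={"username_or_email": "seedbot", "password": "changeme"},
).json()["jwt"]

for name in COMMUNITIES:
    # Resolving a remote community makes the instance fetch it over
    # federation if it hasn't seen it before.
    resolved = requests.get(
        f"{INSTANCE}/api/v3/resolve_object",
        params={"q": name, "auth": jwt},
    ).json()
    community_id = resolved["community"]["community"]["id"]
    # Following it keeps new posts from that community coming in.
    requests.post(
        f"{INSTANCE}/api/v3/community/follow",
        json={"community_id": community_id, "follow": True, "auth": jwt},
    )
```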
You won't get any old content, so that's a downside. You'll only get content created after you start federating, unless someone votes or comments on an old post.
Other than that, the only downside is spending time maintaining and updating it.
I really hope it doesn't get purged if Lemmy is to be a Reddit replacement. A lot of the value Reddit had was obscure knowledge and making Google searches actually usable.
Depends. If you have a lot of users posting a lot of pictures and you use the out-of-the-box pictrs config, then a lot. If you're just running a few users with a limited set of communities being synced, then a lot less. The number is going to vary a lot as Lemmy grows and gets older, so it's hard to document realistic expectations. But the Docker images will probably take up more disk space than the actual content unless you get quite big. I just threw my PG volume into a tgz to move servers and it's less than a gig.
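The move itself is about this much work (a sketch; the volume path is a placeholder for wherever your docker-compose puts it, and the database container should be stopped first so the files on disk are consistent):

```python
# Pack the Postgres volume into a tgz for transfer to the new server.
import tarfile

PG_VOLUME = "/srv/lemmy/volumes/postgres"  # hypothetical path

with tarfile.open("lemmy-postgres.tgz", "w:gz") as tar:
    tar.add(PG_VOLUME, arcname="postgres")
```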
Unless they've changed all of the comment and post IDs to bigints, that'll probably bring the site down before it runs out of storage. In defense of the Lemmy developers, they have been receptive to feedback, so I don't think it'll take long for that to be fixed if it hasn't been already.
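For scale: a signed 32-bit integer caps out at 2,147,483,647, so IDs overflow long before text-sized rows exhaust disk. A quick illustration (the comment rate is made up):

```python
# At an illustrative 1 million new comments per day, a 32-bit ID space
# lasts under six years.
I32_MAX = 2**31 - 1
print(I32_MAX)                             # 2147483647
print(I32_MAX / 1_000_000 / 365, "years")  # ~5.9
```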
My instance eats up almost 100MB every day. It mostly depends on what your users subscribe to. It was barely growing in my first few days, until I invited a couple of friends over to try it out.
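For a sense of scale, that rate compounds to a meaningful amount per year if growth stays flat:

```python
# ~100 MB/day of database growth, extrapolated linearly:
print(100 * 365 / 1024, "GiB per year")  # ~35.6
```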
My instance dormi.zone has been running for around 3½ weeks now, has a three-digit number of users, and hosts a community with a little more than 1,000 subscribers. Here's how much storage it currently takes up:
- 6.2 GiB postgres
- 4.9 GiB pictrs
In the default Ansible configuration, most of the storage growth comes from log files that Docker generates automatically; they're deleted whenever you restart the Docker containers.
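If you want to see how much of your disk those logs are eating, something like this works (assumes Docker's default json-file log driver and default data root; needs root to read /var/lib/docker):

```python
# Sum up the per-container JSON log files Docker writes by default.
from pathlib import Path

total = 0
for log in Path("/var/lib/docker/containers").rglob("*-json.log"):
    size = log.stat().st_size
    total += size
    print(f"{size / 2**20:8.1f} MiB  {log}")
print(f"{total / 2**30:.2f} GiB of container logs total")
```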
I disagree. One big chunk of the value of a place like this is being able to look back at old threads. How many times did people say they always put "Reddit" in front of their Google searches to get the information they were looking for? This could be the same.