43 Folders

Digital Filing System

Hello, I have a question for the hive mind, and this seems like as good a place as any.

I have a massive number of PDF files (I'm a student/academic type) and no really good strategy for organizing them. I've used DEVONthink Pro for a while, and while it's good, I have some issues with it:

- I have about a gig and a half of PDF files, and no good, enduring structure for organizing them.
- Keeping it running all the time, as I'm wont to do, often strains my somewhat aging laptop (the penultimate 15-inch G4 PowerBook model).
- If I want to keep using it, I'll basically have to start from scratch (a long, painful story...).

So I'm looking for a new way to keep these things organized. I've thought about solutions like Yep!, EagleFiler, and Yojimbo, and even staying with DEVONthink, but going through all these files is going to be a pain in the ass, and I want something that will be enduring and scalable. I have a gig and a half of files after an undergrad degree, and I'm going to start a doctoral program in the next couple of years, so this collection is only going to get bigger; I want to devise a system that can grow with me. This probably means doing something open source/kludgey, but I'd like to think it through more fully before I do anything rash.

So here's my idea for organizing all these files:

Automatically rename all the files with sequentially numbered file names, so "0000001.pdf", "0000002.pdf", etc., and then start a database with the file name, the author of the PDF, the title, the journal title, and the originating project (i.e., the class or paper the file entered the system for; this is often a useful data point for me when I'm hunting for something I don't know the name of).
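The renaming-plus-catalog idea fits in a few dozen lines. This is a minimal sketch, assuming Python's built-in `sqlite3` as a stand-in for a MySQL database; the `papers` table, its columns, and the helpers `open_catalog`, `ingest`, and `find_by_project` are hypothetical names chosen for illustration, not an existing tool.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema: one row per PDF, keyed by its sequential number.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id       INTEGER PRIMARY KEY,
    filename TEXT NOT NULL UNIQUE,  -- e.g. 0000001.pdf
    author   TEXT,
    title    TEXT,
    journal  TEXT,
    project  TEXT                   -- class or paper the file came in for
)
"""


def open_catalog(db_path):
    """Open (or create) the catalog database."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def ingest(conn, src, library, author, title, journal, project):
    """Move src into library under the next sequential name and record it."""
    library = Path(library)
    library.mkdir(parents=True, exist_ok=True)
    with conn:  # commits on success, rolls back if anything below raises
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(id), 0) + 1 FROM papers"
        ).fetchone()
        filename = f"{next_id:07d}.pdf"  # zero-padded to seven digits
        conn.execute(
            "INSERT INTO papers (id, filename, author, title, journal, project) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (next_id, filename, author, title, journal, project),
        )
        shutil.move(str(src), str(library / filename))
    return filename


def find_by_project(conn, project):
    """List (filename, title) for everything filed under one project."""
    return conn.execute(
        "SELECT filename, title FROM papers WHERE project = ? ORDER BY id",
        (project,),
    ).fetchall()


if __name__ == "__main__":
    # Demo in a throwaway directory with an in-memory database.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        src = tmp / "smith-2004-final-FINAL.pdf"
        src.write_bytes(b"%PDF-1.4 placeholder")
        conn = open_catalog(":memory:")
        name = ingest(conn, src, tmp / "library", "Smith", "Some Article",
                      "Some Journal", "HIST 301 term paper")
        print(name)
        print(find_by_project(conn, "HIST 301 term paper"))
```

Taking the next number from the database rather than from a directory listing keeps the catalog the single source of truth, and doing the move inside the transaction means a failed move rolls back the new row. The `project` column is what makes a query like "everything I pulled in for that seminar" a one-liner.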

Then I could dump all the files into a single database (for easy backup), Spotlight could do some indexing, and it might be pretty sweet. Theoretically, this system could also incorporate a book collection.

The pros:
- short, manageable, and consistent file names
- a system that can scale
- no eccentric house of cards
- more platform/computer independence

The cons:
- arbitrary file names
- lots of coding work

I guess this means my questions are:

- Are there systems out there that do something like this that I just don't know about? I've been hacking around with PHP and MySQL (like most self-respecting b2 users from way back when) for years now, so I'm fairly comfortable with that, but I'm not a really good developer type, so if someone else has already done it, I'd be more than happy to use their software.
- I haven't done any real coding of consequence (even the hacking described above) in at least four years, so if folks out there have a better idea of how this might work, I'd love to hear it.
- What additional information do you think is absolutely crucial to have in the database, without going overboard? I don't want a lot of overhead on the data-entry end, and I don't need to recreate JSTOR or another index; this is about managing a personal collection, not indexing.
- How are you all staying on top of reference material?

Cheers,
tycho

TOPICS: Projects
mwr

Move to your own personal CMS?

If you can get away with text search and keywords, I'd at least consider a web-based CMS. I keep my lab's help documentation, my lecture slides, supplementary material, etc. in a Plone site. With some tweaking, you can get it to do full-text indexing of PDFs, Word documents, Excel sheets, PowerPoint presentations, RTF files, etc.

I'm in the middle of moving the server to a new address, so the following link might point to the old server, where text indexing is broken. If so, give it a day or two and it'll work. But if you go to http://www.cae.tntech.edu/ and type something into the search box at the top right, you'll see what I'm talking about. For example, if you enter 'tolerance', you'll get a list of two PDFs from an old class, and you'll see that the word doesn't appear in the title, the summary, or anywhere other than the actual text of the PDF.
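The point of the 'tolerance' example is that the search hits the PDF's body text, not just its title or summary. A toy version of that idea is an inverted index mapping each word to the documents that contain it. This sketch assumes the text has already been extracted from each PDF by some other tool (extraction isn't shown), and `FullTextIndex` is a hypothetical class for illustration, not Plone's API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a deliberately crude stand-in for a real analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FullTextIndex:
    """Inverted index: word -> set of document names containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, name, text):
        # Index every word of the document body, not just its title.
        for word in tokenize(text):
            self._postings[word].add(name)

    def search(self, query):
        """Return sorted names of documents containing every query word (AND)."""
        words = tokenize(query)
        if not words:
            return []
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return sorted(result)
```

For instance, after `idx.add("0000001.pdf", body_text)`, a call to `idx.search("tolerance")` finds that file even if "tolerance" never appears in its title.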

You can also define a keyword taxonomy for tagging any files you upload, and it'll let you upload files over WebDAV. So the workflow would be: upload your PDFs to a Plone server that you can access from wherever you need to, tag them if you want, and let the server do all the heavy lifting on searches.
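A keyword taxonomy is essentially a controlled vocabulary: tags must come from a list you define up front, which keeps near-duplicates like "thermo" and "thermodynamics" from splitting your collection. A minimal sketch of that rule, with a hypothetical `TaggedCollection` class (not Plone's API) that normalizes case and rejects any tag outside the taxonomy:

```python
from collections import defaultdict


class TaggedCollection:
    """Files tagged only with keywords from a fixed, user-defined taxonomy."""

    def __init__(self, taxonomy):
        self.taxonomy = frozenset(t.lower() for t in taxonomy)
        self._by_tag = defaultdict(set)  # keyword -> set of filenames

    def tag(self, filename, *keywords):
        """Attach keywords to a file; reject the whole call if any is unknown."""
        normalized = [kw.lower() for kw in keywords]
        unknown = [kw for kw in normalized if kw not in self.taxonomy]
        if unknown:
            raise ValueError(f"not in taxonomy: {unknown}")
        for kw in normalized:
            self._by_tag[kw].add(filename)

    def files_tagged(self, keyword):
        """Sorted filenames carrying the given keyword."""
        return sorted(self._by_tag.get(keyword.lower(), set()))
```

Validating every keyword before tagging anything means a typo leaves the collection untouched instead of half-tagged.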
