Digital Filing System

tycho garen | Aug 30 2007

Hello, I have a question for the hive mind, and this seems like as good a place as any to ask it.

I have a massive number of PDF files (I'm a student/academic type) and no really good organizational strategy for them. I've used DevonThink Pro for a while, and while it's good, I have some issues with it:

- I have about a gig and a half of PDF files, and no really good, enduring organizational structure for them.
- I've found that keeping it running all the time, as I'm wont to do, is often a strain on my somewhat aging laptop (I have the penultimate 15-inch G4 PowerBook model).
- Basically, I'm going to have to start from scratch if I want to keep using it (a long, painful story...).

So I'm looking for a new idea for keeping these things organized. I've thought about solutions like Yep!, EagleFiler, and Yojimbo, and even staying with DevonThink, but going through all these files is going to be a pain in the ass, and I want something that will be enduring and scalable. I have a gig and a half of files after an undergrad degree, and I'm going to start a doctoral program in the next couple of years, so this collection is only going to get bigger. I want to devise a system that can grow with me. This probably means doing something open source/kludgey, but I'd like to consider it more fully before I'm all rash about it.

So, I'm interested in thinking about how to organize all these files, and here's my idea:

Automatically rename all the files with sequentially numbered file names, so "0000001.pdf," "0000002.pdf," etc., and then start a database with the file name, the author of the PDF, the title, the journal title, and the originating project (so like the class name, or the paper that the file entered the system for; this is often a useful data point for me when I'm hunting for something I don't know the name of).
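A minimal sketch of the renaming step, in Python, assuming all the PDFs sit in one flat folder and that seven-digit names like "0000001.pdf" are the target. The function name `rename_sequentially` is hypothetical, not from any existing tool. It returns the old-to-new name pairs so they can be written into whatever database ends up holding the author, title, journal, and project fields.

```python
import re
from pathlib import Path

# Names this scheme has already assigned, e.g. "0000042.pdf".
# Case-insensitive because the default Mac file system is too.
NUMBERED = re.compile(r"^(\d{7})\.pdf$", re.IGNORECASE)


def rename_sequentially(folder):
    """Give every un-numbered PDF in `folder` the next free seven-digit name.

    Returns (old_name, new_name) pairs in the order they were renamed.
    Already-numbered files are left alone, so the script can be re-run
    whenever new PDFs are dropped into the folder.
    """
    folder = Path(folder)
    pdfs = sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")
    taken = [int(m.group(1)) for p in pdfs if (m := NUMBERED.match(p.name))]
    next_id = max(taken, default=0) + 1

    renamed = []
    for path in pdfs:
        if NUMBERED.match(path.name):
            continue  # already part of the numbered collection
        new_path = folder / f"{next_id:07d}.pdf"
        path.rename(new_path)
        renamed.append((path.name, new_path.name))
        next_id += 1
    return renamed
```

Sorting by the old name keeps the numbering deterministic; if order of acquisition matters more, sorting on each file's modification time would be an easy swap.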

Then I could dump all the files into a single database (for easy backup), Spotlight could do some indexing, and it might be pretty sweet. Theoretically, this system could also incorporate a book collection.
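One way to read "a single database (for easy backup)": the numbered PDFs stay as ordinary files in one folder (so Spotlight can still index them), and only the metadata goes into a single SQLite file. SQLite is an assumption here, chosen because it needs no server; the MySQL mentioned below would work too. Under that assumption, backing up the catalog is one call to the Python standard library's `sqlite3.Connection.backup`. The function name `backup_catalog` and the file naming are hypothetical.

```python
import sqlite3
from datetime import date
from pathlib import Path


def backup_catalog(db_path, backup_dir):
    """Copy the catalog database to backup_dir/catalog-YYYY-MM-DD.sqlite."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"catalog-{date.today().isoformat()}.sqlite"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)  # copies a consistent snapshot of src into dst
    finally:
        dst.close()
        src.close()
    return target
```

The PDF folder itself can then be copied like any other folder.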

The pros:
- short, manageable, and consistent file names
- the system could scale
- no eccentric house of cards
- makes me more platform/computer independent

The cons:
- arbitrary file names
- lots of coding work

I guess this means my questions are:

- Are there systems out there that do something like this that I just don't know about? I've been hacking around with PHP and MySQL (like most self-respecting b2 users from way back when) for years now, so I'm fairly comfortable with that, but I'm not really a developer type, so if someone else has already done it, I'd be more than happy to use their software.
- I haven't done any real coding of consequence (even the above hacking about) in at least four years, so if there are folks out there who have a better clue about how this might work, I'd love to hear from them.
- What additional information do you think would be absolutely crucial to have in the database without going overboard? I don't want a lot of overhead on the data entry end of things, and I don't need to recreate JSTOR or another index; this is about managing a personal collection, not indexing.
- How are you all staying on top of reference material?
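As one possible answer to the question about fields, here is a minimal sketch of a catalog table, assuming SQLite rather than the MySQL the post mentions, so it runs without a server. Only the title is required; author, journal, and originating project are optional, which keeps data entry light. The table and column names, the `kind` column (so books can sit alongside PDFs), and the helper functions are all hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS docs (
    id        INTEGER PRIMARY KEY,
    file_name TEXT UNIQUE,              -- '0000001.pdf'; NULL for a print book
    kind      TEXT NOT NULL DEFAULT 'pdf' CHECK (kind IN ('pdf', 'book')),
    title     TEXT NOT NULL,
    author    TEXT,
    journal   TEXT,
    project   TEXT,                     -- class or paper it came in for
    added_on  TEXT NOT NULL DEFAULT (date('now'))
)
"""


def open_catalog(path=":memory:"):
    con = sqlite3.connect(path)
    con.execute(SCHEMA)
    return con


def add_doc(con, title, file_name=None, kind="pdf",
            author=None, journal=None, project=None):
    with con:  # commits on success, rolls back on error
        con.execute(
            "INSERT INTO docs (file_name, kind, title, author, journal, project)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (file_name, kind, title, author, journal, project),
        )


def find_by_project(con, project):
    """Everything filed under one class or paper, in the order it was added."""
    rows = con.execute(
        "SELECT file_name, title FROM docs WHERE project = ? ORDER BY id",
        (project,),
    )
    return rows.fetchall()
```

The project lookup covers the "something I don't know the name of" case; further columns can be added later with `ALTER TABLE ... ADD COLUMN` without disturbing existing rows.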

Cheers,
tycho


TOPICS: Projects

Move to your own personal CMS?

Submitted by mwr on August 31, 2007 - 3:03pm.

If you can get away with text search and keywords, I'd at least consider a web-based CMS. I keep my lab's help documentation, my lecture slides and supplementary material, etc. in a Plone site. With some tweaking, you can get it to do full-text indexing of PDFs, Word documents, Excel sheets, PowerPoint presentations, RTF files, etc.

I'm in the middle of changing addresses for the server right now, so the following link might point to the old server with broken text indexing. If so, give it a day or two, and it'll work. But if you go to http://www.cae.tntech.edu/ and put something into the search box at the top right, you'll see what I'm talking about. For example, if you enter 'tolerance', you'll get a list of two PDFs from an old class, and you'll see that the word doesn't appear in the title, the summary, or anywhere other than the actual text of the PDF.
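To make the principle concrete: full-text search finds documents by the words inside their bodies, not just their titles or summaries. The toy inverted index below only illustrates that idea; it is not how Plone implements it. It assumes the text has already been extracted from each PDF (a tool such as pdftotext can do that), and the function names and sample documents are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-cased words; good enough for a sketch (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids whose body contains it.

    `docs` maps a document id to (title, body_text). Titles are left out
    on purpose, to mirror the 'tolerance' example where the match came
    only from the body text.
    """
    index = defaultdict(set)
    for doc_id, (_title, body) in docs.items():
        for word in tokenize(body):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Ids of documents whose body contains every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


if __name__ == "__main__":
    docs = {
        "lecture03.pdf": ("Lecture 3", "Every dimension carries a tolerance."),
        "lecture04.pdf": ("Lecture 4", "Fits and limits."),
    }
    print(search(build_index(docs), "tolerance"))  # {'lecture03.pdf'}
```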

You can also define a keyword taxonomy for tagging any files you upload. And it'll let you upload files over WebDAV. So the methodology would be to upload your PDFs to a Plone server that you can access from wherever you need to, tag them if you want, and let the server do all the heavy lifting on searches.
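Plone's own keyword machinery isn't reproduced here. As a rough illustration of what a keyword taxonomy buys you, here is a sketch of a controlled vocabulary: tags must come from a predefined list, so a typo is rejected at tagging time instead of silently splitting the collection into near-duplicate tags. The taxonomy terms, the `TagStore` class, and its methods are hypothetical.

```python
from collections import defaultdict

# A fixed vocabulary; adding a term is a deliberate edit, not a side effect.
TAXONOMY = {"epistemology", "ethics", "methods", "lecture-notes"}


class TagStore:
    def __init__(self, taxonomy):
        self.taxonomy = frozenset(taxonomy)
        self._by_tag = defaultdict(set)

    def tag(self, file_name, *tags):
        """Attach tags to a file; rejects the whole call if any tag is unknown."""
        unknown = set(tags) - self.taxonomy
        if unknown:
            raise ValueError(f"not in taxonomy: {sorted(unknown)}")
        for t in tags:
            self._by_tag[t].add(file_name)

    def files_tagged(self, *tags):
        """Files carrying all of the given tags."""
        if not tags:
            return set()
        result = set(self._by_tag.get(tags[0], set()))
        for t in tags[1:]:
            result &= self._by_tag.get(t, set())
        return result
```

Checking every tag before storing any of them keeps a half-applied tagging from slipping through when one term is misspelled.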
