AmigoPix
Sunday, February 06, 2005
 
Title editing update
I've got directory and picture title editing working pretty smoothly. The hard part is finding the right encoding for each of the various parts of a document. For example, it's common knowledge that a percent sign and two hex digits are used for illegal chars in a URL (e.g. %20 is a space). It's also moderately common to know that some chars need to be quoted in the body of your document, but with a different scheme (e.g. &lt; is a less-than sign and &amp; is an ampersand). Less common is encoding chars for HTTP headers. It's the same plan as using &amp;, but some browsers don't decode &amp; into an ampersand... this causes problems when you use & in URLs in the HTTP headers (I do). Then there's the set of chars that JavaScript will encode with its built-in escape() vs. the set of chars that are valid in a URL (yes, they're different). Getting those mixed up means that chars either get encoded twice or don't get encoded at all, even when they should be.
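As a rough illustration of how these schemes differ, here is a minimal sketch in Python. Python is an assumption made for the example, not necessarily what AmigoPix runs on, and the sample strings are made up. It shows URL-style encoding, HTML-body encoding, and what happens when the URL encoding is applied twice by mistake.

```python
from html import escape
from urllib.parse import quote

# URL style: each unsafe char becomes a percent sign plus two hex digits.
url_encoded = quote("Behind you 100%.jpg")

# HTML body style: markup chars become entities instead.
html_encoded = escape("<b>Tom & Jerry</b>")

# Encoding an already-encoded string a second time mangles it:
# every "%" from the first pass turns into "%25".
double_encoded = quote(url_encoded)

print(url_encoded)     # Behind%20you%20100%25.jpg
print(html_encoded)    # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
print(double_encoded)  # Behind%2520you%2520100%2525.jpg
```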

Moving on to a practical example, say you have a dir with 100 images in it. Under each image is a textbox where you can edit the title of the image. When you're done, you hit save. The browser sends all the new titles (as POSTDATA) back to the server. How does the server know which title goes with which file? The obvious answer is that each field's name is the filename. Sure, that's nice and simple and it will actually work... most of the time. What happens when you've got a file with some special chars in its name?
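For the simple case where field names are plain filenames, the server side could look something like the sketch below. It is written in Python as an assumption, and titles_from_post and the sample filenames are hypothetical names for illustration, not part of AmigoPix.

```python
from urllib.parse import parse_qsl

def titles_from_post(body: str) -> dict:
    """Map each form field name (the filename) to its submitted title."""
    # keep_blank_values=True so a title that was cleared still shows up.
    return dict(parse_qsl(body, keep_blank_values=True))

# Hypothetical POST body for two images.
body = "beach.jpg=Day+at+the+beach&sunset.jpg=Sunset%21"
titles = titles_from_post(body)
print(titles)
```

This works as long as the filename survives the trip as a field name, which is exactly what breaks down with special chars.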

You might have an input box that looks like this, where name is the filename and value is the title that you want people to see:
<input type="text" name="Behind you 100%.jpg" value="Rooting for you"/>

HTML says you can't have a % in the name attribute (among many other forbidden chars). Well, we've got all these encodings discussed above, surely one of them must work.

Encoding the name in the URL style doesn't work because % encodes as %25 (which still has a %, and that isn't allowed). Encoding it as an HTML entity is allowed, but % becomes &#37; and then you're stuck generating a huge table for all possible chars (what about other languages?).

I took what seemed to me the simple and safe route: make up yet another encoding. In this new one, I encode chars URL-style, but use an underscore instead of a percent sign. The user never sees it (unless they read the page source) and it's trivial to decode.
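A minimal sketch of such an underscore encoding might look like this. It is written in Python as an assumption, and the function names, the set of chars left untouched, and the choice to also encode literal underscores (so decoding stays unambiguous) are all guesses for illustration rather than the actual AmigoPix code.

```python
import re
import string

# Assumed set of chars left as-is; everything else, including "_", is encoded.
SAFE = set(string.ascii_letters + string.digits + ".-")

def encode_field_name(filename: str) -> str:
    """URL-style encoding, but with "_" in place of "%"."""
    out = []
    for byte in filename.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in SAFE else "_%02X" % byte)
    return "".join(out)

def decode_field_name(name: str) -> str:
    """Reverse encode_field_name: turn each _XX back into its byte."""
    raw = re.sub(
        rb"_([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        name.encode("ascii"),
    )
    return raw.decode("utf-8")

encoded = encode_field_name("Behind you 100%.jpg")
print(encoded)                     # Behind_20you_20100_25.jpg
print(decode_field_name(encoded))  # Behind you 100%.jpg
```

Working on UTF-8 bytes rather than chars means filenames in other languages go through the same path with no lookup table.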

The end result of all my messing with various encodings is that they each have their own purpose and it's working out to be a very flexible and robust system. As an example, I gave my image titling textbox a torture test. I typed in all the usual suspects, including some HTML and hit save.

As you can see, the HTML is honored, making the word "bold" bold. I have a simple regex that strips out the HTML tags for the tooltips so you don't see the <b> or </b> in there. I've also tested inline styles and they work too.
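The regex itself isn't shown here, but a tag stripper along those lines could be as small as the following sketch (Python is an assumption, and strip_tags is a hypothetical name). It is only meant for cleaning up tooltip text and is not a safe HTML sanitizer.

```python
import re

# Matches anything that looks like a tag: "<", then non-">" chars, then ">".
TAG_RE = re.compile(r"<[^>]*>")

def strip_tags(title: str) -> str:
    """Remove HTML tags so the plain text can be used in a tooltip."""
    return TAG_RE.sub("", title)

tooltip = strip_tags("The word <b>bold</b> is bold")
print(tooltip)  # The word bold is bold
```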

Now that I seem to have a working system, I just need to apply the same functions to editing image descriptions, add the form fields, and collect the data, and then v0.2 will be ready.
