Side note: digital audio isn't at all dependent on frame rate.
In theory you're right, but in practice it doesn't always work out that way! It depends on what is being done, by whom and with what software.
Over 9 minutes of video, the audio becomes a good 1.25 seconds out of sync.
Mmm, that really is excessive! Over 9 minutes of video I would expect no more than about 10 frames of drift, even with a poor quality recorder, and roughly half that (or less) with a decent one. It sounds to me as though there could be an issue beyond an inaccurate clock in the H4, although I've never used an H4 and I suppose it is possible that its internal clock really is that atrocious. An alternative explanation could be, for example, an audio pull up/down between video speed and film speed (a 0.1% speed change), which over a 9 min clip would account for just over half a second of drift. Is it possible that your software is reading a frame rate stored in your recorder's audio file and has applied a pull up/down (or an equivalent time stretch/compression) on import and/or when you sample rate converted to 48kHz? Two of these pull ups/downs would account for about 1.1 secs of drift. It seems fairly unlikely that you have inadvertently applied two of them, but it would be worth checking. Try editing the audio file's metadata so its frame rate matches your footage, convert the audio file to 48kHz (using different software from what you used previously), then import it into your NLE and see if there's any difference.
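To put numbers on the pull up/down explanation, here is a minimal sketch. It assumes the usual 1000/1001 film/video pull-down factor (the post only says "pull up/down") and the 9 minute clip from the question, and it also converts the observed 1.25 s into an equivalent clock error in parts per million (ppm), the unit crystal clock accuracy is normally quoted in.

```python
# Drift from a constant speed error, such as a film/video pull up/down.
PULLDOWN = 1000 / 1001  # assumed playback speed after one 0.1% pull-down


def drift_seconds(clip_seconds: float, speed: float) -> float:
    """Sync error at the end of a clip whose audio plays at `speed`
    (1.0 = correct, < 1.0 = slowed down) instead of its original speed."""
    return abs(clip_seconds / speed - clip_seconds)


def implied_ppm(drift_s: float, clip_seconds: float) -> float:
    """Clock error, in parts per million, that would produce `drift_s`
    of drift over `clip_seconds`."""
    return drift_s / clip_seconds * 1e6


clip = 9 * 60  # the 9 minute clip from the question, in seconds
print(f"one pull-down:  {drift_seconds(clip, PULLDOWN):.2f} s")       # 0.54 s
print(f"two pull-downs: {drift_seconds(clip, PULLDOWN ** 2):.2f} s")  # 1.08 s
print(f"1.25 s observed = {implied_ppm(1.25, clip):.0f} ppm")        # 2315 ppm
```

One pull up/down is roughly 1,000 ppm, so two of them land in the same ballpark as the observed error, whereas a plain crystal clock error of over 2,000 ppm would be unusually poor.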
If that doesn't solve the problem, try this: edit and line up the start of your recorder's audio file to the clapperboard, and do the same with your camera's audio file. Next, trim the end of the recorder's audio file, as close as you can by ear, to the same audio point as the end of your camera audio. Now apply time compression/expansion to this edited recorder audio file so that its duration matches the camera audio file. With luck, the recorder's audio will now be in sync with your camera's audio. If it works, make a note of the time compression/expansion ratio, which might work for your other clips as well. It's a bit of a pain, but much quicker than manually slicing and dicing all your dialogue (if it works, of course!). Watch out for time compression/expansion artefacts, though.
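The ratio in the steps above is just the camera's span between two sync points divided by the recorder's span between the same two points. A minimal sketch; the sync-point positions below are hypothetical, and the function names are made up for illustration:

```python
def stretch_ratio(rec_start: float, rec_end: float,
                  cam_start: float, cam_end: float) -> float:
    """Length ratio that makes the recorder's span between two sync points
    match the camera's span between the same two points.
    Positions can be in seconds, or in samples if both files share a rate."""
    rec_span = rec_end - rec_start
    cam_span = cam_end - cam_start
    if rec_span <= 0 or cam_span <= 0:
        raise ValueError("each end point must come after its start point")
    return cam_span / rec_span


def stretched_length(clip_seconds: float, ratio: float) -> float:
    """New length of another recorder clip after applying the same ratio."""
    return clip_seconds * ratio


# Hypothetical sync points: the clap at the start, a sharp sound near the end
ratio = stretch_ratio(rec_start=2.0, rec_end=541.25,
                      cam_start=1.5, cam_end=539.5)
print(f"stretch recorder audio to {ratio * 100:.4f}% of its length")
print(f"a 300 s clip would become {stretched_length(300, ratio):.3f} s")
```

Editors differ in whether they ask for a length percentage (as here) or a speed/tempo percentage (its reciprocal), so check which one yours wants before reusing the number on other clips.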
If this doesn't work, your last option is, as you suggested, to start editing away. VocALign, or a similar program such as PluralEyes, might speed things up a little and make the results a bit more accurate, since you've got the camera audio as a reference.
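Having the camera audio as a reference is what makes automatic alignment possible: the two recordings of the same sound can be compared directly. I don't know the internals of VocALign or PluralEyes, so this is only a minimal sketch of the general idea, brute-force cross-correlation on toy signals; real tools work on much longer audio and use far faster methods.

```python
def best_lag(reference: list[float], target: list[float], max_lag: int) -> int:
    """Lag (in samples) at which `target` best lines up with `reference`,
    found by brute-force cross-correlation: target[n + lag] ~ reference[n].
    A positive result means `target` runs late by that many samples."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(target):
                score += r * target[m]
        if score > best_score:
            best, best_score = lag, score
    return best


# Toy signals: a "clap" at sample 10 in the camera audio, sample 13 in the recorder's
camera = [0.0] * 50
camera[10], camera[11] = 1.0, -0.5
recorder = [0.0] * 50
recorder[13], recorder[14] = 1.0, -0.5

print(best_lag(camera, recorder, max_lag=20))  # 3
```

Running the same comparison on a window near the start and a window near the end of a clip gives two offsets; their difference is the drift across that clip.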
I'm certain you already know, but for the benefit of others, just to emphasise what AcousticAl said:
1. Always record audio at 48kHz, and
2. There will always be drift between an audio recorder and a camera! The only question is how much, and at what point your editing equipment and your powers of observation make it noticeable. The ONLY way to avoid this and guarantee sync between camera and audio recorder is to lock them both to the same master clock.
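How soon the drift becomes noticeable depends on the clock tolerances of both devices and on what you count as noticeable. A minimal sketch; the 20 ppm tolerances and the one-frame threshold are hypothetical values for illustration, not specs for any particular recorder or camera:

```python
def seconds_until_drift(threshold_s: float, ppm_a: float, ppm_b: float) -> float:
    """Worst-case time before two free-running clocks, accurate to
    +/- ppm_a and +/- ppm_b, drift apart by threshold_s seconds."""
    worst_ppm = ppm_a + ppm_b  # worst case: the errors are in opposite directions
    return threshold_s / (worst_ppm * 1e-6)


one_frame = 1 / 25  # hypothetical threshold: one frame at 25 fps
t = seconds_until_drift(one_frame, ppm_a=20, ppm_b=20)  # hypothetical tolerances
print(f"{t / 60:.1f} minutes")  # 16.7 minutes
```

However small the tolerances, the result is always finite; only locking both devices to one master clock removes the error term altogether.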
G