There's a theory that appears to have had quite a bit of success that I've been checking out recently. It's based on the idea that all things are motions, that these motions are quantized into units, and that each unit has two aspects: the component of motion in time and the component of motion in space. Here's the site:
http://rstheory.org/
The original author was still working on the ideas, and it appears others have since developed them further, including what looks like a more detailed (and likely more accurate, though more complex) version here:
http://rs2theory.org/
I mention this because I think it's similar to what you're saying: time is similar to energy or change, matter is similar to structure or space, and so there are ways to convert between these representations. I also think that space could be described as a network of material interconnects, much like the internet: you don't see the pathways through which the information flows, you just see the information. The observed ordering of events can determine where things are located within that network.
Also, I don't think photons need to be considered separate entities, since that also gives a conflicted, dual view. If we detect things via photons, then what determines whether we're seeing a photon or information about the object that "emitted" it? I think it ultimately has to be that we just detect information about objects, and that this information contains nothing beyond the descriptions of those objects; otherwise we have a vicious cycle. For example, if we see something and say this is due to photons conveying the information to us, did we see the thing or did we just see photons? If we decide that one observation was of a photon, but some other observation of a photon represented a property of something else, and we want things to be consistent, then we need other information to determine whether we saw a photon or information about the object. But if we measure another photon to make that decision, what decides whether that photon is a photon, and so on without end.
I believe an observer actually detects something equivalent to an infinitely fast photon that contains all the experienced information in a moment; this would appear to be the only way to unite all this information as a single conscious event. Also, it would seem we'd need something faster than light to basically "hold space together". The definitions of light speed, time and distance currently used in physics are circular definitions that aren't consistent; they are instead statistical measurements, and it ends up taking energy to construct a distance (with a confidence level involved as well). This may be unavoidable, but I think the definitions are misleading: really just 2 properties are being used to describe 3 units, which means something isn't solid in the definitions (beyond the fact that they're statistical definitions to begin with).
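To make the "2 properties for 3 units" point a bit more concrete, here is a rough sketch of how I read the current SI conventions, where the speed of light is fixed by definition and the second comes from a caesium transition frequency. The symbols d, t and \Delta\nu_{Cs} are just my labels for illustration, not anything taken from the sites above.

```latex
% c is a fixed number by convention (SI since 1983), not a measured one
c = 299\,792\,458\ \mathrm{m/s}

% the second is defined by counting periods of the caesium-133 hyperfine transition
1\ \mathrm{s} = \frac{9\,192\,631\,770}{\Delta\nu_{\mathrm{Cs}}}

% the metre is then whatever length light covers in a fixed fraction of a second
1\ \mathrm{m} = \frac{c}{299\,792\,458} \cdot 1\ \mathrm{s}

% so a distance is never independent of the other two quantities
d = c\,t
```

Read this way, only the time standard is set independently: the speed is fixed by convention and the distance follows from the other two, which is roughly what I mean by the definitions leaning on each other.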