How packet headers change in the Linux network protocol stack

On the receive path, each layer calls skb_pull() to strip its own protocol header; on the transmit path, each layer calls skb_push() to prepend its header. The kernel excerpts below appear to come from a 2.6.3x-era source tree.
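
A minimal sketch of what these two helpers do to the buffer, assuming a simplified model of struct sk_buff (the real kernel versions also check for underflow and handle non-linear data):

/* Simplified model for illustration only; not the kernel implementation. */
struct skb_model {
        unsigned char *head;    /* start of the allocated buffer          */
        unsigned char *data;    /* current start of the outermost header  */
        unsigned int   len;     /* valid bytes from data to the tail      */
};

/* Receive direction: consume n header bytes so data points at the next layer. */
static unsigned char *model_pull(struct skb_model *skb, unsigned int n)
{
        skb->len -= n;
        return skb->data += n;
}

/* Transmit direction: grow the packet into the headroom to prepend a header. */
static unsigned char *model_push(struct skb_model *skb, unsigned int n)
{
        skb->data -= n;
        skb->len  += n;
        return skb->data;
}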

First, the receive path. The three excerpts below show each layer removing its own header: eth_type_trans() pulls the Ethernet header, ip_local_deliver_finish() pulls the IP header, and tcp_rcv_established() pulls the TCP header before queuing the payload to the socket.

150  * eth_type_trans - determine the packet's protocol ID.
151  * @skb: received socket data
152  * @dev: receiving network device
153  *
154  * The rule here is that we
155  * assume 802.3 if the type field is short enough to be a length.
156  * This is normal practice and works for any 'now in use' protocol.
157  */
158 __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
159 {
160         struct ethhdr *eth;
161         unsigned char *rawp;
162
163         skb->dev = dev;
164         skb_reset_mac_header(skb);
165         skb_pull(skb, ETH_HLEN);
166         eth = eth_hdr(skb);
167
168         if (unlikely(is_multicast_ether_addr(eth->h_dest))) {
169                 if (!compare_ether_addr_64bits(eth->h_dest, dev->broadcast))
170                         skb->pkt_type = PACKET_BROADCAST;
171                 else
172                         skb->pkt_type = PACKET_MULTICAST;
173         }
174
175         /*
176          *      This ALLMULTI check should be redundant by 1.4
177          *      so don't forget to remove it.
178          *
179          *      Seems, you forgot to remove it. All silly devices
180          *      seems to set IFF_PROMISC.
181          */
182
183         else if (1 /*dev->flags&IFF_PROMISC */ ) {
184                 if (unlikely(compare_ether_addr_64bits(eth->h_dest, dev->dev_addr)))
185                         skb->pkt_type = PACKET_OTHERHOST;
186         }
187
188         /*
189          * Some variants of DSA tagging don't have an ethertype field
190          * at all, so we check here whether one of those tagging
191          * variants has been configured on the receiving interface,
192          * and if so, set skb->protocol without looking at the packet.
193          */
194         if (netdev_uses_dsa_tags(dev))
195                 return htons(ETH_P_DSA);
196         if (netdev_uses_trailer_tags(dev))
197                 return htons(ETH_P_TRAILER);
198
199         if (ntohs(eth->h_proto) >= 1536)
200                 return eth->h_proto;
201
202         rawp = skb->data;
203
204         /*
205          *      This is a magic hack to spot IPX packets. Older Novell breaks
206          *      the protocol design and runs IPX over 802.3 without an 802.2 LLC
207          *      layer. We look for FFFF which isn't a used 802.2 SSAP/DSAP. This
208          *      won't work for fault tolerant netware but does for the rest.
209          */
210         if (*(unsigned short *)rawp == 0xFFFF)
211                 return htons(ETH_P_802_3);
212
213         /*
214          *      Real 802.2 LLC
215          */
216         return htons(ETH_P_802_2);
217 }
218 EXPORT_SYMBOL(eth_type_trans);
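
For context, a typical Ethernet driver hands each received frame to eth_type_trans() right after DMA completes; a hedged sketch of such a call site (not part of the excerpt above, details vary per driver, and frame_len is a placeholder for the DMA length):

skb_put(skb, frame_len);                   /* frame as written by the NIC        */
skb->protocol = eth_type_trans(skb, dev);  /* pulls ETH_HLEN, sets skb->pkt_type */
netif_rx(skb);                             /* hand the skb to the protocol stack */

At this point skb->data already points at the IP header. Once ip_rcv() and routing decide the packet is for the local host, ip_local_deliver_finish() strips the IP header: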
193 static int ip_local_deliver_finish(struct sk_buff *skb)
194 {
195         struct net *net = dev_net(skb->dev);
196
197         __skb_pull(skb, ip_hdrlen(skb));
198
199         /* Point into the IP datagram, just past the header. */
200         skb_reset_transport_header(skb);
201
202         rcu_read_lock();
203         {
204                 int protocol = ip_hdr(skb)->protocol;
205                 int hash, raw;
206                 const struct net_protocol *ipprot;
207
208         resubmit:
209                 raw = raw_local_deliver(skb, protocol);
210
211                 hash = protocol & (MAX_INET_PROTOS - 1);
212                 ipprot = rcu_dereference(inet_protos[hash]);
213                 if (ipprot != NULL) {
214                         int ret;
215
216                         if (!net_eq(net, &init_net) && !ipprot->netns_ok) {
217                                 if (net_ratelimit())
218                                         printk("%s: proto %d isn't netns-ready\n",
219                                                 __func__, protocol);
220                                 kfree_skb(skb);
221                                 goto out;
222                         }
223
224                         if (!ipprot->no_policy) {
225                                 if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
226                                         kfree_skb(skb);
227                                         goto out;
228                                 }
229                                 nf_reset(skb);
230                         }
231                         ret = ipprot->handler(skb);
232                         if (ret < 0) {
233                                 protocol = -ret;
234                                 goto resubmit;
235                         }
236                         IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
237                 } else {
238                         if (!raw) {
239                                 if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
240                                         IP_INC_STATS_BH(net, IPSTATS_MIB_INUNKNOWNPROTOS);
241                                         icmp_send(skb, ICMP_DEST_UNREACH,
242                                                   ICMP_PROT_UNREACH, 0);
243                                 }
244                         } else
245                                 IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
246                         kfree_skb(skb);
247                 }
248         }
249  out:
250         rcu_read_unlock();
251
252         return 0;
253 }
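
The ipprot->handler(skb) call above dispatches through the inet_protos[] hash; for TCP it ends up in tcp_v4_rcv(). A hedged sketch of how that handler is registered (based on net/ipv4/af_inet.c in kernels of this era; only a subset of the fields is shown):

static const struct net_protocol tcp_protocol = {
        .handler        = tcp_v4_rcv,   /* what ipprot->handler() resolves to */
        .err_handler    = tcp_v4_err,
        .no_policy      = 1,
        .netns_ok       = 1,
};

/* in inet_init() */
inet_add_protocol(&tcp_protocol, IPPROTO_TCP);

tcp_v4_rcv() locates the socket and, for connections in the ESTABLISHED state, calls tcp_rcv_established(), which removes the TCP header before queuing the payload: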
5224 int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
5225                         struct tcphdr *th, unsigned len)
5226 {
5227         struct tcp_sock *tp = tcp_sk(sk);
5228         int res;
5229
5230         /*
5231          *      Header prediction.
5232          *      The code loosely follows the one in the famous
5233          *      "30 instruction TCP receive" Van Jacobson mail.
5234          *
5235          *      Van's trick is to deposit buffers into socket queue
5236          *      on a device interrupt, to call tcp_recv function
5237          *      on the receive process context and checksum and copy
5238          *      the buffer to user space. smart...
5239          *
5240          *      Our current scheme is not silly either but we take the
5241          *      extra cost of the net_bh soft interrupt processing...
5242          *      We do checksum and copy also but from device to kernel.
5243          */
5244
5245         tp->rx_opt.saw_tstamp = 0;
5246
5247         /*      pred_flags is 0xS?10 << 16 + snd_wnd
5248          *      if header_prediction is to be made
5249          *      'S' will always be tp->tcp_header_len >> 2
5250          *      '?' will be 0 for the fast path, otherwise pred_flags is 0 to
5251          *  turn it off (when there are holes in the receive
5252          *       space for instance)
5253          *      PSH flag is ignored.
5254          */
5255
5256         if ((tcp_flag_word(th) & TCP_HP_BITS) == tp->pred_flags &&
5257             TCP_SKB_CB(skb)->seq == tp->rcv_nxt &&
5258             !after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
5259                 int tcp_header_len = tp->tcp_header_len;
5260
5261                 /* Timestamp header prediction: tcp_header_len
5262                  * is automatically equal to th->doff*4 due to pred_flags
5263                  * match.
5264                  */
5265
5266                 /* Check timestamp */
5267                 if (tcp_header_len == sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) {
5268                         /* No? Slow path! */
5269                         if (!tcp_parse_aligned_timestamp(tp, th))
5270                                 goto slow_path;
5271
5272                         /* If PAWS failed, check it more carefully in slow path */
5273                         if ((s32)(tp->rx_opt.rcv_tsval - tp->rx_opt.ts_recent) < 0)
5274                                 goto slow_path;
5275
5276                         /* DO NOT update ts_recent here, if checksum fails
5277                          * and timestamp was corrupted part, it will result
5278                          * in a hung connection since we will drop all
5279                          * future packets due to the PAWS test.
5280                          */
5281                 }
5282
5283                 if (len <= tcp_header_len) {
5284                         /* Bulk data transfer: sender */
5285                         if (len == tcp_header_len) {
5286                                 /* Predicted packet is in window by definition.
5287                                  * seq == rcv_nxt and rcv_wup <= rcv_nxt.
5288                                  * Hence, check seq<=rcv_wup reduces to:
5289                                  */
5290                                 if (tcp_header_len ==
5291                                     (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
5292                                     tp->rcv_nxt == tp->rcv_wup)
5293                                         tcp_store_ts_recent(tp);
5294
5295                                 /* We know that such packets are checksummed
5296                                  * on entry.
5297                                  */
5298                                 tcp_ack(sk, skb, 0);
5299                                 __kfree_skb(skb);
5300                                 tcp_data_snd_check(sk);
5301                                 return 0;
5302                         } else { /* Header too small */
5303                                 TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_INERRS);
5304                                 goto discard;
5305                         }
5306                 } else {
5307                         int eaten = 0;
5308                         int copied_early = 0;
5309
5310                         if (tp->copied_seq == tp->rcv_nxt &&
5311                             len - tcp_header_len <= tp->ucopy.len) {
5312 #ifdef CONFIG_NET_DMA
5313                                 if (tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
5314                                         copied_early = 1;
5315                                         eaten = 1;
5316                                 }
5317 #endif
5318                                 if (tp->ucopy.task == current &&
5319                                     sock_owned_by_user(sk) && !copied_early) {
5320                                         __set_current_state(TASK_RUNNING);
5321
5322                                         if (!tcp_copy_to_iovec(sk, skb, tcp_header_len))
5323                                                 eaten = 1;
5324                                 }
5325                                 if (eaten) {
5326                                         /* Predicted packet is in window by definition.
5327                                          * seq == rcv_nxt and rcv_wup <= rcv_nxt.
5328                                          * Hence, check seq<=rcv_wup reduces to:
5329                                          */
5330                                         if (tcp_header_len ==
5331                                             (sizeof(struct tcphdr) +
5332                                              TCPOLEN_TSTAMP_ALIGNED) &&
5333                                             tp->rcv_nxt == tp->rcv_wup)
5334                                                 tcp_store_ts_recent(tp);
5335
5336                                         tcp_rcv_rtt_measure_ts(sk, skb);
5337
5338                                         __skb_pull(skb, tcp_header_len);
5339                                         tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
5340                                         NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITSTOUSER);
5341                                 }
5342                                 if (copied_early)
5343                                         tcp_cleanup_rbuf(sk, skb->len);
5344                         }
5345                         if (!eaten) {
5346                                 if (tcp_checksum_complete_user(sk, skb))
5347                                         goto csum_error;
5348
5349                                 /* Predicted packet is in window by definition.
5350                                  * seq == rcv_nxt and rcv_wup <= rcv_nxt.
5351                                  * Hence, check seq<=rcv_wup reduces to:
5352                                  */
5353                                 if (tcp_header_len ==
5354                                     (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
5355                                     tp->rcv_nxt == tp->rcv_wup)
5356                                         tcp_store_ts_recent(tp);
5357
5358                                 tcp_rcv_rtt_measure_ts(sk, skb);
5359
5360                                 if ((int)skb->truesize > sk->sk_forward_alloc)
5361                                         goto step5;
5362
5363                                 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITS);
5364
5365                                 /* Bulk data transfer: receiver */
5366                                 __skb_pull(skb, tcp_header_len);
5367                                 __skb_queue_tail(&sk->sk_receive_queue, skb);
5368                                 skb_set_owner_r(skb, sk);
5369                                 tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
5370                         }
5371
5372                         tcp_event_data_recv(sk, skb);
5373
5374                         if (TCP_SKB_CB(skb)->ack_seq != tp->snd_una) {
5375                                 /* Well, only one small jumplet in fast path... */
5376                                 tcp_ack(sk, skb, FLAG_DATA);
5377                                 tcp_data_snd_check(sk);
5378                                 if (!inet_csk_ack_scheduled(sk))
5379                                         goto no_ack;
5380                         }
5381
5382                         if (!copied_early || tp->rcv_nxt != tp->rcv_wup)
5383                                 __tcp_ack_snd_check(sk, 0);
5384 no_ack:
5385 #ifdef CONFIG_NET_DMA
5386                         if (copied_early)
5387                                 __skb_queue_tail(&sk->sk_async_wait_queue, skb);
5388                         else
5389 #endif
5390                         if (eaten)
5391                                 __kfree_skb(skb);
5392                         else
5393                                 sk->sk_data_ready(sk, 0);
5394                         return 0;
5395                 }
5396         }
5397
5398 slow_path:
5399         if (len < (th->doff << 2) || tcp_checksum_complete_user(sk, skb))
5400                 goto csum_error;
5401
5402         /*
5403          *      Standard slow path.
5404          */
5405
5406         res = tcp_validate_incoming(sk, skb, th, 1);
5407         if (res <= 0)
5408                 return -res;
5409
5410 step5:
5411         if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
5412                 goto discard;
5413
5414         tcp_rcv_rtt_measure_ts(sk, skb);
5415
5416         /* Process urgent data. */
5417         tcp_urg(sk, skb, th);
5418
5419         /* step 7: process the segment text */
5420         tcp_data_queue(sk, skb);
5421
5422         tcp_data_snd_check(sk);
5423         tcp_ack_snd_check(sk);
5424         return 0;
5425
5426 csum_error:
5427         TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_INERRS);
5428
5429 discard:
5430         __kfree_skb(skb);
5431         return 0;
5432 }
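
The pred_flags value tested at the top of the fast path is precomputed whenever the fast path is (re)enabled. A hedged sketch of that helper, as found in include/net/tcp.h of the same era:

static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd)
{
        /* doff in the top 4 bits, the ACK flag set, and the expected (unscaled)
         * receive window in the low 16 bits: the "0xS?10 << 16 + snd_wnd" layout
         * described in the comment inside tcp_rcv_established() above. */
        tp->pred_flags = htonl((tp->tcp_header_len << 26) |
                               ntohl(TCP_FLAG_ACK) |
                               snd_wnd);
}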

Now for the transmit path, where headers are filled in the opposite order: tcp_transmit_skb() pushes the TCP header, ip_queue_xmit() pushes the IP header, and ip_finish_output2() hands the packet to the neighbour layer, which prepends the Ethernet header (from the cached header in neigh_hh_output(), or built by eth_header() via neigh_resolve_output()).

777 /* This routine actually transmits TCP packets queued in by
778  * tcp_do_sendmsg().  This is used by both the initial
779  * transmission and possible later retransmissions.
780  * All SKB's seen here are completely headerless.  It is our
781  * job to build the TCP header, and pass the packet down to
782  * IP so it can do the same plus pass the packet off to the
783  * device.
784  *
785  * We are working here with either a clone of the original
786  * SKB, or a fresh unique copy made by the retransmit engine.
787  */
788 static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
789                             gfp_t gfp_mask)
790 {
791         const struct inet_connection_sock *icsk = inet_csk(sk);
792         struct inet_sock *inet;
793         struct tcp_sock *tp;
794         struct tcp_skb_cb *tcb;
795         struct tcp_out_options opts;
796         unsigned tcp_options_size, tcp_header_size;
797         struct tcp_md5sig_key *md5;
798         struct tcphdr *th;
799         int err;
800
801         BUG_ON(!skb || !tcp_skb_pcount(skb));
802
803         /* If congestion control is doing timestamping, we must
804          * take such a timestamp before we potentially clone/copy.
805          */
806         if (icsk->icsk_ca_ops->flags & TCP_CONG_RTT_STAMP)
807                 __net_timestamp(skb);
808
809         if (likely(clone_it)) {
810                 if (unlikely(skb_cloned(skb)))
811                         skb = pskb_copy(skb, gfp_mask);
812                 else
813                         skb = skb_clone(skb, gfp_mask);
814                 if (unlikely(!skb))
815                         return -ENOBUFS;
816         }
817
818         inet = inet_sk(sk);
819         tp = tcp_sk(sk);
820         tcb = TCP_SKB_CB(skb);
821         memset(&opts, 0, sizeof(opts));
822
823         if (unlikely(tcb->flags & TCPCB_FLAG_SYN))
824                 tcp_options_size = tcp_syn_options(sk, skb, &opts, &md5);
825         else
826                 tcp_options_size = tcp_established_options(sk, skb, &opts,
827                                                            &md5);
828         tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
829
830         if (tcp_packets_in_flight(tp) == 0)
831                 tcp_ca_event(sk, CA_EVENT_TX_START);
832
833         skb_push(skb, tcp_header_size);
834         skb_reset_transport_header(skb);
835         skb_set_owner_w(skb, sk);
836
837         /* Build TCP header and checksum it. */
838         th = tcp_hdr(skb);
839         th->source              = inet->inet_sport;
840         th->dest                = inet->inet_dport;
841         th->seq                 = htonl(tcb->seq);
842         th->ack_seq             = htonl(tp->rcv_nxt);
843         *(((__be16 *)th) + 6)   = htons(((tcp_header_size >> 2) << 12) |
844                                         tcb->flags);
845
846         if (unlikely(tcb->flags & TCPCB_FLAG_SYN)) {
847                 /* RFC1323: The window in SYN & SYN/ACK segments
848                  * is never scaled.
849                  */
850                 th->window      = htons(min(tp->rcv_wnd, 65535U));
851         } else {
852                 th->window      = htons(tcp_select_window(sk));
853         }
854         th->check               = 0;
855         th->urg_ptr             = 0;
856
857         /* The urg_mode check is necessary during a below snd_una win probe */
858         if (unlikely(tcp_urg_mode(tp) && before(tcb->seq, tp->snd_up))) {
859                 if (before(tp->snd_up, tcb->seq + 0x10000)) {
860                         th->urg_ptr = htons(tp->snd_up - tcb->seq);
861                         th->urg = 1;
862                 } else if (after(tcb->seq + 0xFFFF, tp->snd_nxt)) {
863                         th->urg_ptr = 0xFFFF;
864                         th->urg = 1;
865                 }
866         }
867
868         tcp_options_write((__be32 *)(th + 1), tp, &opts);
869         if (likely((tcb->flags & TCPCB_FLAG_SYN) == 0))
870                 TCP_ECN_send(sk, skb, tcp_header_size);
871
872 #ifdef CONFIG_TCP_MD5SIG
873         /* Calculate the MD5 hash, as we have all we need now */
874         if (md5) {
875                 sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
876                 tp->af_specific->calc_md5_hash(opts.hash_location,
877                                                md5, sk, NULL, skb);
878         }
879 #endif
880
881         icsk->icsk_af_ops->send_check(sk, skb->len, skb);
882
883         if (likely(tcb->flags & TCPCB_FLAG_ACK))
884                 tcp_event_ack_sent(sk, tcp_skb_pcount(skb));
885
886         if (skb->len != tcp_header_size)
887                 tcp_event_data_sent(tp, skb, sk);
888
889         if (after(tcb->end_seq, tp->snd_nxt) || tcb->seq == tcb->end_seq)
890                 TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
891
892         err = icsk->icsk_af_ops->queue_xmit(skb, 0);
893         if (likely(err <= 0))
894                 return err;
895
896         tcp_enter_cwr(sk, 1);
897
898         return net_xmit_eval(err);
899 }
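
The skb_push(skb, tcp_header_size) above never needs to reallocate, because TCP transmit buffers are allocated with headroom for every lower layer up front. A hedged sketch of that allocation pattern (simplified; the real sk_stream_alloc_skb() also handles fclones and socket memory accounting):

skb = alloc_skb(MAX_TCP_HEADER + payload_len, GFP_KERNEL);
if (!skb)
        return -ENOBUFS;
skb_reserve(skb, MAX_TCP_HEADER);   /* headroom for TCP + IP + link-layer headers */
skb_put(skb, payload_len);          /* application data goes at the tail          */
/* tcp_transmit_skb() later pushes the TCP header into this headroom,
 * ip_queue_xmit() pushes the IP header, and the neighbour layer pushes the
 * Ethernet header, all without copying the payload. */

tcp_transmit_skb() then passes the segment to the network layer through icsk->icsk_af_ops->queue_xmit(), which for IPv4 is ip_queue_xmit():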
314 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
315 {
316         struct sock *sk = skb->sk;
317         struct inet_sock *inet = inet_sk(sk);
318         struct ip_options *opt = inet->opt;
319         struct rtable *rt;
320         struct iphdr *iph;
321
322         /* Skip all of this if the packet is already routed,
323          * f.e. by something like SCTP.
324          */
325         rt = skb_rtable(skb);
326         if (rt != NULL)
327                 goto packet_routed;
328
329         /* Make sure we can route this packet. */
330         rt = (struct rtable *)__sk_dst_check(sk, 0);
331         if (rt == NULL) {
332                 __be32 daddr;
333
334                 /* Use correct destination address if we have options. */
335                 daddr = inet->inet_daddr;
336                 if(opt && opt->srr)
337                         daddr = opt->faddr;
338
339                 {
340                         struct flowi fl = { .oif = sk->sk_bound_dev_if,
341                                             .mark = sk->sk_mark,
342                                             .nl_u = { .ip4_u =
343                                                       { .daddr = daddr,
344                                                         .saddr = inet->inet_saddr,
345                                                         .tos = RT_CONN_FLAGS(sk) } },
346                                             .proto = sk->sk_protocol,
347                                             .flags = inet_sk_flowi_flags(sk),
348                                             .uli_u = { .ports =
349                                                        { .sport = inet->inet_sport,
350                                                          .dport = inet->inet_dport } } };
351
352                         /* If this fails, retransmit mechanism of transport layer will
353                          * keep trying until route appears or the connection times
354                          * itself out.
355                          */
356                         security_sk_classify_flow(sk, &fl);
357                         if (ip_route_output_flow(sock_net(sk), &rt, &fl, sk, 0))
358                                 goto no_route;
359                 }
360                 sk_setup_caps(sk, &rt->u.dst);
361         }
362         skb_dst_set(skb, dst_clone(&rt->u.dst));
363
364 packet_routed:
365         if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
366                 goto no_route;
367
368         /* OK, we know where to send it, allocate and build IP header. */
369         skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
370         skb_reset_network_header(skb);
371         iph = ip_hdr(skb);
372         *((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
373         if (ip_dont_fragment(sk, &rt->u.dst) && !ipfragok)
374                 iph->frag_off = htons(IP_DF);
375         else
376                 iph->frag_off = 0;
377         iph->ttl      = ip_select_ttl(inet, &rt->u.dst);
378         iph->protocol = sk->sk_protocol;
379         iph->saddr    = rt->rt_src;
380         iph->daddr    = rt->rt_dst;
381         /* Transport layer set skb->h.foo itself. */
382
383         if (opt && opt->optlen) {
384                 iph->ihl += opt->optlen >> 2;
385                 ip_options_build(skb, opt, inet->inet_daddr, rt, 0);
386         }
387
388         ip_select_ident_more(iph, &rt->u.dst, sk,
389                              (skb_shinfo(skb)->gso_segs ?: 1) - 1);
390
391         skb->priority = sk->sk_priority;
392         skb->mark = sk->sk_mark;
393
394         return ip_local_out(skb);
395
396 no_route:
397         IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
398         kfree_skb(skb);
399         return -EHOSTUNREACH;
400 }
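
The packed 16-bit store *((__be16 *)iph) = htons((4 << 12) | (5 << 8) | ...) above fills three IP header fields at once; a field-by-field equivalent for readability:

iph->version = 4;           /* IPv4                                       */
iph->ihl     = 5;           /* 5 * 4 = 20 bytes; IP options grow it below */
iph->tos     = inet->tos;

From ip_local_out() the packet passes through dst_output() and ip_output()/ip_finish_output() (netfilter hooks, fragmentation and GSO handling omitted) until it reaches ip_finish_output2(), where the link-layer header is finally added: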
178 static inline int ip_finish_output2(struct sk_buff *skb)
179 {
180         struct dst_entry *dst = skb_dst(skb);
181         struct rtable *rt = (struct rtable *)dst;
182         struct net_device *dev = dst->dev;
183         unsigned int hh_len = LL_RESERVED_SPACE(dev);
184
185         if (rt->rt_type == RTN_MULTICAST) {
186                 IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUTMCAST, skb->len);
187         } else if (rt->rt_type == RTN_BROADCAST)
188                 IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUTBCAST, skb->len);
189
190         /* Be paranoid, rather than too clever. */
191         if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
192                 struct sk_buff *skb2;
193
194                 skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
195                 if (skb2 == NULL) {
196                         kfree_skb(skb);
197                         return -ENOMEM;
198                 }
199                 if (skb->sk)
200                         skb_set_owner_w(skb2, skb->sk);
201                 kfree_skb(skb);
202                 skb = skb2;
203         }
204
205         if (dst->hh)
206                 return neigh_hh_output(dst->hh, skb);   // cached hardware header: prepends the Ethernet header
207         else if (dst->neighbour)
208                 return dst->neighbour->output(skb);     // typically neigh_resolve_output(): prepends the Ethernet header
209
210         if (net_ratelimit())
211                 printk(KERN_DEBUG "ip_finish_output2: No header cache and no neighbour!\n");
212         kfree_skb(skb);
213         return -EINVAL;
214 }
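
Which branch runs above depends on whether a hardware-header cache has already been built for this destination. For IPv4 over Ethernet the neighbour operations look roughly like this (hedged sketch based on net/ipv4/arp.c of the same era; only the relevant fields are shown):

static const struct neigh_ops arp_hh_ops = {
        .family           = AF_INET,
        .output           = neigh_resolve_output,  /* dst->neighbour->output above        */
        .connected_output = neigh_resolve_output,
        .hh_output        = dev_queue_xmit,        /* hh->hh_output once the cache exists */
        .queue_xmit       = dev_queue_xmit,
};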
Ethernet-header fill path #1: using the cached hardware header.

302 static inline int neigh_hh_output(struct hh_cache *hh, struct sk_buff *skb)
303 {
304         unsigned seq;
305         int hh_len;
306
307         do {
308                 int hh_alen;
309
310                 seq = read_seqbegin(&hh->hh_lock);
311                 hh_len = hh->hh_len;
312                 hh_alen = HH_DATA_ALIGN(hh_len);
313                 memcpy(skb->data - hh_alen, hh->hh_data, hh_alen);
314         } while (read_seqretry(&hh->hh_lock, seq));
315
316         skb_push(skb, hh_len);
317         return hh->hh_output(skb);
318 }
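
A worked example of the copy above, assuming plain Ethernet (ETH_HLEN == 14) and the usual HH_DATA_MOD of 16:

hh_len  = 14;                               /* ETH_HLEN                                */
hh_alen = HH_DATA_ALIGN(hh_len);            /* rounds up to 16                         */
memcpy(skb->data - 16, hh->hh_data, 16);    /* 2 pad bytes plus the cached 14-byte
                                             * header, which ends exactly at skb->data */
skb_push(skb, 14);                          /* expose just the Ethernet header         */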
Ethernet-header fill path #2: no cached header yet, so the neighbour entry is resolved and the header is built through dev_hard_header().

1195 int neigh_resolve_output(struct sk_buff *skb)
1196 {
1197         struct dst_entry *dst = skb_dst(skb);
1198         struct neighbour *neigh;
1199         int rc = 0;
1200
1201         if (!dst || !(neigh = dst->neighbour))
1202                 goto discard;
1203
1204         __skb_pull(skb, skb_network_offset(skb));
1205
1206         if (!neigh_event_send(neigh, skb)) {
1207                 int err;
1208                 struct net_device *dev = neigh->dev;
1209                 if (dev->header_ops->cache && !dst->hh) {
1210                         write_lock_bh(&neigh->lock);
1211                         if (!dst->hh)
1212                                 neigh_hh_init(neigh, dst, dst->ops->protocol);
1213                         err = dev_hard_header(skb, dev, ntohs(skb->protocol),
1214                                               neigh->ha, NULL, skb->len);
1215                         write_unlock_bh(&neigh->lock);
1216                 } else {
1217                         read_lock_bh(&neigh->lock);
1218                         err = dev_hard_header(skb, dev, ntohs(skb->protocol),
1219                                               neigh->ha, NULL, skb->len);
1220                         read_unlock_bh(&neigh->lock);
1221                 }
1222                 if (err >= 0)
1223                         rc = neigh->ops->queue_xmit(skb);
1224                 else
1225                         goto out_kfree_skb;
1226         }
1227 out:
1228         return rc;
1229 discard:
1230         NEIGH_PRINTK1("neigh_resolve_output: dst=%p neigh=%p\n",
1231                       dst, dst ? dst->neighbour : NULL);
1232 out_kfree_skb:
1233         rc = -EINVAL;
1234         kfree_skb(skb);
1235         goto out;
1236 }
1237 EXPORT_SYMBOL(neigh_resolve_output);
1280 static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev,
1281                                   unsigned short type,
1282                                   const void *daddr, const void *saddr,
1283                                   unsigned len)
1284 {
1285         if (!dev->header_ops || !dev->header_ops->create)
1286                 return 0;
1287
1288         return dev->header_ops->create(skb, dev, type, daddr, saddr, len); // for Ethernet this calls eth_header()
1289 }
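
For Ethernet devices, dev->header_ops is installed by ether_setup(), so header_ops->create above resolves to eth_header() below (hedged sketch based on net/ethernet/eth.c of the same era; subset of the fields):

const struct header_ops eth_header_ops ____cacheline_aligned = {
        .create = eth_header,        /* called through dev_hard_header() above       */
        .parse  = eth_header_parse,
        .cache  = eth_header_cache,  /* builds the hh_data used by neigh_hh_output() */
};

/* in ether_setup() */
dev->header_ops = &eth_header_ops;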
 79 int eth_header(struct sk_buff *skb, struct net_device *dev,
 80                unsigned short type,
 81                const void *daddr, const void *saddr, unsigned len)
 82 {
 83         struct ethhdr *eth = (struct ethhdr *)skb_push(skb, ETH_HLEN);
 84
 85         if (type != ETH_P_802_3 && type != ETH_P_802_2)
 86                 eth->h_proto = htons(type);
 87         else
 88                 eth->h_proto = htons(len);
 89
 90         /*
 91          *      Set the source hardware address.
 92          */
 93
 94         if (!saddr)
 95                 saddr = dev->dev_addr;
 96         memcpy(eth->h_source, saddr, ETH_ALEN);
 97
 98         if (daddr) {
 99                 memcpy(eth->h_dest, daddr, ETH_ALEN);
100                 return ETH_HLEN;
101         }
102
103         /*
104          *      Anyway, the loopback-device should never use this function...
105          */
106
107         if (dev->flags & (IFF_LOOPBACK | IFF_NOARP)) {
108                 memset(eth->h_dest, 0, ETH_ALEN);
109                 return ETH_HLEN;
110         }
111
112         return -ETH_HLEN;
113 }
114 EXPORT_SYMBOL(eth_header);
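
Putting the transmit side together, the net effect on skb->data is (headroom assumed to have been reserved when the buffer was allocated):

  after tcp_transmit_skb():              skb->data -> [TCP hdr][payload]
  after ip_queue_xmit():                 skb->data -> [IP hdr][TCP hdr][payload]
  after neigh_hh_output()/eth_header():  skb->data -> [ETH hdr][IP hdr][TCP hdr][payload]

The finished frame is then handed to dev_queue_xmit() and the device driver. On the receive side the same headers are removed in the opposite order by the skb_pull() calls shown at the start of this article.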